The Budapest Quantum Optics Group

Developing and Running Programs


Torque

To submit a job to the cluster, please use the Torque queue manager. The queue manager ensures that all nodes are kept busy as long as there are pending requests, and that no node runs more than one job at a time.
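
To get a quick overview of what the cluster and its queues are doing, the standard Torque client commands can be used. A minimal sketch (the exact output format depends on the installed Torque version):

qstat -a        # list all queued and running jobs
pbsnodes -a     # show the state of every compute node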

The preferred way of using the manager is

  1. write a script for your job, example: script.pbs
  2. copy the script to somewhere on the cluster, for example scp script.pbs asboth@optics.szfki.kfki.hu:/h/kakas/h/asboth/
  3. log in to kakas using, e.g., ssh asboth@kakas.szfki.kfki.hu
  4. use qsub to submit the script to the cluster: qsub script.pbs
  5. once the job has finished, you will find the standard output in the file myjob.o1234, where 1234 is the numerical id assigned to your job by Torque, and "myjob" is the name you gave the job in the script. The standard error is saved in the file myjob.e1234.
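
Putting steps 2-4 together, a typical session looks like this (a sketch reusing the example user name and paths from above; substitute your own):

scp script.pbs asboth@optics.szfki.kfki.hu:/h/kakas/h/asboth/
ssh asboth@kakas.szfki.kfki.hu
cd /h/kakas/h/asboth/
qsub script.pbs        # prints the numerical id assigned to the job
qstat -u asboth        # check the status of your jobs while they run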

To share computational resources fairly, we prefer that you submit jobs that terminate in a short time (short meaning less than 12 hours). Submit short jobs to the "short" queue (see below for how): jobs in this queue can use any number of nodes, but will be terminated after 12 hours.

If you have code that takes a long time to run, try to cut it up into shorter chunks. If that is not possible, you can submit it to the "long" queue (see below). A job in the "long" queue has priority over a job in the "short" queue as long as fewer than 60% of the nodes are running "long" jobs. This means that whenever a node finishes a job, the waiting "long" job will be started rather than the "short" one. However, if 60% of the nodes are already running "long" jobs, a job in the "long" queue has to wait until one of the running "long" jobs finishes.
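
The queue is chosen either in the script (#PBS -q short or #PBS -q long, see the example below) or on the command line at submission time. A sketch, assuming the standard Torque client tools:

qsub -q long script.pbs    # submit to the "long" queue, overriding the #PBS -q line in the script
qstat -q                   # show the configured queues and their limits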

At the heart of the above process is the job script. An example is shown below, including the directive that sends the job to the "short" queue:

### Set the job name
##PBS -N myjob
### Run in the queue named "short", or "long". More on this later. 
#PBS -q short
### (optional): To send email when the job is completed:
##PBS -M your@email.address
### Specify the number of cpus for your job.  This example will allocate 4 cores
### using 2 processors on each of 2 nodes.
##PBS -l nodes=2:ppn=2
### Another example will allocate 8 cores on a single node
##PBS -l nodes=1:ppn=8
### This example allocates 16 cores on one node and 8 cores on each of two further nodes
##PBS -l nodes=1:ppn=16+2:ppn=8
### For submitting a single-threaded job on the poultry farm, use this resource spec
#PBS -l nodes=1:rmki
### (optional):  Tell PBS how much memory you expect to use. Use units of 'b','kb', 'mb' or 'gb'.
##PBS -l mem=256mb
### (optional): Tell PBS the anticipated run-time for your job, where walltime=HH:MM:SS
##PBS -l walltime=1:00:00
### (optional): Tell PBS to use a specific node for your job
##PBS -l nodes=csirke
### Switch to the working directory; by default TORQUE launches processes
### from your home directory.
cd /h/kakas/h/asboth/topological_walk
### Display the job context
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
### Launch the actual program:
python python/2D.py -te 1 -t 20 -disp 0
### 
echo Job ended at `date`


Lines starting with a single #PBS are directives read by the Torque system; lines starting with ##PBS (an extra #) are commented out and ignored.


For techniques on submitting multi-processor jobs, see the Job Submission page of the TORQUE documentation. Notable possibilities are requesting multiple processors on a node (ppn) and the submission of array jobs (see the sketch below). OpenMPI might be useful for making the most of the latter.
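
As an illustration of an array job, the sketch below reuses the example script from above. The -t option and the PBS_ARRAYID variable are the usual Torque array-job mechanism, but check man qsub on kakas for the exact syntax of the installed version; the script name array.pbs and the use of the index as a log-file suffix are only illustrative:

### Submit with: qsub -t 0-9 array.pbs   (starts 10 copies of this script)
#PBS -N myarrayjob
#PBS -q short
#PBS -l nodes=1:rmki
cd /h/kakas/h/asboth/topological_walk
### Each copy sees its own index in PBS_ARRAYID and can use it to select parameters or output files
python python/2D.py -te 1 -t 20 -disp 0 > run_${PBS_ARRAYID}.log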

Program development

GNU compilers

The standard GNU compilers (gcc, g++, gfortran) are installed. By default they all generate 64-bit code.
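
For example, to compile and test-run a program on the head node (hypothetical file names; optimization flags are a matter of taste):

gcc -O2 -Wall -o myprog myprog.c        # C
gfortran -O2 -o mysim mysim.f90         # Fortran
./myprog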

Python

Python 2 and Python 3 are installed, along with common libraries such as numpy, scipy, and matplotlib.
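
To check which interpreters and library versions are available on a node, something like the following can be run (a sketch; the versions depend on the installed Debian release):

python3 -c "import numpy, scipy, matplotlib; print(numpy.__version__, scipy.__version__, matplotlib.__version__)"
python2 -c "import numpy; print(numpy.__version__)"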

Installing more tools

There is no restriction policy on what tools can be installed. If you don't have admin rights, ask an administrator to install the package you need. For tools that exist in the Debian repository, the preferred installation method is via Puppet, i.e. by adding them to the configuration file /etc/puppet/code/environments/production/manifests/site/30_compnodes.pp (see the sketch below).
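
For a package from the Debian repository, the manifest entry is a short Puppet resource. A sketch of what an administrator might add inside the appropriate node definition in 30_compnodes.pp (the package name htop is only an illustration):

package { 'htop':
  ensure => installed,
}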

