Developing and Running Programs
The PBS Pro job manager is installed to schedule computations on the cluster. If you use the job manager, you do not have to choose the machines yourself (for example, by checking which one is free).
The preferred way of using the manager is the following:
- write a script for your job, example:
- copy the script to somewhere on the cluster, for example
scp myjob.sh firstname.lastname@example.org:/nfshome/asboth/, or use sshfs, Dropbox, etc.
- log in to bird using, e.g.,
- use qsub to submit the script to the cluster:
- once the job is finished, you will find the standard output in the file
myjobname.o1234, where 1234 is the numerical id assigned to your job by the job manager, and "myjobname" is the name you assigned to the job in the script. The standard error is saved in the file
- you must make sure inside your script that your program writes and reads data from suitable disk spaces: e.g. shared data on NFS (
/scratch), and local working data to the local
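The steps above can be sketched as the following shell fragment. The session commands appear only as comments because the login host and paths depend on your account. `pbs_output_files` is a hypothetical helper, not part of PBS Pro; it assumes that qsub prints a job identifier of the form `1234.servername` and that the default PBS naming (`.o` for standard output, `.e` for standard error) has not been overridden in the script.

```shell
# Typical session (shown as comments; adjust host and paths to your account):
#   qsub myjob.sh        # prints a job identifier such as 1234.servername
#   qstat -u "$USER"     # list your queued and running jobs

# Hypothetical helper: print the default stdout/stderr file names for a job.
# $1 = job name set with "#PBS -N", $2 = job identifier printed by qsub.
pbs_output_files() {
    job_number=${2%%.*}                     # strip the ".servername" suffix, if any
    printf '%s.o%s\n' "$1" "$job_number"    # standard output file
    printf '%s.e%s\n' "$1" "$job_number"    # standard error file
}

pbs_output_files myjobname 1234.servername
```

Saving the identifier printed by qsub in a variable makes it easy to locate the output files once the job has finished.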
To share computational resources in a fair way, we prefer having jobs that terminate in a short time (short meaning less than 12 hours). These short jobs will have priority over long jobs. If you have a code that takes a long time to run, make it parallel or cut it up into shorter pieces.
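One way to cut a long run into shorter pieces is sketched below. `piece_bounds` is a hypothetical helper that assumes the work consists of a number of independent steps that divides evenly into pieces; how the bounds are passed to your own program is up to you. PBS Pro job arrays (`#PBS -J 0-3`) are one possible way to submit the pieces, in which case PBS sets `PBS_ARRAY_INDEX` for each sub-job.

```shell
# Hypothetical helper: split the steps 0..total-1 into equal pieces and print
# the first and last step of piece number "index" (counting from 0).
# Assumes that total is divisible by pieces.
piece_bounds() {
    total=$1; pieces=$2; index=$3
    size=$((total / pieces))
    first=$((index * size))
    last=$((first + size - 1))
    echo "$first $last"
}

# Inside a PBS Pro array job (submitted with "#PBS -J 0-3"), the variable
# PBS_ARRAY_INDEX identifies the piece; default to 0 when run by hand.
piece_bounds 20 4 "${PBS_ARRAY_INDEX:-0}"
```

Each piece then needs only a fraction of the total walltime, so it can run in the short queue.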
Estimate as accurately as you can how long your job will run, and specify a walltime accordingly. If your walltime is less than 12 hours, the job is placed in the short queue; otherwise it goes to the long queue. If your job exceeds the specified walltime, its process will be terminated. Use
qstat -Qf to check default values on the queues.
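The 12-hour rule can be checked before submitting with a small sketch like the one below. `queue_for_walltime` is a hypothetical helper, and the names "short" and "long" are simply the labels used in the text above; the actual queue names and limits on the cluster should be confirmed with qstat -Qf.

```shell
# Hypothetical helper: report which queue a walltime (HH:MM:SS) falls into,
# using the 12-hour limit described above.
queue_for_walltime() {
    echo "$1" | awk -F: '{
        seconds = $1 * 3600 + $2 * 60 + $3
        queue = (seconds < 12 * 3600) ? "short" : "long"
        print queue
    }'
}

queue_for_walltime 1:00:00
queue_for_walltime 36:00:00
```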
The job submission script is like an ordinary shell script, but it may also contain settings that control how the job is submitted and run. The example here sets the walltime to 1 hour. This is probably too short for any practical purpose, so make sure you adjust it to your own needs:
### Set the job name (Optional: if not set the script file name is used)
#PBS -N myjobname

### (optional): To send email when the job is completed:
#:PBS -M email@example.com

### Specify the number of cpus for your job. This example will allocate 1 core
#PBS -l select=1:ncpus=1
### Another example will allocate 8 cores on a single node
#:PBS -l select=1:ncpus=8
### This example allocates 16 cores on one and 8 cores on two machines
#:PBS -l select=1:ncpus=16+2:ncpus=8

### (optional): Tell PBS how much memory you expect to use. Use units of 'b','kb', 'mb' or 'gb'.
#:PBS -l select=1:ncpus=8:mem=256mb

### Tell PBS the anticipated run-time for your job, where walltime=HH:MM:SS
### if walltime exceeds 12h your job will have a lower priority
#PBS -l walltime=1:00:00

### (optional): Tell PBS to use a specific node for your job
#:PBS -l select=1:ncpus=8:host=csirke

### Switch to the working directory; by default PBS Pro launches processes
### from your home directory.
cd /nfshome/asboth/topological_walk

### Display the job context
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`

### Launch the actual program:
python python/2D.py -te 1 -t 20 -disp 0

###
echo Job ended at `date`
#PBS directives are read by the PBS Pro system. Lines starting with
#:PBS are commented out, so PBS Pro ignores them.
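To see which directives in a script are active, a quick filter along the following lines may help. `active_pbs_directives` is a hypothetical helper built on grep, and the demonstration script is made up for illustration.

```shell
# Hypothetical helper: list the directives PBS Pro will act on in a job script.
# Lines starting with "#:PBS" or "###" do not match and are left out.
active_pbs_directives() {
    grep '^#PBS ' "$1"
}

# Demonstration on a small made-up script.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
### Set the job name
#PBS -N myjobname
#:PBS -M email@example.com
#PBS -l walltime=1:00:00
EOF
active_pbs_directives "$demo_script"
rm -f "$demo_script"
```

For the demonstration script, only the `-N` and `-l walltime` lines are printed.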
For the complete information see the PBS Professional user guide.
To run a job on multiple nodes OpenMPI might be useful.
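A minimal sketch of a multi-node launch is given below, under several assumptions: that Open MPI's mpirun is available on the nodes, that the job requests its processes with the `mpiprocs` chunk resource, and that the file named by `PBS_NODEFILE` then lists one host name per MPI rank. `count_mpi_ranks` is a hypothetical helper and `my_mpi_program` is a placeholder for your own executable.

```shell
# Directives for a two-node MPI job (hypothetical sizes):
#   #PBS -l select=2:ncpus=8:mpiprocs=8

# Hypothetical helper: count the MPI ranks PBS has allocated, assuming the
# node file lists one host name per rank.
count_mpi_ranks() {
    grep -c . "$1"
}

# Inside a job, PBS sets PBS_NODEFILE; outside a job this block does nothing.
if [ -n "${PBS_NODEFILE:-}" ]; then
    ranks=$(count_mpi_ranks "$PBS_NODEFILE")
    # my_mpi_program is a placeholder for your own executable.
    mpirun -np "$ranks" --hostfile "$PBS_NODEFILE" ./my_mpi_program
fi
```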
The standard compilers (
gcc, g++, gfortran) are installed. By default, all of them generate 64-bit code.
Python 2 and Python 3 are installed, along with some common libraries such as
numpy, scipy, matplotlib.
There is no restrictive policy on which tools can be installed. If you don't have admin rights, ask an administrator to install the package you need. For tools that exist in the Debian repository, the best practice is to install them via puppet, i.e. add to the configuration file