Batch Queuing System
A Batch Queuing System (BQS) manages all of the shared compute resources on a cluster. It allocates processors (and sometimes memory and other resources) to jobs in an order determined by the queuing policies. To run a job on the cluster, you prepare a short script file that describes the resources you need and the commands to run, then submit it to the BQS. The BQS launches the job as soon as the necessary resources are available, and collects standard output and standard error into files in the folder from which the job was submitted.
The system we use (Maui/Torque) is a descendant of a very traditional BQS known as PBS (and later OpenPBS). Some variant of this system is used on a majority of the Linux clusters around the world.
The main programs used to access the queuing system are:
Command               | Description
qsub <your job file>  | Submit a new job
qdel <job id>         | Kill or remove a queued job
qstat                 | Info on running jobs
qstat -a              | Info on ALL jobs in the queue
qstat -an             | All jobs with node allocation information
qstat -q              | List all available queues
pbsnodes -a           | List status of all system nodes
checkjob <job id>     | Gives useful details about a job
showq                 | Another way of looking at the queues
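As a rough sketch of how these commands fit together in one session, the script below submits a job, looks at the queue, and asks the scheduler for details. The file name myjob.pbs and the helper job_number are made up for illustration, and the assumption that checkjob wants only the numeric part of the id should be checked against your installation. When qsub or the job file is missing, the script only prints the commands, so it can be read or dry-run away from the cluster.

```shell
#!/bin/sh
# Sketch of a typical submit / monitor session.
# JOBFILE is a hypothetical name; replace it with your own job script.
JOBFILE="myjob.pbs"

# Torque job ids look like "1234.servername"; keep only the numeric part.
# (Illustrative helper, not part of Torque or Maui.)
job_number() {
    echo "${1%%.*}"
}

if command -v qsub >/dev/null 2>&1 && [ -f "$JOBFILE" ]; then
    # qsub prints the id of the new job on standard output.
    JOBID=$(qsub "$JOBFILE") || exit 1
    echo "Submitted job $JOBID"

    qstat -a                            # all jobs in the queue
    checkjob "$(job_number "$JOBID")"   # scheduler details for this job

    # Uncomment to remove the job from the queue again:
    # qdel "$JOBID"
else
    # Nothing to submit here: just show what would be run on the cluster.
    echo "Not submitting (qsub or $JOBFILE missing). On the cluster you would run:"
    echo "  qsub $JOBFILE"
    echo "  qstat -a"
    echo "  checkjob <job id>"
    echo "  qdel <job id>"
fi
```

Keeping the job id in a variable avoids copying it by hand between qsub, checkjob, and qdel.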
The job file submitted with qsub looks like this:
    #!/bin/sh
    #PBS -N NAME_OF_THE_JOB
    #PBS -l nodes=10:ppn=4
    #PBS -l walltime=24:00:00
    #PBS -q longjob

    # This job's temporary working directory. You may also work in any of your
    # home directories
    echo Working directory is $PBS_O_WORKDIR
    echo Running on host `hostname`
    echo Time is `date`
    echo Directory is `pwd`
    echo This job runs on the following processors:
    echo "PBS_NODEFILE=" $PBS_NODEFILE

    cd <some other folder>
    my_first_program
    my_second_program
    mpirun my_mpi_program
This example script requests 10 nodes with 4 processors on each (nodes=10:ppn=4), for a total of 40 processors. Both CIBR clusters are equipped with 4 GB of RAM for each processor.
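The arithmetic behind that request can be checked with a few lines of shell. This is only an illustration: the variable names are invented here, and the memory figures simply multiply the 4 GB per processor stated above by the number of processors requested.

```shell
#!/bin/sh
# Values taken from the "#PBS -l nodes=10:ppn=4" line of the example job.
NODES=10
PPN=4
GB_PER_PROC=4   # RAM per processor stated for the CIBR clusters

TOTAL_PROCS=$((NODES * PPN))              # processors across all nodes
GB_PER_NODE=$((PPN * GB_PER_PROC))        # RAM behind the processors on one node
TOTAL_GB=$((TOTAL_PROCS * GB_PER_PROC))   # RAM behind all requested processors

echo "Total processors requested: $TOTAL_PROCS"
echo "RAM behind the processors on each node: ${GB_PER_NODE} GB"
echo "RAM behind all requested processors: ${TOTAL_GB} GB"
```

With the values from the example this prints 40 processors, 16 GB per node, and 160 GB in total.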
my_first_program and my_second_program stand for the command lines you would normally type when running a program interactively. These programs are executed, one after the other, on the first processor you are assigned.
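To see which processors a job was given, a script can read the file named by $PBS_NODEFILE, which Torque normally fills with one host name per allocated processor. The sketch below counts those entries and prints a possible MPI launch line. The stand-in host names node01 and node02 are made up so the sketch also runs outside a job, and the -np and -machinefile options are accepted by common MPI launchers but should be confirmed for the MPI installed on your cluster.

```shell
#!/bin/sh
# Outside a job PBS_NODEFILE is unset, so build a small stand-in file
# (host names node01/node02 are made up) to keep the sketch runnable.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'node01\nnode01\nnode02\nnode02\n' > "$PBS_NODEFILE"
fi

# One line per allocated processor, so the line count is the processor count.
NPROCS=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')

# The first entry is normally the host that runs the script's ordinary commands.
FIRST_HOST=$(head -n 1 "$PBS_NODEFILE")

echo "Serial commands run on: $FIRST_HOST"
echo "Processors available to MPI: $NPROCS"

# Print, rather than run, a launch line that uses every allocated processor.
echo "mpirun -np $NPROCS -machinefile $PBS_NODEFILE my_mpi_program"
```

Inside a real job the stand-in branch is skipped and the counts reflect the actual allocation.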