Batch Queuing System

A Batch Queuing System (BQS) manages all of the shared compute resources on a cluster. It allocates processors (and sometimes memory and other resources) to specific jobs in a specific order based on Queuing policies. When you have a job to run on the cluster, you prepare a short script file, describing the resources you need, and the commands to run, then submit it to the BQS. The BQS will then launch the job as soon as the necessary resources are available, and collect the standard output and error output into files in the folder where the job was launched from.

The system we use (Maui/Torque) is a descendent of a very traditional BQS known as PBS (and later OpenPBS). Some variant of this system is used on a majority of the Linux clusters around the world.

Programs for Job Management

The main programs used to access the queuing systems are:

qsub <your job file>

Submit a new job

qdel <job id>

Kill or remove a queued job

qstat

Info on running jobs

qstat -a

Info on ALL jobs in the queue

qstat -an

All jobs with node allocation information

qstat -q

List all available queues

pbsnodes -a

List status of all system nodes

checkjob <job id>

Gives useful details about a job

showq

Another way of looking at the queues

Running jobs

This is the sequence of events that takes place when you run a job on the cluster:

  1. Run showq or qstat -a to see the current cluster load level and determine how many cores you will request for your job.

  2. Create a ".pbs" file containing your job (see below)
  3. qsub file.pbs -q queuename

  4. wait for your job to run
  5. showq or qstat -a can be used to monitor.

    • "Q" means the job is waiting to run
    • "R" means the job is running
    • "C" or "A" mean the job has failed for some reason
  6. Look at the file.pbs.eXXXXX and file.pbs.oXXXXX to see the output generated by your program

Making a .pbs File

The job file submitted with qsub looks like this:

#!/bin/sh

#PBS -N NAME_OF_THE_JOB
#PBS -l nodes=10:ppn=4      
#PBS -l walltime=24:00:00
#PBS -q longjob

# This job's temporary working directory. You may also work in any of your
# home directories
echo Working directory is $PBS_O_WORKDIR

echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo "PBS_NODEFILE=" $PBS_NODEFILE

cd <some other folder>
my_first_program
my_second_program
mpirun my_mpi_program

Interactive Jobs (eg - matlab)

Sometimes a user will want to run an interactive session on one of the cluster nodes, taking advantage of the large memory or large number of cores available on these machines. Matlab is one software package people commonly use this way. To do this, you still must use the batch queuing system, but you make a special request for an interactive session. Requesting such an interactive batch job is quite straightforward. For example to allocate all 16 cores on one node interactively:

qsub -I -l nodes=1:ppn=16 -q dque

After typing this command, the job will be queued. If a node is available immediately, you will get a terminal prompt directly on the node you've been allocated. If the cluster is busy, the command will sit and wait until a node is free, then return with a prompt when available.

Be warned, however, that with the example above, for instance, you will be charged for 16 CPU-hr for every hour you leave this shell running. If you accidentally forget to log out of this shell and go home for the weekend, you will come back monday having used ~1000 CPU-hr of your allocation. For this reason, you may wish to consider adding a :walltime=4:00:00 (for example) to automatically close the process (killing anything you might be doing) after 4 hours.

You can, of course, also just request a single processor with :ppn=1. If you do this, then you are responsible for insuring that your job uses only a single core. Some software may detect that there are 16 cores available on the machine and attempt to automatically use more. If you request all (16 on Prism, 12 on Torus) of the cores on a single node, then you are free to use the full RAM, and all of the cores, including hyperthreading on the node.

Hyperthreading

For jobs that can run in parallel using threads, you can gain an additional performance boost by taking advantage of hyperthreading. If the node has 16 physical compute cores, you can actually pull out 10-20% more performance from each one if you run, for example, 32 threads rather than 16 threads on that node. However, if you want to do this there are a few things you need to keep in mind: