General Policy

  • Please try to be considerate to other cluster users.
  • If the cluster is heavily occupied, we request that no single user have more than 240 cores (10 nodes) in the queue (running or waiting). The debug queue is an exception.
  • If the cluster has been running below capacity for 8 hours or more, you may submit additional jobs, up to an additional 240 cores, but only to the 'shortjob' queue (max runtime of 8 hours).
  • Please also try to ensure that your jobs get distributed among as few nodes as possible, i.e. if running many small jobs, try to group jobs with similar runtimes together.
  • You are not allowed to write batch scripts which submit additional jobs to the cluster. Small bugs in your scripts can easily lead to runaway behavior, and may result in your account being banned from the cluster.
  • Similarly, use of cron/at to submit jobs at a later time is also not permitted.
  • If you need to run a sequence of commands, with the second command running only after the first completes, simply put all of the commands into a single batch script and make sure the total runtime is sufficient (see the first sketch after this list).
  • Similarly, if you have, say, 480 jobs to run, simply put 2 commands in sequence in each of 240 batch scripts. This way all of the jobs will run and you won't over-utilize the cluster.
  • When using the 'bynode' queue, you may use the full RAM available on each node.
  • If running other jobs, you are responsible for ensuring that each 'task' uses no more than 5 GB/core and runs only a single thread of execution. If you have a single-threaded process which requires (for example) 20 GB of RAM, you must request 4 CPUs and run only a single job (see the second sketch after this list).
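
A minimal sketch of the sequencing pattern above (the runtime, path, and command names are placeholders):

#!/bin/sh
#SBATCH --time=2-00:00 -n1 -p dque

# Two placeholder commands run back to back; the second starts only after
# the first completes, and the requested time must cover both.
cd /home/stevel/data
firstcommand
secondcommand

And a sketch of the high-memory case: at 5 GB/core, a 20 GB single-threaded process needs 4 cores reserved but only one process launched ('bigmemprogram' is a placeholder):

#!/bin/sh
#SBATCH --time=1-00:00 -n4 -p dque

# Request 4 cores to reserve ~20 GB of RAM, but run only a single
# single-threaded process so the extra cores stay idle.
cd /home/stevel/data
bigmemprogram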

Launching Jobs on the Clusters using SLURM

SLURM is a much more flexible queuing system than the previous Torque/Maui system (used on the other CIBR clusters). Some general tips to get you started:

  • Partition - this was called a Queue under the old system
  • Note that unlike the old system, where it was difficult to monitor jobs, SLURM writes STDOUT to slurm-<jobid>.out and updates it in real time, so you can see exactly what is going on.
  • sinfo -al - will give a detailed list of the available partitions
  • squeue -l - will give a detailed list of running and queued jobs
  • sbatch <script> - will submit a batch job to the system for scheduling
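
For instance (the script name and job ID shown are hypothetical):

sinfo -al                # detailed list of the available partitions
sbatch myjob.sh          # submit a batch script; SLURM prints the new job ID
squeue -l -u $USER       # detailed list of your own running and queued jobs
tail -f slurm-12345.out  # follow a running job's output in real time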

Available Queues (24 tasks/node):

Name         Max Time    Max Nodes  Notes
dque         3 days      1          For jobs using fewer than 24 cores. Multiple jobs allocated per node.
bynode       2 days      10         For jobs using multiples of 24 cores. Whole nodes allocated to single job.
longjob      7 days      5          Like bynode, but for longer running jobs. Limited to fewer nodes. Lower priority.
interactive  12 hours    1          For interactive sessions. High priority on at most a single node.
debug        30 minutes  20         For testing/debugging scripts or other problems only. Highest priority, but very short time limit.

SLURM is a highly flexible system, and even permits a single job to vary the number of processors it uses as it runs through a sequence of operations (see the final sketch at the bottom of this page). There are many good guides on the Web covering advanced use of SLURM. Basic usage is quite easy, so we provide a few simple examples here.

Here is a basic SLURM script for a single-core job (submit with sbatch):

#!/bin/sh
#SBATCH --time=1-10:15 -n1 -p dque

# The above options are for a single core job which will run
# for 1 day, 10 hours and 15 minutes in the default queue

cd /home/stevel
myprogram
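
The short flags above also have standard long forms, which some find easier to read in scripts; this variant is functionally identical:

#!/bin/sh
#SBATCH --time=1-10:15
#SBATCH --ntasks=1
#SBATCH --partition=dque

cd /home/stevel
myprogram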

This job will allocate 24 CPUs (1 node) in an exclusive fashion, so all 24 cores are on one node. This allows you to use threading and run, for example, 30 threads on this one node if you like. It also gives you exclusive access to 124 GB of RAM (4 GB is reserved for system use):

#!/bin/sh
#SBATCH --time=16:00:00 -n24 -p bynode

# A 16 hour job allocating 1 node. Use of the 'bynode' queue ensures that you
# will have access to the entire node. If you have access to the entire node,
# you are free to use more than 24 threads on it
cd /home/stevel/data

myfirstprogram
mysecondprogram
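
If your programs use OpenMP-style threading, the thread count can be set explicitly in the script. A minimal sketch, assuming the placeholder program honors OMP_NUM_THREADS:

#!/bin/sh
#SBATCH --time=16:00:00 -n24 -p bynode

cd /home/stevel/data

# With the whole node allocated, running 30 threads on the 24 cores is
# permitted; hyperthreading absorbs the extra threads.
export OMP_NUM_THREADS=30
mythreadedprogram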

This job will run an MPI job on 72 processors across 3 nodes:

#!/bin/sh
#SBATCH --time=4:00:00 -n72 -p bynode

cd /home/stevel/data/

# note that you do not need to pass any info to mpirun, SLURM will pass
# the node list and number of processes automatically
mpirun myprogram
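
To confirm what SLURM is passing to your job, you can echo its standard environment variables near the top of the script (SLURM sets these for every batch job):

echo "Job ID:   $SLURM_JOB_ID"
echo "Nodelist: $SLURM_JOB_NODELIST"
echo "Tasks:    $SLURM_NTASKS"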

To run an interactive job (that is, you get a shell prompt on a node where you can run what you like):

srun --pty -p interactive bash

You can also specify time limits and other parameters. Note that this gives you access to only a single core on the node. If you need access to the entire node, use the 'bynode' queue instead of the 'interactive' queue.
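
For example, to cap an interactive session at two hours (the limit shown is arbitrary):

srun --pty -p interactive --time=2:00:00 bash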

If you are using EMAN2.1, it launches mpirun itself. The correct usage is simply:

#!/bin/sh
#SBATCH --time=8:00:00 -n48 -p bynode

cd /home/stevel/data

# note the --parallel=mpi specification matches the number of allocated processors,
# but --threads is set to 30, taking advantage of hyperthreading on the first node
# when appropriate.
e2refine_easy.py --input=sets/all-bad__ctf_flip.lst --model=initial_models/model_01_01.hdf --targetres=7.0 --speed=5 --sym=d7 --iter=3 --mass=800.0 --apix=1.07 --classkeep=0.9 --m3dkeep=0.8 --parallel=mpi:48:/scratch/stevel --threads=30 --classrefsf
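
Finally, as mentioned above, a single SLURM job can vary the number of processors it uses across a sequence of operations: launch each step with srun inside one allocation, requesting no more tasks than the job was allocated. A minimal sketch (the step names are placeholders):

#!/bin/sh
#SBATCH --time=6:00:00 -n48 -p bynode

cd /home/stevel/data

# Each srun line launches a job step within the 48-task allocation;
# a step may use fewer tasks than the allocation provides.
srun -n 8 prepstep
srun -n 48 mainstep
srun -n 1 cleanupstep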
