Introduction to Clusters and CIBR Philosophy

Glossary & Basic Concepts

First, a little terminology. Here is the hierarchy of computing resources on a modern cluster (you may find some disagreement about some of these specific definitions):

Non-CPU terms:

CPU resources

If we consider the Prism cluster, for example, it contains (as of the time of writing) 11 servers, each of which has 4 nodes, each of which has 2 processors, each of which has 8 cores: 11*4*2*8 = 704 cores. Technically this could be interpreted as 1408 threads, but whereas 2 cores provide 2x performance, 2 threads on a single core typically provide only ~1.1-1.2x performance, and can only be used in very specific situations.
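The arithmetic above can be checked with a short Python sketch. The constant names are made up for illustration, and the "core-equivalents" range simply applies the ~1.1-1.2x figure quoted above to every core, which is a best case that only holds for workloads that benefit from running two threads per core.

```python
# Prism's hierarchy as described above (figures from the time of writing).
SERVERS = 11
NODES_PER_SERVER = 4
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 8
THREADS_PER_CORE = 2  # two hardware threads per core

# Assumed gain from running 2 threads on one core, per the ~1.1-1.2x above.
THREAD_GAIN_LOW = 1.1
THREAD_GAIN_HIGH = 1.2


def total_cores(servers, nodes_per_server, processors_per_node, cores_per_processor):
    """Multiply out the hierarchy to get the physical core count."""
    return servers * nodes_per_server * processors_per_node * cores_per_processor


cores = total_cores(SERVERS, NODES_PER_SERVER, PROCESSORS_PER_NODE, CORES_PER_PROCESSOR)
threads = cores * THREADS_PER_CORE

# Best-case throughput with every thread busy, in units of one physical core.
core_equiv_low = round(cores * THREAD_GAIN_LOW)
core_equiv_high = round(cores * THREAD_GAIN_HIGH)

print(f"nodes: {SERVERS * NODES_PER_SERVER}")
print(f"physical cores: {cores}")
print(f"hardware threads: {threads}")
print(f"core-equivalents with all threads busy: ~{core_equiv_low}-{core_equiv_high}")
```

In other words, doubling the thread count buys at most the equivalent of roughly 70-140 extra cores, not another 704.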

Network and Storage

Our clusters are typically configured with QDR InfiniBand interconnects, meaning nodes can exchange data among each other at a rate of ~800 MB/sec. Note that the RAID6 storage has a single QDR connection, so the total file I/O bandwidth available to all of the nodes is ~800 MB/sec. While this number is higher than the ~150 MB/sec you can get from the scratch hard drive inside each node, it is shared among 44 nodes. If all nodes try to do file I/O at once, each will only have ~18 MB/sec of bandwidth available (roughly 8x slower than the scratch space).
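To see how quickly the shared bandwidth thins out, here is a small Python sketch using the figures above. It assumes the ~800 MB/sec is split evenly among the nodes doing I/O at the same moment, which is an idealization; real contention is usually messier. The function and constant names are illustrative only.

```python
SHARED_STORAGE_MB_S = 800  # single QDR link to the RAID6 storage
SCRATCH_MB_S = 150         # local scratch drive inside each node
TOTAL_NODES = 44


def per_node_bandwidth(shared_mb_s, active_nodes):
    """Even split of the shared link among nodes doing I/O at once (assumption)."""
    if active_nodes < 1:
        raise ValueError("active_nodes must be at least 1")
    return shared_mb_s / active_nodes


for n in (1, 4, 8, 16, TOTAL_NODES):
    share = per_node_bandwidth(SHARED_STORAGE_MB_S, n)
    faster = "shared storage" if share > SCRATCH_MB_S else "local scratch"
    print(f"{n:2d} nodes: {share:6.1f} MB/sec each -> faster: {faster}")

# Worst case: every node reads or writes shared storage simultaneously.
worst_case = per_node_bandwidth(SHARED_STORAGE_MB_S, TOTAL_NODES)
slowdown = SCRATCH_MB_S / worst_case
print(f"all {TOTAL_NODES} nodes: {worst_case:.1f} MB/sec each, "
      f"{slowdown:.1f}x slower than scratch")
```

Under this even-split assumption, local scratch already wins once six or more nodes hit the shared storage at the same time.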

Memory (RAM)

We currently configure our clusters with 4 GB of memory per core. This means a node on Prism has 64 GB of total RAM, shared among 16 cores.
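One practical consequence: if each process needs more than 4 GB, you cannot keep all 16 cores of a node busy. The Python sketch below estimates how many single-threaded processes fit on one Prism node. It ignores memory used by the operating system and any scheduler limits, so treat the result as an upper bound; `processes_per_node` is a hypothetical helper, not a cluster tool.

```python
import math

GB_PER_CORE = 4
CORES_PER_NODE = 16
NODE_RAM_GB = GB_PER_CORE * CORES_PER_NODE  # 64 GB on a Prism node


def processes_per_node(mem_per_process_gb,
                       cores_per_node=CORES_PER_NODE,
                       node_ram_gb=NODE_RAM_GB):
    """Upper bound on single-threaded processes per node: limited by cores or RAM."""
    if mem_per_process_gb <= 0:
        raise ValueError("mem_per_process_gb must be positive")
    by_memory = math.floor(node_ram_gb / mem_per_process_gb)
    return min(cores_per_node, by_memory)


for mem_gb in (2, 4, 10, 100):
    n = processes_per_node(mem_gb)
    print(f"{mem_gb:3d} GB/process -> {n:2d} processes, "
          f"{CORES_PER_NODE - n} cores idle")
```

A result of 0 means a single process does not fit in one node's RAM at all.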

Using Resources on a Cluster

Running jobs on a cluster is a little bit art and a little bit science. For any project you are considering running on a cluster, you should do the following pre-analysis:

Take the data size and divide by 100 MB/sec. How does this amount of time compare to the CPU time estimate you made? If the CPU time estimate is an order of magnitude or more larger than the time you got from the data, then your job may be well suited for a cluster. If your memory requirements are larger than 4 GB/core, and your job does not support threaded parallelism, then you may need to consider a non-CIBR cluster that has put more money into large-memory configurations. If the project is small, we do have 4 specialized "bioinformatics nodes" on the Torus cluster which may suit your needs.
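The rule of thumb above can be written down as a quick Python check. This is only a sketch: the two example projects are invented, 1 GB is taken as 1000 MB, and the factor of 10 stands in for "an order of magnitude".

```python
ASSUMED_IO_RATE_MB_S = 100  # divisor from the rule of thumb above
MIN_CPU_TO_IO_RATIO = 10    # "an order of magnitude or more"


def io_time_seconds(data_mb, rate_mb_s=ASSUMED_IO_RATE_MB_S):
    """Rough time needed just to move the data once."""
    if data_mb <= 0:
        raise ValueError("data_mb must be positive")
    return data_mb / rate_mb_s


def cpu_to_io_ratio(cpu_seconds, data_mb):
    """How many times larger the CPU estimate is than the data-movement time."""
    return cpu_seconds / io_time_seconds(data_mb)


def may_suit_cluster(cpu_seconds, data_mb):
    """True when CPU time dominates I/O time by at least the chosen ratio."""
    return cpu_to_io_ratio(cpu_seconds, data_mb) >= MIN_CPU_TO_IO_RATIO


# Hypothetical project A: 50 GB of data, about 20 CPU-hours of work.
ratio_a = cpu_to_io_ratio(20 * 3600, 50 * 1000)
# Hypothetical project B: 500 GB of data, about 2 CPU-hours of work.
ratio_b = cpu_to_io_ratio(2 * 3600, 500 * 1000)

print(f"A: CPU/I-O ratio {ratio_a:.1f}, may suit cluster: {may_suit_cluster(20 * 3600, 50 * 1000)}")
print(f"B: CPU/I-O ratio {ratio_b:.2f}, may suit cluster: {may_suit_cluster(2 * 3600, 500 * 1000)}")
```

Project A spends far more time computing than moving data, so it is a plausible cluster job; project B is dominated by I/O and probably is not.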

Again, compute clusters excel at running very CPU-intensive tasks with low file I/O requirements. Tasks such as molecular dynamics or other types of simulation are good examples of this. A task at the other extreme, such as searching a genome for possible primer sequences, is probably not something that should even be attempted on a cluster. Most tasks are somewhere between these extremes. For example, CryoEM single particle reconstruction does work with many GB of data, but the amount of processing required per byte of data is high enough that clusters can be efficiently used.

If your project is very data intensive, it may be worth considering an in-lab workstation configured with a high performance RAID array instead. Such a machine can be purchased for well under $10k, and (in 2014) can provide as much as ~1.5 GB/second read/write speeds. For rapidly processing large amounts of sequence data, machines like this can be much more time- and cost-efficient than any sort of cluster.
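For a rough sense of the difference, the Python sketch below compares the minimum time for one pass over a dataset at the ~1.5 GB/second workstation figure with the ~800 MB/sec total shared-storage bandwidth cited earlier on this page. It ignores CPU time entirely, takes 1 GB as 1000 MB, and uses a hypothetical 1 TB dataset; note that adding cluster nodes does not raise the shared-storage total.

```python
WORKSTATION_RAID_MB_S = 1500  # ~1.5 GB/second, the 2014 figure quoted above
CLUSTER_SHARED_MB_S = 800     # total shared-storage bandwidth across all nodes


def min_read_time_seconds(data_gb, rate_mb_s):
    """Lower bound on the time for one full pass over the data at a given rate."""
    return data_gb * 1000 / rate_mb_s


DATASET_GB = 1000  # hypothetical 1 TB of sequence data

workstation_s = min_read_time_seconds(DATASET_GB, WORKSTATION_RAID_MB_S)
cluster_s = min_read_time_seconds(DATASET_GB, CLUSTER_SHARED_MB_S)

print(f"workstation RAID: {workstation_s / 60:.1f} minutes")
print(f"cluster shared storage: {cluster_s / 60:.1f} minutes")
```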

If you have any questions about this for your specific project, please just email sludtke@bcm.edu and we can chat about it.