CIBRClusters/Introduction

Introduction to Clusters and CIBR Philosophy

Glossary & Basic Concepts

First, a little terminology. Here is the hierarchy of computing resources on a modern cluster (you may find some disagreement about some of these specific definitions):

Server - a single physical box mounted in a rack in the cluster
Node - One server often contains multiple independent computers, or nodes. A 'node' is an independently operating computer.
Processor or CPU - A physical microprocessor chip inserted into the motherboard on the computer. Most linux cluster nodes have 2 processors, though there are exceptions
Core - A semi-independent computing unit inside a processor. The cores within a single physical package do share some resources (like cache), but can largely be treated as independent. Modern processors typically have 2-12 cores.
Thread - A fairly recent concept. Each core can do several non-overlapping things at the same time. Sometimes some of these resources are idle due to conflicts between sequential operations. threads allow you to recover some of this wasted capacity by running 2 jobs on each core. On current generation processors, using this thread system can recoup as much as 10-20% of otherwise wasted capacity. However, using it is tricky. You will also often see vendors (like Amazon) advertising threads as if they were cores. This is patently false.

Non-CPU terms:

RAM - The memory on a single node. This memory is shared among all of the cores on the node, and often not explicitly allocated to specific tasks.
Swap - A concept used to avoid memory exhaustion on computers. When the RAM becomes full, the computer may temporarily store some information on the hard drive in specially allocated swap space. Note, however, that a typical hard drive can read/write information at ~150 MB/sec. RAM can read/write information at ~75,000 MB/sec. Forcing a modern cluster to swap is a very bad thing, and the CIBR clusters have little or no swap space available.
Infiniband - A high speed network used to help make many separate computers seem more like a single computer. Allows nodes to exchange information at up to 800MB/sec on our clusters(QDR single channel).
Gigabit - Normal network used between nodes (and in the computers in your lab), can transfer ~100 MB/sec. The connection into the cluster from the outside world is a Gigabit connection.
RAID - A type of disk storage combining multiple hard drives into a single larger 'virtual' drive which is both faster, and provides some safety against drive failure.
- RAID5 - With N hard drives, provides read performance improvement of up to N times, and provides N-1 drives worth of storage space. If any single hard drive fails at a time, no data is lost.
- RAID6 - Same as RAID5, but can tolerate 2 drive failing at once, and provides N-2 drives worth of space.
NFS - Networked FileSystem. Makes a hard drive/RAID on one computer available on other computers. Network communications may be a bottleneck, and the total bandwidth is shared among all computers.

CPU resources

If we consider the Prism cluster, for example, it contains (as of the time of writing) 11 servers, each of which has 4 nodes, each of which has 2 processors, each of which has 8 cores. 11*4*2*8 = 704 cores. Technically this could be interpreted as 1408 threads, but whereas 2 cores provide 2x performance, 2 threads on a single core typically provides ~1.1-1.2x performance, and can only be used in very specific situations.

Using Resources on a Cluster

Running jobs on a cluster is a little bit art and a little bit science. Let's start with a few example cases before generalizing:

Large scale simple sequence manipulation

Consider, for example, user X who is running some simple genome sequence manipulation software. The job doesn't require many CPU resources, but needs to read and write very large files, each of which is accessed only once or twice. In most cases, such jobs are I/O limited, and likely should not be run on a cluster at all. Why ? Before processing, you have to get the data into the cluster over the 100 MB/sec network connection. This is slower than the speed of even a single normal desktop hard drive. Once the data is on the cluster, running the job on multiple CPUs will not gain you any overall speedup because even a single CPU will just spend all of its time waiting for the data to be read from or written to disk. For such problems an in-lab workstation with a high-performance RAID is usually a much better solution.

Sequence alignments

This sort of task can arguably be run efficiently on a cluster, as long as the computation required is substantially more than the I/O requirements. With 64 GB of ram on a single node with 16 cores, large chunks of sequence could be read into RAM, and an efficient multi-threaded algorithm could make use of shared memory to process this data very efficiently, for example. However, optimizing this and finding exactly the right software to run optimally may take some effort.

Molecular dynamics/simulations

One of the tasks at which clusters excel. These tasks involve very little I/O, and are very compute-intensive. While there is a need...

Network and Storage

Our clusters are typically configured with QDR infiniband interconnects, meaning nodes can exchange data among each other at a rate of ~800 MB/sec. Note that the RAID6 storage has a single QDR connection, so the total file i/o bandwidth available to all of the nodes is ~800 MB/sec. While this number may seem higher than the ~150 MB/sec you can get from the scratch hard drive inside each node, it is shared among 44 nodes. If all nodes try to do file I/O at once, each will only have ~18 MB/sec of bandwidth available (10x slower than the scratch space).