Threaded Parallelism in EMAN2

Most modern computers have at least 2 'cores'. A decade ago these would have been called 2 CPUs. Today you may still have 2 physical CPUs in your computer, but each may contain multiple 'cores', effectively multiple CPUs in a single package. For that reason, we often distinguish between the CPU (the physical package) and the number of cores (the effective number of CPUs). Typical computers in 2010 have 4 cores, but the total number can be 12 (or even higher with some AMD configurations).

This page explains how to make efficient use of multiple compute cores on your computer when running EMAN2 jobs. This parallelism mode is the most efficient by far, but will only work on a SINGLE computer at a time. If you need to make use of multiple computers or clusters, please see the main Parallelism page. For most jobs you should be able to achieve close to an N-fold speedup when using this mechanism (i.e., a job using 4 cores will run close to 4x faster than a job running on one core). However, this will vary with the size of the job. In general, the larger the project, the more efficiently you will be able to make use of multiple cores.

Please make sure you are familiar with the memory warning at the end of this page.

Quickstart

Programs with parallelism support will take the --parallel command line option as follows:

--parallel=thread:<n>

where <n> should be replaced by the number of cores you wish to use. That's it. Quite simple.

If you are using the project manager, any 'parallel' box should contain thread:<n>, and any 'threads' box should contain just the number of threads.

Large /tmp file problem

The --parallel=thread option puts (sometimes large) temporary files in /tmp. On some systems /tmp is now a ramdisk, which can cause real problems. You can use an alternate folder for these temporary files, but make sure it is on the local computer, not on a remote filesystem shared among machines for the same account:

--parallel=thread:<n>:<tmp_path>

for example:

--parallel=thread:32:/home/stevel/tmp

The --threads option should not have this problem.
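The two forms of the --parallel option shown above can be assembled with a few lines of Python before being handed to an EMAN2 program. This is only a sketch: the helper name thread_option is hypothetical and not part of EMAN2. It checks that the scratch directory exists, but it cannot tell whether that directory is on a local disk, so that remains your responsibility.

```python
import os
from typing import Optional


def thread_option(n_threads: int, tmp_path: Optional[str] = None) -> str:
    """Build a --parallel=thread:<n>[:<tmp_path>] option string.

    Hypothetical helper for illustration; not part of EMAN2.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if tmp_path is None:
        # Default form: temporary files go to /tmp.
        return f"--parallel=thread:{n_threads}"
    if not os.path.isdir(tmp_path):
        raise FileNotFoundError(f"scratch directory not found: {tmp_path}")
    # Alternate scratch folder; it must be on a local disk (not checked here).
    return f"--parallel=thread:{n_threads}:{tmp_path}"


print(thread_option(4))
```

The resulting string can be appended to the command line of any program that accepts --parallel.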

Details

As above, in essence all you need to do is say, for example:

--parallel=thread:4

to make use of 4 cores on your computer. However, if you are running, for example, on your desktop computer or workstation, you might wish to consider using 1 less core than you actually have to help make your machine more responsive for normal interactive use while the job is running. This is completely up to you.
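As a rough sketch of the "one less core" suggestion, the thread count for an interactive machine could be computed as follows. The function name threads_for_desktop is hypothetical. Note that Python's os.cpu_count() reports logical CPUs, which may include hyperthreads, so the number it returns can be larger than the number of physical cores.

```python
import os


def threads_for_desktop(total_cores: int, reserve: int = 1) -> int:
    """Leave `reserve` cores free for interactive use, but never go below 1."""
    return max(1, total_cores - reserve)


# os.cpu_count() may return None on unusual platforms, hence the fallback.
cores = os.cpu_count() or 1
print(f"--parallel=thread:{threads_for_desktop(cores)}")
```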

As mentioned on the previous page you should also specify:

--threads=4

for any programs that support it. This option is for cases where --parallel (which also supports MPI and other types of parallelism) cannot be used.

Specifying a number of threads larger than the number of cores your computer has will quite probably cause the job to run more slowly, and in some cases may cause it to run disastrously slowly.

What about hyperthreading - Some computers support the concept of hyperthreading. This is when a CPU pretends to have more cores than it actually has, and tries to run 2 jobs using the same core. Sometimes this can result in improved efficiency, as there may be pieces of the core which can work semi-independently. So, for example, if one job is trying to do floating point math and another job is trying to do integer math, it may be that they can both do their computations at the same time on the same core. This sort of coincidence is rare in most programs. With current generation Intel chips, you can get some benefit out of hyperthreading, but only a little. If your computer has 16 physical cores, and thus 32 "threads", it may be useful to specify as many as 20 or even 24 threads. This will not make the job run 50% faster, but may make it run 5 or 10% faster. Using 32 threads is not generally useful, and will likely be counterproductive.
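The 16-core example above (20 to 24 threads) corresponds to roughly 1.25x to 1.5x the number of physical cores. The sketch below simply extrapolates that ratio to other core counts. This extrapolation is an assumption rather than a tested rule, and the function name hyperthread_range is hypothetical.

```python
from typing import Tuple


def hyperthread_range(physical_cores: int) -> Tuple[int, int]:
    """Return a (low, high) thread count of about 1.25x to 1.5x the physical cores.

    Assumption: the 16-core -> 20..24 example scales linearly.
    """
    low = physical_cores + physical_cores // 4
    high = physical_cores + physical_cores // 2
    return low, high


low, high = hyperthread_range(16)
print(f"try between thread:{low} and thread:{high}")
```

Whether the extra threads help at all depends on the job, so it is worth timing a short run at each setting before committing to a long one.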

How do I know how many cores my machine has? - This depends on what OS you are using. On a Mac, simply use the 'About This Mac' item on the Apple menu. It may say something like "2 x 2.66 6 core Xeons" (meaning, in this case, 12 cores). Under Linux, you can 'cat /proc/cpuinfo', and it will give information on each core. Processors are numbered starting with 0, so if you see 'processor : 3' as the last entry, you have 4 cores on your machine.

Caveat - Under Linux there is a possibility that this number may be 2x larger than your actual number of cores. Intel has a technology called 'hyperthreading' which they use to market their chips. This makes the machine appear to have 2x as many cores as it physically has, and can give a performance advantage in some specific situations (like word processing), but is actually quite detrimental for something like large computational jobs. Again, if you have only 4 physical cores, with hyperthreading making it look like you have 8 cores, you should only specify 4 threads to EMAN2, or you will almost certainly make your job run slower, and perhaps even crash your machine in certain situations.
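On Linux, the difference between the apparent and the physical core count can be read from /proc/cpuinfo itself. The sketch below counts the processor entries (logical CPUs) and the distinct 'physical id' / 'core id' pairs (physical cores). It assumes the x86 layout of /proc/cpuinfo, where each processor is described in a block separated by a blank line; on systems that do not report core ids it falls back to the logical count. The function name count_cpus is hypothetical.

```python
import os
import re
from typing import Tuple


def count_cpus(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    # Each processor is described in its own blank-line-separated block.
    for block in re.split(r"\n\s*\n", cpuinfo_text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            # Hyperthreads on the same core share this pair.
            cores.add((fields["physical id"], fields["core id"]))
    # Fall back to the logical count when core ids are not reported.
    physical = len(cores) if cores else logical
    return logical, physical


if os.path.exists("/proc/cpuinfo"):  # Linux only
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print(f"{logical} logical processors, {physical} physical cores")
```

If the two numbers differ, hyperthreading is most likely enabled, and the second number is the one to use when following the advice above.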

Note about disk space - This parallelism option will put scratch files in /tmp. These files can get quite large, so if your /tmp filesystem is small, you may wish to put the scratch files elsewhere. You can specify --parallel=thread:<n>:</path/to/scratch> to do this, but be warned: the scratch directory MUST be on a local hard drive, NOT a shared filesystem from another computer! Violating this could lead to database corruption!

IMPORTANT WARNING ABOUT MEMORY - For most tasks, if you specify thread:4, that job will use 4x as much memory (RAM, NOT disk space) as if you use only a single thread. If you have, for example, only 2 GB of RAM, and your job was using 1 GB when you ran it on a single processor, then specifying thread:4 will probably exhaust your system memory, and likely cause either excessive swapping (making your machine seem to run like molasses), or possibly even crash the entire machine. This is particularly important if you have a machine with, say, 12 cores. This effect can be mitigated to some extent through use of the '--lowmem' option provided for commands like 'e2refine.py', but that will not eliminate the problem. This issue is extremely problem dependent, though. If you are refining something like the demo GroEL data set with a box size <200, and only ~5,000 particles, memory isn't likely to even approach being exhausted. However, if you're processing a virus with a box size of 800x800 to very high resolution, you will almost certainly have issues unless you have a LOT of RAM.
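The rule of thumb above (memory use scales with the number of threads) can be turned into a quick estimate. This sketch assumes strictly linear scaling, which the text only claims "for most tasks", and it leaves no headroom for the operating system. The function names are hypothetical.

```python
def estimated_memory_gb(single_thread_gb: float, n_threads: int) -> float:
    """Estimate RAM use, assuming it scales linearly with the thread count."""
    return single_thread_gb * n_threads


def max_threads_for_ram(single_thread_gb: float, total_ram_gb: float,
                        n_cores: int) -> int:
    """Largest thread count (up to n_cores) whose estimated memory fits in RAM."""
    if single_thread_gb <= 0:
        raise ValueError("single_thread_gb must be positive")
    fit = int(total_ram_gb // single_thread_gb)
    return max(1, min(n_cores, fit))


# The example from the text: a 1 GB job on a 4-core machine with 2 GB of RAM.
print(estimated_memory_gb(1.0, 4))       # estimated GB needed with thread:4
print(max_threads_for_ram(1.0, 2.0, 4))  # thread count that should still fit
```

In practice you would measure the single-thread memory use of your own job first, and choose a value somewhat below the computed limit.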

Note that not all programs will run in parallel. If a program does not accept the --parallel option, then it is not parallelized.

EMAN2/Parallel/Threaded (last edited 2022-08-15 03:30:39 by SteveLudtke)