Diff for "EMAN2/Parallel"

Differences between revisions 13 and 29 (spanning 16 versions)

Parallel Processing in EMAN2

EMAN2 uses a modular strategy for running commands in parallel. That is, you can choose different ways to run EMAN2 programs in parallel, depending on your environment.

Which option is best ? If you are running on a single machine/node, then Threaded is by far the most efficient option, and the easiest to use as well. If you are running on a few nodes on a single cluster, use MPI. In many cases a single cluster node has enough cores that using Threaded parallelism on one cluster node at a time isn't a bad choice. MPI setup can be painful for people not familiar with clusters, and Threaded can be used without any extra configuration.

Please follow the appropriate link:

Threaded - This is for use on a single computer with multiple processors (cores) or a single node of a cluster. EMAN2 can make very efficient use of all of these cores, but this mode will ONLY work if you want to run on a single computer.
MPI - This is the standard parallelism method used on virtually all large clusters nowadays. It will require a small amount of custom installation for your specific cluster, even if you are using a binary distribution of EMAN2. Follow this link for more details
Distributed - This was the original parallelism method developed for EMAN2. Having said that, it hasn't really been developed or actively used for at least 10 years, with MPI now preferred for clusters and threaded preferred for individual computers. May not even work any more...
--threads option - In addition to --parallel, some commands have a --threads option. There are a few commands which cannot be run using the generic multi-computer parallelism provided by --parallel. These commands may still be able to take advantage of multiple cores on a single machine. --threads is the number of available processors on a single computer. It should be specified in addition to --parallel when both are available.

Note : All 3 parallelism options have been fully supported and stable since early 2011. Both MPI and DC have been tested on jobs using at least 256 cores, for multiple days, and are in routine use on large refinement jobs at multiple sites. That said, DC and MPI can both take a little effort to establish on a new system, particularly if you have no past experience with cluster computing. We are happy to help if you have difficulties.

-  ⇤ ← Revision 13 as of 2010-07-15 19:17:02 → 
  Size: 8677
  Editor: SteveLudtke
  Comment:
+   ← Revision 29 as of 2023-04-15 02:03:25 → ⇥
  Size: 2514
  Editor: SteveLudtke
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 4:
-environment. Unfortunately, as of April, 2010, there is still only one available parallelism strategy. This should be gradually fleshed out over
2010. Also unfortunately, it isn't trivial to use for simple multithreaded execution (but it does work). We hope to rectify this soon.
+environment.
-Line 7:
+Line 6:
-Programs with parallelism support will take the --parallel command line option as follows:
+Which option is best ?  If you are running on a single machine/node, then Threaded is by far the most efficient option,
and the easiest to use as well. If you are running on a few nodes on a single cluster, use MPI. In many cases a single cluster node has enough cores that using Threaded parallelism on one cluster node at a time isn't a bad choice. MPI setup can be painful for people not familiar with clusters, and Threaded can be used without any extra configuration.
 Line 9:
---parallel=<type>:<option>=<value>:<option>=<value>:...
+Please follow the appropriate link:
 Line 11:
-for example, for the distributed parallelism model: ''--parallel=dc:localhost:9990''
for the local multicore threaded model: ''--parallel=thread:4''  (where 4 is the number of cores to use)
+ * '''[[EMAN2/Parallel/Threaded|Threaded]]''' - This is for use on a single computer with multiple processors (cores) or a single node of a cluster. EMAN2 can make very efficient use of all of these cores, but this mode will ONLY work if you want to run on a single computer.
 * '''[[EMAN2/Parallel/Mpi|MPI]]''' - This is the standard parallelism method used on virtually all large clusters nowadays. It will require a small amount of custom installation for your specific cluster, even if you are using a binary distribution of EMAN2. Follow this link for more details
 * '''[[EMAN2/Parallel/Distributed|Distributed]]''' - This was the original parallelism method developed for EMAN2. Having said that, it hasn't really been developed or actively used for at least 10 years, with MPI now preferred for clusters and threaded preferred for individual computers. May not even work any more...
-Line 14:
+Line 15:
+ * '''--threads''' option - In addition to --parallel, some commands have a --threads option. There are a few commands which cannot be run using the generic multi-computer parallelism provided by --parallel. These commands may still be able to take advantage of multiple cores on a single machine. --threads is the number of available processors on a single computer. It should be specified in addition to --parallel when both are available.
-Line 15:
+Line 17:
-=== GPGPU Computing ===
While not precisely a parallelism methodology, this technique makes use of the GPU (graphics processing unit) common in most modern PC's, to dramatically
accelerate many image processing algorithms. At present (summer 2009) we are at the initial stages of implementing GPGPU support using Nvidia's CUDA 
infrastructure. We will likely move to OpenCL in future as it becomes a stable platform. We have only implemented a few algorithms using this methodology
to date, and we will need to implement and optimize virtually all of them before this becomes a viable platform for day-to-day use. However, we have demonstrated
speedups of as much as 100x in select algorithms, meaning a desktop PC with a GPU could easily become the equivalent of a small Linux cluster. While all of
the GPGPU code is available in the nightly source snapshots, you are encouraged to contact sludtke@bcm.edu if you are interested in experimenting with this
technology.

=== Local Machine (multiple cores) ===
As of 7/15/2010 this is now supported !  If you just want to use multiple cores on your local machine (no using cycles from other machines, etc.), just
put 'thread:<ncpu>' in the 'Parallel' box, or specify the '--parallel=thread:<ncpu> option on the command line. <ncpu> should of course be replaced
with the number of threads you wish to use.


=== Distributed Computing ===

==== Quickstart ====
For those not wanting to read or understand the parallelism method, here are the basic required steps:

 1. on the machine with the data, make a scratch directory on a local hard drive, cd to it, and run e2parallel.py dcserver --port=9990 --verbose=2
 1. make another scratch directory on a local hard drive, cd to it, and run e2parallel.py dcclient --host=<server hostname>
 1. repeat #2 for each core or machine you want to run tasks on
 1. run your parallel job, like 'e2refine.py' with the --parallel=dc:localhost:9990

Notes
 * If you need to kill the server and restart it for some reason, that's fine. As long as it is restarted within about 5 minutes, it should be harmless
 * Make sure the same version of EMAN2 on all machines, if multiple machines are being used as clients
 * If you need to stop the 'e2refine' program, you can run 'e2parallel.py killall' to cancel any pending jobs on the server after stopping e2refine.
 * You can add or remove clients at any time during a run
 * When you are done running jobs, kill the server, then run 'e2parallel.py dckillclients' from the server directory, and let it run for a minute or two. This will tell the clients to shut down. If you plan to do another run relatively soon, you can just leave the server and clients running.

You should really consider reading the detailed instructions below, though :^)

==== Introduction ====
This is the sort of parallelism made famous by projects like SETI-at-home and Folding-at-Home. The general idea is that you have a list of small jobs to do,
and a bunch of computers with spare cycles willing to help out with the computation. The number of computers willing to do computations may vary with time, and
possibly may agree to do a computation, but then fail to complete it. This is a very flexible parallelism model, which can be adapted to both individual computers
with multiple cores as well as linux clusters or sets of workstations laying around the lab.

There are 3 components to this system:

User Application (customer) <==> Server <==> Compute Nodes (client)

The user application (e2refine.py for example) builds a list of computational tasks that it needs to have completed, then sends the list to the server. Compute nodes with nothing to do then
contact the server and request tasks to compute. The server sends the tasks out to the clients. When the client finishes the requested computation, results are sent
back to the server. The user application then requests the results from the server and completes processing. As long as the number of tasks to complete is larger than the
number of clients servicing requests, this is an extremely efficient infrastructure.

Internally things are somewhat more complicated and tackle issues such as data caching on the clients, how to handle clients that die in the middle of processing, etc., but
the basic concept is quite straightforward.

With any of the e2parallel.py commands below, you may consider adding the --verbose=1 option to see more of what it's doing.

==== How to use Distributed Computing in EMAN2 ====
To use distributed computing, there are three basic steps:
 * Run a server on a machine that the clients can communicate with
 * Run some number of clients pointing at the server
 * run an EMAN2 program with the --parallel option

What follows are specific instructions for doing this under 3 different scenarios.

===== Using DC on a single multi-core workstation =====
 * Ideally your data will be stored on a hard drive physically connected to the workstation (not on a shared network drive)
 * make an empty directory on a local hard drive
 * Run a server on the workstation ''e2parallel.py dcserver'' from the empty directory you just created
 * The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
 * Run one client for each core you want to use for processing : ''e2parallel.py dcclient --server=localhost --port=9990'' (replace the port with the correct number if necessary)
 * Run your EMAN2 programs with the option ''--parallel=dc:localhost:9990'' (again, use the right port number)

===== Using DC on a linux cluster =====
 * The server should run on the node (often the head node or a specialized 'storage node') with a direct physical connection to the storage
 * If you want to use clients from multiple clusters, then remember all of the clients must be able to make a network connection to the server machine
 * Run a server on the head-node ''e2parallel.py dcserver'' in an empty directory on the local hard drive
 * The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
 * Run one client for each core you want to use for processing on each node : ''e2parallel.py dcclient --server=<server> --port=9990'' (replace the server hostname and port with the correct values)
 * Run your EMAN2 programs with the option ''--parallel=dc:<server>:9990'' (again, use the right port number and server hostname)

===== Using DC on a set of workstations =====
 * The server should run on a computer with a direct physical connection to the storage
 * All of the clients must be able to make a network connection to the server machine
 * Run a server on the desired machine ''e2parallel.py dcserver'' in an empty directory on the local hard drive
 * The server will print a message saying what port it's running on. This will usually be 9990. If it is something else, make a note of it.
 * Run one client for each core you want to use for processing on each computer : ''e2parallel.py dcclient --server=<server> --port=9990'' (replace the server hostname and port with the correct values)
 * Run your EMAN2 programs with the option ''--parallel=dc:<server>:9990'' (again, use the right port number and server hostname)

For all of the above, once you have finished running your jobs, kill the server, then run 'e2parallel.py dckillclients' from the same directory.
When it stops spewing out 'client killed' messages, you can kill this server.

'''''IF THIS IS NOT WORKING FOR YOU, PLEASE FOLLOW [[EMAN2/Parallel/Debug|THESE DEBUGGING INSTRUCTIONS]]'''''


=== MPI ===
Sorry, we haven't had a chance to finish this yet. For the moment you will have to use the Distributed Computing mode on clusters.
+Note : All 3 parallelism options have been fully supported and stable since early 2011. Both MPI and DC have been tested on jobs using at least 256 cores,
for multiple days, and are in routine use on large refinement jobs at multiple sites. That said, DC and MPI can both take a little effort to establish on 
a new system, particularly if you have no past experience with cluster computing. We are happy to help if you have difficulties.