Parallel Processing in EMAN2

EMAN2 uses a modular strategy for running commands in parallel. That is, you can choose different ways to run EMAN2 programs in parallel, depending on your environment. Unfortunately, as of May, 2009, the parallelism infrastructure is just beginning to come together. This should be gradually fleshed out over summer 2009. At the moment, only one parallelism infrastructure is fully functional.

Programs with parallelism support will take the --parallel command line option as follows:

--parallel=<type>:<option>=<value>:<option>=<value>:...

for example, for the distributed parallelism model: --parallel=dc:localhost:9990

Local Machine (multiple cores)

Not yet implemented, please use Distributed Computing

Distributed Computing

Introduction

This is the sort of parallelism made famous by projects like SETI-at-home and Folding-at-Home. The general idea is that you have a list of small jobs to do, and a bunch of computers with spare cycles willing to help out with the computation. The number of computers willing to do computations may vary with time, and possibly may agree to do a computation, but then fail to complete it. This is a very flexible parallelism model, which can be adapted to both individual computers with multiple cores as well as linux clusters, or sets of workstations laying around the lab.

There are 3 components to this system:

User Application (customer) <==> Server <==> Compute Nodes (client)

The user application builds a list of computational tasks that it needs to have completed, then sends the list to the server. Compute nodes with nothing to do then contact the server and request tasks to compute. The server sends the tasks out to the clients. When the client finishes the requested computation, results are sent back to the server. The user application then requests the results from the server and completes processing. As long as the number of tasks to complete is larger than the number of clients servicing requests, this is an extremely efficient infrastructure.

Internally things are somewhat more complicated and tackle issues such as data caching on the clients, how to handle clients that die in the middle of processing, etc., but the basic concept is quite straightforward.

How to use Distributed Computing in EMAN2

To use distributed computing, there are three basic steps:

What follows are specific instructions for doing this under 3 different scenarios.

Using DC on a single multi-core workstation

Using DC on a linux cluster

Using DC on a set of workstations

MPI

Sorry, we haven't had a chance to finish this yet. For the moment you will have to use the Distributed Computing mode on clusters.