EMAN2 Tomography mini-Workflow Tutorial

This version of the EMAN2 Tomography Pipeline tutorial is designed to run on well-equipped laptops or standard workstations, unlike the full tutorial, which requires a well-equipped tomography workstation. It should be possible to complete this tutorial in a reasonable time on a computer with 16 GB of RAM and 4 cores, but resolution will be limited to ~15 A, not the subnanometer resolution provided by the main tutorial.

Computer Requirements

Note: Anywhere in EMAN2 that you are asked to enter the number of threads to use, you should specify the number of physical cores your machine has. Computers are often advertised as 4 core/8 thread or 8 core/16 thread. Running image processing with this advertised number of threads will usually make processing slower, not faster. You may optionally specify ~25% more threads than cores, i.e., on a 4 core machine, 5 may be a reasonable number to specify.
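The rule of thumb in the note above can be written as a small calculation. This is only a sketch: the function name recommended_threads is made up for illustration, and integer division is just one reasonable way to round the "~25%" allowance. Note that Python's os.cpu_count() reports logical CPUs (the advertised thread count), so the physical core count is passed in by hand here.

```python
def recommended_threads(physical_cores):
    """Return a thread count of physical cores plus roughly 25%.

    Hypothetical helper, not part of EMAN2. The extra quarter is
    rounded down, so 4 cores -> 5 threads and 8 cores -> 10 threads.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return physical_cores + physical_cores // 4


if __name__ == "__main__":
    for cores in (4, 8):
        print(f"{cores} cores -> specify {recommended_threads(cores)} threads")
```

The resulting number is what you would enter in any "threads" box, or after thread: on the command line.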

Download Data

Prepare input files (~2 minutes)

e2projectmanager.py&

When working with your own data:

Tilt-series Alignment and Tomogram Reconstruction (10 min)

Alignment of the tilt-series is performed iteratively in conjunction with tomogram reconstruction. Tomograms are not normally reconstructed at full resolution, generally limited to 1k x 1k or 2k x 2k, but the tilt-series are aligned at full resolution. For high resolution subtomogram averaging, the raw tilt-series data is used, based on coordinates from particle picking in the downsampled tomograms. On a typical workstation reconstruction takes about 4-5 minutes per tomogram.
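To make the relationship between the downsampled tomogram and the full-resolution data concrete, here is a minimal sketch of the scaling involved. It is an illustration only: EMAN2 handles this internally using the alignment metadata, and the function names, the 4096 pixel image width, and the neglect of origin and tilt geometry are all assumptions made for the example.

```python
def shrink_factor(full_width, tomogram_width):
    """Integer downsampling factor between raw images and the tomogram.

    Hypothetical helper: assumes the full-resolution width is an exact
    multiple of the tomogram width (e.g. 4096 -> 1024 gives 4).
    """
    if full_width % tomogram_width != 0:
        raise ValueError("full_width must be a multiple of tomogram_width")
    return full_width // tomogram_width


def to_full_resolution(coord, factor):
    """Scale an (x, y, z) position picked in the downsampled tomogram
    to full-resolution pixel units. Origin offsets are ignored here."""
    return tuple(c * factor for c in coord)


if __name__ == "__main__":
    factor = shrink_factor(4096, 1024)             # assumed image sizes
    print(to_full_resolution((100, 200, 50), factor))
```

The point is simply that a position picked in a 1k x 1k tomogram identifies a unique location in the full-resolution tilt-series, which is why picking can be done on the small volumes.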

For the tutorial tilt-series:

If you opted to run with notmp on one or more tilt series:

When working with your own data:

Handedness Check (can be skipped in the tutorial)

EMAN2 will automatically locate the tilt axis in a tilt series if it is not provided. Unfortunately, there is a ± (sign) ambiguity in this determination. An incorrect choice will lead to structures with the incorrect handedness, and may produce suboptimal CTF correction.

EMAN2 includes a novel procedure for resolving this ambiguity from a tilt series based on defocus estimates across the tilted images. The tutorial data set comes out correctly without running this check, but when working with your own data, this step is highly recommended. Once you know the correct tilt axis direction for a given microscope/camera, you shouldn't need to run this test on every data set. It may still be worthwhile even then, as various configuration/software errors on the instrument could cause inconsistent results, particularly with a change of magnification.

For the tutorial tilt-series:

You will need to look at the console where you launched e2projectmanager to see the results of the test. It should look something like:

Average score: Current hand - 4.133, flipped hand - 3.290
Defocus std: Current hand - 0.110, flipped hand - 0.165
Current hand is better than the flipped hand in 86.4% tilt images
The handedness (--tltax=-4.1) seems to be correct. Rerun CTF estimation without the checkhand option to finish the process.

If you run this check on multiple images and it seems that they indicate a consistent tilt axis/handedness error, then you need to return to the previous step (Tomogram Reconstruction) and redo the reconstruction for all tomograms, with the correct tilt axis entered in the corresponding box. The same tilt axis should be used for all tilt series collected under the same conditions on the same instrument.
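If you run the check on several tilt series, it can help to collect the numbers from each console report in one place. The sketch below parses text in the format of the example report shown above. The helper names and the 50% cutoff used to flag a tilt series are assumptions for illustration; they are not part of EMAN2, and the report wording may differ between versions.

```python
import re

# Console text copied from the example report in this tutorial.
SAMPLE_REPORT = """\
Average score: Current hand - 4.133, flipped hand - 3.290
Defocus std: Current hand - 0.110, flipped hand - 0.165
Current hand is better than the flipped hand in 86.4% tilt images
"""

_PAIR = r"{label}: Current hand - ([\d.]+), flipped hand - ([\d.]+)"


def parse_hand_report(text):
    """Extract the numbers from one handedness report (hypothetical helper)."""
    values = {}
    for key, label in (("score", "Average score"), ("defocus_std", "Defocus std")):
        match = re.search(_PAIR.format(label=label), text)
        if match:
            values[key + "_current"] = float(match.group(1))
            values[key + "_flipped"] = float(match.group(2))
    match = re.search(r"better than the flipped hand in ([\d.]+)% tilt images", text)
    if match:
        values["current_better_pct"] = float(match.group(1))
    return values


def count_flipped(reports, cutoff=50.0):
    """Count reports where the current hand wins in fewer than `cutoff`
    percent of tilt images. The cutoff is an assumption, not an EMAN2 rule."""
    return sum(
        1 for text in reports
        if parse_hand_report(text).get("current_better_pct", 100.0) < cutoff
    )


if __name__ == "__main__":
    print(parse_hand_report(SAMPLE_REPORT))
    print("tilt series suggesting a flipped hand:", count_flipped([SAMPLE_REPORT]))
```

If most or all of your tilt series are flagged, that is the "consistent" error described above, and the reconstructions should be redone with the corrected tilt axis.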

Note: This method removes almost all of the ambiguity about particle handedness. The one potential issue is that the MRC file format uses a non-conventional origin for images. If the data collection software doesn't take this into account, the images may be flipped when written to disk. The easiest way to check the software is to collect 2 images of the same target, save them directly into different file formats, and then check (in different software) whether the two images appear to have the same handedness.

CTF Estimation (<10 min)

For the tutorial tilt-series:

When working with your own data:

Note: this program is only estimating CTF parameters, taking tilt into account. It is not performing any phase-flipping corrections on whole tomograms. CTF correction is performed later as a per-particle process. This process requires metadata determined during tilt-series alignment, so it cannot be used with tomograms reconstructed using other software packages.

Note: In EMAN2 snapshots later than 2022, it is possible after CTF correction to return to the 3-D reconstruction step and produce CTF corrected whole tomograms, but this does nothing useful when following the EMAN2 pipeline. If you wish to compare EMAN2 tomograms with other software doing CTF correction, this could potentially be useful.

Tomogram reconstruction evaluation (optional)

Analysis and visualization -> Evaluate tomograms can be used to evaluate the quality of your tilt series alignments and tomogram reconstructions. This tool will show more information as you progress through the tutorial, but can be used already at this point to make various assessments of your tomograms. Note that some of this information may not be available if you had notmp checked during the reconstruction.

Particle Picking Choices

There are 4 different tools you can use for particle picking in EMAN2 as of Feb 2022:

  1. Abusing the deep-learning based segmentation tool for particle picking purposes
  2. Manual particle picking
  3. Template based picking (usually seeded with some manual picking results)
  4. A new deep-learning based 3-D picker (not available in 2.91; you must use a recent snapshot). See: New automatic particle picking

For live versions of this tutorial, we use the older manual+template based approach as it requires no specific hardware, and is a good learning experience, but the deep learning 3-D picker is a good choice for many situations. For cellular tomograms, the annotation tool approach may still be a good choice.

(1) Tomogram annotation (GPU recommended)

For a detailed description of how to use the annotation tool, see: TomoSeg

Here is a brief summary of the annotation-based approach:

Manual particle picking (10-15 min)

Particle extraction (2 min)

In this pipeline, the full 1k or 2k tomograms are used only as a reference to identify the location of the objects to be averaged. Now that we have particle locations, the software returns to the original tilt-series, extracts a per-particle tilt-series, and reconstructs each particle in 3-D independently.

For the tutorial tilt-series:

For your own data:

Initial model generation (10 - 60 min)

Intuitively, since the particles are already in 3-D, it seems like the concept of an "initial model" should not be necessary. Unfortunately, due to the missing wedge and the low resolution of one individual particle (particularly from cells), it is actually critical to make a good starting average. Historically it has been challenging to get a good starting model, depending on the shape of the molecule.

This new procedure based on stochastic gradient descent has proven to be quite robust, but it is difficult for the computer to tell when it has converged sufficiently. For this reason, the default behavior is to run much longer than is normally required, and have a human decide when it's "good enough" and terminate the process. If you use a small shrink value and let it run to completion, it can take some time to run. This is harmless, but unnecessary.

While the section below the horizontal line remains fully functional, a new program available since 2021 does a much more efficient job of making initial models. It hasn't been integrated into e2projectmanager yet, but it is enough of an improvement that we will go ahead and use it here regardless. The original instructions are preserved below the horizontal line if you prefer the older approach.

If you didn't launch e2projectmanager with an & at the end of the line, you will need to exit it (close the windows) to run the following command. Replace the 4 in thread:4 with the appropriate number of threads for your computer. If you called your set something other than initribo, you may need to change that as well.

e2spt_sgd_new.py sets/initribo.lst --res 50 --parallel thread:4

The program will produce output like:

Gathering metadata...
 69/69
iter 0, class 0:
17 jobs on 4 CPUs
iter 1, class 0:
17 jobs on 4 CPUs
iter 2, class 0:
17 jobs on 4 CPUs

Once it gets past 3-4 iterations, you can use the browser to look in sptsgd_00, and double-click on output_cls0.hdf. This file will change as more iterations complete. It contains the results of the most recent iteration. If you double click on it again later, it will load another map into the same 3-D display. You can then open the control-panel for the 3-D display (middle-click) and use the Seq slider to cycle through the maps. When you are satisfied with the quality of the initial model, press ctrl-C, which will kill the initial model generating job.
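If you want a quick way to tell whether the run has passed 3-4 iterations, the progress lines shown above can be scanned for the highest iteration number. This sketch assumes you have captured the console output as text yourself (for example by copying it from the terminal); the helper name and that workflow are hypothetical, and the line format is taken from the example output in this tutorial.

```python
import re

# Progress text copied from the example output in this tutorial.
SAMPLE_LOG = """\
Gathering metadata...
 69/69
iter 0, class 0:
17 jobs on 4 CPUs
iter 1, class 0:
17 jobs on 4 CPUs
iter 2, class 0:
17 jobs on 4 CPUs
"""


def latest_iteration(log_text):
    """Return the highest iteration number announced in the log,
    or None if no 'iter N, class M:' line has been printed yet."""
    numbers = [
        int(match.group(1))
        for match in re.finditer(r"^iter (\d+), class \d+:", log_text, re.MULTILINE)
    ]
    return max(numbers) if numbers else None


if __name__ == "__main__":
    print("most recent iteration started:", latest_iteration(SAMPLE_LOG))
```

A line for a given iteration means that iteration has started, not that it has finished, so the map on disk may lag one iteration behind the number reported here.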

At this point you can also close the browser window and relaunch e2projectmanager.py.


----

This section covers the older program, which is still functional and is integrated into the project manager. If you completed the section above the line, you can skip to the Template Matching section.

For the tutorial tilt-series:

For your own data:

Template matching (5 min)

In this step, we will use the initial model you just produced as a template for finding all of the ribosomes in all 4 tomograms. If you completed the Tomogram Annotation step above and have already extracted a full set of 1000+ particles, you can skip this step, as you already have all of the particles. Note that here, and everywhere else in the tomography pipeline, reconstructed particles have positive contrast (look white in projection) and tomograms/tilt series have negative contrast (look dark in projection). If you wish to use a reference volume from the PDB or a similar source, it should have positive contrast, as is normal in the single particle CryoEM field.

Particle extraction (~15 min)

Again, if you already did Tomogram Annotation above, this step isn't necessary. It is only required if you just did Template Matching.

Since this involves several thousand particles instead of 30-50, it will take quite a lot longer to run. The actual time will depend partially on the speed of your storage.

For the tutorial tilt-series:

New integrated refinement program

There is a new refinement program which implements both traditional subtomogram averaging and subtilt refinement in a single program. Like the other new software referenced above, this new program is not yet integrated into e2projectmanager, and must be run from the command line. This is an alternative to the next two major sections (Subtomogram Refinement and Subtilt Refinement). A separate full tutorial covers the new program in more detail.

You may need to replace sets/ribo.lst with whatever you named your set. Replace both 4s with the number of threads for your machine. If you didn't use the "new" style initial model generation above, you may also need to alter --ref. Run the following with necessary changes:

e2spt_refine_new.py --ptcls sets/ribo.lst --ref sptsgd_01/output_cls0.hdf --iters p,p,p,t,p,t,r --parallel thread:4 --threads 4
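Since the thread count appears twice in this command and the set name and reference may differ on your machine, one way to avoid typos is to assemble the command from those three values. The sketch below only builds and prints the command string using the options shown above; the function name is made up, and nothing here runs EMAN2 or validates the options.

```python
import shlex


def refine_command(ptcls="sets/ribo.lst",
                   ref="sptsgd_01/output_cls0.hdf",
                   threads=4,
                   iters="p,p,p,t,p,t,r"):
    """Build the e2spt_refine_new.py command line from this tutorial.

    Hypothetical helper: it substitutes the particle set, the reference
    map and the thread count (used in both places), then returns a
    shell-safe string. It does not execute anything.
    """
    args = [
        "e2spt_refine_new.py",
        "--ptcls", ptcls,
        "--ref", ref,
        "--iters", iters,
        "--parallel", f"thread:{threads}",
        "--threads", str(threads),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Example: a machine where 5 threads is appropriate.
    print(refine_command(threads=5))
```

Paste the printed line into the terminal in your project directory. With the defaults, the output matches the command given above.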

[Figure: the results of this newer command]

Old Subtomogram refinement (~1 hr/iteration)

As an alternative to the new integrated tool above, the older pair of programs is still available. You shouldn't need to do both approaches. This step is similar to the "p" iterations above, though it uses an older algorithm.

This step performs a conventional iterative subtomogram averaging using the full set of particles. Typically it will achieve resolutions in the 15-25 A range with a reasonable number of particles. As it involves 3-D alignment of the full set of particles multiple times, it takes a significant amount of compute time. Higher resolutions are achieved in the next stage after this (subtilt refinement).

For the tutorial tilt-series:

Results will gradually appear in spt_XX/. Feel free to look at intermediate results with the EMAN2 file browser as they appear.

For your own data:

Old Subtilt refinement (~9 hr/iteration)

This is the second half of the old refinement strategy. It is conceptually similar to the t, p and r iterations in the newer integrated program above.

With the results of a good subtomogram alignment/average, we are now ready to switch to alignment of the individual particle images in each tilt, along with per-particle-per-tilt CTF correction and other refinements. This is effectively a hybrid of single particle analysis and subtomogram averaging, and can readily achieve subnanometer resolution IF the data is of sufficient quality. The tutorial data set is, but many cellular tomograms, for example, are not collected with high resolution in mind, and even with this sort of refinement will be unable to achieve resolutions better than 10-30 A, depending on the data. This process is completely automatic, based on all of the metadata collected up to this point. While it is possible to perform "subtomogram refinement" with subtomograms from any tomogram, Subtilt Refinement cannot operate properly unless all preceding steps occurred within EMAN2.

For the tutorial tilt series:

For your own data:

Congratulations! The final result of the tutorial will be found in "subtlt_00/". The final 3-D map will be "threed_04.hdf" with the default parameters. The final gold standard resolution curve will be "fsc_maskedtight_04.txt". The optional steps below are tools you can use to evaluate your results in more detail.
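To turn the resolution curve into a single number, you can look for the point where the FSC first drops below a threshold. The sketch below makes two assumptions that you should verify against your own file: that the text file holds two whitespace-separated columns (spatial frequency in 1/A, then FSC), and that the commonly used 0.143 gold standard threshold is the one you want. The function names are hypothetical and not part of EMAN2.

```python
def read_fsc(path):
    """Read (spatial_frequency, fsc) pairs from a two-column text file.
    The column layout is an assumption; comment and blank lines are skipped."""
    rows = []
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) >= 2 and not line.lstrip().startswith("#"):
                rows.append((float(parts[0]), float(parts[1])))
    return rows


def resolution_at_threshold(curve, threshold=0.143):
    """Return the resolution (1 / spatial frequency) at the first point
    where the FSC falls below `threshold`, using linear interpolation
    between neighbouring samples. Returns None if it never crosses."""
    for (s0, f0), (s1, f1) in zip(curve, curve[1:]):
        if f0 >= threshold > f1:
            s = s0 + (f0 - threshold) * (s1 - s0) / (f0 - f1)
            return 1.0 / s if s > 0 else None
    return None


if __name__ == "__main__":
    # Synthetic curve for illustration only, not tutorial results.
    demo = [(0.02, 1.0), (0.04, 0.5), (0.06, 0.1)]
    print(f"{resolution_at_threshold(demo):.1f} A")
```

For the tutorial you would call read_fsc("subtlt_00/fsc_maskedtight_04.txt") and pass the result to resolution_at_threshold.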

Refinement evaluation (optional)

This tool helps visualize and compare results from multiple subtomogram refinement runs.