Last update: February 10, 2016.

Single Particle Tomography in EMAN2

RECOMMENDATIONS

Get comfortable with the command-line. The pipeline through e2projectmanager.py is very limited and less robust.

You can try to average subtomograms with EMAN2's stable release (EMAN2.12 from October 19, 2015); however, I recommend using the daily build. Download the easy-to-install binaries from EMAN2's download page:

http://ncmi.bcm.tmc.edu/ncmi/software/software_details?selected_software=counter_222

SPT can be very computationally intensive, both in terms of memory and processing speed. However, illustrative exercises and tutorials can be carried out on a laptop with 2-4GB of RAM. For more realistic SPT on full 3D alignments of large sets (hundreds or thousands of subtomograms) comprising large subtomograms (like viruses), 8GB of memory as a minimum and the use of multiple processing units or GPU technology are advised.

If you use any EMAN2 program with the 'e2spt_' prefix in it, or e2symsearch3d.py, please cite the following paper:

Galaz-Montoya, J.G., Flanagan, J., Schmid, M.F. and Ludtke, S.J., 2015. Single particle tomography in EMAN2. Journal of structural biology, 190(3), pp.279-290.

We have submitted a paper describing CTF correction for cryoSPT but no tutorial is yet available (might write one in the upcoming weeks, June 2016).

Galaz-Montoya, J.G., Hecksel, C.W., Baldwin, P.R., Wang, E., Weaver, S.C., Schmid, M.F., Ludtke, S.J. and Chiu, W., 2016. Alignment algorithms and per-particle CTF correction for single particle cryo-electron tomography. Journal of structural biology, 194(3), pp.383-394.

PATCH FOR e2spt_classaverage.py in the 'stable' EMAN2.12 release

*If* using the stable EMAN2.12 release from October 9, 2015, I recommend replacing the e2spt_classaverage.py script inside your /EMAN2/bin directory with the file attached here: e2spt_classaverage_patch.py. You might need to right-click and choose 'save as' to download it (or control-click, or whatever; left-clicking on it directly might just open the text file in a separate browser window). You'll then have to change the first line of the script so that it matches the first line of all other EMAN2 scripts, and rename the file to e2spt_classaverage.py. This is easy:

1) Open any other EMAN2 script with a text editor (for example, e2.py or e2version.py) and copy the first line, which might say something like "#!/usr/bin/python2.7".
2) Open the e2spt_classaverage_patch.py file you just downloaded with a text editor.
3) The first line of e2spt_classaverage_patch.py should be "#!/usr/bin/env python".
4) Delete that first line and paste the line you copied in step 1.
5) Save the file and rename e2spt_classaverage_patch.py to e2spt_classaverage.py.
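If you would rather script the first-line swap than do it by hand, the following is a minimal sketch of the same procedure. The function names (read_first_line, install_patch) are made up for illustration and are not part of EMAN2; the sketch only assumes that the reference script starts with a "#!" line.

```python
import shutil


def read_first_line(path):
    """Return the first line of a text file, without the line ending."""
    with open(path, "r") as handle:
        return handle.readline().rstrip("\r\n")


def install_patch(patch_path, reference_script, destination):
    """Write the patch to `destination`, replacing its first line with the
    interpreter line copied from `reference_script` (hypothetical helper)."""
    shebang = read_first_line(reference_script)
    if not shebang.startswith("#!"):
        raise ValueError("no '#!' first line in reference script: %r" % shebang)

    with open(patch_path, "r") as handle:
        lines = handle.readlines()
    if not lines:
        raise ValueError("patch file is empty")

    # Swap the first line of the patch for the one used by the installed scripts.
    lines[0] = shebang + "\n"

    # Save the result under the final script name.
    with open(destination, "w") as handle:
        handle.writelines(lines)

    # Copy the permission bits so the new script is executable like the others.
    shutil.copymode(reference_script, destination)
    return shebang
```

A call might look like install_patch("e2spt_classaverage_patch.py", "/EMAN2/bin/e2version.py", "/EMAN2/bin/e2spt_classaverage.py"), with the paths adjusted to wherever EMAN2 lives on your machine. Keeping a backup copy of the original e2spt_classaverage.py first is probably a good idea.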

This patch fixes some issues that really should only come up when you're doing weird things like aligning a stack of subtomograms against a reference for only 1 iteration [in most cases, you actually want to *iteratively refine* the subtomograms against the reference, to prevent model bias].

DATA

VEEV virus data (coming soon).

Epsilon 15 virus test data, used in the EMAN2 Workshop in 2011.

e2spt_data.zip

TRiC chaperonin test data (coming soon).

e2spt_data_apoTRiC.zip

COMMANDS

(For easy "copy-pasting" into the command line; to see the entire list of parameters for each program, type the program name followed by -h at the command line. All the commands below assume you're running the daily build.)

SPT BOXER

We only support ZSHORT tomograms now to minimize confusion and the --yshort option will be deprecated. ALWAYS rotate your tomograms so that Z is the shortest side (and XY are the plane of the camera). You can do this trivially with IMOD from the command line if you didn't select the option in the ETOMO pipeline (also IMOD's):

clip rotx input.rec input_ZSHORT.rec

Then, to start picking subtomograms from your proper, ZSHORT tomogram with EMAN2, type the following command at the command line:

e2spt_boxer.py input_ZSHORT.rec --inmemory

(Of course, you'll need to replace the name of the tomogram with your actual filename :p).

The boxer has internal options that are helpful, such as averaging slices, and lowpass filtering dynamically.

BASIC SPT ALIGN/AVERAGE COMMANDS

If/when you have a reference, and you simply wish to align particles to the reference, and average the best subset together, there is a new, simplified pipeline for doing this. In this method, an spt_XX folder is created, and within each folder will be a sequence of .json files containing alignment information and 3-D volumes with population averages. This process can be iterated multiple times within one spt_XX folder.

e2spt_align

e2spt_align.py <subtomo_stack> <reference> --threads <nthreads> [--sym <sym>] [--path <spt_xx>]

If --path isn't specified, a new folder will be automatically created. If iterating manually, it makes sense to use the same spt_XX folder repeatedly. New alignments will be stored with increasing iteration numbers.
If --sym is specified, the reference volume MUST be aligned to the canonical EMAN2 symmetry axes. See e2symsearch3d.py and e2proc3d.py --sym to do this.
For octahedral and icosahedral symmetry, alignments may take significantly longer than asymmetric refinements. This is due to the fact that many such structures can be somewhat smooth, and finer initial search is required to obtain an accurate alignment.
The only output of this command is a .json file containing the alignment information.

e2spt_stat

e2spt_stat.py --path <spt_XX> --gui [--iter <N>]

This command will plot a histogram of similarity values between the particles and the reference. SMALLER IS BETTER!

e2spt_average

e2spt_average.py [--path <spt_XX>] [--iter <N>] [--simthr <threshold>] [--threads <nthreads>]

This will take the already determined alignments and generate an average of the particles with a similarity score below the threshold.
Uses even/odd semantics to produce an FSC curve. Note that no phase-randomization is done here, so this is not a "gold standard" FSC.
Note that this process does NOT impose symmetry. This must be done manually with e2proc3d.py --sym if desired. This also means the FSC curves do not take symmetry/masking into account.
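To make the threshold and even/odd logic concrete, here is a toy sketch in plain Python. It is NOT the code inside e2spt_average.py: the function name is invented, the scores are a plain list rather than the .json alignment file, and splitting on the particle index (rather than on the position in the kept list) is an assumption.

```python
def select_and_split(scores, simthr):
    """Keep particles scoring below `simthr` (smaller is better) and split
    them into even and odd halves by particle index.

    `scores` is a list with one similarity value per particle; the list
    index stands in for the particle number (an assumption of this sketch).
    """
    kept = [index for index, score in enumerate(scores) if score < simthr]
    even = [index for index in kept if index % 2 == 0]
    odd = [index for index in kept if index % 2 == 1]
    return even, odd


# Six particles; only those scoring below -0.4 are kept.
scores = [-0.8, -0.2, -0.6, -0.7, -0.1, -0.5]
even_ids, odd_ids = select_and_split(scores, simthr=-0.4)
# even_ids is [0, 2] and odd_ids is [3, 5]
```

Each half would then be averaged separately, and the two half-averages compared to produce the FSC curve.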

SPT ITERATIVE REFINEMENT

Command for alignment with e2spt_classaverage.py

The alignment tools for SPT in EMAN2 have changed dramatically and use the "tree aligner" by default (type 'e2help.py aligners' at the command line to see all available aligners and the options they take). This aligner automates ALMOST ALL preprocessing options for you, so there are very few parameters you *need* to specify (unless, of course, the default settings aren't working and you need to figure out something clever for your particular data). The algorithm also converges pretty quickly (usually) so on good data with a high signal to noise ratio (SNR) and not too much structural heterogeneity, 4-8 iterations of refinement might be more than enough. A basic example of a command to perform iterative refinement on a subtomogram stack using an initial reference would be:

e2spt_classaverage.py --input stack.hdf --ref model.hdf

To consider more than one potential answer during the different stages of the complex default alignment algorithm (citation pending), add:

--npeakstorefine n

(you have to replace 'n' with the number of potential answers to consider; for example, 10. Analogously, wherever 'n' or any other letter appears in a suggested command [e.g., 'x','y','r', etc.], you'll have to replace it with a sensible number). This will increase alignment time but will also increase the probability of improved alignments.

To turn gold-standard refinement OFF (it's on by default), add:

--goldstandardoff

To add parallelization on a single workstation, add:

--parallel thread:n

To define the number of iterations to refine the data for, add:

--iter n

To define the name of the subdirectory where results will be compartmentalized, add:

--path whateverdirectoryname

If your particle has symmetry, then explicitly tell the aligner to consider this, by adding:

--align rotate_translate_3d_tree:sym=N

(replace N with cn, dn, icos or tet. For example, d8 for MMCPN, d7 for GroEL, icos for icosahedral viruses)

To impose symmetry on the final average (literally, symmetrize it), add:

--sym N

(replace N with cn, dn, icos or tet).

To save the final stack of aligned particles, add:

--saveali

To save a stack containing the updated average from each and all iterations, add:

--savesteps

So, in summary, if I were refining GroEL using an external model, I would run the following command:

e2spt_classaverage.py --input stack.hdf --ref model.hdf --parallel thread:24 --npeakstorefine 4 --sym d7 --align rotate_translate_3d_tree:sym=d7 --iter 4 --path groel_test --saveali --savesteps
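If you launch many refinements from a script, assembling the command from a few variables can save typing. The helper below is a hypothetical convenience wrapper (not part of EMAN2) that only strings together the options described above; it assumes you want the same symmetry for --sym and for the aligner.

```python
def build_classaverage_command(stack, ref=None, threads=None,
                               npeakstorefine=None, sym=None,
                               iterations=None, path=None,
                               goldstandardoff=False,
                               saveali=False, savesteps=False):
    """Return the e2spt_classaverage.py command as a list of arguments."""
    cmd = ["e2spt_classaverage.py", "--input", stack]
    if ref is not None:
        cmd += ["--ref", ref]
    if threads is not None:
        cmd += ["--parallel", "thread:%d" % threads]
    if npeakstorefine is not None:
        cmd += ["--npeakstorefine", str(npeakstorefine)]
    if sym is not None:
        # Symmetrize the final average AND tell the aligner about the symmetry.
        cmd += ["--sym", sym,
                "--align", "rotate_translate_3d_tree:sym=%s" % sym]
    if iterations is not None:
        cmd += ["--iter", str(iterations)]
    if path is not None:
        cmd += ["--path", path]
    if goldstandardoff:
        cmd.append("--goldstandardoff")
    if saveali:
        cmd.append("--saveali")
    if savesteps:
        cmd.append("--savesteps")
    return cmd


# Rebuild the GroEL command shown above.
groel_cmd = build_classaverage_command(
    "stack.hdf", ref="model.hdf", threads=24, npeakstorefine=4, sym="d7",
    iterations=4, path="groel_test", saveali=True, savesteps=True)
print(" ".join(groel_cmd))
```

The returned list can be handed to Python's subprocess.call() to actually run the job, or just printed and pasted into the terminal.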

There are many more options to this program but you probably do NOT need them. If you want to apply additional masks or filters other than what the program does internally already, just add whichever of the following preprocessing options seems like a good idea:

--lowpass filter.lowpass.tanh:cutoff_freq=F ('F' is 1/resolution; e.g., 0.01 filters to 100 angstroms)
--highpass filter.highpass.gauss:cutoff_freq=F
--mask mask.soft:outer_radius=R
--threshold threshold.belowtozero:minval=N (this is just an EXAMPLE; I don't suggest you use this)
--normproc normalize.edgemean
--preprocess AnyEMAN2processor

To see all EMAN2 processors, type the following at the command line:

e2help.py processors

The --lowpass parameter can take any lowpass processor, the --highpass parameter any highpass processor, and so on. The format is usually --option_name processor_name:parameter1=value1:parameter2=value2:parameter3=value3

You can have a space or an '=' sign between '--option_name' and the rest. The format is a bit obnoxious (I know), but it is what it is :-/. It might make more sense once you get used to it.
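As a small worked example of this format, the sketch below converts a target resolution into a cutoff frequency (F = 1/resolution, as noted above) and glues a processor name and its parameters together with colons. Both function names are invented for this illustration; they are not EMAN2 functions.

```python
def resolution_to_cutoff(resolution_angstroms):
    """Cutoff frequency for a target resolution: F = 1 / resolution."""
    return 1.0 / resolution_angstroms


def processor_string(name, *params):
    """Build 'name:key1=value1:key2=value2' from (key, value) pairs."""
    parts = [name]
    for key, value in params:
        parts.append("%s=%s" % (key, value))
    return ":".join(parts)


# Lowpass filter to 100 angstroms: 1/100 = 0.01
lowpass = processor_string(
    "filter.lowpass.tanh", ("cutoff_freq", resolution_to_cutoff(100)))
print("--lowpass " + lowpass)   # space between the option and the processor
print("--lowpass=" + lowpass)   # '=' works as well

# A soft mask with a radius of 36
mask = processor_string("mask.soft", ("outer_radius", 36))
print("--mask " + mask)
```

Parameters are passed as (key, value) pairs so they come out in the order you wrote them.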

BUILDING AN INITIAL MODEL

If you do not have a suitable external reference or want to avoid model bias altogether, e2spt_classaverage.py can build an initial model for you 'ab initio' (from scratch). Just don't supply the --ref parameter. That is, run:

e2spt_classaverage.py --input stack.hdf

And add as many other options as you like, as explained above. If you do not wish to remain agnostic to the initial model generation step, there are 3 ways an initial model can be built in EMAN2: 1) by "binary tree" alignment (BTA; this is the default method that e2spt_classaverage.py uses), 2) by Hierarchical Ascendant Classification (HAC or "all vs all"; this method takes *A LOT* of time compared to the others), and 3) by Self-Symmetry Alignment (SSA) [read Galaz-Montoya et al. 2015, referenced at the top of the page]. You can tell e2spt_classaverage.py which initial model generation method to use, and how many particles to use to generate the initial model. For example, if you have a stack of 10,000 particles, you might still be able to build a good initial model with just 100 or fewer. BTA is performed by default on the entire stack. To limit the number of particles, add the following to the command above:

--btaref N

where N is the number of particles to use for initial model generation.

To select HAC instead of BTA, add:

--hacref N

To select SSA, add:

--ssaref

These three parameters are mutually exclusive. Only supply one. For example:

e2spt_classaverage.py --input stack.hdf --btaref 10 --goldstandardoff --saveali --savesteps --iter 4

Command for building an initial model with e2spt_hac.py

This program takes almost the same parameters as e2spt_classaverage.py. Type e2spt_hac.py -h to see all the parameters. They should be mostly self-descriptive. The main purpose of this program is just to build initial models using an "all vs all" approach (this essentially computes the similarity matrix of the entire dataset and progressively averages unique best pairs). There are many complicated options that you most likely don't need and which would have to be explained in a full-length, proper tutorial.
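The "all vs all" idea can be illustrated with a toy sketch. This is NOT the e2spt_hac.py implementation: the "particles" are short lists of numbers instead of 3D volumes, no alignment is done, the score is a plain sum of squared differences (smaller is better), and weighting each average by the number of particles it contains is an assumption of the sketch.

```python
from itertools import combinations


def distance(a, b):
    """Toy dissimilarity score: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_average(a, n_a, b, n_b):
    """Average two vectors, weighting each by its particle count."""
    total = float(n_a + n_b)
    return [(x * n_a + y * n_b) / total for x, y in zip(a, b)]


def hac_round(items):
    """One round: score every pair, then average the best pairs such that
    each item is used at most once. `items` holds (vector, count) tuples."""
    scored = sorted(
        (distance(items[i][0], items[j][0]), i, j)
        for i, j in combinations(range(len(items)), 2))
    used = set()
    merged = []
    for _, i, j in scored:
        if i in used or j in used:
            continue  # one member was already paired with a better partner
        used.update((i, j))
        (vec_i, n_i), (vec_j, n_j) = items[i], items[j]
        merged.append((weighted_average(vec_i, n_i, vec_j, n_j), n_i + n_j))
    # Items left without a partner are carried over to the next round.
    leftovers = [items[k] for k in range(len(items)) if k not in used]
    return merged + leftovers


def hac(vectors):
    """Repeat rounds until a single average remains."""
    items = [(list(vector), 1) for vector in vectors]
    while len(items) > 1:
        items = hac_round(items)
    return items[0]


final_average, n_particles = hac([[0.0], [1.0], [10.0], [11.0]])
# Round 1 pairs 0 with 1 and 10 with 11; round 2 merges the two averages.
```

Since every round scores all remaining pairs, the number of comparisons grows roughly with the square of the number of particles, which is why this approach is so much slower than the alternatives.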

Command for building an initial model with e2spt_binarytree.py

This program takes almost the same parameters as e2spt_classaverage.py. Type e2spt_binarytree.py -h to see all the parameters. They should be mostly self-descriptive. The main purpose of this program is just to build initial models using a tree approach. The program starts with a subset whose size is the largest power of 2 that does not exceed the number of subtomograms (for example, for a set of 100 subtomograms, the largest such subset would contain 64). The subset is averaged in pairs, iteratively, until all subtomograms converge to one average. E.g., if the program starts with 64 subtomograms, it will align and average 1+2, 3+4, 5+6, etc., effectively yielding 32 new averages. In the next iteration, these averages of 2 particles take the place of new subtomograms. So, starting with 32 new subtomograms, the program again will average 1+2, 3+4, 5+6, etc., until 16 new averages are produced. So on and so forth: 8, 4, 2, 1. All particles will converge into one average.
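The bookkeeping of the tree can be sketched in a few lines. This is a toy illustration only, not the e2spt_binarytree.py code: single numbers stand in for subtomograms, no alignment is done, and taking the FIRST 2^k particles as the starting subset is an assumption (the function names are invented as well).

```python
def largest_power_of_two(n):
    """Largest power of 2 that does not exceed n (n must be >= 1)."""
    if n < 1:
        raise ValueError("need at least one particle")
    return 1 << (n.bit_length() - 1)


def binary_tree_rounds(n_particles):
    """Number of volumes present at the start and after each round."""
    count = largest_power_of_two(n_particles)
    sizes = [count]
    while count > 1:
        count //= 2
        sizes.append(count)
    return sizes


def binary_tree_average(values):
    """Average 1+2, 3+4, 5+6, ... repeatedly until one value is left."""
    level = list(values[:largest_power_of_two(len(values))])
    while len(level) > 1:
        level = [(level[i] + level[i + 1]) / 2.0
                 for i in range(0, len(level), 2)]
    return level[0]


print(largest_power_of_two(100))   # 64
print(binary_tree_rounds(100))     # [64, 32, 16, 8, 4, 2, 1]
```

Note that with 100 subtomograms, 36 of them never enter the tree in this sketch, so the initial model is built from 64 particles in 6 rounds of pairwise averaging.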

Command for building an initial model with e2symsearch3d.py

This program for initial model generation works on particles that have symmetry. Some parameters in this program are the same as in the alignment programs previously described. A basic command would look something like this:

e2symsearch3d.py --input stack.hdf --sym N

Usually, answers are better if you specify a large number of iterations (--steps) and preprocess the particles heavily, as follows:

To change the number of iterations from default, add:

--steps N

(10-20 might be enough for icosahedral particles with good SNR; 50-100 or more might be needed for smaller, lower-symmetry particles)

To shrink the subtomograms:

--shrink N

To mask:

--mask mask.soft:outer_radius=R

To lowpass:

--lowpass filter.lowpass.gauss:cutoff_freq=0.01 (e.g., this low pass filters to 100 angstroms).

To impose symmetry on the images you have aligned to the symmetry axis, add the following parameter:

--symmetrize

To get the average of the images (whether symmetrized or not), add the following parameter:

--average

As with other programs, you can provide practically any processor listed when you type e2help.py processors.

Command for e2spt_refinemulti.py (coming soon)

Command for e2spt_ctf.py (coming soon)

Command for e2tomo_ctf.py (coming soon)

DEPRECATED

OTHER programs (use at your own risk; some parameters in these programs are under experimental development. Run the program at the command line followed by -h to see the currently available parameters for each program.)

Command for e2spt_resolutionplot.py

e2spt_resolutionplot.py --vol1=half1avg.hdf --vol2=half2avg.hdf --output=whatever3.txt --npeakstorefine=1 --verbose=0 --shrink=3 --shrinkfine=2 --mask=mask.sharp:outer_radius=36 --lowpass=filter.lowpass.gauss:cutoff_freq=.02:apix=4.401 --align=rotate_translate_3d:search=4:dphi=30:delta=30:sym=icos --parallel=thread:8 --falign=refine_3d_grid:delta=15:range=30:search=2 --aligncmp=ccc.tomo --faligncmp=ccc.tomo --normproc=normalize --sym=icos

Command for e2spt_rotationalplot.py

e2spt_rotationalplot.py --input=initModel.hdf --output=toAs129avsaAVG.txt --daz=1 --shrink=1 --dalt=180 --mask=mask.sharp:outer_radius=28

Command for e2spt_radialdensityplot.py

e2spt_radialdensityplot.py --vols=volA_aligned.hdf,volB_aligned.hdf --normproc=normalize.edgemean --lowpass=filter.lowpass.gauss:cutoff_freq=0.02:apix=4.401 --singleplot --output=volAali_VS_volBali.png

Command for e2spt_simulation.py

e2spt_simulation.py --input=groel.pdb --snr=5 --nptcls=8 --tiltstep=5 --tiltrange=60 --transrange=10 --saveprjs --addnoise --simref --path=TESTsimREF --pad=3 --shrink=2 --finalboxsize=96 --negativecontrast

Command for e2spt_tomosimjobs.py

e2spt_tomosimjobs.py --input=groel.pdb --nptcls=8 --saveprjs --addnoise --simref --path=TESTsimREF --pad=3 --shrink=2 --finalboxsize=96 --snrlowerlimit=0 --snrupperlimit=1 --snrchange=1 --tiltsteplowerlimit=0 --tiltstepupperlimit=1 --tiltstepchange=1 --tiltrangelowerlimit=60 --tiltrangeupperlimit=61 --tiltrangechange=1 --negativecontrast --testalignment

Command for e2spt_autoboxer.py

e2spt_autoboxer.py --tomogram=tomo_inv.rec --ptclradius=8 --path=whatever --concentrationfactor=1 --output=subtomostack.hdf --outputboxsize=36 --verbose=10 --goldstack=gold_ptcls_s05_inv.hdf --pruneprj --goldthreshtomo --keepn=150 --lowpass=filter.lowpass.gauss:cutoff_freq=0.02

Command for e2spt_refinemulti.py

e2spt_refinemulti.py -v 0 --path=RF --input= --nrefs=2 --refgenmethod=binarytree --shrink=3 --shrinkfine=2 --iter= --mask=mask.sharp:outer_radius= --npeakstorefine= --lowpass=filter.lowpass.gauss:cutoff_freq=.02:apix=4.401 --highpass=filter.highpass.gauss:cutoff_freq=0.002:apix=4.401 --parallel=thread:24 --averager=mean.tomo --aligncmp=ccc.tomo --faligncmp=ccc.tomo --saveali --savesteps --normproc=normalize --radius=150

Command for e2spt_fftamps.py

e2spt_fftamps.py --input=file.hdf

Command for e2spt_wedge.py (GUI available)

ANCIENT E2SPT USERS' GUIDE

The PDF Users' Guide is extremely deprecated (the instructions for the boxer therein should still work, though). The format of this tutorial is dauntingly inconvenient and therefore I will write modular tutorials, one for each step (and each program) in the workflow to follow.

Single particle tomography USER'S GUIDE (updated summer 2016; under major refactoring due to extensive changes in e2spt capabilities):

e2spt_users_guide_tutorial_april_2016_alpha.pdf