Transitioning from EMAN1 to EMAN2 with command line tools
While the workflow is the correct choice for many users, we realize that some prefer to manipulate their data more directly. Many users who wish to use the command line are doing so because they are trying to update existing scripts currently using EMAN1. Note that this tutorial was written for EMAN2.03 or above.
Please note that for EMAN2 it is important to use HDF or BDB: files for all EMAN2 processing, as other formats do not support a flexible enough metadata model (header information). Chimera is able to directly read EMAN2 HDF files. When you have a structure you need to get into a different format for use in another program, feel free to convert to whatever you like (EMAN2 still supports virtually all formats). Just be aware that some header info will be lost in the conversion. Please be sure to read this Important Warning.
Introduction to the EMAN2 Command-Line
The first thing to note is that EMAN2 uses standard unix-style command-line arguments rather than EMAN1 style arguments. For example: proc2d abc.hed def.spi clip=64,64 --> e2proc2d.py abc.hed def.spi --clip=64,64
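When updating an existing EMAN1 script, the change in argument style can be sketched as a mechanical rewrite. The helper below is a hypothetical illustration, not part of EMAN2: it assumes the program maps to e2<name>.py (true for the proc2d example above, not for every EMAN1 program) and that the option keeps its name, which is not always the case, so each rewritten command still needs to be checked against the EMAN2 program's own help.

```python
import re


def eman1_to_eman2_args(eman1_command: str) -> str:
    """Rewrite 'key=value' tokens as '--key=value' (hypothetical helper).

    Assumes the EMAN2 program is named e2<name>.py and that option names
    are unchanged; both assumptions must be verified for each command.
    """
    tokens = eman1_command.split()
    program, rest = tokens[0], tokens[1:]
    converted = ["e2" + program + ".py"]
    for token in rest:
        # Tokens shaped like name=value are options; anything else
        # (such as file names) is passed through untouched.
        if re.fullmatch(r"[A-Za-z_]\w*=.*", token):
            converted.append("--" + token)
        else:
            converted.append(token)
    return " ".join(converted)


print(eman1_to_eman2_args("proc2d abc.hed def.spi clip=64,64"))
# e2proc2d.py abc.hed def.spi --clip=64,64
```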
Basic Image Processing
In direct analog to EMAN1, EMAN2 has e2iminfo.py, e2proc2d.py, e2proc3d.py. As with EMAN1, both of the proc commands can be used to convert among any of EMAN2's supported file formats, simply by using the correct extension for the output file. There is also an --outtype option to manually specify the output format.
It is important to understand how both programs deal with image stacks (a single file with multiple images). In EMAN1, only 2-D images could exist in stacks, simply because there were no 3-D file formats that supported multiple 3-D volumes at the time. That is no longer true, and HDF and BDB both support stacks of volumes. However, stacks are used extensively with 2-D images and only rarely with 3-D volumes. As a result, e2proc2d.py and e2proc3d.py handle stacks differently by default. e2proc2d.py will append images to the output stack unless --inplace is specified. e2proc3d.py, on the other hand, will overwrite images by default unless the --append option is specified. Finally, e2proc2d.py is capable of treating a single 3-D image as a stack of 2-D images in various ways. To illustrate the default stack handling:
e2proc2d.py file2d.hdf file2d.hdf --mult=-1 - would make an inverted copy of all of the images in file2d.hdf and append them to the end, doubling the size of the stack.
e2proc3d.py file3d.hdf file3d.hdf --mult=-1 - would invert all of the 3-D volumes in file3d.hdf in-place. The file would be the same size when complete.
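The difference between the two defaults can be pictured with a toy model in which a stack is a Python list and each image is a single number. This is only a conceptual sketch of the behavior described above when the input and output are the same file; the function names are made up and nothing here calls EMAN2.

```python
def apply_appending(stack, op):
    """Toy model of the e2proc2d.py default: results are appended."""
    return stack + [op(image) for image in stack]


def apply_overwriting(stack, op):
    """Toy model of the e2proc3d.py default: results replace the originals."""
    return [op(volume) for volume in stack]


def invert(value):
    """Stand-in for --mult=-1 applied to one image."""
    return -1 * value


stack2d = [1.0, 2.0, 3.0]   # three "images"
stack3d = [4.0, 5.0]        # two "volumes"

print(apply_appending(stack2d, invert))    # [1.0, 2.0, 3.0, -1.0, -2.0, -3.0]
print(apply_overwriting(stack3d, invert))  # [-4.0, -5.0]
```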
Beyond this, a number of options have changed. In both e2proc2d.py and e2proc3d.py, it is now possible to construct filter chains; that is, a sequence of filters or other processors to apply to the image(s). This is done via the --process directive in either program. There are currently over 175 different processors in 10 different categories, such as filters, masks, thresholds, mathematical, etc. Any of these processors can be applied in sequence using the directive --process=procname:option=value:option=value. You can get a list of available processors using e2help.py processor or e2help.py processor -v 2 for more detail. For example:
e2proc3d.py file3d.hdf file3d_filt.hdf --process=filter.lowpass.gauss:cutoff_freq=0.1 --process=mask.sharp:outer_radius=64
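Scripts that assemble filter chains may find it convenient to build the --process directives from data rather than by hand. The sketch below uses hypothetical helper names and only formats strings in the procname:option=value form described above; it does not check that a processor or option actually exists (e2help.py processor remains the reference for that).

```python
def process_directive(name, **options):
    """Format one --process=procname:option=value:... directive."""
    parts = [name] + [f"{key}={value}" for key, value in options.items()]
    return "--process=" + ":".join(parts)


def filter_chain_command(program, infile, outfile, chain):
    """Build an argument list applying the processors in `chain` in order.

    `chain` is a list of (processor_name, options_dict) pairs.
    """
    args = [program, infile, outfile]
    args += [process_directive(name, **opts) for name, opts in chain]
    return args


chain = [
    ("filter.lowpass.gauss", {"cutoff_freq": 0.1}),
    ("mask.sharp", {"outer_radius": 64}),
]
print(" ".join(filter_chain_command("e2proc3d.py", "file3d.hdf", "file3d_filt.hdf", chain)))
```

Keeping the chain as an ordered list makes the processing order explicit, which matters because the processors are applied in sequence.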
If you prefer to adjust filter parameters in a GUI, the new e2filtertool.py program allows you to graphically construct and edit filter chains.
For comparative tasks, such as computing an FSC between two maps, note that the option order has changed, so the second parameter is an output, not an input:
e2proc3d.py file3d.hdf fsc.txt --calcfsc=file3d_2.hdf
Note that like EMAN1, the --scale=scale_factor --clip=x[,y,z] operation performs sampling or interpolation, and does not do any averaging. If you wish to use scale factors <1.0, you may wish to consider the --meanshrink= or --medianshrink= options.
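Why averaging matters when shrinking can be seen on a single row of pixels. The following is a 1-D toy illustration with made-up values, not EMAN2's implementation: it contrasts plain sampling (no averaging, as with --scale) with block averaging (the idea behind --meanshrink).

```python
def subsample(row, factor):
    """Keep every `factor`-th pixel; no averaging is performed."""
    return row[::factor]


def mean_shrink(row, factor):
    """Replace each complete block of `factor` pixels with its mean."""
    usable = len(row) - len(row) % factor
    return [sum(row[i:i + factor]) / factor for i in range(0, usable, factor)]


# A rapidly alternating pattern, standing in for high-frequency noise.
row = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0, 0.0, 10.0]

print(subsample(row, 2))    # [0.0, 0.0, 0.0, 0.0]
print(mean_shrink(row, 2))  # [5.0, 5.0, 5.0, 5.0]
```

In this example sampling discards half of the data and misrepresents the mean intensity, while block averaging preserves it.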
All CTF processing is handled by the e2ctf.py program. To display the graphical interface, you must specify the --gui option, but this requires first having run automatic parameter determination. The example commands that follow assume you have particle data in the ptcl directory, with particles from each frame in a separate HDF file.
Important note: EMAN2's CTF correction requires a larger box size than EMAN1, both to improve correction accuracy and permit better background assessment. In EMAN2, the box size should be 1.5x - 2x the maximum dimension of a particle projection. See EMAN2/BoxSize for more.
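As a worked example of the 1.5x - 2x rule, the sketch below converts a particle's maximum dimension to pixels and reports the resulting range. The function name and the 200 Å particle are hypothetical, and rounding up to a whole pixel is an assumption; choosing a specific box size within the range is covered by EMAN2/BoxSize and is not attempted here.

```python
import math


def box_size_range(max_particle_dim_angstrom, apix, low=1.5, high=2.0):
    """Return (min, max) box size in pixels for the 1.5x - 2x guideline.

    max_particle_dim_angstrom: largest dimension of a particle projection.
    apix: sampling in Angstroms per pixel.
    """
    dim_px = max_particle_dim_angstrom / apix
    return math.ceil(low * dim_px), math.ceil(high * dim_px)


# Hypothetical 200 A particle at the 2.12 A/pixel sampling used in this tutorial:
# 200 / 2.12 is about 94.3 pixels, so the range is roughly 142 to 189 pixels.
print(box_size_range(200.0, 2.12))  # (142, 189)
```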
Make sure your input particles have been edge-normalized and have the correct contrast (bright on a darker background). If not, correct all particle stacks using e2proc2dmulti.py with the --process=normalize.edgemean or --mult=-1 options.
e2ctf.py --cs=4.1 --ac=15 --voltage=300 --apix=2.12 --oversamp=2 --autofit --gui ptcl/*hdf
- You may wish to manually check the CTF results quickly.
- If you find a defocus is significantly off, adjust it manually to near the correct value and press 'refit'.
- Be sure to press
Compute a structure factor (you may opt to use a subset of the data instead): e2ctf.py --oversamp=2 --computesf ptcl/*hdf
Rerun the automatic fit: e2ctf.py --cs=4.1 --ac=15 --voltage=300 --apix=2.12 --oversamp=2 --autofit --gui ptcl/*hdf
Optionally: e2ctf.py --refinebysnr ptcl/*hdf
Generate phase-flipped and other files (using oversamp=1 is important here): e2ctf.py --oversamp=1 --phaseflip --phasefliphp --wiener --storeparm ptcl/*hdf
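For scripting purposes, the CTF sequence above (autofit, structure factor, refit, optional SNR refinement, output generation) can be collected in one place. This sketch only assembles and prints the command lines shown in this section; the function name is hypothetical, the microscope parameters are the example values from above, and nothing is executed.

```python
# Example microscope parameters from this tutorial; replace with your own.
SCOPE = ["--cs=4.1", "--ac=15", "--voltage=300", "--apix=2.12"]


def ctf_steps(particle_glob="ptcl/*hdf", refine_by_snr=False):
    """Return the e2ctf.py invocations in the order described above."""
    autofit = ["e2ctf.py", *SCOPE, "--oversamp=2", "--autofit", "--gui", particle_glob]
    steps = [
        autofit,                                                    # initial fit
        ["e2ctf.py", "--oversamp=2", "--computesf", particle_glob],  # structure factor
        list(autofit),                                              # refit
    ]
    if refine_by_snr:
        steps.append(["e2ctf.py", "--refinebysnr", particle_glob])
    # oversamp=1 is important for the output-generation step.
    steps.append(["e2ctf.py", "--oversamp=1", "--phaseflip", "--phasefliphp",
                  "--wiener", "--storeparm", particle_glob])
    return steps


for step in ctf_steps(refine_by_snr=True):
    # Joined with spaces so the shell can expand the glob when pasted.
    print(" ".join(step))
```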
Note that like EMAN1, these are used for initial evaluation of the data and initial model generation only. They are not used for final 3-D refinements.
Get a list of the available Wiener-filtered stacks in bdb notation: e2bdb.py -s particles --filt=wiener
Make a virtual-stack (like an EMAN1 .lst file) containing the particles you wish to use (don't have to be Wiener filtered) : e2bdb.py --makevstack=bdb:sets#stack_for_2d bdb:particles#dh3962_ctf_wiener bdb:particles#dh3965_ctf_wiener bdb:particles#dh3986_ctf_wiener bdb:particles#dh3997_ctf_wiener bdb:particles#dh4017_ctf_wiener
Run 2-D refinement: e2refine2d.py --input=bdb:sets#stack_for_2d --iter=6 --ncls=24 --naliref=6 --nbasisfp=6 --parallel=thread:4
If generating a lot of classes (more than 100), use the --fastseed option.
The --parallel= option is common to many programs. Not all operations will run in parallel. See: EMAN2/Parallel
You may wish to consider shrinking the particle data (e2proc2d.py in out --meanshrink=2) before class-averaging for better speed.
- Many other options are available; this is just a representative example.
Results will be in r2d_xx in a variety of database files. For the example above, final class-averages will be in bdb:r2d_01#classes_05
Use e2display.py to look at the results.
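When many micrographs are involved, the virtual-stack and 2-D refinement commands from this section can be generated rather than typed. The sketch below uses hypothetical function names and assumes the bdb:particles#<name>_ctf_wiener naming seen in the example above; it only builds argument lists and adds --fastseed when more than 100 classes are requested, as suggested earlier.

```python
def makevstack_command(target, micrographs, db="particles", suffix="_ctf_wiener"):
    """Build the e2bdb.py --makevstack command for the given particle sets."""
    sources = [f"bdb:{db}#{name}{suffix}" for name in micrographs]
    return ["e2bdb.py", f"--makevstack={target}", *sources]


def refine2d_command(stack, ncls=24, iterations=6, threads=4):
    """Build the e2refine2d.py command using the example values above."""
    args = ["e2refine2d.py", f"--input={stack}", f"--iter={iterations}",
            f"--ncls={ncls}", "--naliref=6", "--nbasisfp=6",
            f"--parallel=thread:{threads}"]
    if ncls > 100:
        args.append("--fastseed")  # recommended above for more than 100 classes
    return args


stack = "bdb:sets#stack_for_2d"
names = ["dh3962", "dh3965", "dh3986", "dh3997", "dh4017"]
print(" ".join(makevstack_command(stack, names)))
print(" ".join(refine2d_command(stack)))
```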
Initial Model Generation
- If you don't already have an appropriate initial model, EMAN2 can generate one (or more) for you.
- First, select a subset of the class averages. The selected averages should all be 'good' high contrast averages, and should represent as diverse a range of orientations as possible.
There are various ways of extracting good particles. The approach I use is to display the set of class-averages using e2display.py, middle-clicking for a control-panel, then using Del mode to delete the averages I don't want. When I'm done, I Save the results to a new file.
Run the initial model generator: e2initialmodel.py --input=good_classes.hdf --iter=8 --sym=c6 --tries=5
- Clearly you at least need to change the symmetry to match your structure.
The initial model generator works using the recommended approach in EMAN1. It starts with a randomized blobby model, then runs a very rapid sequence of --iter= iterations of standard refinement. It does this --tries= times, producing --tries= possible initial models.
- When this is complete, you will find several files in the initial_models directory. There are 4 types of files:
- model_xx_yy : This is the refined model you would use as a starting model
- model_xx_yy_aptcl : Contains the class-averages alternating with the corresponding projections of the model after the final iteration. Poor agreement between pairs is an indication of a bad initial model.
- model_xx_yy_proj : Projections from the final round of refinement covering the asymmetric triangle.
- model_xx_yy_init : The initial model for this refinement. Just in case you want to see it.
- xx is the run number. If you run the initial model generator a second time, this number will be incremented.
- yy is the model number. In theory these are sorted in order of quality at the end of the run, so 01 will be the best. However, this isn't very reliable. You should check all of the models, and all of the _aptcl files to find the best one.
- If you didn't get a good model, run it again, until you're happy. Note that these models are not going to be perfect. They just need to be vaguely the correct shape, so when you run a 'real' refinement it will converge to the right thing. If you have no success with this approach, random conical tilt and single particle tomography are both viable alternatives, though both will require additional data collection.
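Since every model should be inspected together with its _aptcl file, it can help to group the outputs by run and model number. The sketch below parses the model_xx_yy naming scheme described above; it is a hypothetical helper that assumes names are given without any file extension or database prefix, which you would need to strip first.

```python
import re

# model_<run>_<model>, optionally followed by _aptcl, _proj or _init
MODEL_NAME = re.compile(r"model_(?P<run>\d+)_(?P<model>\d+)(?:_(?P<kind>aptcl|proj|init))?")


def parse_model_name(name):
    """Return (run, model, kind) or None if the name does not match."""
    match = MODEL_NAME.fullmatch(name)
    if match is None:
        return None
    return int(match["run"]), int(match["model"]), match["kind"] or "model"


def group_by_model(names):
    """Map (run, model) to the set of file kinds found for that model."""
    groups = {}
    for name in names:
        parsed = parse_model_name(name)
        if parsed is not None:
            run, model, kind = parsed
            groups.setdefault((run, model), set()).add(kind)
    return groups


print(parse_model_name("model_01_02_aptcl"))  # (1, 2, 'aptcl')
print(group_by_model(["model_01_01", "model_01_01_aptcl", "model_01_02"]))
```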
The main refinement command in EMAN2 is e2refine.py. It has many more options than EMAN1, which gives you more precise control over how the refinement runs internally. This flexibility is not required in most cases, but is available if you need it. The graphical Workflow interface tries to explain what these options mean and how to use them. You can find some details here: EMAN2/Programs/e2refine.
Here is a typical command: e2refine.py --input=bdb:sets#set-all-filt1_phase_flipped-hp --parallel=thread:10 --mass=800.0 --apix=2.12 --automask3d=0.7,21,8,8,21 --iter=5 --sym=d7 --model=bdb:/refine3/refine3/eman1v2/eman2/refine_01#threed_filt_02 --path=refine_02 --orientgen=eman:delta=2.0:inc_mirror=0 --projector=standard --simcmp=frc:zeromask=1:snrweight=1 --simalign=rotate_translate_flip --simaligncmp=frc:zeromask=1:snrweight=1 --simralign=refine --simraligncmp=frc:zeromask=1:snrweight=1 --twostage=2 --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=frc:zeromask=1:snrweight=1 --classralign=refine --classraligncmp=frc:zeromask=1:snrweight=1 --classiter=1 --classkeep=0.8 --classnormproc=normalize.edgemean --classaverager=ctf.auto --sep=5 --m3diter=2 --m3dkeep=0.8 --recon=fourier --m3dpreprocess=normalize.edgemean --m3dpostprocess=filter.lowpass.gauss:cutoff_freq=.125 --pad=256 --classkeepsig --m3dkeepsig --m3dsetsf
- Here are some common sets of the most important options:
This set is very fast, and is intended for initial refinement to get the quaternary structure correct: --classiter=4 --simcmp=ccc --simalign=rotate_translate_flip --simaligncmp=ccc --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=ccc
However, in some cases using 'ccc' can lead to deterministic errors in particle orientation. If the refinement does not seem to converge to a reasonable answer, consider one of the 'frc'-based option sets.
This set is still fairly fast, and is good for improving resolution once the correct structure has been achieved: --classiter=1 --simcmp=frc:zeromask=1:snrweight=1 --simalign=rotate_translate_flip --simaligncmp=ccc --simralign=refine --simraligncmp=ccc --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=ccc --classralign=refine --classraligncmp=ccc
This set would be used for final refinement targeting high resolution: --classiter=1 --simcmp=frc:zeromask=1:snrweight=1 --simalign=rotate_translate_flip --simaligncmp=frc:zeromask=1:snrweight=1 --simralign=refine --simraligncmp=frc:zeromask=1:snrweight=1 --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=frc:zeromask=1:snrweight=1 --classralign=refine --classraligncmp=frc:zeromask=1:snrweight=1
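If you switch between these option sets often, one way to keep them straight in a script is to store them as named presets. The preset names below are made up for illustration, the option strings are copied from the three sets above, and the base arguments are a small subset of the typical command; a real run would need the remaining options from that command as well.

```python
FRC = "frc:zeromask=1:snrweight=1"
RTF = "rotate_translate_flip"

# Hypothetical preset names; option values copied from the sets above.
PRESETS = {
    "fast_initial": [
        "--classiter=4", "--simcmp=ccc", f"--simalign={RTF}", "--simaligncmp=ccc",
        f"--classcmp={FRC}", f"--classalign={RTF}", "--classaligncmp=ccc",
    ],
    "intermediate": [
        "--classiter=1", f"--simcmp={FRC}", f"--simalign={RTF}", "--simaligncmp=ccc",
        "--simralign=refine", "--simraligncmp=ccc",
        f"--classcmp={FRC}", f"--classalign={RTF}", "--classaligncmp=ccc",
        "--classralign=refine", "--classraligncmp=ccc",
    ],
    "high_resolution": [
        "--classiter=1", f"--simcmp={FRC}", f"--simalign={RTF}", f"--simaligncmp={FRC}",
        "--simralign=refine", f"--simraligncmp={FRC}",
        f"--classcmp={FRC}", f"--classalign={RTF}", f"--classaligncmp={FRC}",
        "--classralign=refine", f"--classraligncmp={FRC}",
    ],
}


def refine_command(base_args, preset):
    """Combine shared e2refine.py arguments with one named preset."""
    return ["e2refine.py", *base_args, *PRESETS[preset]]


base = ["--input=bdb:sets#set-all-filt1_phase_flipped-hp", "--sym=d7", "--iter=5"]
print(" ".join(refine_command(base, "fast_initial")))
```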
The convergence FSC plots are stored in a database in the refine_xx directory with the other output files (documented here). The workflow provides a mechanism for plotting these directly. To extract them as text files, you can run e2bdb.py --extractplots bdb:refine_01#convergence.results. You can plot them with e2display.py --plot or any other plotting program you like.
If you are interested in seeing how the convergence data can be accessed from python, there is also a script in the examples directory called extractfsc.py.
e2eotest.py will perform a standard 'split the data and compare' approach like eotest in EMAN1. Note that unlike EMAN1, e2eotest.py DOES work properly when the --sep option is used for refinement.
e2eotest.py can be run with exactly the same options as e2refine.py. It will simply ignore options it doesn't use.
- The FSC curve for the eotest can be extracted as described above for convergence plots.