= Transitioning from EMAN1 to EMAN2 with command line tools =
While the workflow is the correct choice for many users, we realize that some prefer to manipulate their data more directly. Many users who wish to use the command line are doing so because they are trying to update existing scripts currently using EMAN1. Note that this tutorial was written for EMAN2.03 or above.

Please note that for EMAN2 it is important to use HDF or BDB: files for all EMAN2 processing, as other formats do not support a flexible enough metadata model (header information). Chimera is able to directly read EMAN2 HDF files. When you have a structure you need to get into a different format for use in another program, feel free to convert to whatever you like (EMAN2 still supports virtually all formats). Just be aware that some header info will be lost in the conversion.

Please be sure to read this [[EMAN2/DatabaseWarning|Important Warning]].

=== Introduction to the EMAN2 Command-Line ===
The first thing to note is that EMAN2 uses standard unix-style command-line arguments rather than EMAN1 style arguments. For example:

''proc2d abc.hed def.spi clip=64,64'' --> ''e2proc2d.py abc.hed def.spi --clip=64,64''

=== Basic Image Processing ===
In direct analogy to EMAN1, EMAN2 has ''e2iminfo.py'', ''e2proc2d.py'' and ''e2proc3d.py''. As with EMAN1, both of the ''proc'' commands can be used to convert among any of EMAN2's supported [[Eman2DataStorage|file formats]], simply by using the correct extension for the output file. There is also an ''--outtype'' option to manually specify the output format.

It is important to understand how both programs deal with image stacks (a single file with multiple images). In EMAN1, only 2-D images could exist in stacks, simply because there were no 3-D file formats that supported multiple 3-D volumes at the time. That is no longer true, and HDF and BDB both support stacks of volumes. However, with 2-D images, stacks are used extensively, and with 3-D, only rarely.
As a result ''e2proc2d.py'' and ''e2proc3d.py'' handle stacks differently by default. ''e2proc2d.py'' will append images to the output stack unless ''--inplace'' is specified. ''e2proc3d.py'', on the other hand, will overwrite images by default unless the ''--append'' option is specified. Finally, ''e2proc2d.py'' is capable of treating a single 3-D image as a stack of 2-D images in various ways. So:

 * ''e2proc2d.py file2d.hdf file2d.hdf --mult=-1'' - would make an inverted copy of all of the images in file2d.hdf and append them to the end, doubling the size of the stack
 * ''e2proc3d.py file3d.hdf file3d.hdf --mult=-1'' - would invert all of the 3-D volumes in file3d.hdf in-place. The file would be the same size when complete

Beyond this, a number of options have changed. In both ''e2proc2d.py'' and ''e2proc3d.py'', it is now possible to construct filter chains; that is, a sequence of filters or other ''processors'' to apply to the image(s). This is done via the ''--process'' directive in either program. There are currently over 175 different ''processors'' in 10 different categories, such as filters, masks, thresholds, mathematical, etc. Any of these ''processors'' can be applied in sequence using the directive ''--process=procname:option=value:option=value''. You can get a list of available processors using ''e2help.py processor'' or ''e2help.py processor -v 2'' for more detail. For example:

 * ''e2proc3d.py file3d.hdf file3d_filt.hdf --process=filter.lowpass.gauss:cutoff_freq=0.1 --process=mask.sharp:outer_radius=64''

If you prefer to adjust filter parameters in a GUI, the new ''e2filtertool.py'' program allows you to graphically construct and edit filter chains.
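
If you are updating existing scripts, the ''--process=procname:option=value:option=value'' syntax described above can be assembled programmatically. The following is only a minimal sketch in plain Python: the helper names ''process_option'' and ''build_proc3d_command'' are hypothetical and not part of EMAN2, and the code only builds the argument list without running anything.

```python
def process_option(name, **params):
    """Format one processor as --process=name:key=value:key=value."""
    parts = [name] + [f"{key}={value}" for key, value in params.items()]
    return "--process=" + ":".join(parts)


def build_proc3d_command(infile, outfile, chain):
    """Assemble an e2proc3d.py argument list from (name, params) pairs.

    Hypothetical helper for illustration; processors are emitted in the
    order given, one --process directive per processor.
    """
    cmd = ["e2proc3d.py", infile, outfile]
    for name, params in chain:
        cmd.append(process_option(name, **params))
    return cmd


# The same filter chain as the example in the text.
chain = [
    ("filter.lowpass.gauss", {"cutoff_freq": 0.1}),
    ("mask.sharp", {"outer_radius": 64}),
]
cmd = build_proc3d_command("file3d.hdf", "file3d_filt.hdf", chain)
print(" ".join(cmd))
```

The resulting list could be handed to Python's ''subprocess.run'' on a machine where EMAN2 is installed. Keeping the chain as an ordered list preserves the order in which the processors appear on the command line.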
For comparative tasks, such as computing an FSC between two maps, note that the option order has changed, so the second parameter is an output, not an input:

 * ''e2proc3d.py file3d.hdf fsc.txt --calcfsc=file3d_2.hdf''

Note that, as in EMAN1, the ''--scale=scale_factor --clip=x[,y,z]'' operation performs sampling or interpolation, and does not do any averaging. If you wish to use scale factors <1.0, you may wish to consider the ''--meanshrink='' or ''--medianshrink='' options.

=== CTF determination/correction ===
All CTF processing is handled by the ''e2ctf.py'' program. To display the graphical interface, you must specify the ''--gui'' option, but this requires first having run automatic parameter determination. For example, if you have particle data in the ''ptcl'' directory, with particles from each frame in a separate HDF file:

 * '''Important note:''' EMAN2's CTF correction requires a larger box size than EMAN1, both to improve correction accuracy and permit better background assessment. In EMAN2, the box size should be 1.5x - 2x the maximum dimension of a particle projection. See [[EMAN2/BoxSize]] for more.
 * Make sure your input particles have been edge-normalized and have the correct contrast (bright on a darker background). If not, correct all particle stacks with ''e2proc2dmulti.py'' using the ''--process=normalize.edgemean'' or ''--mult=-1'' options.
 * ''e2ctf.py --cs=4.1 --ac=15 --voltage=300 --apix=2.12 --oversamp=2 --autofit --gui ptcl/*hdf''
 * You may wish to manually check the CTF results quickly.
 * If you find a defocus is significantly off, adjust it manually to near the correct value and press 'refit'.
 * Be sure to press
 * Compute a structure factor (you may opt to use a subset of the data instead): ''e2ctf.py --oversamp=2 --computesf ptcl/*hdf''
 * Rerun: ''e2ctf.py --cs=4.1 --ac=15 --voltage=300 --apix=2.12 --oversamp=2 --autofit --gui ptcl/*hdf''
 * Optionally: ''e2ctf.py --refinebysnr ptcl/*hdf''
 * Generate phase-flipped and other files (using oversamp=1 is important here): ''e2ctf.py --oversamp=1 --phaseflip --phasefliphp --wiener --storeparm ptcl/*hdf''

=== Reference-free Class-averages ===
Note that, as in EMAN1, these are used for initial evaluation of the data and initial model generation only. They are not used for final 3-D refinements.

 * Get a list of the available Wiener filtered stacks in bdb notation: ''e2bdb.py -s particles --filt=wiener''
 * Make a virtual-stack (like an EMAN1 .lst file) containing the particles you wish to use (they don't have to be Wiener filtered): ''e2bdb.py --makevstack=bdb:sets#stack_for_2d bdb:particles#dh3962_ctf_wiener bdb:particles#dh3965_ctf_wiener bdb:particles#dh3986_ctf_wiener bdb:particles#dh3997_ctf_wiener bdb:particles#dh4017_ctf_wiener''
 * Run 2-D refinement: ''e2refine2d.py --input=bdb:sets#stack_for_2d --iter=6 --ncls=24 --naliref=6 --nbasisfp=6 --parallel=thread:4''
  * If generating a lot of classes (more than 100), use the ''--fastseed'' option.
  * The ''--parallel='' option is common to many programs. Not all operations will run in parallel. See: [[EMAN2/Parallel]]
  * You may wish to consider shrinking the particle data (''e2proc2d.py in out --meanshrink=2'') before class-averaging for better speed.
  * There are many other options. This is just a representative example.
 * Results will be in r2d_xx in a variety of database files. For the example above, final class-averages will be in ''bdb:r2d_01#classes_05''
 * Use ''e2display.py'' to look at the results.

=== Initial Model Generation ===
 * If you don't already have an appropriate initial model, EMAN2 can generate one (or more) for you.
 * First, select a subset of the class averages.
   The selected averages should all be 'good' high contrast averages, and should represent as diverse a range of orientations as possible.
 * There are various ways of extracting good particles. The approach I use is to display the set of class-averages using ''e2display.py'', middle-clicking for a control-panel, then using ''Del'' mode to delete the averages I don't want. When I'm done, I ''Save'' the results to a new file.
 * Run the initial model generator: ''e2initialmodel.py --input=good_classes.hdf --iter=8 --sym=c6 --tries=5''
  * Clearly you at least need to change the symmetry to match your structure.
  * The initial model generator works using the recommended approach in EMAN1. It starts with a randomized blobby model, then runs a very rapid sequence of ''--iter='' iterations of standard refinement. It does this ''--tries='' times, producing ''--tries='' possible initial models.
 * When this is complete, you will find several files in the initial_models directory. There are 4 types of files:
  * model_xx_yy : This is the refined model you would use as a starting model
  * model_xx_yy_aptcl : Contains the class-averages alternating with the corresponding projections of the model after the final iteration. Poor agreement between pairs is an indication of a bad initial model.
  * model_xx_yy_proj : Projections from the final round of refinement covering the asymmetric triangle.
  * model_xx_yy_init : The initial model for this refinement. Just in case you want to see it.
  * xx is the run number. If you run the initial model generator a second time, this number will be incremented.
  * yy is the model number. In theory these are sorted in order of quality at the end of the run, so 01 will be the best. However, this isn't very reliable. You should check all of the models, and all of the _aptcl files to find the best one.
 * If you didn't get a good model, run it again, until you're happy. Note that these models are not going to be perfect.
   They just need to be vaguely the correct shape, so when you run a 'real' refinement it will converge to the right thing. If you have no success with this approach, random conical tilt and single particle tomography are both viable alternatives, though both will require additional data collection.

=== Refinement ===
 * The main refinement command in EMAN2 is ''e2refine.py''. It has many more options than EMAN1, which gives you more precise control over how the refinement runs internally. This flexibility is not required in most cases, but is available if you need it. The graphical Workflow interface tries to explain what these options mean and how to use them. You can find some details here: [[EMAN2/Programs/e2refine]].
 * Here is a typical command: ''e2refine.py --input=bdb:sets#set-all-filt1_phase_flipped-hp --parallel=thread:10 --mass=800.0 --apix=2.12 --automask3d=0.7,21,8,8,21 --iter=5 --sym=d7 --model=bdb:/refine3/refine3/eman1v2/eman2/refine_01#threed_filt_02 --path=refine_02 --orientgen=eman:delta=2.0:inc_mirror=0 --projector=standard --simcmp=frc:zeromask=1:snrweight=1 --simalign=rotate_translate_flip --simaligncmp=frc:zeromask=1:snrweight=1 --simralign=refine --simraligncmp=frc:zeromask=1:snrweight=1 --twostage=2 --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=frc:zeromask=1:snrweight=1 --classralign=refine --classraligncmp=frc:zeromask=1:snrweight=1 --classiter=1 --classkeep=0.8 --classnormproc=normalize.edgemean --classaverager=ctf.auto --sep=5 --m3diter=2 --m3dkeep=0.8 --recon=fourier --m3dpreprocess=normalize.edgemean --m3dpostprocess=filter.lowpass.gauss:cutoff_freq=.125 --pad=256 --classkeepsig --m3dkeepsig --m3dsetsf''
 * Here are some common sets of the most important options:
  * This set is very fast for initial refinement to get the quaternary structure correct.
    Options: ''--classiter=4 --simcmp=ccc --simalign=rotate_translate_flip --simaligncmp=ccc --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=ccc'' However, in some cases using 'ccc' can lead to deterministic errors in particle orientation. If this does not seem to converge to a reasonable answer, you may consider one of the 'frc' based option sets.
  * This set is still fairly fast, and is good for improving resolution once the correct structure has been achieved: ''--classiter=1 --simcmp=frc:zeromask=1:snrweight=1 --simalign=rotate_translate_flip --simaligncmp=ccc --simralign=refine --simraligncmp=ccc --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=ccc --classralign=refine --classraligncmp=ccc''
  * This set would be used for final refinement targeting high resolution: ''--classiter=1 --simcmp=frc:zeromask=1:snrweight=1 --simalign=rotate_translate_flip --simaligncmp=frc:zeromask=1:snrweight=1 --simralign=refine --simraligncmp=frc:zeromask=1:snrweight=1 --classcmp=frc:zeromask=1:snrweight=1 --classalign=rotate_translate_flip --classaligncmp=frc:zeromask=1:snrweight=1 --classralign=refine --classraligncmp=frc:zeromask=1:snrweight=1''
 * The convergence FSC plots are stored in a database in the refine_xx directory with the other output files (documented [[EMAN2/Concepts|here]]). The workflow provides a mechanism for plotting these directly. To extract them as text files you can run ''e2bdb.py --extractplots bdb:refine_01#convergence.results''. You can plot them with ''e2display.py --plot'' or any other plotting program you like.
 * If you are interested in seeing how the convergence data can be accessed from python, there is also a script in the examples directory called ''extractfsc.py''.

=== Resolution Testing ===
 * ''e2eotest.py'' will perform a standard 'split the data and compare' approach like ''eotest'' in EMAN1.
   Note that unlike EMAN1, ''e2eotest.py'' DOES work properly when the ''--sep'' option is used for refinement.
 * ''e2eotest.py'' can be run with exactly the same options as ''e2refine.py''. It will simply ignore options it doesn't use.
 * The FSC curve for the eotest can be extracted as described above for convergence plots.
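
Once an FSC curve has been extracted as a text file, you may want to read off where it crosses a chosen threshold. The following is a rough sketch in plain Python, not an EMAN2 tool. It assumes the extracted file contains two whitespace-separated columns (spatial frequency in 1/A, then the FSC value), which you should check against your own output. The function names and the sample numbers are made up for illustration, and the 0.5 threshold is only an example value.

```python
def parse_fsc(text):
    """Parse 'frequency fsc' pairs, skipping blank lines and # comments.

    Assumes a two-column text layout; adjust if your file differs.
    """
    curve = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        freq, fsc = line.split()[:2]
        curve.append((float(freq), float(fsc)))
    return curve


def resolution_at_threshold(curve, threshold=0.5):
    """Return 1/frequency at the first downward crossing of threshold.

    Returns None if the curve never drops below the threshold.
    """
    for (f0, v0), (f1, v1) in zip(curve, curve[1:]):
        if v0 >= threshold > v1:
            # Linear interpolation between the two bracketing points.
            f_cross = f0 + (v0 - threshold) * (f1 - f0) / (v0 - v1)
            if f_cross > 0:
                return 1.0 / f_cross
    return None


# Synthetic values for illustration only; not real data.
sample = """
0.02 0.99
0.04 0.90
0.06 0.60
0.08 0.40
0.10 0.10
"""
curve = parse_fsc(sample)
res = resolution_at_threshold(curve, 0.5)
print(f"Threshold crossing at about {res:.1f} A")
```

Linear interpolation between the two bracketing points is a simple choice. Noisy curves that cross the threshold more than once are better inspected visually with ''e2display.py --plot''.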