Porting an EMAN1 refinement project to EMAN2.1

Note : This is a simplified version of the tutorial, and assumes you are familiar with EMAN1. The full tutorials provide a much better way of learning EMAN2. This page will get you started if you just want to rapidly switch a project from EMAN1. Just remember if you find yourself asking 'but what does THAT mean' when you read this, you're reading the wrong page :^)

A Quick (but worse) Alternative

Following the guide below will effectively start your project from the level of boxed out particles in EMAN2. There are a variety of advantages in doing things this way, as a number of things have improved in EMAN2 vs EMAN1. However, if your goal is just to get started quickly, and you can accept results that are good but perhaps not optimal, there is an alternative:

This will create a new EMAN2 project from start.hed/img. It will:

That's it, you're ready to start a refinement. Alternatively, the more thorough approach:

A Quickstart Guide

This quickstart makes use of the workflow, as the goal here is presumably to get the best results in EMAN2, and that is most easily accomplished in the workflow. It will take you all the way through a canonical reconstruction starting with data you've already processed in EMAN1 (or any other image processing package for that matter). This is written primarily for Linux/Mac users, but hopefully Windows users will be able to follow along.

This may appear daunting, but there is a bit more detail here than in a typical 'quickstart' guide. It isn't nearly as complicated as its length would indicate.

Please make sure you have read this Important Warning.

For purposes of this tutorial, we will call the directory where your EMAN1 data resides 'eman1'. You should also make another directory where your EMAN2 project will reside. We will call this 'eman2', but you can use any name you like.

Outline

Since this is rather long for a quickstart guide, we'll start out with a short outline of the steps:

Initial Setup

  1. It is very important that you cd eman2 (again, 'eman2' is the name of your project directory) before starting the EMAN2 GUI. Whatever directory you are in when you run e2projectmanager.py will become your project directory.

  2. Run e2projectmanager.py. A window containing an expandable list of tasks to complete and a large empty area will appear.

  3. The basic project parameters are accessed via a menu item. On the Mac this will be on the menu-bar at the top of the screen. On Linux/Win it will appear at the top of the projectmanager window. Use Project->Edit Project, and enter your basic project parameters.

Importing Particle Data

  1. There are 3 possibilities for getting your data into EMAN2. You MUST reprocess the CTF in EMAN2, rather than use the already processed data from EMAN1 (unless you don't plan on doing any CTF correction), but on the bright side, this process is much easier in EMAN2. So, the data we will import is the data before phase-flipping. It must also be completely unfiltered: just the original, raw data. Pick only one of the following 3 options:
    • If you have the original micrographs/CCD frames, and EMAN1 style .box files :

      1. There are two tools under Raw Data you can use to import the micrographs into the project:

        • Import Micrographs will just copy your images into a Micrographs directory and make sure they are in HDF format.

        • Evaluate & Import Micrographs will allow you to visually assess each image and its power spectrum in various ways, then press the Import button only for those you decide have sufficient quality. This tool can also simultaneously provide an estimate of the defocus, which will largely eliminate fitting failures during later CTF processing.

      2. There are a few options you can select as well.
        • It is important to get Inversion right. The final particles MUST appear white on a darker background, so whether you check this box depends on whether you are using stain or cryo, and how the data was digitized.

        • Filter X-ray Pixels is important for CCD data, but should not be used if you are using a phase-plate or film data.

      3. Under Particles, select Box Coordinate Import. Assuming your particle and box filenames match, everything should work well.

      4. You could actually go to the interactive particle picker and check/update the boxes at this point if you like, but we will just assume they are correct, and use Generate Output under Particles. Again, there are some options:

        • Box size is the most important one. In EMAN2, it is VERY important, if you want to obtain optimal results, to use a box size that is 1.5-2x the size of the maximum dimension of your particle. If you are dealing with a huge virus particle, you may need to skimp a bit on this, to avoid having a, say, 1500 box size, but some padding is essential for CTF correction to work well, and to avoid artifacts. See this list of good sizes to use. If you decide to change the box size later, it generally requires starting a new project from scratch, so consider this number very carefully.

        • You should select write_ptcls, normalize.edgemean and hdf.

        • Only select invert here if you made a mistake at the earlier step. Remember, particles MUST be light on a darker background.

    • If you only have or want to use particle stacks sorted by micrograph:

      1. Double check to make sure your particle stacks have light particles on a darker background. If they have inverted contrast, you will need to invert them before proceeding. You can do this with e2proc2d.py <file> <file> --mult=-1 --inplace or any other method you like. Having this correct is CRITICAL to getting good results in EMAN2.

      2. Use Particles->Particle Import.

    • If you only have a start.hed file containing phase-flipped particles from EMAN1 :

      • This is not an optimal situation, but it is still possible (assuming you have EMAN1 installed).
      • If the start.hed file contains particles which have been phase-flipped, you must first use EMAN1's 'applyctf' program to invert the phases a second time so the images are un-flipped again.
      • In EMAN1, use proc2d with the ctfsplit option to turn your start.hed file into a set of individual particle stacks (one for each micrograph).
      • Now that you have particle stacks for individual images, follow the instructions above to import the stacks.
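
The 1.5-2x box size rule above can be turned into a quick calculation. The following is only a sketch: it does not reproduce the list of good sizes referred to above, and instead assumes, as a stand-in, that an even size whose only prime factors are 2, 3, 5 and 7 is a reasonable candidate. The function names are hypothetical and are not part of EMAN2. Always check the result against the actual list of good sizes.

```python
import math

def has_small_prime_factors(n, primes=(2, 3, 5, 7)):
    """Return True if n factors completely into the given small primes.

    ASSUMPTION: this is a stand-in for EMAN2's own list of good box sizes.
    """
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def suggest_box_sizes(max_particle_px, low=1.5, high=2.0):
    """List even candidate box sizes between low and high times the particle size.

    max_particle_px is the maximum dimension of the particle in pixels.
    """
    smallest = math.ceil(low * max_particle_px)
    largest = math.floor(high * max_particle_px)
    return [n for n in range(smallest, largest + 1)
            if n % 2 == 0 and has_small_prime_factors(n)]

# Example: a particle whose longest dimension is about 100 pixels.
print(suggest_box_sizes(100))
```

Because changing the box size later generally means starting a new project, it is worth running a check like this once before Generate Output rather than afterwards.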

CTF Fitting/Correction (even for stain and phase plate)

  1. Now that your particles have been imported into the EMAN2 project, you can begin CTF correction. It is important to note at this point that CTF correction in EMAN2 entails a lot more than just CTF correction. It also makes an SSNR evaluation for each micrograph, which is used to perform weighting and other tasks. So, even if you do not feel you need to do CTF correction (neg stain or phase plates for example), we strongly encourage you to go through this process anyway.
  2. Use CTF->Automated Fitting. Since you want to do initial fitting for all images, you have two approaches. You can use Browse and select all of the image files manually, or instead, you can leave the 'particles' line blank and check the 'allparticles' checkbox. minptcl and minqual allow you to filter out certain image sets. Oversamp should be set to 2. Make sure the other values are correct. Normally you will want autohp, and curdefocushint checked. autofit must be selected. Fitting should only take 1-2 seconds per image.

  3. When it's done, select Interactive Tuning. At this point you do not need to check all of the images, but just a few you plan to use for structure factor determination. Click on the Num Particles column to sort in decreasing order. Then select the first 5-10 images in the list.

  4. Four windows will appear: a control panel, a 2-D power spectrum view, a particle view, and a 2-D plot. In this case all we need to do is make sure the defocus is roughly correct. This is the most common error in automatic CTF fitting. Either it will be completely correct, or the defocus will be way off. If you find an image with the wrong defocus, adjust it manually to roughly the correct position, and hit the refit button. If it is incorrect again after refit, then manually adjust to the correct value and Save Params.

  5. NOTE: If you are working with phase-plate data, hopefully your images will all have defocus of 0. The autofitting program is incapable of determining defocuses where fewer than 2 zeros appear in the power spectrum. So, for phase plate images, you will need to enter a defocus manually for all images.
  6. Next, we need to generate a structure factor (while you could use your own structure factor curve by replacing strucfac.txt with your own file, normally making one will give better results). When you open the Generate Structure Factor tool, it should already have the same micrographs you used in the previous step selected, so you only need to reselect if some of those images were bad. Generating a structure factor will usually take only a few seconds.

  7. The generated structure factor goes into a file called strucfac.txt. You can plot this file vs any other structure factor to see how it compares if you wish. Unlike EMAN2.0x, in EMAN2.1 this text file is used for all later CTF correction situations requiring a structure factor.

  8. Repeat automatic CTF fitting for all images as you did earlier. Usually having a structure factor will improve fitting accuracy. However, there is generally no point in iterating the process (by making the structure factor again).
  9. You should really run Interactive Tuning again. You are welcome to check all of the images, or just a few of them. If you check a few, and find that the fitting is poor, you should probably check the rest as well. If this happens, please email sludtke@bcm.edu, as such failures are valuable in improving the software. You may optionally go through the data and adjust the Quality slider to indicate relative quality of each image. This is an entirely user controlled parameter, and is not actually required directly by any of the programs. Note that you do NOT need to press the Save Params button after adjusting the quality slider, but do if you adjust anything else.

  10. Once you are happy with all of the CTF parameters, use CTF->Generate Output. Usually selecting refinebysnr will slightly improve the defocus values. You will usually want phaseflip and wiener checked. phasefliphp can be useful if you have a lot of ice gradients or if working in negative stain, but in some cases can have very negative effects. You'll have to play with it.
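
For reference, the option choices described in the steps above can be collected into command lines. This is a hedged sketch, not the command the workflow actually runs: it assumes that each GUI option named above (allparticles, autofit, oversamp, autohp, curdefocushint, minptcl, minqual, refinebysnr, phaseflip, wiener, phasefliphp) maps to an e2ctf.py flag of the same name, and it leaves out the microscope parameters (the "other values" mentioned above). Check e2ctf.py --help before running anything based on it.

```python
import shlex

def ctf_autofit_args(oversamp=2, minptcl=None, minqual=None):
    """Build an argument list for automated CTF fitting of all particles.

    ASSUMPTION: flag names mirror the GUI option names in the tutorial.
    """
    args = ["e2ctf.py", "--allparticles", "--autofit",
            f"--oversamp={oversamp}", "--autohp", "--curdefocushint"]
    if minptcl is not None:
        args.append(f"--minptcl={minptcl}")
    if minqual is not None:
        args.append(f"--minqual={minqual}")
    return args

def ctf_output_args(phasefliphp=False):
    """Build an argument list for the Generate Output step (same assumption)."""
    args = ["e2ctf.py", "--allparticles", "--refinebysnr",
            "--phaseflip", "--wiener"]
    if phasefliphp:
        args.append("--phasefliphp")
    return args

# Print the commands for inspection; nothing is executed here.
print(shlex.join(ctf_autofit_args()))
print(shlex.join(ctf_output_args()))
```

Printing rather than executing keeps the sketch safe to run anywhere, and makes it easy to compare against the command the project manager shows for the same task.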

Particle Sets

  1. When performing a single particle reconstruction, often it is useful to work with only part of your data for specific steps. For example, when generating an initial model, the process goes much faster if you only use the best fraction of your data, and generally ~1000 particles or so is sufficient for this process (fewer if the contrast is high). When performing a 3-D reconstruction, you may wish to assess the quality of the model if only the best fraction of the data is used, vs the entire data set. The Sets interface allows you to create named subsets of your data for the later stages of processing. You must make at least 1 set to proceed through the Workflow.

  2. Select Particle Sets->Build Particle Sets. Select the images you want to use in the set, and give your set a name. Filetype should be lst. If you check excludebad it will exclude particles which have been marked as bad using one of several different mechanisms. (In EMAN2.1alpha1 it is tricky to do manual bad particle selection, a new tool for this will appear later...)
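
One way to decide which images go into a small "best data" set is to rank micrographs by the quality value you assigned during CTF tuning and keep adding them until you have roughly 1000 particles. The sketch below illustrates that idea only. The tuple layout, the function name and the example numbers are all hypothetical, and EMAN2 does not require you to build sets this way.

```python
def pick_micrographs(micrographs, target_particles=1000):
    """Pick the highest-quality micrographs until the particle target is reached.

    micrographs: list of (name, quality, n_particles) tuples (hypothetical layout).
    Returns (chosen_names, total_particles).
    """
    chosen, total = [], 0
    ranked = sorted(micrographs, key=lambda m: m[1], reverse=True)
    for name, _quality, n_particles in ranked:
        if total >= target_particles:
            break
        chosen.append(name)
        total += n_particles
    return chosen, total

# Made-up example values, for illustration only.
example = [("mg01", 7, 400), ("mg02", 9, 350), ("mg03", 5, 500), ("mg04", 8, 300)]
print(pick_micrographs(example))
```

The names returned here would then be the images you select in Build Particle Sets for the subset.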

Reference Free Class-averages

  1. If you have a good 3-D starting model already, and wish to skip directly to 3-D refinement, you can skip this and move down to Running a 3-D Refinement.

  2. It is often useful to make some reference-free 2-D class-averages to assess your particle's structural variability, and to compare to projections of the 3-D map later. They are also useful to build an initial model. This is the equivalent of the refine2d.py command in EMAN1.

  3. Select Reference Free Class Averages->Generate Classes.

    • The input file should be a set and usually will be the ctf_flip or ctf_flip_hp file.

    • Generally the default options are fine for a first try, though if you have multiple CPUs/cores available on your computer, you may wish to put thread:n where n is the number of CPUs in the parallel box. You could even run on a cluster with MPI or other parallelism scheme, if you like, though this job won't take all that long to run in most cases. After a trial run, you may play with the options a bit.

  4. When the job has finished, you will find the results in a directory called r2d_xx. Each time you run 2d refinement, it will increment xx. There are many files in this directory, but the main file to look at is either classes_xx or allrefs_xx. You can use the file browser to look at the results from the final iteration.

  5. The images in 'classes_xx' are in arbitrary order. 'allrefs_xx' contains the particles sorted in order of mutual similarity and aligned to each other in 2-D. If you wish to see which particles went into a specific class-average, you can simply double-click on that particle, and an additional window will open showing the contributing particles. It will also show particles which were classified in that class, but were excluded from the final average. These will be marked with a blue square in the corner. Note that these particles are not excluded from the set or from any subsequent process, they were simply excluded from that specific average.
  6. You may also wish to look at 'basis_xx', which contains the MSA basis vectors, and can tell you something about the symmetry and shape of your particles, if you understand how MSA works.
  7. If you observe some class-averages which clearly contain entirely bad particles, you can use this as a mechanism for permanently excluding those particles from future sets you create. To do this, you use the 'Evaluate Particle Sets' interface in the workflow. You can also use this interface to select particles from specific class-averages for further rounds of 2-D or 3-D refinement. Its detailed use is beyond the scope of this quickstart guide, but the corresponding command-line program is e2evalparticles.py. (Note that this is broken in EMAN2.1alpha1.)
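
Since each run creates a new r2d_xx directory, and the files inside are numbered as well, a small helper can pick out the most recent one. This is a sketch under assumptions: the .hdf extension on the classes files, and the reading that the highest-numbered classes_xx file is the final iteration, are both inferred from the description above and should be checked against your own project directory. The helper name is hypothetical.

```python
import re

def highest_numbered(names, pattern):
    """Return the name whose captured number is largest, or None if none match.

    pattern must contain exactly one capture group holding the number.
    """
    best, best_num = None, -1
    rx = re.compile(pattern)
    for name in names:
        m = rx.fullmatch(name)
        if m and int(m.group(1)) > best_num:
            best, best_num = name, int(m.group(1))
    return best

# Example with made-up listings; in practice pass os.listdir(...) results.
project_dir = ["r2d_01", "r2d_02", "particles", "sets"]
run_dir = ["classes_01.hdf", "classes_08.hdf", "allrefs_08.hdf", "basis_08.hdf"]
print(highest_numbered(project_dir, r"r2d_(\d+)"))
print(highest_numbered(run_dir, r"classes_(\d+)\.hdf"))
```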

Making an Initial 3-D Model

  1. EMAN2 uses an approach for initial model generation that was always suggested in EMAN1, but had to be performed manually. This process is fully automated in EMAN2. It does not work on 100% of projects, but I would estimate it works ~90% of the time. If you have a project which you cannot get working this way, or do not trust your results, I suggest using Random Conical Tilt, or Single Particle Tomography to get an initial model. You can also try tilt validation after your refinement to confirm that your structure is accurate. All of these methods are available in EMAN2, including corresponding GUI tools.
  2. First you need to select a good subset of your class-averages to use in initial model generation. To do this, open the classes_xx file in the browser, then middle-click on the image display to bring up a control-panel (alt-click without a 3-button mouse). Select Del mode in the control panel, then click on images you do not wish to use in initial model generation to delete them. You will want to use ~10-20 averages for your initial model. Try to pick averages representing diverse views of your particle, but do not include anything that is questionable or bad. When you have the desired averages, use the Save button to put them into a file: good-classes.hdf.

  3. Select Initial Model->Make Model and select the good-classes.hdf file you just created.

  4. If you don't know the symmetry for sure, I suggest starting with the highest symmetry you suspect is possible. Do not rely on it to identify the symmetry for you.
  5. This process basically starts with a completely random pattern of blobs, and does a full single particle refinement, using the class-averages as particles. It does this for N different starting models, and iterates each M times. Some structures converge faster than others, M=8 is a reasonable value for most structures, but you may find yourself needing to try twice that many for difficult cases. N doesn't really matter too much, as if you don't get a good model on the first pass, you can always try N more. I suggest starting with Iterations=8 and Tries=5. You may also want to fill in the parallel box (as above) for speed.

  6. When this is complete, you will find several files in the initial_models directory. There are 4 types of files:
    • model_xx_yy : This is the refined model you would use as a starting model
    • model_xx_yy_aptcl : Contains the class-averages alternating with the corresponding projections of the model after the final iteration. Poor agreement between pairs is an indication of a bad initial model.
    • model_xx_yy_proj : Projections from the final round of refinement covering the asymmetric triangle.
    • model_xx_yy_init : The initial model for this refinement. Just in case you want to see it.
    • xx is the run number. If you run the initial model generator a second time, this number will be incremented.
    • yy is the model number. In theory these are sorted in order of quality at the end of the run, so 01 will be the best. However, this isn't always reliable. You should check at least the first few models, and the _aptcl files to find the best one.
  7. If you didn't get a good model, run it again, until you're happy. Note that these models are not going to be perfect. They just need to be vaguely the correct shape, so when you run a 'real' refinement it will converge to the right thing. If you have no success with this approach, random conical tilt and single particle tomography are both viable alternatives, though both will require additional data collection. In most cases you really don't have to be very close. A ribosome, for example, will eventually refine from just about any sort of blobby starting model you like. While in some cases a refinement CAN get 'stuck' with a really bad starting model, this isn't very common with even a halfway decent starting model.
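
When several runs have accumulated in initial_models, it can help to group the files by run and model number so that each model sits next to its _aptcl, _proj and _init companions. The following sketch parses the naming scheme described above. The optional file extension in the pattern and the function name are assumptions for illustration.

```python
import re
from collections import defaultdict

# model_xx_yy with an optional _aptcl/_proj/_init suffix and optional extension.
MODEL_RX = re.compile(r"model_(\d+)_(\d+)(?:_(aptcl|proj|init))?(?:\.\w+)?")

def group_models(filenames):
    """Map (run, model) -> {kind: filename}.

    kind is 'model' for the refined map, or 'aptcl', 'proj', 'init'.
    Names that do not follow the scheme are ignored.
    """
    groups = defaultdict(dict)
    for name in filenames:
        m = MODEL_RX.fullmatch(name)
        if m:
            run, model = int(m.group(1)), int(m.group(2))
            kind = m.group(3) or "model"
            groups[(run, model)][kind] = name
    return dict(groups)

# Made-up listing; in practice use os.listdir("initial_models").
files = ["model_00_01.hdf", "model_00_01_aptcl.hdf", "model_00_01_proj.hdf",
         "model_00_01_init.hdf", "model_00_02.hdf", "notes.txt"]
for key, kinds in sorted(group_models(files).items()):
    print(key, sorted(kinds))
```

Grouping this way makes it straightforward to open each candidate model together with its _aptcl file, which is the comparison recommended above for judging model quality.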

Running a 3-D Refinement

  1. At last, to the meat of the matter. In EMAN2.1 we have completely revamped the way 3-D refinements are run. Almost all of the options are now selected automatically. There are just a few basics you need to provide e2refine_easy:
    • the input particle stack and starting model (or, if you want to continue a refinement you've already started, the name of the refine_xx directory instead).
    • target resolution. Generally good to do a few iterations at 15-20 Å before targeting subnanometer resolution, particularly if your starting model is bad.
    • Symmetry - as above
    • Number of iterations - Usually 3-5 is a good range to consider
    • Mass in kDa. Note, however, that the enclosed mass for a structure is highly resolution dependent, so the value you specify here may be as much as 2x different than the actual mass. This should not normally be a critical value in getting good refinements, it is just to help provide self-consistent threshold values.
    • The only optional option you will normally need to specify is the parallel box. (for refinement, you may really want to consider running on a cluster with MPI, unless your project is small or has high symmetry)

    • EMAN2.0x had MANY more parameters than this, and e2refine_easy will still accept most of these as options from the command-line (but not the GUI). Normally these options will be selected for you, and you don't need to worry about them.
  2. The refinement will produce a refine_xx folder, with many files (documented elsewhere in the Wiki). The main files you will want to look at are:

    • report/index.html (the info button in the browser can view this, or you can open it in firefox)

    • threed_yy.hdf The highest numbered one will be the final result of this refinement

    • fsc_masked_yy.txt A true 'gold standard' FSC resolution plot for iteration yy. Note that this is NOT comparable to the 'convergence plot' in EMAN1. This is an actual FSC curve which can be used to measure resolution. There is also an unmasked version of each. You can double-click on these in the browser to plot them.

    • classes_yy.hdf If results aren't what you hoped, you may want to look at the class-averages to see if you can detect any problems.
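
If you want a single number from an FSC curve rather than a plot, you can read off where the curve first drops below a threshold. The sketch below makes two assumptions that are not stated above: that the file holds two whitespace-separated columns (spatial frequency in 1/Å, then FSC), and that 0.143 is the threshold you want, which is the value commonly used with gold-standard FSC curves. The function name is hypothetical. Always look at the full curve as well, since a single crossing can be misleading.

```python
def resolution_at_threshold(fsc_text, threshold=0.143):
    """Return the resolution (in Å) where the FSC first falls below threshold.

    fsc_text: file contents with 'spatial_frequency fsc' on each line (assumed).
    Returns None if the curve never crosses, or starts below the threshold.
    """
    prev = None
    for line in fsc_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        s, fsc = float(parts[0]), float(parts[1])
        if fsc < threshold:
            if prev is None:
                return None
            s0, f0 = prev
            # Linear interpolation between the last point above and this one.
            s_cross = s0 + (f0 - threshold) * (s - s0) / (f0 - fsc)
            return 1.0 / s_cross if s_cross > 0 else None
        prev = (s, fsc)
    return None

# Made-up curve for illustration; in practice read fsc_masked_yy.txt.
example = "0.00 1.0\n0.05 0.9\n0.10 0.5\n0.15 0.1\n"
print(resolution_at_threshold(example))
```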

This is the end of this quickstart guide. There is a lot of documentation around for the details not covered here, but this should get you started. Good luck !