Diff for "EMAN2/Eman1Transition/QuickStart"

Differences between revisions 14 and 15

Porting an EMAN1 refinement project to EMAN2

Note : This page is for those who hate to read. The tutorials provide a much better way of learning EMAN2. This will get you started if you just want to rapidly switch a project from EMAN1, though. Just remember if you find yourself asking 'but what does THAT mean' when you read this, you're reading the wrong page :^)

A Quickstart Guide

This quickstart uses the workflow, since the goal here is presumably to get the best results in EMAN2, and that is most easily accomplished in the workflow. It will take you all the way through a canonical reconstruction starting with data you've already processed in EMAN1 (or any other image processing package for that matter). It is written primarily for Linux/Mac users, but hopefully Windows users will be able to follow along.

For the purposes of this tutorial, we will call the directory where your EMAN1 data resides 'eman1'. You should also make another directory where your EMAN2 project will reside. We will call this 'eman2', but you can use any name you like.

Initial Setup

It is very important that you cd eman2 before starting the EMAN2 GUI. Whatever directory you are in when you run e2workflow.py will become your project directory.
Run e2workflow.py. Two GUI windows should appear. One will show the status of running background jobs. The other will contain a collapsible tree of workflow items.
In the workflow window, click on the Single Particle Reconstruction item.
- Note that you are not expanding the list here, but clicking on the actual words.
- This should pop up a window containing project parameters.
- Fill in the first 4 parameters. Don't worry about the last 2.
- Close the window.
Now, expand the Single Particle Reconstruction item so you can see the detailed workflow under it.
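Because the working directory decides where the project lives, it can help to launch the GUI from a small script instead of relying on remembering to cd first. The following is only a minimal sketch: it assumes e2workflow.py is on your PATH, and the helper names and the 'eman2' directory name are illustrative, not part of EMAN2.

```python
import subprocess
from pathlib import Path


def prepare_project_dir(parent, name="eman2"):
    """Create the project directory if needed and return its absolute path."""
    project = Path(parent).expanduser().resolve() / name
    project.mkdir(parents=True, exist_ok=True)
    return project


def launch_workflow(project):
    """Start e2workflow.py with the project directory as its working directory.

    Assumes e2workflow.py is on PATH. Passing cwd= plays the role of
    'cd eman2' before starting the GUI.
    """
    return subprocess.Popen(["e2workflow.py"], cwd=str(project))


# Example (not run here): launch_workflow(prepare_project_dir(".", "eman2"))
```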

Importing Particle Data

There are now 3 possibilities for getting your data into EMAN2. You MUST reprocess the CTF in EMAN2, rather than use the already processed data from EMAN1 (unless you don't plan on doing any CTF correction), but on the bright side, this process is MUCH easier in EMAN2. So, the data we will import is the data that has not been phase-flipped. It must also be completely unfiltered: just the original, raw data. Pick only one of the following 3 options:
- preferred option -> If you have the original micrographs/CCD frames, and EMAN1 style .box files :
  1. Select Filter Raw Data
  2. Press the Browse to Add button
  3. Browse to, and select your micrograph/CCD files, and click ok.
  4. Select which options you need to use, then press OK
    - It is important to get Inversion right. The final particles MUST appear bold white on a darker background, so whether you check this box depends on whether you are using stain or cryo, and how the data was digitized. You can double-click on one of the images in the list to display it and see if you need inversion.
    - Filter X-ray Pixels is important for CCD data, but not if you are using a phase-plate.
    - edgenorm, generate thumbnail, and associate with project should all be checked
    - in-place processing should normally be unchecked.
  5. Under Particles, select Coordinate Import and Browse to Add again. Browse to and select your .box files, then press OK in both windows. Assuming your particle and box filenames match, everything should work well.
  6. You could actually go to the interactive particle picker and check/update the boxes at this point if you like, but we will just assume they are correct, and use Generate Output under Particles. Again, there are some options:
    - Box size is the most important one. In EMAN2, if you want to obtain optimal results, it is VERY important to use a box size that is 1.5-2x the maximum dimension of your particle. If you are dealing with a huge virus particle, you may need to skimp a bit on this to avoid a box size of, say, 1500, but some padding is essential for CTF correction to work well, and to avoid artifacts. See this list of good sizes to use. If you decide to change the box size later, it generally requires starting a new project from scratch, so consider this point very carefully.
    - You should select Write box image files, normalize.edgemean and bdb.
    - Only select invert here if you made a mistake at the earlier step, and your micrographs are still dark particles on a light background.
- If you don't have those, but do have individual particle stack files without CTF correction for each frame :
  1. Double check to make sure your particle stacks have light particles on a darker background. If they have inverted contrast, you will need to invert them before proceeding. You can do this with e2proc2d.py <file> <file> --mult=-1 --inplace or any other method you like. Having this correct is CRITICAL to getting good results in EMAN2. Some of the algorithms require this.
  2. Select Particle Import under Particles, Browse to Add, and select all of your particle files. Press OK on both windows.
- If you only have a start.hed file containing phase-flipped particles from EMAN1 :
  - Sorry, this section is not complete yet.
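Two of the points above lend themselves to a quick calculation or script: the 1.5-2x box size rule, and inverting the contrast of several particle stacks. The sketch below is illustrative only. The function names are made up for this example, the box sizes are simply rounded up to whole pixels (you should still pick a value from the list of good sizes), and the command is the same e2proc2d.py invocation given in the text.

```python
import math


def box_size_range(particle_max_dim_px):
    """Return (low, high) box sizes in pixels: 1.5x to 2x the longest particle dimension."""
    if particle_max_dim_px <= 0:
        raise ValueError("particle dimension must be positive")
    low = math.ceil(1.5 * particle_max_dim_px)
    high = math.ceil(2.0 * particle_max_dim_px)
    return low, high


def invert_commands(stack_files):
    """Build one in-place contrast inversion command per particle stack.

    Each command mirrors: e2proc2d.py <file> <file> --mult=-1 --inplace
    """
    return [["e2proc2d.py", f, f, "--mult=-1", "--inplace"] for f in stack_files]
```

For a particle about 100 pixels across, box_size_range(100) gives (150, 200). Each list returned by invert_commands can be passed to subprocess.run once you have confirmed that the stacks really do have inverted contrast.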

CTF Fitting/Correction (even for stain and phase plate)

Now that your particles have been imported into the EMAN2 project, you can begin with CTF correction. It is important to note at this point that CTF correction in EMAN2 entails a lot more than just CTF correction. It also makes an SSNR evaluation for each micrograph, which is used to perform weighting and other tasks. So, even if you do not feel you need to do CTF correction (neg stain or phase plates for example), we STRONGLY encourage you to go through this process anyway. You are always free to disable the CTF amplitude correction at a later stage, but you will still be able to take advantage of the other computed image properties.
Under CTF, select Automated Fitting. Select all of the image files (again, you can click in the upper left hand corner of the image list to select all quickly). Make sure the microscope parameters are correct. For Amplitude Contrast, ~10-20% should be good for cryo, perhaps 50-70% for stain, and 100% for phase plate imaging. Set oversampling to 2 here for better defocus estimation (but make sure it's 1 when you generate output later). If you have a multi-core computer, enter the number of processors to use. Check Auto High Pass for normal cryo data, but not for negative stain or phase plate data. Note that autofitting typically requires a minimum of 20-30 particles per frame. If you have fewer particles than this, you may have to do a lot of manual defocus tuning later. When ready, press OK.
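If you already know how many particles each frame contains, you can list in advance the frames that are likely to need manual defocus tuning. This is a minimal sketch: the per-frame counts are assumed to come from your own bookkeeping, the function name is hypothetical, and the default threshold of 20 is just the lower end of the 20-30 range mentioned above.

```python
def frames_needing_manual_tuning(particle_counts, min_particles=20):
    """Return names of frames with fewer particles than the autofit minimum.

    particle_counts maps frame name -> number of particles.
    Frames are returned with the fewest particles first.
    """
    low = [(n, name) for name, n in particle_counts.items() if n < min_particles]
    return [name for n, name in sorted(low)]
```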
CTF fitting should not take long to complete. When it's done, select Interactive Tuning. At this point you do not need to check all of the images, but just a few you plan to use for structure factor determination. Click on the Particles on Disk column to sort in decreasing order. Then select the first 5-10 images in the list and click OK.
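The selection rule described here (sort by particle count, take the first 5-10) is simple enough to write down. The sketch below only illustrates that rule on a plain dictionary of counts; it does not read the project database, and the function name is made up for this example.

```python
def top_frames_by_particles(particle_counts, n=10):
    """Return the n frame names with the most particles, largest first.

    Ties are broken alphabetically so the result is reproducible.
    """
    ranked = sorted(particle_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]
```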
Three windows will appear: a control panel, a 2-D power spectrum view, and a 2-D plot. In this case all we need to do is make sure the defocus is roughly correct. This is the most common error in automatic CTF fitting. Either it will be completely correct, or the defocus will be way off. If you find an image with the wrong defocus, adjust it manually to roughly the correct position, and hit the refit button. If it is incorrect again after refit, then manually adjust to the correct value and Save Params. When you're done for all images you selected, close all 3 windows.
NOTE: If you are working with phase-plate data, hopefully your images will all have defocus of 0. The autofitting program is incapable of determining defocuses where fewer than 2 zeros appear in the power spectrum. So, for phase plate images, you will need to enter a defocus manually for all images.
Next, we need to generate a structure factor (you cannot presently use a structure factor from EMAN1). Again, sort the list of images by number of particles, and select the same images you manually checked in the previous step. Oversampling should be 2. Then press ok. The process should take only a few seconds. You can see the output on the console where you launched the workflow.
The generated structure factor goes into the project database, and a copy is written to a text file 'strucfac.txt'. You can plot this file vs any other structure factor to see how it compares if you wish, but it is not used for anything. The internal copy in the database is used for fitting and correction.
Repeat automatic CTF fitting for all images as you did it earlier. Now that you have a structure factor, the fitting should be more accurate.
Next is an optional automatic step to fine-tune automatic defocus determination and smooth the SSNR measurement. It is primarily aimed at high resolution reconstructions of cryo data (no stain or phase plate). This step does not currently exist in the Workflow interface. To use it, exit the workflow, and from the same directory where you run e2workflow, run the following command: e2ctf.py --allparticles --refinebysnr. When done (it is very fast) it will tell you the mean defocus shift it found.
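If you prefer to script this step, the command can be run from Python with the project directory as the working directory. This is only a sketch: the wrapper function and its dry_run switch are illustrative, and the command itself is exactly the one given above.

```python
import subprocess


def refine_ctf_by_snr(project_dir, dry_run=True):
    """Build (and optionally run) the SNR-based defocus refinement command.

    project_dir must be the directory from which e2workflow.py was run.
    With dry_run=True the command is only returned, not executed.
    """
    cmd = ["e2ctf.py", "--allparticles", "--refinebysnr"]
    if not dry_run:
        # check=True raises CalledProcessError if e2ctf.py exits with an error
        subprocess.run(cmd, cwd=project_dir, check=True)
    return cmd
```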
Also run Interactive Tuning again. You are welcome to check all of the images, or just a few of them. If you check a few, and find that the fitting is poor, you should probably check the rest as well. If this happens, please email sludtke@bcm.edu, as such failures are valuable in improving the software. You may optionally go through the data and adjust the Quality slider to indicate the relative quality of each image. This is an entirely user-controlled parameter, and is not actually used directly by any of the programs. Note that you do NOT need to press the Save Params button after adjusting the quality slider, as unlike the other parameters, it is autosaved.
Once you are happy with all of the CTF parameters, select Generate Output (under CTF). Make SURE that oversampling is set to 1 in this window. I suggest selecting all 3 checkboxes below. Click OK. This process can take a bit of time, and is limited by the speed of your hard drive.

Particle Sets

When performing a single particle reconstruction, it is often useful to work with only part of your data for specific steps. For example, when generating an initial model, the process goes much faster if you only use the best fraction of your data, and generally ~1000 particles or so is sufficient for this process (fewer if the contrast is high). When performing a 3-D reconstruction, you may wish to assess the quality of the model if only the best fraction of the data is used, vs the entire data set. The Sets interface allows you to create named subsets of your data for the later stages of processing. You must make at least 1 set to proceed through the Workflow. This is roughly equivalent to making .lst files in EMAN1.
Select 'Build Particle Sets'. This will bring up a dialog where you are asked to select a type of particle : "Original Data", "Phase Flipped", "Wiener Filtered" or "Phase flipped-hp". This is NOT asking you to select which data will be included in the final set, but rather is asking which data you want to look at on the screen when you decide which particles to include. Normally Wiener Filtered is the best choice.
Next, you will see a table of information about all of your images. Again, you can sort this table by any of the columns. This window allows you to complete 2 tasks: defining a new set, and marking particles as 'bad' (and thus excluded from any sets you generate). This manual marking of bad particles is an optional step. If you double-click on any of the images in the list, it will bring up a tiled image display showing the particles from that micrograph. Clicking on any one of these particles will cause a small mark to appear in the corner, marking it as a 'bad' particle. Clicking again will remove the mark. If you wish to do this, you can go through the images one by one and click on any which are clearly not good particle images. There are also other processes further in the refinement procedure for automatically finding bad particles.
Once you are happy with the job you've done marking bad particles (or if you decide to skip this step for now), you need to decide which images to include in your first set. If you wish to generate a new starting model from scratch (rather than use one from EMAN1), you will want to make one set containing only the few best images from your data, totaling perhaps 1000 particles. If you are going to proceed directly to 3-D refinement, then you may want to include all of your data, or a large subset of the best data. There are a variety of ways to select items in the list including: clicking in the corner to select all, dragging with the mouse to select a range, and ctrl/opt-clicking to toggle individual micrographs on and off. Once you have selected the images you wish to include in the set, give your set a name and click 'OK'. Building sets can take a little while.
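To get a feel for how many micrographs a ~1000 particle starting set requires, you can accumulate particle counts from the best micrograph downward. This is a hedged sketch, not something the Workflow does for you: the (name, quality, n_particles) tuples are assumed to come from your own notes (for example the value you gave each image with the Quality slider), and the function name is hypothetical.

```python
def pick_starting_set(micrographs, target=1000):
    """Pick the best micrographs until at least `target` particles are included.

    micrographs is a list of (name, quality, n_particles) tuples, where a
    higher quality value means a better image.
    Returns (chosen_names, total_particles).
    """
    chosen, total = [], 0
    for name, quality, n in sorted(micrographs, key=lambda m: -m[1]):
        if total >= target:
            break
        chosen.append(name)
        total += n
    return chosen, total
```

The names returned are the ones you would then select in the table before naming the set and clicking 'OK'.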
NOTE: sets take up very little disk space. They do not involve making a copy of the actual particle data, but only the header information. For the actual image data, the sets refer back to the original particle data files in the particles directory. So, you can make as many sets as you like.

Reference Free Class-averages

If you have a good 3-D starting model already, and wish to skip directly to 3-D refinement, you can skip this section and move down to that one.
It is often useful to do some 2-D reference-free analysis of your particles, to: check for heterogeneity, identify bad subsets of particles and to make initial models. This is the equivalent of the refine2d.py command in EMAN1.
Select Generate Classes in the workflow. First you will need to select which type of data you want to make 2-D classes for. Using Wiener Filtered particles will produce much stronger looking class-averages, but they will also tend to have much less internal detail. Phase-flipped particles will produce sharper looking class-averages, but will clearly have more noise. You may need to experiment to decide which you prefer the most for any given task. When you click OK it will show a multi-tabbed dialog asking you for parameters to use in refinement.
In the first tab, General, you are asked to pick which set to use in class-averaging (select only 1). You also define the basic parameters for the refinement. The number of class-averages depends a bit on your specific purpose. If you are working to generate a 3-D starting model, a relatively small number (20-50) is best. If you are trying to evaluate heterogeneity, you may wish to use a larger number (50-200). If you want to generate a really large number of classes (>~150), you may opt to run the e2refine2d.py command from the command-line so you can use the --fastseed option, which is not available in the Workflow at present. You should always have at least ~10 particles per class-average on average. The default options are probably fine for the other parameters on this tab. If you have a multi-core computer, enter thread:<n>, where <n> is the number of processors to use. If you wish to run on a cluster or use distributed processing, you will need to read this page.
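The rules of thumb on this tab can be checked with a few lines of arithmetic before you start a run. This is a minimal sketch with a hypothetical function name: it applies the ~10 particles per class guideline, flags the >~150 classes case where the text suggests the command-line --fastseed option, and formats the thread:<n> string.

```python
def check_class_count(n_particles, n_classes, ncpu=1):
    """Summarise whether a requested number of class-averages looks reasonable."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    per_class = n_particles / n_classes
    return {
        "particles_per_class": per_class,
        "enough_particles": per_class >= 10,   # ~10 particles per class on average
        "consider_fastseed": n_classes > 150,  # command-line only, per the text
        "parallel": "thread:%d" % ncpu,        # value for the parallelism field
    }
```

For example, 1000 particles and 50 classes on a 4-core machine gives 20 particles per class and the string thread:4.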
In the second tab, simmx, you define the parameters related to particle alignment, prior to MSA-based classification. Like EMAN1, shrink allows you to scale the size of the particles down by a factor of N prior to analysis. 2 is typical. While you can opt to use very accurate alignment options on this page, for typical 2-D reference free analysis, this isn't really necessary, and will make the process much slower. You may just opt to use the default parameters. Otherwise, read the documentation at the top of this tab for information on setting the other parameters.
In the third tab, Class Averaging, you select the parameters used in the process of building class-averages after classification. The default options are probably fine, again, though if you are using phase flipped particles, you may wish to consider the ctf.auto or ctfw.auto averagers.
Once the parameters are set, hit 'OK', and the job will start.
When the job has finished, you will find the results in a directory called r2d_xx. Each time you run 2d refinement, it will increment xx. There are many files in this directory, but the main file to look at is classes_xx where xx is the highest number. Use the Browse workflow item or the e2display.py command to look at this file. You can double-click to open the default view, or right-click to get more options.
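Since both the r2d_xx directory and the classes_xx file are numbered, finding the most recent one amounts to picking the highest number. The helper below is a sketch that works on a plain list of names you supply. It makes no assumption about how EMAN2 stores these files on disk, and the function name is made up for this example.

```python
import re


def latest_numbered(names, prefix):
    """Return the name of the form '<prefix>_<number>...' with the highest number.

    Names that do not match the pattern are ignored; returns None if none match.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"_(\d+)")
    best = None
    for name in names:
        m = pattern.match(name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), name)
    return best[1] if best else None
```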
The images in 'classes_xx' are in arbitrary order. Alternatively, you could look at 'allrefs_xx', which contains the particles sorted in order of mutual similarity and aligned to each other in 2-D. If you wish to see which particles went into a specific class-average, you can simply double-click on that class-average, and an additional window will open showing the contributing particles. It will also show particles which were classified in that class, but were excluded from the final average. These will be marked with a blue square in the corner. Note that these particles are not excluded from the set or from any subsequent process; they were simply excluded from that specific average.
You may also wish to look at 'basis_xx', which contains the MSA basis vectors, and can tell you something about the symmetry and shape of your particles, if you understand how MSA works.
If you observe some class-averages which clearly contain entirely bad particles, you can use this as a mechanism for permanently excluding those particles from future sets you create. To do this, you use the 'Evaluate Particle Sets' interface in the workflow. You can also use this interface to select particles from specific class-averages for further rounds of 2-D or 3-D refinement. Its detailed use is beyond the scope of this quickstart guide, but the corresponding command-line program is e2evalparticles.py.

Making an Initial 3-D Model

EMAN2 uses an approach for initial model generation that was always suggested in EMAN1, but had to be performed manually. This process is fully automated in EMAN2. It does not work on 100% of projects, but I would estimate it works on ~80%. If you have a project which you cannot get working this way, or do not trust your results, I suggest using Random Conical Tilt or Single Particle Tomography to get an initial model. Both methods are available in EMAN2, including corresponding GUI tools.
First you need to select a good subset of your class-averages to use in initial model generation. To do this, open the classes_xx file in the browser, then middle-click on the image display to bring up a control-panel (alt-click without a 3-button mouse). Select Del mode in the control panel, then click on images you do not wish to use in initial model generation to delete them. You will want to use ~10-20 averages for your initial model. Try to pick averages representing diverse views of your particle, but do not include anything that is questionable or bad. When you have the desired averages, use the Save button to put them into a file: good-classes.hdf.
Select Make Model from the Workflow. Press 'Browse to Add' and select the good-classes.hdf file you just created. Highlight this file once it's added.
If you don't know the symmetry for sure, I suggest starting with the highest symmetry you suspect is possible. Do not rely on it to identify the symmetry for you.
This process basically starts with a completely random pattern of blobs, and does a full single particle refinement, using the class-averages as particles. It does this for N different starting models, and iterates each M times. Some structures converge faster than others, M=8 is a reasonable value for most structures, but you may find yourself needing to try twice that many for difficult cases. N doesn't really matter too much, as if you don't get a good model on the first pass, you can always try N more. I suggest starting with Iterations=8 and Tries=5.
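The structure of this procedure, N independent tries of M iterations each, followed by ranking the results, is a generic multi-start refinement. The toy sketch below shows only that control flow. The three callables are hypothetical stand-ins supplied by the caller; this is not EMAN2's implementation, and nothing here touches image data.

```python
import random


def multi_start(make_start, refine_once, score, tries=5, iterations=8, seed=0):
    """Run `tries` independent refinements of `iterations` steps each.

    make_start(rng) -> a random starting model
    refine_once(model) -> the model after one refinement iteration
    score(model) -> a number, higher meaning better agreement
    Returns a list of (score, model) pairs, best first.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(tries):
        model = make_start(rng)           # random starting point for this try
        for _ in range(iterations):
            model = refine_once(model)    # one refinement iteration
        results.append((score(model), model))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

With the suggested Iterations=8 and Tries=5, this amounts to 40 refinement iterations in total. As with the real tool, the ranking is only as trustworthy as the score, so it is still worth inspecting every result.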
When this is complete, you will find several files in the initial_models directory. There are 4 types of files:
- model_xx_yy : This is the refined model you would use as a starting model
- model_xx_yy_aptcl : Contains the class-averages alternating with the corresponding projections of the model after the final iteration. Poor agreement between pairs is an indication of a bad initial model.
- model_xx_yy_proj : Projections from the final round of refinement covering the asymmetric triangle.
- model_xx_yy_init : The initial model for this refinement. Just in case you want to see it.
- xx is the run number. If you run the initial model generator a second time, this number will be incremented.
- yy is the model number. In theory these are sorted in order of quality at the end of the run, so 01 will be the best. However, this isn't very reliable. You should check all of the models, and all of the _aptcl files to find the best one.
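Because you are advised to check every model together with its _aptcl file, it can be convenient to group the file names by run and model number first. The sketch below parses the naming scheme listed above. It assumes the names are given without any extension or database prefix, which may not match what you see on disk, and the function names are illustrative.

```python
import re

# model_xx_yy with an optional _aptcl, _proj or _init suffix
_MODEL_RE = re.compile(r"^model_(\d+)_(\d+)(?:_(aptcl|proj|init))?$")


def parse_model_name(name):
    """Split a name like 'model_01_03_aptcl' into run, model number and kind."""
    m = _MODEL_RE.match(name)
    if not m:
        return None
    return {"run": int(m.group(1)), "model": int(m.group(2)), "kind": m.group(3) or "model"}


def group_by_model(names):
    """Map (run, model) -> {kind: name} so each model sits next to its companion files."""
    groups = {}
    for name in names:
        info = parse_model_name(name)
        if info:
            groups.setdefault((info["run"], info["model"]), {})[info["kind"]] = name
    return groups
```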
If you didn't get a good model, run it again, until you're happy. Note that these models are not going to be perfect. They just need to be vaguely the correct shape, so when you run a 'real' refinement it will converge to the right thing. If you have no success with this approach, random conical tilt and single particle tomography are both viable alternatives, though both will require additional data collection.

Running a 3-D Refinement

Under Construction

EMAN2/Eman1Transition/QuickStart (last edited 2019-04-29 03:09:16 by SteveLudtke)

-  ⇤ ← Revision 14 as of 2011-07-03 19:10:02 → 
  Size: 20389
  Editor: SteveLudtke
  Comment:
+   ← Revision 15 as of 2011-07-04 03:30:05 → ⇥
  Size: 22214
  Editor: SteveLudtke
  Comment: