e2refine2d
e2refine2d.py runs in much the same way as EMAN1's refine2d.py, though it has been improved in a number of subtle ways.
This program takes a set of boxed-out particle images and performs iterative reference-free classification to produce a set of representative class-averages. The point of this process is to reduce noise levels so that the overall shape of the particle views present in the data can be better observed. Cryo-EM single particles are generally noisy enough that it is difficult to distinguish subtle, or even not-so-subtle, differences between particle images. By aligning and averaging similar particles together, less noisy versions of representative views are created. The class-averages produced by this program are typically used for:
- Direct observation to look for heterogeneity or discover symmetry
- Building initial models for single particle reconstruction
- Separating particles into subgroups for additional analysis
This last point can be used to produce 'population-dynamics' movies of a particle in very close to the same orientation.
This program is quite fast for as many as a few thousand particles and ~100 classes. For most purposes, if your data set is large (>10,000 particles), consider using only a subset of the data for speed, though this clearly isn't appropriate for the third use above. For large numbers of classes, specify the --fastseed option, or you will wait a very long time.
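As a rough illustration of how an invocation might be put together, the following Python sketch assembles a command line from a few of the options documented on this page. The helper `build_command`, the file name `particles.hdf`, and the numeric values are placeholders chosen for the example, not recommended settings.

```python
import shlex

def build_command(input_file, ncls, iterations, fastseed=False):
    """Assemble an e2refine2d.py argument list (hypothetical helper)."""
    cmd = [
        "e2refine2d.py",
        f"--input={input_file}",   # file containing the particle data
        f"--ncls={ncls}",          # number of classes to generate
        f"--iter={iterations}",    # number of refinement iterations
    ]
    if fastseed:
        # Suggested in the text above for large numbers of classes.
        cmd.append("--fastseed")
    return cmd

# Placeholder values, for illustration only.
cmd = build_command("particles.hdf", ncls=32, iterations=8)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where EMAN2 is installed.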
Options:
| | --path | string | Path for the refinement, default=auto |
| | --iter | int | The total number of refinement iterations to perform |
| | --automask | bool | Performs a 2-D automask on class-averages to help with centering. May be particularly useful for negative stain data. |
| | --input | string | The name of the file containing the particle data |
| | --ncls | int | Number of classes to generate |
| | --maxshift | int | Maximum particle translation in x and y |
| | --naliref | int | Number of alignment references to use when determining particle orientations |
| | --exclude | string | The named file should contain a set of integers, each representing an image from the input file to exclude. |
| | --resume | bool | Checks the files in the current directory and resumes the refinement after the last completed iteration. It's OK to alter other parameters. |
| | --initial | string | File containing starting class-averages. If not specified, starting averages are generated automatically. |
| | --nbasisfp | int | Number of MSA basis vectors to use when classifying particles |
| | --minchange | int | Minimum number of particles that change group before deciding to terminate. Default = -1 (auto) |
| | --fastseed | bool | Seeds the k-means loop quickly, but may produce less consistent results. |
| | --simalign | string | The name of an 'aligner' to use prior to comparing the images (default=rotate_translate_flip) |
| | --simaligncmp | string | The name and parameters of the comparator used by the first-stage aligner (default=frc) |
| | --simralign | string | The name and parameters of the second-stage aligner, which refines the results of the first alignment |
| | --simraligncmp | string | The name and parameters of the comparator used by the second-stage aligner (default=dot) |
| | --simcmp | string | The name of a 'cmp' to be used in comparing the aligned images (default=frc:nweight=1) |
| | --shrink | int | Optionally shrink the input particles by an integer factor prior to computing similarity scores, for speed. |
| | --classkeep | float | The fraction of particles to keep in each class, based on the similarity score generated by the --classcmp argument (default=0.85) |
| | --classkeepsig | bool | Change the keep ('--classkeep') criterion from fraction-based to sigma-based. |
| | --classiter | int | Number of iterations to use when making class-averages (default=5) |
| | --classalign | string | If doing more than one iteration, the name and parameters of the 'aligner' used to align particles to the previous class-average |
| | --classaligncmp | string | The name and parameters of the comparator used by the first-stage aligner. Default is dot. |
| | --classralign | string | The second-stage aligner, which refines the results of the first alignment in class averaging. Default is None. |
| | --classraligncmp | string | The comparator used by the second-stage aligner in class averaging. Default is dot:normalize=1. |
| | --classaverager | string | The averager used to generate the class-averages. Default is 'mean'. |
| | --classcmp | string | The name and parameters of the comparator used to generate similarity scores when class averaging. Default is 'frc'. |
| | --classnormproc | string | Normalization applied during class averaging |
| | --classrefsf | bool | Use the setsfref option in class averaging to produce better filtered averages. |
| | --normproj | bool | Normalizes each vector projected into the MSA subspace. Note that this is different from normalizing the input images, since the subspace is not expected to fully span the image. |
| -P | --parallel | string | Run in parallel, specify type:<option>=<value>:<option>=<value> |
| | --dbls | string | Database list storage, used by the workflow. You can ignore this argument. |
| -v | --verbose | int | Verbosity level [0-9]; a higher number means more verbose output |
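To make the difference between the fraction-based and sigma-based keep criteria more concrete, here is a minimal Python sketch. It is not taken from the EMAN2 source: the function `select_kept` is hypothetical, it assumes that a lower similarity score means a better match, and the sigma rule shown (keep everything no worse than mean + keep × sigma) is one plausible reading of "sigma-based".

```python
import statistics

def select_kept(scores, keep=0.85, keepsig=False):
    """Return sorted indices of particles retained in a class-average.

    Assumption: lower score = better match to the class-average.
    """
    if not scores:
        return []
    if keepsig:
        # Sigma-based (assumed rule): keep scores up to mean + keep * sigma.
        mean = statistics.fmean(scores)
        sigma = statistics.pstdev(scores)
        threshold = mean + keep * sigma
        return [i for i, s in enumerate(scores) if s <= threshold]
    # Fraction-based: keep the best-scoring `keep` fraction of the particles.
    n_keep = max(1, round(keep * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:n_keep])

# Four similar particles and one clear outlier (made-up scores).
scores = [0.1, 0.2, 0.3, 0.4, 5.0]
print(select_kept(scores, keep=0.8))
print(select_kept(scores, keep=1.0, keepsig=True))
```

With the fraction-based rule the number of discarded particles is fixed in advance; with a sigma-based rule it depends on how widely the scores are spread.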
This program uses an iterative MSA-based reference-free classification algorithm. The names in parentheses below are the filenames produced by each step. The files will be found in bdb:r2d_XX (XX is incremented each time e2refine2d.py is run). A brief outline of the process follows:
- Initialize the iterative process by making some initial guesses at class-averages. These are invariant-based, meaning even with MSA, this initial classification is not exceptionally good.
  - Generate rotational/translational invariants for each particle (input_fp)
  - Perform MSA on the invariants to define an orthogonal subspace representing the most important differences among the classes (input_fp_basis)
  - Reproject the particles into the MSA subspace using --nbasis vectors (input_fp_basis_proj)
  - Classify the particles into --ncls classes using K-means (classmx_00)
  - Iterative class-averaging of the particles in each class to produce a set of initial averages (classes_init)
- Align the current class-averages to each other, and sort them by similarity, keeping them centered (allrefs_YY) (Note that YY starts with 01 and is incremented after each iteration)
- Perform MSA on the (aligned) class-averages. Again, this represents largest differences, but now performed on images, not invariants. (basis_YY)
- Select a subset of --naliref averages to use as alignment references for this iteration (aliref_YY)
- Align each particle to each of the reference averages from the last step. Keep the orientation corresponding to the best-matching reference. (simmx_YY)
- Project aligned particles using reference MSA vectors from basis_YY (

aliref_x, x=01..10, each containing 5 images
allref_x, x=01..10, each containing 99 images
basis_x, x=01..10, each containing 6 images
classes_x, x=01..10, each containing 99 images
classes_init, containing 99 images
classmx_x, x=01..10, each containing 6 images
input_fp, containing 2799 (input) images
input_fp_basis, containing 6 images
simmx_x, x=01..10, each containing 6 images
input_fp_basis_proj
input_x_proj, x=01..10
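The K-means classification step in the outline above can be sketched in a few lines of plain Python. This is an illustrative toy, not the code e2refine2d.py runs: the function names are hypothetical, the seeds are passed in explicitly rather than generated, and the stopping rule (stop once no more than `minchange` vectors change class) is an assumed interpretation of the --minchange option that does not model its automatic default.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, seeds, minchange=0, max_iter=100):
    """Assign each vector to its nearest class center, iteratively.

    vectors: particle coordinates in the MSA subspace (one sequence each).
    seeds:   initial class centers, one per class.
    Returns (assignments, centers).
    """
    centers = [list(s) for s in seeds]
    assign = [-1] * len(vectors)
    for _ in range(max_iter):
        changed = 0
        for i, v in enumerate(vectors):
            best = min(range(len(centers)), key=lambda c: sqdist(v, centers[c]))
            if best != assign[i]:
                assign[i] = best
                changed += 1
        # Recompute each center as the mean of its current members.
        for c in range(len(centers)):
            members = [vectors[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Assumed stopping rule: few enough particles changed class.
        if changed <= minchange:
            break
    return assign, centers

# Toy data: two well-separated groups in a 2-D subspace.
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assign, centers = kmeans(vectors, seeds=[(0.0, 0.0), (10.0, 10.0)])
print(assign, centers)
```

In this picture, each entry of `assign` plays the role of a particle's class membership, and the number of seeds corresponds to --ncls.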