Differences between revisions 1 and 2

Particle orientation refinement using GMM representation

Most programs are available in EMAN2 builds after 2023-03, but some are still under continuous development. Newer versions are typically better.
It is recommended to add the "examples/" folder of the EMAN2 binary installation to $PATH, as some new programs have not been moved to "bin/" yet.
This tutorial has only been tested on Linux with an Nvidia GPU and CUDA.
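
As a minimal sketch of the $PATH recommendation above: the variable EMAN2DIR and its default location are assumptions for illustration, not something the EMAN2 installer is known to define, so point it at wherever your EMAN2 binary actually lives.

```shell
# Assumption: EMAN2DIR is the root of your EMAN2 installation.
# The default below is only a placeholder; adjust it to your system.
EMAN2DIR="${EMAN2DIR:-$HOME/eman2}"

# Prepend examples/ to PATH unless it is already there.
case ":$PATH:" in
  *":$EMAN2DIR/examples:"*) ;;
  *) export PATH="$EMAN2DIR/examples:$PATH" ;;
esac
```

Placing these lines in your shell startup file makes the setting persistent across sessions.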

Here we use particles of SARS-CoV-2 from EMPIAR-10492 as an example. We start from particles with assigned orientations, i.e. the Polished folder (13.5GB) from EMPIAR, together with job096_run_data.star.

Import existing refinement

We need a .lst file with the location of all particles and their initial orientation assignment. Since we start from a Relion star file here, run

e2convertrelion.py job096_run_data.star --voltage 300 --cs 2.7 --apix 1.098 --amp 10 --skipheader 26 --onestack particles/particles_all.hdf --make3d --sym c3

Note that the particles need to be phase flipped before the refinement, so this may take a while. Also make sure to provide the correct CTF-related information to the program (voltage, cs, amp, apix), since the program does not read these values from the star file automatically. Check --help for more details. With the --make3d option, the program creates a r3d_00 folder after importing the particles and reconstructs the 3D maps. You should see the structure of the SARS-CoV-2 spike at ~3.9Å at this point.
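
Because the import can take a while, a quick check that it produced a map may be useful before moving on. The helper below is a hypothetical sketch, not part of EMAN2. It assumes the reconstruction is written to r3d_00/threed_00.hdf, the file name used by the global refinement command in this tutorial.

```shell
# check_import [DIR]: verify that the import step left a reconstruction behind.
# Assumption: the map is DIR/threed_00.hdf (DIR defaults to r3d_00).
check_import() {
    dir="${1:-r3d_00}"
    if [ ! -d "$dir" ]; then
        echo "missing folder: $dir" >&2
        return 1
    fi
    # -s is true only for files that exist and are not empty.
    if [ ! -s "$dir/threed_00.hdf" ]; then
        echo "missing or empty map: $dir/threed_00.hdf" >&2
        return 1
    fi
    echo "import looks complete: $dir/threed_00.hdf"
}
```

Run `check_import` from the folder where e2convertrelion.py was executed. It only tests that the file exists, so the map itself should still be inspected visually.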

To start from other formats:

From classical EMAN2 refinement (e2refine_easy), run e2evalrefine.py refine_XX --extractorientptcl particles.lst
From the new EMAN2 refinement (e2spa_refine), simply use the ptcls_XX.lst file from the last iteration.
From CryoSPARC or other software, convert the results to a Relion star file using pyem, then follow the Relion conversion above.

Global orientation refinement

e2gmm_refine_new.py r3d_00/threed_00.hdf --startres 3.9 --npt 20000

Here --startres should be set to the final resolution from the previous refinement, and --npt is the number of Gaussians in the model. For refinement at near-atomic resolution, it is convenient to simply set --npt to the number of non-H atoms in the molecule. The number can also be estimated with e2gmm_guess_n.py, given only a map and a target resolution. The GMM can also be seeded from an existing PDB model using --initpts XXXX.pdb.
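
If a PDB model of the molecule is available, the non-H atom count suggested above can be obtained with a few lines of awk. This is a sketch under stated assumptions and not an EMAN2 tool: the function name count_heavy_atoms is hypothetical, the element symbol is expected in columns 77-78 as in standard PDB files, and deuterium is treated like hydrogen.

```shell
# count_heavy_atoms FILE: count non-hydrogen ATOM/HETATM records in a PDB file.
# Assumption: columns 77-78 hold the element symbol. Records with an empty
# element field are counted as non-H atoms.
count_heavy_atoms() {
    awk '
        /^(ATOM  |HETATM)/ {
            el = toupper(substr($0, 77, 2))   # element symbol, right-justified
            gsub(/ /, "", el)                 # strip padding
            if (el != "H" && el != "D") n++
        }
        END { print n + 0 }                   # prints 0 for files without atoms
    ' "$1"
}
```

The result can then be passed on, for example as `--npt "$(count_heavy_atoms model.pdb)"`, where model.pdb is a placeholder file name. HETATM records such as waters and ligands are included in the count, so remove them from the file first if they should not contribute.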

Focused refinement

Starting from a finished global refinement, run

e2gmm_refine_new.py gmm_XX/threed_XX.hdf --startres X --npt N --mask mask.hdf --masksigma

Here mask.hdf is a mask focusing on the target region. It is recommended to create this using Filtertool.
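
The gmm_XX/threed_XX.hdf placeholder in the command above has to be filled in by hand. The hypothetical helper below sketches one way to look up the most recent map. It assumes that refinement folders are numbered gmm_00, gmm_01, ... and that the maps inside are numbered threed_00.hdf, threed_01.hdf, ..., so that lexical order matches run order.

```shell
# latest_gmm_map [ROOT]: print the newest ROOT/gmm_XX/threed_XX.hdf.
# Assumption: two-digit numbering for both folders and maps, as in the
# gmm_XX/threed_XX.hdf pattern used in this tutorial.
latest_gmm_map() {
    root="${1:-.}"
    folder=$(ls -d "$root"/gmm_[0-9][0-9] 2>/dev/null | sort | tail -n 1)
    [ -n "$folder" ] || { echo "no gmm_XX folder in $root" >&2; return 1; }
    # The pattern skips half maps such as threed_XX_even.hdf.
    map=$(ls "$folder"/threed_[0-9][0-9].hdf 2>/dev/null | sort | tail -n 1)
    [ -n "$map" ] || { echo "no threed_XX.hdf in $folder" >&2; return 1; }
    echo "$map"
}
```

The printed path can be used as the first argument of the refinement commands that start from a finished global refinement. Check that the folder it picks really is the run you intend to continue from.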

Refine from a GMM heterogeneity analysis

e2gmm_heter_refine.py gmm_XX/threed_XX.hdf --maxres X --mask mask.hdf

Here we also start from the global refinement. --maxres defines the resolution for the heterogeneity analysis, and it is typically safer to use a lower resolution (7Å by default), since the flexible parts are often not well resolved in the first place. The target region is specified with mask.hdf.

Patch-by-patch refinement

Starting from a finished global refinement, run

e2gmm_refine_patch.py gmm_XX/threed_XX.hdf --startres X --npatch N

- Revision 1 as of 2023-03-23 17:59:14
  Size: 3793
  Editor: MuyuanChen
  Comment: start...
+ Revision 2 as of 2023-03-24 22:40:48
  Size: 3432
  Editor: MuyuanChen
  Comment:
 Line 7:
+Here we use particles of SARS-CoV-2 from [[https://www.ebi.ac.uk/empiar/EMPIAR-10492/ | EMPIAR-10492]] as an example. We start from particles with assigned orientations, i.e. the '''Polished''' folder (13.5GB) from EMPIAR, together with '''job096_run_data.star'''.
-Line 9:
+Line 11:
-Here we will need a .lst file with the location of all particles and their initial orientation assignment.
+Here we will need a .lst file with the location of all particles and their initial orientation assignment. Since here we start from a Relion star file, run 
{{{
e2convertrelion.py job096_run_data.star --voltage 300 --cs 2.7 --apix 1.098 --amp 10 --skipheader 26 --onestack particles/particles_all.hdf --make3d --sym c3
}}}

Note that we need to phase flip the particles before the refinement, so this may take a while. Also make sure to provide the correct CTF related information to the program, including voltage, cs, amp, apix, since the program does not read those from the star file automatically. Check `--help` for more details. After importing the particles, with the `--make3d` option, the program will create a `r3d_00` folder and reconstruct the 3D maps. You should see the structure of Covid spike at ~3.9Å at this point.

To start from other formats:
-Line 12:
+Line 21:
- * From Relion star file, run {{{e2convertrelion.py particles.star --output particles.lst}}}. Note that we need to phase flip the particles before the refinement, so this may take a while. Also make sure to provide the correct CTF related information to the program, including voltage, cs, amp, apix. Check --help for more information.
-Line 14:
+Line 22:
-If imported from another software than EMAN2, it is better to do one round of reconstruction to make sure the results are correct. (probably should make this a part of the import program...) 
{{{
mkdir r3d_00
e2proclst.py particles.lst --create r3d_00/ptcls_00.lst 
e2spa_make3d.py --input r3d_00/ptcls_00.lst --output r3d_00/threed_00_even.hdf --parallel thread:32 --clsid even
e2spa_make3d.py --input r3d_00/ptcls_00.lst --output r3d_00/threed_00_odd.hdf --parallel thread:32 --clsid odd
e2refine_postprocess.py --even r3d_00/threed_00_even.hdf --restarget 5 --tophat localwiener --thread 32
}}}

You should see the similar structure and FSC curves in the r3d_00 folder. Since EMAN2 may use different sharpening and masking approaches, the curve and structure may be slightly different. To use same sharpening, create a structure factor file with {{{e2proc3d.py map.mrc map.mrc --calcsf sf.txt}}}, and add `--setsf sf.txt` to the `e2refine_postprocess` command. To use the same mask, add `--mask mask.hdf` to the `e2refine_postprocess` command.
-Line 29:
+Line 26:
-e2gmm_refine_new.py r3d_XX/threed_XX.hdf --startres X --npt N
+e2gmm_refine_new.py r3d_00/threed_00.hdf --startres 3.9 --npt 20000
-Line 43:
+Line 40:
+== Refine from a GMM heterogeneity analysis ==

{{{
e2gmm_heter_refine.py gmm_XX/threed_XX.hdf --maxres X --mask mask.hdf
}}}
Here we also start from the global refinement. `--maxres` defines the resolution for the heterogeneity analysis, and it is typically safer to use a lower resolution (7Å by default), since the flexible parts are often not well resolved in the first place. The target region is specified with `mask.hdf`.
-Line 49:
+Line 54:
-== Refine from a GMM heterogeneity analysis ==

{{{
e2gmm_heter_refine.py gmm_XX/threed_XX.hdf --maxres X --mask mask.hdf
}}}
Here we also start from the global refinement. `--maxres` defines the resolution for the heterogeneity analysis, and it is typically safer to use a lower resolution (7Å by default), since the flexible parts are often not well resolved in the first place. The target region is specified with `mask.hdf`.