Making Simulated Single Particle Data

If your goal is to develop software and have a good test data set where you know the ground truth, just give up now. All simulated data sets I have ever seen fall far short of real data. That is, any reasonable algorithm will perform extremely well even on very noisy simulated data, far better than it does on real data.

However, if your goal is to better understand how the software works through use of some artificial, but somewhat realistic simulated data, or if you need to perform initial testing on algorithms to make sure they at least work with simulated data, then this is the page to read. It is a good idea to use HDF format as much as possible in this process to preserve all metadata in the header.

Making simulated data isn't all that difficult. If you wish to start with a structure from the EMDB instead of the PDB, you can jump directly to the second step:

  1. If you want to start from a PDB file, first convert it to a density map. There may be some issues with non-crystallographic symmetry, etc. Run e2pdb2mrc.py --help for the full set of options, but the command below will work for most purposes. Note that there is no mmCIF support at present:

e2pdb2mrc.py <input.pdb> <output.hdf> --center --apix <apix> --res <resolution> --box <boxsize>
  2. Next, you will likely want to make a set of projections in different orientations. Repeat this process multiple times to simulate data from different "micrographs", each with its own defocus:

e2project3d.py <input.hdf> --output <output.hdf> --orientgen=rand:phitoo=1:n=200
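Since each simulated "micrograph" is just another run of the same command with a different output name, the repetition can be scripted. The following is a minimal sketch and not part of EMAN2: the function name projection_commands, the default of 5 micrographs, and the projections_N.hdf naming pattern (chosen to mirror the placeholder used in the final command on this page) are assumptions for illustration. It only builds the command lines, so you can inspect them before running anything.

```python
import shlex


def projection_commands(volume, n_micrographs=5, n_per_stack=200):
    """Build one e2project3d.py command per simulated "micrograph".

    The output naming pattern (projections_1.hdf, ...) is a hypothetical
    choice; the command-line options are the ones shown above.
    """
    commands = []
    for i in range(1, n_micrographs + 1):
        commands.append([
            "e2project3d.py", volume,
            "--output", f"projections_{i}.hdf",
            f"--orientgen=rand:phitoo=1:n={n_per_stack}",
        ])
    return commands


if __name__ == "__main__":
    for cmd in projection_commands("input.hdf"):
        # Print for inspection. Replacing this with
        # subprocess.run(cmd, check=True) would execute the command,
        # which requires EMAN2 to be on the PATH.
        print(shlex.join(cmd))
```

Each command is kept as a list of arguments rather than a single string so it can be handed to subprocess.run without shell quoting problems.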
  3. Once you have projections, you will need to modify them to simulate the CTF/MTF of the instrument and add noise. While EMAN2 does have e2ctfsim.py --apply <projections> for interactive CTF simulation/visualization, it isn't very useful for producing simulated data. Instead, I suggest using e2filtertool.py to interactively work out which CTF and noise parameters you want, then using e2proc2d.py to apply them to the individual projection stacks you just created. You should be familiar with e2filtertool.py before attempting this; I highly recommend watching the video tutorial if you aren't.

    • run e2filtertool.py <projection stack>. This will show a subset of the projections which will be updated interactively.

    • Create a math -> simulatectf entry

    • Typical values for parameters for a high-end microscope and detector would be:
      • ampcont=10 %
      • bfactor=50 A^2
      • cs=2.7 mm
      • defocus=1-2 um
      • phaseflip=1 (otherwise CTF phase flipping isn't performed)
      • voltage=300 kV
    • for noise simulation, you can use the simple "noiseamp" parameter of simulatectf, or add another noise-adding processor, which may give you more control. Noiseamp values are project-dependent. If you have a defocus of 1 um and adjust noiseamp such that the projections are barely visible through the noise, this is probably a fairly realistic noise level.
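To get a feel for what the CTF parameters listed above do to the signal, it can help to evaluate a one-dimensional CTF curve by hand. The sketch below uses a common textbook form of the CTF with a B-factor envelope exp(-B s^2 / 4) and positive defocus meaning underfocus. Sign and envelope conventions differ between packages, so this should not be taken as what math.simulatectf computes internally; the function names electron_wavelength and ctf_1d are made up for this example.

```python
import math


def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in Angstroms for a voltage in kV."""
    v = voltage_kv * 1e3  # kV -> V
    return 12.2639 / math.sqrt(v + 0.97845e-6 * v * v)


def ctf_1d(s, defocus_um=1.5, cs_mm=2.7, voltage_kv=300.0,
           ampcont_pct=10.0, bfactor=50.0):
    """Textbook 1D CTF at spatial frequency s (in 1/Angstrom).

    Assumed convention (not necessarily EMAN2's): positive defocus is
    underfocus, and the envelope is exp(-B * s^2 / 4).
    """
    lam = electron_wavelength(voltage_kv)
    dz = defocus_um * 1e4   # um -> Angstrom
    cs = cs_mm * 1e7        # mm -> Angstrom
    a = ampcont_pct / 100.0  # amplitude contrast as a fraction
    # Phase aberration from defocus and spherical aberration.
    gamma = math.pi * lam * dz * s ** 2 - 0.5 * math.pi * cs * lam ** 3 * s ** 4
    envelope = math.exp(-bfactor * s ** 2 / 4.0)
    return envelope * (math.sqrt(1.0 - a * a) * math.sin(gamma)
                       + a * math.cos(gamma))


if __name__ == "__main__":
    # Tabulate the curve at a few spatial frequencies for two defocus values.
    for s in (0.0, 0.02, 0.05, 0.1, 0.2):
        print(f"s={s:.2f}  "
              f"1.0 um: {ctf_1d(s, defocus_um=1.0):+.3f}  "
              f"2.0 um: {ctf_1d(s, defocus_um=2.0):+.3f}")
```

Tabulating the curve for two defocus values shows how the oscillations move toward lower spatial frequency as defocus increases, while the B-factor envelope damps the high-frequency signal.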
  4. The goal of e2filtertool.py is to interactively adjust the parameters until you are happy with them. While there is a menu item which will allow you to apply this processor to the full set of projections and save the result, repeating that process for N stacks of projections would be annoying. Instead, once you have a set of parameters you like:

    • exit e2filtertool.py

    • look at the filtertool_default.txt file in the local folder. This text file contains the parameters you need in order to use e2proc2d.py to apply the CTF to many sets of projections, e.g.:

e2proc2d.py <projections_1.hdf> <simulated_1.hdf> --process=math.simulatectf:ampcont=10.0:bfactor=50.0:cs=2.7:defocus=1.5:noiseamp=0.02:phaseflip=1:voltage=300.0
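Applying this to N stacks, each with its own defocus, is easy to script. The sketch below is one possible approach and not an EMAN2 utility: the helper names simulatectf_option and ctf_commands, the 1-2 um defocus range, and the fixed random seed are assumptions for illustration. The parameter values are copied from the example command above, and the file names follow its projections_N.hdf / simulated_N.hdf pattern. As written, it only prints the commands.

```python
import random
import shlex

# Parameter values copied from the example e2proc2d.py command above.
BASE_PARAMS = {
    "ampcont": 10.0, "bfactor": 50.0, "cs": 2.7, "defocus": 1.5,
    "noiseamp": 0.02, "phaseflip": 1, "voltage": 300.0,
}


def simulatectf_option(params):
    """Format a --process option as math.simulatectf:name=value:..."""
    body = ":".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"--process=math.simulatectf:{body}"


def ctf_commands(n_stacks, defocus_range=(1.0, 2.0), seed=0):
    """Build one e2proc2d.py command per stack, each with a random defocus.

    The defocus range and seed are hypothetical defaults; adjust to taste.
    """
    rng = random.Random(seed)  # seeded so the defocus values are reproducible
    commands = []
    for i in range(1, n_stacks + 1):
        defocus = round(rng.uniform(*defocus_range), 2)
        params = dict(BASE_PARAMS, defocus=defocus)
        commands.append([
            "e2proc2d.py", f"projections_{i}.hdf", f"simulated_{i}.hdf",
            simulatectf_option(params),
        ])
    return commands


if __name__ == "__main__":
    for cmd in ctf_commands(5):
        # Print for inspection; use subprocess.run(cmd, check=True) to
        # execute (requires EMAN2 on the PATH).
        print(shlex.join(cmd))
```

Seeding the random generator means the same defocus values are produced on every run, which is useful when you want to know the ground truth for each simulated stack.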

EMAN2/SimulatedData (last edited 2019-09-19 15:29:48 by SteveLudtke)