EMAN1/FAQ/StructureFactor

Q: CTF Correction - I can't do an x-ray solution scattering experiment on my specimen. Is there some way I can get an approximate structure factor to use in fitting and CTF correction?

A: The documentation really needs to address this, but doesn't. There are two reasons for that. First, the procedure is difficult to describe adequately in text. Second, you need a sound understanding of the mathematics used in CTF correction to use this method properly and avoid doing bad things to your structure without realizing it.

One other note. Many people (myself included) have suggested generating a structure factor curve computationally from a PDB structure of a similar protein. As it turns out, this is very difficult to do, largely because the solvent has a profound effect on the overall shape of this curve. Current software (2003) used by the solution scattering community can accurately predict peak locations, etc., but the curves it produces do not have the correct overall shape and should not be used for CTF parameter determination. Perhaps this situation will improve in the future. While you could use such a structure factor for 'setsf=' on the final reconstruction, this would be rather risky unless the proteins are VERY nearly identical.

A similar question concerns downloading the experimental structure factors from the PDB when they are present. In addition to having a similar solvation issue, these structure factors are 3-D, not the sort of 1-D average you get from solution scattering. While you could rotationally average them, the result would still have the issues mentioned above.
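To make "rotationally average" concrete, here is a minimal sketch in Python with NumPy. It assumes you have already computed the spatial frequency magnitude (in 1/A) and the intensity of each 3-D sample; the function name, the argument names and the demo values are all hypothetical and are not part of EMAN. The caveats above about solvent still apply to whatever curve this produces.

```python
import numpy as np

def shell_average(s, intensity, shell_width):
    """Average intensities over shells of constant |s| (hypothetical helper).

    s           : 1-D array, spatial frequency magnitude of each 3-D sample (1/A)
    intensity   : 1-D array, intensity of each sample
    shell_width : shell thickness in 1/A
    Returns (shell_centers, mean_intensity); empty shells are dropped.
    """
    s = np.asarray(s, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Index of the shell each sample falls into.
    shell = np.floor(s / shell_width).astype(int)
    sums = np.bincount(shell, weights=intensity)
    counts = np.bincount(shell)
    filled = counts > 0
    centers = (np.arange(len(counts)) + 0.5) * shell_width
    return centers[filled], sums[filled] / counts[filled]

# Made-up samples: three fall in the first shell, two in the second.
s_demo = np.array([0.001, 0.004, 0.009, 0.012, 0.018])
i_demo = np.array([10.0, 12.0, 14.0, 4.0, 6.0])
centers_demo, mean_demo = shell_average(s_demo, i_demo, shell_width=0.01)
```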

Still, there is a way to get the necessary curve. It isn't perfect, but it's adequate in most cases, and has been used for several published structures. The basic idea is to use several sets of particles from images at different defocuses. You then fit the CTF of all of these data sets at once, such that each CTF curve is a reasonable fit and, at the same time, the predicted structure factors from all of the data sets match reasonably well at low resolution. This process must be done manually using ctfit, but once you have a result, you should be able to do most of your fitting with the automated program 'fitctf'.
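The idea of a "predicted structure factor" can be sketched numerically. The example below is only an illustration in Python with NumPy, not ctfit's actual code. It assumes a simplified power spectrum model, M^2(s) = F^2(s) CTF^2(s) E^2(s) + N^2(s), a toy CTF with a defocus term only (spherical aberration neglected), and made-up values for every parameter. All function names are hypothetical.

```python
import numpy as np

def toy_ctf(s, defocus_A, wavelength_A=0.0251, amp_contrast=0.1):
    """Simplified CTF: defocus term only, spherical aberration neglected."""
    gamma = np.pi * wavelength_A * defocus_A * s**2
    return (np.sqrt(1.0 - amp_contrast**2) * np.sin(gamma)
            + amp_contrast * np.cos(gamma))

def predicted_structure_factor(power, noise, ctf, envelope, min_ctf=0.2):
    """Solve the assumed model for F^2; values near CTF zeros become NaN."""
    denom = (ctf * envelope) ** 2
    sf = np.full_like(power, np.nan, dtype=float)
    ok = np.abs(ctf) >= min_ctf   # the estimate is unstable where CTF ~ 0
    sf[ok] = (power[ok] - noise[ok]) / denom[ok]
    return sf

# Synthetic data: one made-up structure factor seen at two defocuses.
s = np.linspace(0.002, 0.05, 200)       # spatial frequency, 1/A
true_sf = np.exp(-40.0 * s)             # made-up F^2(s)
noise = np.full_like(s, 0.01)           # made-up flat background N^2(s)
envelope = np.exp(-100.0 * s**2)        # made-up envelope E(s)

curves = []
for defocus_A in (15000.0, 25000.0):
    ctf = toy_ctf(s, defocus_A)
    power = true_sf * (ctf * envelope) ** 2 + noise
    curves.append(predicted_structure_factor(power, noise, ctf, envelope))
```

With correct parameters the two curves agree wherever both are defined. With wrong parameters (for example a wrong amplitude) they drift apart at low resolution, and that disagreement is what you are trying to remove by hand in the procedure below.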

The optimal way to approach this problem is to have some sort of solution scattering curve on hand. This curve is used only for scaling the data and for getting a general idea of a reasonable B-factor and amplitude contrast. It will not impose its features on the final structure factor. It is also not strictly necessary; it is possible to proceed without one. The 'groel.sm' curve (native GroEL structure factor) is probably adequate for most cases. Then do the following:

Load 3 or 4 particle sets into ctfit.
Select each set in turn, then select the 'From File' button in the 'Structure Factor' section. Select groel.sm or some other structure factor you will use as a model. (You can skip this step if you like.)
Go to the 'Advanced' menu and select 'Change background mode'. This will change the model used for the background noise. In this model, only the first parameter 'N/A' has any effect. The remainder of the background is fit based on the zeros of the CTF. Note that this mode is currently INCOMPATIBLE with doing actual CTF correction of the data, but it can be used for this task of producing a structure factor.
Set 'Amp' to 0, then adjust 'N/A' to make the background curve look fairly continuous. Note that the background curve should always be lower than the data curve. Try to make the curve somewhat continuous; DON'T try to fit the data curve.
Now adjust the remaining parameters to get the best possible fit. If you are using groel.sm, this fit will not be good at all at low spatial frequency. The peaks won't match up even vaguely in most cases. This isn't the point. The point is to get the overall scale of the curve to match reasonably well.
Now select the 'Struc Fac' button in the 'Display 2' section of the plot window. This will make a second plot appear, containing the predicted structure factor for all visible data sets. This is calculated from the data itself, based on the fit you have done. Note that it will behave poorly around the CTF zeros. Don't worry about it. Zoom in (drag with the right mouse button on the plot) to the low resolution range, from the x origin to around the first zero of the CTF.
This is the tricky part. You need to adjust the CTF parameters in such a way that the low resolution predicted structure factors match each other as well as possible, while simultaneously not making the fit bad in the other window. Generally Amp is the most useful parameter to adjust.
When everything looks good, make a note of the resolution just below the first zero that disturbs the structure factor. Hopefully this will be somewhere in the range of 1/30 to 1/20 1/A (30 to 20 A resolution).
On the 'Advanced' menu, select 'Save 1 Column'. Give it a filename, and tell it to save column 11.
Almost done. Exit ctfit. Now use 'sfmerge.py' to combine the file you just created (at low resolution) with some other structure factor (like groel.sm) at high resolution. Type 'sfmerge.py' for usage. Note that the cutoff frequency is in 1/A, i.e., if you found the cutoff resolution to be at 25 A, specify 0.04.
That's it. Use the new predicted structure factor file you just made to fit all of your data with either ctfit or fitctf. Then specify this structure factor using ctfcw= in the refine command. Remember that you cannot currently use the alternate background mode in ctfit when fitting the data you will be reconstructing.
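For readers who want to see what the merging step amounts to, here is a minimal sketch in Python with NumPy. It is not the sfmerge.py source and does not describe its options (run 'sfmerge.py' for those). The function name is hypothetical, and rescaling the high resolution curve so that the two curves meet at the cutoff is an assumption made for this illustration. Both frequency arrays are assumed to be sorted in increasing order.

```python
import numpy as np

def merge_structure_factors(s_low, f_low, s_high, f_high, cutoff_res_A):
    """Join a low resolution curve to a high resolution one (hypothetical).

    cutoff_res_A is the cutoff as a resolution in A; it is converted to a
    spatial frequency in 1/A, e.g. 25 A -> 0.04 1/A.
    """
    cutoff = 1.0 / cutoff_res_A
    # Assumption: scale the high resolution curve to match at the cutoff.
    scale = np.interp(cutoff, s_low, f_low) / np.interp(cutoff, s_high, f_high)
    keep_low = s_low <= cutoff
    keep_high = s_high > cutoff
    s = np.concatenate([s_low[keep_low], s_high[keep_high]])
    f = np.concatenate([f_low[keep_low], f_high[keep_high] * scale])
    return s, f

# Made-up curves on the same frequency axis; the low one is 2x the high one.
s_axis = np.linspace(0.005, 0.1, 20)    # 1/A
high_curve = np.exp(-30.0 * s_axis)
low_curve = 2.0 * high_curve
s_merged, f_merged = merge_structure_factors(
    s_axis, low_curve, s_axis, high_curve, cutoff_res_A=25.0)
```

In this made-up case the merged curve is continuous across 0.04 1/A because the high resolution part has been scaled up by a factor of 2 to meet the low resolution part.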