Particle Picking with Convolutional Neural Network

Update 2018-1-24 EMAN2.21

Now we train two networks: one for boxing particles from micrographs, and another for excluding obviously bad particles (ice, etc.).

Use Good Refs to pick particle references, Bkgnd Refs for background references (pure noise regions in the micrograph), and Bad Refs for bad particle references such as ice contamination or large aggregates. The program works with ~10 particles for each class, although more references may improve performance.

To get the behavior of previous versions (without bad particle exclusion), simply put all bad particles, including noise and contamination, in Bkgnd Refs and leave Bad Refs empty.

If the boxer generates too few particles, consider lowering the threshold value and redoing the Autobox step (not the training). Also make sure there are no good particles in the Bad Refs class.
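The effect of the threshold can be illustrated with a minimal Python sketch. This is not e2boxer's code: the scores and the `autobox` helper are hypothetical, and the sketch assumes only that each candidate box receives a score and is kept when that score reaches the threshold.

```python
# Illustrative sketch only: hypothetical scores, not e2boxer internals.

def autobox(scores, threshold):
    """Return the indices of candidates whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]

# Hypothetical network scores for six candidate boxes.
scores = [0.95, 0.80, 0.62, 0.41, 0.30, 0.12]

strict = autobox(scores, threshold=0.7)   # high threshold, fewer boxes
relaxed = autobox(scores, threshold=0.4)  # lower threshold, more boxes

print(len(strict), len(relaxed))
```

In this picture the threshold only affects which candidates are selected, which is why only the Autobox step needs to be repeated.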

Theano is no longer developed, and we now use TensorFlow instead, which is included with EMAN2. If you don't have CUDA set up on your machine (with an appropriate NVidia GPU), it will take a long time to train the network (usually 20-30 min, but could be an hour or more)!

In the EMAN2.2 release and later versions, neural network particle picking is built into e2boxer.py.

Launch e2boxer from e2projectmanager.py and choose Neural Net in the Autoboxing Methods panel. Click Good Refs in the Mouse Mode panel and choose a few good particles (~10 is usually enough; more may help). Make sure the good particles come from micrographs covering a range of defocus values and that they are centered. Then click Bad Refs and pick some non-particle features in the micrograph, such as background noise and ice contamination (N>50). Click Train in the Autoboxing Methods panel and watch the command line output. When it says done, click Autobox or Autobox All to box particles. The auto-boxed particles are sorted by their score. Shift-click a particle to remove it, or Control-Shift-click a particle to remove it and all the particles after it.
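The two removal actions can be pictured as operations on a score-sorted list. The following Python sketch is only an illustration: the box records and helper names are hypothetical, and it assumes the best-scoring particles come first.

```python
# Illustrative sketch only: hypothetical box records, not e2boxer's data model.

boxes = [
    {"x": 120, "y": 340, "score": 0.42},
    {"x": 510, "y": 96, "score": 0.91},
    {"x": 288, "y": 702, "score": 0.77},
    {"x": 64, "y": 455, "score": 0.15},
]

def sort_by_score(boxes):
    """Order boxes from highest to lowest score (assumed ordering)."""
    return sorted(boxes, key=lambda b: b["score"], reverse=True)

def remove_one(boxes, index):
    """Like Shift-click: drop a single box."""
    return boxes[:index] + boxes[index + 1:]

def remove_from(boxes, index):
    """Like Control-Shift-click: drop the box at index and everything after it."""
    return boxes[:index]

ranked = sort_by_score(boxes)
kept = remove_from(ranked, 2)  # keep only the two best-scoring boxes
print([b["score"] for b in kept])
```

Because the list is sorted, a single cut-off position is enough to discard the low-scoring tail in one action.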

Old Tutorial (deprecated)

EMAN2 daily build after 2015-11-06

The program can be found in:

//EMAN2/examples/convnet_pickparticle.py//

This program is developed by Muyuan Chen. Please contact muyuanc@bcm.edu if you have any questions.

Here we train a stack of convolutional neural nets to recognize particles in the micrograph. The basic structure of the convolutional net used in this program is described here:

http://deeplearning.net/tutorial/lenet.html

This program requires Theano, in addition to the other EMAN2 dependencies. A guide to installing Theano can be found here:

http://deeplearning.net/software/theano/install.html

This program runs on the GPU if the GPU environment is set up in Theano. If not, it should still run on the CPU, but more slowly. Also, some functions (not very useful at this point) are disabled when the CPU is used.

Example

We use some IP3R images as an example.

Making Training Set

Pick some particles manually (we use 65 particles in this case). Use a box size slightly larger than the particle, and make sure to center the particles. They should cover most of the different views of the particle. Here we save these particles as “ptcls_train.hdf”.

Pick some negative samples (things that you are sure are not particles; here we pick 10 pure noise samples). The noise samples should be the same size as the particles. We save these negative samples as “ngtv_train.hdf”.

Train the convolutional network

Run the program to train the convolutional neural net with the command:

convnet_pickparticle.py ptcls_train.hdf --ngtvs ngtv_train.hdf --shrink 2 --trainout

We shrink the particles by 2 for speed. Specify the --trainout option so the program will output the result on the training set.
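To see why shrinking helps with speed, here is a minimal Python sketch of shrinking an image by a factor of 2. It assumes simple block averaging, which may differ from what the program does internally; the `shrink` helper and the pixel values are hypothetical.

```python
# Illustrative sketch only: block averaging is an assumption,
# not necessarily the method used by convnet_pickparticle.py.

def shrink(image, factor):
    """Downsample a 2D list of pixel values by averaging factor x factor blocks."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Hypothetical 4x4 image.
image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]

small = shrink(image, 2)
print(small)  # 2x2 image: a quarter of the original pixel count
```

Shrinking by 2 leaves a quarter as many pixels for the network to process, at the cost of some fine detail.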

When the program finishes, check the file “result_conv0.hdf”, which can be found in the directory where you ran the program. You should see something like this:

In the stack display, toggle on the value called “label”. Resize the view so that there are 3*N images in each row. In each group of 3 images, the first is a particle or noise sample, the second is the output from the first layer of the neural net, and the third is the final output of the classification layer. Real particles have a label value of 1, while negative samples have a label of 0.
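The layout of the stack in groups of three can be expressed as simple index arithmetic. The sketch below is only an illustration; the `describe` helper and the role names are hypothetical, and it assumes the three images of each sample are stored consecutively.

```python
# Illustrative sketch only: assumes each sample occupies three consecutive
# images in the stack, in the order described in the text.

ROLES = ("input", "first_layer_output", "final_output")

def describe(index):
    """Map a stack index to (sample number, role of that image)."""
    sample, slot = divmod(index, 3)
    return sample, ROLES[slot]

for i in range(6):
    print(i, describe(i))
```

This is also why the row width should be a multiple of 3: each group then stays on one row and the same role always falls in the same column.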

Due to a Theano issue, if the program is run on the CPU, the second image (output from the first layer) will not be displayed.

After a successful training, the third image (classification layer output) of each particle should be a bright ball, while the third image for a negative sample should be empty.

If the output looks good, we can test the net on a micrograph to check its performance. Here we name it “testimg.hdf”. Note that this image should NOT be the one from which you boxed the training particles. Use the command:

convnet_pickparticle.py --teston testimg.hdf

When the program finishes, you should see a new file called //"testresult.hdf"// in the folder. It is an image stack with two images. Open it with //Show 2D// in //e2display//.

The first image is a filtered version of the input, and the second image is the output of the neural net. The particles should be highlighted in the second image. Note that in this case some large ice contamination is recognized as particles, but the result generally looks fine.
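One way to turn such a highlighted output image into particle positions is to look for local maxima above a threshold. The Python sketch below illustrates that idea only; it is an assumption, since the search used by the program is not described here, and the `find_peaks` helper and the score map are hypothetical.

```python
# Illustrative sketch only: a generic local-maximum search,
# not the peak search used by convnet_pickparticle.py.

def find_peaks(score_map, threshold):
    """Return (row, col, score) for pixels that reach the threshold and are
    not smaller than any of their 8 neighbours, best score first."""
    n_rows, n_cols = len(score_map), len(score_map[0])
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = score_map[r][c]
            if value < threshold:
                continue
            neighbours = [
                score_map[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                peaks.append((r, c, value))
    return sorted(peaks, key=lambda p: p[2], reverse=True)

# Hypothetical network output with two bright spots.
score_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.1],
    [0.0, 0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]

peaks = find_peaks(score_map, threshold=0.5)
print(peaks)
```

A bright blob from ice contamination would pass the same test, which is consistent with contamination sometimes being picked up as a particle.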

Box particles

If the test image works, we can apply this to all the micrographs. Run the command:

convnet_pickparticle.py --teston micrographs --shrink 2

Here micrographs is the folder containing all your micrographs. The program will apply the convolutional net to all images in the folder, find the particles, and save them in EMAN2 particle box format. Please make sure the shrink option is the same as in the training command.
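Why the shrink values need to agree can be seen with simple arithmetic. The Python sketch below is a hypothetical illustration, not the program's code: the box size of 128 pixels and both helpers are assumptions, as is the idea that positions found in the shrunken image are scaled back to the original micrograph.

```python
# Illustrative sketch only: hypothetical box size and helpers.

def apparent_size(box_size, shrink):
    """Size in pixels of a box after the image is shrunk by `shrink`."""
    return box_size // shrink

def to_original_coords(x, y, shrink):
    """Map a position in the shrunken image back to the original micrograph."""
    return x * shrink, y * shrink

box_size = 128  # hypothetical box size in original pixels

trained = apparent_size(box_size, 2)     # size the net saw during training
mismatched = apparent_size(box_size, 1)  # size if shrink were left at 1

print(trained, mismatched)
print(to_original_coords(150, 42, 2))
```

With mismatched values the particles would appear at a different size in pixels than the examples the network was trained on.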

Now look at the boxing result using e2boxer.py:

e2boxer.py micrographs/*.mrc

Auto-picked particles will show up as “manual picked particles” here.

Usually there are noticeably more boxes than real particles here. The particles are sorted by the score given by the convolutional network, so particles with a lower index should be “better”.
Here we just need to look through the particles and decide that the particles after number X are no longer real. Then switch to the manual boxing tool, enter X in the Clear from # text box, and click the Clear button.

A zoomed in view of the particles:

That's it~

Muyuan