EMAN::KMeansAnalyzer Class Reference

KMeansAnalyzer performs k-means classification on a set of input images of arbitrary shape and size. The returned result is a set of classification vectors.

#include <analyzer.h>

Public Member Functions

 KMeansAnalyzer ()
virtual int insert_image (EMData *image)
 Insert an image into the list of input images.
virtual vector< EMData * > analyze ()
 Main function of the Analyzer: analyzes the input images and creates the output images.
string get_name () const
 Get the Analyzer's name.
string get_desc () const
 Get the Analyzer's description.
void set_params (const Dict &new_params)
 Set the Analyzer parameters using a key/value dictionary.
TypeDict get_param_types () const
 Get Analyzer parameter information in a dictionary.

Static Public Member Functions

static Analyzer * NEW ()

Protected Member Functions

void update_centers (int sigmas=0)
void reclassify ()
void reseed ()

Protected Attributes

vector< EMData * > centers
int ncls
int verbose
int minchange
int maxiter
int mininclass
int nchanged
int slowseed
int calcsigmamean


Detailed Description

KMeansAnalyzer performs k-means classification on a set of input images of arbitrary shape and size. The returned result is a set of classification vectors.

Author:
Steve Ludtke
Date:
03/02/2008
Parameters:
verbose Display progress if set, more detail with larger numbers (9 max)
ncls number of desired classes
maxiter maximum number of iterations
minchange Terminate if fewer than minchange members move in an iteration
mininclass Minimum number of particles to keep a class as good (not enforced at termination)
slowseed Instead of seeding all classes at once, it will gradually increase the number of classes by adding new seeds in groups with large standard deviations
calcsigmamean Computes standard deviation of the mean image for each class-average (center), and returns them at the end of the list of centers

Definition at line 138 of file analyzer.h.


Constructor & Destructor Documentation

EMAN::KMeansAnalyzer::KMeansAnalyzer (  )  [inline]

Definition at line 141 of file analyzer.h.

Referenced by NEW().

00141 : ncls(0),verbose(0),minchange(0),maxiter(100),mininclass(2),slowseed(0) {}


Member Function Documentation

virtual int EMAN::KMeansAnalyzer::insert_image ( EMData * image  )  [inline, virtual]

Insert an image into the list of input images.

Parameters:
image The image to append to the input list.
Returns:
int 0 on success, <0 on failure

Implements EMAN::Analyzer.

Definition at line 143 of file analyzer.h.

References EMAN::Analyzer::images.

00143                                                          {
00144                         images.push_back(image);
00145                         return 0;
00146                 }

vector< EMData * > KMeansAnalyzer::analyze (  )  [virtual]

Main function of the Analyzer: analyzes the input images and creates the output images.

Returns:
vector<EMData *> result of the image analysis

Implements EMAN::Analyzer.

Definition at line 84 of file analyzer.cpp.

References calcsigmamean, centers, EMAN::Util::get_irand(), get_xsize(), get_ysize(), get_zsize(), EMAN::Analyzer::images, maxiter, minchange, mininclass, nchanged, ncls, reclassify(), reseed(), slowseed, update_centers(), and verbose.

00085 {
00086 if (ncls<=1) return vector<EMData *>();
00087 //srandom(time(0));
00088 
00089 // These are the class centers, start each with a random image
00090 int nptcl=images.size();
00091 int nclstot=ncls;
00092 if (calcsigmamean) centers.resize(nclstot*2);
00093 else centers.resize(nclstot);
00094 if (mininclass<1) mininclass=1;
00095 
00096 if (slowseed) {
00097         if (maxiter<ncls*3+20) maxiter=ncls*3+20;       // We need to make sure we have enough iterations to seed all of the classes
00098         ncls=2;
00099 }
00100 
00101 for (int i=0; i<ncls; i++) {
00102         // Fixed by d.woolford, Util.get_irand is inclusive (added a -1)
00103         centers[i]=images[Util::get_irand(0,nptcl-1)]->copy();
00104 
00105 }
00106 
00107 if (calcsigmamean) {
00108         for (int i=nclstot; i<nclstot*2; i++) centers[i]=new EMData(images[0]->get_xsize(),images[0]->get_ysize(),images[0]->get_zsize());
00109 }
00110 
00111 
00112 for (int i=0; i<maxiter; i++) {
00113         nchanged=0;
00114         reclassify();
00115         if (verbose) printf("iter %d>  %d (%d)\n",i,nchanged,ncls);
00116         if (nchanged<minchange && ncls==nclstot) break;
00117         update_centers();
00118 
00119         if (slowseed && i%3==2 && ncls<nclstot) {
00120                 centers[ncls]=0;
00121                 ncls++;
00122                 reseed();
00123         }
00124 }
00125 update_centers(calcsigmamean);
00126 
00127 return centers;
00128 }
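
The slow-seeding bookkeeping in analyze() can be hard to follow from the listing alone. The following is a minimal sketch of just the counting logic, assuming no early termination; `slowseed_schedule` is a hypothetical helper and not part of EMAN2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of EMAN2: returns the number of active classes
// at the start of each iteration when slowseed is set. It mirrors only the
// counting logic of analyze() and ignores early termination.
std::vector<int> slowseed_schedule(int ncls_target, int maxiter) {
    assert(ncls_target >= 2);  // analyze() returns early when ncls <= 1
    // Guarantee enough iterations to seed every class (listing line 00097).
    maxiter = std::max(maxiter, ncls_target * 3 + 20);
    int ncls = 2;  // slow seeding starts from two classes
    std::vector<int> active;
    active.reserve(static_cast<std::size_t>(maxiter));
    for (int i = 0; i < maxiter; ++i) {
        active.push_back(ncls);
        // One more class is seeded after every third iteration.
        if (i % 3 == 2 && ncls < ncls_target) ++ncls;
    }
    return active;
}
```

With a target of 5 classes and maxiter of 10, the iteration limit is raised to 35 and the fifth class becomes active at iteration 9.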

string EMAN::KMeansAnalyzer::get_name (  )  const [inline, virtual]

Get the Analyzer's name.

Each Analyzer is identified by a unique name.

Returns:
The Analyzer's name.

Implements EMAN::Analyzer.

Definition at line 150 of file analyzer.h.

00151                 {
00152                         return "kmeans";
00153                 }

string EMAN::KMeansAnalyzer::get_desc (  )  const [inline, virtual]

Get the Analyzer's description.

Returns:
The Analyzer's description.

Implements EMAN::Analyzer.

Definition at line 155 of file analyzer.h.

00156                 {
00157                         return "k-means classification";
00158                 }

static Analyzer* EMAN::KMeansAnalyzer::NEW (  )  [inline, static]

Definition at line 160 of file analyzer.h.

References KMeansAnalyzer().

Referenced by EMAN::Factory< T >::Factory().

00161                 {
00162                         return new KMeansAnalyzer();
00163                 }
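
NEW() is a static creator that a factory can store and call by name. The sketch below illustrates that pattern in isolation; `AnalyzerBase`, `KMeansStub` and `AnalyzerRegistry` are hypothetical stand-ins, and the registry is an assumption about how such a factory could work, not the actual EMAN::Factory code.

```cpp
#include <map>
#include <memory>
#include <string>

// Minimal stand-ins for EMAN::Analyzer and EMAN::KMeansAnalyzer (hypothetical).
struct AnalyzerBase {
    virtual ~AnalyzerBase() = default;
    virtual std::string get_name() const = 0;
};

struct KMeansStub : AnalyzerBase {
    std::string get_name() const override { return "kmeans"; }
    static AnalyzerBase* NEW() { return new KMeansStub(); }
};

// Hypothetical registry keyed by get_name(); not the actual EMAN::Factory.
class AnalyzerRegistry {
public:
    using Creator = AnalyzerBase* (*)();

    // Instantiate once to read the name, then remember the creator.
    void add(Creator make) {
        std::unique_ptr<AnalyzerBase> probe(make());
        creators_[probe->get_name()] = make;
    }

    // Returns a fresh instance, or nullptr if the name is unknown.
    std::unique_ptr<AnalyzerBase> get(const std::string& name) const {
        auto it = creators_.find(name);
        if (it == creators_.end()) return nullptr;
        return std::unique_ptr<AnalyzerBase>(it->second());
    }

private:
    std::map<std::string, Creator> creators_;
};
```

Registering `&KMeansStub::NEW` and then calling `get("kmeans")` yields a new object each time, which is why the creator returns a heap-allocated instance.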

void KMeansAnalyzer::set_params ( const Dict & new_params  )  [virtual]

Set the Analyzer parameters using a key/value dictionary.

Parameters:
new_params A dictionary containing the new parameters.

Reimplemented from EMAN::Analyzer.

Definition at line 71 of file analyzer.cpp.

References calcsigmamean, EMAN::Dict::has_key(), maxiter, minchange, mininclass, ncls, EMAN::Analyzer::params, slowseed, and verbose.

00072 {
00073         params = new_params;
00074         if (params.has_key("ncls")) ncls = params["ncls"];
00075         if (params.has_key("maxiter"))maxiter = params["maxiter"];
00076         if (params.has_key("minchange"))minchange = params["minchange"];
00077         if (params.has_key("mininclass"))mininclass = params["mininclass"];
00078         if (params.has_key("slowseed"))slowseed = params["slowseed"];
00079         if (params.has_key("verbose"))verbose = params["verbose"];
00080         if (params.has_key("calcsigmamean")) calcsigmamean=params["calcsigmamean"];
00081 
00082 }
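
set_params() overrides a member only when its key is present, so unset parameters keep their constructor defaults. A minimal sketch of that behaviour follows, assuming a plain `std::map` in place of EMAN::Dict; `KMeansParams` and `apply_params` are hypothetical names. The constructor listing does not initialize calcsigmamean, so the default of 0 used here is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical parameter block; defaults follow the constructor listing,
// except calcsigmamean, whose default of 0 is an assumption of this sketch.
struct KMeansParams {
    int ncls = 0;
    int verbose = 0;
    int minchange = 0;
    int maxiter = 100;
    int mininclass = 2;
    int slowseed = 0;
    int calcsigmamean = 0;
};

// Copy a value only if its key is present, leaving other fields untouched.
void apply_params(KMeansParams& p, const std::map<std::string, int>& d) {
    auto set_if = [&d](const char* key, int& field) {
        auto it = d.find(key);
        if (it != d.end()) field = it->second;
    };
    set_if("ncls", p.ncls);
    set_if("maxiter", p.maxiter);
    set_if("minchange", p.minchange);
    set_if("mininclass", p.mininclass);
    set_if("slowseed", p.slowseed);
    set_if("verbose", p.verbose);
    set_if("calcsigmamean", p.calcsigmamean);
}
```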

TypeDict EMAN::KMeansAnalyzer::get_param_types (  )  const [inline, virtual]

Get Analyzer parameter information in a dictionary.

Each parameter has one record in the dictionary. Each record contains its name, data-type, and description.

Returns:
A dictionary containing the parameter info.

Implements EMAN::Analyzer.

Definition at line 167 of file analyzer.h.

References EMAN::EMObject::INT, and EMAN::TypeDict::put().

00168                 {
00169                         TypeDict d;
00170                         d.put("verbose", EMObject::INT, "Display progress if set, more detail with larger numbers (9 max)");
00171                         d.put("ncls", EMObject::INT, "number of desired classes");
00172                         d.put("maxiter", EMObject::INT, "maximum number of iterations");
00173                         d.put("minchange", EMObject::INT, "Terminate if fewer than minchange members move in an iteration");
00174                         d.put("mininclass", EMObject::INT, "Minumum number of particles to keep a class as good (not enforced at termination");
00175                         d.put("slowseed",EMObject::INT, "Instead of seeding all classes at once, it will gradually increase the number of classes by adding new seeds in groups with large standard deviations");
00176                         d.put("calcsigmamean",EMObject::INT, "Computes standard deviation of the mean image for each class-average (center), and returns them at the end of the list of centers");
00177                         return d;
00178                 }

void KMeansAnalyzer::update_centers ( int  sigmas = 0  )  [protected]

Definition at line 130 of file analyzer.cpp.

References centers, EMAN::Analyzer::images, mininclass, ncls, reseed(), sqrt(), and verbose.

Referenced by analyze().

00130                                               {
00131 int nptcl=images.size();
00132 //int repr[ncls];
00133 int * repr = new int[ncls];
00134 
00135 for (int i=0; i<ncls; i++) {
00136         centers[i]->to_zero();
00137         if (sigmas) centers[i+ncls]->to_zero();
00138         repr[i]=0;
00139 }
00140 
00141 for (int i=0; i<nptcl; i++) {
00142         int cid=images[i]->get_attr("class_id");
00143         centers[cid]->add(*images[i]);
00144         if (sigmas) centers[cid+ncls]->addsquare(*images[i]);
00145         repr[cid]++;
00146 }
00147 
00148 for (int i=0; i<ncls; i++) {
00149         if (repr[i]<mininclass) {
00150                 delete centers[i];
00151                 centers[i]=0;
00152                 repr[i]=0;
00153         }
00154         else {
00155                 centers[i]->mult((float)1.0/(float)(repr[i]));
00156                 centers[i]->set_attr("ptcl_repr",repr[i]);
00157                 if (sigmas) {
00158                         centers[i+ncls]->mult((float)1.0/(float)(repr[i]));             // sum of squares over n
00159                         centers[i+ncls]->subsquare(*centers[i]);                                        // subtract the mean value squared
00160                         centers[i+ncls]->process("math.sqrt");                                  // square root
00161                         centers[i+ncls]->mult((float)1.0/(float)sqrt((float)repr[i]));          // divide by sqrt(N) to get std. dev. of mean
00162                 }
00163 
00164         }
00165         if (verbose>1) printf("%d(%d)\t",i,(int)repr[i]);
00166 }
00167 
00168 if (verbose>1) printf("\n");
00169 
00170 reseed();
00171 
00172 delete [] repr;
00173 }
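
When sigmas is set, update_centers() computes a per-pixel standard deviation of the mean: the mean of squares minus the squared mean, square-rooted, then divided by sqrt(N). The sketch below reproduces that arithmetic on plain float vectors. `ClassStats` and `mean_and_sem` are hypothetical names, and the clamp against negative variance is an addition of this sketch, not something the listing does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel mean and standard deviation of the mean for one class.
struct ClassStats {
    std::vector<float> mean;
    std::vector<float> sem;
};

// Hypothetical helper operating on flat pixel vectors instead of EMData.
ClassStats mean_and_sem(const std::vector<std::vector<float>>& members) {
    assert(!members.empty());
    const std::size_t npix = members[0].size();
    const float n = static_cast<float>(members.size());
    std::vector<float> sum(npix, 0.0f), sumsq(npix, 0.0f);
    for (const auto& img : members) {
        assert(img.size() == npix);
        for (std::size_t k = 0; k < npix; ++k) {
            sum[k] += img[k];             // corresponds to add()
            sumsq[k] += img[k] * img[k];  // corresponds to addsquare()
        }
    }
    ClassStats out;
    out.mean.resize(npix);
    out.sem.resize(npix);
    for (std::size_t k = 0; k < npix; ++k) {
        out.mean[k] = sum[k] / n;
        float var = sumsq[k] / n - out.mean[k] * out.mean[k];
        if (var < 0.0f) var = 0.0f;  // guard against round-off (sketch only)
        out.sem[k] = std::sqrt(var) / std::sqrt(n);
    }
    return out;
}
```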

void KMeansAnalyzer::reclassify (  )  [protected]

Definition at line 230 of file analyzer.cpp.

References centers, EMAN::Cmp::cmp(), EMAN::Analyzer::images, nchanged, and ncls.

Referenced by analyze().

00230                                 {
00231 int nptcl=images.size();
00232 
00233 Cmp *c = Factory < Cmp >::get("sqeuclidean");
00234 for (int i=0; i<nptcl; i++) {
00235         float best=1.0e38f;
00236         int bestn=0;
00237         for (int j=0; j<ncls; j++) {
00238                 float d=c->cmp(images[i],centers[j]);
00239 //images[i]->cmp("sqeuclidean",centers[j]);
00240                 if (d<best) { best=d; bestn=j; }
00241         }
00242         int oldn=images[i]->get_attr_default("class_id",0);
00243         if (oldn!=bestn) nchanged++;
00244         images[i]->set_attr("class_id",bestn);
00245 }
00246 delete c;
00247 }
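
reclassify() assigns every image to its nearest center and counts how many assignments changed. A minimal sketch on plain float vectors follows. `Image`, `sq_euclidean` and `reclassify_sketch` are hypothetical names, and the distance is assumed to be a plain sum of squared differences, which may differ in scaling from the "sqeuclidean" Cmp.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Image = std::vector<float>;

// Assumed distance: plain sum of squared pixel differences.
float sq_euclidean(const Image& a, const Image& b) {
    assert(a.size() == b.size());
    float d = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const float diff = a[k] - b[k];
        d += diff * diff;
    }
    return d;
}

// Assign each image to its nearest center; returns how many labels changed.
// class_id plays the role of the "class_id" attribute and should start at 0,
// matching get_attr_default("class_id", 0) in the listing.
int reclassify_sketch(const std::vector<Image>& images,
                      const std::vector<Image>& centers,
                      std::vector<int>& class_id) {
    assert(class_id.size() == images.size());
    int nchanged = 0;
    for (std::size_t i = 0; i < images.size(); ++i) {
        float best = std::numeric_limits<float>::max();
        int bestn = 0;
        for (std::size_t j = 0; j < centers.size(); ++j) {
            const float d = sq_euclidean(images[i], centers[j]);
            if (d < best) { best = d; bestn = static_cast<int>(j); }
        }
        if (class_id[i] != bestn) ++nchanged;
        class_id[i] = bestn;
    }
    return nchanged;
}
```

A second call with unchanged centers returns 0, which is the condition analyze() compares against minchange.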

void KMeansAnalyzer::reseed (  )  [protected]

Definition at line 176 of file analyzer.cpp.

References centers, EMAN::Cmp::cmp(), get_attr(), EMAN::Util::get_irand(), EMAN::Analyzer::images, and ncls.

Referenced by analyze(), and update_centers().

00176                             {
00177 // if no classes need reseeding just return
00178 int nptcl=images.size();
00179 int i,j;
00180 for (i=0; i<ncls; i++) {
00181         if (!centers[i]) break;
00182 }
00183 if (i==ncls) return;
00184 
00185 int * best = new int[ncls];     // particles in the average
00186 float *sigmas = new float[ncls]; // array of deviations
00187 
00188 for (int i=0; i<ncls; i++) { sigmas[i]=0; best[i]=0; }
00189 
00190 // compute the deviation of each class
00191 Cmp *c = Factory < Cmp >::get("sqeuclidean");
00192 for (int i=0; i<nptcl; i++) {
00193         int cid=images[i]->get_attr("class_id");
00194         if (!centers[cid]) continue;
00195 //      sigmas[cid]+=(float)imc->get_attr("square_sum");
00196         float d=c->cmp(images[i],centers[cid]);
00197         if (d>sigmas[cid]) {
00198                 sigmas[cid]=d;  // Instead of using sigma, use the largest distance in the class
00199                 best[cid]=i;
00200         }
00201 }
00202 delete c;
00203 //for (i=0; i<ncls; i++) sigmas[i]/=repr[i];    //since we aren't doing a sigma now...
00204 
00205 //we could sort the list, but for this use we just search
00206 for (i=0; i<ncls; i++) {
00207         if (centers[i]) continue;
00208 
00209         float maxsig=0;
00210         int maxi=0;
00211         // find the class with the largest sigma
00212         for (j=0; j<ncls; j++) {
00213                 if (sigmas[j]>maxsig) { maxsig=sigmas[j]; maxi=j; }
00214         }
00215 
00216         // find an image in that class
00217         for (j=0; j<ncls; j++) if ((int)images[j]->get_attr("class_id")==maxi) break;
00218         if (Util::get_irand(0,1)==0) centers[i]=images[best[maxi]]->copy();
00219         else centers[i]=images[j]->copy();
00220         centers[i]->set_attr("ptcl_repr",1);
00221         sigmas[maxi]=0;         // if we get another one to reseed, pick the next largest set (zero out the current one)
00222         printf("reseed %d -> %d (%d or %d)\n",i,maxi,best[maxi],j);
00223 }
00224 
00225 delete [] sigmas;
00226 delete [] best;
00227 }
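
reseed() refills each empty center from the class whose farthest member lies at the largest distance. The sketch below shows that selection on plain float vectors, under two simplifying assumptions: an empty vector stands in for a null center pointer, and the new seed is always the farthest member, without the random choice made in the listing. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;

// Assumed distance: plain sum of squared pixel differences.
static float reseed_sq_dist(const Image& a, const Image& b) {
    assert(a.size() == b.size());
    float d = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;
}

// Refill empty centers; returns the number of centers that were reseeded.
int reseed_empty_centers(const std::vector<Image>& images,
                         const std::vector<int>& class_id,
                         std::vector<Image>& centers) {
    assert(!images.empty() && class_id.size() == images.size());
    const std::size_t ncls = centers.size();
    std::vector<float> spread(ncls, 0.0f);       // largest distance per class
    std::vector<std::size_t> farthest(ncls, 0);  // image index at that distance
    for (std::size_t i = 0; i < images.size(); ++i) {
        const std::size_t cid = static_cast<std::size_t>(class_id[i]);
        if (centers[cid].empty()) continue;
        const float d = reseed_sq_dist(images[i], centers[cid]);
        if (d > spread[cid]) { spread[cid] = d; farthest[cid] = i; }
    }
    int reseeded = 0;
    for (std::size_t c = 0; c < ncls; ++c) {
        if (!centers[c].empty()) continue;
        std::size_t maxi = 0;
        float maxsig = 0.0f;
        for (std::size_t j = 0; j < ncls; ++j) {
            if (spread[j] > maxsig) { maxsig = spread[j]; maxi = j; }
        }
        centers[c] = images[farthest[maxi]];  // copy the farthest member as seed
        spread[maxi] = 0.0f;                  // next empty center uses another class
        ++reseeded;
    }
    return reseeded;
}
```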


Member Data Documentation

vector<EMData *> EMAN::KMeansAnalyzer::centers [protected]

Definition at line 185 of file analyzer.h.

Referenced by analyze(), reclassify(), reseed(), and update_centers().

int EMAN::KMeansAnalyzer::ncls [protected]

Definition at line 186 of file analyzer.h.

Referenced by analyze(), reclassify(), reseed(), set_params(), and update_centers().

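int EMAN::KMeansAnalyzer::verbose [protected]

Definition at line 187 of file analyzer.h.

Referenced by analyze(), set_params(), and update_centers().

int EMAN::KMeansAnalyzer::minchange [protected]

Definition at line 188 of file analyzer.h.

Referenced by analyze(), and set_params().

int EMAN::KMeansAnalyzer::maxiter [protected]

Definition at line 189 of file analyzer.h.

Referenced by analyze(), and set_params().

int EMAN::KMeansAnalyzer::mininclass [protected]

Definition at line 190 of file analyzer.h.

Referenced by analyze(), set_params(), and update_centers().

int EMAN::KMeansAnalyzer::nchanged [protected]

Definition at line 191 of file analyzer.h.

Referenced by analyze(), and reclassify().

int EMAN::KMeansAnalyzer::slowseed [protected]

Definition at line 192 of file analyzer.h.

Referenced by analyze(), and set_params().

int EMAN::KMeansAnalyzer::calcsigmamean [protected]

Definition at line 193 of file analyzer.h.

Referenced by analyze(), and set_params().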


The documentation for this class was generated from the following files:

analyzer.h
analyzer.cpp

Generated on Sat Nov 7 02:19:54 2009 for EMAN2 by  doxygen 1.5.6