EMAN2
EMAN::KMeansAnalyzer Class Reference

KMeansAnalyzer performs k-means classification on a set of input images (of arbitrary shape and size). The returned result is a set of classification vectors.

#include <analyzer.h>

Public Member Functions

 KMeansAnalyzer ()
virtual int insert_image (EMData *image)
 Insert an image into the list of input images.
virtual vector< EMData * > analyze ()
 Main function for the Analyzer: analyzes the input images and creates the output images.
string get_name () const
 Get the Analyzer's name.
string get_desc () const
 Get the Analyzer's description.
void set_params (const Dict &new_params)
 Set the Analyzer parameters using a key/value dictionary.
TypeDict get_param_types () const
 Get Analyzer parameter information in a dictionary.

Static Public Member Functions

static Analyzer * NEW ()

Static Public Attributes

static const string NAME = "kmeans"

Protected Member Functions

void update_centers (int sigmas=0)
void reclassify ()
void reseed ()

Protected Attributes

vector< EMData * > centers
int ncls
int verbose
int minchange
int maxiter
int mininclass
int nchanged
int slowseed
int calcsigmamean

Detailed Description

KMeansAnalyzer performs k-means classification on a set of input images (of arbitrary shape and size). The returned result is a set of classification vectors.

Author:
Steve Ludtke
Date:
03/02/2008
Parameters:
verbose: Display progress if set; larger values give more detail (9 max).
ncls: Number of desired classes.
maxiter: Maximum number of iterations.
minchange: Terminate if fewer than minchange members move in an iteration.
mininclass: Minimum number of particles needed to keep a class as good (not enforced at termination).
slowseed: Instead of seeding all classes at once, gradually increase the number of classes by adding new seeds in groups with large standard deviations.
calcsigmamean: Compute the standard deviation of the mean image for each class average (center) and return these at the end of the list of centers.

Definition at line 239 of file analyzer.h.


Constructor & Destructor Documentation

EMAN::KMeansAnalyzer::KMeansAnalyzer ( ) [inline]

Definition at line 242 of file analyzer.h.

Referenced by NEW().

: ncls(0),verbose(0),minchange(0),maxiter(100),mininclass(2),nchanged(0),slowseed(0),calcsigmamean(0) {}

Member Function Documentation

vector< EMData * > KMeansAnalyzer::analyze ( ) [virtual]

Main function for the Analyzer: analyzes the input images and creates the output images.

Returns:
vector<EMData *> The result of the image analysis.

Implements EMAN::Analyzer.

Definition at line 177 of file analyzer.cpp.

References calcsigmamean, centers, EMAN::Util::get_irand(), get_xsize(), get_ysize(), get_zsize(), EMAN::Analyzer::images, maxiter, minchange, mininclass, nchanged, ncls, reclassify(), reseed(), set_attr(), slowseed, update_centers(), and verbose.

{
if (ncls<=1) return vector<EMData *>();
//srandom(time(0));

// These are the class centers, start each with a random image
int nptcl=images.size();
int nclstot=ncls;
if (calcsigmamean) centers.resize(nclstot*2);
else centers.resize(nclstot);
if (mininclass<1) mininclass=1;

for (int i=0; i<nptcl; i++) images[i]->set_attr("is_ok_center",(int)5);  // if an image becomes part of too small a set, it will (eventually) be marked as a bad center

if (slowseed) {
        if (ncls>25) slowseed=ncls/25+1;        // this becomes the number to seed in each step
//      if (maxiter<ncls*3+20) maxiter=ncls*3+20;       // We need to make sure we have enough iterations to seed all of the classes
//      ncls=2;
}

for (int i=0; i<ncls; i++) {
        // Fixed by d.woolford, Util.get_irand is inclusive (added a -1)
        centers[i]=images[Util::get_irand(0,nptcl-1)]->copy();

}

if (calcsigmamean) {
        for (int i=nclstot; i<nclstot*2; i++) centers[i]=new EMData(images[0]->get_xsize(),images[0]->get_ysize(),images[0]->get_zsize());
}


for (int i=0; i<maxiter; i++) {
        nchanged=0;
        reclassify();
        if (verbose) printf("iter %d>  %d (%d)\n",i,nchanged,ncls);
        if (nchanged<minchange && ncls==nclstot) break;
        update_centers();

        if (slowseed && i%3==2 && ncls<nclstot) {
                for (int j=0; j<slowseed && ncls<nclstot; j++) {
                        centers[ncls]=0;
                        ncls++;
                }
                reseed();
        }
}
update_centers(calcsigmamean);

return centers;
}
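
The control flow of the listing above (reclassify, test for convergence, update the centers) can be illustrated with a minimal standalone sketch. It does not use the EMAN2 API: `Image`, `sq_euclidean` and `kmeans_loop` are hypothetical names, images are modeled as flat `std::vector<float>` buffers, and seeding, `mininclass` handling and `slowseed` are left out. As in the listing, the convergence test runs before the centers are updated.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for an image: a flat buffer of pixel values.
using Image = std::vector<float>;

// Squared Euclidean distance between two equally sized images.
inline float sq_euclidean(const Image &a, const Image &b) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const float d = a[k] - b[k];
        sum += d * d;
    }
    return sum;
}

// Runs the assign/update loop on pre-seeded centers.
// Returns the index of the iteration at which the loop stopped.
inline int kmeans_loop(const std::vector<Image> &images, std::vector<Image> &centers,
                       std::vector<int> &labels, int maxiter, int minchange) {
    assert(labels.size() == images.size());
    int iter = 0;
    for (; iter < maxiter; ++iter) {
        // Assignment step: move each image to its nearest center and
        // count how many images changed class.
        int nchanged = 0;
        for (std::size_t i = 0; i < images.size(); ++i) {
            int best = 0;
            float bestd = std::numeric_limits<float>::max();
            for (std::size_t j = 0; j < centers.size(); ++j) {
                const float d = sq_euclidean(images[i], centers[j]);
                if (d < bestd) { bestd = d; best = static_cast<int>(j); }
            }
            if (labels[i] != best) ++nchanged;
            labels[i] = best;
        }
        // Stop once fewer than minchange images moved.
        if (nchanged < minchange) break;

        // Update step: each center becomes the mean of its members.
        for (std::size_t j = 0; j < centers.size(); ++j) {
            Image sum(centers[j].size(), 0.0f);
            int n = 0;
            for (std::size_t i = 0; i < images.size(); ++i) {
                if (labels[i] != static_cast<int>(j)) continue;
                for (std::size_t k = 0; k < sum.size(); ++k) sum[k] += images[i][k];
                ++n;
            }
            if (n == 0) continue;  // empty class: keep the previous center
            for (std::size_t k = 0; k < sum.size(); ++k) sum[k] /= static_cast<float>(n);
            centers[j] = sum;
        }
    }
    return iter;
}
```

With `minchange` set to 1 the loop stops at the first iteration in which no image changes class. Labels start at 0 in this sketch, so an image whose nearest center is class 0 on the first pass is not counted as changed.
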
string EMAN::KMeansAnalyzer::get_desc ( ) const [inline, virtual]

Get the Analyzer's description.

Returns:
The Analyzer's description.

Implements EMAN::Analyzer.

Definition at line 256 of file analyzer.h.

                {
                        return "k-means classification";
                }
string EMAN::KMeansAnalyzer::get_name ( ) const [inline, virtual]

Get the Analyzer's name.

Each Analyzer is identified by a unique name.

Returns:
The Analyzer's name.

Implements EMAN::Analyzer.

Definition at line 251 of file analyzer.h.

References NAME.

                {
                        return NAME;
                }
TypeDict EMAN::KMeansAnalyzer::get_param_types ( ) const [inline, virtual]

Get Analyzer parameter information in a dictionary.

Each parameter has one record in the dictionary. Each record contains its name, data-type, and description.

Returns:
A dictionary containing the parameter info.

Implements EMAN::Analyzer.

Definition at line 268 of file analyzer.h.

References EMAN::EMObject::INT, and EMAN::TypeDict::put().

                {
                        TypeDict d;
                        d.put("verbose", EMObject::INT, "Display progress if set, more detail with larger numbers (9 max)");
                        d.put("ncls", EMObject::INT, "number of desired classes");
                        d.put("maxiter", EMObject::INT, "maximum number of iterations");
                        d.put("minchange", EMObject::INT, "Terminate if fewer than minchange members move in an iteration");
                        d.put("mininclass", EMObject::INT, "Minimum number of particles to keep a class as good (not enforced at termination)");
                        d.put("slowseed",EMObject::INT, "Instead of seeding all classes at once, it will gradually increase the number of classes by adding new seeds in groups with large standard deviations");
                        d.put("calcsigmamean",EMObject::INT, "Computes standard deviation of the mean image for each class-average (center), and returns them at the end of the list of centers");
                        return d;
                }
virtual int EMAN::KMeansAnalyzer::insert_image ( EMData * image) [inline, virtual]

Insert an image into the list of input images.

Parameters:
image: The image to append to the list of input images.
Returns:
int 0 on success, <0 on failure

Implements EMAN::Analyzer.

Definition at line 244 of file analyzer.h.

References EMAN::Analyzer::images.

                                                        {
                        images.push_back(image);
                        return 0;
                }
static Analyzer* EMAN::KMeansAnalyzer::NEW ( ) [inline, static]

Definition at line 261 of file analyzer.h.

References KMeansAnalyzer().

                {
                        return new KMeansAnalyzer();
                }
void KMeansAnalyzer::reclassify ( ) [protected]

Definition at line 314 of file analyzer.cpp.

References centers, EMAN::Cmp::cmp(), EMAN::Analyzer::images, nchanged, and ncls.

Referenced by analyze().

                                {
int nptcl=images.size();

Cmp *c = Factory < Cmp >::get("sqeuclidean");
for (int i=0; i<nptcl; i++) {
        float best=1.0e38f;
        int bestn=0;
        for (int j=0; j<ncls; j++) {
                float d=c->cmp(images[i],centers[j]);
//images[i]->cmp("sqeuclidean",centers[j]);
                if (d<best) { best=d; bestn=j; }
        }
        int oldn=images[i]->get_attr_default("class_id",0);
        if (oldn!=bestn) nchanged++;
        images[i]->set_attr("class_id",bestn);
}
delete c;
}
void KMeansAnalyzer::reseed ( ) [protected]

Definition at line 285 of file analyzer.cpp.

References centers, get_attr(), EMAN::Util::get_irand(), EMAN::Analyzer::images, ncls, and UnexpectedBehaviorException.

Referenced by analyze(), and update_centers().

                            {
int nptcl=images.size();
int i,j;

// if no classes need reseeding just return
for (i=0; i<ncls; i++) {
        if (!centers[i]) break;
}
if (i==ncls) return;

// make a list of all particles which could be centers
vector<int> goodcen;
for (int i=0; i<nptcl; i++) if ((int)images[i]->get_attr("is_ok_center")>0) goodcen.push_back(i);

if (goodcen.size()==0) throw UnexpectedBehaviorException("Kmeans ran out of valid center particles with the provided parameters");

// pick a random particle for the new seed
for (i=0; i<ncls; i++) {
        if (centers[i]) continue;               // center doesn't need reseeding
        j=goodcen[Util::get_irand(0,goodcen.size()-1)];   // get_irand is inclusive; map the position in goodcen back to an image index
        centers[i]=images[j]->copy();
        centers[i]->set_attr("ptcl_repr",1);
        printf("reseed %d -> %d\n",i,j);
}


}
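
Choosing a new seed from the list of eligible particles can be sketched without the EMAN2 API as follows. The name `pick_seed` and the `ok_counter` vector (a stand-in for the per-image "is_ok_center" attribute) are assumptions made for illustration, and `std::mt19937` replaces `Util::get_irand`. The sketch draws a position in the candidate list and maps it back to an image index before returning it.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Returns the index of a randomly chosen image that may still act as a center.
// ok_counter[i] > 0 marks image i as eligible (cf. the "is_ok_center" attribute).
inline int pick_seed(const std::vector<int> &ok_counter, std::mt19937 &rng) {
    // Collect the indices of all eligible images.
    std::vector<int> goodcen;
    for (std::size_t i = 0; i < ok_counter.size(); ++i)
        if (ok_counter[i] > 0) goodcen.push_back(static_cast<int>(i));

    if (goodcen.empty())
        throw std::runtime_error("ran out of valid center particles");

    // uniform_int_distribution is inclusive at both ends, hence size()-1.
    std::uniform_int_distribution<std::size_t> pick(0, goodcen.size() - 1);

    // Map the position in the candidate list back to an image index.
    const int idx = goodcen[pick(rng)];
    assert(ok_counter[idx] > 0);
    return idx;
}
```

Called with counters `{0, 0, 3, 0, 1}`, the function can only return 2 or 4. An all-zero vector raises the exception, mirroring the `UnexpectedBehaviorException` in the listing.
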
void KMeansAnalyzer::set_params ( const Dict & new_params) [virtual]

Set the Analyzer parameters using a key/value dictionary.

Parameters:
new_params: A dictionary containing the new parameters.

Reimplemented from EMAN::Analyzer.

Definition at line 164 of file analyzer.cpp.

References calcsigmamean, EMAN::Dict::has_key(), maxiter, minchange, mininclass, ncls, EMAN::Analyzer::params, slowseed, and verbose.

{
        params = new_params;
        if (params.has_key("ncls")) ncls = params["ncls"];
        if (params.has_key("maxiter"))maxiter = params["maxiter"];
        if (params.has_key("minchange"))minchange = params["minchange"];
        if (params.has_key("mininclass"))mininclass = params["mininclass"];
        if (params.has_key("slowseed"))slowseed = params["slowseed"];
        if (params.has_key("verbose"))verbose = params["verbose"];
        if (params.has_key("calcsigmamean")) calcsigmamean=params["calcsigmamean"];

}
void KMeansAnalyzer::update_centers ( int  sigmas = 0) [protected]

Definition at line 228 of file analyzer.cpp.

References centers, get_attr(), EMAN::Analyzer::images, mininclass, ncls, reseed(), sqrt(), and verbose.

Referenced by analyze().

                                              {
int nptcl=images.size();
//int repr[ncls];
int * repr = new int[ncls];

for (int i=0; i<ncls; i++) {
        centers[i]->to_zero();
        if (sigmas) centers[i+ncls]->to_zero();
        repr[i]=0;
}

// compute new position for each center
for (int i=0; i<nptcl; i++) {
        int cid=images[i]->get_attr("class_id");
        if ((int)images[i]->get_attr("is_ok_center")>0) {
                centers[cid]->add(*images[i]);
                if (sigmas) centers[cid+ncls]->addsquare(*images[i]);
                repr[cid]++;
        }
}

for (int i=0; i<ncls; i++) {
        // If this class is too small
        if (repr[i]<mininclass) {
                // find all of the particles in the class, and decrement their "is_ok_center" counter.
                // when it reaches zero the particle will no longer participate in determining the location of a center
                for (int j=0; j<nptcl; j++) {
                        if ((int)images[j]->get_attr("class_id")==i) images[j]->set_attr("is_ok_center",(int)images[j]->get_attr("is_ok_center")-1);
                }
                // Mark the center for reseeding
                delete centers[i];
                centers[i]=0;
                repr[i]=0;
        }
        // finishes off the statistics we started computing above
        else {
                centers[i]->mult((float)1.0/(float)(repr[i]));
                centers[i]->set_attr("ptcl_repr",repr[i]);
                if (sigmas) {
                        centers[i+ncls]->mult((float)1.0/(float)(repr[i]));             // sum of squares over n
                        centers[i+ncls]->subsquare(*centers[i]);                                        // subtract the mean value squared
                        centers[i+ncls]->process("math.sqrt");                                  // square root
                        centers[i+ncls]->mult((float)1.0/(float)sqrt((float)repr[i]));          // divide by sqrt(N) to get std. dev. of mean
                }

        }
        if (verbose>1) printf("%d(%d)\t",i,(int)repr[i]);
}

if (verbose>1) printf("\n");

reseed();

delete [] repr;
}
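
The sigma branch of the listing accumulates a sum and a sum of squares, then computes sqrt(mean of squares - mean squared) and divides by sqrt(N) to obtain the standard deviation of the mean. The sketch below applies the same arithmetic to a single pixel position across N class members. `MeanSigma` and `mean_and_sigma_of_mean` are hypothetical names, and the clamp of small negative variances is an addition for numerical safety that does not appear in the listing.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct MeanSigma {
    float mean;           // class-average value
    float sigma_of_mean;  // standard deviation of the mean
};

// Computes the mean and the standard deviation of the mean for one pixel
// position, given that pixel's value in every member of a class.
inline MeanSigma mean_and_sigma_of_mean(const std::vector<float> &values) {
    assert(!values.empty());
    float sum = 0.0f, sumsq = 0.0f;
    for (float v : values) {
        sum += v;        // accumulates what add() builds per pixel
        sumsq += v * v;  // accumulates what addsquare() builds per pixel
    }
    const float n = static_cast<float>(values.size());
    const float mean = sum / n;
    float var = sumsq / n - mean * mean;  // sum of squares over n, minus mean squared
    if (var < 0.0f) var = 0.0f;           // assumption: clamp rounding noise
    return {mean, std::sqrt(var) / std::sqrt(n)};  // divide by sqrt(N)
}
```

For the values 1, 2, 3, 4 the mean is 2.5, the population variance is 7.5 - 6.25 = 1.25, and the standard deviation of the mean is sqrt(1.25)/2, about 0.559.
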

Member Data Documentation

int EMAN::KMeansAnalyzer::calcsigmamean [protected]

Definition at line 296 of file analyzer.h.

Referenced by analyze(), and set_params().

vector<EMData *> EMAN::KMeansAnalyzer::centers [protected]

Definition at line 288 of file analyzer.h.

Referenced by analyze(), reclassify(), reseed(), and update_centers().

int EMAN::KMeansAnalyzer::maxiter [protected]

Definition at line 292 of file analyzer.h.

Referenced by analyze(), and set_params().

int EMAN::KMeansAnalyzer::minchange [protected]

Definition at line 291 of file analyzer.h.

Referenced by analyze(), and set_params().

int EMAN::KMeansAnalyzer::mininclass [protected]

Definition at line 293 of file analyzer.h.

Referenced by analyze(), set_params(), and update_centers().

const string EMAN::KMeansAnalyzer::NAME = "kmeans" [static]

Definition at line 281 of file analyzer.h.

Referenced by get_name().

int EMAN::KMeansAnalyzer::nchanged [protected]

Definition at line 294 of file analyzer.h.

Referenced by analyze(), and reclassify().

int EMAN::KMeansAnalyzer::ncls [protected]

Definition at line 289 of file analyzer.h.

Referenced by analyze(), reclassify(), reseed(), set_params(), and update_centers().

int EMAN::KMeansAnalyzer::slowseed [protected]

Definition at line 295 of file analyzer.h.

Referenced by analyze(), and set_params().

int EMAN::KMeansAnalyzer::verbose [protected]

Definition at line 290 of file analyzer.h.

Referenced by analyze(), set_params(), and update_centers().


The documentation for this class was generated from the following files: