Differences between revisions 2 and 12 (spanning 10 versions)
Revision 2 as of 2014-04-24 14:29:52
Size: 2147
Editor: SteveLudtke
Comment:
Revision 12 as of 2017-06-02 12:07:55
Size: 6054
Editor: SteveLudtke
Comment:
Folder Arrangement in EMAN2.2 projects

Detailed description of files

You will find a detailed description of the contents of many files produced by refinements and other programs here: Output file descriptions and details (EMAN2/ProgramFiles)

Overall project folder arrangement

When using e2projectmanager.py and following canonical EMAN2 procedures, your data will be contained within a "project" with a very specific organization. For things to work properly you must not break this organization. If you are just running EMAN2 command-line programs yourself, and are not using the projectmanager or other GUI tools, you can do what you like, of course, but regardless we strongly suggest following the canonical structure.

Note: When running command-line programs within a project, it is critical that you run them from the project folder, not from subfolders. For example, if you are in the particles folder and try to build a set by referencing ../sets, you can create all sorts of havoc.

A project directory will normally contain these folders, and some other (unlisted) files:

0README.txt
info/
particles/
micrographs/  (optional)
movies/ (optional)
sets/
multi_xx/
r2d_xx/
refine_xx/
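
Since programs are expected to run from the project folder, a quick check of the current directory can catch mistakes early. The following is only a minimal sketch and not part of EMAN2: the function name is hypothetical, and treating info/, particles/ and sets/ as required is an assumption based on the listing above, where only micrographs/ and movies/ are marked optional. A newly created project may not have all of them yet.

```python
from pathlib import Path

# Assumption: folders not marked "(optional)" in the listing are expected to exist.
REQUIRED_FOLDERS = ("info", "particles", "sets")


def missing_project_folders(path="."):
    """Return the expected folders that are absent from `path`."""
    root = Path(path)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_project_folders()
    if missing:
        print("This may not be a project folder; missing:", ", ".join(missing))
    else:
        print("Current directory looks like an EMAN2 project folder.")
```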

The micrographs/ folder, if present, will contain the raw micrograph images. These are used for particle picking, whole frame CTF fitting, etc. If pre-boxed particles have been imported into the project, then this folder will not exist. Once particles have been extracted in the "Generate Output" step, the contents of this folder are not used again in the standard workflow.

The movies/ folder, if present, will contain movie-mode stack files for each micrograph present in the micrographs/ folder. These movies can be aligned with e2ddd_movie or e2ddd_particles to produce drift-corrected micrographs or particle images (see the appropriate program Wiki pages for more information on this).

The particles/ folder will contain files like:

01252014_AE_100__ctf_flip.hdf
01252014_AE_100__ctf_flip_hp.hdf
01252014_AE_100__ctf_wiener.hdf
01252014_AE_100_ptcls.hdf
01252014_AE_101__ctf_flip.hdf
01252014_AE_101__ctf_flip_hp.hdf
01252014_AE_101__ctf_wiener.hdf
01252014_AE_101_ptcls.hdf

where the __ (double underscore) denotes modifications of the same set of particles. That is, if you have a file XXX.hdf and XXX__YYY.hdf, both files will contain the same particles, but XXX__YYY.hdf may have undergone some processing. XXX.hdf should represent the original, unprocessed particle images.
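
The naming rule can be illustrated with a few lines of Python. This is a sketch of the convention as described above, not the code EMAN2 itself uses; the function name is hypothetical, and it assumes the first double underscore in the file name separates the base name from the modification.

```python
from pathlib import Path


def split_particle_name(filename):
    """Split 'XXX__YYY.hdf' into ('XXX', 'YYY').

    The second element is None when the name has no double underscore.
    """
    stem = Path(filename).stem  # drop any folder and the .hdf extension
    base, separator, modification = stem.partition("__")
    return base, (modification if separator else None)


print(split_particle_name("XXX.hdf"))                  # ('XXX', None)
print(split_particle_name("particles/XXX__YYY.hdf"))   # ('XXX', 'YYY')
```

Grouping files by the first element of the result collects all processed versions of the same particle set.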

The info/ folder will then contain files like:

01252014_AE_100_info.json
01252014_AE_101_info.json
01252014_AE_102_info.json

where each file contains the information for a particular micrograph, as documented at http://blake.bcm.edu/emanwiki/Eman2InfoMetadata. These JSON files are generally human-readable and editable, though they sometimes contain text-encoded binary data, which can make parts of them difficult to read. The e2display.py browser can be used to look at the contents of .json files in a more organized fashion.
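
For a quick look outside the browser, the files can also be opened with Python's standard json module. This is a sketch under two assumptions: that the top level of each file is a JSON object, and that the plain json module can parse it. The function name is hypothetical, and no particular key names are implied.

```python
import json


def info_keys(path):
    """Return the sorted top-level keys stored in one info/*.json file."""
    with open(path, "r") as handle:
        data = json.load(handle)
    return sorted(data)


# Example, run from the project folder:
#   info_keys("info/01252014_AE_100_info.json")
```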

sets/ will contain files like:

my_combine__ctf_flip_hp.lst
my_combine__ctf_flip.lst
my_combine__ctf_wiener.lst
my_combine_ptcls.lst

Each of these files is a text file containing references to particles in the particles/ folder. A "set" is something like the '.star' files used in packages like Relion and XMIPP. That is, it allows you to combine particles from many micrographs without keeping two copies of the (disk hungry) image data. The contents of one of these .lst files will look like:

#LSX
# This file is in fast LST format. All lines after the next line have exactly the number of characters shown on the next line. This MUST be preserved if editing.
# 47
0       particles/01252014_AE_100_ptcls.hdf
1       particles/01252014_AE_100_ptcls.hdf
2       particles/01252014_AE_100_ptcls.hdf
3       particles/01252014_AE_100_ptcls.hdf
4       particles/01252014_AE_100_ptcls.hdf
5       particles/01252014_AE_100_ptcls.hdf

Note that the references to individual particles are relative to the project folder. You can run e2proc2d.py or any other program on an LST file just as if it actually contained the images it references, but you MUST do this from the project folder, not from anywhere else. Hand-editing LSX files is not recommended: every line must have exactly the same number of characters, or the file becomes invalid.
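
To inspect a set without editing it, the file can be read as plain text. The reader below is a minimal sketch based only on the layout shown above, not EMAN2's own implementation; the function name is hypothetical. It assumes the third line holds the declared line length and that each entry starts with a particle number followed by a path. Because this page does not say whether the declared length counts the line ending, the sketch only checks that all entry lines are equally long.

```python
def read_lst(path):
    """Return (declared_length, entries) for an LSX-style .lst file.

    Each entry is a (particle_number, image_path) tuple.
    Raises ValueError if the entry lines are not all the same length.
    """
    declared = None
    entries = []
    lengths = set()
    with open(path, "r", newline="") as handle:
        for line_number, line in enumerate(handle):
            if line.startswith("#"):
                if line_number == 2:  # third line: "# <line length>"
                    declared = int(line[1:].strip())
                continue
            if not line.strip():
                continue
            # Compare lengths without the line ending, padding included.
            lengths.add(len(line.rstrip("\r\n")))
            fields = line.split()
            entries.append((int(fields[0]), fields[1]))
    if len(lengths) > 1:
        raise ValueError(f"entry lines differ in length: {sorted(lengths)}")
    return declared, entries


# Example, run from the project folder:
#   declared, entries = read_lst("sets/my_combine_ptcls.lst")
```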

The multi_xx/, r2d_xx/ and refine_xx/ folders contain the results of e2refinemulti.py, e2refine2d.py and e2refine_easy.py runs, which are documented here: EMAN2/ProgramFiles

Why this structure?

This structure is not arbitrary, and its logic has been carefully designed. If, for example, LST files contained an absolute path to a referenced image file, like /home/stevel/data/particles/abc.hdf, then if I later decided to move my project to a different hard drive, the LST file would no longer work properly. Similarly, if the LST files in sets contained references to ../particles/abc.hdf, it would be confusing because that means you would have to run programs from within the sets directory for the references to be valid.

To deal with these and other issues, the overall structure is such that all references are made relative to the project folder, and it is expected that command-line programs will all be executed from the project folder. This makes the project the main organizational unit for data. A project folder can be moved around from disk to disk or machine to machine without anything breaking. Additional sub-folders can be made within a project, as long as the rules are followed, and programs are run from the project level. That is, with this scheme, you never have to ask yourself, "now what folder should I be in when I run my refinement?" The answer is always "the project folder".
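
The effect of project-relative references can be shown with a small path calculation. This is purely illustrative: the function name and the /mnt/newdisk location are hypothetical, and the sketch only mimics how a relative path is interpreted against the directory a program is started in.

```python
from pathlib import PurePosixPath


def resolve(working_dir, reference):
    """Return where a relative reference points when run from working_dir."""
    return PurePosixPath(working_dir) / reference


reference = "particles/abc.hdf"  # as stored in an LST file

# Run from the project folder: the reference is valid.
print(resolve("/home/stevel/data", reference))

# Project moved to another disk: the same reference is still valid.
print(resolve("/mnt/newdisk/data", reference))

# Run from inside sets/: the reference points at a file that does not exist.
print(resolve("/home/stevel/data/sets", reference))
```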

EMAN2/DirectoryStructure (last edited 2018-12-09 14:38:28 by SteveLudtke)