Line 1: | Line 1: |
## page was renamed from EMAN2ImageFormats | |
Line 2: | Line 3: |
= Table of supported image formats in EMAN2 = ||Type ||Extension ||Read ||Write ||3D ||Image Stacks ||Volume Stacks ||Bit Trunc. ||Region I/O ||Comments || ||||||||||||||||<style="text-align:left">'''Primary EMAN2 Format''' || ||HDF5 ||hdf ||Y ||Y ||Y ||Y ||Y ||Y ||Y ||HDF5 is an international standard for scientific data (http://www.hdfgroup.org/HDF5/). It supports arbitrary metadata (header info) and is very portable. This is the standard interchange format for EMAN2. Chimera can read EMAN2 style HDF files. || ||||||||||||||||<style="text-align:left">'''Cryo-EM Formats''' || ||DM2 (Gatan) ||dm2 ||Y ||N ||N ||N ||N ||N ||N ||Proprietary Gatan format (older version) || ||DM3 (Gatan) ||dm3 ||Y ||N ||N ||N || ||N ||N ||Proprietary Gatan format from Digital Micrograph || ||DM4 (Gatan) ||dm4 ||Y ||N ||Y ||Y || ||N ||N ||Proprietary Gatan format from Digital Micrograph, used with K2 cameras || ||SER (FEI) ||ser ||Y ||N ||N ||Y || ||N ||N ||Proprietary FEI format (Falcon camera ?) || ||EER (TF) ||eer||Y ||N ||N ||Y ||N ||N ||N ||Falcon 4 camera counting mode format. Extremely large frame count with RLE compression to make frames very small. Supports up to 4x oversampling of counting data. Default reader is without oversampling. See below for details.|| ||EM ||em ||Y ||Y ||Y ||N || ||N ||Y ||As produced by the EM software package || ||ICOS ||icos ||Y ||Y ||Y ||N || ||N ||Y ||Old icosahedral format || ||Imagic ||img/hed ||Y ||Y ||Y ||Y ||Y ||N ||Y ||This format stores header and image data in 2 separate files. Region I/O is only available for 2D. The Imagic format in EMAN2 is fully compatible with Imagic4D standard since the 2.0 release. || ||MRC ||mrc ||Y ||Y ||N ||Y || ||N ||Y ||Largely compatible with CCP4. Note that some programs will treat 3D MRC files as stacks of 2D imagess (like IMOD). 
This behavior is partially supported in EMAN, but be aware that it is impossible to store metadata about each image in the stack when doing this, so it is not suitable as an export format for single particle work. EMAN2 supports reading of FEI MRC, which is an extended MRC format for tomography. The extra header information will be read into the header. All FEI MRC images will be 2-byte integer. || ||MRCS ||mrcs ||Y ||Y ||N ||Y ||soon? ||N ||Y ||Identical to MRC format above. If the filename is .mrcs, then a 3-D volume file will automatically be treated as a stack of 2-D images. If any other extension is used, it will appear to be a single 3-D volume. || ||Spider Stack ||spi ||Y ||Y ||Y ||Y || ||N ||Y ||To read the overall image header in a stacked spider file, use image_index = -1. || ||Spider Single ||spi ||Y ||Y ||Y ||N || ||N ||Y ||Specify "--outtype=spidersingle" to use with e2proc2d/3d || ||SER ||ser ||Y ||N ||N ||Y || ||N ||N ||Also known as TIA (Emospec) file format, used by FEI Tecnai and Titan microscope for acquiring and displaying scanned images and spectra || ||PIF ||pif ||Y ||Y ||Y ||Y || ||N ||N ||Purdue Image Format. This will read most, but not all PIF images. Recent support added for mode 40 and 46 (boxed particles). Some of the FFT formats cannot be read by EMAN2. PIF writing is normally done in FLOAT mode, which is not used very often in PIF. PIF technically permits only images with odd dimensions, EMAN does not enforce this. || ||BDB ||N/A ||Y ||Y ||Y ||Y || ||N ||Y ||This entry is for EMAN2's (retired) embedded database system. 
While it is still possible to read/write BDB's for backwards compatibility, we do not suggest any new use of this format in EMAN2 (SPARX still uses it for many operations) || ||||||||||||||||<style="text-align:left">'''Other Supported Formats''' || ||Amira ||am ||Y ||Y ||Y ||N || ||N ||N ||A native format for the Amira visualization package || ||DF3 ||df3 ||Y ||Y ||Y ||N || ||N ||N ||File format for POV-Ray, support 8,16,32 bit integer per pixel || ||FITS ||fts ||Y ||N ||Y ||N || ||N ||N ||Widely used file format in astronomy || ||JPEG ||jpg/jpeg ||N ||Y ||N ||N || ||N ||N ||Note that JPEG images use lossy compression and are NOT suitable for quantitative analysis. PNG (lossless compression) is a better alternative unless file size is of critical importance. || ||LST ||lst ||Y ||Y ||Y ||Y || ||N ||N ||ASCII file contains a list of image file names and numbers. Two variants, LST and LSX. LSX is normally used in EMAN2 and has the additional restraint that all lines have the same length. || ||LSTFAST ||lsx/lst ||Y ||Y ||Y ||Y || ||N ||N ||Optimized version of LST || ||OMAP ||omap ||Y ||N ||Y ||N || ||N ||N ||Also called DSN6 map, 1 byte integer per pixel || ||PGM ||pgm ||Y ||Y ||N ||N || ||N ||N ||Standard graphics format with 8 bit greyscale images. No compression. || ||PNG ||png ||Y ||Y ||N ||N || ||N ||N ||Excellent format for presentations. Lossless data compression, 8 bit or 16 bit per pixel || ||SAL ||hdr/img ||Y ||N ||N ||N || ||N ||N ||Scans-A-Lot. Old proprietary scanner format. Separate header and data file || ||SITUS ||situs ||Y ||Y ||Y ||N || ||N ||N ||Situs-specific ASCII format on a cubic lattice. Used by Situs programs || ||TIFF ||tiff/tif ||Y ||Y ||N ||N || ||N ||N ||Good format for use with programs like photoshop. Some variants are good for quantitative analysis, but JPEG compression should be avoided. || ||V4L ||v4l ||Y ||N ||N ||N || ||N ||N ||Used by some video-capture boards in Linux. 
Acquires images from the V4L2 interface in real-time(video4linux). || ||VTK ||vtk ||Y ||Y ||Y ||N || ||N ||N ||Native format from Visualization Toolkit || ||XPLOR ||xplor ||Y ||Y ||Y ||N || ||N ||N ||8 bytes integer, 12.5E float ASCII format || |
|
Line 3: | Line 40: |
= Table of supported image formats in EMAN2 = ||Type ||Extension ||Read ||Write ||3D ||Image Stacks ||Region I/O ||Comments || ||||||||||||||||<(>'''Primary EMAN2 Format'''|| ||HDF5 ||hdf ||Y ||Y ||Y ||Y ||Y ||HDF5 is an international standard for scientific data (http://www.hdfgroup.org/HDF5/). It supports arbitrary metadata (header info) and is very portable. This is the standard interchange format for EMAN2. Chimera can read EMAN2 style HDF files. || ||||||||||||||||<(>'''Cryo-EM Formats'''|| ||DM2 (Gatan)||dm2 ||Y ||N ||N ||N ||N ||Proprietary Gatan format (older version) || ||DM3 (Gatan) ||dm3 ||Y ||N ||N ||N ||N ||Proprietary Gatan format from Digital Micrograph || ||DM4 (Gatan) ||dm4 ||Y ||N ||Y ||Y ||N ||Proprietary Gatan format from Digital Micrograph, used with K2 cameras|| ||SER (FEI) ||ser ||Y ||N ||N ||Y ||N ||Proprietary FEI format (Falcon camera ?)|| ||EM ||em ||Y ||Y ||Y ||N ||Y ||As produced by the EM software package || ||ICOS ||icos ||Y ||Y ||Y ||N ||Y ||Old icosahedral format || ||Imagic ||img/hed ||Y ||Y ||Y ||Y ||Y ||This format stores header and image data in 2 separate files. Region I/O is only available for 2D. The Imagic format in EMAN2 is fully compatible with Imagic4D standard since the 2.0 release. || ||MRC ||mrc ||Y ||Y ||Y ||N ||Y ||Largely compatible with CCP4. Note that some programs will treat 3D MRC files as stacks of 2D imagess (like IMOD). This behavior is partially supported in EMAN, but be aware that it is impossible to store metadata about each image in the stack when doing this, so it is not suitable as an export format for single particle work. EMAN2 support reading of FEI MRC, which is an extended MRC format for tomography. The extra header information will be read into the header. All FEI MRC images will be 2-byte integer. || ||Spider ||spi ||Y ||Y ||Y ||Y ||Y ||To read the overall image header in a stacked spider file, use image_index = -1. 
|| ||SER ||ser ||Y ||N ||N ||Y ||N ||Also known as TIA (Emospec) file format, used by FEI Tecnai and Titan microscope for acquiring and displaying scanned images and spectra || ||BDB ||N/A ||Y ||Y ||Y ||Y ||Y ||This entry is for EMAN2's (retired) embedded database system. While it is still possible to read/write BDB's for backwards compatibility, we do not suggest any new use of this format in EMAN2 (SPARX still uses it for many operations) || ||||||||||||||||<(>'''Other Supported Formats'''|| ||Amira ||am ||Y ||Y ||Y ||N ||N ||A native format for the Amira visualization package || ||DF3 ||df3 ||Y ||Y ||Y ||N ||N ||File format for POV-Ray, support 8,16,32 bit integer per pixel || ||FITS ||fts ||Y ||N ||Y ||N ||N ||Widely used file format in astronomy || ||JPEG ||jpg/jpeg ||N ||Y ||N ||N ||N ||Note that JPEG images use lossy compression and are NOT suitable for quantitative analysis. PNG (lossless compression) is a better alternative unless file size is of critical importance. || ||LST ||lst ||Y ||Y ||Y ||Y ||N ||ASCII file contains a list of image file names and numbers. Used in EMAN1 to avoid large files. Not commonly used in EMAN2 || ||LSTFAST ||lsx/lst ||Y ||Y ||Y ||Y ||N ||Optomized version of LST || ||OMAP ||omap ||Y ||N ||Y ||N ||N ||Also called DSN6 map, 1 byte integer per pixel || ||PGM ||pgm ||Y ||Y ||N ||N ||N ||Standard graphics format with 8 bit greyscale images. No compression. || ||PIF ||pif ||Y ||Y ||Y ||Y ||N ||Purdue Image Format. This will read most, but not all PIF images. Recent support added for mode 40 and 46 (boxed particles). Some of the FFT formats cannot be read by EMAN2. PIF writing is normally done in FLOAT mode, which is not used very often in PIF. PIF technically permits only images with odd dimensions, EMAN does not enforce this. || ||PNG ||png ||Y ||Y ||N ||N ||N ||Excellent format for presentations. Lossless data compression, 8 bit or 16 bit per pixel || ||SAL ||hdr/img ||Y ||N ||N ||N ||N ||Scans-A-Lot. 
Old proprietary scanner format. Separate header and data file || ||SITUS ||situs ||Y ||Y ||Y ||N ||N ||Situs-specific ASCII format on a cubic lattice. Used by Situs programs || ||TIFF ||tiff/tif ||Y ||Y ||N ||N ||N ||Good format for use with programs like photoshop. Some variants are good for quantitative analysis, but JPEG compression should be avoided. || ||V4L ||v4l ||Y ||N ||N ||N ||N ||Used by some video-capture boards in Linux. Acquires images from the V4L2 interface in real-time(video4linux). || ||VTK ||vtk ||Y ||Y ||Y ||N ||N ||Native format from Visualization Toolkit || ||XPLOR ||xplor ||Y ||Y ||Y ||N ||N ||8 bytes integer, 12.5E float ASCII format || |
== Image files in EMAN == Virtually all cryo-EM file formats are supported as well as many generic image formats. The default format used in EMAN2 processing is HDF5, which supports stacks of 2-D and 3-D images as well as arbitrary header information for each image in the file. If you convert an image to a format like MRC, you will lose any metadata not compatible with that format. * '''Any''' program in EMAN2 should directly read '''any''' supported file format without conversion. (Specific programs may require header information not available in all formats) * '''Most''' programs can write images to '''any''' output file format, determined by the filename you use. However, we strongly suggest using HDF unless you are transferring data to other software, as any other format will lose header information. * ''e2proc2d.py'' and ''e2proc3d.py'' can be used to explicitly convert files among specified file formats with specific data types, and are also used for general-purpose image processing. * ''e2display.py'' and ''e2projectmanager.py'' (via the file browser) can be used with the ''Save as'' button to convert to arbitrary output formats using a graphical interface. * '''MRC/CCP4''' have a number of special issues, please see the appropriate section below. == Compression and Bit Truncation == EMAN2 supports bit truncation with lossless compression as a mechanism for reducing file size without information loss. The file size reductions can be quite dramatic for raw data. Compression is currently supported only in HDF5 format, but bit truncation is supported for all formats, and bit-truncated files can be very effectively (losslessly) compressed with command-line tools, such as gzip, bzip2 or similar. The main use case for bit truncation is gain normalized movie averages being stored as 32 bit floating point values (or even 16 bit integers). 
For raw counting mode movie data collected on direct detectors, we recommend using the manufacturer's recommended storage mechanism (compressed TIFF, EER, etc.). Bit truncation is only appropriate for movie averages, not for the individual movie frames.

=== Using Bit Truncation ===
 * Most programs in the normal EMAN single particle and subtomogram averaging pipelines will automatically perform bit truncation at safe levels by default.
 * Many programs accept the --compressbits command line option, which will perform truncation to the specified number of bits. compressbits=0 will disable bit truncation, but in HDF5 will still perform lossless compression of the raw floating point numbers.
 * In EMAN2 versions after 6/15/2022 you may override any defaults for any output file by replacing <filename> with <filename>:bits[:min:max]
  * If [:min:max] is omitted, default behavior is to clamp outliers to reduced values to retain sufficient histogram spread in the integer representation. Only a small number of outliers (1/20k on each end) will be removed.
  * [:min:max] represent the minimum and maximum values which will be represented by the integer range, and may be specified in several different ways:
   * [:absolute min:absolute max] specified as actual image values, eg - :-4.5:4.5 would extend from -4.5 to 4.5 in actual image values
   * [:s<nmin>:s<nmax>] specified as a multiple of the standard deviation from the mean, eg - :s3:s3 would extend from mean-3*sigma to mean+3*sigma
   * [:f] will use the full range of image values, including outliers. This is appropriate for images such as masks, but should be avoided with raw CryoEM data.

Examples:
 * e2proc2d.py particles.mrcs particles.hdf:5 - convert an MRC particle stack to HDF with 5 bits of retained precision and automatic removal of extreme outliers. ~5-6x storage reduction typical
 * e2proc3d.py tomogram.mrc tomogram.hdf:10:3s:3s - convert an MRC tomogram to HDF with 10 bits of precision, truncating data outside +-3*sigma
 * e2proc3d.py mask.mrc mask_10.hdf:10:f - convert a soft mask (0-1.0) to 10 bits, retaining the full range of values.
 * e2proc3d.py map.mrc map_10.mrc:10:f - convert a reconstructed map to a 10 bit representation in MRC format. This will immediately reduce file size by 2x if the original volume was floating point. gzip/bzip/etc would produce significant additional compression. Note, however, that the new MRC volume would be mapped to a 0-1023 range rather than the original floating point range. In HDF format, the original range is restored upon read in EMAN2.

=== General recommendations ===
 * For movie averaged/gain corrected/aligned frames, 5 bits should be a safe level of retention for virtually any situation.
 * For class-averages or 3-D volumes, typically 8-12 bits is appropriate, depending on your precision needs.
 * Note that conversion to integers may cause the transition from "almost exactly zero" to "exactly zero" in soft masks to appear to shift somewhat, but in practice this generally has no impact.
 * For raw (unfiltered) cryo-tomograms, 5 bits is safe, similar to raw micrographs. After low-pass filtration/denoising, arguments could be made for preserving more bits, though the actual mathematical need to do so is questionable.
 * For negative stain data, arguments for retaining up to 8 bits could be made if very high acquisition doses are used, though impact on data interpretation is unlikely. |
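To make the idea of bit truncation concrete, here is a minimal pure-Python sketch of mapping floating point values onto an N-bit integer range and back. It is an illustration only and not EMAN2's actual implementation: the function names `truncate_bits` and `restore` are hypothetical, and when no range is given the sketch simply uses the full range of the data (similar in spirit to the `:f` option), without the automatic outlier removal described above.

```python
def truncate_bits(values, bits, vmin=None, vmax=None):
    """Map floats onto integers in [0, 2**bits - 1].

    Values outside [vmin, vmax] are clamped to the range limits.
    If vmin/vmax are omitted, the full data range is used (assumption
    made for this sketch; EMAN2's default removes extreme outliers).
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    if hi <= lo:
        # Degenerate range: everything maps to level 0
        return [0] * len(values), lo, hi
    levels = (1 << bits) - 1
    scale = levels / (hi - lo)
    quantized = [round((min(max(v, lo), hi) - lo) * scale) for v in values]
    return quantized, lo, hi


def restore(quantized, bits, lo, hi):
    """Map the integer levels back to approximate floating point values."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + q * step for q in quantized]


# 10 bits over an explicit range of -4.5 to 4.5; 10.0 is an outlier and is clamped
q, lo, hi = truncate_bits([-4.5, 0.0, 4.5, 10.0], 10, -4.5, 4.5)
approx = restore(q, 10, lo, hi)
```

Because the stored integers use only a few distinct levels, a lossless compressor has far less entropy to encode, which is where the file size reduction comes from.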
Line 42: | Line 82: |
== Special issues for MRC/CCP4 files == MRC/CCP4 format supports a single 1-D, 2-D or 3-D image, with an associated header. At some point in time, someone decided it would be a good idea to store sets of 2-D particle images as "stacks" in 3-D. That is, a set of NZ identically sized NX x NY images are stacked to make a single 3-D pseudo-volume image. The problem is that the original format was not designed for this, and there is no consistent way a program can tell if an MRC file contains a true volume (like a 3-D reconstruction) or a stack of 2-D images. While a number of developers have recently agreed upon a standard way of doing this in future, the last 30 years of files floating around in the community don't have this information stored in a consistent way. So, EMAN2 programs requiring a set of 2-D particle images will only read MRC stack files in specific situations. |
== Special issues for MRC/MRCS/CCP4 files == MRC/CCP4 format supports a single 1-D, 2-D or 3-D image, with an associated header. At some point in time, someone decided it would be a good idea to store sets of 2-D particle images as "stacks" in 3-D. That is, a set of NZ identically sized NX x NY images are stacked to make a single 3-D pseudo-volume image. The problem is that the original format was not designed for this, and historically there was no consistent way a program could tell if an MRC file contains a true volume (like a 3-D reconstruction) or a stack of 2-D images. While a number of developers have recently agreed upon a standard way of doing this in future, the last 30 years of files floating around in the community don't have this information stored in a consistent way. As of EMAN2.1, stack files should use the ".mrcs" extension and single volumes should use the ".mrc" extension. ".mrc" files will always be read as if they contain a single image, and ".mrcs" files can never be 3-D. This may evolve in the future as the new standards become more refined. |
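The extension rule described above can be summarized in a few lines of Python. This is only a sketch of the convention, not code from EMAN2: `mrc_layout` is a hypothetical helper, and `nx`, `ny`, `nz` stand for the dimensions stored in the MRC header.

```python
import os


def mrc_layout(filename, nx, ny, nz):
    """Describe how an MRC-format file would be interpreted, by extension.

    ".mrcs" -> a stack of nz 2-D images, each nx x ny
    anything else -> a single image (a 3-D volume when nz > 1)
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".mrcs":
        return {"n_images": nz, "image_shape": (nx, ny)}
    shape = (nx, ny, nz) if nz > 1 else (nx, ny)
    return {"n_images": 1, "image_shape": shape}


stack = mrc_layout("particles.mrcs", 64, 64, 100)   # 100 images of 64 x 64
volume = mrc_layout("map.mrc", 64, 64, 64)          # one 64 x 64 x 64 volume
```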
Line 45: | Line 85: |
To convert or operate on such 2-D -> 3-D stack files, ''e2proc2d.py'' has a number of special options. These options are compatible with most other options in ''e2proc2d.py'' : | Additionally, there are options in the e2proc2d.py command which will treat single volume files as stacks of images without the .mrcs extension (again, if you just use the .mrcs extension, these methods should not be required) : |
Line 50: | Line 91: |
''Note: The term ''stack file'' is used universally to refer to a single file containing multiple images, regardless of whether they are 'stacked' into a single 3-D image, or actually a set of 2-D images, each with its own header. It is even possible in some formats (HDF and IMAGIC) to have stacks of 3-D volumes. '' | These options can also be used with other file formats. == Special issues for EER files == EER is a specialized format for the Falcon4 camera which records actual pixel events at a very high effective framerate. Individual frames are RLE encoded, so despite storing up to 4x superresolution counting mode images, file size is still smaller than a gain corrected MRC stack averaged to 30 FPS. The default reader will operate without oversampling. You will need to specify an option, --eer2x or --eer4x, with e2proc2d.py to read super-resolution data instead (8k x 8k or 16k x 16k). To make use of these files you normally also need to have the appropriate gain reference image from the Falcon 4, which at the time of this writing is stored in FEIRAW format. There is a program in examples, which can convert FEIRAW files to any other format you like, but they aren't natively read by other EMAN2 programs. Several bugs were fixed in mid-October 2020, so it is critical that you use a version dated 10/22/20 or later! Here is a workflow for processing gain references with EER files: {{{#!highlight bash examples/feiraw2hdf.py gain_post_ec_eer.raw gain_post_ec_eer.hdf e2proc2d.py gain_post_ec_eer.hdf gain_norm.hdf --process math.reciprocal e2proc2d.py myimage.eer moviestack.mrcs --avgseq 60 e2proc2d.py moviestack.mrcs moviestack.mrcs --inplace --mult gain_norm.hdf }}} * It may be possible to merge steps 3 and 4. * there was previously a --translate 1,0 in step 2, but that should no longer be necessary. 
If you observe a problem which seems to be due to the gain image being translationally misaligned, please report it on the Google group * The ''--avgseq 60'' options says to average sets of 60 frames in the original EER file, so an 1800 frame EER would produce a 30 frame normalized movie. 60 can be changed to any desired value. |
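The arithmetic behind the workflow above can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not what e2proc2d.py does internally: the helper names are hypothetical, each frame is represented as a flat list of pixel values, an incomplete trailing group of frames is assumed to be dropped, and zero gain values are assumed to map to zero in the reciprocal.

```python
def average_groups(frames, group_size):
    """Average consecutive groups of `group_size` frames (like --avgseq)."""
    averaged = []
    for start in range(0, len(frames) - group_size + 1, group_size):
        group = frames[start:start + group_size]
        averaged.append([sum(px) / group_size for px in zip(*group)])
    return averaged


def reciprocal(gain):
    """Per-pixel reciprocal of the gain reference (zero stays zero here)."""
    return [1.0 / g if g != 0 else 0.0 for g in gain]


def apply_gain(frames, gain_norm):
    """Multiply every frame by the normalized gain image (like --mult)."""
    return [[p * g for p, g in zip(frame, gain_norm)] for frame in frames]


# 1800 tiny two-pixel "frames" averaged in sets of 60 give a 30 frame movie
raw = [[float(i), 2.0] for i in range(1800)]
movie = average_groups(raw, 60)
normalized = apply_gain(movie, reciprocal([2.0, 4.0]))
```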
Line 53: | Line 115: |
Line 57: | Line 118: |
The specification for reading/writing images is: {{{#!python # note that optional arguments [ ] below require all previous arguments to be specified # Read multiple images at once, class-method imagelist=EMData.read_images(filename,[image#_list],[header_only]) # Read a single image img=EMData() img.read_image(filename,[image#],[header_only],[Region]) # or img=EMData(filename,[image#],[header_only]) # write a single image EMData.write_image(filename,image#,[filetype],[header_only],[Region],[Datatype]) }}} Where filename, is the name of the file containing the image data, in any supported format, image# is the zero-indexed image number within the file, image#_list is a python list or tuple of image numbers, header_only is a boolean flag indicating that only the header should be read/written from/to the file, Region is a Region(x0,y0,xsize,ysize) or Region(x0,y0,z0,xsize,ysize,zsize) object. Filetype can be : ''IMAGE_UNKNOWN, IMAGE_AMIRA, IMAGE_IMAGIC, IMAGE_PIF, IMAGE_SPIDER, IMAGE_VTK, IMAGE_DM3, IMAGE_GATAN2, IMAGE_LST, IMAGE_PNG, IMAGE_TIFF, IMAGE_XPLOR, IMAGE_DM4, IMAGE_HDF, IMAGE_MRC, IMAGE_SAL, IMAGE_EM, IMAGE_ICOS, IMAGE_PGM, IMAGE_SINGLE_SPIDER, IMAGE_V4L''. If IMAGE_UNKNOWN is used on write, then the file extension will be used to determine the filetype. Note that since MRC format does not distinguish between 3-D volumes and stacks of 2-D images, the '.mrcs' extension MUST be used for stack files, and the '.mrc' extension MUST be used for non-stack volume data. Datatype can be: ''EM_CHAR, EM_FLOAT, EM_INT, EM_UINT, EM_USHORT, EM_DOUBLE, EM_FLOAT_COMPLEX, EM_SHORT, EM_UCHAR, EM_USHORT_COMPLEX, EM_SHORT_COMPLEX''. While SHORT_COMPLEX types are defined, they should never be actually used. FLOAT_COMPLEX is only really usable for HDF files. Strongly suggest not reading/writing complex images, and simply recomputing the FFT instead. Not all file formats support all data types! 
If no image#_list is specified to read_images, then ALL images in the file will be read in. |
|
Line 59: | Line 146: |
# Create a new EMData object and initializes it with the first image in "myimage.hdf". | # Create a new EMData object and initializes it with the first image in "myimage.hdf". |
Line 76: | Line 163: |
# Create a new EMData object with ONLY HEADER INFORMATION from the 5th image | # Create a new EMData object with ONLY HEADER INFORMATION from the 5th image |
Line 82: | Line 169: |
Line 84: | Line 170: |
Region I/O permits reading or writing sub-images/volumes from within a file. This is useful when processing huge files (like full 4k tomograms) on machines with limited RAM. The region specification is the same as in EMData::get_clip() function. For region reading, it is possible to specify extending outside the actual image dimensions (missing areas are filled with 0), though this generally isn't a good idea. For region writing, the region must be completely inside image bounds. | Region I/O permits reading or writing sub-images/volumes from within a file. It is not supported for all file formats. This is useful when processing huge files (like full 4k tomograms) on machines with limited RAM. For region reading, it is possible to specify a Region extending outside the actual image dimensions, though this generally isn't a good idea. For region writing, the region must be completely inside image bounds. |
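As an illustration of what reading a region means, the following pure-Python sketch extracts a 2-D sub-image from an image stored as a list of rows. It is not EMAN2 code: `read_region` is a hypothetical helper, and filling areas outside the image with 0 follows the earlier wording of this paragraph rather than anything verified against the current implementation.

```python
def read_region(image, x0, y0, xsize, ysize):
    """Return the xsize x ysize region whose origin is (x0, y0).

    `image` is a list of rows. Parts of the region that fall outside
    the image are filled with 0.0 (assumption for this sketch).
    """
    ny = len(image)
    nx = len(image[0]) if ny else 0
    out = [[0.0] * xsize for _ in range(ysize)]
    for y in range(max(y0, 0), min(y0 + ysize, ny)):
        for x in range(max(x0, 0), min(x0 + xsize, nx)):
            out[y - y0][x - x0] = image[y][x]
    return out


img = [[0.0, 1.0, 2.0],
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0]]
inside = read_region(img, 0, 0, 2, 2)    # fully inside the image
partial = read_region(img, 1, 1, 3, 3)   # extends past the right and bottom edges
```

The point of region I/O is that only the requested block needs to be held in memory, which is what makes it useful for very large tomograms.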
Line 93: | Line 179: |
Line 96: | Line 181: |
Line 103: | Line 187: |
img.write_image('short-image.mrc', 0, EMUtil.ImageType.IMAGE_MRC, False, None, EMUtil.EMDataType.EM_SHORT) #write mrc file in short (16bit) img.write_image('byte-image.mrc', 0, EMUtil.ImageType.IMAGE_MRC, False, None, EMUtil.EMDataType.EM_UCHAR) #write mrc file in byte (8bit) |
img.write_image('short-image.mrc', 0, IMAGE_MRC, False, None, EM_SHORT) #write mrc file in short (16bit) img.write_image('byte-image.mrc', 0, IMAGE_MRC, False, None, EM_UCHAR) #write mrc file in byte (8bit) img.write_image('byte-image.spi', 0, IMAGE_UNKNOWN, False, None, EM_FLOAT) #write mrc file in byte (8bit) |
Line 106: | Line 191: |
In the last write_image() function call, 'byte-image.mrc' is the output file name. The second argument is the index of the image in stack files. MRC format only supports Z-stacks, not multiple image stacks, so this number must always be 0 for MRC. The third argument, EMUtil.ImageType.IMAGE_MRC, is the file type you are writing to; typing 'help(EMUtil.ImageType)' in Python will print out all types supported by the specific version of EMAN2 you are using. The fourth argument, False, tells it to write both header and image data. If this were True, only the header would be written. The fifth argument, None, means we are not doing Region I/O. This could optionally be a ''Region()'' object specifying that only part of the image data should be written to disk (any other data being left unchanged). The last argument, EMUtil.EMDataType.EM_UCHAR, specifies the data storage type for this image file.

= WRITING TO THE HEADER OF AN IMAGE =
For some reason, you have to specify the type of an image via a monstrous flag if you want to write to the header of the image without actually opening/loading it. Say you load ONLY the header of an image (in python) by doing:
{{{
a=EMData('myimage.hdf',0,True)   # "0" means "load the first image in the file/stack", while "True" means "load the header only".
}}}
And then you define a new header parameter:
{{{
a['my_new_parameter'] = 'whatever_value'
}}}
To write out the new header into the image, you cannot simply say ''a.write_image('myimage.hdf',0,True)'', but actually have to do it this way:
{{{
a.write_image('myimage.hdf',0,EMUtil.ImageType.IMAGE_HDF,True)
}}}
''EMUtil.ImageType.IMAGE_HDF'' can either correspond to the image format you are writing to, or be left undefined: ''EMUtil.ImageType.IMAGE_UNKNOWN'', '''BUT''' you have to write it out nonetheless (it is what it is...). Note that this is NOT the case if you load the ENTIRE image (as opposed to just the header).
If you load the ENTIRE image, you can reasonably set or reset a value on the header as follows: {{{ img=EMData('my_file.hdf',0) img['my_parameter']=whatever_value img.write_image('test.hdf',0) }}} |
Table of supported image formats in EMAN2
||Type ||Extension ||Read ||Write ||3D ||Image Stacks ||Volume Stacks ||Bit Trunc. ||Region I/O ||Comments ||
||||||||||||||||<style="text-align:left">'''Primary EMAN2 Format''' ||
||HDF5 ||hdf ||Y ||Y ||Y ||Y ||Y ||Y ||Y ||HDF5 is an international standard for scientific data (http://www.hdfgroup.org/HDF5/). It supports arbitrary metadata (header info) and is very portable. This is the standard interchange format for EMAN2. Chimera can read EMAN2 style HDF files. ||
||||||||||||||||<style="text-align:left">'''Cryo-EM Formats''' ||
||DM2 (Gatan) ||dm2 ||Y ||N ||N ||N ||N ||N ||N ||Proprietary Gatan format (older version) ||
||DM3 (Gatan) ||dm3 ||Y ||N ||N ||N || ||N ||N ||Proprietary Gatan format from Digital Micrograph ||
||DM4 (Gatan) ||dm4 ||Y ||N ||Y ||Y || ||N ||N ||Proprietary Gatan format from Digital Micrograph, used with K2 cameras ||
||SER (FEI) ||ser ||Y ||N ||N ||Y || ||N ||N ||Proprietary FEI format (Falcon camera ?) ||
||EER (TF) ||eer ||Y ||N ||N ||Y ||N ||N ||N ||Falcon 4 camera counting mode format. Extremely large frame count with RLE compression to make frames very small. Supports up to 4x oversampling of counting data. Default reader is without oversampling. See below for details. ||
||EM ||em ||Y ||Y ||Y ||N || ||N ||Y ||As produced by the EM software package ||
||ICOS ||icos ||Y ||Y ||Y ||N || ||N ||Y ||Old icosahedral format ||
||Imagic ||img/hed ||Y ||Y ||Y ||Y ||Y ||N ||Y ||This format stores header and image data in 2 separate files. Region I/O is only available for 2D. The Imagic format in EMAN2 is fully compatible with Imagic4D standard since the 2.0 release. ||
||MRC ||mrc ||Y ||Y ||N ||Y || ||N ||Y ||Largely compatible with CCP4. Note that some programs will treat 3D MRC files as stacks of 2D images (like IMOD). This behavior is partially supported in EMAN, but be aware that it is impossible to store metadata about each image in the stack when doing this, so it is not suitable as an export format for single particle work. EMAN2 supports reading of FEI MRC, which is an extended MRC format for tomography. The extra header information will be read into the header. All FEI MRC images will be 2-byte integer. ||
||MRCS ||mrcs ||Y ||Y ||N ||Y ||soon? ||N ||Y ||Identical to MRC format above. If the filename is .mrcs, then a 3-D volume file will automatically be treated as a stack of 2-D images. If any other extension is used, it will appear to be a single 3-D volume. ||
||Spider Stack ||spi ||Y ||Y ||Y ||Y || ||N ||Y ||To read the overall image header in a stacked spider file, use image_index = -1. ||
||Spider Single ||spi ||Y ||Y ||Y ||N || ||N ||Y ||Specify "--outtype=spidersingle" to use with e2proc2d/3d ||
||SER ||ser ||Y ||N ||N ||Y || ||N ||N ||Also known as TIA (Emospec) file format, used by FEI Tecnai and Titan microscope for acquiring and displaying scanned images and spectra ||
||PIF ||pif ||Y ||Y ||Y ||Y || ||N ||N ||Purdue Image Format. This will read most, but not all PIF images. Recent support added for mode 40 and 46 (boxed particles). Some of the FFT formats cannot be read by EMAN2. PIF writing is normally done in FLOAT mode, which is not used very often in PIF. PIF technically permits only images with odd dimensions, EMAN does not enforce this. ||
||BDB ||N/A ||Y ||Y ||Y ||Y || ||N ||Y ||This entry is for EMAN2's (retired) embedded database system. While it is still possible to read/write BDB's for backwards compatibility, we do not suggest any new use of this format in EMAN2 (SPARX still uses it for many operations) ||
||||||||||||||||<style="text-align:left">'''Other Supported Formats''' ||
||Amira ||am ||Y ||Y ||Y ||N || ||N ||N ||A native format for the Amira visualization package ||
||DF3 ||df3 ||Y ||Y ||Y ||N || ||N ||N ||File format for POV-Ray, support 8,16,32 bit integer per pixel ||
||FITS ||fts ||Y ||N ||Y ||N || ||N ||N ||Widely used file format in astronomy ||
||JPEG ||jpg/jpeg ||N ||Y ||N ||N || ||N ||N ||Note that JPEG images use lossy compression and are NOT suitable for quantitative analysis. PNG (lossless compression) is a better alternative unless file size is of critical importance. ||
||LST ||lst ||Y ||Y ||Y ||Y || ||N ||N ||ASCII file contains a list of image file names and numbers. Two variants, LST and LSX. LSX is normally used in EMAN2 and has the additional restraint that all lines have the same length. ||
||LSTFAST ||lsx/lst ||Y ||Y ||Y ||Y || ||N ||N ||Optimized version of LST ||
||OMAP ||omap ||Y ||N ||Y ||N || ||N ||N ||Also called DSN6 map, 1 byte integer per pixel ||
||PGM ||pgm ||Y ||Y ||N ||N || ||N ||N ||Standard graphics format with 8 bit greyscale images. No compression. ||
||PNG ||png ||Y ||Y ||N ||N || ||N ||N ||Excellent format for presentations. Lossless data compression, 8 bit or 16 bit per pixel ||
||SAL ||hdr/img ||Y ||N ||N ||N || ||N ||N ||Scans-A-Lot. Old proprietary scanner format. Separate header and data file ||
||SITUS ||situs ||Y ||Y ||Y ||N || ||N ||N ||Situs-specific ASCII format on a cubic lattice. Used by Situs programs ||
||TIFF ||tiff/tif ||Y ||Y ||N ||N || ||N ||N ||Good format for use with programs like photoshop. Some variants are good for quantitative analysis, but JPEG compression should be avoided. ||
||V4L ||v4l ||Y ||N ||N ||N || ||N ||N ||Used by some video-capture boards in Linux. Acquires images from the V4L2 interface in real-time(video4linux). ||
||VTK ||vtk ||Y ||Y ||Y ||N || ||N ||N ||Native format from Visualization Toolkit ||
||XPLOR ||xplor ||Y ||Y ||Y ||N || ||N ||N ||8 bytes integer, 12.5E float ASCII format ||
Image files in EMAN
Virtually all cryo-EM file formats are supported as well as many generic image formats. The default format used in EMAN2 processing is HDF5, which supports stacks of 2-D and 3-D images as well as arbitrary header information for each image in the file. If you convert an image to a format like MRC, you will lose any metadata not compatible with that format.
Any program in EMAN2 should directly read any supported file format without conversion. (Specific programs may require header information not available in all formats)
Most programs can write images to any output file format, determined by the filename you use. However, we strongly suggest using HDF unless you are transferring data to other software, as any other format will lose header information.
e2proc2d.py and e2proc3d.py can be used to explicitly convert files among specified file formats with specific data types, and are also used for general-purpose image processing.
e2display.py and e2projectmanager.py (via the file browser) can be used with the Save as button to convert to arbitrary output formats using a graphical interface.
MRC/CCP4 files have a number of special issues; please see the appropriate section below.
Compression and Bit Truncation
EMAN2 supports bit truncation with lossless compression as a mechanism for reducing file size without information loss. The file size reductions can be quite dramatic for raw data. Compression is currently supported only in HDF5 format, but bit truncation is supported for all formats, and bit-truncated files can be very effectively (losslessly) compressed with command-line tools, such as gzip, bzip2 or similar. The main use case for bit truncation is gain normalized movie averages being stored as 32 bit floating point values (or even 16 bit integers).
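As a rough illustration of why bit-truncated data compresses so well, the sketch below quantizes noisy values onto 2^5 levels and compares the compressed sizes using zlib (the same DEFLATE method gzip uses). This is not EMAN2's implementation: `truncate_to_bits` and `compressed_size` are hypothetical helpers, the values remain 32-bit floats, and Gaussian noise merely stands in for image data.

```python
import random
import struct
import zlib

def truncate_to_bits(values, bits):
    """Quantize floats onto 2**bits evenly spaced levels spanning their range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]

def compressed_size(values):
    """Bytes needed after packing as 32-bit floats and compressing with zlib."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 9))

random.seed(0)
noisy = [random.gauss(0.0, 1.0) for _ in range(20000)]  # stand-in for a noisy image
truncated = truncate_to_bits(noisy, 5)                   # at most 32 distinct values

size_raw = compressed_size(noisy)
size_trunc = compressed_size(truncated)
print(size_raw, size_trunc)
```

The untruncated floats barely compress because their low-order mantissa bits are effectively random, while the truncated copy contains only a handful of repeating bit patterns.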
For raw counting mode movie data collected on direct detectors, we recommend using the manufacturer's recommended storage mechanism (compressed TIFF, EER, etc.). Bit truncation is only appropriate for movie averages, not for the individual movie frames.
Using Bit Truncation
- Most programs in the normal EMAN single particle and subtomogram averaging pipelines will automatically perform bit truncation at safe levels by default.
- Many programs accept the --compressbits command line option, which will perform truncation to the specified number of bits. --compressbits=0 will disable bit truncation, but in HDF5 will still perform lossless compression of the raw floating point numbers.
- In EMAN2 versions after 6/15/2022 you may override any defaults for any output file by replacing <filename> with <filename>:bits[:min:max]
- if [:min:max] is omitted, default behavior is to clamp outliers to reduced values to retain sufficient histogram spread in the integer representation. Only a small number of outliers (1/20k on each end) will be removed.
- [:min:max] represent the minimum and maximum values which will be represented by the integer range, and may be specified in several different ways
- [:absolute min:absolute max] specified as actual image values, eg - :-4.5:4.5 would extend from -4.5 to 4.5 in actual image values
- [:s<nmin>:s<nmax>] specified as a multiple of the standard deviation from the mean, eg - :s3:s3 would extend from mean-3*sigma to mean+3*sigma
- [:f] will use the full range of image values, including outliers. This is appropriate for images such as masks, but should be avoided with raw CryoEM data.
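To make the structure of the output specification concrete, here is a minimal sketch of a parser for it. `parse_output_spec` is a hypothetical helper, not EMAN2's own parsing code; it assumes the sigma form is written with a leading `s` as in the list above, and it splits naively on `:`, so it would not cope with filenames that themselves contain colons.

```python
def parse_output_spec(spec):
    """Split '<filename>:bits[:min:max]' into its parts.

    Returns (filename, bits, range_spec), where range_spec is one of
    None (default outlier clamping), ("full",), ("absolute", lo, hi)
    or ("sigma", nlow, nhigh).
    """
    parts = spec.split(":")
    filename = parts[0]
    if len(parts) == 1:
        return filename, None, None        # no override requested
    bits = int(parts[1])
    rest = parts[2:]
    if not rest:
        return filename, bits, None        # bits only: default clamping
    if rest == ["f"]:
        return filename, bits, ("full",)   # keep the full range of values
    if len(rest) != 2:
        raise ValueError(f"expected :min:max or :f, got {spec!r}")
    if rest[0].startswith("s") and rest[1].startswith("s"):
        return filename, bits, ("sigma", float(rest[0][1:]), float(rest[1][1:]))
    return filename, bits, ("absolute", float(rest[0]), float(rest[1]))

print(parse_output_spec("particles.hdf:5"))
print(parse_output_spec("map.hdf:8:-4.5:4.5"))
```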
Examples:
- e2proc2d.py particles.mrcs particles.hdf:5 - convert an MRC particle stack to HDF with 5 bits of retained precision and automatic removal of extreme outliers. ~5-6x storage reduction typical
- e2proc3d.py tomogram.mrc tomogram.hdf:10:3s:3s - convert an MRC tomogram to HDF with 10 bits of precision, truncating data outside +-3*sigma
- e2proc3d.py mask.mrc mask_10.hdf:10:f - convert a soft mask (0-1.0) to 10 bits, retaining the full range of values.
- e2proc3d.py map.mrc map_10.mrc:10:f - convert a reconstructed map to a 10 bit representation in MRC format. This will immediately reduce file size by 2x if the original volume was floating point. gzip/bzip/etc would produce significant additional compression. Note, however, that the new MRC volume would be mapped to a 0-1023 range rather than the original floating point range. In HDF format, the original range is restored upon read in EMAN2.
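The remark about the 0-1023 range in the last example can be pictured with a small round-trip sketch. It is a simplified model rather than EMAN2's actual code: `quantize` and `restore` are hypothetical names, and the assumption is that a reader can undo the mapping only when the original minimum and maximum are stored alongside the integers.

```python
def quantize(values, bits, vmin, vmax):
    """Map floats in [vmin, vmax] onto integers 0 .. 2**bits - 1, clamping outliers."""
    top = (1 << bits) - 1
    scale = top / (vmax - vmin)
    return [min(top, max(0, round((v - vmin) * scale))) for v in values]

def restore(codes, bits, vmin, vmax):
    """Invert quantize() using the stored range, accurate to half a quantization step."""
    top = (1 << bits) - 1
    step = (vmax - vmin) / top
    return [vmin + c * step for c in codes]

mask = [0.0, 0.25, 0.5, 1.0]            # soft-mask style values in 0-1.0
codes = quantize(mask, 10, 0.0, 1.0)    # integers in 0..1023
back = restore(codes, 10, 0.0, 1.0)     # floats close to the original values
print(codes, back)
```

A file that keeps only `codes` presents values on the integer scale, whereas one that also records `vmin` and `vmax` lets the reader return values on the original scale.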
General recommendations
- For movie averaged/gain corrected/aligned frames, 5 bits should be a safe level of retention for virtually any situation.
- For class-averages or 3-D volumes, typically 8-12 bits is appropriate, depending on your precision needs.
- Note that conversion to integers may cause the transition from "almost exactly zero" to "exactly zero" in soft masks to appear to shift somewhat, but in practice this generally has no impact.
- For raw (unfiltered) cryo-tomograms, 5 bits is safe, similar to raw micrographs. After low-pass filtration/denoising, arguments could be made for preserving more bits, though the actual mathematical need to do so is questionable.
- For negative stain data, arguments for retaining up to 8 bits could be made if very high acquisition doses are used, though impact on data interpretation is unlikely.
File format conversions
- Most EMAN2 programs will read and write any of the formats above directly without conversion.
- e2proc2d.py and e2proc3d.py provide options for saving in specific formats with specific data modes.
- The Save as button in the e2display.py and e2projectmanager.py browsers can be used to save into an arbitrary format.
Special issues for MRC/MRCS/CCP4 files
MRC/CCP4 format supports a single 1-D, 2-D or 3-D image, with an associated header. At some point in time, someone decided it would be a good idea to store sets of 2-D particle images as "stacks" in 3-D. That is, a set of NZ identically sized NX x NY images are stacked to make a single 3-D pseudo-volume image. The problem is that the original format was not designed for this, and historically there was no consistent way a program could tell if an MRC file contains a true volume (like a 3-D reconstruction) or a stack of 2-D images. While a number of developers have recently agreed upon a standard way of doing this in future, the last 30 years of files floating around in the community don't have this information stored in a consistent way. As of EMAN2.1, stack files should use the ".mrcs" extension and single volumes should use the ".mrc" extension. ".mrc" files will always be read as if they contain a single image, and ".mrcs" files can never be 3-D. This may evolve in the future as the new standards become more refined.
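The extension rule can be summarized in a few lines. The function below is a hypothetical illustration of the convention just described, not a routine from EMAN2: given the NX, NY, NZ dimensions from the header, the extension alone decides whether the data are treated as NZ separate 2-D images or as one image.

```python
import os

def interpret_mrc(filename, nx, ny, nz):
    """Return (image_count, image_shape) under the .mrc/.mrcs extension rule."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".mrcs":
        return nz, (nx, ny)        # a stack of NZ separate 2-D images
    return 1, (nx, ny, nz)         # a single image (a 3-D volume when nz > 1)

print(interpret_mrc("particles.mrcs", 128, 128, 500))
print(interpret_mrc("map.mrc", 128, 128, 128))
```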
Additionally, the e2proc2d.py command has options which will treat single volume files as stacks of images without the .mrcs extension (again, if you just use the .mrcs extension, these options should not be required):
--threed2twod - reads an MRC-style stack file and outputs to a 'normal' set of 2-D images (used by all other file formats supporting multiple images)
--twod2threed - reads a set of 2-D images and outputs an MRC-style stack
--threed2threed - reads an MRC-style stack file and outputs to another MRC-style stack file
These options can also be used with other file formats.
Special issues for EER files
EER is a specialized format for the Falcon4 camera which records actual pixel events at a very high effective framerate. Individual frames are RLE encoded, so despite storing up to 4x superresolution counting mode images, file size is still smaller than a gain corrected MRC stack averaged to 30 FPS.
The default reader will operate without oversampling. You will need to specify an option, --eer2x or --eer4x, with e2proc2d.py to read super-resolution data instead (8k x 8k or 16k x 16k).
To make use of these files you normally also need to have the appropriate gain reference image from the Falcon 4, which at the time of this writing is stored in FEIRAW format. There is a program in examples, which can convert FEIRAW files to any other format you like, but they aren't natively read by other EMAN2 programs.
Several bugs were fixed in mid-October 2020, so it is critical that you use a version dated 10/22/20 or later!
Here is a workflow for processing gain references with EER files:
- It may be possible to merge steps 3 and 4.
- There was previously a --translate 1,0 in step 2, but that should no longer be necessary. If you observe a problem which seems to be due to the gain image being translationally misaligned, please report it on the Google group.
The --avgseq 60 option says to average sets of 60 frames in the original EER file, so an 1800 frame EER would produce a 30 frame normalized movie. 60 can be changed to any desired value.
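The frame arithmetic behind --avgseq can be sketched in plain Python. `average_in_groups` is a hypothetical stand-in, not the code e2proc2d.py runs, and dropping a trailing incomplete group is an assumption made here for simplicity.

```python
def average_in_groups(frames, group_size):
    """Average consecutive groups of `group_size` frames (each frame is a list of pixels)."""
    averaged = []
    # Assumption: a trailing group with fewer than group_size frames is discarded.
    for start in range(0, len(frames) - group_size + 1, group_size):
        group = frames[start:start + group_size]
        averaged.append([sum(px) / group_size for px in zip(*group)])
    return averaged

frames = [[float(i), 1.0] for i in range(1800)]  # 1800 tiny two-pixel "frames"
movie = average_in_groups(frames, 60)
print(len(movie))
```

With 1800 input frames and a group size of 60, the result holds 30 averaged frames, matching the example above.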
Reading and Writing images in Python (for programmers)
The main image object in EMAN2 is called EMData(). EMData objects represent an image in an arbitrary file format in the computer's memory, with arbitrary associated tag-based metadata.
Simple Image Reading/Writing
The specification for reading/writing images is:
# note that optional arguments [ ] below require all previous arguments to be specified

# Read multiple images at once, class-method
imagelist=EMData.read_images(filename,[image#_list],[header_only])

# Read a single image
img=EMData()
img.read_image(filename,[image#],[header_only],[Region])
# or
img=EMData(filename,[image#],[header_only])


# write a single image (called on an existing EMData object)
img.write_image(filename,image#,[filetype],[header_only],[Region],[Datatype])
Where filename is the name of the file containing the image data, in any supported format, image# is the zero-indexed image number within the file, image#_list is a python list or tuple of image numbers, header_only is a boolean flag indicating that only the header should be read/written from/to the file, Region is a Region(x0,y0,xsize,ysize) or Region(x0,y0,z0,xsize,ysize,zsize) object.
Filetype can be : IMAGE_UNKNOWN, IMAGE_AMIRA, IMAGE_IMAGIC, IMAGE_PIF, IMAGE_SPIDER, IMAGE_VTK, IMAGE_DM3, IMAGE_GATAN2, IMAGE_LST, IMAGE_PNG, IMAGE_TIFF, IMAGE_XPLOR, IMAGE_DM4, IMAGE_HDF, IMAGE_MRC, IMAGE_SAL, IMAGE_EM, IMAGE_ICOS, IMAGE_PGM, IMAGE_SINGLE_SPIDER, IMAGE_V4L. If IMAGE_UNKNOWN is used on write, then the file extension will be used to determine the filetype. Note that since MRC format does not distinguish between 3-D volumes and stacks of 2-D images, the '.mrcs' extension MUST be used for stack files, and the '.mrc' extension MUST be used for non-stack volume data.
Datatype can be: EM_CHAR, EM_FLOAT, EM_INT, EM_UINT, EM_USHORT, EM_DOUBLE, EM_FLOAT_COMPLEX, EM_SHORT, EM_UCHAR, EM_USHORT_COMPLEX, EM_SHORT_COMPLEX. While SHORT_COMPLEX types are defined, they should never be actually used. FLOAT_COMPLEX is only really usable for HDF files. We strongly suggest not reading/writing complex images, and simply recomputing the FFT instead. Not all file formats support all data types!
If no image#_list is specified to read_images, then ALL images in the file will be read in.
from EMAN2 import *

# Create a new EMData object and initialize it with the first image in "myimage.hdf".
# This will work with any supported file format, not just HDF
img=EMData("myimage.hdf")

# Replace the data in EMData object 'img' with the 3rd image from "myimage.hdf" (the first is #0)
img.read_image("myimage.hdf",2)

# Write an EMData object to disk as the 3rd image in "image.hdf"
img.write_image("image.hdf",2)

# Read all of the images from the SPIDER stack file (also works with single image files) "test.spi"
# lst will become a list of EMData objects
lst=EMData.read_images("test.spi")

# Count the number of images available in a stack file
n=EMUtil.get_image_count("myimage.hdf")

# Create a new EMData object with ONLY HEADER INFORMATION from the 5th image
# in the "myimage.hdf" stack file. Any image processing operations on this object
# will cause EMAN2 to crash, because it doesn't have data loaded for the actual image.
# This can be useful when all you need is the header information from a bunch of images.
hdr=EMData("myimage.hdf",4,True)
Region I/O
Region I/O permits reading or writing sub-images/volumes from within a file. It is not supported for all file formats. This is useful when processing huge files (like full 4k tomograms) on machines with limited RAM. For region reading, it is possible to specify a Region extending outside the actual image dimensions, though this generally isn't a good idea. For region writing, the region must be completely inside image bounds.
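For example, a large volume could be processed slab by slab. The sketch below only computes the region coordinates, in the (x0, y0, z0, xsize, ysize, zsize) order used by the 3-D Region constructor; `z_slabs` and `region_inside` are hypothetical helpers, and the resulting tuples would still need to be wrapped in Region objects and passed to read_image or write_image.

```python
def region_inside(shape, region):
    """True if region (x0, y0, z0, xsize, ysize, zsize) lies fully inside shape (nx, ny, nz)."""
    origin, size = region[:3], region[3:]
    return all(o >= 0 and s > 0 and o + s <= n for o, s, n in zip(origin, size, shape))

def z_slabs(shape, thickness):
    """Yield regions covering the volume as full-XY slabs of at most `thickness` slices."""
    nx, ny, nz = shape
    for z0 in range(0, nz, thickness):
        yield (0, 0, z0, nx, ny, min(thickness, nz - z0))

shape = (4096, 4096, 1000)
slabs = list(z_slabs(shape, 256))
for slab in slabs:
    # Region writing requires the region to be completely inside the image bounds.
    print(slab, region_inside(shape, slab))
```

Shortening the final slab keeps every region within bounds, which satisfies the requirement for region writing.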
Storage type
Internally EMAN2 stores all images as 32-bit (single precision) floating point. Many file formats also support other storage modes. The various formats are defined in a dictionary imported from EMAN2.py: file_mode_map. There is also a file_mode_range dictionary which contains the numeric limits for each type. If you set the header values render_min and render_max in each image before writing, this will control how the float data is scaled to the specified mode. For example, if render_min is 0 and render_max is 1.0, then the 0-1 range in the internal image will be mapped to the full available scale of (integer mode) output formats. Note also that not all file formats support all modes.
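The effect of render_min and render_max can be pictured as a linear rescale onto the limits of the target integer type. The sketch below is an assumption about the general idea, not EMAN2's exact arithmetic: `scale_to_mode` and `mode_limits` are hypothetical names (the latter only mimics the role of file_mode_range), and the rounding and clamping behavior is chosen here for illustration.

```python
# Hypothetical stand-in for the numeric limits of two integer storage modes.
mode_limits = {
    "uint8": (0, 255),
    "int16": (-32768, 32767),
}

def scale_to_mode(values, render_min, render_max, mode):
    """Linearly map [render_min, render_max] onto the mode's range, clamping outside values."""
    mode_min, mode_max = mode_limits[mode]
    span = (mode_max - mode_min) / (render_max - render_min)
    out = []
    for v in values:
        scaled = mode_min + round((v - render_min) * span)
        out.append(max(mode_min, min(mode_max, scaled)))
    return out

pixels = [-0.2, 0.0, 0.5, 1.0, 1.3]
as_bytes = scale_to_mode(pixels, 0.0, 1.0, "uint8")
as_shorts = scale_to_mode(pixels, 0.0, 1.0, "int16")
print(as_bytes, as_shorts)
```

Values below render_min or above render_max saturate at the ends of the integer range in this model, so the choice of the two header values determines which part of the float range keeps its detail.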
Here are some examples of how to write in alternative formats:
from EMAN2 import *

img = EMData(128,128)
img.write_image('float-image.mrc')    # by default, the image is written as float
img.write_image('short-image.mrc', 0, IMAGE_MRC, False, None, EM_SHORT)    # write MRC file as short (16 bit)
img.write_image('byte-image.mrc', 0, IMAGE_MRC, False, None, EM_UCHAR)    # write MRC file as byte (8 bit)
img.write_image('byte-image.spi', 0, IMAGE_UNKNOWN, False, None, EM_FLOAT)    # write SPIDER file as float; with IMAGE_UNKNOWN the file type is taken from the extension