Using the EMAN2 CUDA API

EMAN2 includes support for CUDA processing. To use CUDA in EMAN2 you must set the flag ENABLE_EMAN2_CUDA using ccmake, then recompile. This step defines the identifier ENABLE_EMAN2_CUDA so that the preprocessor compiles the CUDA code. Any new CUDA code should be enclosed in #ifdef ENABLE_EMAN2_CUDA ... #endif directives so that it is only compiled when CUDA is desired. Compiling with CUDA exposes additional methods and members of the class EMData. Below is a list of the additional EMData methods with Python bindings.

  • bool EMData::copy_to_cuda() const, this copies EMData data from the CPU to the GPU global memory

  • bool EMData::copy_to_cudaro() const, this copies EMData data from the CPU to the GPU texture memory

  • bool EMData::copy_rw_to_ro() const, this copies EMData data from global memory to texture memory

  • void EMData::switchoncuda(), this tells EMAN2 to use CUDA. You should almost never call this function directly anymore; use cuda_initialize() instead.

  • void EMData::switchoffcuda(), this tells EMAN2 to stop using CUDA.

  • bool EMData::cuda_initialize(), this tells EMAN2 to initialize CUDA and start using it.

  • void EMData::cuda_cleanup(), this cleans up the CUDA cache and is called by an event handler in the EMAN2 module. You should never call this function unless you intend to shut down an EMAN2 program.

  • const char* EMData::getcudalock(), this returns a CUDA lock file. CUDA lock files are created so the system can keep track of which process is using which device; surprisingly, this functionality is not built into the CUDA API. CUDA lock files are stored in /tmp (yes, this will not work on Windows, but neither will this CUDA support).

If you are writing new C++ code, you will have access to, and want to use, additional EMData CUDA methods. These are:

  • float* getcudarwdata() const, returns a pointer to data in the global GPU memory. If nothing is there, 0 is returned

  • float* getcudarodata() const, returns a pointer to data in the texture GPU memory. If nothing is there, 0 is returned

  • bool EMData::isrodataongpu() const, returns True if data is in the GPU texture memory. Also returns True if data is in global memory AND it was successfully copied from global to texture memory. Otherwise False is returned.

  • bool EMData::usecuda, this member acts as a flag to signal when CUDA is being used. You should enclose all CUDA code in the braces: if(EMData::usecuda == 1){......}

In addition to the above functions, the following methods are used internally to implement the CUDA memory management scheme, which uses a least recently used (LRU) eviction algorithm. When an EMData object's data array is copied to the GPU it goes on top of a linked list. Additional items moved to the GPU also go on the top of the list, and whenever an item is accessed it is moved back to the top. If GPU memory runs out, items are removed from the bottom of the linked list. These will be the least recently used items, as stale items filter down to the bottom of the list. The following methods implement this scheme.

  • bool EMData::rw_alloc() const, this method allocates GPU global memory sufficient to store the EMData object data. If allocation fails the method returns False, otherwise True

  • bool EMData::ro_alloc() const, this method allocates GPU texture memory sufficient to store the EMData object data. If allocation fails the method returns False, otherwise True

  • void EMData::bindcudaarrayA(const bool intp_mode) const, bind GPU texture 'A'. It is possible to have only 2 textures bound at any one time, A and B.

  • void EMData::bindcudaarrayB(const bool intp_mode) const, bind a GPU texture 'B'. It is possible to have only 2 textures bound at any one time, A and B.

  • void EMData::unbindcudaarryA() const, unbind GPU texture 'A'. It is possible to have only 2 textures bound at any one time, A and B.

  • void EMData::unbindcudaarryB() const, unbind GPU texture 'B'. It is possible to have only 2 textures bound at any one time, A and B.

  • bool EMData::copy_from_device(const bool rocpy), copy data from GPU to CPU. The argument rocpy determines the type of memory copied from: if set to False (the default), global memory is copied from; if set to True, texture memory is copied from.

  • void EMData::rw_free() const, free GPU global memory. This removes the EMData object from the linked list provided texture memory is not in use.

  • void EMData::ro_free() const, free GPU texture memory. This removes the EMData object from the linked list provided global memory is not in use.

  • bool EMData::freeup_devicemem(const int& num_bytes) const, request to free up 'num_bytes' of memory on the GPU. If 'num_bytes' is already available, this method returns True. If not, then items are removed from the bottom of the linked list until enough memory is available. If enough memory cannot be made available, the method returns False.

  • void EMData::setdirtybit() const, this sets a flag denoting that the data on the GPU has changed relative to the CPU. This strategy is not currently in use; the default is to always copy data from GPU to CPU irrespective of whether or not the GPU data has changed.

  • void EMData::elementaccessed() const, this method moves the EMData object to the top of the linked list.

  • void EMData::removefromlist() const, this method removes the EMData object from the linked list.

Eman2UsingCudaFromC++ (last edited 2012-05-01 22:00:50 by JohnFlanagan)