Using the EMAN2 CUDA API

EMAN2 includes support for CUDA processing. To use CUDA in EMAN2 you must set the flag ENABLE_EMAN2_CUDA using ccmake, then recompile. This step defines the identifier ENABLE_EMAN2_CUDA, causing the preprocessor to demarcate CUDA code for compilation. Any new CUDA code should be enclosed in #ifdef ENABLE_EMAN2_CUDA, #endif directives to restrict its compilation, as in the sketch below. Compiling with CUDA exposes additional methods and members of the class EMData. Below, after the sketch, is a list of the additional EMData methods with Python bindings.
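
For instance, a new CUDA-dependent function might be guarded like this (a minimal sketch; do_gpu_work is a hypothetical name, not part of EMAN2):

    #ifdef ENABLE_EMAN2_CUDA
    // Hypothetical helper, compiled only when CUDA support is enabled.
    void do_gpu_work(EMData* image) {
        image->copy_to_cuda();  // CPU -> GPU global memory
        // ... launch CUDA kernels on image->getcudarwdata() here ...
    }
    #endif // ENABLE_EMAN2_CUDA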

  • bool EMData::copy_to_cuda() const, this copies EMData data from the CPU to the GPU global memory. Python method = copy_to_cuda()

  • bool EMData::copy_to_cudaro() const, this copies EMData data from the CPU to the GPU texture memory. Python method = copy_to_cudaro()

  • bool EMData::copy_rw_to_ro() const, this copies EMData data from global memory to texture memory. Python method = copy_rw_to_ro()

  • void EMData::switchoncuda(), this tells EMAN2 to use CUDA. You almost never want to call this function directly anymore; use cuda_initialize() instead. Python method = switchoncuda()

  • void EMData::switchoffcuda(), this tells EMAN2 to stop using CUDA. Python method = switchoffcuda()

  • bool EMData::cuda_initialize(), this tells EMAN2 to initialize CUDA and start using CUDA. This method needs to be called before CUDA is used. Python method = cuda_initialize()

  • void EMData::cuda_cleanup(), this cleans up the CUDA cache and is called by an event handler in the EMAN2 module. You should never call this function unless you intend to shut down an EMAN2 program. Python method = cuda_cleanup()

  • const char* EMData::getcudalock(), this returns a CUDA lock file. CUDA lock files are created to help the system keep track of which process is using which device. Insanely, this functionality is not built into the CUDA API. CUDA lock files are stored in /tmp (yes, this will not work on Windows, but neither will CUDA). Python method = getcudalock()
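
For example, a typical CUDA session driven from C++ might look like the following (a minimal sketch; the input file name is a placeholder and error handling is omitted):

    #ifdef ENABLE_EMAN2_CUDA
    EMData::cuda_initialize();                // must be called before any CUDA use
    EMData* image = new EMData("input.hdf");  // hypothetical input file
    if (image->copy_to_cuda()) {              // CPU -> GPU global memory
        image->copy_rw_to_ro();               // global -> texture memory
    }
    #endif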

If you are writing new C++ code, you will have access to and want to use these additional EMData CUDA methods:

  • float* getcudarwdata() const, returns a pointer to the data in GPU global memory. If no data is present, 0 is returned

  • float* getcudarodata() const, returns a pointer to the data in GPU texture memory. If no data is present, 0 is returned

  • bool EMData::isrodataongpu() const, returns True if the data is in GPU texture memory. Also returns True if the data is in global memory AND it was successfully copied from global to texture memory. Otherwise False is returned

  • bool EMData::usecuda, this member acts as a flag to signal when CUDA is being used. You should enclose all CUDA code in a guard such as: if(EMData::usecuda == 1){......}
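
A sketch of how these fit together in new C++ code (assuming an EMData* named image; the kernel launch is hypothetical):

    #ifdef ENABLE_EMAN2_CUDA
    if (EMData::usecuda == 1) {
        float* d_data = image->getcudarwdata();  // pointer into GPU global memory
        if (d_data == 0) {                       // no data on the GPU yet
            image->copy_to_cuda();
            d_data = image->getcudarwdata();
        }
        // my_kernel<<<blocks, threads>>>(d_data, ...);  // hypothetical kernel launch
    }
    #endif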

In addition to the above functions, the following methods are used internally to implement the CUDA memory management scheme, which uses a least recently used algorithm. When an EMData object's data array is copied to the GPU, it goes on top of a static linked list (there is only one linked list, whose beginning and ending pointers are static). Additional items moved to the GPU also go on the top of the list. When an item is accessed, it is moved to the top of the list. If GPU memory runs out, items are removed from the bottom of the linked list. These items will be the ones least recently used, as idle EMData objects filter down to the bottom. The methods listed after the sketch below implement this memory management scheme. Both global and texture memory are managed (the concept of texture memory makes more sense for OpenGL programmers).
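
Conceptually, the bookkeeping behaves like a move-to-front doubly linked list. The toy sketch below illustrates the idea only; it is not EMAN2's actual code:

    // Toy illustration of the cache bookkeeping; EMAN2's real list links EMData objects.
    struct Node { Node *prev, *next; };
    Node *head = 0, *tail = 0;  // static in EMAN2: one list shared by all objects

    void add_to_top(Node* n) {  // cf. addtolist(): new GPU data goes on top
        n->prev = 0; n->next = head;
        if (head) head->prev = n;
        head = n;
        if (tail == 0) tail = n;
    }

    void move_to_top(Node* n) {  // cf. elementaccessed(): accessed data moves up
        if (n == head) return;
        if (n->prev) n->prev->next = n->next;  // unlink n ...
        if (n->next) n->next->prev = n->prev;
        if (n == tail) tail = n->prev;
        add_to_top(n);                         // ... and relink it at the top
    }
    // On memory pressure, eviction walks from 'tail' and frees bottom entries
    // until enough bytes are available (cf. freeup_devicemem()).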

  • bool EMData::rw_alloc() const, this method allocates GPU global memory sufficient to store the EMData object's data. If allocation fails, the method returns False; otherwise it returns True

  • bool EMData::ro_alloc() const, this method allocates GPU texture memory sufficient to store the EMData object's data. If allocation fails, the method returns False; otherwise it returns True

  • void EMData::bindcudaarrayA(const bool intp_mode) const, bind the current texture memory to the GPU device. After binding, the texture memory can be accessed through texture object 'texA'. Only two textures, A and B, can be bound at any one time. The argument intp_mode specifies whether or not linear interpolation is desired; many graphics cards have hardware support for this, so the speedups can be immense.

  • void EMData::bindcudaarrayB(const bool intp_mode) const, bind the current texture memory to the GPU device. After binding, the texture memory can be accessed through texture object 'texB'. Only two textures, A and B, can be bound at any one time. The argument intp_mode specifies whether or not linear interpolation is desired; many graphics cards have hardware support for this, so the speedups can be immense.

  • void EMData::unbindcudaarryA() const, unbind GPU texture 'A' from texture object 'texA'.

  • void EMData::unbindcudaarryB() const, unbind GPU texture 'B' from texture object 'texB'.

  • bool EMData::copy_from_device(const bool rocpy), copy data from the GPU to the CPU. The argument rocpy determines the type of memory copied from: if set to False (the default), global memory is copied from; if set to True, texture memory is copied from.

  • void EMData::rw_free() const, free GPU global memory. This removes the EMData object from the linked list provided texture memory is not in use.

  • void EMData::ro_free() const, free GPU texture memory. This removes the EMData object from the linked list provided global memory is not in use.

  • bool EMData::freeup_devicemem(const int& num_bytes) const, request to free up 'num_bytes' of memory on the GPU. If 'num_bytes' are already available, this method returns True. If not, then items are removed from the bottom of the linked list until enough memory is available. If enough memory cannot be made available, the method returns False.

  • void EMData::setdirtybit() const, this sets a flag denoting that the data on the GPU has changed relative to the CPU. This strategy is not currently in use; the default is to always copy data from the GPU to the CPU irrespective of whether or not the GPU data has changed.

  • void EMData::elementaccessed() const, this method moves this EMData object to the top of the linked list.

  • void EMData::addtolist() const, this method adds this EMData object to the top of the linked list.

  • void EMData::removefromlist() const, this method removes this EMData object from the linked list.
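
A sketch of how these internals might fit together (a hypothetical flow for illustration; the real call sites are inside EMData itself, and the byte count shown assumes a real-valued image):

    #ifdef ENABLE_EMAN2_CUDA
    int num_bytes = image->get_xsize() * image->get_ysize() * image->get_zsize()
                    * (int)sizeof(float);       // size of the image data in bytes
    if (image->freeup_devicemem(num_bytes)) {   // evict bottom-of-list items if needed
        if (image->rw_alloc()) {                // allocate GPU global memory
            // ... copy the data up and run CUDA kernels here ...
            image->elementaccessed();           // keep this object near the top of the list
            image->copy_from_device(false);     // copy GPU global memory back to the CPU
            image->rw_free();                   // release the GPU memory when finished
        }
    }
    #endif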
