This document gives a brief introduction to the memory-related features of CUDA and discusses the benefits provided by the CUDA template classes.
NVIDIA's GPUs support various types of memory, each optimized for a particular access pattern. The CUDA toolkit provides functions for allocating memory of each type and for copying data between the different memory types (including host/device data transfer). Only issues relevant to the CUDA templates are discussed here; see the CUDA Programming Guide for more information.
CUDA supports direct data exchange between OpenGL and CUDA applications. This typically involves access to an OpenGL buffer object, which can be mapped into the address space of a CUDA application. Buffer objects can be used in various ways in OpenGL, e.g., as a data source in texture creation commands. For this purpose, an OpenGL texture class is included in the CUDA templates.
Access to all of the memory types mentioned above is implemented in the CUDA toolkit. However, function signatures differ considerably depending on the memory type involved. Moreover, some functions expect the size of the data in bytes, while others expect it in elements (e.g., floats). Error conditions must be checked explicitly; otherwise, the program continues in an undefined state.
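The following host-only C++ sketch illustrates the two pitfalls named above: sizes given in bytes for one call and in elements for another, and status codes that must be checked by hand. The functions allocLinear() and allocArray(), the FloatArray struct, and the Status enum are hypothetical stand-ins that use ordinary host memory; they only mimic the calling conventions of toolkit functions such as cudaMalloc (size in bytes) and cudaMallocArray (width and height in elements) and are not part of the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace raw_sketch {

// Hypothetical status codes; not the CUDA toolkit's error type.
enum Status { kSuccess = 0, kOutOfMemory = 1 };

// Convention 1: size given in bytes (as with cudaMalloc).
inline Status allocLinear(void** ptr, std::size_t bytes) {
    assert(ptr != nullptr);
    *ptr = std::malloc(bytes);
    return *ptr ? kSuccess : kOutOfMemory;
}

// Convention 2: size given in elements (as with cudaMallocArray).
struct FloatArray {
    float* data;
    std::size_t width;
    std::size_t height;
};

inline Status allocArray(FloatArray* array, std::size_t width, std::size_t height) {
    assert(array != nullptr);
    array->data = static_cast<float*>(std::malloc(width * height * sizeof(float)));
    array->width = width;
    array->height = height;
    return array->data ? kSuccess : kOutOfMemory;
}

// Allocates the same 640x480 float image in both styles.
// Returns kSuccess, or the first error encountered.
inline Status allocateBothStyles() {
    const std::size_t width = 640, height = 480;

    void* linear = nullptr;
    // The caller has to multiply by sizeof(float) here ...
    Status status = allocLinear(&linear, width * height * sizeof(float));
    if (status != kSuccess) return status;  // omitting this check hides the failure

    FloatArray array = {nullptr, 0, 0};
    // ... but must not multiply here.
    status = allocArray(&array, width, height);
    if (status != kSuccess) {
        std::free(linear);  // manual cleanup on every error path
        return status;
    }

    std::free(array.data);
    std::free(linear);
    return kSuccess;
}

}  // namespace raw_sketch
```

Each allocation needs its own size convention, its own check, and its own cleanup on every exit path, and this bookkeeping grows with the number of memory types involved.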
The main goal of the CUDA templates is to provide a clean and consistent interface to the underlying functions of the CUDA toolkit. Each of the different memory types is represented as a class template parameterized by the element data type and the dimension (most of them with specializations for one, two, and three dimensions). Since the type of each object (and thus the type of memory to be accessed) is known at compile time, the choice of the appropriate data access method is left to the compiler, so a single function template copy() can handle all possible memory transfers. Other benefits of the object-oriented approach are the mapping of CUDA errors to corresponding exceptions and the automatic deallocation of resources in the class destructors.
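The design described above can be sketched in plain C++ without a GPU. This is a minimal illustration under stated assumptions, not the actual interface of the CUDA templates: Buffer, HostBuffer, DeviceBuffer, Kind, Direction, rawCopy(), check(), and roundTrip() are hypothetical names, both "memory kinds" live in ordinary host memory, and the library's real copy() may have a different signature and return type. The sketch shows the memory kind encoded in the type, a single copy() that selects the transfer direction at compile time, a status code mapped to an exception, and deallocation in the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

namespace sketch {

enum Status { kSuccess = 0, kInvalidValue = 1 };
enum class Kind { Host, Device };
enum class Direction { HostToHost, HostToDevice, DeviceToHost, DeviceToDevice };

// Mock low-level transfer: byte count, explicit direction, status code result.
inline Status rawCopy(void* dst, const void* src, std::size_t bytes, Direction) {
    if (dst == nullptr || src == nullptr) return kInvalidValue;
    std::memcpy(dst, src, bytes);
    return kSuccess;
}

// Maps a status code to an exception so that errors cannot be ignored silently.
inline void check(Status status) {
    if (status != kSuccess) throw std::runtime_error("memory operation failed");
}

// One class template for all memory kinds; the kind is part of the type.
template <class T, Kind K>
class Buffer {
public:
    explicit Buffer(std::size_t count) : data_(new T[count]()), size_(count) {}
    ~Buffer() { delete[] data_; }  // released automatically at scope exit
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    T* data() { return data_; }
    const T* data() const { return data_; }
    std::size_t size() const { return size_; }  // always in elements

private:
    T* data_;
    std::size_t size_;
};

template <class T> using HostBuffer = Buffer<T, Kind::Host>;
template <class T> using DeviceBuffer = Buffer<T, Kind::Device>;

// Resolved entirely at compile time from the two buffer types.
template <Kind Dst, Kind Src>
constexpr Direction directionFor() {
    return Dst == Kind::Host
        ? (Src == Kind::Host ? Direction::HostToHost : Direction::DeviceToHost)
        : (Src == Kind::Host ? Direction::HostToDevice : Direction::DeviceToDevice);
}

// Single entry point for every combination of source and destination.
// Returns the direction it selected so that the dispatch can be observed.
template <class T, Kind Dst, Kind Src>
Direction copy(Buffer<T, Dst>& dst, const Buffer<T, Src>& src) {
    if (dst.size() != src.size()) throw std::invalid_argument("size mismatch");
    constexpr Direction direction = directionFor<Dst, Src>();
    check(rawCopy(dst.data(), src.data(), src.size() * sizeof(T), direction));
    return direction;
}

// Copies four floats host -> "device" -> host and verifies the result.
inline bool roundTrip() {
    HostBuffer<float> input(4);
    DeviceBuffer<float> device(4);
    HostBuffer<float> output(4);
    for (std::size_t i = 0; i < input.size(); ++i)
        input.data()[i] = static_cast<float>(i);

    const Direction up = copy(device, input);
    const Direction down = copy(output, device);
    assert(up == Direction::HostToDevice);
    assert(down == Direction::DeviceToHost);
    return output.data()[3] == 3.0f;
}

}  // namespace sketch
```

The calling code in roundTrip() never mentions bytes, directions, or status codes. The byte count is derived from the element type, the direction from the buffer types, and a failed transfer surfaces as an exception while the destructors still release all buffers.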
To simplify integration of CUDA with existing applications, the CUDA templates include compatibility classes for the following image libraries:
Image data created by one of these libraries can be used with the CUDA templates in the same way as data natively allocated by a CUDA template class. The CUDA templates do not define their own image I/O methods, but instead allow programmers to use their preferred library for this purpose (or to add integration for other libraries easily if none of the above is suitable).
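One plausible way to build such a compatibility class is a non-owning reference type that exposes externally allocated pixels through the same accessors as a natively allocated image. The sketch below is an assumption made for illustration only: ExternalImage stands in for an image type from some third-party library, and ImageReference, wrap(), fill(), and wrapAndFill() are hypothetical names that do not come from the CUDA templates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace interop_sketch {

// Hypothetical stand-in for an image owned by an external library.
struct ExternalImage {
    std::vector<unsigned char> pixels;
    std::size_t width;
    std::size_t height;
    ExternalImage(std::size_t w, std::size_t h) : pixels(w * h), width(w), height(h) {}
};

// Non-owning view: refers to existing pixel data and never frees it.
template <class T>
class ImageReference {
public:
    ImageReference(T* data, std::size_t width, std::size_t height)
        : data_(data), width_(width), height_(height) {}

    T* data() const { return data_; }
    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }
    std::size_t size() const { return width_ * height_; }  // in elements

private:
    T* data_;
    std::size_t width_;
    std::size_t height_;
};

// Adapter: the only code that knows about the external library's layout.
inline ImageReference<unsigned char> wrap(ExternalImage& image) {
    return ImageReference<unsigned char>(image.pixels.data(), image.width, image.height);
}

// Generic code written against the common accessors only.
template <class Image>
void fill(const Image& image, unsigned char value) {
    for (std::size_t i = 0; i < image.size(); ++i) image.data()[i] = value;
}

// Fills an externally owned 8x4 image through the reference.
inline bool wrapAndFill() {
    ExternalImage external(8, 4);
    ImageReference<unsigned char> reference = wrap(external);
    fill(reference, 255);
    assert(reference.size() == 32);
    return external.pixels.front() == 255 && external.pixels.back() == 255;
}

}  // namespace interop_sketch
```

Because the reference does not own the pixels, the external library remains responsible for allocation, file I/O, and cleanup, and supporting another library only requires another adapter like wrap().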