CUDA study 3. CUDA C language
Glossary of Terms
- Host: PC components (CPU, DRAM)
- Device: Graphics Card
3.1 Function Modifiers
-
__global__
It can be executed on the device and called from the host, but it cannot be called from the device itself.
It is used for specifying kernel functions executed on the device.
__global__ function<<< >>>() { ... }
- The return value is always
void
. - You can specify the blocks and threads during execution using
<<<
and>>>
. - Recursion is not allowed.
- You cannot have
static
variables within a function. - Variable-length argument lists are not allowed.
- You can use a pointer to a function specified with
__global__
. - Cannot be used concurrently with
__host__
(to be explained later). - It returns immediately upon invocation before the processing on the device is completed, enabling asynchronous behavior.
- You can utilize shared memory and use arguments up to 256 bytes.
- The return value is always
-
__device__
It can be executed on the device, callable from the device, but not callable from the host.
It is written within device code and used as a sub-function executed internally on the device.
__device__ int function(int a, int b) { ... }
- Recursion is not allowed.
- You cannot have
static
variables within a function. - Variable-length argument lists are not allowed.
- You cannot use a pointer to a function specified with
__device__
.
-
__host__
It is executed on the host, callable from the host, but not callable from the device.
It typically becomes a function commonly used on the host.
__host__ int function(int a, int b) { ... }
- If not specified as
__host__
,__global__
, or__device__
, it is equivalent to specifying__host__
. - It cannot be used concurrently with
__global__
. - By using it concurrently with
__device__
, you can write a function that can be used on both the host and the device.
- If not specified as
3.2 Variable Modifiers
-
__device__
It is allocated in the global memory space and remains valid until the program terminates.
It is accessible to all threads, and on the host side, reading and writing can be done through API functions.
-
__constant__
It is allocated in the constant memory space and remains valid until the program terminates.
All threads can access it, but only reading is allowed.
Values can be written from the host using the API cudaMemcpyToSymbol().
It is accompanied by constant cache.
-
__shared__
It is allocated in the shared memory space and is valid within the executing thread block.
Threads within the block can access it, allowing both reading and writing.