CUDA study 3. CUDA C language

2 minute read

Glossary of Terms

Host: PC components (CPU, DRAM)
Device: Graphics Card

3.1 Function Modifiers

__global__

It can be executed on the device and called from the host, but it cannot be called from the device itself.

It is used for specifying kernel functions executed on the device.
```
__global__ function<<< >>>()
{ ... }
```
- The return value is always void.
- You can specify the blocks and threads during execution using <<< and >>>.
- Recursion is not allowed.
- You cannot have static variables within a function.
- Variable-length argument lists are not allowed.
- You can use a pointer to a function specified with __global__.
- Cannot be used concurrently with __host__ (to be explained later).
- It returns immediately upon invocation before the processing on the device is completed, enabling asynchronous behavior.
- You can utilize shared memory and use arguments up to 256 bytes.
__device__

It can be executed on the device, callable from the device, but not callable from the host.

It is written within device code and used as a sub-function executed internally on the device.
```
__device__ int function(int a, int b)
{ ... }
```
- Recursion is not allowed.
- You cannot have static variables within a function.
- Variable-length argument lists are not allowed.
- You cannot use a pointer to a function specified with __device__.
__host__

It is executed on the host, callable from the host, but not callable from the device.

It typically becomes a function commonly used on the host.
```
__host__ int function(int a, int b)
{ ... }
```
- If not specified as __host__, __global__, or __device__, it is equivalent to specifying __host__.
- It cannot be used concurrently with __global__.
- By using it concurrently with __device__, you can write a function that can be used on both the host and the device.

3.2 Variable Modifiers

__device__

It is allocated in the global memory space and remains valid until the program terminates.

It is accessible to all threads, and on the host side, reading and writing can be done through API functions.
__constant__

It is allocated in the constant memory space and remains valid until the program terminates.

All threads can access it, but only reading is allowed.

Values can be written from the host using the API cudaMemcpyToSymbol().

It is accompanied by constant cache.
__shared__

It is allocated in the shared memory space and is valid within the executing thread block.

Threads within the block can access it, allowing both reading and writing.

Share on

Twitter Facebook LinkedIn

SeungWon Jeong

CUDA study 3. CUDA C language

Glossary of Terms

3.1 Function Modifiers

3.2 Variable Modifiers

Share on

You may also enjoy

CUDA study 2-5. Data Transfer Overhead

CUDA study 2-4. CUDA Kernel Function

CUDA study 2-3. Parallel Processing of CUDA Data

CUDA study 2-2. Using Graphics Card Memory