2 minute read

Glossary of Terms

  • Host: PC components (CPU, DRAM)
  • Device: Graphics Card

3.1 Function Modifiers

  • __global__

    It can be executed on the device and called from the host, but it cannot be called from the device itself.

    It is used for specifying kernel functions executed on the device.

    __global__ function<<< >>>()
    { ... }
    
    • The return value is always void.
    • You can specify the blocks and threads during execution using <<< and >>>.
    • Recursion is not allowed.
    • You cannot have static variables within a function.
    • Variable-length argument lists are not allowed.
    • You can use a pointer to a function specified with __global__.
    • Cannot be used concurrently with __host__ (to be explained later).
    • It returns immediately upon invocation before the processing on the device is completed, enabling asynchronous behavior.
    • You can utilize shared memory and use arguments up to 256 bytes.
  • __device__

    It can be executed on the device, callable from the device, but not callable from the host.

    It is written within device code and used as a sub-function executed internally on the device.

    __device__ int function(int a, int b)
    { ... }
    
    • Recursion is not allowed.
    • You cannot have static variables within a function.
    • Variable-length argument lists are not allowed.
    • You cannot use a pointer to a function specified with __device__.
  • __host__

    It is executed on the host, callable from the host, but not callable from the device.

    It typically becomes a function commonly used on the host.

    __host__ int function(int a, int b)
    { ... }
    
    • If not specified as __host__, __global__, or __device__, it is equivalent to specifying __host__.
    • It cannot be used concurrently with __global__.
    • By using it concurrently with __device__, you can write a function that can be used on both the host and the device.

3.2 Variable Modifiers

  • __device__

    It is allocated in the global memory space and remains valid until the program terminates.

    It is accessible to all threads, and on the host side, reading and writing can be done through API functions.

  • __constant__

    It is allocated in the constant memory space and remains valid until the program terminates.

    All threads can access it, but only reading is allowed.

    Values can be written from the host using the API cudaMemcpyToSymbol().

    It is accompanied by constant cache.

  • __shared__

    It is allocated in the shared memory space and is valid within the executing thread block.

    Threads within the block can access it, allowing both reading and writing.

Tags:

Categories:

Updated: