1 minute read

The CUDA kernel function operates as a combination of instructions executed on the GPU, enabling simultaneous multi-threaded execution across numerous cores.

Creation of CUDA Kernel Function

__global__ void KernelFunction(int a, int b, int c)
{
    ......
}

The kernel function resembles a typical C function in structure, but it is distinguished by the presence of the __global__ specifier at the beginning, indicating the scope in which it can be used.

When using a kernel function, there are certain constraints to be considered:

  • The return type must always be specified as void.
  • It must use predefined arguments.

Since it cannot return a value directly, if you need to retrieve results, you have to use a return-type argument with a pointer variable.

Invocation of the Created Kernel Function

Since it is a function utilized on the GPU, the memory pointed to by the pointer variables used as arguments must be allocated in the graphics card’s memory.

KernelFunction<<< block, thread >>>(1, 2, 3);

The <<< and >>> symbols specify the creation of threads, and the number of threads to be created is set within these symbols.

Here, a block refers to a group containing threads, and the total number of threads created can be expressed as follows:

$ N_{Total} = N_{Blocks} \times N_{Threads per Block}. $

Here is the complete code example:

#include <stdio.h>

__global__ void KernelFunction(int a, int b, int c)
{
    int sum = a+b+c;
}

int main()
{
    KernelFunction<<6, 6>>(1, 2, 3);

    printf("Complete invocation threads\n");
    return 0;
}

Tags:

Categories:

Updated: