CUDA study 2-4. CUDA Kernel Function
A CUDA kernel function is a set of instructions executed on the GPU; many threads run the same kernel simultaneously across the GPU's numerous cores.
Creation of CUDA Kernel Function
__global__ void KernelFunction(int a, int b, int c)
{
......
}
The kernel function resembles a typical C function in structure, but it is distinguished by the __global__
specifier at the beginning, which indicates that the function runs on the device (GPU) and is invoked from the host (CPU).
When writing a kernel function, there are certain constraints to keep in mind:
- The return type must always be void.
- It must use predefined arguments.
Since a kernel cannot return a value directly, results must be retrieved through an output argument: a pointer that the kernel writes to.
Invocation of the Created Kernel Function
Since it is a function utilized on the GPU, the memory pointed to by the pointer variables used as arguments must be allocated in the graphics card’s memory.
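As a sketch of this workflow (the runtime calls cudaMalloc, cudaMemcpy, and cudaFree are standard CUDA API; the kernel and variable names are illustrative), the result of an addition can be retrieved through a device pointer:

```cuda
#include <stdio.h>

// The kernel writes its result through a pointer argument,
// since a kernel cannot return a value directly.
__global__ void AddKernel(int a, int b, int c, int *result)
{
    *result = a + b + c;
}

int main()
{
    int *d_result;                                  // pointer to device (GPU) memory
    int h_result = 0;                               // host-side copy of the result

    cudaMalloc(&d_result, sizeof(int));             // allocate on the graphics card
    AddKernel<<<1, 1>>>(1, 2, 3, d_result);         // a single thread is enough here
    cudaMemcpy(&h_result, d_result, sizeof(int),
               cudaMemcpyDeviceToHost);             // copy the result back to the host
    cudaFree(d_result);

    printf("sum = %d\n", h_result);                 // prints "sum = 6"
    return 0;
}
```

Note that cudaMemcpy implicitly waits for the kernel to finish, so the host reads a completed result.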
KernelFunction<<< block, thread >>>(1, 2, 3);
The <<< and >>> symbols specify the execution configuration: the number of blocks and the number of threads per block to create are set between them.
Here, a block refers to a group containing threads, and the total number of threads created can be expressed as follows:
$ N_{\text{total}} = N_{\text{blocks}} \times N_{\text{threads per block}}. $
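To see how this total plays out inside a kernel (the kernel name below is illustrative), each thread can combine its block and thread indices into a unique global index using the built-in variables blockIdx, blockDim, and threadIdx:

```cuda
#include <stdio.h>

// Each of the N_blocks * N_threads_per_block threads computes
// a unique global index from its block and thread coordinates.
__global__ void PrintIndex()
{
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global index %d\n",
           blockIdx.x, threadIdx.x, globalIdx);
}

int main()
{
    PrintIndex<<<2, 3>>>();      // 2 blocks x 3 threads = 6 threads in total
    cudaDeviceSynchronize();     // wait for the device-side printf output
    return 0;
}
```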
Here is the complete code example:
#include <stdio.h>

__global__ void KernelFunction(int a, int b, int c)
{
    int sum = a + b + c;  // computed on the device; the result is discarded here
}

int main()
{
    // Launch 6 blocks of 6 threads each (36 threads in total)
    KernelFunction<<<6, 6>>>(1, 2, 3);

    // Wait for all threads to finish before the program exits
    cudaDeviceSynchronize();

    printf("Completed invocation of threads\n");
    return 0;
}