CUDA study 2-3. Parallel Data Processing in CUDA
The overall flow of processing data in parallel with CUDA is as follows (a code sketch of these steps appears after the list).
- Allocate memory for the input and output data in PC (host) memory.
- Allocate memory for the input and output data in graphics (device) memory.
- Fill the PC memory with the values to be processed.
- Copy the input data from PC memory to graphics memory.
- Split the data and distribute it across the GPU.
- A thousand or more threads are created and process the data in parallel through the kernel function.
- Merge the processed results.
- Copy the results back to PC memory.
- Free the graphics memory.
- Free the PC memory.
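
As a concrete reference, here is a minimal sketch of these steps, assuming a hypothetical element-wise addition kernel (addKernel) and omitting error checking for brevity:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel: each thread adds one pair of elements.
__global__ void addKernel(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // each thread picks its own slice of the data
    if (i < n)
        out[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t size = n * sizeof(float);

    // Allocate input/output buffers in PC (host) memory.
    float* hA   = (float*)malloc(size);
    float* hB   = (float*)malloc(size);
    float* hOut = (float*)malloc(size);

    // Allocate input/output buffers in graphics (device) memory.
    float *dA, *dB, *dOut;
    cudaMalloc((void**)&dA, size);
    cudaMalloc((void**)&dB, size);
    cudaMalloc((void**)&dOut, size);

    // Fill the PC memory with the values to be processed.
    for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // Copy the input data from PC memory to graphics memory.
    cudaMemcpy(dA, hA, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, size, cudaMemcpyHostToDevice);

    // Launch the kernel: the launch configuration creates the threads,
    // and each thread processes its portion of the data in parallel.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addKernel<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);

    // Copy the results back from graphics memory to PC memory.
    cudaMemcpy(hOut, dOut, size, cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", hOut[0]);

    // Free the graphics memory, then the PC memory.
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    free(hA); free(hB); free(hOut);
    return 0;
}
```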
In CUDA, the kernel is the function that performs the computation, analogous to a worker function in Windows multithreaded programming.
The difference between a worker function and a kernel function lies in how threads are created:
- with a worker function, thread creation is a separate, explicit step,
- whereas in CUDA the threads are created as part of calling the kernel function itself.
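
As a rough illustration of that difference (the Windows worker-thread call is shown only in a comment, and helloKernel is a made-up example):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// CUDA kernel function: its body plays the role of a worker function,
// but no separate thread-creation call is needed.
__global__ void helloKernel(void)
{
    printf("thread %d of block %d\n", threadIdx.x, blockIdx.x);
}

int main(void)
{
    // A Windows worker function would need a separate creation step, e.g.:
    //   HANDLE h = CreateThread(NULL, 0, WorkerFunc, &arg, 0, NULL);
    //
    // In CUDA, calling the kernel with a launch configuration both
    // creates the threads and starts them in a single statement:
    helloKernel<<<2, 4>>>();      // 2 blocks x 4 threads = 8 threads created here
    cudaDeviceSynchronize();      // wait for the kernel (and its printf output) to finish
    return 0;
}
```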