CUDA study 2-3. Parallel Data Processing in CUDA
The overall flow of processing data in parallel with CUDA is as follows (a code sketch of these steps appears after the list).
- Allocate memory for the input and output data in PC (host) memory.
- Allocate memory for the input and output data in graphics (device) memory.
- Fill the PC memory with the values to be processed.
- Copy the input data from PC memory to graphics memory.
- Split the data and distribute it across the GPU.
- A thousand or more threads are created and process the data in parallel through the kernel function.
- Merge the processed results.
- Copy the results back to PC memory.
- Free the graphics memory.
- Free the PC memory.
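
As a concrete reference, here is a minimal sketch of these steps, assuming a hypothetical element-wise addition kernel (addKernel) and omitting error checking for brevity:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel: each thread adds one pair of elements.
__global__ void addKernel(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // each thread picks its own slice of the data
    if (i < n)
        out[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t size = n * sizeof(float);

    // Allocate input/output buffers in PC (host) memory.
    float* hA   = (float*)malloc(size);
    float* hB   = (float*)malloc(size);
    float* hOut = (float*)malloc(size);

    // Allocate input/output buffers in graphics (device) memory.
    float *dA, *dB, *dOut;
    cudaMalloc((void**)&dA, size);
    cudaMalloc((void**)&dB, size);
    cudaMalloc((void**)&dOut, size);

    // Fill the PC memory with the values to be processed.
    for (int i = 0; i < n; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    // Copy the input data from PC memory to graphics memory.
    cudaMemcpy(dA, hA, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, size, cudaMemcpyHostToDevice);

    // Launch the kernel: the launch configuration creates the threads,
    // and each thread processes its portion of the data in parallel.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addKernel<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);

    // Copy the results back from graphics memory to PC memory.
    cudaMemcpy(hOut, dOut, size, cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", hOut[0]);

    // Free the graphics memory, then the PC memory.
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    free(hA); free(hB); free(hOut);
    return 0;
}
```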
In CUDA, the kernel is the function that performs the computation, analogous to a worker function in Windows multithreaded programming.
The difference between a worker function and a kernel function lies in how threads are created:
- with a worker function, thread creation is a separate, explicit step,
- whereas in CUDA the threads are created as part of calling the kernel function itself.
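
As a rough illustration of that difference (the Windows worker-thread call is shown only in a comment, and helloKernel is a made-up example):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// CUDA kernel function: its body plays the role of a worker function,
// but no separate thread-creation call is needed.
__global__ void helloKernel(void)
{
    printf("thread %d of block %d\n", threadIdx.x, blockIdx.x);
}

int main(void)
{
    // A Windows worker function would need a separate creation step, e.g.:
    //   HANDLE h = CreateThread(NULL, 0, WorkerFunc, &arg, 0, NULL);
    //
    // In CUDA, calling the kernel with a launch configuration both
    // creates the threads and starts them in a single statement:
    helloKernel<<<2, 4>>>();      // 2 blocks x 4 threads = 8 threads created here
    cudaDeviceSynchronize();      // wait for the kernel (and its printf output) to finish
    return 0;
}
```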