About CUDA

CUDA Basics


      -- COMPUTE UNIFIED DEVICE ARCHITECTURE   


    -- Used to expose the computational horsepower of NVIDIA GPUs for general-purpose GPU computing


    -- Scalable across any number of processor cores: the same program runs unchanged on small and large GPUs


    -- Software




  • Based on industry-standard C



  • Small set of extensions to C language



  • Low learning curve



  • Straightforward APIs to manage devices, memory, etc.


Terminology



  • Host - The CPU and its memory

  • Device - The GPU and its memory

  • Kernel - A function compiled for the device; it is executed on the device by many threads
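The terms above can be illustrated with a minimal sketch (the kernel name `hello` and launch configuration are just for illustration):

```cuda
#include <cstdio>

// Kernel: compiled for the device and executed there by many threads.
__global__ void hello(void)
{
    printf("Hello from device thread %d\n", threadIdx.x);
}

int main(void)
{
    // Host code runs on the CPU. The <<<blocks, threads>>> syntax
    // launches the kernel on the device (GPU): here, 1 block of 4 threads.
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();   // wait for the device to finish
    return 0;
}
```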

Pre-requisites




  • You (probably) need experience with C or C++



  • You do not need any GPU experience  



  • You do not need any graphics experience  



  • You do not need any parallel programming experience


 Why CUDA?


 Ø  Data Parallelism - A program property where the same arithmetic operations can be performed simultaneously across a data structure.


 


  A 1,000 X 1,000 matrix multiplication  




  • 1,000,000 independent dot products.



  • Each dot product requires 1,000 multiply and 1,000 add arithmetic operations.
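A sketch of how this maps to CUDA: each thread computes one of the 1,000,000 independent dot products (the kernel name `matmul` and the row-major layout are assumptions for illustration):

```cuda
#define N 1000

// Each thread computes one element of C = A * B -- a single dot
// product of 1,000 multiplies and 1,000 adds.
__global__ void matmul(const float *A, const float *B, float *C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}
```

Launched with, e.g., `dim3 threads(16, 16); dim3 blocks((N + 15) / 16, (N + 15) / 16);`, this creates one lightweight thread per output element.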


Ø  Thread Creation - CUDA threads are far more lightweight than CPU threads.




  • CUDA threads take only a few cycles to generate and schedule, thanks to efficient hardware support.



  • CPU threads typically take thousands of clock cycles to generate and schedule.  


Ø  It avoids the performance overhead of graphics-layer APIs by compiling programs directly to the hardware (GPU assembly language).


 


Example of CUDA processing flow


 


Ø  Copy data from main memory to GPU memory  


Ø  CPU instructs the GPU to start processing (launches the kernel)


Ø  GPU executes the kernel in parallel on each core


Ø  Copy the result from GPU memory to main memory
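The four steps above can be sketched end to end with a simple vector addition (the kernel name `add`, the sizes, and the launch configuration are illustrative choices, not fixed by CUDA):

```cuda
#include <cstdio>
#include <cstdlib>

// Kernel: each thread adds one pair of elements.
__global__ void add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) buffers in main memory.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) {
        h_a[i] = 1.0f;
        h_b[i] = 2.0f;
    }

    // Device (GPU) buffers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Step 1: copy data from main memory to GPU memory.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Steps 2-3: the CPU launches the kernel; the GPU executes it
    // in parallel across its cores.
    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // Step 4: copy the result from GPU memory back to main memory.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```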


 


How to Write


 


Ø  Create or edit the CUDA program with your favorite editor. Note: CUDA C language programs have the suffix ".cu".


Ø  Compile the program with nvcc to create the executable.  


Ø  Run the executable.
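For example, assuming the source file is named hello.cu (a hypothetical filename) and the CUDA toolkit is installed:

```shell
# Compile the .cu file with nvcc to create the executable.
nvcc -o hello hello.cu

# Run the executable.
./hello
```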