About CUDA





    -- Exposes the computational horsepower of NVIDIA GPUs for general-purpose computing

    -- Scales across any number of threads

    -- Software

  • Based on industry-standard C

  • Small set of extensions to C language

  • Low learning curve

  • Straightforward APIs to manage devices, memory, etc.


  • Host - The CPU and its memory
  • Device - The GPU and its memory
  • Kernel - A function compiled for the device and executed on the device by many threads
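
To make the three terms concrete, here is a minimal sketch rather than a definitive implementation. The names fill_with_index and run_fill are hypothetical, and the use of managed memory (cudaMallocManaged) is an assumption chosen only to keep the example short. The host code launches the kernel, and each device thread writes one array element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: compiled for the device and executed by many threads at once.
// Each thread writes its own global index into the array.
__global__ void fill_with_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        out[i] = i;
    }
}

// Host-side helper (hypothetical name). Launches the kernel and returns
// the number of wrong elements, or -1 if a CUDA call fails.
int run_fill(int n)
{
    int *data = nullptr;
    // Assumption: managed memory, which both host and device can access.
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) {
        return -1;
    }

    int threads = 256;                         // threads per block
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    fill_with_index<<<blocks, threads>>>(data, n);

    // Wait for the device to finish before the host reads the data.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != i) ++wrong;
    }
    cudaFree(data);
    return wrong;
}

int main()
{
    int wrong = run_fill(1000);
    printf("wrong elements: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The `__global__` qualifier marks the function as a kernel, and the `<<<blocks, threads>>>` syntax is how the host asks the device to run it.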


  • You (probably) need experience with C or C++

  • You do not need any GPU experience  

  • You do not need any graphics experience  

  • You do not need any parallel programming experience

 Why CUDA?

 Ø  Data Parallelism - A program property whereby many arithmetic operations can be performed on data structures simultaneously.


  Example: a 1,000 x 1,000 matrix multiplication

  • 1,000,000 independent dot products.

  • Each dot product takes 1,000 multiply and 1,000 add operations.
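
The matrix example above maps naturally onto one thread per dot product. The following is a hedged sketch of that idea, not a tuned implementation: the names matmul and run_matmul are hypothetical, the 16 x 16 block size is an arbitrary assumption, and the input values (all 1s and all 2s) are chosen only so the result is easy to check.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread computes one element of C, i.e. one independent dot product
// of a row of A with a column of B (n multiplies and n adds).
__global__ void matmul(const float *a, const float *b, float *c, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) {
            sum += a[row * n + k] * b[k * n + col];
        }
        c[row * n + col] = sum;
    }
}

// Host-side helper (hypothetical name). Multiplies an all-ones matrix by an
// all-twos matrix, so every result element should equal 2 * n.
// Returns the number of wrong elements, or -1 if a CUDA call fails.
int run_matmul(int n)
{
    size_t count = (size_t)n * n;
    size_t bytes = count * sizeof(float);
    std::vector<float> a(count, 1.0f), b(count, 2.0f), c(count, 0.0f);

    float *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess
           && cudaMalloc(&db, bytes) == cudaSuccess
           && cudaMalloc(&dc, bytes) == cudaSuccess
           && cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 threads(16, 16);  // assumption: 256 threads per block
        dim3 blocks((n + 15) / 16, (n + 15) / 16);
        matmul<<<blocks, threads>>>(da, db, dc, n);
        ok = cudaGetLastError() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(da);  // cudaFree on a null pointer does nothing
    cudaFree(db);
    cudaFree(dc);
    if (!ok) return -1;

    int wrong = 0;
    for (size_t i = 0; i < count; ++i) {
        if (c[i] != 2.0f * n) ++wrong;
    }
    return wrong;
}

int main()
{
    int wrong = run_matmul(1000);  // 1,000,000 dot products
    printf("wrong elements: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

No thread depends on another thread's result, which is what makes the problem data parallel.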

Ø  Thread Creation : CUDA threads are much lighter weight than CPU threads.

  • They take only a few cycles to generate and schedule, thanks to efficient hardware support.

  • CPU threads typically take thousands of clock cycles to generate and schedule.  

Ø  Direct Compilation : CUDA avoids the performance overhead of graphics-layer APIs by compiling software directly for the hardware (GPU assembly language).


Example of CUDA processing flow


Ø  Copy data from main memory to GPU memory  

Ø  The CPU instructs the GPU to start processing

Ø  The GPU executes the work in parallel on its cores

Ø  Copy the result from GPU memory to main memory
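
The four steps above can be sketched with a simple vector addition. This is an illustrative example under stated assumptions, not the only way to structure the flow: the names vector_add and run_vector_add are hypothetical, and the block size of 256 threads is an arbitrary choice.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Step 3: every thread adds one pair of elements, all in parallel.
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Host-side helper (hypothetical name). Returns the number of wrong
// elements, or -1 if a CUDA call fails.
int run_vector_add(int n)
{
    std::vector<float> a(n), b(n), c(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        b[i] = 2.0f * i;
    }

    size_t bytes = n * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess
           && cudaMalloc(&db, bytes) == cudaSuccess
           && cudaMalloc(&dc, bytes) == cudaSuccess;

    // Step 1: copy input data from main memory to GPU memory.
    ok = ok && cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
            && cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    // Step 2: the CPU instructs the GPU to start processing (kernel launch).
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vector_add<<<blocks, threads>>>(da, db, dc, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Step 4: copy the result from GPU memory back to main memory.
    // This copy waits for the kernel to finish before it starts.
    ok = ok && cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(da);  // cudaFree on a null pointer does nothing
    cudaFree(db);
    cudaFree(dc);
    if (!ok) return -1;

    int wrong = 0;
    for (int i = 0; i < n; ++i) {
        if (c[i] != a[i] + b[i]) ++wrong;
    }
    return wrong;
}

int main()
{
    int wrong = run_vector_add(1 << 20);
    printf("wrong elements: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

The two copies are often the most expensive part of small workloads, so it usually pays to keep data on the device across several kernel launches.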


How to Write


Ø  Create or edit the CUDA program with your favorite editor. Note: CUDA C source files use the suffix ".cu".

Ø  Compile the program with nvcc to create the executable.  

Ø  Run the executable.
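
As a minimal sketch of these three steps, the file below could be saved as hello.cu. The file name, the executable name and the function names are assumptions made for illustration. The build and run commands appear as comments at the top and assume that nvcc is on the PATH and that a CUDA-capable GPU is present.

```cuda
// hello.cu
// Build: nvcc hello.cu -o hello
// Run:   ./hello
#include <cstdio>
#include <cuda_runtime.h>

// Kernel executed on the device by a single thread.
__global__ void hello_kernel()
{
    printf("Hello from the GPU\n");
}

// Host-side helper (hypothetical name). Returns 0 on success, 1 on error.
int run_hello()
{
    hello_kernel<<<1, 1>>>();  // one block containing one thread
    // Wait for the kernel to finish so its output is flushed.
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}

int main()
{
    return run_hello();
}
```

Without the call to cudaDeviceSynchronize, the program could exit before the device output is printed.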