About CUDA

 

CUDA

Basics

      -- Stands for Compute Unified Device Architecture

    -- Used to expose the computational horsepower of NVIDIA GPUs for general-purpose GPU computing

    -- The programming model scales transparently across any number of threads

    -- Software

  • Based on industry-standard C

  • Small set of extensions to C language

  • Low learning curve

  • Straightforward APIs to manage devices, memory, etc.

Terminology

  • Host - the CPU and its memory
  • Device - the GPU and its memory
  • Kernel - a function compiled for the device; it is executed on the device by many threads
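
A minimal sketch ties the three terms together (the kernel name `scale` and the launch configuration are illustrative, not from the original):

```cuda
// Kernel: compiled for the device, launched from the host,
// executed on the device by many threads at once.
__global__ void scale(float *data, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    data[i] *= factor;
}

// Host code launches the kernel on the device like this:
//   scale<<<numBlocks, threadsPerBlock>>>(d_data, 2.0f);
```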

Pre-requisites

  • You (probably) need experience with C or C++

  • You do not need any GPU experience  

  • You do not need any graphics experience  

  • You do not need any parallel programming experience

 Why CUDA?

 Ø  Data Parallelism - a program property where arithmetic operations can be performed simultaneously across the elements of a data structure.

 

  A 1,000 x 1,000 matrix multiplication:

  • 1,000,000 independent dot products.

  • Each dot product requires 1,000 multiply and 1,000 add operations.
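
This maps naturally onto one CUDA thread per dot product. A sketch of such a kernel (the matrix size `N`, kernel name, and index scheme here are illustrative assumptions):

```cuda
#define N 1000

// Each thread computes one element of C = A * B:
// a dot product of 1,000 multiplies and 1,000 adds.
__global__ void matmul(const float *A, const float *B, float *C)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];  // multiply + add
        C[row * N + col] = sum;
    }
}
```

Launching 1,000,000 such threads is exactly the kind of fine-grained parallelism CUDA is designed for.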

Ø  Thread creation: CUDA threads are much lighter weight than CPU threads.

  • They take only a few cycles to generate and schedule, thanks to efficient hardware support.

  • CPU threads typically take thousands of clock cycles to generate and schedule.

Ø  It avoids the performance overhead of graphics-layer APIs by compiling software directly for the hardware (GPU assembly language).

 

Example of CUDA processing flow

 

Ø  Copy data from main memory to GPU memory.

Ø  The CPU instructs the GPU to process the data (launches a kernel).

Ø  The GPU executes the kernel in parallel on each core.

Ø  Copy the result from GPU memory back to main memory.
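
The four steps above can be sketched in host code as follows (the kernel name `scale`, array size, and block size are illustrative assumptions):

```cuda
#include <stdlib.h>

__global__ void scale(float *data, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;               // 1M elements (illustrative)
    size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);  // host (CPU) memory
    float *d_data;                           // device (GPU) memory
    cudaMalloc(&d_data, bytes);

    // ... fill h_data with real input here ...

    // 1. Copy data from main memory to GPU memory.
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // 2. The CPU instructs the GPU: launch the kernel.
    // 3. The GPU executes the kernel in parallel on each core.
    scale<<<n / 256, 256>>>(d_data, 2.0f);

    // 4. Copy the result from GPU memory back to main memory.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```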

 

How to Write

 

Ø  Create or edit the CUDA program with your favorite editor. Note: CUDA C programs have the suffix ".cu".

Ø  Compile the program with nvcc to create the executable.  

Ø  Run the executable.
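
Putting the three steps together, here is a minimal end-to-end example, saved as, say, hello.cu (the file name and kernel name are illustrative):

```cuda
// hello.cu
#include <stdio.h>

__global__ void hello(void)
{
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main(void)
{
    hello<<<1, 4>>>();          // launch 1 block of 4 device threads
    cudaDeviceSynchronize();    // wait for the kernel to finish
    return 0;
}
```

Compile with nvcc to create the executable, then run it:

    nvcc hello.cu -o hello
    ./hello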