Trainer: ATOS
Prerequisites: completion of the "CUDA Basics" training or equivalent;
basic C/C++ and basic parallel programming (threads, POSIX, OpenMP, MPI, ...).
Synopsis:
Quick recap *
Data transfer optimizations (pinned memory, zero-copy, CUDA managed memory) *
Concurrent execution (streams, events, levels of synchronization across warps/blocks) *
Kernel optimizations (warps, impact of branches, global/constant/shared memory in detail, especially bank conflicts)
Overall GPU efficiency (occupancy, roofline model)
Hardware-specific behaviours (Kepler, Pascal, Volta): key differences for the programmer
Compilation of CUDA in detail (execution model)
Multi-GPU (device management, CUDA context, peer-to-peer in CUDA, NVLink, CUDA + MPI (gdr-copy), Multi-Process Service (MPS))
Advanced profiling (nvidia-smi, nvprof, nvvp)
* = overlaps with the "CUDA Basics" training