goabiaryan/awesome-gpu-engineering

Awesome GPU Engineering

A curated list of resources for mastering GPU engineering, from architecture and kernel programming to large-scale distributed systems and AI acceleration.


📘 Foundational Books

  • Programming Massively Parallel Processors: A Hands-on Approach - David B. Kirk & Wen-mei W. Hwu
    The canonical introduction to CUDA, memory hierarchies, and parallel patterns. Amazon. Notes: Abi's Concise Notes
  • CUDA by Example - Jason Sanders & Edward Kandrot
    A practical introduction to CUDA for beginners. Amazon
  • The Ultra-Scale Playbook: Training LLMs on GPU Clusters - Hugging Face
    Web Version

💻 GPU Programming Frameworks

  • CUDA - NVIDIA's proprietary GPU programming platform.
  • ROCm - AMD's open compute stack.
  • OpenCL - Cross-platform parallel computing standard.
  • SYCL / oneAPI - SYCL is the Khronos C++ standard for heterogeneous compute; oneAPI is Intel's toolkit built around it.
  • Vulkan Compute - Low-level GPU compute API.
  • Kompute - Higher-level general-purpose GPU compute framework built on Vulkan.
  • Metal Performance Shaders - Apple's library of optimized GPU kernels built on Metal.

🧩 Optimization and Performance

  • NVIDIA Nsight Systems - System-wide GPU profiler.
  • Nsight Compute - Kernel-level performance analysis.
  • Occupancy Calculator - NVIDIA spreadsheet for estimating occupancy from a kernel's launch configuration and resource usage; newer toolkits provide the calculator inside Nsight Compute.
  • CUTLASS - CUDA templates for linear algebra subroutines.
  • TensorRT - NVIDIA's SDK for high-performance deep learning inference.
  • OpenAI Triton - Python DSL for writing high-performance GPU kernels.
  • Roofline Model - Analytical model for reasoning about compute and memory bottlenecks.
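
As a rough illustration of the roofline model listed above, the sketch below bounds attainable throughput by the lower of the compute roof and the memory roof. The device numbers (10,000 GFLOP/s peak, 500 GB/s bandwidth) are made up for the example and do not describe any real GPU; the function names are hypothetical, and the matmul traffic estimate assumes each matrix moves through memory exactly once.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity is measured in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


def matmul_intensity(n, bytes_per_element=4):
    """Idealized intensity of an n x n matmul: 2*n^3 FLOPs over one pass
    through A, B, and C (assumes perfect reuse, which real kernels only approach)."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_element
    return flops / bytes_moved


# Hypothetical device, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 10_000.0
BANDWIDTH_GBS = 500.0

# fp32 vector add: 1 FLOP per 12 bytes (two loads, one store).
vector_add = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 1 / 12)
large_matmul = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, matmul_intensity(1024))
```

Under these assumptions the vector add sits far below the ridge point and is limited by bandwidth, while the large matmul lands on the compute roof.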

🧠 Architecture and Low-Level Design

⚙️ Systems and Multi-GPU Engineering

🧪 Tutorials and Courses

📄 Research Papers and Articles

🧰 Tools and Utilities

  • nvprof, nvvp (legacy), Nsight Systems / Compute - NVIDIA profiling tools.
  • cuda-memcheck (legacy), compute-sanitizer - Memory and correctness tools.
  • GPGPU-Sim, Accel-Sim - GPU simulation frameworks.
  • Perfetto, Nsight UI - Visual profilers for tracing GPU workloads.

Learning Tools

🧑‍🔬 GPU for AI & ML

  • PyTorch CUDA Extensions - Custom kernels for PyTorch.
  • JAX + XLA - JIT compilation of array programs to GPU code through the XLA compiler.
  • TensorFlow XLA Compiler - Compiles TensorFlow graphs into fused GPU kernels, typically just-in-time.
  • FlashAttention, FlashConv - IO-aware kernel optimizations for transformer attention and long-convolution sequence models.
  • DeepSpeed, FSDP, Megatron-LM - Distributed training systems.
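
To give a feel for the kind of trick behind FlashAttention, here is a minimal pure-Python sketch of the online (streaming) softmax: a softmax-weighted sum computed one block at a time while rescaling running totals, so the full score vector never has to be materialized at once. This is a scalar toy under simplifying assumptions (finite scores, scalar values, hypothetical function names), not the actual FlashAttention kernel.

```python
import math


def naive_softmax_weighted_sum(scores, values):
    """Reference: softmax over all scores at once, then a weighted sum."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def online_softmax_weighted_sum(scores, values, block_size=2):
    """Same result, but processing scores block by block.

    Keeps a running max, a running denominator, and a running numerator,
    rescaling the latter two whenever a new block raises the max.
    Assumes all scores are finite.
    """
    running_max = float("-inf")
    running_sum = 0.0  # denominator, expressed relative to running_max
    running_out = 0.0  # numerator, expressed relative to running_max
    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]
        new_max = max(running_max, max(s_blk))
        # Rescale old totals to the new max; this is exp(-inf) = 0.0 on the first block.
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in s_blk]
        running_sum = running_sum * scale + sum(weights)
        running_out = running_out * scale + sum(w * v for w, v in zip(weights, v_blk))
        running_max = new_max
    return running_out / running_sum


scores = [1.0, 3.0, -2.0, 0.5, 4.0]
values = [10.0, 20.0, 30.0, 40.0, 50.0]
streamed = online_softmax_weighted_sum(scores, values, block_size=2)
reference = naive_softmax_weighted_sum(scores, values)
```

The two functions agree up to floating-point rounding; the streaming version only ever holds one block of scores, which is what lets a real kernel keep its working set in fast on-chip memory.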

🧱 GPU Systems Design Topics For Interview Prep

  • FlashAttention and PagedAttention
  • Matmul operations
  • GPU scheduling algorithms and runtime systems
  • Memory oversubscription and unified memory models
  • Resource allocation in GPU clusters
  • GPU virtualization
  • Kernel fusion and graph execution
  • Dataflow optimization
  • Persistent threads model
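
For the matmul topic above, a common interview starting point is loop tiling. The sketch below is a plain-Python illustration of the blocking pattern only: each (i0, j0) output tile is built up from small tiles of the inputs, which on a GPU would be staged in shared memory and reused by a thread block. The function name and tile size are illustrative assumptions, and nothing here is tuned for speed.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), both lists of rows, one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, m, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk the shared dimension in tiles
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3],
     [4, 5, 6]]
b = [[7, 8],
     [9, 10],
     [11, 12]]
product = tiled_matmul(a, b, tile=2)
```

The `min(...)` bounds handle matrices whose sizes are not multiples of the tile, the same edge case a real kernel has to guard against.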

🧑‍💻 Contributors

Contributions welcome!
Please read the contribution guidelines before submitting a pull request.

🧾 License

CC BY 4.0 - feel free to share and adapt with attribution.

⭐ Acknowledgements

Inspired by:


“GPU engineering is not just about writing kernels. It’s about understanding how systems work.” - Model Craft
