The Heterogeneous-compute Interface for Portability (HIP) is AMD’s GPU programming environment for writing high-performance kernels on GPU hardware. HIP is a C++ runtime API and kernel language that lets developers create portable applications across platforms: the same source can run on different GPU environments with minimal changes. This module provides in-depth training on programming with HIP.
➤ Deep Dive into GPU and Performance Optimizations
Learn about the GPU Programming Model and other basics to help optimize code performance.
➤ Your First HIP Code: Vector Add
Use the HIP APIs to write a simple vector add application and compile it in two different ways.
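A minimal vector add in HIP looks roughly like the sketch below. This is a hypothetical illustration, not the module's actual sample code; the kernel and variable names are invented, and error checking is omitted for brevity.

```cpp
#include <hip/hip_runtime.h>
#include <vector>

// Each thread adds one element; names here are illustrative.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    hipMalloc(&da, bytes);
    hipMalloc(&db, bytes);
    hipMalloc(&dc, bytes);
    hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice);

    dim3 block(256), grid((n + 255) / 256);
    hipLaunchKernelGGL(vector_add, grid, block, 0, 0, da, db, dc, n);

    hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost);
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

A program like this is typically compiled with `hipcc`; the module covers the compilation options in detail.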
➤ HIP using ROCm Profiler: Matrix Transpose
Learn about the ROCm profiler, a tool that helps identify an application’s bottlenecks and other performance characteristics, and how to use it for optimization.
➤ Matrix Transpose Part 2: Naïve Version
The naïve matrix transpose is a basic transpose kernel that achieves only a fraction of the copy kernel’s effective bandwidth because of its strided memory access pattern.
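The bandwidth problem can be seen in a sketch of a naïve transpose kernel (a hypothetical minimal version for a square matrix, not the module's sample code):

```cpp
#include <hip/hip_runtime.h>

// Naïve transpose: one element per thread.
// The read from `in` is coalesced across a wavefront,
// but the write to `out` is strided by `width`, which
// wastes memory bandwidth.
__global__ void transpose_naive(const float* in, float* out, int width) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < width)
        out[x * width + y] = in[y * width + x];  // strided write
}
```

Because either the reads or the writes must be strided, this version cannot match the copy kernel, where both sides of the transfer are contiguous.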
➤ Matrix Transpose Part 3: Optimized LDS Version
The Local Data Share (LDS) is a user-managed cache available on AMD GPUs that enables data sharing among threads in the same thread block. It allows reads and writes roughly 100x faster than global memory and can be used to improve the throughput of the naïve matrix transpose.
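The idea can be sketched as follows: stage a tile in LDS so that both the global read and the global write are contiguous, with the strided access happening only in fast on-chip memory. This is a hypothetical illustration with invented names, not the module's actual kernel:

```cpp
#include <hip/hip_runtime.h>

#define TILE 32

__global__ void transpose_lds(const float* in, float* out, int width) {
    // Padding the tile by one column is a common trick to
    // avoid LDS bank conflicts on the transposed access.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < width)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];  // coalesced read

    __syncthreads();

    // Swap block indices so the output write is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < width && y < width)
        out[y * width + x] = tile[threadIdx.x][threadIdx.y]; // coalesced write
}
```

Both global transactions are now contiguous, so the kernel's effective bandwidth approaches that of the copy kernel.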
➤ Matrix Transpose Part 4: Key Takeaways
Walk through the key summaries and takeaways of the module.
➤ Debugging Tips and Tricks
Learn expert tips and tricks that help when debugging program crashes.
➤ Debugging Tips and Tricks: Wrap-up
Learn some more debugging tips when writing and compiling GPU applications.
CUDA to HIP
HIP code can run on multiple platforms, providing much-needed code portability. HIP also makes it easy to port code from CUDA, allowing developers to run CUDA applications on ROCm with minimal effort. This module walks through the porting process in detail through examples.
➤ Converting a simple application
Learn how to port a simple vector add application written in CUDA and run it on a ROCm GPU.
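In practice, porting is largely a mechanical renaming of API calls, which the `hipify-perl` and `hipify-clang` tools can automate. The fragment below is an illustrative sketch (hypothetical variable names) of what the conversion looks like:

```cpp
// CUDA original:
//   #include <cuda_runtime.h>
//   cudaMalloc(&d_a, bytes);
//   cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
//   vadd<<<grid, block>>>(d_a, d_b, d_c, n);
//   cudaFree(d_a);

// After hipification, cuda* calls become hip* calls:
#include <hip/hip_runtime.h>

void launch(float* h_a, float* d_a, float* d_b, float* d_c,
            size_t bytes, int n, dim3 grid, dim3 block);
// ...
//   hipMalloc(&d_a, bytes);
//   hipMemcpy(d_a, h_a, bytes, hipMemcpyHostToDevice);
//   vadd<<<grid, block>>>(d_a, d_b, d_c, n);  // triple-chevron launch also works in HIP
//   hipFree(d_a);
```

Kernel bodies (`__global__`, `threadIdx`, `blockIdx`, and so on) usually carry over unchanged, which is why simple applications like vector add port almost verbatim.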
➤ Porting Deep Learning CUDA-CNN to HIP
Convolutional Neural Networks (CNNs) are one of the most popular classes of machine learning algorithms, with many important uses. Learn how to convert a CUDA CNN application to HIP in this module.
➤ Porting Machine Learning K-means to HIP
K-means is a popular unsupervised machine learning algorithm based on the idea of clustering data into k groups. This tutorial ports a CUDA K-means application to HIP.
➤ Wrap-up: Porting from CUDA to HIP
Walk through the key takeaways and conclusions of the module.