gpu offload host code generation #142097
Conversation
@oli-obk Feature-wise, I am almost done. I'll add a few more lines to describe the layout of Rust types to the offload library, but in this PR I only intend to support one or two types (maybe arrays, raw pointers, or slices). I might even hardcode the length in the very first approach. In a follow-up PR I'll do some proper type parsing on a higher level, similar to what I did in the past with Rust TypeTrees. This work is much simpler and more reliable though, since offload doesn't care what type something has, just how many bytes large it is, and hence how many bytes need to be moved to/from the GPU. I was able to just move a few of the builder methods I needed to the generic builder.
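To illustrate the point made above — that offload only needs the byte size of each argument, not its full type — here is a minimal sketch (not code from this PR; the helper names are made up) of how those sizes could be computed for the kinds of types mentioned:

```rust
// Illustration only (not code from this PR): the offload runtime does not
// need full Rust type information, only how many bytes each argument
// occupies, i.e. how much data must be moved to/from the GPU.
use std::mem::size_of;

// Bytes to transfer for a fixed-size array: known at compile time.
fn bytes_of_array<T, const N: usize>(_: &[T; N]) -> usize {
    N * size_of::<T>()
}

// Bytes to transfer for a slice: the length is only known at runtime.
fn bytes_of_slice<T>(s: &[T]) -> usize {
    s.len() * size_of::<T>()
}

fn main() {
    let a = [0.0f32; 256];
    let v = vec![0.0f64; 100];
    println!("{}", bytes_of_array(&a)); // 256 * 4 = 1024 bytes
    println!("{}", bytes_of_slice(&v)); // 100 * 8 = 800 bytes
}
```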
Not fully ready yet, I apparently missed yet another global needed to initialize the offload runtime. But at least it compiles successfully to a binary if I emit the IR from Rust and then use clang for the rest. I'll add the global today; then I should be done and will clean it up.
☔ The latest upstream changes (presumably #142644) made this pull request unmergeable. Please resolve the merge conflicts.
100f9f3 to 0fb93f0
Yay, turns out the only issue in my test binary was a bug in LLVM, which was already fixed upstream in llvm/llvm-project#143638.
Some changes occurred in compiler/rustc_codegen_ssa
I did the first round of reviews myself; I'll address them tomorrow.
I'll also clean up the code in gpu_builder more, it has a lot of duplication and IR comments from when I was trying to figure out what to generate.
@@ -117,6 +118,70 @@ impl<'a, 'll, CX: Borrow<SCx<'ll>>> GenericBuilder<'a, 'll, CX> {
        }
        bx
    }

    pub(crate) fn my_alloca2(&mut self, ty: &'ll Type, align: Align, name: &str) -> &'ll Value {
I'll find a better name for it.
also document why/how it is different from alloca
ok, I think I'm mostly done. Do you have any suggestions? I don't want to add any actual run tests, as these would require a working clang based on the same commit.
@@ -667,6 +668,12 @@ pub(crate) fn run_pass_manager(
        write::llvm_optimize(cgcx, dcx, module, None, config, opt_level, opt_stage, stage)?;
    }

    if cfg!(llvm_enzyme) && enable_gpu && !thin {
There is no dependency of offload on Enzyme, but since I think I'm supposed to gate my features, for now I'll just re-use the ones from Enzyme.
☔ The latest upstream changes (presumably #143026) made this pull request unmergeable. Please resolve the merge conflicts.
I... don't know if I can review this properly. I can review it from the "does this fit into how I want the llvm backend to look" side, but what it actually does just looks random to me.
    let llcx = llvm::LLVMRustContextCreate(false);
    let module_name = CString::new("offload.wrapper.module").unwrap();
    let llmod = llvm::LLVMModuleCreateWithNameInContext(module_name.as_ptr(), llcx);
    let cx = SimpleCx::new(llmod, llcx, cgcx.pointer_size);
    let tptr = cx.type_ptr();
    let ti64 = cx.type_i64();
    let ti32 = cx.type_i32();
    let ti16 = cx.type_i16();
    let dl_cstr = llvm::LLVMGetDataLayoutStr(old_cx.llmod);
    llvm::LLVMSetDataLayout(llmod, dl_cstr);
    let target_cstr = llvm::LLVMGetTarget(old_cx.llmod);
    llvm::LLVMSetTarget(llmod, target_cstr);
this shares a bit of code with create_module, can we do better? Or at least make all the individual functions have safe wrappers
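As a side note on the safe-wrapper suggestion: the usual pattern is to confine the raw-pointer handling inside one function that upholds the FFI invariants. A minimal sketch of that pattern follows, using a stand-in unsafe function rather than the real `llvm::LLVMSetDataLayout` binding (all names here are hypothetical):

```rust
// Sketch of the safe-wrapper pattern: the unsafe FFI-style call is confined
// to one wrapper that guarantees its invariants (a valid, NUL-terminated
// C string), so call sites stay free of `unsafe` blocks.
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Stand-in for an unsafe binding such as LLVMSetDataLayout; here it just
// copies the C string into a Rust String so the example is runnable.
unsafe fn ffi_set_data_layout(dest: &mut String, dl: *const c_char) {
    *dest = unsafe { CStr::from_ptr(dl) }.to_string_lossy().into_owned();
}

// The safe wrapper owns the conversion and pointer lifetime.
fn set_data_layout(dest: &mut String, dl: &str) {
    let c = CString::new(dl).expect("data layout must not contain NUL");
    // `c` stays alive for the whole call, so the pointer is valid.
    unsafe { ffi_set_data_layout(dest, c.as_ptr()) }
}

fn main() {
    let mut module_dl = String::new();
    set_data_layout(&mut module_dl, "e-m:e-i64:64");
    println!("{module_dl}"); // prints the data layout string we set
}
```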
    offload_entry_ty
}

fn gen_globals<'ll>(
uh. please do one function per global, they are mostly unrelated after all
    let foo = crate::declare::declare_simple_fn(
        &cx,
        &mapper_begin,
        llvm::CallConv::CCallConv,
        llvm::UnnamedAddr::No,
        llvm::Visibility::Default,
        mapper_fn_ty,
    );
    let bar = crate::declare::declare_simple_fn(
        &cx,
        &mapper_update,
        llvm::CallConv::CCallConv,
        llvm::UnnamedAddr::No,
        llvm::Visibility::Default,
        mapper_fn_ty,
    );
    let baz = crate::declare::declare_simple_fn(
        &cx,
        &mapper_end,
        llvm::CallConv::CCallConv,
        llvm::UnnamedAddr::No,
        llvm::Visibility::Default,
        mapper_fn_ty,
    );
name these the same as at the use site
why is clang necessary for this?
This time I started with dev guide docs! https://rustc-dev-guide.rust-lang.org/offload/installation.html#usage
Thanks! And no worries, I'm discussing the offloading design with @jdoerfert and @kevinsala. The memory transfer is pretty straightforward and not that interesting. The only question was how many layers of abstraction we wanted, but we made a decision which should be fine; we can always re-evaluate it later. For the kernel launches PR I'll ask them to also review the code, but they aren't Rust devs, so your reviews on the rustc side are definitely appreciated!
r? ghost
This will generate most of the host side code to use llvm's offload feature.
The first PR will only handle automatic mem-transfers to and from the device.
So if a user calls a kernel, we will copy inputs back and forth, but we won't do the actual kernel launch.
Before merging, we will use LLVM's Info infrastructure to verify that the memcopies match what OpenMP offload generates in C++.
LIBOMPTARGET_INFO=-1 ./my_rust_binary
should print that a memcpy to and later from the device is happening. A follow-up PR will generate the actual device-side kernel which will then do computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize cases where a user has to override our default handling due to performance issues.
I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will obviously move this over to use proper macros, like I'm already doing for the autodiff work.
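The "magic names" recognition described above could look roughly like the following sketch (hypothetical: the actual marker string used by the PR is not shown here, so the `kernel_` prefix is made up):

```rust
// Hypothetical sketch of name-based kernel recognition for the MVP; a real
// frontend would use a proper attribute macro instead of a naming convention.
fn is_offload_kernel(fn_name: &str) -> bool {
    // Made-up convention: any function whose name starts with "kernel_"
    // is treated as a GPU kernel and gets host-side offload glue generated.
    fn_name.starts_with("kernel_")
}

fn main() {
    println!("{}", is_offload_kernel("kernel_1")); // true
    println!("{}", is_offload_kernel("main"));     // false
}
```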
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.
Tracking: