-
Notifications
You must be signed in to change notification settings - Fork 13.5k
Prevent ABI changes affect EnzymeAD #142544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
/// `#[rustc_autodiff_no_abi_opt]`: internal marker applied to `#[rustc_autodiff]` primal functions | ||
/// whose argument layout may be sensitive to ABI-level optimizations. This marker prevents certain | ||
/// optimizations that could otherwise break compatibility with Enzyme's expectations. | ||
const RUSTC_AUTODIFF_NO_ABI_OPT = 1 << 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't say "certain optimizations", or the next person who comes along is going to make it so that these functions are treated as -O0
. Identify the actual problem: LLVM will modify the ABI of functions if it can identify them as fully internalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A change that bad should be caught by a reviewer, but I review a lot of PRs and I enjoy it when the codebase informs people of what is actually going on so they are more likely to have made changes that are consistent with the existing situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the feedback. I’ll make sure to keep comments as specific as possible to avoid any ambiguity going forward.
fn is_abi_opt_sensitive<'tcx>(tcx: TyCtxt<'tcx>, ty: Ty<'tcx>) -> bool { | ||
match ty.kind() { | ||
ty::Ref(_, inner, _) | ty::RawPtr(inner, _) => { | ||
match inner.kind() { | ||
ty::Slice(_) => { | ||
// Since we cannot guarantee that the slice length is large enough | ||
// to avoid optimization, we assume it is ABI-opt sensitive. | ||
return true; | ||
} | ||
ty::Array(elem_ty, len) => { | ||
let Some(len_val) = len.try_to_target_usize(tcx) else { | ||
return false; | ||
}; | ||
|
||
let pci = PseudoCanonicalInput { | ||
typing_env: TypingEnv::fully_monomorphized(), | ||
value: *elem_ty, | ||
}; | ||
|
||
if elem_ty.is_scalar() { | ||
let elem_size = | ||
tcx.layout_of(pci).ok().map(|layout| layout.size).unwrap_or(Size::ZERO); | ||
|
||
if elem_size.bytes() * len_val <= tcx.data_layout.pointer_size.bytes() * 2 { | ||
return true; | ||
} | ||
} | ||
} | ||
_ => {} | ||
} | ||
|
||
false | ||
} | ||
ty::FnPtr(_, _) => true, | ||
_ => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should only matter when Enzyme is on, right?
Why are you ignoring ty::Array when it is not through ty::Ref? Does Enzyme not even deal in simple aggregates like that? I'm not even sure this is the correct layer to be examining things like this at, since it's well above the LLVM IR type layer. Multiple types in Rust source can wind up being lowered to the naive equivalent of this in the IR.
Anyway, please name this fn as specific to Enzyme, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this logic is only for Enzyme. I didn't add the ty::Array
logic yet because I'm still not sure on how to handle it as, for example, [f32; 2]
is lowered to i64
. As the number of args does not change, Enzyme may not have issues with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I have determined this is blocked on me to have a proper solution for it. I would like it if you opened an issue before this PR lands that points to https://github.com/rust-lang/rust/blame/86d0aef80403f095d8bbabf44d9fdecfcd45f076/compiler/rustc_target/src/callconv/mod.rs#L708 and your new code here, and says that a new variant of the adjust_for_rust_abi
code that doesn't just mutate the arguments needs to exist so that this query can be answered without relying on mutable state that cannot be invoked idempotently.
@@ -175,6 +179,7 @@ impl CodegenFnAttrs { | |||
self.flags.contains(CodegenFnAttrFlags::NO_MANGLE) | |||
|| self.flags.contains(CodegenFnAttrFlags::RUSTC_STD_INTERNAL_SYMBOL) | |||
|| self.export_name.is_some() | |||
|| self.flags.contains(CodegenFnAttrFlags::RUSTC_AUTODIFF_NO_ABI_OPT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also suppress the dead_code
lint. I think the reason the symbol is still getting marked as dso_local
even with this change is because it has SymbolExportLevel::Rust
. Only for SymbolExportLevel::C
do we tell LTO to export the symbol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll look into that and try to ensure it has the minimal number of side effects possible. Thank you :)
☔ The latest upstream changes (presumably #142770) made this pull request unmergeable. Please resolve the merge conflicts. |
1035486
to
e243a3c
Compare
This comment has been minimized.
This comment has been minimized.
Some changes occurred in compiler/rustc_monomorphize/src/partitioning/autodiff.rs cc @ZuseZ4 |
This comment has been minimized.
This comment has been minimized.
// debug-NEXT: %_2 = load float, ptr %0, align 4, !alias.scope !7, !noalias !4 | ||
// debug-NEXT: %"'ipg2" = getelementptr inbounds float, ptr %"x'", i64 1 | ||
// debug-NEXT: %1 = getelementptr inbounds nuw float, ptr %x, i64 1 | ||
// debug-NEXT: %"_5'ipl" = load float, ptr %"'ipg2", align 4, !alias.scope !4, !noalias !7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove , !alias.scope !4, !noalias !7
and similar scope and metadata annotations. align can stay.
They are fragile, numbers might change and we want to avoid test failures because of it.
//@ no-prefer-dynamic | ||
//@ needs-enzyme | ||
|
||
// This does only test the funtion attribute handling for autodiff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't about function attributes, or? It's more about verifying that Rust types are lowered to LLVM-IR types in a way that we expect and which enzyme can handle. We also explicitely check release mode, to verify that LLVM's O3 pipeline does not rewrite function signatures into something that Enzyme can not handle anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was initially and forgot to remove that. I'll fix it and add a more detailed comment about what's this test for.
.non_enum_variant() | ||
.fields | ||
.iter() | ||
.map(|f| count_scalar_fields(tcx, f.ty(tcx, substs))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to recursively count and sum?
I think that anything behind a double indirection probably won't affect the size on the function abi, or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After testing a bit, i think the recursive sumation (or any other way of counting the "non-splittable" fields) is necessary becase if the aggregate has more than 2 fields when flattened, it's behaving slightly different, even when under the pointer size. I'll adjust it to consider this cases.
|
||
let is_product = |t: Ty<'tcx>| matches!(t.kind(), ty::Tuple(_) | ty::Adt(_, _)); | ||
|
||
if layout.size() <= pointer_size * 2 && is_product(*ty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note that this is the magic number based on which LLVM might optimize?
I am not 100% sure about the recursive summation here. In general also, @oli-obk can you review this, as I just don't know if PseudoCanonicalInput and fully_monomorphize is something we want here. |
Note(Sa4dUs): As LLVM-IR opt passes are executed after passing LLVM to Enzyme, most of the cases have turned out to not be problematic. Anyways, we still test them to prevent any kind of regression.
Update `count_scalar_fields`->`count_leaf_fields` to support more types Add extra activities only if `count_scalar_fields` is leq 2 Logic can be optimized if needed Removed metadata specific fields from test to avoid future fails.
07b10dd
to
39d1efc
Compare
Once the solution is decent, I can optimize minor things. I leave it for the end to not optimize on something that is not correct. |
This PR handles ABI changes for autodiff input arguments to improve Enzyme compatibility. A couple of cases (like statics and small arrays) are still not handled.
r? @ZuseZ4