Add OperandValue::Uninit to improve lowering of MaybeUninit::uninit #142837
Conversation
I'm investigating possible solutions to #139355.
I think the binary size increases in #142837 (comment) are from LLVM combining a big memset of undef with an i32 store of 0 into a big memset of 0, then lowering the memset as a bunch of movs. Which is #138625, and I suspect it's back because I only applied the fix for that issue in the codegen for […]. Though this makes me wonder if we should have some more fundamental solution. What I'm really trying to lower is a deinitialize, but it's particularly goofy because the MIR we generate is immediately doing a deinitializing assignment to a just-allocated local.
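For concreteness, here is a hypothetical Rust snippet of the shape being discussed (Partial and fresh are invented names, not from the PR): a small zero-initialized field next to a large uninitialized buffer. The concern is that LLVM merges the memset-of-undef covering the buffer with the store of 0 into one big memset of 0.

```rust
use std::mem::MaybeUninit;

// Hypothetical illustration (names invented): one zero-initialized field
// next to a large uninitialized buffer.
pub struct Partial {
    pub len: u32,
    pub buf: [MaybeUninit<u8>; 256],
}

pub fn fresh() -> Partial {
    // Conceptually: store 0 to `len`, leave `buf` uninitialized.
    Partial { len: 0, buf: [MaybeUninit::uninit(); 256] }
}

fn main() {
    let p = fresh();
    assert_eq!(p.len, 0);
    assert_eq!(p.buf.len(), 256);
}
```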
Finished benchmarking commit (0620a6a): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): results (primary -1.1%, secondary 4.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: this benchmark run did not return any relevant results for this metric.
Binary size: results (primary 0.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 689.715s -> 689.073s (-0.09%)
Force-pushed from 1414ccc to ead2221.
rustbot has assigned @workingjubilee.
Some changes occurred in compiler/rustc_codegen_ssa
r=me with nits
@@ -67,9 +67,14 @@ pub enum OperandValue<V> {
    /// `is_zst` on its `Layout` returns `true`. Note however that
    /// these values can still require alignment.
    ZeroSized,
    Uninit,
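A standalone sketch of the kind of doc comment the reviewers are asking for on the new variant. This is a simplified, hypothetical copy for illustration only; the real OperandValue lives in rustc_codegen_ssa and has more variants and richer payloads.

```rust
// Simplified, hypothetical stand-in for rustc_codegen_ssa's OperandValue,
// showing one possible doc comment on the new variant.
#[allow(dead_code)]
pub enum OperandValue<V> {
    /// A single backend immediate value.
    Immediate(V),
    /// A pair of backend immediate values.
    Pair(V, V),
    /// A zero-sized value.
    ZeroSized,
    /// Every byte of this operand is uninitialized. Consumers may lower
    /// a store of this value to a backend undef, or skip it entirely.
    Uninit,
}

fn main() {
    let v: OperandValue<u32> = OperandValue::Uninit;
    assert!(matches!(v, OperandValue::Uninit));
}
```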
Could you add some docs here?
Agreed -- this is changing the "well if it's is_backend_immediate then it's OperandValue::Immediate" rule, for example, so it needs to be thought about carefully since it affects all the consumers of OperandValue, potentially. (And thus probably needs those comments on the other variants updated too.)
For example, it's not clear to me why Immediate(cx.const_undef(…)) wouldn't be fine as representing undef for things with is_backend_immediate.
> For example, it's not clear to me why Immediate(cx.const_undef(…)) wouldn't be fine as representing undef for things with is_backend_immediate.

I'm not sure what you mean by "fine". It's valid to do so even with this PR; it just doesn't fix the missed optimization. It does improve the IR that we emit, if also combined with the change to MaybeUninit::uninit. But all that it accomplishes is tripping over a different problem in LLVM. It also breaks the maybeuninit-nrvo codegen tests.
This is the diff I applied to try out your idea:
--- a/compiler/rustc_codegen_ssa/src/mir/operand.rs
+++ b/compiler/rustc_codegen_ssa/src/mir/operand.rs
@@ -170,6 +170,13 @@ pub(crate) fn from_const<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
OperandValue::Pair(a_llval, b_llval)
}
ConstValue::Indirect { alloc_id, offset } => {
+ if val.all_bytes_uninit(bx.tcx()) {
+ if let BackendRepr::Scalar(_) = layout.backend_repr {
+ let llval = bx.const_undef(bx.immediate_backend_type(layout));
+ let val = OperandValue::Immediate(llval);
+ return OperandRef { val, layout };
+ }
+ };
let alloc = bx.tcx().global_alloc(alloc_id).unwrap_memory();
return Self::from_const_alloc(bx, layout, alloc, offset);
}
@@ -591,6 +602,28 @@ impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
    }

impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, Result<V, abi::Scalar>> {
    fn update_uninit<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
What do these 2 methods do? Could you maybe write a brief doc comment for each?
// It is very helpful for codegen to know when are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
Suggested change:
// It is very helpful for codegen to know when we are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
// CHECK-LABEL: @create_ptr
#[no_mangle]
fn create_ptr() -> MaybeUninit<&'static str> {
    // CHECK-NEXT: start:
    // CHECK-NEXT: ret { ptr, i64 } undef
    MaybeUninit::uninit()
}
Note that for scalar pair cases like this I'm already fixing it in https://github.com/rust-lang/rust/pull/138759/files#diff-68480918205d32f0b23d06ba76c5fbd702b7dc842f0f5cf262db6e2e6ae3c630R51-R59 (That also handles partial uninit cases like None::<u32> too.)
I wonder if it's only OperandValue::Ref that needs to be handled specially as a result, rather than a different top-level variant.
This already compiles to the ret undef; I'm adding a codegen test for this because if you pick the wrong backend type for the ret (which I did, repeatedly), everything works fine except that you hit an LLVM assertion when compiling more complicated code.
Yeah, this is tricky to solve on the LLVM side. We generally want to constant fold as early as possible, even if it means losing undef. In this case, constant folding a larger pattern (the whole undef splat) would allow preserving the undef, but it has already been lost at that point.
Are you hitting these problems ever for partially-uninit values? Or is it always just the […]? If it's just the latter, I think this might not be that hard. We could add an […]. Basically, I'm worried about adding another variant to things that everyone has to think about in codegen. I can easily imagine code that's expecting […]
I agree that fixing the MIR seems like a more systematic solution; I'm just wary of doing creative things to GVN and adding more miscompiles. Don't interpret the lack of recent A-rustlantis issues as evidence that MIR opts are in a good place: I stopped fuzzing them because I found a string of reproducers that all reduced to issues I had already filed, and working through the reductions was a waste of my time.
Yes, that currently optimizes to a […]
This is a fix for #139355 and it refines what I introduced in #138634 (which was itself a fix for #138625).
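As a concrete example of the kind of code affected (make_buf is an invented name, not from the PR), a function returning a fresh uninitialized buffer should ideally lower to a single ret undef rather than a memset-of-undef over the whole buffer:

```rust
use std::mem::MaybeUninit;

// Hypothetical example: ideally this compiles down to `ret undef`
// (or nothing at all) instead of a memset-of-undef over the buffer.
pub fn make_buf() -> [MaybeUninit<u8>; 64] {
    [MaybeUninit::uninit(); 64]
}

fn main() {
    let buf = make_buf();
    // Contents are uninitialized; only the length is meaningful to check.
    assert_eq!(buf.len(), 64);
}
```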
In my assessment, the reason we keep running into these weird optimization problems is that there is no holistic effort in LLVM to make uninit bytes stay uninit across optimizations. In the [MaybeUninit<u8>; N] case, SROA wants to run really early and turn as many aggregates into scalars as possible. To do this, it tries to replace our memset of undef over a [i8 x N] with a write of an integer where possible. This is an easy optimization to fix up (just write an undef integer), but the i8 value comes out of memory, so we'd need to run mem2reg before SROA; yet mem2reg currently runs immediately after SROA, because SROA enables mem2reg.

So my solution in this PR is to generate cleaner IR in the first place so that LLVM doesn't have to work so hard to fix our mess. There are two parts to my solution.
The library diff is tiny but very important. GVN does not const-propagate union aggregate rvalues, so we need to create the const manually. Perhaps we should also improve MIR optimizations to do this without library assistance, but that seems hard.
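A rough sketch of the idea (uninit_via_const is a hypothetical illustration, not the actual library diff): wrapping the construction in an inline const forces it to be evaluated to a single constant up front, so codegen sees a fully-uninit const operand instead of a union aggregate rvalue that GVN won't propagate.

```rust
use std::mem::MaybeUninit;

// Hypothetical illustration of "creating the const manually"; the real
// change is inside the standard library's MaybeUninit::uninit.
pub const fn uninit_via_const<T>() -> MaybeUninit<T> {
    // The inline const is evaluated at compile time, yielding a single
    // fully-uninit constant rather than an aggregate built at runtime.
    const { MaybeUninit::uninit() }
}

fn main() {
    let v = uninit_via_const::<u32>();
    // We can't read the value, but its layout matches MaybeUninit<u32>.
    assert_eq!(std::mem::size_of_val(&v), 4);
}
```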
Then in codegen_operand we now generate OperandValue::Uninit if we codegen a mir::Operand::Constant where all bytes of the constant are uninit. And this propagates the efficient lowering across the backend, usually doing nothing or lowering to a const undef.

cc @scottmcm because you asked about this approach before: #138634 (comment)
In the previous PR (#138634) I said: […]

And now that I've learned more, I think my reasoning was wrong. In every case I've looked at where OperandValue::Uninit kicks in, we are "initializing" a fresh allocation. So while this codegen could theoretically be worse, I think the approach in this PR has proven out to be more robust.

cc @nikic in case I'm mischaracterizing LLVM