Skip to content

Add OperandValue::Uninit to improve lowering of MaybeUninit::uninit #142837

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Jun 21, 2025

This is a fix for #139355 and it refines what I introduced in #138634 (which was itself a fix for #138625).

In my assessment, the reason we keep running into these weird optimization problems is that there is no holistic effort in LLVM to make uninit bytes stay uninit across optimizations. In the [MaybeUninit<u8>; N] case, SROA wants to run really early and turn as many aggregates into scalars as possible. To do this, tries to replace our memset with undef of a [i8 x N] with a write of an integer where possible. This is an easy optimization to fix up (just write an undef integer) but the i8 value comes out of memory so we'd need to run mem2reg before SROA, but currently mem2reg runs immediately after SROA because SROA enables mem2reg.

So my solution in this PR is to generate cleaner IR in the first place so that LLVM doesn't have to work so hard to fix our mess. There are two parts to my solution.

The library diff is tiny but very important. GVN does not const-propagate union aggregate rvalues, so we need to create the const manually. Perhaps we should also improve MIR optimizations to do this without library assistance, but that seems hard.

Then in codegen_operand we now generate OperandValue::Uninit if we codegen a mir::Operand::Constant where all bytes of the constant are uninit. And this propagates the efficient lowering across the backend, usually doing nothing or lowering to a const undef.

cc @scottmcm because you asked about this approach before: #138634 (comment)

In the previous PR (#138634) I said:

It is technically correct to just do nothing. But if we actually do nothing, we may miss that this is de-initializing something

And now that I've learned more I think my reasoning was wrong. In every case I've looked at where OperandValue::Uninit kicks in, we are "initializing" a fresh allocation. So while this codegen could theoretically be worse, I think the approach in this PR has been proven out to be more robust.

cc @nikic in case I'm mischaracterizing LLVM

@rustbot rustbot added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 21, 2025
@saethlin

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 21, 2025
bors added a commit that referenced this pull request Jun 21, 2025
MaybeImprove MaybeUninit

I'm investigating possible solutions to #139355.
@bors

This comment was marked as outdated.

@rust-log-analyzer

This comment has been minimized.

@bors

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rust-timer

This comment was marked as outdated.

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 21, 2025
@saethlin
Copy link
Member Author

I think the binary size increases in #142837 (comment) are from LLVM combining a big memset of undef with an i32 store of 0 into a big memset of 0, then lowering the memset as a bunch of movs. Which is #138625, and I suspect it's back because I only applied the fix for that issue in the codegen for Rvalue::Repeat. Perhaps now that I'm doing something to Rvalue::Use as well we will see a different behavior.

Though this makes me wonder if we should have some more fundamental solution. What I'm really trying to lower is a deinitialize, but it's particularly goofy because the MIR we generate is immediately doing a deinitializing assignment to a just-allocated local.

@saethlin

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 22, 2025
@bors

This comment was marked as outdated.

bors added a commit that referenced this pull request Jun 22, 2025
MaybeImprove MaybeUninit

I'm investigating possible solutions to #139355.
@bors

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rust-timer

This comment was marked as outdated.

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 22, 2025
@saethlin

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rust-bors

This comment was marked as outdated.

rust-bors bot added a commit that referenced this pull request Jun 23, 2025
MaybeImprove MaybeUninit

I'm investigating possible solutions to #139355.
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 23, 2025
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-bors

This comment was marked as outdated.

@saethlin

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rust-bors

This comment was marked as outdated.

rust-bors bot added a commit that referenced this pull request Jun 23, 2025
MaybeImprove MaybeUninit

I'm investigating possible solutions to #139355.
@rust-bors

This comment was marked as outdated.

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (0620a6a): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.3% [0.3%, 0.4%] 2
Regressions ❌
(secondary)
0.2% [0.2%, 0.2%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.3% [0.3%, 0.4%] 2

Max RSS (memory usage)

Results (primary -1.1%, secondary 4.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
8.7% [8.7%, 8.7%] 1
Regressions ❌
(secondary)
4.2% [4.2%, 4.2%] 1
Improvements ✅
(primary)
-5.9% [-6.8%, -5.0%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -1.1% [-6.8%, 8.7%] 3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.2% [0.0%, 0.4%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.2% [0.0%, 0.4%] 3

Bootstrap: 689.715s -> 689.073s (-0.09%)
Artifact size: 371.92 MiB -> 371.90 MiB (-0.00%)

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Jun 23, 2025
@saethlin saethlin force-pushed the maybeuninit-codegen branch from 1414ccc to ead2221 Compare June 23, 2025 13:01
@saethlin saethlin changed the title MaybeImprove MaybeUninit Jun 23, 2025
@saethlin saethlin marked this pull request as ready for review June 23, 2025 22:51
@rustbot
Copy link
Collaborator

rustbot commented Jun 23, 2025

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 23, 2025
@rustbot
Copy link
Collaborator

rustbot commented Jun 23, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Copy link
Member

@WaffleLapkin WaffleLapkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me with nits

@@ -67,9 +67,14 @@ pub enum OperandValue<V> {
/// `is_zst` on its `Layout` returns `true`. Note however that
/// these values can still require alignment.
ZeroSized,
Uninit,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some docs here?

Copy link
Member

@scottmcm scottmcm Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- this is changing the "well if it's is_backend_immediate then it's OperandValue::Immediate" rule, for example, so it needs to be thought about carefully since it affects all the consumers of OperandValue, potentially. (And thus probably needs those comments on the other variants updated too.)

For example, it's not clear to me why Immediate(cx.const_undef(…)) wouldn't be fine as representing undef for things with is_backend_immediate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, it's not clear to me why Immediate(cx.const_undef(…)) wouldn't be fine as representing undef for things with is_backend_immediate.

I'm not sure what you mean by "fine". It's valid to do so even with this PR, it just doesn't fix the missed optimization. It does improve the IR that we emit, if also combined with the change to MaybeUninit::uninit. But all that it accomplishes is tripping over a different problem in LLVM. It also breaks the manybeuninit-nrvo codegen tests.

This is the diff I applied to try out your idea:

--- a/compiler/rustc_codegen_ssa/src/mir/operand.rs
+++ b/compiler/rustc_codegen_ssa/src/mir/operand.rs
@@ -170,6 +170,13 @@ pub(crate) fn from_const<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
                 OperandValue::Pair(a_llval, b_llval)
             }
             ConstValue::Indirect { alloc_id, offset } => {
+                if val.all_bytes_uninit(bx.tcx()) {
+                    if let BackendRepr::Scalar(_) = layout.backend_repr {
+                        let llval = bx.const_undef(bx.immediate_backend_type(layout));
+                        let val = OperandValue::Immediate(llval);
+                        return OperandRef { val, layout };
+                    }
+                };
                 let alloc = bx.tcx().global_alloc(alloc_id).unwrap_memory();
                 return Self::from_const_alloc(bx, layout, alloc, offset);
             }
@@ -591,6 +602,28 @@ impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
}

impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, Result<V, abi::Scalar>> {
fn update_uninit<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do these 2 methods do? Could you maybe write a brief doc comment for each?

Comment on lines +331 to +333
// It is very helpful for codegen to know when are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// It is very helpful for codegen to know when are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
// It is very helpful for codegen to know when we are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
@WaffleLapkin WaffleLapkin self-assigned this Jun 24, 2025
@WaffleLapkin WaffleLapkin added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 24, 2025
Comment on lines +26 to +32
// CHECK-LABEL: @create_ptr
#[no_mangle]
fn create_ptr() -> MaybeUninit<&'static str> {
// CHECK-NEXT: start:
// CHECK-NEXT: ret { ptr, i64 } undef
MaybeUninit::uninit()
}
Copy link
Member

@scottmcm scottmcm Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for scalar pair cases like this I'm already fixing it in https://github.com/rust-lang/rust/pull/138759/files#diff-68480918205d32f0b23d06ba76c5fbd702b7dc842f0f5cf262db6e2e6ae3c630R51-R59 (That also handles partial uninit cases like None::<u32> too.)

I wonder if it's only OperandValue::Ref that needs to be handled specially as a result, rather than a different top-level variant.

Copy link
Member Author

@saethlin saethlin Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already compiles to the ret undef, I'm adding a codegen test for this because if you pick the wrong backend type for the ret (which I did, repeatedly), everything works fine except that you hit an LLVM assertion when compiling more complicated code.

@nikic
Copy link
Contributor

nikic commented Jun 26, 2025

In my assessment, the reason we keep running into these weird optimization problems is that there is no holistic effort in LLVM to make uninit bytes stay uninit across optimizations. In the [MaybeUninit<u8>; N] case, SROA wants to run really early and turn as many aggregates into scalars as possible. To do this, tries to replace our memset with undef of a [i8 x N] with a write of an integer where possible. This is an easy optimization to fix up (just write an undef integer) but the i8 value comes out of memory so we'd need to run mem2reg before SROA, but currently mem2reg runs immediately after SROA because SROA enables mem2reg.

So my solution in this PR is to generate cleaner IR in the first place so that LLVM doesn't have to work so hard to fix our mess. There are two parts to my solution.

Yeah, this is tricky to solve on the LLVM side. We generally want to constant fold as early as possible, even if it means losing undef. In this case, constant folding a larger pattern (the whole undef splat) would allow preserving the undef, but it has already been lost at that point.

@scottmcm
Copy link
Member

scottmcm commented Jun 27, 2025

GVN does not const-propagate union aggregate rvalues, so we need to create the const manually. Perhaps we should also improve MIR optimizations to do this without library assistance, but that seems hard.

Are you hitting these problems ever for partially-uninit values? Or is it always just the MyUnion { blah: () } case?

If it's just the latter, I think this might not be that hard. We could add an Uninit, variant to https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/gvn/enum.Value.html, emitted only for Rvalue::Aggregate with https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.AggregateKind.html#variant.Adt having its field index set (marking a union) when that field is a ZST. Then GVN could Repeat that Uninit to also return Uninit, letting us make constants for this that are directly and clearly also Uninit, rather than codegen seeing repeat-of-constant. (Assuming, at least, that const { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() } already at least works?)

Basically, I'm worried about adding another variant to things that everyone has to think about in codegen. I can easily imagine code that's expecting () to always be OperandValue::ZeroSized, for example, but this PR would mean they might also be OperandValue::Uninit, if I'm reading things right. And unless I was thinking about this very specifically, I'd probably never write a test for something like getting a () via MaybeUninit::<(MaybeUninint<u8>, ())>::uninit().assume_init().1, so it'd be really easy to get codegen ICEs for wrong variants that we don't find for a while. If we can fix the MIR before codegen instead, and not have to think about it, that'd be nice.

@saethlin
Copy link
Member Author

saethlin commented Jun 27, 2025

I agree that fixing the MIR seems like a more systematic solution, I'm just wary of doing creative things to GVN and adding more miscompiles. Don't interpret the lack of recent A-rustlantis issues as evidence that they are in a good place; I've stopped fuzzing them because I found a string of reproducers that all reduced to the issues I've already filed so I stopped fuzzing MIR opts because working through the reduction was a waste of my time.

(Assuming, at least, that const { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() } already at least works?)

Yes that currently optimizes to a ret undef.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
9 participants