Add OperandValue::Uninit to improve lowering of MaybeUninit::uninit #142837
Conversation
I'm investigating possible solutions to #139355.
I think the binary size increases in #142837 (comment) are from LLVM combining a big memset of undef with an i32 store of 0 into a big memset of 0, then lowering the memset as a bunch of movs. Which is #138625, and I suspect it's back because I only applied the fix for that issue in the codegen for […]. Though this makes me wonder if we should have some more fundamental solution. What I'm really trying to lower is a deinitialize, but it's particularly goofy because the MIR we generate is immediately doing a deinitializing assignment to a just-allocated local.
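For concreteness, here is a hypothetical Rust snippet of the shape being discussed (Partial and fresh are invented names, not from the PR): a small zero-initialized field next to a large uninitialized buffer. The concern is that LLVM merges the memset-of-undef covering the buffer with the store of 0 into one big memset of 0.

```rust
use std::mem::MaybeUninit;

// Hypothetical illustration (names invented): one zero-initialized field
// next to a large uninitialized buffer.
pub struct Partial {
    pub len: u32,
    pub buf: [MaybeUninit<u8>; 256],
}

pub fn fresh() -> Partial {
    // Conceptually: store 0 to `len`, leave `buf` uninitialized.
    Partial { len: 0, buf: [MaybeUninit::uninit(); 256] }
}

fn main() {
    let p = fresh();
    assert_eq!(p.len, 0);
    assert_eq!(p.buf.len(), 256);
}
```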
Finished benchmarking commit (0620a6a): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count: our most reliable metric, used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): results (primary -1.1%, secondary 4.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: this benchmark run did not return any relevant results for this metric.
Binary size: results (primary 0.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 689.715s -> 689.073s (-0.09%)
Force-pushed from 1414ccc to ead2221.
rustbot has assigned @workingjubilee.
Some changes occurred in compiler/rustc_codegen_ssa
r=me with nits
@@ -67,9 +67,14 @@ pub enum OperandValue<V> {
    /// `is_zst` on its `Layout` returns `true`. Note however that
    /// these values can still require alignment.
    ZeroSized,
    Uninit,
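A standalone sketch of the kind of doc comment the reviewers are asking for on the new variant. This is a simplified, hypothetical copy for illustration only; the real OperandValue lives in rustc_codegen_ssa and has more variants and richer payloads.

```rust
// Simplified, hypothetical stand-in for rustc_codegen_ssa's OperandValue,
// showing one possible doc comment on the new variant.
#[allow(dead_code)]
pub enum OperandValue<V> {
    /// A single backend immediate value.
    Immediate(V),
    /// A pair of backend immediate values.
    Pair(V, V),
    /// A zero-sized value.
    ZeroSized,
    /// Every byte of this operand is uninitialized. Consumers may lower
    /// a store of this value to a backend undef, or skip it entirely.
    Uninit,
}

fn main() {
    let v: OperandValue<u32> = OperandValue::Uninit;
    assert!(matches!(v, OperandValue::Uninit));
}
```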
Could you add some docs here?
Agreed -- this is changing the "well if it's is_backend_immediate then it's OperandValue::Immediate" rule, for example, so it needs to be thought about carefully since it affects all the consumers of OperandValue, potentially. (And thus probably needs those comments on the other variants updated too.)
For example, it's not clear to me why Immediate(cx.const_undef(…)) wouldn't be fine as representing undef for things with is_backend_immediate.
> For example, it's not clear to me why Immediate(cx.const_undef(…)) wouldn't be fine as representing undef for things with is_backend_immediate.

I'm not sure what you mean by "fine". It's valid to do so even with this PR; it just doesn't fix the missed optimization. It does improve the IR that we emit, if also combined with the change to MaybeUninit::uninit. But all that it accomplishes is tripping over a different problem in LLVM. It also breaks the maybeuninit-nrvo codegen tests.
This is the diff I applied to try out your idea:
--- a/compiler/rustc_codegen_ssa/src/mir/operand.rs
+++ b/compiler/rustc_codegen_ssa/src/mir/operand.rs
@@ -170,6 +170,13 @@ pub(crate) fn from_const<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
OperandValue::Pair(a_llval, b_llval)
}
ConstValue::Indirect { alloc_id, offset } => {
+ if val.all_bytes_uninit(bx.tcx()) {
+ if let BackendRepr::Scalar(_) = layout.backend_repr {
+ let llval = bx.const_undef(bx.immediate_backend_type(layout));
+ let val = OperandValue::Immediate(llval);
+ return OperandRef { val, layout };
+ }
+ };
let alloc = bx.tcx().global_alloc(alloc_id).unwrap_memory();
return Self::from_const_alloc(bx, layout, alloc, offset);
}
@@ -591,6 +602,28 @@ impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
    }

impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, Result<V, abi::Scalar>> {
    fn update_uninit<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
What do these 2 methods do? Could you maybe write a brief doc comment for each?
// It is very helpful for codegen to know when are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
Suggested change:
// It is very helpful for codegen to know when we are writing uninit bytes. MIR optimizations
// currently do not const-propagate unions, but if we create the const manually that can be
// trivially propagated. See #142837.
// CHECK-LABEL: @create_ptr
#[no_mangle]
fn create_ptr() -> MaybeUninit<&'static str> {
    // CHECK-NEXT: start:
    // CHECK-NEXT: ret { ptr, i64 } undef
    MaybeUninit::uninit()
}
Note that for scalar pair cases like this I'm already fixing it in https://github.com/rust-lang/rust/pull/138759/files#diff-68480918205d32f0b23d06ba76c5fbd702b7dc842f0f5cf262db6e2e6ae3c630R51-R59 (That also handles partial uninit cases like None::<u32> too.)
I wonder if it's only OperandValue::Ref that needs to be handled specially as a result, rather than a different top-level variant.
This already compiles to the ret undef; I'm adding a codegen test for this because if you pick the wrong backend type for the ret (which I did, repeatedly), everything works fine except that you hit an LLVM assertion when compiling more complicated code.
Yeah, this is tricky to solve on the LLVM side. We generally want to constant fold as early as possible, even if it means losing undef. In this case, constant folding a larger pattern (the whole undef splat) would allow preserving the undef, but it has already been lost at that point.
Are you hitting these problems ever for partially-uninit values? Or is it always just the […]? If it's just the latter, I think this might not be that hard. We could add an […]. Basically, I'm worried about adding another variant to things that everyone has to think about in codegen. I can easily imagine code that's expecting […]
I agree that fixing the MIR seems like a more systematic solution; I'm just wary of doing creative things to GVN and adding more miscompiles. Don't interpret the lack of recent A-rustlantis issues as evidence that MIR opts are in a good place: I stopped fuzzing them because I found a string of reproducers that all reduced to issues I had already filed, and working through the reductions was a waste of my time.
Yes, that currently optimizes to a […]
This is a fix for #139355 and it refines what I introduced in #138634 (which was itself a fix for #138625).
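As a concrete example of the kind of code affected (make_buf is an invented name, not from the PR), a function returning a fresh uninitialized buffer should ideally lower to a single ret undef rather than a memset-of-undef over the whole buffer:

```rust
use std::mem::MaybeUninit;

// Hypothetical example: ideally this compiles down to `ret undef`
// (or nothing at all) instead of a memset-of-undef over the buffer.
pub fn make_buf() -> [MaybeUninit<u8>; 64] {
    [MaybeUninit::uninit(); 64]
}

fn main() {
    let buf = make_buf();
    // Contents are uninitialized; only the length is meaningful to check.
    assert_eq!(buf.len(), 64);
}
```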
In my assessment, the reason we keep running into these weird optimization problems is that there is no holistic effort in LLVM to make uninit bytes stay uninit across optimizations. In the [MaybeUninit<u8>; N] case, SROA wants to run really early and turn as many aggregates into scalars as possible. To do this, it tries to replace our memset of undef over a [i8 x N] with a write of an integer where possible. This is an easy optimization to fix up (just write an undef integer), but the i8 value comes out of memory, so we'd need to run mem2reg before SROA; yet mem2reg currently runs immediately after SROA, because SROA enables mem2reg.

So my solution in this PR is to generate cleaner IR in the first place so that LLVM doesn't have to work so hard to fix our mess. There are two parts to my solution.
The library diff is tiny but very important. GVN does not const-propagate union aggregate rvalues, so we need to create the const manually. Perhaps we should also improve MIR optimizations to do this without library assistance, but that seems hard.
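A rough sketch of the idea (uninit_via_const is a hypothetical illustration, not the actual library diff): wrapping the construction in an inline const forces it to be evaluated to a single constant up front, so codegen sees a fully-uninit const operand instead of a union aggregate rvalue that GVN won't propagate.

```rust
use std::mem::MaybeUninit;

// Hypothetical illustration of "creating the const manually"; the real
// change is inside the standard library's MaybeUninit::uninit.
pub const fn uninit_via_const<T>() -> MaybeUninit<T> {
    // The inline const is evaluated at compile time, yielding a single
    // fully-uninit constant rather than an aggregate built at runtime.
    const { MaybeUninit::uninit() }
}

fn main() {
    let v = uninit_via_const::<u32>();
    // We can't read the value, but its layout matches MaybeUninit<u32>.
    assert_eq!(std::mem::size_of_val(&v), 4);
}
```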
Then in codegen_operand we now generate OperandValue::Uninit if we codegen a mir::Operand::Constant where all bytes of the constant are uninit. And this propagates the efficient lowering across the backend, usually doing nothing or lowering to a const undef.

cc @scottmcm because you asked about this approach before: #138634 (comment)
In the previous PR (#138634) I said: […]

And now that I've learned more, I think my reasoning was wrong. In every case I've looked at where OperandValue::Uninit kicks in, we are "initializing" a fresh allocation. So while this codegen could theoretically be worse, I think the approach in this PR has proven out to be more robust.

cc @nikic in case I'm mischaracterizing LLVM