Skip to content

Add Write implementation for [MaybeUninit<u8>] #142849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mina86
Copy link
Contributor

@mina86 mina86 commented Jun 21, 2025

There’s Write implement for slice of bytes so it’s not unreasonable to also have one for slice of MaybeUninit bytes.

This makes dealing with uninitialised memory slightly easier in some cases (though it sadly doesn’t eliminate use of unsafe once the data is written) and makes it possible to use existing APIs which Write data (e.g. serde_json::to_writer) to initialise memory.

@rustbot
Copy link
Collaborator

rustbot commented Jun 21, 2025

r? @ibraheemdev

rustbot has assigned @ibraheemdev.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 21, 2025
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

There’s `Write` implement for slice of bytes so it’s not unreasonable
to also have one for slice of `MaybeUninit` bytes.

This makes dealing with uninitialised memory slightly easier in some
cases (though it sadly doesn’t eliminate use of `unsafe` once the data
is written) and makes it possible to use existing APIs which `Write`
data (e.g. `serde_json::to_writer`) to initialise memory.
@rust-log-analyzer
Copy link
Collaborator

The job mingw-check-2 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
   |
47 |     let mut buf = Vec::with_capacity(1024);
   |                   ------------------------ type must be known at this point
...
51 |         let mut wr = &mut buf[..];
   |             ^^^^^^
   |
help: consider giving `wr` an explicit type, where the type for type parameter `T` is specified
   |
51 |         let mut wr: &mut [T] = &mut buf[..];
   |                   ++++++++++

error[E0283]: type annotations needed for `&mut [_]`
   --> library/std/src/io/impls/tests.rs:51:13
    |
51  |         let mut wr = &mut buf[..];
    |             ^^^^^^
52  |         for _ in 0..8 {
53  |             let _ = wr.write_all(&src);
    |                        --------- type must be known at this point
    |
note: multiple `impl`s satisfying `&mut [_]: io::Write` found
   --> library/std/src/io/impls.rs:428:1
    |
428 | impl Write for &mut [u8] {
    | ^^^^^^^^^^^^^^^^^^^^^^^^
...
487 | impl Write for &mut [mem::MaybeUninit<u8>] {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: consider giving `wr` an explicit type, where the type for mutable reference `&mut [_]` is specified
    |
51  |         let mut wr: &mut [T] = &mut buf[..];
    |                   ++++++++++

[RUSTC-TIMING] stdbenches test:true 8.489
[RUSTC-TIMING] num test:true 1.041
[RUSTC-TIMING] test test:true 16.154
@ibraheemdev
Copy link
Member

ibraheemdev commented Jun 29, 2025

This will likely need an ACP before an implementation PR can be reviewed. r? libs-api

@rustbot rustbot added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 29, 2025
@rustbot rustbot assigned Amanieu and unassigned ibraheemdev Jun 29, 2025
@Amanieu Amanieu added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jun 29, 2025
@hanna-kruppe
Copy link
Contributor

This makes dealing with uninitialised memory slightly easier in some cases (though it sadly doesn’t eliminate use of unsafe once the data is written) and makes it possible to use existing APIs which Write data (e.g. serde_json::to_writer) to initialise memory.

Note that there is a subtle soundness requirement for doing either of those things: you have to trust that the writing code actually initialized as many bytes as the updated &mut [MaybeUninit<u8>] suggests they did. I don’t think there’s any check you can do to ensure the buffer was actually initialized. For example, consider this function:

fn skip_writes(w: &mut &mut [MaybeUninit<u8>]) -> io::Result<()> {
    *w = &mut w[w.len()..];
    Ok(())
}

It’s a little harder to do the same in a function that just impl Write or &mut dyn Write. But Rust has very little if any parametricity so it’s probably safe to assume there are (or will be in the future) variants of skip_writes in those contexts too. For example, with impl Write + Any one can try to downcast to recover the concrete writer type.

@Amanieu
Copy link
Member

Amanieu commented Jul 1, 2025

Could you expand on the motivation for this change. Specifically, could you give a code example where this is used and you recover the number of bytes actually written to the buffer. We discussed this in the @rust-lang/libs-api meeting and it seems like this use case may perhaps be better serve with some sort of cursor API.

@mina86
Copy link
Contributor Author

mina86 commented Jul 2, 2025

Specifically, could you give a code example where this is used and you recover the number of bytes actually written to the buffer.

The length is recovered the same way it’s recovered when writing to &mut [u8]. You take the length at the start and from it subtract the length at the end. Example below.

We discussed this in the @rust-lang/libs-api meeting and it seems like this use case may perhaps be better serve with some sort of cursor API.

Why doesn’t that argument apply to &mut [u8]? If we have Write implementation for &mut [u8] it’s only natural to have one for &mut [MU<u8>] as well.


fn try_urlencoded_data_url(
    buf: &mut [MaybeUninit<u8>],
    ctype: &str,
    data: &str,
) -> Option<usize> {
    let capacity = buf.len();
    let mut cursor = Cursor(buf);

    cursor.write_all(b"'data:").ok()?;
    cursor.write_all(ctype.as_bytes()).ok()?;
    cursor.write_all(b",").ok()?;

    // URL-encoding
    for byte in data.bytes() {
        if byte != b'\'' && is_url_safe(byte) {
            cursor.write_all(core::slice::from_ref(&byte)).ok()?;
        } else {
            const HEX: [u8; 16] = *b"0123456789ABCDEF";
            cursor.write_all(&[
                b'%',
                HEX[usize::from(byte >> 4)],
                HEX[usize::from(byte & 15)],
            ]).ok()?;
        }
    }

    cursor.write_all(b"'").ok()?;
    Some(capacity - cursor.len())
}

struct Cursor<'a>(&'a mut [MaybeUninit<u8>]);
impl Cursor<'_> { fn len(&self) -> usize { self.0.len() } }
impl std::io::Write for Cursor<'_> { /* ... */ }

fn is_url_safe(byte: u8) { todo!() }
@hanna-kruppe
Copy link
Contributor

Now try to write the code that calls try_urlencoded_data_url and uses the written data as &[u8]. You’ll find that this requires an unsafe block whose soundness depends on the safe function try_urlencoded_data_url being functionally correct (returning Ok(n) only if it initialized the first n bytes of the buffer). This can be okay-ish if all code involved is in the same carefully written module, but it’s not very composable and also harder to audit for memory safety because the unsafe bits can’t be considered in isolation. In contrast, a variant of try_urlencoded_data_url writing to &mut [u8] can be incorrect (return a bogus length) without risk of UB in the caller. At worst it’ll give bogus data (duh) and an out of bounds length (leading to a panic if the caller uses it for slicing without checking).

Note that std already has a safe cursor API for writing to uninitialized buffers, std::io::Borrowed{Buf,Cursor}. It’s just unstable.

@mina86
Copy link
Contributor Author

mina86 commented Jul 2, 2025

You’ll find that this requires an unsafe block

Yes. Using MaybeUninit often involves unsafe blocks. That’s not a gotcha.

In contrast, a variant of try_urlencoded_data_url writing to &mut [u8] can be incorrect (return a bogus length) without risk of UB in the caller.

It’s also slower.

Note that std already has a safe cursor API for writing to uninitialized buffers, std::io::Borrowed{Buf,Cursor}. It’s just unstable.

Yes, I’m keenly aware of it and there’s a reason I call it Rust’s worst feature. Using BorrowedBuf is like bringing a tank to kill a mosquito.

@hanna-kruppe
Copy link
Contributor

It’s not about unsafe being needed at all, it’s about how you can encapsulate the unsafe to make its soundness depends on as little trusted code as possible.

@tgross35
Copy link
Contributor

tgross35 commented Jul 2, 2025

There is some topical discussion going on at #t-libs > Future of read_buf about the design at https://docs.rs/buffer-trait/latest/buffer_trait/, which seems to resolve a few of these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-libs-api-nominated Nominated for discussion during a libs-api team meeting. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
7 participants