Description
Not sure if this is an Internal Compiler Error or a Bug; apologies if I picked the wrong one. (To be honest, I'm also not sure whether this is Rust's fault or LLVM's, so I will post something to the LLVM folks as well.)
Recently I have been running into some strange errors which seem to only occur on some architectures. See Plonky3/Plonky3#729 and Plonky3/Plonky3#905 for more info about the situation where these arose. I managed to make a minimal example in Godbolt, so I thought it made sense to post it here.
Code
It's probably easier to understand via the following Godbolt link: https://godbolt.org/z/rrrj3eobd

Basically, we define a simple function which adds integers mod a prime P. We then use this on vectors of length 4 ([u32; 4]). What you find is that if you check (x + y) + z = x + (y + z) manually, everything works as expected and the two sides agree. If, however, we define a function which computes (x + y) + z and x + (y + z) and then checks that they are equal, we get an error. I agree this seems totally bizarre; please check out the Godbolt.
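Written out, the reduction that add (defined in the code below) performs on inputs $0 \le a, b < P$ is:

$$
\operatorname{add}(a, b) =
\begin{cases}
a + b - P & \text{if } a + b \ge P, \\
a + b & \text{if } a + b < P.
\end{cases}
$$

Since $P < 2^{31}$, we have $a + b \le 2P - 2 < 2^{32}$, so the u32 sum cannot overflow, and in the first case $a + b - P \le P - 2 < P$, so the output stays reduced. Because this is just addition in $\mathbb{Z}/P\mathbb{Z}$, (x + y) + z and x + (y + z) must agree for every valid input.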
```rust
use std::array;

/// The Baby Bear prime: 2^31 - 2^27 + 1.
const P: u32 = 2013265921;
// To help read the assembly, note: 2^32 - P = 2281701375

/// Addition modulo P.
///
/// Inputs are assumed to be < P.
/// Assuming this, outputs will also be < P.
#[inline(always)]
fn add(lhs: u32, rhs: u32) -> u32 {
    let mut sum = lhs + rhs; // Never overflows as inputs are < P < 2^31.
    let (corr_sum, over) = sum.overflowing_sub(P);
    // over is false if sum >= P and true if sum < P.
    if !over {
        sum = corr_sum;
    }
    sum
}

/// Addition modulo P in a degree 4 extension.
///
/// Identical to 4 additions in parallel.
#[unsafe(no_mangle)]
pub fn add_bb_deg_4_ext(lhs: [u32; 4], rhs: [u32; 4]) -> [u32; 4] {
    array::from_fn(|i| add(lhs[i], rhs[i]))
}

/// Compute (lhs + mid) + rhs
#[unsafe(no_mangle)]
pub fn add_bb_assoc_l(lhs: [u32; 4], mid: [u32; 4], rhs: [u32; 4]) -> [u32; 4] {
    let lhs = add_bb_deg_4_ext(lhs, mid);
    add_bb_deg_4_ext(lhs, rhs)
}

/// Compute lhs + (mid + rhs)
#[unsafe(no_mangle)]
pub fn add_bb_assoc_r(lhs: [u32; 4], mid: [u32; 4], rhs: [u32; 4]) -> [u32; 4] {
    let rhs = add_bb_deg_4_ext(mid, rhs);
    add_bb_deg_4_ext(lhs, rhs)
}

/// Check that (lhs + mid) + rhs = lhs + (mid + rhs)
#[unsafe(no_mangle)]
pub fn check_assoc(lhs: [u32; 4], mid: [u32; 4], rhs: [u32; 4]) {
    let assoc_l = add_bb_assoc_l(lhs, mid, rhs);
    let assoc_r = add_bb_assoc_r(lhs, mid, rhs);
    assert_eq!(assoc_l, assoc_r);
}

fn main() {
    let lhs = [252551971, 694974649, 213757600, 1325013984];
    let mid = [506156623, 97664653, 1234719014, 1349792299];
    let rhs = [1626423134, 1338438783, 786682629, 1311208151];

    // Check all elements are < P
    assert!(lhs.iter().all(|&x| x < P));
    assert!(mid.iter().all(|&x| x < P));
    assert!(rhs.iter().all(|&x| x < P));

    // Let's manually check that
    // (lhs + mid) + rhs = lhs + (mid + rhs)
    let assoc_l = add_bb_assoc_l(lhs, mid, rhs);
    let assoc_r = add_bb_assoc_r(lhs, mid, rhs);

    // Check that (x + y) + z = x + (y + z)
    println!("{assoc_l:?}");
    println!("{assoc_r:?}");
    assert_eq!(assoc_l, assoc_r);
    println!("The two assoc's are equal");
    // Everything up until here runs.

    // Now let's check the same thing using our
    // function check_assoc
    check_assoc(lhs, mid, rhs);
    // This fails with compilation option:
    // -O -C target-cpu=znver4 -C opt-level=3
    // among others.
}
```
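As an independent cross-check of which result is correct, the expected values can be computed with a plain u64 reference that avoids the overflowing_sub branch entirely. This is a minimal sketch; add_ref and add4_ref are hypothetical helpers, not part of the reproducer.

```rust
/// The Baby Bear prime, as in the reproducer.
const P: u32 = 2013265921;

/// Hypothetical reference: addition mod P computed in u64 with `%`,
/// so it shares no code path with the branchy overflowing_sub version.
fn add_ref(lhs: u32, rhs: u32) -> u32 {
    ((lhs as u64 + rhs as u64) % P as u64) as u32
}

/// Hypothetical reference for the degree 4 extension: four independent additions.
fn add4_ref(lhs: [u32; 4], rhs: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| add_ref(lhs[i], rhs[i]))
}

fn main() {
    // Same inputs as the reproducer.
    let lhs = [252551971, 694974649, 213757600, 1325013984];
    let mid = [506156623, 97664653, 1234719014, 1349792299];
    let rhs = [1626423134, 1338438783, 786682629, 1311208151];

    let left = add4_ref(add4_ref(lhs, mid), rhs);
    let right = add4_ref(lhs, add4_ref(mid, rhs));
    println!("reference (lhs + mid) + rhs = {left:?}");
    println!("reference lhs + (mid + rhs) = {right:?}");
    assert_eq!(left, right);
}
```

Comparing these reference values with what add_bb_assoc_l and add_bb_assoc_r return under the failing flags should show which of the two bracketings is being miscompiled.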
Meta
The error occurs when compiling with the current stable rustc 1.88.0, as well as on beta and nightly. It seems to have been introduced with the move to LLVM 20, as these errors do not appear on nightly-2025-02-17 but things begin failing on nightly-2025-02-18.
The error occurs when using the compiler flags -O -C target-cpu=znver4 -C opt-level=3.
It does not occur with opt-level=0, 1, or 2. It does, however, occur with some other target CPUs, in particular mic_avx512 and znver5, but not with others such as skylake, skylake_avx512, alderlake, and raptorlake. (If you want a complete list, I can go through and check them all.)
Error output
See the Godbolt link for more details. Essentially, it seems that something goes wrong in the vectorized code, though it's hard to say exactly what. The compiled code for check_assoc seems reasonable as far as we can tell. We are pretty lost as to what is going on.
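To see whether the disagreement is specific to these particular inputs, a small randomized differential test along these lines could be built with the failing flags. This is only a sketch: the xorshift generator, the seed, and the helpers add4, rand_vec and count_mismatches are hypothetical, and since it is structured differently from check_assoc it may or may not trigger the same miscompilation.

```rust
use std::hint::black_box;

/// The Baby Bear prime, as in the reproducer.
const P: u32 = 2013265921;

/// Copy of the reproducer's add.
#[inline(always)]
fn add(lhs: u32, rhs: u32) -> u32 {
    let mut sum = lhs + rhs; // Never overflows as inputs are < P < 2^31.
    let (corr_sum, over) = sum.overflowing_sub(P);
    if !over {
        sum = corr_sum;
    }
    sum
}

/// Hypothetical helper: four independent additions, like add_bb_deg_4_ext.
fn add4(lhs: [u32; 4], rhs: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| add(lhs[i], rhs[i]))
}

/// Hypothetical helper: xorshift64 PRNG so the sketch needs no external crates.
/// The state must be non-zero.
fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Hypothetical helper: a random [u32; 4] with every entry in [0, P).
/// The small modulo bias does not matter for this purpose.
fn rand_vec(state: &mut u64) -> [u32; 4] {
    std::array::from_fn(|_| (next_u64(state) % P as u64) as u32)
}

/// Hypothetical helper: counts input triples for which the two bracketings
/// disagree. black_box hides the inputs from the optimizer so the comparison
/// cannot be constant-folded away.
fn count_mismatches(trials: u32, seed: u64) -> u32 {
    let mut state = seed;
    let mut mismatches = 0;
    for _ in 0..trials {
        let x = rand_vec(&mut state);
        let y = rand_vec(&mut state);
        let z = rand_vec(&mut state);
        let left = add4(add4(black_box(x), black_box(y)), black_box(z));
        let right = add4(black_box(x), add4(black_box(y), black_box(z)));
        if left != right {
            mismatches += 1;
        }
    }
    mismatches
}

fn main() {
    let mismatches = count_mismatches(1_000_000, 0x9E37_79B9_7F4A_7C15);
    println!("{mismatches} mismatches");
}
```

With a correctly compiled binary, count_mismatches should always return 0, so any non-zero count under -C target-cpu=znver4 would indicate the problem is not tied to the specific inputs above.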