-
Notifications
You must be signed in to change notification settings - Fork 79
fix 128bits ctlz intrinsincs UB #635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
It looks like the some CI jobs are failing on
which is some code I didn't touch. Can my changes have some side effects ? Any hints ? |
Not sure it's related. I opened a new dummy PR to check if the tests pass, though. |
The CI passed, so the problem is in here. I suspect this is because you changed the return type: it used to return an array type and your change makes it return a |
The return type of the
which will get an element in the array and cast it as result_type which is Edit: formatting |
Ok, finally found the line pointing in my change The problem I see is that the if branch should have been taken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay: I've been very busy.
src/intrinsic/mod.rs
Outdated
let result = self.current_func() | ||
.new_local(None, array_type, "count_loading_zeroes_results"); | ||
|
||
// Algorithm from: https://stackoverflow.com/a/28433850/389119 updated to check for high 64bits being 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems your new algorithm significantly diverges from the one linked here.
If so, could you please replace this link with another pointing to the algorithm you used? If you can't, could you please add comments below to make this easier to understand?
Hi, sorry for the delay, I missed your reply notification and got caught up by IRL stuff. I'm going to add a comment on the new algorithm which is:
This adds an if to the generated code but we need to check for 0 before calling I think there is also a problem of UB in intrinsic/simd.rs let index = bx.context.new_rvalue_from_long(bx.i32_type, i as i64);
let value = bx.extract_element(vector, index).to_rvalue();
let value_type = value.get_type();
let element = if name == sym::simd_ctlz {
bx.count_leading_zeroes(value_type.get_size() as u64 * 8, value)
} else {
bx.count_trailing_zeroes(value_type.get_size() as u64 * 8, value)
}; where 0 is not special cased like in intrinsic/mod.rs. So I'll move the 0 special case handling in As found by @FractalFir, the same problems apply to trailing_zeros / cttz |
1f5f68e
to
68a69d8
Compare
There is still a diff in the handling of clz and ctz intrinsics in the fn count_leading_zeroes(&mut self, width: u64, arg: RValue<'gcc>) -> RValue<'gcc> {
...
if arg_type.is_uint(self.cx) {
"__builtin_clz"
} vs fn count_trailing_zeroes(&mut self, _width: u64, arg: RValue<'gcc>) -> RValue<'gcc> {
...
if arg_type.is_uchar(self.cx) || arg_type.is_ushort(self.cx) || arg_type.is_uint(self.cx) {
// NOTE: we don't need to & 0xFF for uchar because the result is undefined on zero.
("__builtin_ctz", self.cx.uint_type)
} They both have the same comment // TODO(antoyo): write a new function Type::is_compatible_with(&Type) and use it here
// instead of using is_uint(). Should I add the |
@antoyo Do you have any hints to reproduce the |
If you compile gcc yourself, you can apply this patch. Otherwise, you can try setting this variable to false, but I'm not sure if this will have the same behavior. |
|
||
let zero_hi = self.const_uint(high.get_type(), 0); | ||
let cond = self.gcc_icmp(IntPredicate::IntNE, high, zero_hi); | ||
block.end_with_conditional(self.location, cond, ctlz_then_block, ctlz_else_block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is likely to be because gcc_icmp
creates new blocks for non-native 128-bit integers.
So, you can probably fix the issue with this:
block.end_with_conditional(self.location, cond, ctlz_then_block, ctlz_else_block); | |
let block = self.llbb(); | |
block.end_with_conditional(self.location, cond, ctlz_then_block, ctlz_else_block); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the condition must be created on the else_block that is passed as an argument like in gcc_icmp for example.
But that gave me a hint.
If I make the code look like
if width == 128 && !self.cx.supports_128bit_integers {
// this code will exhibit UB when arg is 0
self.count_leading_zeroes_nonzero_impl(...)
} else {
let result_type = self.u32_type;
let result = self.current_func().new_local(None, result_type, "zeros");
let then_block = self.current_func().new_block("zero_then");
let else_block = self.current_func().new_block("zero_else");
let after_block = self.current_func().new_block("zero_after");
let zero = self.cx.const_uint(arg.get_type(), 0);
let cond = self.gcc_icmp(IntPredicate::IntEQ, arg, zero);
self.llbb().end_with_conditional(None, cond, then_block, else_block);
let zero_result = self.cx.gcc_uint(self.u32_type, width);
then_block.add_assignment(None, result, zero_result);
then_block.end_with_jump(None, after_block);
self.switch_to_block(else_block);
let res = self.count_leading_zeroes_nonzero_impl(width, arg);
else_block.add_assignment(None, result, res);
else_block.end_with_jump(None, after_block);
self.switch_to_block(after_block);
result.to_rvalue()
}
The compilation pass. So, you're right, there must be something with the comparison of the 128bit arg when the context doesn't support 128bit integers.
What's weird is that I made exactly the same change on the count_trailing_zeros
function and it compiles fine.
What's even weirder is that when I copy and rename count_trailing_zeros*
functions to count_leading_zeros
, I get the compiler error.
That leads me to think there is something elsewhere that treats count_trailing_zeros
and count_leading_zeros
differently regarding 128bit integers.
Making a special case for 128bit integer on platform without 128bit integers doesn't look like a good idea and I'd prefer to find the real root of the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is that gcc_icmp
will not only create new blocks, but will end the current block because it has conditions.
In order to workaround this issue, you need to use llbb()
to get the new current block that will still be in your else
branch.
Fixes #604
This is the GIMPLE code generated