| Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently, sparse do all its inlining at the tree level, during
constant expansion. To not mix-up the evaluation of the original
function body in case the address of an inline function is taken or
when the function can't otherwise be inlined, the statements and
symbols lists of inline functions are kept in separated fields.
Then, if the original body must be evaluated it must first be
'uninlined' to have a copy in the usual fields.
This make sense when dealing with the definition of the function.
But, when using typeof() on functions, the resulting type doesn't
refer to this definition, it's just a copy of the type and only
of the type. There shouldn't be any reasons to uninline anything.
However, the distinction between 'full function' and 'type only'
is not made during evaluation and the uninlining attempt produce
a lot of "warning: unreplaced symbol '...'" because of the lack
of a corresponding definition.
Fix this by not doing the uninlining if the symbol lack a definition.
Note: It would maybe be more appropriate for EXPR_TYPE to use
a stripped-own version of evaluate_symbol() doing only the
examination of the return and argument types, bypassing the
attempt to uninline the body and evaluate the initializer and
the statements since there is none of those for an EXPR_TYPE.
Link: https://lore.kernel.org/all/202206191726.wq70mbMK-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
into next
* no needs to use MARK_CURRENT_DELETED() for multi-jumps
* canonicalize ((x & M) == M) --> ((x & M) != 0) when M is a power-of-2
* simplify AND(x >= 0, x < C) --> (unsigned)x < C
* simplify TRUNC(x) {==,!=} C --> AND(x,M) {==,!=} C
* remove early simplification of casts during evaluation
* but this back as simplificaion of TRUNC(NOT(x)) --> NOT(TRUNC(x))
|
|
The goal is double:
1) be able to do the NOT operation on the smaller type
2) more importantly, give the opportunity to the TRUNC to
cancel with a previous ZEXT if there is one.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Such compares with a signed value are relatively common and can be
easily be simplified into a single unsigned compare. So, do it.
Note: This simplification triggers only 27 times in a x86-64 defconfig
kernel. I expected more but I suppose it's because most checks
aren't done against a constant or are done with unsigned values.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, signed compares against a constant are canonicalized
toward the smallest possible constant. So, the following
canonicalization are done:
x < 256 --> x <= 255
x < -2047 --> x <= -2048
This has two advantages:
1) it maximalizes the number of constants possible for a given bit size.
2) it allows to remove all < and all >=
But it has also a serious disadvantages: a simple comparison against
zero, like:
x >= 0
is canonicalized into:
x > -1
Which can be more costly for some architectures if translated as such ,
is also less readable than the version using 0 and is also sometimes
quite more complicated to match in some simplification patterns.
So, canonicalize it using 'towards 0' / using the smallest constant
in absolute value.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* fix and improve the check that protects try_to_simplify_bb()
* fix remove_merging_phisrc() with duplicated CFG edges.
|
|
and remove one that didn't made much sense.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The current implementation of remove_merging_phisrc() can't work correctly
when these phi-sources belong to a basic block with several children
to the same target basic block (this happens commonly with OP_SWITCH).
Fix this by directly scanning the source basic block for the presence
of any phi-source. Once identified, the processing is kept unchanged:
remove these phi-sources if a sibling phi-source will 'overwrite' them
in the target block.
Fixes: 2fdaca9e7175e62f08d259f5cb3ec7c9725bba68
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In a phi-node,pseudo_list_size() can't be used for counting its arguments
because VOIDs must be ignored.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Like for CBR-BR conversion, when a target BB containing one or several
phi-nodes is removed from an OP_SWITCH, the corresponding phi-source
must be removed from the phi-node.
However this is not done yet.
Changing this by adding some code to convert_to_jump() to remove all
phi-sources from the discarded targets if the converted instruction
is an OP_SWITCH.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If a conditional branch has identical targets, it should be
converted to a simple jump.
This is done but using its own code.
Change this by using the existing convert_to_jump() instead.
This also allows any redundant phi-sources to be removed.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
When a parent is removed from a BB containing one or several phi-nodes,
the corresponding phi-sources must be removed from the phi-node.
However this is not done and consequentially:
* it becomes impossibly to correctly reason about the flow of values
through these phi-nodes.
* simplifications are missed (e.g. if-conversion).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
and same for its dual: ((x & M) != M) --> ((x & M) == 0)
Beside the canonicalization itself, these simplifications are
useful because the compare against 0 can often be further
simplified (for example when it is used by OP_CBR or OP_SELECT).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
These tests are written by testing if the comparisons are equal
to their expected value: 0 or 1. So, a compare of a compare
but such compares of compare have their own simplification
which defeats what's tested here.
So, rewrite the test to avoid such compares of compare.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
When computing the absolute value using an expression like:
(a > 0) ? a : -a
it's irrelevant to use '>' or '>=', both will give the same
result since 0 is its own negation.
Canonicalize these equivalent expressions, such that OP_GE
is always used.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Both compares and OP_SELECT are anti-symmetrical: swapping
the arguments is equivalent to inversing the condition.
As consequence, when combined, they're symmetrical:
swapping the arguments of the compare (or equivalently reversing
the direction of the compare) and swapping the operand of the
OP_SELECT is a no-op, both forms are equivalent.
So, canonicalize these to always use OP_GT or OP_GE.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Modify the constants to canonicalize (x < C) to (x <= (C-1)) and
(x <= C) to (x > (C-1)). This choice is partially arbitrary but
1) it's the one with the smallest positive constants,
2) it eliminates all OP_SET_LT & OP_SET_GE with a constant.
A disadvantage of this choice is that we lost some compares with 0:
(x < 0) is now canonicalized into (x <= -1).
Note: Another good choice would be to canonicalize using the
smallest absolute constants. This would keep compares with 0
but would also keep the 4 kinds of comparison.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Compares with SMIN + 1 or SMAX - 1 are equivalent to an equality
testing. For example, (x < SMIN + 1) is the same as (x == SMIN).
Canonicalize these to the equality testing since these are usually
simpler to handle.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The remaining compares with SMIN or SMAX are equivalent to an
equality testing. For example, (x < SMAX) is the same as (x != SMAX).
Canonicalize these to the equality testing since these are usually
simpler to handle.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Simplify away signed compares with SMIN or SMAX which can be
statically be determined to be always true or always false.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed compares miss some simplifications/canonicalizations.
Add some testcases for them.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Commit a1c1b9236d5d ("cmp: simplify sext(x) cmps {SMAX,SMIN}")
had a double error (wrong size and wrong compare direction) which
was hidden because of too narrow testcases.
So, fix the simplification and extend the testcases.
Fixes: a1c1b9236d5d4af1681a45ced26f8350bd7721c2
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A logical shift right followed by a sign extension is equivalent
to an arithmetic shift. Teach this to sparse.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* factorize (x OP1 z) OP2 (y OP1 z) into (x OP2 y) OP1 z
* factorize SHIFT(x, s) OP SHIFT(y, s) into SHIFT((x OP y), s)
* factorize SEL(x, OP(y,z), y) into OP(SEL(x, z, 0), y)
* convert SEL(x & BIT1, BIT2, 0) into SHIFT(x & BIT1, S)
|
|
Convert an expression like:
(x & (1 << A)) ? (1 << B) : 0
into:
(x & (1 << A)) << (B - A)
or:
(x & (1 << A)) >> (A - B)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
'Factorize' and expression like:
x ? (y | z) : y;
into
(x ? z : 0) | y;
and some positional variants as well as replacing '|' by '+' or '^'.
Note: it's not very clear if this is really an improvement but it
allows some very nice simplification of 'bits translations'.
Note: the same can be done for others operations, for example it can
be done for '&' if '0' (the neuter value for '|', '+' and '^')
by '~0' (same with '*' and '1').
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add some testcase related to the simplification of expressions like:
if (val1 & BIT1)
val2 |= BIT2;
into
val2 |= (val1 & BIT1) {SHL/LSR} |BIT2-BIT1|;
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Factorize bitwise OPs of shifts with identical counts.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Factorize common distributive operations:
(x * z) + (y * z) --> (x + y) * z
(x | z) & (y | z) --> (x & y) | z
(x & z) | (y & z) --> (x | y) & z
(x & z) ^ (y & z) --> (x ^ y) & z
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add some testcases for factorizations of:
(x * z) + (y * z) --> (x + y) * z
(x | z) & (y | z) --> (x & y) | z
(x & z) | (y & z) --> (x | y) & z
(x & z) ^ (y & z) --> (x ^ y) & z
and
(x << s) | (y << s) --> ((x | y) << s)
and same for &, ^, LSR and ASR.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A phi-node is called 'trivial' if it only reference itself and a
single other value. In this case the only possible value for the
phi-node is this single other value which can thus replace the
phi-node.
However, the current code get this slightly wrong when the first
referenced value is itself and not the other value.
Fix this by moving up the test checking if it references itself.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* put PSEUDO_ARGS and PSEUDO_REGs in canonical order too
* simplify (~x {&,|,^} x) --> {0,~0,~0}
* simplify ((x cmp y) {&,|,^} (x !cmp y)) --> {0,1,1}
|
|
* simplification of computed gotos with 1 or 2 targets
|
|
Simplify bitwise operations on a compare and its complement
into 0 (for &) or 1 for (| and ^).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Simplify bitwise operations on a pseudo and its complement
into 0 (for &) or ~0 for (| and ^).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, only binops containing PSEUDO_VAL, SYM or ARG were
put in canonical order. This means that binops containing only
PSEUDO_REGs are not ordered. This is not directly
a problem for CSE because commutativity is taken in account but:
* more combination need to be checked during simplification
* 'anti-commutative' operations like (a > b) & (b < a) are not
recognized as such.
So, take PSEUDO_REGs in account when checking if operands
are in canonical order.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, only binops containing PSEUDO_VAL or PSEUDO_SYM were
put in canonical order. This means that binops containing only
PSEUDO_ARGs or PSEUDO_REGs are not ordered. This is not directly
a problem for CSE because commutativity is taken in account but:
* more combination need to be checked during simplification
* 'anti-commutative' operations like (a > b) & (b < a) are not
recognized as such.
So, as a first step, also take PSEUDO_ARGs in account when checking
if operands are in canonical order.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Convert OP_SETVAL of a label into a new instruction: OP_LABEL.
There is 2 reasons to do this:
*) there is slightly less checking to be done in later phases
(since OP_SETVAL can be for labels but also strings)
*) OP_SETVAL is CSEd but this is largely useless because this
instruction is hashed on the expression's address and these
are (most) often not shared. With a separate instruction
for label expressions, their CSE is now OK because the hashing
is done on the BB.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A computed goto having as operand a select of 2 statically known addresses
(OP_SETVAL/EXPR_LABEL) is equivalent to a simple conditional branch.
Simplify such computed goto into the corresponding OP_CBR
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If the OP_COMPUTEDGOTO's source pseudo is defined by a single
OP_SETVAL/EXPR_LABEL, then the corresponding basic block is the
only possible destination and the computed goto can then be
simplified into a simple branch.
So, convert such computed goto into a simple OP_BR which may
then participate in other flow simplifications.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The linearization step sometimes creates a lot of intermediate
basic blocks, often containing just a branch. Their presence often
make things more complicated than needed (more work to do in later
phases, visual clutter when inspection the IR 'by hand') and they
can sometimes, indirectly hinder some optimizations.
Happily, most of them can trivially be optimized away.
So, add a CFG simplification phase running very early and doing:
*) jump threading (eliminate jump to jump)
*) merge single-child/sinle-parents basic blocks.
These changes slightly decrease the number of 'context imbalance'
warnings (32 less on a total of 995 warnings) and the size of
the generated IR (only ~0.4% but this is very significant relatively
to most other simplifications).
They also seem to improve the kernel tests' running time:
before after
real 4m19.261s real 4m17.548s
user 72m03.634s user 71m34.642s
sys 29m05.573s sys 29m01.856s
but it's probably just noise.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, in the main optimization loop, simplify_memops()
is only called if REPEAT_SYMBOL_CLEANUP have been set. This
has (at least) two problems:
1) simplify_memops() may itself create other 'symbol cleanup'
opportunities. That's fine and when it happens REPEAT_SYMBOL_CLEANUP
is correctly set but this is directly lost because repeat_phase
is cleared just after. So, loads and stores are not always
optimized away as they should.
2) Loads & stores are not always done directly on symbols, in
fact, it often happens that they are done on some PSEUDO_REG.
Here too, loads and stores are not always optimized away as
they should.
So, call simplify_memops() unconditionally.
Note: this have only very small effects on the tests' running time:
before after after too
real 4m18.001s real 4m18.655s real 4m19.4
user 71m32.911s user 72m02.701s user 72m06.6
sys 29m06.523s sys 29m01.721s sys 29m06.8
which is under the noise I usually have.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
When merging two basic blocks, it may happen that both of theses
blocks contain a phi-source for the same phi-node. In this case,
only the phi-source from the bottom BB must be taken in account,
it kinda overwrites the value from the top BB and the phi-source
from the top BB must be ignored, in fact it must be removed.
However, it is not the case and this extra phi-source creates
different kinds of problems. Among other things, it hinders
further simplifications. For example, the following code:
extern int array[2];
static inline int stupid_select(int idx)
{
if (idx)
idx = 0;
return array[idx];
}
int select(void)
{
int d = stupid_select(-1);
return d;
}
should boil down to a simple dereference of the array with
an index of zero, like:
select:
load.32 %r8 <- 0[array]
ret.32 %r8
but currently gives:
select:
phisrc.32 %phi3(idx) <- $0xffffffff
phisrc.32 %phi4(idx) <- $0
phi.32 %r12(idx) <- %phi3(idx), %phi4(idx)
sext.64 %r5 <- (32) %r12(idx)
mul.64 %r6 <- %r5, $4
add.64 %r7 <- %r6, array
load.32 %r8 <- 0[%r7]
ret.32 %r8
This patch takes care of the problem by:
* when merging 2 BBs, check when reaching a phi-source in the bottom BB
* if one is found, look after sibling phi-sources
* remove such sibling if belonging to the top BB.
With this change, the code above gives:
select:
phisrc.32 %phi4(idx) <- $0
phi.32 %r12(idx) <- %phi4(idx)
sext.64 %r5 <- (32) %r12(idx)
mul.64 %r6 <- %r5, $4
add.64 %r7 <- %r6, array
load.32 %r8 <- 0[%r7]
ret.32 %r8
which can the be simplified into the expected result.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
When merging BBs, phi-sources from the bottom BB should 'overwrite'
the ones from the top BB which should be ignored.
Add a testcase from the incoming fix.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Reduced testcases (with creduce, of course) often needlessly have
undefined variables. Since these are untouched by the simplification
code and should not be present in source code, they should be avoided
in optimization testcases.
So, defines 'x' to some value other than 0.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* simplify & canonicalize compares
|
|
The dual simplification select(x, 0, x) --> 0 was already
done but this one was forgotten, so add it now.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
This simplification is already handled but explicitly kills it's 2
operands while this is done automatically when killing the instruction.
Also, it uses replace_with_value(0) which needs to recreate the
pseudo for 0 while it's already available via its operands.
So, changes to use replace_with_pseudo() and without the unneeded kills.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Compare instructions with both operands sign or zero-extended
from the same original size are equivalent to a compare of
the original values. If the values were zero-extended, a signed
compare becomes an unsigned one.
Simplify away the sign/zero-extensions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
An unsigned compare of a zero-extended value against a
constant outside of the original range is statically known.
Simplify to the corresponding 0/1.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A signed compare of a zero-extended value against a constant
outside of the original range is statically known.
Simplify to the corresponding 0/1.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A unsigned compare of a sign-extended value against a value
bigger than the original SMAX is equivalent to a signed
compare against 0.
Canonicalize to the signed compare against 0.
Note: ultimately both forms are just a test of the sign bit.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A signed compare of a sign-extended value against a constant
outside of the original range is statically known.
Simplify to the corresponding 0/1.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
When doing a compare of a zero-extended value against a constant,
this extension can be dropped and the comparison done on the
original type if the constant is within the original range and
signed compares become the corresponding unsigned one.
Simplify away these sign-extensions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
When doing a compare of a sign-extended value against a constant
the, sign-extension can be dropped and the comparison done on the
original type if the constant is within the original range.
Simplify away these sign-extensions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Unsigned <= or > against SMAX are equivalent to testing if the
value is positive or not (when interpreted as a signed number).
Canonicalize to this positive/negative test since it only needs
the constant 0 which make it easier to handle at later steps.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Unsigned compares with UMAX (or UMAX-1) are equivalent to
equality tests. These are preferable since it's easier to reason
about them in other simplifications.
So canonicalize these compares to equality tests.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
These compares are always true or false, so simplify them.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
An unsigned comparison like (x < 3) is equivalent to (x <= 2).
Canonicalize '<' & '>=' to '<=' & '>', such that the smallest
constant is used.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If the condition of a select instruction is a equality test of the
select's operands, then the result of the select is always the same as
its second operand. Same for the first operand with an inequality test.
Simplify away these selects.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If the condition of a select is also a select, with constant
but non-zero operands, then the result of this inner select
is always true and the outer select can be replaced by its
second operand.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If the condition of a select is also a select but with constant
operands, some simplification can be done:
* if the second operand is 0, the original condition can be used,
* idem if the first operand is 0s but the operand must be swapped.
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The current testcase, because it's just checking test-linearize's
output as-is, is very sensitive to small simplification changes.
Fix this by changing the tests into equivalent tests and then just
checking that these tests return '1'. This allows to test only what
really matters for canonicalization and make these tests very robust.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The return type of IR instructions is stored in the field
::type of struct instruction and this struct has no space
to hold the type of the operand(s). This is not a problem
for most instructions because there is an easy way to get
the operands' type. For example, for binops both types
must be the same so they are used interchangeably.
However, for compare instructions both types can be different
and there is no easy way to get the type of the operands.
Currently, this is ignored and creates some errors. It
also blocks simplifications that need this type information.
But compares instructions need only 2 operands, there is
thus one 'slot' left. So, use this slot for the operands' type.
This solves the current errors, allows new simplifications
and has very little impact on existing code. Of course,
this type information needs now to be tracked and adjusted
whenever the operands change or an instruction is changed
into a compare.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Because of the lack of type information, compare instruction are
not always handled correctly. So, add some testcases for this.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The current tags check-output-contains/excludes/pattern are
quite powerful and the new check-output-match is easy to use
but it can be even simpler. Indeed, a lot of IR simplifications/
canonicalizations can be tested by checking that the expression
to be tested is equivalent to another one. This is less precise
than some more specific tests but:
* it's a big advantage because it's less sensitive to 'noise'
like the exact number used by the pseudos or to the results
of some new simplifications or canonicalizations
* very often, this equivalence is what really matters and not
the exact transformation.
So, add a new utra-simple-to-use tag: just ask that all functions
of the tests return the same specified value (usually 1):
check-output-returns: <value>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The current tags check-output-contains/excludes/pattern are
quite powerful, universal, but they often need 'complex' regular
expressions with escaping which make them not so nice to read.
For testing IR results, a very common pattern is:
this instruction must have this (kind of) operand.
So, make a new tag for this. It does nothing than can't be done
with done with the previous ones, on the contrary, but is much
simpler to use:
check-output-match(instruction): operand
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* simplify and canonicalize unsigned compares
* basic unop simplifications
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Some unsigned compares against 0 or 1 are equivalent to testing
equality with 0 (x <= 0, x > 0, x < 1, x >= 1).
Canonicalize them to this later, more common form.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Some unsigned compares against 0 are always true or always false
(x < 0 or x >= 0). Simplify them.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add a few testcases for the simplification of unary operations.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* essential OP_ADD & OP_SUB simplifications
|
|
The patterns used here were based on looser semantic for OP_{SEXT,TRUNC}.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* teach sparse about -funsigned-bitfields
* let plain bitfields default to signed
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Not really a simplification in itself but it make some other
simplification a little easier (already because there is one
argument less to be matched).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Do this simplification once for all associative binops.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
At expansion phase, when simplified, all constants are truncated
to the size of the operations that generate them.
This should be done during simplification too because:
*) if some constants are sometimes truncated and sometimes
sign-extended, CSE will miss some opportunities.
*) it's not possible to sign-extend them because it's not
always known if the constant is used in a signed context or not.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add some testcases about basic simplifications of additions
and subtractions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, Sparse treats 'plain' bitfields as unsigned.
However, this is this is inconsistent with how non-bitfield integers
are handled and with how GCC & clang handle bitfields.
So, teach sparse about '-funsigned-bitfields' and by default treat
these bitfields are signed, like done by GCC & clang and like done
for non-bitfield integers.
Also, avoid plain bitfields in IR related testcases.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Sparse complains when a shift amount is too big for the size
of its operand or if it's negative.
However, it does this even for expressions that are never evaluated.
It's especially annoying in the kernel for type generic macros,
for example the ones in arch/*/include/asm/cmpxchg.h
So, remove all warnings done at expansion time and avoid any
simplifications of such expressions. Same, at linearization
and optimization time but in this case mark the instructions as
'tainted' to inhibit any further simplifications. Finally, at the
end of the optimization phase, warn for the tainted instructions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If the flag '-m64' is used on a 32-bit architecture/machine having
int32_t set to 'long', then these int32_t are forced to 64-bit ...
So, ignore the effect of -m64 on these archs and ignore
'64-bit only' tests on them.
Reported-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Tested-by: Uwe Kleine-König <uwe@kleine-koenig.org>
|
|
The test optim/cse-size fials on 32-bit because it needs
two integers of different size but uses int & long.
These two types have indeed different sizes on 64-bit
(LP64) but not on 32-bit (ILP32).
Fix this by using short & int.
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* remove more complex phi-nodes
|
|
In a set of related phi-nodes and phi-sources if all phi-sources
but one correspond to the target of one of the phi-sources, then
no phi-nodes is needed and all %phis can be replaced by the unique
source.
For example, code like:
int test(void);
int foo(int a)
{
while (test())
a ^= 0;
return a;
}
used to produce an IR with a phi-node for 'a', like:
foo:
phisrc.32 %phi2(a) <- %arg1
br .L4
.L4:
phi.32 %r7(a) <- %phi2(a), %phi3(a)
call.32 %r1 <- test
cbr %r1, .L2, .L5
.L2:
phisrc.32 %phi3(a) <- %r7(a)
br .L4
.L5:
ret.32 %r7(a)
but since 'a ^= 0' is a no-op, the value of 'a' is in fact
never mofified. This can be seen in the phi-node where its
second operand (%phi3) is the same as its target (%r7). So
the only possible value for 'a' is the one from the first
operand, its initial value (%arg1).
Once this trivial phi-nodes is removed, the IR is the expected:
foo:
br .L4
.L4:
call.32 %r1 <- test
cbr %r1, .L4, .L5
.L5:
ret.32 %arg1
Removing these trivial phi-nodes will usually trigger other
simplifications, especially those concerning the CFG.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Trivial phi-nodes are phi-nodes having an unique possible outcome.
So, there is nothing to join and the phi-node target can be replaced
by the unique value.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Accesses to volatiles must, of course, not be optimized away.
For this, we need to check to type associated to the memory access.
Currently this is done by checking if the type of the result of
the memops is volatile or not. Usualy, the type of the result is
the same as the one of the access so everything is good but for
bitfields, the memop is not done with the type of the bitfield
itself but to its base type. Since this base type is unrelated
to the access type, it is generaly not marked as volatile even
when the access to the bitfield is volatile.
Fix this by using the true type of the access to set the field
struct instruction::is_volatile.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Accesses to bitfields must, of course, not be optimized away.
This is currently not the case.
Add a testcase for it.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* do 'classical' SSA conversion (via the iterated dominance frontier).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* fix buggy recursion in kill_dead_stores()
* kill dead stores again after memops simplification is done.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* simplify TRUNC((x & M') | y, N)
* simplify AND(SHIFT(a | b, S), M)
* simplify TRUNC(SHIFT(a | b, S), N)
|
|
The simplification of TRUNC(SHIFT(a | b, S), N) can be done by
combining the effective mask corresponding to TRUNC(_, N) with
the one corresponding to SHIFT(_, S).
This allows to also simplify signed bitfields. For example, code like:
struct s {
signed int :2;
signed int f:3;
};
int bfs(struct s s, int a)
{
s.f = a;
return s.f;
}
is now simplified into the minimal:
bfs:
trunc.3 %r4 <- (32) %arg2
sext.32 %r11 <- (3) %r4
ret.32 %r11
The simplification is done by calling simplify_mask_shift() with
the mask corresponding to TRUNC(_, N).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The simplification of AND(SHIFT(a | b, S), M) can be done by combining
the mask M with the effective mask corresponding to SHIFT(_, S).
This instruction pattern is generated when accessing bitfields,
for example, code like:
struct u {
unsigned int :2;
unsigned int f:3;
};
int bfu(struct u s, int a)
{
s.f = a;
return s.f;
}
is now simplified into the minimal:
bfu:
and.32 %r11 <- %arg2, $7
ret.32 %r11
The simplification is done by introducing a small helper,
simplify_mask_shift(), doing the pattern matching and then calling
simplify_mask_shift_or() with the mask M.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A N-bit truncate is not much different than ANDing with
a N-bit mask and so some simplifications done for AND can
also be done for TRUNC. For example for code like this:
char foo(int x, int y) { return (x & 0xffff) | y; }
the mask is unneeded and the function should be equivalent to:
char foo(int x, int y) { return x | y; }
The simplification in this patch does exactly this, giving:
foo:
or.32 %r4 <- %arg1, %arg2
trunc.8 %r5 <- (32) %r4
ret.8 %r5
while previously the mask was not optimized away:
foo:
and.32 %r2 <- %arg1, $0xffff
or.32 %r4 <- %r2, %arg2
trunc.8 %r5 <- (32) %r4
ret.8 %r5
This simplification is especially important for signed bitfields
because the TRUNC+ZEXT of unsigned bitfields is simplified into
an OP_AND but this is, of course, not the case for the TRUNC+SEXT
of signed bitfields.
Do the simplification by calling simplify_mask_or(), initialy used
for OP_AND, but with the effective mask corresponding to TRUNC(x, N):
$mask(N).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
|
|
The instructions SHL(AND(x, M), S) can be simplified into SHL(x, S)
if (M << S) == (-1 << S).
For example, code like:
unsigned foo(unsigned x)
{
return (x & 0x000fffff) << 12;
}
is now optimized into:
foo:
shl.32 %r3 <- %arg1, $12
ret.32 %r3
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The instructions SHL(AND(x, M), S) can be simplified to 0
if (M << S) == 0.
For example code like:
unsigned foo(unsigned x)
{
return (x & 0xfff00000) << 12;
}
is now simplified into:
foo:
ret.32 $0
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The instructions LSR(AND(x, M), S) are already simplified into
AND(LSR(x, S), (M >> S)) but only if AND(x, M) has a single user.
However, if (M >> S) == (-1 >> S), the AND part is redundant and the
whole can always directly be simplified into LSR(x, S).
For example, code like:
unsigned foo(unsigned x)
{
unsigned t = (x & 0xfffff000);
return ((t >> 12) ^ (x >> 12)) & t;
}
is now optimized into:
foo:
ret.32 $0
because (t >> 12) is simplified into (x >> 12).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The instructions LSR(AND(x, M), S) are already simplified into
AND(LSR(x, S), (M >> S)) but only if AND(x, M) has a single user.
However, if (M >> S) == 0, they can always directly be simplified to 0.
For example code like:
unsigned foo(unsigned x)
{
unsigned t = (x & 0x00000fff);
return (t >> 12) & t;
}
is now simplified into:
foo:
ret.32 $0
while previously it was:
foo:
and.32 %r2 <- %arg1, $0xfff
lsr.32 %r4 <- %r2, $12
and.32 %r6 <- %r4, %r2
ret.32 %r6
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The pattern LSR(AND(x, M), S) is already generically simplified into
((x >> S) & (M >> S)) but only if the sub-expression AND(x, M) is not
shared with some other expressions because the simplification modify it.
But for some special cases the expression can be simplified even if
the sub-expression is shared because the simplification doesn't need
to modify this AND(x, M) part.
Add the testcases for LSR and the incoming SHL.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The same simplifications done for LSR can be done for SHL
(with the appropriate mask).
For example, code like:
int foo(int a, int b)
{
return ((a & 0xfff00000) | b) << 12;
}
is now optimized into:
foo:
shl.32 %r5 <- %arg2, $12
ret.32 %r5
while previously it was:
foo:
and.32 %r2 <- %arg1, $0xfff00000
or.32 %r4 <- %r2, %arg2
shl.32 %r5 <- %r4, $12
ret.32 %r5
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If the effective mask (M) corresponding to OP(_, K) is different
than the combined mask (C & M), then the inner mask (C) can be
replaced by the smaller (C & M), giving:
OP((x | M'), K) with M' == (C & M).
For example, code like:
int foo(int x)
{
return (x | 0xfffffff0) & 0xfff;
}
is now simplified into:
foo:
or.32 %r2 <- %arg1, $0xff0
and.32 %r3 <- %r2, $0xfff
ret.32 %r3
while previously, the constant was not reduced:
foo:
or.32 %r2 <- %arg1, $0xfffffff0
and.32 %r3 <- %r2, $0xfff
ret.32 %r3
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In an expression like OP((x | C), K), if the effective mask (M)
corresponding to OP(_, K) is equal to the combined mask (C & M),
then the OR operation is unneeded and can be replaced by M itself,
giving: OP(M, K).
In mathematical terms:
0) ((x | C) & M) = ((x & M) | (C & M))
1) (C & M) = M
2) ((x & M) | (C & M)) = ((x & M) | M) = M
and so OP((x | C), K) -> OP(M, K).
For example, code like:
unsigned int foo(int x)
{
return (x | 7) & 2;
}
is now simplified into:
foo:
ret.32 $2
which previously was not optimized
foo:
or.32 %r2 <- %arg1, $7
and.32 %r3 <- %r2, $2
ret.32 %r3
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In an expression like OP((x | C), K), if the inner constant
(C) and the effective mask (M) corresponding to OP(_, K) are
disjoint ((C & M) == 0), then the OR with the constant is
unneeded and can be optimized away since the constant will be
'cleared' by the mask, giving: OP(x, K)
For example, code like:
int foo(int x)
{
return (x | 0xfffff000) & 0xfff;
}
is now optimized into:
foo:
and.32 %r3 <- %arg1, $0xfff
ret.32 %r3
while previously the OR mask was not optimized away:
foo:
or.32 %r2 <- %arg1, $0xfffff000
and.32 %r3 <- %r2, $0xfff
ret.32 %r3
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, in simplify_mask_or_and(), only the cases where
(M' & M) == 0 or (M' & M) == M are simplified. However, if the
combined mask (M' & M) is different than the original inner mask
(M'), this inner mask can be replaced by the smaller combined one,
giving: OP(((x & (M' & M)) | y), K).
For example, code like:
int foo(int x, int y)
{
return (((x & 0xfffffff0) | y) & 0xfff);
}
is now simplified into:
foo:
and.32 %r2 <- %arg1, $0xff0
or.32 %r4 <- %r2, %arg2
and.32 %r5 <- %r4, $0xfff
ret.32 %r5
while previously, the mask was not reduced:
foo:
and.32 %r2 <- %arg1, $0xfffffff0
...
Note: this is not a very effective simplification like directly
removing an instruction, nevertheless the smaller mask can
trigger other simplifications and may also be advantageous
for a subsequent code generation phase.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In an expression like this, if the inner mask (M') 'covers' all the
bits of the effective mask (M) corresponding to OP(_, K), IOW if
(M' & M) == M, then the inner mask (M') is unneeded and can be
optimized away, giving: OP((x | y), K).
For example, with this simplification, code like:
int foo(int x, int y)
{
return (((x & 0x0fffffff) | y) & 0xfff);
}
is now optimized into the minimal:
foo:
or.32 %r4 <- %arg1, %arg2
and.32 %r5 <- %r4, $0xfff
ret.32 %r5
while previously, the inner mask was not optimized away:
foo:
and.32 %r2 <- %arg1, $0xfffffff
or.32 %r4 <- %r2, %arg2
and.32 %r5 <- %r4, $0xfff
ret.32 %r5
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In simplify_mask_or(), two calls to simplify_mask_or_and() are
made: one for each operand of the OR instruction.
These two calls are guarded by a test checking the presence of
the AND instruction. If it is the case for the first operand,
the second operand is never considered for simplification,
even if no simplifications have been made.
Fix this by testing the return value of simplify_and_or_mask()
and let's do the second call if no simplifications could be
done in the first one.
For example, code like:
int foo(int x, int y, int a)
{
return ((a & 0xf000) | (x & y)) & 0x0fff;
}
was expectedly simplified into:
foo:
and.32 %r5 <- %arg1, %arg2
and.32 %r7 <- %r5, $0xfff
ret.32 %r7
while the same code with the operands of the OR swapped was not:
int foo(int x, int y, int a)
{
return ((x & y) | (a & 0xf000)) & 0x0fff;
}
resulted in the non-optimized:
foo:
and.32 %r3 <- %arg1, %arg2
and.32 %r5 <- %arg3, $0xf000
or.32 %r6 <- %r3, %r5
and.32 %r7 <- %r6, $0xfff
ret.32 %r7
but now the simplification can also be done:
foo:
and.32 %r3 <- %arg1, %arg2
and.32 %r7 <- %r3, $0xfff
ret.32 %r7
Note: it would be simpler to unconditionally do both calls
but this is unsafe because some of the concerned
instructions, needed in the second call, could have
been simplified away in the first one.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The extraction & insertion of bitfields is made of relatively
complex combinations of SHL/LSR/AND/OR and TRUNC/ZEXT/SEXT.
Add a few testcases showing the effectiveness of their
simplification and to catch possible future regressions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
There is a potential problem when the second side of the OR
is simplified away.
Add 2 testcases to catch possible regressions here.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* add simplification of TRUNC(TRUNC((x))
* add simplification of SHL(LSR(x), S), S)
|
|
The simplification of ZEXT(ZEXT(x)) was already added but
its dual, TRUNC(TRUNC(x)), was not.
Add it now.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The simplification of ((x << S) >> S) was already added
but its dual was forgotten.
Add it now.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add the testcase before doing the simplification.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add the testcase before doing the simplification.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The testcase was named with the wrong operation order
(inner op first instead of outer-first).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
This is especially usefull when simplifying code
accessing bitfields.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
This transformation is especially usefull when simplifying code
accessing bitfields or for other masking manipulations.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
This is a common pattern for bitfield simplification.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
If an expression is already a boolean (constant) expression,
converting it to a boolean expression is a no-op.
In this case, return early, this avoids to create intermediate
code that will need to be simplified away at some later stage.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since the OP_SETCC instructions can only return a 0 or a 1,
a subsequent AND mask can often be heavily simplified.
Simplify the mask value.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since the OP_SETCC instructions can only return a 0 or a 1,
a truncation won't change the value and the OP_SETCC
can be changed to directly return the extended size.
Remove the unneeded truncate.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since the OP_SETCC instructions can only return a 0 or a 1, a
sign-extension won't change the value if the size is wider than 1.
The OP_SETCC can thus be changed to directly return the extended size.
Remove the unneeded extension.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since the OP_SETCC instructions can only return a 0 or a 1,
a zero-extension won't change the value and the OP_SETCC
can be changed to directly return the extended size.
Remove the unneeded extension.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Convert the OP_TRUNC into an OP_AND with the appropriate mask.
This a greater chance to simplify with previous instructions
and to correspond to a register-size operand.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since the OP_SETCC instructions can only return a 0 or a 1,
a compare-not-equal-zero or compare-equal-1 is a no-op
if the operand have been masked with 1.
Remove the no-op comparison (if the size matches).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Conditional branches, or more exactly OP_CBR, can't accept
arbitrary expression as condition. it is required to have
an integer value.
Fix this by adding a comparison against zero.
|
|
Add some tests in preparation of some bug-fixing and
simplification in linearize_logical()linearize_conditional().
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* some simplifications of OP_SETNE & OP_SETEQ with 0 and 1
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
These two comparisons are no-ops when the operand is boolean.
Simplify them away.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A sign-extension has no effect on the result of a comparison
with 0 or with 1 when the original size is bigger than 1.
Optimize away the extension.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A zero-extension has no effect on the result of a comparison with 0 or 1.
Optimize away the extension.
Note: this simplification was already done but only when the
original size was 1 but it can be done for all sizes.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Add some basic testcase for these relatively common
simplification opportunities.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* several simplifications involving casts and/or bitfields
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* give a correct & sensible warning on negative or over-sized shifts.
* add conservative simplification of such shifts.
* do not optimize the meaningless shift:
* any shift with a negative count
* OP_ASRs with an over-sized shift count.
* try to give a correct negative/too-big error message during simplification.
* simplify chains of shifts.
* simplify ZEXT + ASR into ZEXT + LSR
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
After an OP_ZEXT, it is guaranteed that the sign bit is zero.
So, an arithmetic right-shift following an OP_ZEXT gives the
same result as an logical right-shift which is simpler to handle
and further optimize.
So, when preceded by an OP_ZEXT, replace OP_ASR by OP_LSR.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since an LSR with an in-range shift count will insert a zero in
the MSB, a subsequent ASR will be equivalent to an LSR of the same
count or equivalently, the combinaison the these two shifts is
equivalent to a single LSR with a shift count equal to the sum
of the two initial counts.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A shift of a shift (of the same kind) is equivalent to a single
shitf with a count equal to the sum of the two initial counts.
In addition:
* for LSRs & SHLs, if the sum is >= to the instruction size, then
the result is zero.
* for ASRs, if the sum is >= to the instruction size, then
the result is the same as a shift of a count of size - 1.
Implement these simplifications if both shift counts are in range.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Floating-point values are displayed using the printf
format "%Lf" but this is the format without exponent
(and with default precision of 6 digit).
However, by its nature, this format is very imprecise.
For example, *all* values smaller than 0.5e-6 are displayed
as "0.000000".
Improve this by using the "%Le" format which always use
an exponent and thus maximize the precision.
Note: ultimately, we should display them exactly, for
example by using "%La", but this will requires C99.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Like GCC, we don't want to touch over-sized ASR but over-sized
LSRs & SHLs degenerate to 0.
Add testcases covering all cases.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A sign-extension following a zero-extension can always
be simplified into a single zero-extension since the
intermediate value is guaranted to have a zero sign bit.
Simplify away such instructions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A zero-extension following another zero-extension can always
be simplified into a single zero-extension.
Simplify away such instructions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A sign-extension following another sign-extension can always
be simplified into a single sign-extension.
Simplify away such instructions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
An OP_AND with a constant value following a OP_ZEXT can often
be simplified:
* the constant mask can be made smaller
* the whole masking is redundant and can be removed.
Do these two simplifications depending on the initial mask
and the sizes of the instructions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A OP_AND with a constant value followed by a sign or zero-extension
can often be moved after the extension.
This is good because it can trigger the cancellation of the OP[ZS]EXT
with a preceding OP_TRUNC, a common situation with bitfields.
Note: This 'simplification' is less desirable if there is
no preceding OP_TRUNC, for example, in a sequence like:
and.32 %t <- %a, $mask
*ext.64 %r <- %t
If needed, this can be filtered out at some later stage.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
sparse contains some code to simplify an AND masking
followed by a truncating cast.
However, this simplification is problematic because it
doesn't keep the sizes consistent. For example, code like:
and.32 %r3 <- %r2, $0x7fff
trunc.16 %r4 <- (32) %r3
will be discarded with %r3 used in place of %r4.
This is correct in the mathematical sense but %r4 had a size
of 16 while %r3 has a size of 32, so using %r3 in place of %r4
will make the sizes inconsistent with unexpected consequences.
We can more or less fix this by using another transformation
that preserve the sizes:
trunc.16 %r3 <- (32) %r2
and.16 %r4 <- %r3, $0x7fff
which in itself doesn't optimize anything but:
1) the mask may be smaller
2) may trigger other simplification with the TRUNC or the AND.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
An OP_SEXT or a OP_ZEXT followed by a truncate to a size smaller
than the original size is unneeded, the same result can be obtained
by doing the truncate directly on the original value.
Dualy, an OP_SEXT or a OP_ZEXT followed by a truncate to a size greater
than the original size doesn't need the truncate, the same result can be
obtained by doing the extend directly on the original value.
Rearrange the inputs (src & orig_type) to bypass the unneeded operation.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
An OP_ZEXT/OP_SEXT following by a OP_TRUNC to the original size is a NOP.
Simplify away such OP_TRUNC.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
There is several difficulties some related to unclear semantic
of our IR instructions and/or type evaluation.
Add testcases trying to cover this area.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In the mathematical sense, the result of a left-shift by an
amount bigger than the operand size equals zero.
Do the corresponding simplification.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In the mathematical sense, the result of LSR by an amount bigger
than the operand size equals zero.
Do the corresponding simplification.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
This activate the new SSA conversion that will be used
to replace simplify_symbol_usage() which created invalid
SSA (phi-nodes were placed where the value was needed
instead of where the paths meet, also and partially related,
it was possible for a phi-node to have more operands/sources
than the BB it was in had parents).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
A few tests are added, some have been renamed to better
refect their purposes. Finally, some checks have been added
or tweaked.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In kill_dead_stores_bb(), after having scanned the current BB,
we may recurse into the parent BBs. But we must do this only if
we're sure that the store there may not be needed in another BB.
In other words, we must do this only if the current BB is the only
child of the parent.
However, if one of the parent has more than one child, instead
of trying the next parent, the function stops there and returns.
Furthermore, the check is done with an open loop instead of using
the helper bb_list_size().
Fix this by using bb_list_size() to check if the parent has only
one child, do the recursion if it is the case and try the next
parent if not.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, dead stores are removed after the initial
promotion of symbols to pseudos (simplify_one_symbol()).
But more complex promotions are done later during memops
simplification (simplify_loads()) and dead stores should be
removed there too but aren't.
Fix this by calling kill_dead_stores() after memops
simplification.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* cse: try the next pair instead of keeping the first instruction
* cse: compare casts only by kind a size, not by C type.
|
|
* optimize away OP_UTPTR & OP_PTRTU which are nops.
|
|
Now that all casts to or from a pointer are between a pointer
and a pointer-sized unsigned integer, from an optimization
PoV, they are all no-ops.
So, optimize them away at simplification time.
Note: casts between pointers (OP_PTRCAST) should also be
optimized away but the original type is used for a
number a things (for example in check_access()) and
can't be optimized away so simply (yet).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
validation/linear/* should not contain testcases that are
optimization dependent and validation/*.c should not contain
tests using 'test-linearize', only those using 'sparse'.
Move some cast-related testcases accordingly.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Now that OP_AND_BOOL and OP_OR_BOOL are always given boolean
operands, they are just a special case of 1 bit OP_AND & OP_OR.
To avoid to have to repeat CSE, simplification patterns, ...
better to generate plain OP_AND & OP_OR instead.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The function add_convert_to_bool() was added to give a
boolean context to logical expressions but did this onl
for integers.
Fix this for floating-point expressions by adding the proper
comparison to 0.0.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Because of C's integer promotion, in code like 'a == 0',
the operand 'a' must be promoted to int. So, if 'a' is
of type 'bool', it results in following linearization:
zext.32 %t <- (1) %a
setne.32 %r <- %t, $0
While this promotion is required by the standard at C level,
here, from an operational PoV, the zero-extension is unneeded
since the result will be the same without it.
Change this by simplifying away such zero-extensions.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
This is yet another simple identity with the potential to trigger
more simplifications.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|