Age | Commit message (Collapse) | Author | Files | Lines |
|
add_instruction_to_end() and insert_last_instruction() do exactly
the same thing but with the arguments in the opposite order.
So, replace add_instruction_to_end() by insert_last_instruction().
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Convert OP_SETVAL of a label into a new instruction: OP_LABEL.
There is 2 reasons to do this:
*) there is slightly less checking to be done in later phases
(since OP_SETVAL can be for labels but also strings)
*) OP_SETVAL is CSEd but this is largely useless because this
instruction is hashed on the expression's address and these
are (most) often not shared. With a separate instruction
for label expressions, their CSE is now OK because the hashing
is done on the BB.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
OP_SYMADDR take a single operand 'symbol' but this instruction
is very much like other unops and using the same operand's name
allow to avoid some special cases.
So, s/symbol/src/ for OP_SYMADDRs.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* do 'classical' SSA conversion (via the iterated dominance frontier).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The function bb_dominates() do a dominance query
on two BB but do this by walking the CFG tree.
Now that we have the dominance tree, we don't need anymore
to do this walk, we can use the dom tree itself.
This patch just replace the calls bb_dominates() by
domtree_dominates() since they have a slightly different
interface.
Note: in the next version, it'll be better/simpler to rename
domtree_dominates() by bb_dominates().
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
* cse: try the next pair instead of keeping the first instruction
* cse: compare casts only by kind a size, not by C type.
|
|
Now that cast instructions are more finely grained, it's
not needed to compare the original type of the casts,
only the original size can matter.
So, do not hash & compare the original types but only the
orignal sizes. This allow much more casts instructions to be
CSEed away.
Note: like noted in the code, even the original size shouldn't
matter as identical sources should implies identical
original sizes but this can't yet be guaranted.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Now that these instructions are not generated anymore,
we can remove all related code, defines and doc.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Let's suppose we have a few instructions: A, B, C & D,
all congruent insn_compare()
The current way the CSE is done is:
The first try will be try_to_cse(A,B). If this succeeds,
A & B have been merged (AB) and the next try will be done
with the next instruction:
try_to_cse(AB,C).
That's good. However, if it fails, the next try will be:
try_to_cse(A,C).
If this one also fails, the next one will be:
try_to_cse(A,D)
And if this one also fails, nothing else is done.
In other words, all attempts are done with A. If it happens that
A can't be eliminated (because it doesn't dominate B, C & D and
is not dominated by them), no eliminations are done because the
other pairs, not involving A are never tried.
Ideally, we should try all possible pairs: A-B, A-C & A-D but
also B-C, B-D & C-D. However this is quadratic and can be a bit
costly. An easy & cheap way to improve the current situation is,
in case of failure, to not reuse the original first instruction
but the other one. So instead of testing:
A-B, A-C, A-D, ...
we will test:
A-B, B-C, C-D, ...
with the result that an 'bad' instruction can't block anymore
the following pairs.
So, when try_to_cse() fails, do not retry by keeping the first
instructions, but retry using the second one.
In practice, this seems to work fairly well.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Casts to integer used to be done with only 2 instructions:
OP_CAST & OP_SCAST.
Those are not very convenient as they don't reflect the real
operations that need to be done.
This patch specialize these instructions in:
- OP_TRUNC, for casts to a smaller type
- OP_ZEXT, for casts that need a zero extension
- OP_SEXT, for casts that need a sign extension
- Integer-to-integer casts of the same size are considered as
a NOPs and are, in fact, never emitted.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently all casts to pointers are processed alike. This is
simple but rather unconvenient in later phases as this
correspond to different operations that obeys to different
rules and which later need extra checks.
Change this by using a specific instructions (OP_UTPTR) for
[unsigned] integer to pointers.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently all casts to pointers are processed alike.
This is simple but rather unconvenient as it correspond to
different operations that obeys to different rules and
which later need extra checks.
Change this by using a specific instructions (OP_UTPTR) for
unsigned integer to pointers.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
free_ptr_list() is a macro doing the typechecking.
It's thus not only unnneded but counter-productive to
cast its argument to the untyped 'struct ptr_list*'.
Remove these casts.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
The code for the inner optimization loop was in the same file
than the one for CSE. Now that the CSE have a well defined
interface, we can move this inner loop together with
the main optimization loop in optimize.c
This move make the code better structured and make it easier
to understand the optimization logic and make any experiment
or needed changes to it.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Now that the CSE have a well defined interface, we can
make it public, which will help us to untangle the CSE
part from the whole optimization logic.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, the same function doing the CSE's elimination phase
also contains the inner part of the optimization loop.
Moving the CSE elimination part in a seperate function will
allow to make a clear interface for CSE and move this inner
optimization loop out of the CSE file.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, the simplification of individual instructions
is done at the beginnig of the function doing the hashing
for CSE.
Separate the two help a little bit to understand the code
there and making any changes if needed.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Currently, we have OP_MULS & OP_MULU but unless it's full,
widening multiplication both must give exactly the same
result (the world run on 2's complement CPUs now, right?).
Also, the IR doesn't have widening multiplication but
only instruction where both operands and the result have
the same size.
So, since theer is no reasons to keep 2 instructions,
merge OP_MULS & OP_MULU into a single one: OP_MUL.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Before the introduction of OP_SETFVAL, floating-point were
created via OP_SETVAL whose CSE is done by comparing the
pointer of the corresponding expression without any
interpretation of this pointer.
As consequence, even if two OP_SETVAL have two identical
expressions (value), in most cases the corresponding pointers
are not identical, completly inhibiting the CSE of OP_SETVALs.
Fix the CSE of floating-point literals by directly using
the value given by the new OP_SETFVAL.
Note: to respect some of the subtilities of floating-point,
the equality comparison of two literals is not done on
the floating-point value itself but bit-by-bit on its
binary representation (as such we can continue to make the
distinction between +0.0 & -0.0, handle NaNs, ...).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Since commit "1c182507c (fix support of floating-point compare)",
CSE wasn't done anymore on floating-point compare.
Fix this by adding the two missing 'case OP_FPCMP ...'
Fixes: 1c182507c3981aa20193c68d7cfd32d750b571cf
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Floating-point arithmetic is quite different from the
arithmetic on integers or the one of real numbers.
In particular, most transformations, simplifications that can
be done on integers are invalid when done on floats.
For example:
- associativity doesn't hold
- distributivity doesn't hold
- comparison is tricky & complex
This is because (among others things):
- limited precision, rounding everywhere
- presence of signed zeroes
- presence of infinities
- presence of NaNs (signaling or quiet)
- presence of numbers without inverse
- several kind of exceptions.
Since they don't follow the same rules as their integer
counterpart, better to give them a specific opcode
instead of having to test the type of the operands at
each manipulation.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
In commit (51cfbc90a5 "fix: kill unreachable BBs after killing a child")
kill_unreachable_bbs() was called during simplification in order
to avoid creating fake cycles between phis and issuing bogus
"crazy programmer" warnings.
However, simplification is done via cse.c:clean_up_insns()
which is looping over all BBs while kill_unreachable_bbs()
loops over all BBs *and* can delete some of them which may
corrupt the looping in clean_up_insns().
Fix this by:
1) refuse to emit the "crazy programmer" warning if there
is a potential dead BB
2) move kill_unreachable_bbs() in the main cleanup loop
which will avoid nested ep->bbs loop.
Note: this solution is preferable to some others because
the "crazy programmer" condition happens very rarely.
It this thus better to delay this check than to call
kill_unreachable_bbs() preventively.
Note: the reproducer is one with very broken syntax but nothing
forbid the same situation to happen with a valid program.
Fixes: 51cfbc90a5e1462fcd624a1598ecd985a508a5d6
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
|
|
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Christopher Li <sparse@chrisli.org>
|
|
By definition a commutative operation give the same result if its
operands are exchanged.
CSE can use this property when comparing instructions to find
more equivalent instructions and thus eliminate more of them..
Implement this by special-casing commutative ops in insn_compare().
Note: this come for free when all instructions are properly
canonicalized, which is not yet the case.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Christopher Li <sparse@chrisli.org>
|
|
Instructions removed during CSE had their ->bb simply set to NULL
and the usage of their operand was not adjusted.
Fix that by calling kill_instruction().
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Christopher Li <sparse@chrisli.org>
|
|
cse_one_instruction() contains some code to be executed when
an OP_PHI is eliminated. This code is roughly equivalent to
what is done in kill_instruction() for OP_PHI but not complete
since it doesn't recursively try to kill others instructions
which are used only by this OP_PHI.
Fix this and avoid duplicated code by replacing this code by
a call to kill_instruction().
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Christopher Li <sparse@chrisli.org>
|
|
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamil Dudka <kdudka@redhat.com>
Signed-off-by: Christopher Li <sparse@chrisli.org>
|
|
Without this patch test-linearize fails on a simple example:
static void test(int i)
{
while (i) {
if (i)
test(0);
i++;
}
}
It generates a conditional jump depending on an uninitialized value
which is obviously not in the input code.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamil Dudka <kdudka@redhat.com>
Signed-off-by: Christopher Li <sparse@chrisli.org>
|
|
Profiling showed that Sparse spent much of its time doing CSE, which involved
traversing every bucket of insn_hash_table repeatedly. insn_hash_table had
64K buckets, but never more than a few dozen entries. Shrink insn_hash_table
to 256 entries. This makes Sparse 2-4 times faster for me.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
|
|
Signed-off-by: Josh Triplett <josh@freedesktop.org>
|
|
It's better than nothing, but I need to re-do cast linearization some
day. Right now we squirrell away the whole original type, but the fact
is, we only need to save away a few bits of "type of cast".
We need to know if it's sign-extending, and whether the source or
destination are floating point values. And the source size, of course.
But we don't really need to know the detailed original type.
Oh, well. This makes at least trivial things look much better.
|
|
This trivial patch remove the warning when I try to compile sparse
in with sparse.
Another warning I did not fix is:
../pre-process.c:554:18: warning: bad constant expression
The offending line is:
struct arg args[nargs];
|
|
error reporting.
|
|
This is OP_MUL/OP_DIV/OP_MOD/OP_SHR.
We actually do the constant simplifications still wrong, but
now the information is all there.
|
|
(symbol addresses).
They are pretty different. Symbol addresses have special meaning during
various phases, from symbol simplification to CSE.
|
|
It looked good on paper. Not so good when actually trying
to generate code.
|
|
We already did this, we just didn't have a visible prototype.
|
|
We don't care _where_ a PHI-node is, since the real location
is at the phi sources. So unlike other instructions, we do not
bother with any dominance analysis - we can just combine them
blindly across basic block borders.
It might make sense to totally disconnect the PHI nodes from
the flow graph to make this lack of positional information
even more explicit. That might also allow us to combine more
basic blocks.
|
|
This allows us to always see which pseudos are nonlocally affected
by the phi source.
We can only do this after the instruction flow is fixed, together
with the OP_DEATHNOTE phase.
|
|
Umm. The compiler even warned about it, I don't see how I
could miss it. I'm a maroon.
|
|
This was originally done so that "struct instruction" would be
smaller, but it has since grown anyway (for the memops), so splitting
OP_SEL into two instructions no longer makes sense, and makes it
harder on CSE.
|
|
regular CSE.
The CSE case is totally different: since both children use the same
pseudos, there can be no question that a common parent wouldn't
have that pseudo live.
|
|
We should drop symbol types by now, and base things on just raw
sizes. Anything else just doesn't work from a CSE standpoint.
Some things may care about actual types later, notably the type-
based alias analysis. Those types still exist in the cast nodes.
Also, make the printout be more readable.
|
|
CSE it together with the OP_SETCC that goes with it ;)
|
|
Hey, it's simple, and it happens.
|
|
...but we can't merge phi-sources after packing. Or rather, we
could, but we'll have to be more careful about not re-ordering
them, which we aren't. Sort it out later.
|
|
It's a bit more simple-minded than the symbol simplification,
and it can be more costly. So we start off with the (cheap)
symbol simplification algorithm, and then use this new more
generic phase later.
This allows us to remove extra loads (and, in theory, stores,
but the dead store elimination is so simple-minded right now
that it's effectively useless).
|
|
Make sure to clear them when turning operations into NOP's. And when
doing OP_LOAD->OP_PHI conversion, just re-use the target pseudo,
which means that we don't need to unnecessarily move all the uses
over.
|
|
repeating CSE itself.
In particular, some symbol address simplifications imply that
we should repeat symbol simplification, since things may have
improved.
|
|
Also, allow marking a pseudo unused by setting it to VOID,
which just disables further renaming of that use. This is
needed to make phi-source instructions (that can be shared
among multiple phi-nodes thanks to CSE) be collectable.
We used to just mark the phi source dead, but that was
wrong, since _another_ phi node might be using it.
|
|
|
|
It may be a nice readability thing, but with the usage lists
we really mustn't change a location that may be on a list.
|
|
that we keep track of usage notes.
The usage trackign tracks the address of the phi pseudo, so
sorting the list would need to update that too.
This sucks, because it means we can't turn the phi-nodes into
canonical format, and thus not CSE phi-nodes as effectively.
|
|
They's _slightly_ special in that they depend on their bb,
so we can only combine them within one basic block, but that's
easily handled by making the bb be part of the hashing and
comparison.
|
|
|
|
The "half-way" point was making the phi be an instruction,
and some of that half-way stuff remained.
|
|
The instruction contains everything "struct phi" had, namely
source bb and pseudo. And generating a new pseudo means that
we not only have the pointer to the instruction that defines
the phi-node (in pseudo->def), we also end up with automatic
phi-node usage tracking (pseudo->users).
The changes may look large, but almost all of this is just
direct search-and-replace of the old phi-node usage.
|
|
We can turn trivial phi-nodes into OP_SEL instructions instead,
and generally improve code flow.
This doesn't remove the empty blocks this results in yet, so
it's not quite as dramatic as it could be, but it does the hard
work. Empty block removal is trivial.
|
|
|
|
instruction.
|
|
|
|
|
|
That is - simple instructions with no users.
|
|
Show the source that the thing was replaced with.
|
|
Instead of using the "convert_load_insn()" that converts them to
OP_LNOP's, which ends up confusing instruction printout horribly.
|
|
This makes us able to CSE stuff where one subexpression
directly dominates another.
But we still don't do any fancy code motion in the non-
dominance case.
|
|
|
|
It ain't very smart yet, but give it time.
In particular, right now it gathers information for the
whole function, but it only does the actual replacement
if the instructions are in the same basic block.
|