path: root/validation/linear
Age | Commit message | Author | Files | Lines
2020-10-20 | Merge branch 'bf-sign' into next | Luc Van Oostenryck | 4 | -5/+44
* teach sparse about -funsigned-bitfields
* let plain bitfields default to signed
2020-10-08 | fix evaluation of pointer to bool conversions | Luc Van Oostenryck | 1 | -1/+0
The pointer to bool conversion used an indirect intermediate conversion to an int because the pointer was compared to 0 and not to a null pointer. The final result is the same, but the intermediate conversion generated an unneeded OP_PTRTOU instruction which made some tests fail. Fix this by directly comparing to a null pointer of the same type as the value being converted. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
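In source terms, the fixed conversion means that `(_Bool)p` is evaluated as `p != NULL` with `NULL` of the pointer's own type, with no detour through an integer. A minimal sketch of that semantics in plain C (the helper name `ptr_to_bool` is hypothetical, not part of sparse):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pointer to bool: compare against a null pointer of the same type,
 * rather than converting the pointer to an integer and comparing to 0. */
static bool ptr_to_bool(const int *p)
{
    return p != (const int *)NULL;   /* same result as (bool)p */
}
```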
2020-09-16 | teach sparse about -funsigned-bitfields | Luc Van Oostenryck | 4 | -5/+44
Currently, Sparse treats 'plain' bitfields as unsigned. However, this is inconsistent with how non-bitfield integers are handled and with how GCC & clang handle bitfields. So, teach sparse about '-funsigned-bitfields' and by default treat these bitfields as signed, as GCC & clang do and as is done for non-bitfield integers. Also, avoid plain bitfields in IR-related testcases. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
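Whether a plain `int` bitfield is signed is implementation-defined in C, which is why the flag matters. The sketch below spells out both behaviours with explicitly `signed` and `unsigned` 3-bit fields, so it does not depend on the compiler's default; the struct and function names are illustrative only.

```c
#include <assert.h>

/* A plain 'int f : 3' behaves like 's' on GCC and clang by default,
 * and like 'u' when -funsigned-bitfields is given. */
struct fields {
    signed int   s : 3;   /* range -4 .. 3 */
    unsigned int u : 3;   /* range  0 .. 7 */
};

static int read_signed(void)
{
    struct fields f = { .s = -1, .u = 7 };
    return f.s;           /* sign-extended when read: -1 */
}

static int read_unsigned(void)
{
    struct fields f = { .s = -1, .u = 7 };
    return f.u;           /* zero-extended when read: 7 */
}
```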
2020-09-07 | builtin: teach sparse to linearize __builtin_fma() | Luc Van Oostenryck | 1 | -0/+19
The support for the linearization of builtins was already added for __builtin_unreachable() but this builtin has no arguments and no return value. So, to complete the experience of builtin linearization, add the linearization of __builtin_fma(). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-08-17 | fix evaluate_ptr_add() when sizeof(offset) != sizeof(pointer) | Luc Van Oostenryck | 2 | -0/+173
For a binary op, both sides need to be converted to the resulting type of the usual conversions. For a compound-assignment (which is equivalent to a binary op followed by an assignment), the LHS can't be so converted since its type needs to be preserved for the assignment, so only the RHS is converted at evaluation and the type of the RHS is used at linearization to convert the LHS. However, in the case of pointer arithmetic, a number of shortcuts are taken and, as a result, additions with mixed operand sizes can be produced, giving invalid IR. So, fix this by converting the RHS to the same size as pointers, as is done for 'normal' binops. Note: on 32-bit kernels, this patch also removes a few warnings about non size-preserving casts. This is fine, as these warnings were designed for the case where an address is stored in an integer, not for storing an offset as is the case here. Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
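At the C level, `p += off` with a narrow signed offset means: widen the offset to a pointer-sized integer (sign-extending it), then scale and add. A minimal sketch, assuming `ptrdiff_t` has the same width as a pointer (true on common targets); `advance` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* 'p += off' with a 'short' offset: the offset is first converted to a
 * pointer-sized integer, so the addition never mixes a 16-bit operand
 * with a pointer-sized one. */
static int *advance(int *p, short off)
{
    ptrdiff_t wide = off;   /* explicit sign-extension to pointer size */
    return p + wide;        /* same as: p += off */
}
```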
2020-08-11 | bug-assign-op0.c: fix test on 32-bit builds | Ramsay Jones | 1 | -5/+5
This test was failing on 32-bit because it made the assumption that 'long' is always 64-bit. Fix this by using 'long long' when 64-bit is needed. Fixes 36a75754ba161b4ce905390cf5b0ba9b83b34cd2 Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-08-06 | shift-assign: restrict shift count to unsigned int | Luc Van Oostenryck | 1 | -1/+0
After the RHS of shift-assigns has been integer-promoted, both gcc & clang seem to restrict it to an unsigned int. This only makes a difference when the shift count is negative, which would make the shift UB anyway. It's better to generate the same code, so do the same here. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-08-06 | shift-assign: fix linearization of shift-assign | Luc Van Oostenryck | 2 | -2/+0
The result of a shift-assign has the same type as the left operand but the shift itself must be done on the promoted type. The usual conversions are not done for shifts. The problem is that this promoted type is not stored explicitly in the data structure. This is specific to shift-assigns because for other operations, for example add-assign, the usual conversions must be done and the resulting type can be found on the RHS. Since at linearization the LHS and the RHS must have the same type, the solution is to cast the RHS to the LHS's promoted type during evaluation. This solves a bunch of problems with shift-assigns, like doing a logical shift when an arithmetic shift was needed. Fixes: efdefb100d086aaabf20d475c3d1a65cbceeb534 Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
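The rule can be checked from plain C: `c >>= n` behaves like `c = (T)((int)c >> n)`, with the shift done on the promoted left operand and the result converted back to the left operand's type. A minimal sketch, assuming GCC or clang, where right-shifting a negative `int` is an arithmetic shift (implementation-defined in ISO C); the function names are illustrative.

```c
#include <assert.h>

/* Shift on the promoted LHS (int), result converted back to the LHS type.
 * The unsigned shift count does not make the shift unsigned. */
static signed char sar_char(signed char c, unsigned int n)
{
    c >>= n;                 /* (signed char)((int)c >> n): arithmetic */
    return c;
}

static unsigned char shr_uchar(unsigned char c, unsigned int n)
{
    c >>= n;                 /* (unsigned char)((int)c >> n) */
    return c;
}
```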
2020-08-06 | shift-assign: add more testcases for bogus linearization | Luc Van Oostenryck | 2 | -0/+374
The usual conversions must not be applied to shifts. This causes problems for shift-assigns. So, add testcases for all combinations of size and signedness. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-07-06 | testsuite: add testcase for bogus linearization of >>= & /= | Luc Van Oostenryck | 1 | -0/+115
When doing a shift operation, both arguments are subjected to integer promotion and the type of the result is simply the type of the promoted left operand. Easy.

But for a shift-assignment, things are slightly more complex:
-) 'a >>= n' should be equivalent to 'a = a >> n'
-) but the type of the result must be the type of the left operand *before* integer promotion.

Currently, the linearization code uses the type of the right operand to infer the type of the operation. But simply changing the code to use the type of the left operand would also be wrong (for example, for signed/unsigned divisions). Nasty.

For example, the following C code:

    int s = ...;
    s >>= 11U;

is linearized as a logical shift:

    lsr.32 %r2 <- %arg1, $11

while, of course, an arithmetic shift is expected:

    asr.32 %r2 <- %arg1, $11

So, add a testcase for these.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
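The contrast between `>>=` and `/=` with the same `unsigned` right operand shows why neither operand's type alone can drive the linearization. A minimal sketch, assuming a 32-bit `int` and GCC/clang's arithmetic right shift of negative values; the function names are illustrative.

```c
#include <assert.h>

/* 's >>= 11U': integer promotion only, the shift is done on int,
 * so it is an arithmetic shift (asr.32). */
static int sar_assign(int s)
{
    s >>= 11U;
    return s;
}

/* 'd /= 2U': the usual arithmetic conversions apply, so the division is
 * done in unsigned int and the result converted back to int. */
static int div_assign(int d)
{
    d /= 2U;
    return d;
}
```

With `-7`, the division sees `4294967289u / 2u`, which is why `div_assign(-7)` is not `-3`.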
2020-05-21 | bad-goto: check declaration of label expressions | Luc Van Oostenryck | 1 | -1/+0
Issue an error when taking the address of an undeclared label and mark the function as improper for linearization since the resulting IR would be invalid. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21 | bad-goto: jumping inside a statement expression is an error | Luc Van Oostenryck | 2 | -2/+0
It's invalid to jump inside a statement expression. So, detect such jumps, issue an error message and mark the function as useless for linearization since the resulting IR would be invalid. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21 | bad-goto: catch labels with reserved names | Luc Van Oostenryck | 1 | -1/+0
If a reserved name is used as the destination of a goto, its associated label won't be valid and at linearization time no BB can be created for it, resulting in invalid IR. So, catch such gotos at evaluation time and mark the function to not be linearized. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21 | bad-goto: reorganize testcases and add some more | Luc Van Oostenryck | 7 | -5/+92
Reorganize the testcases related to the 'scope' of labels and add a few new ones. Also, some related testcases have some unreported errors other than the features being tested. This is a problem since such testcases can still fail after the feature being tested is fixed or implemented. So, fix these testcases or split them so that they each test a unique feature. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21 | bad-goto: add testcases for linearization of invalid labels | Luc Van Oostenryck | 1 | -0/+19
A goto to a reserved or an undeclared label will generate IR with a branch to a non-existing BB. Bad. Add a testcase for these. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21 | bad-goto: add testcase for 'jump inside discarded expression statement' | Luc Van Oostenryck | 1 | -0/+28
A goto into a piece of code discarded at expand or linearize time will produce invalid IR. Add a testcase for it. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-03-20 | teach sparse to linearize __builtin_unreachable() | Luc Van Oostenryck | 2 | -2/+0
__builtin_unreachable() is one of the builtins that shouldn't be ignored at IR level since it directly impacts the CFG. So, use the infrastructure put in place in the previous patch to generate the OP_UNREACH instruction instead of generating a call to a non-existing function "__builtin_unreachable()". Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-03-20 | add an implicit __builtin_unreachable() for __noreturn | Luc Van Oostenryck | 1 | -1/+0
The semantics of a __noreturn function is that ... it doesn't return. So, insert an OP_UNREACH instruction after calls to such functions. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
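Both mechanisms can be seen from the source side: an explicit `__builtin_unreachable()` (a GCC/clang builtin) after an exhaustive switch, and a call to a noreturn function, after which control cannot continue. A minimal sketch using C11 `_Noreturn` in place of sparse's `__noreturn` spelling; the function names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Calls to this function are followed by an implicit "unreachable". */
static _Noreturn void fatal(void)
{
    abort();
}

static int sign_class(int x)
{
    switch ((x > 0) - (x < 0)) {
    case -1: return 0;
    case  0: return 1;
    case  1: return 2;
    }
    __builtin_unreachable();   /* the switch above is exhaustive */
}

static int checked_div(int a, int b)
{
    if (b == 0)
        fatal();               /* nothing after this call is reachable */
    return a / b;
}
```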
2020-03-20 | add testcases for OP_UNREACH | Luc Van Oostenryck | 3 | -7/+58
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-10-09 | "graph" segfaults on top-level asm | Luc Van Oostenryck | 1 | -0/+1
The "graph" binary segfaults on this input:

    asm("");

with gdb saying (edited for clarity):

    Program received signal SIGSEGV, Segmentation fault.
    in graph_ep (ep=0x7ffff7f62010) at graph.c:52
    (gdb) p ep->entry
    $1 = (struct instruction *) 0x0

Sadly, the commit that introduced this crash:
    15fa4d60e ("topasm: top-level asm is special")
was (part of a bigger series) meant to fix crashes caused by such toplevel asm statements.

Toplevel ASM statements are quite abnormal:
* they are toplevel but anonymous symbols
* they should be limited to basic ASM syntax but are not
* they are given the type SYM_FN but are not functions
* there is nothing to evaluate or expand about them.
These cause quite a few problems, including crashes, even before the above commit.

So, before handling them more correctly, and instead of adding a bunch of special cases here and there, temporarily take the more radical approach of no longer adding them to the list of toplevel symbols.

Fixes: 15fa4d60ebba3025495bb34f0718764336d3dfe0
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Analyzed-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-09-27 | asm: linearization of output memory operands is different | Luc Van Oostenryck | 1 | -1/+0
ASM memory operands are considered by GCC as some kind of implicit reference. Their linearization should thus not create any storage statement: the storage is done by the ASM code itself. Adjust the linearization of such operands accordingly. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-09-27 | asm: add test evaluation, expansion & linearization of ASM operands | Luc Van Oostenryck | 1 | -0/+24
ASM statements are quite complex. Add some tests to catch some potential errors. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-09-26 | expand: add missing expansion of compound literals | Luc Van Oostenryck | 1 | -1/+0
Compound literals, like all other expressions, need to be expanded before linearization, but this is currently not done. As a consequence, some builtins are unexpectedly still present, same for EXPR_TYPEs, ... with error messages like: warning: unknown expression at linearization. Fix this by adding the missing expansion of compound literals. Note: as explained in the code itself, it's not totally clear how compound literals can be identified after evaluation. The code here considers all anonymous symbols with an initializer as being compound literals. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-02-04 | target.c: ignore -m64 on archs where int32_t is a long | Luc Van Oostenryck | 9 | -0/+9
If the flag '-m64' is used on a 32-bit architecture/machine having int32_t set to 'long', then these int32_t are forced to 64-bit ... So, ignore the effect of -m64 on these archs and ignore '64-bit only' tests on them. Reported-by: Uwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Tested-by: Uwe Kleine-König <uwe@kleine-koenig.org>
2018-09-08 | fix linearization of non-constant switch-cases | Luc Van Oostenryck | 1 | -1/+0
The linearization of switches & cases makes the assumption that the expressions for the cases are constants (EXPR_VALUE). So, the corresponding values are dereferenced without checks. However, if the code uses a non-constant case, this dereference produces a random value, probably one corresponding to some pointers belonging to the real type of the expression. Fix this by checking the constness of the expression during linearization and ignoring the non-constant ones. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-08 | add testcase for non-constant switch-case | Luc Van Oostenryck | 1 | -0/+38
Switches with non-constant cases are currently linearized using as value the bit pattern present in the expression, creating more or less random multijmps. Add a basic testcase to catch this. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06 | Merge branches 'missing-return' and 'fix-logical-phi' into tip | Luc Van Oostenryck | 13 | -90/+288
* fix linearization/SSA when missing a return
* fix linearization/SSA of (nested) logical expressions
2018-09-06 | fix linearization of nested logical expr | Luc Van Oostenryck | 4 | -93/+90
The linearization of nested logical expressions is not correct regarding the phi-nodes and their phi-sources. For example, code like:

    extern int a(void);
    int b(void);
    int c(void);

    static int foo(void)
    {
        return (a() && b()) && c();
    }

gives (optimized) IR like:

    foo:
        phisrc.32 %phi1 <- $0
        call.32 %r1 <- a
        cbr %r1, .L4, .L3
    .L4:
        call.32 %r3 <- b
        cbr %r3, .L2, .L3
    .L2:
        call.32 %r5 <- c
        setne.32 %r7 <- %r5, $0
        phisrc.32 %phi2 <- %r7
        br .L3
    .L3:
        phi.32 %r8 <- %phi2, %phi1
        ret.32 %r8

The problem can already be seen by the fact that the phi-node in L3 has 2 operands while L3 has 3 parents. There is no phi-value for L4.

The code is OK for non-nested logical expressions: linearize_cond_branch() takes the success/failure BBs as arguments, generates the code for those branches and there is a phi-node for each of them. However, with nested logical expressions, one of the BBs will be shared between the inner and the outer expression. The phisrc will 'cover' one of these BBs, but only one of them.

The solution is to add the phi-sources after, not before, and to add one for each of the parent BBs. This way, it can be guaranteed that each parent BB has its phisrc, whatever the complexity of the sub-expressions. With this change, the generated IR becomes:

    foo:
        call.32 %r2 <- a
        phisrc.32 %phi1 <- $0
        cbr %r2, .L4, .L3
    .L4:
        call.32 %r4 <- b
        phisrc.32 %phi2 <- $0
        cbr %r4, .L2, .L3
    .L2:
        call.32 %r6 <- c
        setne.32 %r8 <- %r6, $0
        phisrc.32 %phi3 <- %r8
        br .L3
    .L3:
        phi.32 %r1 <- %phi1, %phi2, %phi3
        ret.32 %r1

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
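The three-predecessor join can be observed by tracing which operands of `(a() && b()) && c()` actually run: the result is produced either after `a()` fails, after `b()` fails, or after `c()`, so the final phi-node needs one source per path. A minimal sketch; the tracing globals and the parameters of `foo` are illustrative additions, not part of the testcase.

```c
#include <assert.h>
#include <string.h>

/* Record which operands are evaluated, in order. */
static char trace[8];
static int ra, rb, rc;

static int a(void) { strcat(trace, "a"); return ra; }
static int b(void) { strcat(trace, "b"); return rb; }
static int c(void) { strcat(trace, "c"); return rc; }

static int foo(int va, int vb, int vc)
{
    trace[0] = '\0';
    ra = va; rb = vb; rc = vc;
    return (a() && b()) && c();   /* three paths reach the join block */
}
```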
2018-09-06 | add tests for nested logical expr | Luc Van Oostenryck | 1 | -0/+49
Nested logical expressions are not correctly linearized. Add a test for all possible combinations of 2 logical operators. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06 | fix ordering of phi-node operand | Luc Van Oostenryck | 2 | -5/+4
The linearization of logical '&&' creates a phi-node with its operands in the wrong order relative to the parent BBs. Switch the order of the operands for logical '&&'. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06 | add testcases for wrong ordering in phi-nodes | Luc Van Oostenryck | 4 | -0/+55
In valid SSA there is a 1-to-1 correspondence between each operand of a phi-node and the parent BBs. However, currently, this is not always respected. Add testcases for the known problems. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
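The invariant these testcases check can be stated as a small validity check. The sketch below uses a hypothetical, simplified model in which each phi operand records the BB it comes from; these structs are not sparse's data structures, only an illustration of "one operand per parent, in the parents' order".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not sparse's real types. */
struct bb { int id; };

struct phi_src { const struct bb *from; int value; };

struct phi {
    const struct phi_src *srcs;
    size_t nr_srcs;
};

/* Valid SSA: exactly one operand per parent BB, in the parents' order. */
static int phi_matches_parents(const struct phi *phi,
                               const struct bb *const *parents,
                               size_t nr_parents)
{
    if (phi->nr_srcs != nr_parents)
        return 0;
    for (size_t i = 0; i < nr_parents; i++)
        if (phi->srcs[i].from != parents[i])
            return 0;
    return 1;
}
```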
2018-09-06 | return nothing only in void functions | Luc Van Oostenryck | 1 | -1/+0
Currently, the code for the return is only generated if the return expression effectively has a type or a value with a size greater than 0. But this means that a non-void function with an error in its return expression is treated as a void function as far as the generated IR is concerned, making things incoherent. Fix this by using the declared type instead of the type of the return expression. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06 | use UNDEF for missing returns | Luc Van Oostenryck | 5 | -5/+0
If a return statement is missing in the last block, the generated IR will be invalid because the number of operands in the exit phi-node will not match the number of parent BBs. Detect this situation and insert an UNDEF for the missing value. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06 | topasm: top-level asm is special | Luc Van Oostenryck | 1 | -0/+7
Top-level ASM statements are parsed as fake anonymous functions. Obviously, they have little in common with functions (for example, they don't have a return type) and mixing the two makes things more complicated than needed (for example, to detect a top-level ASM, we had to check that the corresponding symbol (name) had a null ident). Avoid potential problems by special-casing them and returning early in linearize_fn(). As a consequence, they no longer have an OP_ENTRY as their first instruction and can be detected by testing ep->entry. Note: it would be more logical to catch them even earlier, in linearize_symbol(), but they also need an entrypoint and an active BB so that we can generate the single statement. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-05 | add testcases for missing return in last block | Luc Van Oostenryck | 6 | -0/+97
In this case the phi-node created for the return value ends up with a missing operand, violating the semantics of the phi-node: map one value to each predecessor. Add testcases for these missing returns. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-01 | fix linearization of unreachable switch (with reachable label). | Luc Van Oostenryck | 1 | -1/+0
An unreachable/inactive switch statement is currently not linearized. That's nice because it avoids creating useless instructions. However, the body of the statement can contain a label which can be reachable. If so, the resulting IR will contain a branch to a non-existing BB. Bad.

For example, code like:

    int foo(int a)
    {
        goto label;
        switch(a) {
        default:
        label:
            break;
        }
        return 0;
    }

(which is just a complicated way to write: int foo(int a) { return 0; }) is linearized as:

    foo:
        br .L1

Fix this by linearizing the statement even if not active.

Note: it seems that none of the other statements are discarded if inactive. Good. OTOH, statement expressions can also contain (reachable) labels and thus would need the same fix (which will need much more work).

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-01 | add testcase for unreachable label in switch | Luc Van Oostenryck | 1 | -0/+20
or, more exactly, an unreachable switch statement containing a reachable label. This is valid code but is currently wrongly linearized. So, add a testcase for it. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-25 | Merge branch 'ssa' into tip | Luc Van Oostenryck | 5 | -60/+90
* do 'classical' SSA conversion (via the iterated dominance frontier). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-07 | fix instruction size & type in linearize_inc_dec() | Luc Van Oostenryck | 2 | -68/+75
If the ++ or -- operator is used on a bitfield, the addition or subtraction is done with the size of the bitfield. So code like:

    struct {
        int f:3;
    } s;
    ...
    s.f++;

will generate intermediate code like:

    add.3 %r <- %a, $1

This is not incorrect from the IR point of view, but CPUs have only register-sized instructions, like 'add.32'. So, these odd-sized instructions carry one or two implicit masking/extension operations that are better made explicit.

Fix this by casting to and from the base type when these operators are used on bitfields.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
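The implicit operations hidden in an `add.3` can be written out in C: widen the field to the base type, add at register width, then truncate back to the field's width on the store. A minimal sketch using an `unsigned` field so that the wrap-around is well defined; the names are illustrative.

```c
#include <assert.h>

struct s { unsigned int f : 3; };

/* 'p->f++' on a 3-bit field. */
static unsigned int inc_field(struct s *p)
{
    p->f++;
    return p->f;
}

/* The same operation with the widening and the truncation made explicit. */
static unsigned int inc_explicit(unsigned int f)
{
    unsigned int wide = f;   /* zero-extend 3 -> 32 bits */
    wide = wide + 1;         /* register-sized add (add.32) */
    return wide & 0x7u;      /* truncate back to 3 bits */
}
```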
2018-08-06 | limit the mask used for bitfield insertion | Luc Van Oostenryck | 1 | -6/+6
The mask used for bitfield insertion is as big as the integers used internally by sparse. Elsewhere in the code, constants are always truncated to the size of the instructions using them. It also makes the concerned instructions display oddly. For example:

    and.32 %r2 <- %r1, 0xfffffffffffffff0

Fix this by limiting the mask to the size of the instruction.

Fixes: a8e1df573 ("bitfield: extract linearize_bitfield_insert()")
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
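Truncating a constant to an instruction's width is a single masking step, with one C pitfall: shifting a 64-bit value by 64 is undefined, so the full-width case needs its own branch. A minimal sketch; `truncate_to_size` is a hypothetical helper, not sparse's function.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 'bits' bits of a constant, as an instruction of that
 * width would see it. */
static uint64_t truncate_to_size(uint64_t value, unsigned int bits)
{
    if (bits >= 64)
        return value;                       /* avoid the UB of '1 << 64' */
    return value & ((UINT64_C(1) << bits) - 1);
}
```

With this, the mask from the example above prints as `0xfffffff0` for an `and.32`.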
2018-08-06 | simplify linearize_logical() | Luc Van Oostenryck | 1 | -92/+68
The linearized code for logical expressions looks like:

    .Lc
        ... condition 1 ...
        cbr %c, .L1, .L2
    .L1
        %phisrc %phi1 <- $1
        br .Lm
    .L2
        ... condition 2 ...
        %phisrc %phi2 <- %r
        br .Lm
    .Lm
        %phi %r <- %phi1, %phi2

But .L1 can easily be merged with .Lc:

    .Lc
        ... condition 1 ...
        %phisrc %phi1 <- $1
        cbr %c, .Lm, .L2
    .L2
        ... condition 2 ...
        %phisrc %phi2 <- %r
        br .Lm
    .Lm
        %phi %r <- %phi1, %phi2

Do this simplification, which:
* creates fewer basic blocks & branches
* does at linearization time a simplification not done later.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
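The simplified shape corresponds to `x || y` written with explicit control flow: the short-circuit value is set in the first block itself, and the local `r` stands in for the phi-node at `.Lm`. A minimal sketch mirroring the block labels above; `logical_or` is an illustrative name.

```c
#include <assert.h>

static int logical_or(int x, int y)
{
    int r;

    /* .Lc */
    r = 1;              /* phisrc %phi1 <- $1, hoisted into .Lc */
    if (x != 0)
        goto Lm;        /* cbr %c, .Lm, .L2 */

    /* .L2 */
    r = (y != 0);       /* phisrc %phi2 <- %r */
    goto Lm;            /* br .Lm */

Lm:
    return r;           /* phi %r <- %phi1, %phi2 */
}
```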
2018-08-06 | expand linearize_conditional() into linearize_logical() | Luc Van Oostenryck | 1 | -127/+111
linearize_logical() calls linearize_conditional() but needs additional tests there and generates code more complicated than needed. Change this by expanding the call to linearize_conditional() and make the obvious simplification concerning the shortcut expressions 0 & 1. Also, remove the logical-specific parts in linearize_conditional(), since they are now unneeded. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-06 | fix linearize_conditional() for logical ops | Luc Van Oostenryck | 1 | -1/+0
The function linearize_conditional(), normally used for conditionals (c ? a : b), is also used to linearize the logical ops || and &&. For conditionals, the type evaluation ensures that both LHS & RHS have consistent types. However, this is not the case when used for logical ops. This creates 2 separate but related problems:
* the operands are not compared with 0 as required by the standard (6.5.13, 6.5.14).
* both operands can have different, incompatible types and thus it's possible to have a phi-node with sources of different, incompatible types, which doesn't make sense.
Fix this by:
* adding a flag to linearize_conditional() telling if it's used for a conditional or for a logical op.
* when used for logical ops:
  * first comparing the operands against zero
  * converting the boolean result to the expression's type.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
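The C rule behind this fix is that each operand of `&&` or `||` is compared against zero on its own, whatever its type, and the result is an `int` equal to 0 or 1. A minimal sketch with a `double` and a pointer operand, which could never share a phi-node directly; `both_set` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* What 'd && p' means: two independent comparisons against zero, merged
 * into an int result.  The raw operands (double, pointer) never meet. */
static int both_set(double d, const int *p)
{
    return (d != 0) && (p != NULL);
}
```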
2018-08-06 | conditional branches can't accept arbitrary expressions | Luc Van Oostenryck | 1 | -5/+5
Conditional branches, or more exactly OP_CBR, can't accept arbitrary expressions as condition: the condition is required to be an integer value. Fix this by adding a comparison against zero.
2018-08-04 | add testcase for linearize_logical() | Luc Van Oostenryck | 1 | -0/+300
Add some tests in preparation for some bug-fixing and simplification in linearize_logical()/linearize_conditional(). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-25 | Merge branch 'optim-cast' into tip | Luc Van Oostenryck | 3 | -0/+57
* several simplifications involving casts and/or bitfields Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-24 | use "%Le" to display floats | Luc Van Oostenryck | 2 | -13/+13
Floating-point values are displayed using the printf format "%Lf", but this is the format without exponent (and with a default precision of 6 digits). However, by its nature, this format is very imprecise. For example, *all* values smaller than 0.5e-6 are displayed as "0.000000". Improve this by using the "%Le" format, which always uses an exponent and thus maximizes the precision. Note: ultimately, we should display them exactly, for example by using "%La", but this will require C99. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
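The precision loss is easy to reproduce with `snprintf`: a small value collapses to `0.000000` in fixed notation but keeps its digits with an exponent. A minimal sketch, assuming a C library whose `printf` family handles `long double` correctly (glibc does; some MinGW configurations do not); the helper names are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed notation: 6 digits after the decimal point, no exponent. */
static void show_fixed(char *buf, size_t len, long double v)
{
    snprintf(buf, len, "%Lf", v);
}

/* Exponent notation: 6 digits after the point of the mantissa. */
static void show_exp(char *buf, size_t len, long double v)
{
    snprintf(buf, len, "%Le", v);
}
```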
2018-07-23 | add testcases for casts & bitfield insertion/extraction | Luc Van Oostenryck | 3 | -0/+57
There are several difficulties, some related to the unclear semantics of our IR instructions and/or type evaluation. Add testcases trying to cover this area. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-01 | testsuite: improve mem2reg testcases | Luc Van Oostenryck | 1 | -25/+0
A few tests are added and some have been renamed to better reflect their purposes. Finally, some checks have been added or tweaked. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-01 | testsuite: reorganize tests for compound literals | Luc Van Oostenryck | 3 | -0/+55
Split the existing test in 2 as it contains 2 different cases. Also move the test to 'linear/' subdir. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-29 | cast: reorganize testcases for cast optimization | Luc Van Oostenryck | 1 | -405/+0
validation/linear/* should not contain testcases that are optimization dependent and validation/*.c should not contain tests using 'test-linearize', only those using 'sparse'. Move some cast-related testcases accordingly. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-26 | cast: simplify TRUNC + ZEXT to AND | Luc Van Oostenryck | 1 | -106/+0
A truncation followed by a zero-extension to the original size, which is produced when loading and storing bitfields, is equivalent to a simple AND masking. Often, this AND can then trigger even more optimizations. So, replace TRUNC + ZEXT instructions by the equivalent AND. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
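The equivalence can be checked directly in C: truncating a 32-bit value to 8 bits and zero-extending it back gives the same result as masking with `0xff`. A minimal sketch; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* trunc.8 followed by zext.32 back to the original size ... */
static uint32_t trunc_zext(uint32_t x)
{
    uint8_t narrow = (uint8_t)x;   /* truncation to 8 bits */
    return (uint32_t)narrow;       /* zero-extension to 32 bits */
}

/* ... is the same as a single AND with the low-bits mask. */
static uint32_t and_mask(uint32_t x)
{
    return x & 0xffu;
}
```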
2018-06-23 | cast: keep instruction sizes consistent | Luc Van Oostenryck | 2 | -11/+189
The last instruction of linearize_load_gen() ensures that loading a bitfield of size N results in an object of size N. Also, we require that the usual binops & unops use the same type on their operands and result. This means that before anything can be done on the loaded bitfield, it must first be sign- or zero-extended in order to match the other operand's size.

The same situation exists when storing a bitfield but there the extension isn't done. We can thus have some weird code like:

    trunc.9 %r2 <- (32) %r1
    shl.32 %r3 <- %r2, ...

where a bitfield of size 9 is mixed with a 32-bit shift.

Avoid such mixing of sizes and always zero-extend the bitfield before storing it (since this was the implicitly desired semantic). The combination TRUNC + ZEXT can then be optimised later into a simple masking operation.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23 | cast: specialize integer casts | Luc Van Oostenryck | 5 | -99/+97
Casts to integer used to be done with only 2 instructions: OP_CAST & OP_SCAST. Those are not very convenient as they don't reflect the real operations that need to be done.

This patch specializes these instructions into:
- OP_TRUNC, for casts to a smaller type
- OP_ZEXT, for casts that need a zero extension
- OP_SEXT, for casts that need a sign extension
- Integer-to-integer casts of the same size are considered NOPs and are, in fact, never emitted.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
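The choice between the specialized instructions follows from the sizes involved and, for widening casts, from the signedness of the *source* type. A minimal sketch of that decision; the `enum cast_op` values and `int_cast_op` are hypothetical names, not sparse's opcodes or API. The last two test lines show the same rule in plain C, assuming a 32-bit `int`.

```c
#include <assert.h>

/* Hypothetical classification of integer-to-integer casts. */
enum cast_op { CAST_NOP, CAST_TRUNC, CAST_ZEXT, CAST_SEXT };

static enum cast_op int_cast_op(unsigned int src_bits, int src_is_signed,
                                unsigned int dst_bits)
{
    if (dst_bits == src_bits)
        return CAST_NOP;                    /* same size: nothing emitted */
    if (dst_bits < src_bits)
        return CAST_TRUNC;                  /* cast to a smaller type */
    return src_is_signed ? CAST_SEXT : CAST_ZEXT;
}
```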
2018-06-23 | cast: make casts from pointer always size preserving | Luc Van Oostenryck | 1 | -84/+86
Currently, casts from pointers can be done to any integer type. However, casts to (or from) pointers are only meaningful if they preserve the value, and thus only between same-sized objects. To avoid having to worry about sign/zero extension while doing pointer casts, it's good to not have to deal with such casts. Do this by first doing a cast to an unsigned integer of the same size as a pointer and then, if needed, casting to the final type. As such, we only have to support pointer casts to unsigned integers of the same size, and on the other hand we have the generic integer-to-integer casts that we need to support anyway. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
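The same two-step decomposition can be written in C with `uintptr_t`: a value-preserving cast to a pointer-sized unsigned integer, followed by an ordinary integer truncation. A minimal sketch, assuming the optional `uintptr_t` type is available (it is on common targets); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer -> narrower integer, in two steps. */
static unsigned short ptr_low_bits(const void *p)
{
    uintptr_t addr = (uintptr_t)p;   /* pointer -> same-size unsigned */
    return (unsigned short)addr;     /* integer -> integer truncation */
}

/* The value-preserving step round-trips for void pointers. */
static int roundtrip_ok(void *p)
{
    return (void *)(uintptr_t)p == p;
}
```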
2018-06-23 | cast: add support for -Wpointer-to-int-cast | Luc Van Oostenryck | 1 | -1/+1
It's relatively common to cast a pointer to an unsigned long, for example to do some bit operations. It makes much less sense to cast a pointer to an integer smaller (or bigger) than a pointer. So, emit a diagnostic for this, under the control of a new warning flag: -Wpointer-to-int-cast. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23 | cast: specialize cast from pointers | Luc Van Oostenryck | 4 | -4/+42
Currently, all casts to pointers are processed alike. This is simple but rather inconvenient in later phases, as these correspond to different operations that obey different rules and which later need extra checks. Change this by using a specific instruction (OP_UTPTR) for [unsigned] integer to pointer casts. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23 | cast: make pointer casts always size preserving | Luc Van Oostenryck | 1 | -30/+32
Currently, casts to pointers can be done from any integer type. However, casts to (or from) pointers are only meaningful if value-preserving, and thus between objects of the same size. To avoid having to worry about sign/zero extension while doing casts to pointers, it's good to only have to deal with the value-preserving ones. Do this by first doing, if needed, a cast to an integer of the same size as a pointer before doing the cast to a pointer. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23 | cast: specialize casts from unsigned to pointers | Luc Van Oostenryck | 1 | -5/+5
Currently, all casts to pointers are processed alike. This is simple but rather inconvenient, as these correspond to different operations that obey different rules and which later need extra checks. Change this by using a specific instruction (OP_UTPTR) for unsigned integer to pointer casts. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23 | cast: specialize floats to integer conversion | Luc Van Oostenryck | 3 | -9/+11
Currently, casts from floats to integers are processed like integers (or any other type) to integers. This is simple but rather inconvenient, as these correspond to different operations that obey different rules and which later need extra checks. Change this by directly using specific instructions:
- FCVTU for floats to unsigned integers
- FCVTS for floats to signed integers

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
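Both conversions truncate toward zero, but they accept different ranges: `3e9` fits an `unsigned int` and not an `int`, while `-2.5` fits an `int` and converting it to `unsigned` is undefined behaviour in C. A minimal sketch, assuming a 32-bit `int`; the function names are illustrative and the comments name the instruction each cast maps to.

```c
#include <assert.h>

/* Float to signed integer (FCVTS): truncation toward zero. */
static int to_signed(double d)
{
    return (int)d;          /* 3e9 would be out of range here (UB) */
}

/* Float to unsigned integer (FCVTU): truncation toward zero. */
static unsigned int to_unsigned(double d)
{
    return (unsigned int)d; /* negative values like -2.5 would be UB here */
}
```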
2018-06-23 | cast: handle NO-OP casts | Luc Van Oostenryck | 1 | -0/+15
Some casts, the ones which don't change the size or the resulting 'machine type', are no-ops. Directly simplify away such casts. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23 | cast: specialize FPCAST into [USF]CVTF | Luc Van Oostenryck | 1 | -10/+10
Currently, all casts to a floating-point type use OP_FPCAST. This is maybe simple but rather inconvenient, as it corresponds to several quite different operations that later need extra checks. Change this by directly using different instructions for the different cases:
- FCVTF for float-float conversions
- UCVTF for unsigned integer to floats
- SCVTF for signed integer to floats
and reject attempts to cast a pointer to a float.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
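The need for separate UCVTF and SCVTF instructions shows up with a single 32-bit pattern: `0xffffffff` is 4294967295 when read as unsigned and -1 when read as signed, so the resulting floating-point values differ. A minimal sketch; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned integer to float (UCVTF). */
static double from_unsigned(uint32_t v)
{
    return (double)v;
}

/* Signed integer to float (SCVTF). */
static double from_signed(int32_t v)
{
    return (double)v;
}
```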
2018-06-23 | cast: reorg testcases related to casts | Luc Van Oostenryck | 6 | -0/+858
* merge the tests about implicit & explicit casts into a single file, as there was a lot of redundancy.
* shuffle the tests to linear/ or optim/

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-16 | testsuite: fix missing return | Luc Van Oostenryck | 1 | -8/+8
Some non-void functions in the testcases miss a return. Add the missing return or make the function return void. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-05-06 | use function-like syntax for __range__ | Luc Van Oostenryck | 1 | -0/+31
One of sparse's extensions to the C language is an operator to check ranges. This operator takes 3 operands: the expression to be checked and the bounds. The syntax for this operator is such that the operands need to be a 3-item comma-separated expression. This is a bit weird and doesn't play along very well with macros, for example. Change the syntax to a 3-argument function-like operator. NB. Of course, this will break all existing uses of this extension not using parentheses around the comma expression, but there don't seem to be any. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-03-18fix-return: remove special case for single returnLuc Van Oostenryck2-27/+3
During the linearization of a function, returns are directly linearized as phi-sources and the exit BB contains the corresponding phi-node and the unique OP_RET. There is also a kind of optimization that is done if there is only a single return statement and thus a single phi-source: the phi-source and the phi-node are simply ignored and the unique value is directly used by the OP_RET instruction. While this optimization makes sense, it also has some cons: - the phi-node and the phi-source are created anyway and will need to be removed during cleanup. - the corresponding optimization needs to be done anyway during simplification - it's only a tiny special case which saves very little. So, keep things simple and generic and leave this sort of simplification for the cleanup/simplification phase. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-03-11testsuite: fix problem with double-escaping in patternsLuc Van Oostenryck8-11/+11
Since the patterns in the testcases are evaluated in the shell script, the backslash used to escape characters special to the pattern needs itself to be escaped. There are a few cases where this wasn't done, partly because 'format -l' gave a single escape in its template. Fix all occurrences needing this double-escape as well as the 'format -l' template. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-01-02Merge branches 'fix-expand-bitfield-deref', 'fix-fpops-cse', 'null-expr', ↵Luc Van Oostenryck10-0/+262
'size-unsized-arrays' and 'master' into tip
2017-12-28add more testcases for function designator dereferenceLuc Van Oostenryck1-0/+13
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21fix expansion of constant bitfield dereferenceLuc Van Oostenryck1-1/+0
During the expansion of a dereference, it's checked if the initializer corresponding to the offset we're interested in is a constant. If it's the case, the dereference can be avoided and the constant given as initializer can be used instead. However, it's not enough to check for the offset since, for bitfields, there are (usually) several distinct fields at the same offset. Currently, the first initializer matching the offset is selected and, if a constant, its value is used for the result of the dereferencing of the whole structure. Fix this by refusing such expansion if the constant value corresponds to a bitfield. Reported-by: Dibyendu Majumdar <mobile@majumdar.org.uk> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
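A minimal C sketch of the situation described above (hypothetical names, not the actual testcase): two bitfields that, on common ABIs, share the same storage unit and therefore the same byte offset. An expansion keyed on the offset alone would pick the initializer of 'lo' for a read of 'hi'; the correct results are 3 and 12.

```c
/* Both 4-bit fields live in the same storage unit on common ABIs,
 * so they have the same byte offset inside the structure. */
struct flags {
    unsigned int lo : 4;
    unsigned int hi : 4;
};

static const struct flags cf = { .lo = 3, .hi = 12 };

/* Each read must use the initializer of its own field. */
static unsigned int get_lo(void) { return cf.lo; }
static unsigned int get_hi(void) { return cf.hi; }
```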
2017-12-21add testcase for constant bitfield dereferenceLuc Van Oostenryck1-0/+28
Reported-by: Dibyendu Majumdar <mobile@majumdar.org.uk> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21Merge branches 'deref-fun-ptr' and 'deref-base-type' into tipLuc Van Oostenryck2-0/+62
2017-12-21dereference of a function is a no-opLuc Van Oostenryck4-4/+0
For the '*' operator and functions, the C standard says: "If the operand points to a function, the result is a function designator; ... If the operand has type ‘pointer to type’, the result has type ‘type’"; but also (C11 6.3.2.1p4): "(except with 'sizeof' ...) a function designator with type ‘function returning type’ is converted to an expression that has type ‘pointer to function returning type’". This means that dereferencing a function designator is a no-op, since the resulting expression is immediately converted back to a pointer to the function. This change effectively drops any dereference of function types during their evaluation. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
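The rule quoted above can be checked in plain C: however many '*' are applied to a function designator, and whether '&' is used instead, the result is the same function pointer. The names below are hypothetical.

```c
static int twice(int x) { return 2 * x; }

typedef int (*fn_t)(int);

/* Each '*' yields a function designator, which is immediately converted
 * back to a pointer to the function, so all four return the same value. */
static fn_t via_plain(void) { return twice; }
static fn_t via_deref(void) { return *twice; }
static fn_t via_many(void)  { return ***twice; }
static fn_t via_addr(void)  { return &twice; }
```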
2017-12-21add testcases for multiple deref of callsLuc Van Oostenryck4-4/+19
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21fix linearize (*fun)()Luc Van Oostenryck3-3/+0
A function call via a function pointer can be written like: fp(), (fp)() or (*fp)() In the latter case the dereference is unneeded but legal and idiomatic. However, the linearization doesn't handle this unneeded deref and leads to the generation of a load of the pointer: int foo(int a, int (*fun)(int)) { return (*fun)(a); } gives something like: foo: load %r2 <- 0[%arg2] call.32 %r3 <- %r2, %arg1 ret.32 %r3 This happens because, at linearization, the deref is dropped only if the sub-expression is a symbol, and the test for nodes is not done. Fix this by using is_func_type() to test the type of all call expressions. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
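For reference, the three call spellings listed above written out as a small C sketch (hypothetical names). All three have the same meaning, so none of them should need a load through the pointer before the call.

```c
static int inc(int a) { return a + 1; }

/* fp(), (fp)() and (*fp)(): the '*' in the last form is legal but redundant. */
static int call_plain(int a, int (*fp)(int)) { return fp(a); }
static int call_paren(int a, int (*fp)(int)) { return (fp)(a); }
static int call_deref(int a, int (*fp)(int)) { return (*fp)(a); }
```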
2017-12-21add testcases for the linearization of callsLuc Van Oostenryck7-0/+179
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21fix: evaluate_dereference() unexamined base typeLuc Van Oostenryck2-2/+0
Examination of a pointer type doesn't examine the corresponding base type (this base type may not yet be complete). So, this examination must be done later, when the base type is needed. However, in some cases it's possible to call evaluate_dereference() while the base type is still unexamined. Fix this by adding the missing examine_symbol_type() on the base type. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
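A short plain-C sketch (hypothetical names) of why the base type cannot be examined together with the pointer type: a pointer to an incomplete structure is a valid, complete type, while the structure itself is only completed later. Its size and layout are needed only at the point of dereference.

```c
struct node;                 /* incomplete: its size is not known yet */
static struct node *head;    /* the pointer type itself is complete */

struct node {                /* the base type is completed only here */
    int val;
    struct node *next;
};

/* The dereference needs the layout of 'struct node', which is why the
 * base type must be examined when it is used, not with the pointer. */
static int first_val(void) { return head ? head->val : -1; }

static void set_head(struct node *n) { head = n; }
```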
2017-12-21add testcases for unexamined base typeLuc Van Oostenryck2-0/+64
evaluate_dereference() lacks an explicit examination of the base type. Most of the time, the base type has already been examined via another path, but in some cases it's not. The symptom here is the dereferenced value having a null size. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-08fix: add missing degenerate() for logical notLuc Van Oostenryck1-1/+0
Expressions involving the logical-not '!' do not call degenerate(). Since the result type is always 'int' and thus independent of the expression being negated, this has no effect on the type-checking, but the linearization is wrong. For example, code like: int foo(void) { if (!arr) return 1; return 0; } generates: foo: load %r6 <- 0[arr] seteq.32 %r7 <- VOID, $0 ret.32 %r7 The 'load' is obviously wrong. Fix this by adding the missing degenerate(). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
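The example above, completed into compilable C under one assumption: the message does not show the declaration of 'arr', so it is taken here to be a static array. Once 'arr' degenerates to a pointer to its first element, that pointer can never be null, so '!arr' is always 0 and no load of the array contents is involved. GCC may warn about this with -Waddress.

```c
/* Assumed declaration: the original message does not show it. */
static int arr[4];

/* 'arr' degenerates to &arr[0], a non-null pointer, so '!arr' is 0. */
static int foo(void)
{
    if (!arr)
        return 1;
    return 0;
}
```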
2017-12-07add testcases linearization of degenerated arrays/functionsLuc Van Oostenryck3-0/+110
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13Merge branches 'testcases-bugs', 'testcases-bugs-optim' and ↵Luc Van Oostenryck2-0/+55
'testcases-mem2reg' into tip
2017-11-13add test case for superfluous cast with volatilesLuc Van Oostenryck1-0/+14
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13add testcase for return & inlineLuc Van Oostenryck1-0/+24
The linearization of 'return' statements must correctly take into account some implementation details of the inlining. As such, it deserves its own testcase. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13add testcase for __builtin_unreachable()Luc Van Oostenryck1-0/+31
__builtin_unreachable()'s semantics have consequences on the CFG and this should be taken into account for: * checking for undefined variables * checking when control reaches end of non-void function * context checking * ... Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
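A small sketch of the second point above, using the GCC/clang builtin in a hypothetical function. Because the default branch is marked unreachable, control cannot reach the end of this non-void function, so a checker that honours the builtin's effect on the CFG should not report a missing return here. The caller is assumed to only pass 0, 1 or 2.

```c
/* Hypothetical example: callers guarantee 0 <= k <= 2. */
static int pick(int k)
{
    switch (k) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    default:
        __builtin_unreachable();   /* no path continues past this point */
    }
}
```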
2017-11-13add test case for memory to register problemLuc Van Oostenryck1-0/+25
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13dump-ir: make it more flexibleLuc Van Oostenryck1-1/+1
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13dump-ir: rename -fdump-linearize to -fdump-irLuc Van Oostenryck1-1/+1
as it will be used for dumping the IR not only after linearization but also after other passes. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-06-15fix: add missing examine in evaluate_dereference()Luc Van Oostenryck1-0/+19
Sparse uses lazy type evaluation. This evaluation is done via the examine_*() functions, which we must ensure have been called when type information is needed. However, it seems that this is not done for expressions with multiple levels of dereferencing. There are (at least) two symptoms: 1) When the inner expression is complex and contains a typeof: a bogus error message is issued, either "error: internal error: bad type in derived(11)" or "error: cannot dereference this type", sometimes followed by another bogus "warning: unknown expression (...)". 2) This one is only visible with test-linearize but happens even on a plain double deref: the result of the inner deref is typeless. Obviously the first symptom is a consequence of the second one. Fix this by adding a call to examine_symbol_type() at the beginning of evaluate_dereference(). Note: This fixes all the 17 "cannot dereference" and 19 "internal error" messages present on the Linux kernel when using sparse on an x86-64 allyesconfig (most coming from the call of rcu_dereference_sched() in cpufreq_update_util()). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
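A simplified C sketch of the two shapes mentioned above: a plain double dereference, and one whose inner expression goes through typeof (written with the GNU spelling __typeof__). This is an illustration with hypothetical names, not the kernel code. In both cases the type of the inner '*pp' (int *) must be known for the outer '*' to be valid.

```c
/* Plain double dereference: the inner '*pp' has type 'int *'. */
static int deref2(int **pp)
{
    return **pp;
}

/* Same, with the inner expression built through typeof (GNU extension). */
static int deref2_typeof(int **pp)
{
    return **(__typeof__(pp))pp;
}
```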
2017-05-19fix implicit zero initializer.Luc Van Oostenryck3-0/+171
The C standard requires that, when initializing an aggregate, all fields not explicitly initialized shall be implicitly zero-initialized (more exactly "the same as objects that have static storage duration" [6.7.9.21]). Until now sparse didn't do this. Fix this (when an initializer is present and the object is not a scalar) by first storing zeroes in the whole object before initializing each explicitly initialized field. Note 1: this patch initializes the *whole* aggregate while the standard only requires that existing fields are initialized. Thanks to Linus for noticing this. Note 2: this implicit initialization is not needed if all fields are explicitly initialized but is done anyway, for the moment. Note 3: the code simplifies nicely when a single field is initialized, much less so when several are. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
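A plain-C sketch (hypothetical names) of the required behaviour and of the strategy described above. The first function relies on the language rule: only '.y' is given, so '.x' and '.z' must be zero even for an automatic object. The second spells out "zero the whole object, then store the explicit fields" by hand; it is an illustration of the idea, not sparse's implementation.

```c
#include <string.h>

struct point3 {
    int x, y, z;
};

/* C11 6.7.9: members not explicitly initialized are initialized as if
 * they had static storage duration, i.e. to zero. */
static struct point3 make(void)
{
    struct point3 p = { .y = 2 };
    return p;
}

/* The same result, written as: zero everything, then store '.y'. */
static struct point3 make_zero_then_store(void)
{
    struct point3 p;
    memset(&p, 0, sizeof p);
    p.y = 2;
    return p;
}
```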
2017-05-19add test case for linearize_initializer() of bitfieldsLuc Van Oostenryck1-0/+27
In linearize_initializer(), 'ad->bit_size' & 'ad->bit_offset' were never set, making the correct initialization impossible (a bit_size of zero being especially bad, resulting in a mask of -1 instead of 0). This is now fixed since 'bit_size' & 'bit_offset' are taken directly from 'result_type'. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
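To show why both 'bit_size' and 'bit_offset' are needed when storing a bitfield initializer, here is a hypothetical helper (not sparse's code) that inserts a value into a bitfield of a 64-bit word. The mask formula (1 << bit_size) - 1 is this sketch's own choice; the message does not say which formula sparse uses. The helper requires 0 < bit_size < 64 and bit_offset + bit_size <= 64.

```c
#include <stdint.h>

/* Hypothetical helper: store 'val' into the 'bit_size'-bit field that
 * starts at bit 'bit_offset' of 'word', leaving the other bits intact. */
static uint64_t insert_bits(uint64_t word, uint64_t val,
                            unsigned bit_offset, unsigned bit_size)
{
    uint64_t mask = (UINT64_C(1) << bit_size) - 1; /* bit_size low ones */

    word &= ~(mask << bit_offset);                 /* clear the field */
    word |= (val & mask) << bit_offset;            /* store the value */
    return word;
}
```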