sparse/sparse-dev.git - Sparse's development tree with unstable git history

Age	Commit message	Author	Files	Lines
2020-10-20	Merge branch 'bf-sign' into next	Luc Van Oostenryck	4	-5/+44
	* teach sparse about -funsigned-bitfields * let plain bitfields default to signed
2020-10-08	fix evaluation of pointer to bool conversions	Luc Van Oostenryck	1	-1/+0
	The pointer to bool conversion used an indirect intermediate conversion to an int because the pointer was compared to 0 and not to a null pointer. The final result is the same but the intermediate conversion generated an unneeded OP_PTRTOU instruction which made some tests fail. Fix this by directly comparing to a null pointer of the same type as the type to convert. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
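The rule described above can be illustrated at the C level. This is only a sketch of the language semantics, not of sparse's internals; the function and variable names are made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

int sample_object;	/* hypothetical object, only used to get a non-null pointer */

/* Implicit conversion: a pointer converts to true iff it is not null. */
bool ptr_to_bool(const int *p)
{
	bool b = p;
	return b;
}

/* The same conversion written as a comparison against a null pointer
 * of the same type, with no detour through an integer. */
bool ptr_to_bool_explicit(const int *p)
{
	return p != (const int *)NULL;
}
```

Both functions are expected to agree for every pointer value.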
2020-09-16	teach sparse about -funsigned-bitfields	Luc Van Oostenryck	4	-5/+44
	Currently, Sparse treats 'plain' bitfields as unsigned. However, this is inconsistent with how non-bitfield integers are handled and with how GCC & clang handle bitfields. So, teach sparse about '-funsigned-bitfields' and by default treat these bitfields as signed, like done by GCC & clang and like done for non-bitfield integers. Also, avoid plain bitfields in IR related testcases. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
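A small probe, assuming nothing beyond standard C, shows what "plain" means here: the signedness of an 'int' bitfield declared without 'signed' or 'unsigned' is implementation-defined, while the explicitly qualified fields are not. The struct and function names are hypothetical.

```c
struct flags {
	int          plain : 3;	/* signedness chosen by the compiler (or a flag) */
	signed int   s     : 3;	/* always signed: range -4..3 */
	unsigned int u     : 3;	/* always unsigned: range 0..7 */
};

/* Returns 1 if the compiler treats a plain 'int' bitfield as signed. */
int plain_bitfield_is_signed(void)
{
	struct flags f = { 0, 0, 0 };

	f.plain = -1;
	return f.plain < 0;
}

int signed_field_roundtrip(void)
{
	struct flags f = { 0, 0, 0 };

	f.s = -1;
	return f.s;
}

int unsigned_field_roundtrip(void)
{
	struct flags f = { 0, 0, 0 };

	f.u = 7;
	return f.u;
}
```

With GCC or clang and no extra flag, the probe is expected to report a signed plain bitfield; with '-funsigned-bitfields' it should report unsigned.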
2020-09-07	builtin: teach sparse to linearize __builtin_fma()	Luc Van Oostenryck	1	-0/+19
	The support for the linearization of builtins was already added for __builtin_unreachable() but this builtin has no arguments and no return value. So, to complete the experience of builtin linearization, add the linearization of __builtin_fma(). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-08-17	fix evaluate_ptr_add() when sizeof(offset) != sizeof(pointer)	Luc Van Oostenryck	2	-0/+173
	For a binary op, both sides need to be converted to the resulting type of the usual conversion. For a compound-assignment (which is equivalent to a binary op followed by an assignment), the LHS can't be so converted since its type needs to be preserved for the assignment, so only the RHS is converted at evaluation and the type of the RHS is used at linearization to convert the LHS. However, in the case of pointer arithmetic, a number of shortcuts are taken and as a result additions with mixed sizes can be produced, resulting in invalid IR. So, fix this by converting the RHS to the same size as pointers, as done for 'normal' binops. Note: On 32-bit kernels, this patch also removes a few warnings about non size-preserving casts. It's fine as these warnings were designed for when an address would be stored in an integer, not for storing an offset like it's the case here. Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
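As an illustration of the situation described above, here is a minimal C sketch where the offset is narrower than a pointer. The names are hypothetical; the point is only that the offset has to be widened (sign-extended) to the pointer's size before the addition.

```c
int sample_array[4] = { 10, 11, 12, 13 };	/* hypothetical test data */

/* 'off' is a 16-bit value while 'p' is pointer-sized: the compound
 * assignment must extend 'off' before scaling and adding it. */
int *advance(int *p, short off)
{
	p += off;
	return p;
}
```

A negative offset is the interesting case, since a zero-extension instead of a sign-extension would move the pointer far forward.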
2020-08-11	bug-assign-op0.c: fix test on 32-bit builds	Ramsay Jones	1	-5/+5
	This test was failing on 32-bit because it made the assumption that 'long' is always 64-bit. Fix this by using 'long long' when 64-bit is needed. Fixes 36a75754ba161b4ce905390cf5b0ba9b83b34cd2 Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
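The portability point can be sketched as follows, assuming only standard C: 'long' is guaranteed to be at least 32 bits wide, while 'long long' is guaranteed to be at least 64 bits wide. The helper names are hypothetical.

```c
#include <limits.h>

/* Width of 'long' in bits: 32 on typical 32-bit targets, 64 on LP64 ones. */
unsigned int long_width(void)
{
	return (unsigned int)(sizeof(long) * CHAR_BIT);
}

/* A value that needs 64 bits: safe with 'long long' on every target. */
unsigned long long top_bit64(void)
{
	return 1ULL << 63;
}
```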
2020-08-06	shift-assign: restrict shift count to unsigned int	Luc Van Oostenryck	1	-1/+0
	After the RHS of shift-assigns had been integer-promoted, both gcc & clang seem to restrict it to an unsigned int. This only makes a difference when the shift count is negative, which would make it UB. Better to have the same generated code, so do the same here. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-08-06	shift-assign: fix linearization of shift-assign	Luc Van Oostenryck	2	-2/+0
	The result of a shift-assign has the same type as the left operand but the shift itself must be done on the promoted type. The usual conversions are not done for shifts. The problem is that this promoted type is not stored explicitly in the data structure. This is specific to shift-assigns because for other operations, for example add-assign, the usual conversions must be done and the resulting type can be found on the RHS. Since at linearization, the LHS and the RHS must have the same type, the solution is to cast the RHS to LHS's promoted type during evaluation. This solves a bunch of problems with shift-assigns, like doing logical shift when an arithmetic shift was needed. Fixes: efdefb100d086aaabf20d475c3d1a65cbceeb534 Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-08-06	shift-assign: add more testcases for bogus linearization	Luc Van Oostenryck	2	-0/+374
	The usual conversions must not be applied to shifts. This causes problems for shift-assigns. So, add testcases for all combinations of size and signedness. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-07-06	testsuite: add testcase for bogus linearization of >>= & /=	Luc Van Oostenryck	1	-0/+115
	When doing a shift operation, both arguments are subjected to integer promotion and the type of the result is simply the type of the promoted left operand. Easy. But for a shift-assignment, things are slightly more complex: -) 'a >>= n' should be equivalent to 'a = a >> n' -) but the type of the result must be the type of the left operand before integer promotion. Currently, the linearization code uses the type of the right operand to infer the type of the operation. But simply changing the code to use the type of the left operand will also be wrong (for example for signed/unsigned divisions). Nasty. For example, the following C code: int s = ...; s >>= 11U; is linearized as a logical shift: lsr.32 %r2 <- %arg1, $11 while, of course it's an arithmetic shift that is expected: asr.32 %r2 <- %arg1, $11 So, add a testcase for these. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
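The two rules above can be checked with a short C sketch. It only states the language-level expectation (the compound form must match the expanded form); the function names are hypothetical, and the right shift of a negative value is implementation-defined, so only the equality of the two forms is relied upon for negative inputs.

```c
/* Compound form: the shift is done on the promoted left operand,
 * regardless of the (unsigned) type of the shift count. */
int shift_assign(int s)
{
	s >>= 11U;
	return s;
}

/* Expanded form: must give the same value as the compound form. */
int shift_then_assign(int s)
{
	s = s >> 11U;
	return s;
}

/* Narrow left operand: shifted as an int, then converted back to
 * the operand's own type when stored. */
unsigned char narrow_shift_assign(unsigned char c, int n)
{
	c <<= n;
	return c;
}
```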
2020-05-21	bad-goto: check declaration of label expressions	Luc Van Oostenryck	1	-1/+0
	Issue an error when taking the address of an undeclared label and mark the function as improper for linearization since the resulting IR would be invalid. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21	bad-goto: jumping inside a statement expression is an error	Luc Van Oostenryck	2	-2/+0
	It's invalid to jump inside a statement expression. So, detect such jumps, issue an error message and mark the function as useless for linearization since the resulting IR would be invalid. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21	bad-goto: catch labels with reserved names	Luc Van Oostenryck	1	-1/+0
	If a reserved name is used as the destination of a goto, its associated label won't be valid and at linearization time no BB can be created for it, resulting in an invalid IR. So, catch such gotos at evaluation time and mark the function to not be linearized. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21	bad-goto: reorganize testcases and add some more	Luc Van Oostenryck	7	-5/+92
	Reorganize the testcases related to the 'scope' of labels and add a few new ones. Also, some related testcases have some unreported errors other than the features being tested. This is a problem since such testcases can still fail after the feature being tested is fixed or implemented. So, fix these testcases or split them so that they each test a unique feature. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21	bad-goto: add testcases for linearization of invalid labels	Luc Van Oostenryck	1	-0/+19
	A goto to a reserved or an undeclared label will generate an IR with a branch to a non-existing BB. Bad. Add a testcase for these. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-05-21	bad-goto: add testcase for 'jump inside discarded expression statement'	Luc Van Oostenryck	1	-0/+28
	A goto done into a piece of code discarded at expand or linearize time will produce an invalid IR. Add a testcase for it. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-03-20	teach sparse to linearize __builtin_unreachable()	Luc Van Oostenryck	2	-2/+0
	__builtin_unreachable() is one of the builtins that shouldn't be ignored at IR level since it directly impacts the CFG. So, use the infrastructure put in place in the previous patch to generate the OP_UNREACH instruction instead of generating a call to a non-existing function "__builtin_unreachable()". Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2020-03-20	add an implicit __builtin_unreachable() for __noreturn	Luc Van Oostenryck	1	-1/+0
	The semantic of a __noreturn function is that ... it doesn't return. So, insert an instruction OP_UNREACH after calls to such functions. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
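At the source level, the situation looks like the sketch below. It uses C11's '_Noreturn' as a stand-in for the kernel-style '__noreturn' annotation; 'die()' and 'checked_div()' are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* A function that never returns to its caller. */
_Noreturn void die(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	abort();
}

int checked_div(int a, int b)
{
	if (b == 0)
		die("division by zero");
	/* Control can only reach this point when b != 0: the code
	 * following the call to die() is unreachable. */
	return a / b;
}
```

Marking the point after the call as unreachable lets the control-flow graph reflect that the 'b == 0' branch never joins the division.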
2020-03-20	add testcases for OP_UNREACH	Luc Van Oostenryck	3	-7/+58
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-10-09	"graph" segfaults on top-level asm	Luc Van Oostenryck	1	-0/+1
	The "graph" binary segfaults on this input: asm(""); with gdb saying (edited for clarity): Program received signal SIGSEGV, Segmentation fault. in graph_ep (ep=0x7ffff7f62010) at graph.c:52 (gdb) p ep->entry $1 = (struct instruction *) 0x0 Sadly, the commit that introduced this crash: 15fa4d60e ("topasm: top-level asm is special") was (part of a bigger series) meant to fix crashes because of such toplevel asm statements. Toplevel ASM statements are quite abnormal: * they are toplevel but anonymous symbols * they should be limited to basic ASM syntax but are not * they are given the type SYM_FN but are not functions * there is nothing to evaluate or expand about it. These cause quite a few problems including crashes, even before the above commit. So, before handling them more correctly and instead of adding a bunch of special cases here and there, temporarily take the more radical approach of stopping to add them to the list of toplevel symbols. Fixes: 15fa4d60ebba3025495bb34f0718764336d3dfe0 Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Analyzed-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-09-27	asm: linearization of output memory operands is different	Luc Van Oostenryck	1	-1/+0
	ASM memory operands are considered by GCC as some kind of implicit reference. Their linearization should thus not create any storage statement: the storage is done by the ASM code itself. Adjust the linearization of such operands accordingly. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-09-27	asm: add test evaluation, expansion & linearization of ASM operands	Luc Van Oostenryck	1	-0/+24
	ASM statements are quite complex. Add some tests to catch some potential errors. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-09-26	expand: add missing expansion of compound literals	Luc Van Oostenryck	1	-1/+0
	Compound literals, like all other expressions, need to be expanded before linearization, but this is currently not done. As a consequence, some builtins are unexpectedly still present, same for EXPR_TYPEs, ... with error messages like: warning: unknown expression at linearization. Fix this by adding the missing expansion of compound literals. Note: as explained in the code itself, it's not totally clear how compound literals can be identified after evaluation. The code here considers all anonymous symbols with an initializer as being a compound literal. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2019-02-04	target.c: ignore -m64 on archs where int32_t is a long	Luc Van Oostenryck	9	-0/+9
	If the flag '-m64' is used on a 32-bit architecture/machine having int32_t set to 'long', then these int32_t are forced to 64-bit ... So, ignore the effect of -m64 on these archs and ignore '64-bit only' tests on them. Reported-by: Uwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Tested-by: Uwe Kleine-König <uwe@kleine-koenig.org>
2018-09-08	fix linearization of non-constant switch-cases	Luc Van Oostenryck	1	-1/+0
	The linearization of switches & cases makes the assumption that the expressions for the cases are constants (EXPR_VALUE). So, the corresponding values are dereferenced without checks. However, if the code uses a non-constant case, this dereference produces a random value, probably one corresponding to some pointers belonging to the real type of the expression. Fix this by checking during linearization the constness of the expression and ignoring the non-constant ones. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-08	add testcase for non-constant switch-case	Luc Van Oostenryck	1	-0/+38
	Switches with non-constant cases are currently linearized using as value the bit pattern present in the expression, creating more or less random multijmps. Add a basic testcase to catch this. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06	Merge branches 'missing-return' and 'fix-logical-phi' into tip	Luc Van Oostenryck	13	-90/+288
	* fix linearization/SSA when missing a return * fix linearization/SSA of (nested) logical expressions
2018-09-06	fix linearization of nested logical expr	Luc Van Oostenryck	4	-93/+90
	The linearization of nested logical expressions is not correct regarding the phi-nodes and their phi-sources. For example, code like: extern int a(void); int b(void); int c(void); static int foo(void) { return (a() && b()) && c(); } gives (optimized) IR like: foo: phisrc.32 %phi1 <- $0 call.32 %r1 <- a cbr %r1, .L4, .L3 .L4: call.32 %r3 <- b cbr %r3, .L2, .L3 .L2: call.32 %r5 <- c setne.32 %r7 <- %r5, $0 phisrc.32 %phi2 <- %r7 br .L3 .L3: phi.32 %r8 <- %phi2, %phi1 ret.32 %r8 The problem can already be seen by the fact that the phi-node in L3 has 2 operands while L3 has 3 parents. There is no phi-value for L4. The code is OK for non-nested logical expressions: linearize_cond_branch() takes the success/failure BB as argument, generates the code for those branches and there is a phi-node for each of them. However, with nested logical expressions, one of the BBs will be shared between the inner and the outer expression. The phisrc will 'cover' one of the BBs but only one of them. The solution is to add the phi-sources not before but after and add one for each of the parent BBs. This way, it can be guaranteed that each parent BB has its phisrc, whatever the complexity of the sub-expressions. With this change, the generated IR becomes: foo: call.32 %r2 <- a phisrc.32 %phi1 <- $0 cbr %r2, .L4, .L3 .L4: call.32 %r4 <- b phisrc.32 %phi2 <- $0 cbr %r4, .L2, .L3 .L2: call.32 %r6 <- c setne.32 %r8 <- %r6, $0 phisrc.32 %phi3 <- %r8 br .L3 .L3: phi.32 %r1 <- %phi1, %phi2, %phi3 ret.32 %r1 Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
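The three incoming edges of the final block correspond to the three ways the expression can finish: a() is false, b() is false, or c() is evaluated. The sketch below makes these paths observable in plain C. The counters, the configurable return values and 'run_foo()' are hypothetical additions for the illustration; only 'foo()' mirrors the commit's example.

```c
static int calls[3];	/* how many times a(), b() and c() were called */
static int result[3];	/* value each of them returns */

int a(void) { calls[0]++; return result[0]; }
int b(void) { calls[1]++; return result[1]; }
int c(void) { calls[2]++; return result[2]; }

int foo(void)
{
	return (a() && b()) && c();
}

/* Reset the counters, set the return values, evaluate the expression. */
int run_foo(int ra, int rb, int rc)
{
	calls[0] = calls[1] = calls[2] = 0;
	result[0] = ra;
	result[1] = rb;
	result[2] = rc;
	return foo();
}

int call_count(int i)
{
	return calls[i];
}
```

Each early exit contributes the constant 0, and only the last path contributes a computed value, which matches one phi-source per parent block.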
2018-09-06	add tests for nested logical expr	Luc Van Oostenryck	1	-0/+49
	Nested logical expressions are not correctly linearized. Add a test for all possible combinations of 2 logical operators. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06	fix ordering of phi-node operand	Luc Van Oostenryck	2	-5/+4
	The linearization of logical '&&' creates a phi-node with its operands in the wrong order relative to the parent BBs. Switch the order of the operands for logical '&&'. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06	add testcases for wrong ordering in phi-nodes	Luc Van Oostenryck	4	-0/+55
	In valid SSA there is a 1-to-1 correspondence between each operand of a phi-node and the parent BBs. However, currently, this is not always respected. Add testcases for the known problems. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06	return nothing only in void functions	Luc Van Oostenryck	1	-1/+0
	Currently, the code for the return is only generated if the returned expression effectively has a type or a value with a size greater than 0. But this means that a non-void function with an error in its return expression is considered as a void function as far as the generated IR is concerned, making things incoherent. Fix this by using the declared type instead of the type of the return expression. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06	use UNDEF for missing returns	Luc Van Oostenryck	5	-5/+0
	If a return statement is missing in the last block, the generated IR will be invalid because the number of operands in the exit phi-node will not match the number of parent BBs. Detect this situation and insert an UNDEF for the missing value. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-06	topasm: top-level asm is special	Luc Van Oostenryck	1	-0/+7
	Top-level ASM statements are parsed as fake anonymous functions. Obviously, they have little in common with functions (for example, they don't have a return type) and mixing the two makes things more complicated than needed (for example, to detect a top-level ASM, we had to check that the corresponding symbol (name) had a null ident). Avoid potential problems by special casing them and return early in linearize_fn(). As a consequence, they no longer have an OP_ENTRY as first instruction and can be detected by testing ep->entry. Note: It would be more logical to catch them even earlier, in linearize_symbol() but they also need an entrypoint and an active BB so that we can generate the single statement. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-05	add testcases for missing return in last block	Luc Van Oostenryck	6	-0/+97
	In this case the phi-node created for the return value ends up with a missing operand, violating the semantic of the phi-node: map one value with each predecessor. Add testcases for these missing returns. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-09-01	fix linearization of unreachable switch (with reachable label).	Luc Van Oostenryck	1	-1/+0
	An unreachable/inactive switch statement is currently not linearized. That's nice because it avoids creating useless instructions. However, the body of the statement can contain a label which can be reachable. If so, the resulting IR will contain a branch to a non-existent BB. Bad. For example, code like: int foo(int a) { goto label; switch(a) { default: label: break; } return 0; } (which is just a complicated way to write: int foo(int a) { return 0; }) is linearized as: foo: br .L1 Fix this by linearizing the statement even if not active. Note: it seems that none of the other statements are discarded if inactive. Good. OTOH, statement expressions can also contain (reachable) labels and thus would need the same fix (which will need much more work). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
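The example from the message above, laid out as compilable C. The function is renamed here (a hypothetical name) to keep it distinct from other examples; its body follows the message.

```c
/* The switch statement itself is never entered from the top, but the
 * label inside its body is the target of a reachable goto. */
int jump_into_dead_switch(int a)
{
	goto label;
	switch (a) {
	default:
label:
		break;
	}
	return 0;
}
```

This is valid C and always returns 0, so the basic block holding 'label' has to exist in the generated IR.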
2018-09-01	add tescase for unreachable label in switch	Luc Van Oostenryck	1	-0/+20
	or more exactly, an unreachable switch statement but containing a reachable label. This is valid code but is currently wrongly linearized. So, add a testcase for it. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-25	Merge branch 'ssa' into tip	Luc Van Oostenryck	5	-60/+90
	* do 'classical' SSA conversion (via the iterated dominance frontier). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-07	fix instruction size & type in linearize_inc_dec()	Luc Van Oostenryck	2	-68/+75
	If the ++ or -- operator is used on a bitfield, the addition or subtraction is done with the size of the bitfield. So code like: struct { int f:3; } s; ... s->f++; will generate intermediate code like: add.3 %r <- %a, $1 This is not incorrect from the IR point of view but CPUs have only register-sized instructions, like 'add.32'. So, these odd-sized instructions have one or two implicit masking/extension operations that are better made explicit. Fix this by casting to and from the base type when these operators are used on bitfields. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
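The "cast to and from the base type" can be spelled out in C. The sketch below uses an unsigned 3-bit field so that the wrap-around is fully defined; the struct and function names are hypothetical.

```c
struct counter {
	unsigned int f : 3;	/* holds 0..7 */
};

/* Increment written directly on the bitfield. */
unsigned int bump(unsigned int start)
{
	struct counter s = { start & 7u };

	s.f++;
	return s.f;
}

/* The same operation with the steps made explicit: extend to the base
 * type, do a full-width add, then truncate back to 3 bits on store. */
unsigned int bump_explicit(unsigned int start)
{
	struct counter s = { start & 7u };
	unsigned int wide = s.f;	/* zero-extend to the base type */

	wide = wide + 1u;		/* register-sized addition */
	s.f = wide & 7u;		/* truncate to the field's width */
	return s.f;
}
```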
2018-08-06	limit the mask used for bitfield insertion	Luc Van Oostenryck	1	-6/+6
	The mask used for bitfield insertion is as big as the integers used internally by sparse. Elsewhere in the code, constants are always truncated to the size of the instructions using them. It's also displaying concerned instructions oddly. For example: and.32 %r2 <- %r1, 0xfffffffffffffff0 Fix this by limiting the mask to the size of the instruction. Fixes: a8e1df573 ("bitfield: extract linearize_bitfield_insert()") Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
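A sketch of the masking issue, under the assumption that the mask is first built in a 64-bit integer (standing in for the wide integers sparse uses internally) and then used by a 32-bit operation. The function name and parameters are hypothetical, and the sketch assumes 'shift + width <= 32'.

```c
#include <stdint.h>

/* Insert the low 'width' bits of 'val' into 'word' at bit 'shift'. */
uint32_t bitfield_insert(uint32_t word, uint32_t val,
			 unsigned int shift, unsigned int width)
{
	/* Mask of the field, built in the wide type. */
	uint64_t field = ((UINT64_C(1) << width) - 1) << shift;

	/* Complement in the wide type: for shift=0, width=4 this is
	 * 0xfffffffffffffff0, wider than the 32-bit operation using it. */
	uint64_t keep64 = ~field;

	/* Limit the mask to the size of the operation: 0xfffffff0. */
	uint32_t keep = (uint32_t)keep64;

	return (word & keep) | ((val << shift) & (uint32_t)field);
}
```

The truncation does not change the computed result; it only makes the constant match the width of the instruction that uses it.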
2018-08-06	simplify linearize_logical()	Luc Van Oostenryck	1	-92/+68
	The linearized code for logical expressions looks like: .Lc ... condition 1 ... cbr %c, .L1, .L2 .L1 %phisrc %phi1 <- $1 br .Lm .L2 ... condition 2 ... %phisrc %phi2 <- %r br .Lm .Lm %phi %r <- %phi1, %phi2 But .L1 can easily be merged with .Lc: .Lc ... condition 1 ... %phisrc %phi1 <- $1 cbr %c, .Lm, .L2 .L2 ... condition 2 ... %phisrc %phi2 <- %r br .Lm .Lm %phi %r <- %phi1, %phi2 Do this simplification which: * creates fewer basic blocks & branches * does at linearization time a simplification not done later. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-06	expand linearize_conditional() into linearize_logical()	Luc Van Oostenryck	1	-127/+111
	linearize_logical() calls linearize_conditional() but needs additional tests there and generates code more complicated than needed. Change this by expanding the call to linearize_conditional() and make the obvious simplification concerning the shortcut expressions 0 & 1. Also, remove the logical-specific parts in linearize_conditional(), since they are now unneeded. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-08-06	fix linearize_conditional() for logical ops	Luc Van Oostenryck	1	-1/+0
	The function linearize_conditional(), normally used for conditionals (c ? a : b) is also used to linearize the logical ops || and &&. For conditionals, the type evaluation ensures that both LHS & RHS have consistent types. However, this is not the case when used for logical ops. This creates 2 separate but related problems: * the operands are not compared with 0 as required by the standard (6.5.13, 6.5.14). * both operands can have different, incompatible types and thus it's possible to have a phi-node with sources of different, incompatible types, which doesn't make sense. Fix this by: * add a flag to linearize_conditional() telling if it's used for a conditional or for a logical op. * when used for logical ops: * first compare the operands against zero * convert the boolean result to the expression's type. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
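The two requirements cited from the standard can be seen in a few lines of C: each operand of '&&' and '||' is compared against zero, and the result is an 'int' equal to 0 or 1 whatever the operand types are. The function names below are hypothetical.

```c
/* Operands of unrelated types (a double and a pointer): each one is
 * compared against zero and the result is the int 0 or 1. */
int logical_and(double x, const char *p)
{
	return x && p;
}

/* Integer operands: the result is 1, not one of the operand values. */
int logical_or(int lhs, int rhs)
{
	return lhs || rhs;
}
```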
2018-08-06	conditional branches can't accept arbitrary expressions	Luc Van Oostenryck	1	-5/+5
	Conditional branches, or more exactly OP_CBR, can't accept arbitrary expressions as condition. It is required to have an integer value. Fix this by adding a comparison against zero.
2018-08-04	add testcase for linearize_logical()	Luc Van Oostenryck	1	-0/+300
	Add some tests in preparation of some bug-fixing and simplification in linearize_logical() and linearize_conditional(). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-25	Merge branch 'optim-cast' into tip	Luc Van Oostenryck	3	-0/+57
	* several simplifications involving casts and/or bitfields Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-24	use "%Le" to display floats	Luc Van Oostenryck	2	-13/+13
	Floating-point values are displayed using the printf format "%Lf" but this is the format without exponent (and with default precision of 6 digits). However, by its nature, this format is very imprecise. For example, all values smaller than 0.5e-6 are displayed as "0.000000". Improve this by using the "%Le" format which always uses an exponent and thus maximizes the precision. Note: ultimately, we should display them exactly, for example by using "%La", but this will require C99. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
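The difference between the two formats is easy to reproduce with the standard library. The helper below is hypothetical; it formats a 'long double' with a given format and compares the text with an expected string.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if 'v' formatted with 'fmt' yields exactly 'expected'. */
int prints_as(const char *fmt, long double v, const char *expected)
{
	char buf[64];

	snprintf(buf, sizeof(buf), fmt, v);
	return strcmp(buf, expected) == 0;
}
```

With a conforming printf, 'prints_as("%Lf", 1e-7L, "0.000000")' and 'prints_as("%Le", 1e-7L, "1.000000e-07")' should both hold: the fixed format loses the value entirely while the exponent format keeps its leading digits.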
2018-07-23	add testcases for casts & bitfield insertion/extraction	Luc Van Oostenryck	3	-0/+57
	There are several difficulties, some related to the unclear semantics of our IR instructions and/or type evaluation. Add testcases trying to cover this area. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-01	testsuite: improve mem2reg testcases	Luc Van Oostenryck	1	-25/+0
	A few tests are added, some have been renamed to better reflect their purposes. Finally, some checks have been added or tweaked. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-07-01	testsuite: reorganize tests for compound literals	Luc Van Oostenryck	3	-0/+55
	Split the existing test in 2 as it contains 2 different cases. Also move the test to 'linear/' subdir. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-29	cast: reorganize testcases for cast optimization	Luc Van Oostenryck	1	-405/+0
	validation/linear/* should not contain testcases that are optimization dependent and validation/*.c should not contain tests using 'test-linearize', only those using 'sparse'. Move some cast-related testcases accordingly. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-26	cast: simplify TRUNC + ZEXT to AND	Luc Van Oostenryck	1	-106/+0
	A truncation followed by a zero-extension to the original size, which is produced when loading or storing bitfields, is equivalent to a simple AND masking. Often, this AND can then trigger even more optimizations. So, replace TRUNC + ZEXT instructions by the equivalent AND. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
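The equivalence can be checked directly in C with unsigned types, where both conversions are fully defined. The function names are hypothetical and an 8-bit intermediate width is picked arbitrarily.

```c
#include <stdint.h>

/* Truncate to 8 bits, then zero-extend back to 32 bits. */
uint32_t trunc_then_zext(uint32_t x)
{
	uint8_t narrow = (uint8_t)x;	/* TRUNC */

	return (uint32_t)narrow;	/* ZEXT */
}

/* The equivalent single masking operation. */
uint32_t and_mask(uint32_t x)
{
	return x & 0xffu;		/* AND */
}
```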
2018-06-23	cast: keep instruction sizes consistent	Luc Van Oostenryck	2	-11/+189
	The last instruction of linearize_load_gen() ensures that loading a bitfield of size N results in an object of size N. Also, we require that the usual binops & unops use the same type on their operand and result. This means that before anything can be done on the loaded bitfield it must first be sign or zero- extended in order to match the other operand's size. The same situation exists when storing a bitfield but there the extension isn't done. We can thus have some weird code like: trunc.9 %r2 <- (32) %r1 shl.32 %r3 <- %r2, ... where a bitfield of size 9 is mixed with a 32 bit shift. Avoid such mixing of size and always zero extend the bitfield before storing it (since this was the implicitly desired semantic). The combination TRUNC + ZEXT can then be optimised later into a simple masking operation. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: specialize integer casts	Luc Van Oostenryck	5	-99/+97
	Casts to integer used to be done with only 2 instructions: OP_CAST & OP_SCAST. Those are not very convenient as they don't reflect the real operations that need to be done. This patch specializes these instructions into: - OP_TRUNC, for casts to a smaller type - OP_ZEXT, for casts that need a zero extension - OP_SEXT, for casts that need a sign extension - Integer-to-integer casts of the same size are considered as NOPs and are, in fact, never emitted. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
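For illustration, here is one C cast for each of the cases listed above. The mapping from each cast to an instruction name in the comments follows the list in the message; the function names are hypothetical.

```c
#include <stdint.h>

/* Cast to a smaller type: the case handled by OP_TRUNC. */
uint8_t cast_trunc(uint32_t x)
{
	return (uint8_t)x;
}

/* Unsigned source, larger destination: zero extension (OP_ZEXT). */
uint32_t cast_zext(uint8_t x)
{
	return (uint32_t)x;
}

/* Signed source, larger destination: sign extension (OP_SEXT). */
int32_t cast_sext(int8_t x)
{
	return (int32_t)x;
}

/* Same size: the bits are unchanged, so no instruction is needed. */
uint32_t cast_same_size(int32_t x)
{
	return (uint32_t)x;
}
```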
2018-06-23	cast: make casts from pointer always size preserving	Luc Van Oostenryck	1	-84/+86
	Currently casts from pointers can be done to any integer type. However, casts to (or from) pointers are only meaningful if it preserves the value and thus done between same-sized objects. To avoid having to worry about sign/zero extension while doing casts to pointers it's good to not have to deal with such casts. Do this by doing first a cast to an unsigned integer of the same size as a pointer and then, if needed, doing the cast to the final type. As such we have only to support pointer casts to unsigned integers of the same size and on the other hand we have the generic integer-to-integer casts we have to support anyway. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
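The two-step decomposition has a direct C counterpart using 'uintptr_t', which this sketch assumes is available and pointer-sized on the target. The object and function names are hypothetical.

```c
#include <stdint.h>

int sample_target;	/* hypothetical object to take the address of */

/* Pointer to a narrower integer, done in two steps. */
uint16_t low_bits_of_pointer(const void *p)
{
	uintptr_t u = (uintptr_t)p;	/* step 1: size-preserving pointer cast */

	return (uint16_t)u;		/* step 2: ordinary integer truncation */
}

/* A size-preserving cast can be undone without losing the value. */
void *roundtrip(void *p)
{
	return (void *)(uintptr_t)p;
}
```

Only the first step involves a pointer; everything after it is a plain integer-to-integer cast.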
2018-06-23	cast: add support for -Wpointer-to-int-cast	Luc Van Oostenryck	1	-1/+1
	It's relatively common to cast a pointer to an unsigned long, for example to make some bit operations. It's much less sensical to cast a pointer to an integer smaller (or bigger) than a pointer is. So, emit a diagnostic for this, under the control of a new warning flag: -Wpointer-to-int-cast. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: specialize cast from pointers	Luc Van Oostenryck	4	-4/+42
	Currently all casts to pointers are processed alike. This is simple but rather inconvenient in later phases as this corresponds to different operations that obey different rules and which later need extra checks. Change this by using a specific instruction (OP_UTPTR) for [unsigned] integer to pointers. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: make pointer casts always size preserving	Luc Van Oostenryck	1	-30/+32
	Currently casts to pointers can be done from any integer types. However, casts to (or from) pointers are only meaningful if value preserving and thus between objects of the same size. To avoid having to worry about sign/zero extension while doing casts to pointers it's good to only have to deal with the value preserving ones. Do this by doing first, if needed, a cast to an integer of the same size as a pointer before doing the cast to a pointer. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: specialize casts from unsigned to pointers	Luc Van Oostenryck	1	-5/+5
	Currently all casts to pointers are processed alike. This is simple but rather inconvenient as it corresponds to different operations that obey different rules and which later need extra checks. Change this by using a specific instruction (OP_UTPTR) for unsigned integer to pointers. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: specialize floats to integer conversion	Luc Van Oostenryck	3	-9/+11
	Currently, casts from floats to integers are processed like integers (or any other type) to integers. This is simple but rather inconvenient as it corresponds to different operations that obey different rules and which later need extra checks. Change this by directly using specific instructions: - FCVTU for floats to unsigned integers - FCVTS for floats to signed integers Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: handle NO-OP casts	Luc Van Oostenryck	1	-0/+15
	Some casts, the ones which don't change the size or the resulting 'machine type', are no-ops. Directly simplify away such casts. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: specialize FPCAST into [USF]CVTF	Luc Van Oostenryck	1	-10/+10
	Currently, all casts to a floating point type use OP_FPCAST. This is maybe simple but rather inconvenient as it corresponds to several quite different operations that later need extra checks. Change this by directly using different instructions for the different cases: - FCVTF for float-float conversions - UCVTF for unsigned integer to floats - SCVTF for signed integer to floats and reject attempts to cast a pointer to a float. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-23	cast: reorg testcases related to casts	Luc Van Oostenryck	6	-0/+858
	* merge the tests about implicit & explicit casts in a single file as there was a lot of redundancy. * shuffle the tests to linear/ or optim/ Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-06-16	testsuite: fix missing return	Luc Van Oostenryck	1	-8/+8
	Some non-void functions in the testcases miss a return. Add the missing return or make the function return void. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-05-06	use function-like syntax for __range__	Luc Van Oostenryck	1	-0/+31
	One of sparse's extensions to the C language is an operator to check ranges. This operator takes 3 operands: the expression to be checked and the bounds. The syntax for this operator is such that the operands need to be a 3-item comma-separated expression. This is a bit weird and doesn't play along very well with macros, for example. Change the syntax to a 3-argument function-like operator. NB. Of course, this will break all existing uses of this extension not using parentheses around the comma expression but there doesn't seem to be any. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-03-18	fix-return: remove special case for single return	Luc Van Oostenryck	2	-27/+3
	During the linearization of a function, returns are directly linearized as phi-sources and the exit BB contains the corresponding phi-node and the unique OP_RET. There is also a kind of optimization that is done if there is only a single return statement and thus a single phi-source: the phi-source and the phi-node are simply ignored and the unique value is directly used by the OP_RET instruction. While this optimization makes sense it also has some cons: - the phi-node and the phi-source are created anyway and will need to be removed during cleanup. - the corresponding optimization needs to be done anyway during simplification - it's only a tiny special case which saves very little. So, keep things simple and generic and leave this sort of simplification for the cleanup/simplification phase. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-03-11	testsuite: fix problem with double-escaping in patterns	Luc Van Oostenryck	8	-11/+11
	Since the patterns in the testcases are evaluated in the shell script, the backslash used to escape characters special to the pattern needs itself to be escaped. There are a few cases where this wasn't done, partly because 'format -l' gave a single escape in its template. Fix all occurrences needing this double-escape as well as the 'format -l' template. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2018-01-02	Merge branches 'fix-expand-bitfield-deref', 'fix-fpops-cse', 'null-expr', ↵	Luc Van Oostenryck	10	-0/+262
	'size-unsized-arrays' and 'master' into tip
2017-12-28	add more testcases for function designator dereference	Luc Van Oostenryck	1	-0/+13
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21	fix expansion of constant bitfield dereference	Luc Van Oostenryck	1	-1/+0
	During the expansion of a dereference, it's checked if the initializer corresponding to the offset we're interested in is a constant. If it's the case, the dereference can be avoided and the constant given as initializer can be used instead. However, it's not enough to check for the offset since, for bitfields, there are (usually) several distinct fields at the same offset. Currently, the first initializer matching the offset is selected and, if a constant, its value is used for the result of the dereferencing of the whole structure. Fix this by refusing such expansion if the constant value corresponds to a bitfield. Reported-by: Dibyendu Majumdar <mobile@majumdar.org.uk> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
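An illustrative sketch of the situation described above; it is not the testcase from the commit, and the struct and function names are hypothetical. Two bitfields normally share one storage unit and hence one byte offset, so an expansion that matches initializers by offset alone would pick the first field's constant when the second field is read.

```c
#include <assert.h>

/* Both fields are expected to live in the same storage unit. */
struct pair {
	unsigned int lo:4;
	unsigned int hi:4;
};

/* Reading 'hi' must yield its own initializer, not the first one found
 * at the same offset (which belongs to 'lo'). */
int read_hi(void)
{
	static const struct pair p = { .lo = 1, .hi = 2 };

	assert(p.lo == 1);
	return p.hi;
}
```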
2017-12-21	add testcase for constant bitfield dereference	Luc Van Oostenryck	1	-0/+28
	Reported-by: Dibyendu Majumdar <mobile@majumdar.org.uk> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21	Merge branches 'deref-fun-ptr' and 'deref-base-type' into tip	Luc Van Oostenryck	2	-0/+62

2017-12-21	dereference of a function is a no-op	Luc Van Oostenryck	4	-4/+0
	For the '*' operator and functions, the C standard says: "If the operand points to a function, the result is a function designator; ... If the operand has type ‘pointer to type’, the result has type ‘type’". but also (C11 6.3.2.1p4): "(except with 'sizeof' ...) a function designator with type ‘function returning type’ is converted to an expression that has type ‘pointer to function returning type’". This means that the dereference of a function designator is a no-op since the resulting expression is immediately converted back to a pointer to the function. The change effectively drops any dereference of function types during their evaluation. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
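The rule quoted above can be checked in plain C. This is a small sketch with hypothetical names: every dereference of a function pointer yields a function designator that converts straight back to a pointer, so any number of '*' gives the same call.

```c
#include <assert.h>

static int twice(int x) { return 2 * x; }

/* Returns 1 when all the call forms produce the same result. */
int call_forms_agree(int x)
{
	int (*fp)(int) = twice;

	/* '*fp' is a function designator and converts back to a pointer. */
	assert(fp == *fp);

	int r0 = fp(x);
	int r1 = (fp)(x);
	int r2 = (*fp)(x);
	int r3 = (***fp)(x);	/* extra dereferences change nothing */
	int r4 = (*twice)(x);	/* same for the function name itself */

	return r0 == r1 && r1 == r2 && r2 == r3 && r3 == r4;
}
```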
2017-12-21	add testcases for multiple deref of calls	Luc Van Oostenryck	4	-4/+19
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21	fix linearize (*fun)()	Luc Van Oostenryck	3	-3/+0
	A function call via a function pointer can be written like: fp(), (fp)() or (*fp)() In the latter case the dereference is unneeded but legal and idiomatic. However, the linearization doesn't handle this unneeded deref and leads to the generation of a load of the pointer: int foo(int a, int (*fun)(int)) { (*fun)(a); } gives something like: foo: load %r2 <- 0[%arg2] call.32 %r3 <- %r2, %arg1 ret.32 %r3 This happens because, at linearization, the deref is dropped but only if the sub-expression is a symbol and the test for node is not done. Fix this by using is_func_type() to test the type of all call expressions. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21	add testcases for the linearization of calls	Luc Van Oostenryck	7	-0/+179
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21	fix: evaluate_dereference() unexamined base type	Luc Van Oostenryck	2	-2/+0
	Examination of a pointer type doesn't examine the corresponding base type (this base type may not yet be complete). So, this examination must be done later, when the base type is needed. However, in some cases it's possible to call evaluate_dereference() while the base type is still unexamined. Fix this by adding the missing examine_symbol_type() on the base type. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-21	add testcases for unexamined base type	Luc Van Oostenryck	2	-0/+64
	evaluate_dereference() lacks an explicit examination of the base type. Most of the time, the base type has already been examined via another path, but in some cases, it's not. The symptom here is the dereferenced value having a null size. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-12-08	fix: add missing degenerate() for logical not	Luc Van Oostenryck	1	-1/+0
	Expressions involving the logical-not '!' do not call degenerate(). Since the result type is always 'int' and thus independent of the expression being negated, this has no effect on the type-checking but the linearization is wrong. For example, code like: int foo(void) { if (!arr) return 1; return 0; } generates: foo: load %r6 <- 0[arr] seteq.32 %r7 <- VOID, $0 ret.32 %r7 The 'load' being obviously wrong. Fix this by adding the missing degenerate(). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
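A sketch of what the example above should evaluate to in standard C. The declaration of 'arr' is an assumption, since the message does not show it. The array operand of '!' degenerates to a pointer to its first element, which is never null, so no load of the array contents is involved and the function returns 0.

```c
#include <assert.h>

/* Assumed declaration; the commit message omits it. */
static int arr[3];

int foo(void)
{
	/* 'arr' decays to &arr[0]; '!' tests that address, not arr[0].
	 * Compilers may warn that the address is always non-null. */
	if (!arr)
		return 1;
	return 0;
}

/* The element value plays no role in the result. */
int foo_with_nonzero_element(void)
{
	arr[0] = 7;
	assert(arr[0] == 7);
	return foo();
}
```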
2017-12-07	add testcases linearization of degenerated arrays/functions	Luc Van Oostenryck	3	-0/+110
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13	Merge branches 'testcases-bugs', 'testcases-bugs-optim' and ↵	Luc Van Oostenryck	2	-0/+55
	'testcases-mem2reg' into tip
2017-11-13	add test case for superfluous cast with volatiles	Luc Van Oostenryck	1	-0/+14
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13	add testcase for return & inline	Luc Van Oostenryck	1	-0/+24
	The linearization of 'return' statements must correctly take into account some implementation details of the inlining. As such, it deserves its own testcase. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13	add testcase for __builtin_unreachable()	Luc Van Oostenryck	1	-0/+31
	__builtin_unreachable()'s semantics have consequences on the CFG and this should be taken into account for: * checking for undefined variables * checking when control reaches end of non-void function * context checking * ... Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
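A small sketch of the "control reaches end of non-void function" case. It assumes a compiler that provides __builtin_unreachable() (GCC and clang do; it is not standard C), and the function names are hypothetical. Every feasible path returns from inside the switch, and the builtin tells the checker that the fall-through edge does not exist.

```c
#include <assert.h>

int parity(unsigned int x)
{
	switch (x & 1) {
	case 0:
		return 0;
	case 1:
		return 1;
	}
	/* No value of (x & 1) reaches this point, so there is no
	 * path on which the function ends without a return. */
	__builtin_unreachable();
}

int parity_demo(void)
{
	assert(parity(4) == 0);
	return parity(7);
}
```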
2017-11-13	add test case for memory to register problem	Luc Van Oostenryck	1	-0/+25
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13	dump-ir: make it more flexible	Luc Van Oostenryck	1	-1/+1
	Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-11-13	dump-ir: rename -fdump-linearize to -fdump-ir	Luc Van Oostenryck	1	-1/+1
	as it will be used for dumping the IR not only just after linearization but after other passes too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-06-15	fix: add missing examine in evaluate_dereference()	Luc Van Oostenryck	1	-0/+19
	Sparse uses lazy type evaluation. This evaluation is done via the examine_*() functions, which we must ensure have been called when type information is needed. However, it seems that this is not done for expressions with multiple levels of dereferencing. There are (at least) two symptoms: 1) When the inner expression is complex and contains a typeof: a bogus error message is issued, either "error: internal error: bad type in derived(11)" or "error: cannot dereference this type", sometimes followed by another bogus "warning: unknown expression (...)". 2) This one is only visible with test-linearize but happens even on a plain double deref: the result of the inner deref is typeless. Obviously the first symptom is a consequence of the second one. Fix this by adding a call to examine_symbol_type() at the beginning of evaluate_dereference(). Note: This fixes all the 17 "cannot dereference" and 19 "internal error" present on the Linux kernel while using sparse on a x86-64 allyesconfig (most coming from the call of rcu_dereference_sched() in cpufreq_update_util()). Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
2017-05-19	fix implicit zero initializer.	Luc Van Oostenryck	3	-0/+171
	The C standard requires that, when initializing an aggregate, all fields not explicitly initialized shall be implicitly zero-initialized (more exactly "the same as objects that have static storage duration" [6.7.9.21]). Until now sparse didn't do this. Fix this (when an initializer is present and the object is not a scalar) by first storing zeroes in the whole object before doing the initialization of each field explicitly initialized. Note 1: this patch initializes the whole aggregate while the standard only requires that existing fields are initialized. Thanks to Linus for noticing this. Note 2: this implicit initialization is not needed if all fields are explicitly initialized but is done anyway, for the moment. Note 3: the code simplifies nicely when there is a single field that is initialized, much less so when there are several. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
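The requirement cited above, shown as a plain C sketch with hypothetical names: members and elements without an explicit initializer read as zero once any initializer is present.

```c
#include <assert.h>

struct point {
	int x, y, z;
};

/* Returns 1 when every member left out of the initializer is zero. */
int unnamed_members_are_zero(void)
{
	struct point p = { .y = 5 };	/* x and z are implicitly zeroed */
	int a[4] = { 1 };		/* a[1] to a[3] are implicitly zeroed */

	assert(p.y == 5 && a[0] == 1);
	return p.x == 0 && p.z == 0 &&
	       a[1] == 0 && a[2] == 0 && a[3] == 0;
}
```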
2017-05-19	add test case for linearize_initializer() of bitfields	Luc Van Oostenryck	1	-0/+27
	In linearize_initializer(), 'ad->bit_size' & 'ad->bit_offset' were never set, making the correct initialization impossible (a bit_size of zero being especially bad, resulting in a mask of -1 instead of 0). This is now fixed since 'bit_size' & 'bit_offset' are taken directly from 'result_type'. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
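A generic sketch of why both values matter when storing into a bitfield. This is not sparse's code: the function names and the mask formula are assumptions for illustration. The mask is derived from the field's size and is shifted by its offset, so a wrong size or offset clobbers neighbouring bits or stores nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'bit_size' bits set; a size of zero gives an empty mask. */
static uint64_t field_mask(unsigned int bit_size)
{
	assert(bit_size <= 64);
	return bit_size == 64 ? ~(uint64_t)0 : (((uint64_t)1 << bit_size) - 1);
}

/* Store 'value' into the field of 'word' described by size and offset. */
uint64_t insert_bits(uint64_t word, uint64_t value,
		     unsigned int bit_size, unsigned int bit_offset)
{
	uint64_t mask = field_mask(bit_size);

	assert(bit_offset < 64);
	return (word & ~(mask << bit_offset)) | ((value & mask) << bit_offset);
}
```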