This is almost a total rewrite of the macro expansion. It now does
lazy copying of the arguments, based on the realization that if an
argument shows up only once in the macro expansion (quite common), we
don't need to allocate a new copy of the argument token stream; we can
just insert it as-is.
I hope it's reasonably clean and readable; review would obviously be welcome.
|
|
Ugh. Line-splicing was badly b0rken and overcomplicated to boot.
Moved it to nextchar(); nextchar() is split into a fast inlined version
and the full (non-inline) one it falls back on.
See comments in the testcases below
|
|
If we get a newline in the middle of a comment, stream->pos.newline
is set and remains set. In effect, newlines in the middle of multiline
comments drift to the end of the comment. This is wrong - e.g.
#define A /* foo
*/ B
is 100% legitimate - since every comment is treated as if it were replaced
with a single space, the above is equivalent to
#define A B
The current code treats it as
#define A
B
which is bogus. The fix is trivial - we simply restore the ->newline we had
at the beginning of the comment once we finish skipping it.
BTW, I'm starting to put new testcases into subdirectories - by translation
phase (multibyte character mapping == phase 1, line-splicing == phase 2,
tokenizer == phase 3, macro-expansion == phase 4, remapping from source
charset to execution charset == phase 5, merging adjacent string constants
== phase 6, conversion of tokens from preprocessor to compiler ones and
translation proper == phase 7, linking == phase 8). We obviously don't
have many of those (no multibyte handling, the source and execution charsets
are identical, and having no backend we do not link anything),
but it's easier to keep track of what's what in the tests that way. We
already have way too many preprocessor<n>.c in there and going for saner
names will only create confusion between preprocessor and parser tests.
I'm not moving existing testcases - that's just for new ones...
|
|
This does two things:
a) get C99 varargs to use the same mechanism as the gcc ones - all
we need is to internally convert the final ... in the list to ident[__VA_ARGS__]
and set its ->next to &variable_argument. After that we don't need any
special-casing for them.
b) move parsing of the argument list into a separate function and do
it with all sanity checks; it normally ends up with fewer tests than the old
variant.
|
|
> 1) no ## in the beginning or end of body. [6.10.3.3(1)]
> It's a warning, recovery consists in dropping them at #define time.
Fixed in the patch I'm testing, and I was wrong about the proper recovery -
better to ignore the entire definition than to go on with a mangled one.
> 2) ## that comes from argument does *not* have a special meaning;
> only one from the body does. [6.10.3.3(3)]
Fixed.
> 3) assuming left-to-right order of evaluation for ## (we are free
> to choose any [6.10.3.3(3)] and left-to-right is much easier when we get
> to merging tokens), if ## immediately follows another ## it can be treated
> as not having a special meaning (it will be merged with whatever precedes
> the first one and even if result of merge will be "##", it won't have a
> special meaning). IOW, in
> a ## b ## ## c ## ## ## d
> the 1st, 2nd, 4th and 6th ## will concatenate; 3rd and 5th ones will be
> merge arguments and won't merge anything themselves.
Fixed. The rest is not - it will be a separate patch. BTW, there's another
couple of bugs - see preprocessor9.c and preprocessor10.c for the stuff that
triggers them.
Parsing of macro bodies should be OK by now; the only missing check is
"the only place where __VA_ARGS__ can occur is in the body of macro with
argument list ending on ellipsis" and that will be dealt with later.
We get three new kinds of tokens - TOKEN_QUOTED_ARGUMENT (unexpanded arguments
next to ##), TOKEN_STR_ARGUMENT (#<argument>) and TOKEN_CONCAT (the
concatenation operator). When we go through the body of a macro at #define
time, we
	a) decide which ## will be operators there (see (3) above) and set
the type of those to TOKEN_CONCAT; retokenize() now looks for that
token_type() instead of doing match_op().
b) collapse OP[#] IDENT[argument] to a single token -
STR_ARGUMENT[argnum]. expand_arguments() looks for that token type instead
of messing with match_op() + check that the next one is an argument.
	c) mark the rest of the arguments either as QUOTED_ARGUMENT or
MACRO_ARGUMENT, depending on whether they have a concatenation as a
neighbor or not; expand_arguments() looks for that to decide whether
expand_one_arg() should do expansion when substituting (instead of
messing with checking neighbors).
All that stuff is done in one pass and comes pretty much for free -
see handle_define() for details. And we get a nice payoff at expansion
time for that (we will get more when we get to collecting information on
argument uses).
preprocessor{8,9,10}.c added; the first one shows the stuff fixed by
now; other two are still not handled correctly.
I don't see any regression - neither on validation/* stuff nor on full
kernel build.
|
|
We taint identifiers when we are done with argument expansion; we
untaint them when the scan gets through the "untaint" token we leave right
after the body of the macro; we mark tokens noexpand in three situations:
1) when we copy a token marked noexpand;
2) when we get to expanding a token and see that its identifier is
currently tainted;
3) when we scan for closing ) and see an identifier token with
currently tainted identifier.
That makes sure that by the time we get past the expanded body (and
untaint the left-hand side of macro), all tokens bearing that identifier
will be seen and marked noexpand.
There is a more elegant variant (it's practically straight from the text of
the standard - set noexpand only in dup_token() and make it
if (token_type(token) == TOKEN_IDENT)
alloc->noexpand = token->noexpand | token->ident->tainted;
which is about as close to a transliteration of 6.10.3.4(2) as it gets), but
before we can go for it, we need to sort out another problem - the order of
expansion/copying for arguments.
So for now here's an equivalent, heavier variant with a wart ((3) above)
that works even with the current expand_arguments(); when that gets sorted
out we'll be able to make the anti-recursion logic absolutely straightforward.
And yes, it does preprocessor1.c correctly; it even gets the much subtler
preprocessor7.c right.
|
|
|
|
|
|
|
|
This time we (again) have a bug where we optimize
the case of a zero member offset, and we get the
type wrong.
|
|
verifies that we're doing the proper integer
promotions for normal integer binary operations.
|
|
|
|
|
|
|
|
Now, we just assume it's the ABI size (32 bits) rather than
caring about the distinction between what we want to return up the
parse-tree stack (possibly not 32 bits) and what the ABI gives
us (32 bits).
Also, add new file validation/test-be.c, to house a growing
collection of basic sanity checks I can run, to help avoid
regressions.
|
|
which we get badly wrong (it's used by pcinames.c).
|
|
the current brokenness in type generation and checking.
Run
./check validation/type1.c
to see these totally bogus error messages:
warning: validation/type1.c:23:15: incorrect type in initializer (different base types)
warning: validation/type1.c:23:15: expected char const *s
warning: validation/type1.c:23:15: got struct hello *arg
This bug causes several bogus warnings in the kernel.
|
|
One is interesting (preprocessor3.c), and we get it wrong.
Surprise surprise.
|
|
We actually pass one of them, surprise surprise.
I should add a few more that highlight the known
problems in token pasting etc, but I'm lazy. The
C preprocessor is a bitch.
|