diff options
| author | Christopher Li <sparse@chrisli.org> | 2013-02-12 23:01:45 -0800 |
|---|---|---|
| committer | Christopher Li <sparse@chrisli.org> | 2013-02-13 14:55:26 -0800 |
| commit | 1b8e012d10d2a5af2d4935e4a47df9c527399219 (patch) | |
| tree | eb5c93ce49e6bc718f6da55fdb59c85df7a7f011 /Makefile | |
| parent | 6558e30ec635e26e767cee027936a0d0cae79bcb (diff) | |
| parent | 3dbed8ac24a2b4b24bc9776d89ea5328f1424a63 (diff) | |
| download | sparse-dev-1b8e012d10d2a5af2d4935e4a47df9c527399219.tar.gz | |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/sparse into marge
Pull preprocessor fix from Al Viro.
1) we really, really shouldn't convert escape sequences too early;
#define A(x) #x
A('\12')
should yield "'\\12'", *not* "'\\n'".
2) literal merging handles all sequences of string/wide string
literals; result is wide if any of them is wide. string_expression() is
handling that wrong - "ab"L"c" is L"abc"
3) with support (no matter how cursory) of wide char constants
and wide string literals, we really ought to handle
#define A(x,y)
A(L,'a')
properly; it's not that tricky - combine() needs to recognize <IDENT["L"],CHAR>
and <IDENT["L"],STRING> pairs.
4) '\777' is an error, while L'\777' is valid - the value should fit
into unsigned char or unsigned counterpart of wchar_t. Note that for string
literals this happens *after* phase 6 - what matters is the type of literal
after joining the adjacent ones (see (2) above).
5) stringifying should only quote \ and " in character constants and
string literals,
#define A(x) #x
A(\n)
should produce "\n", not "\\n"
6) we are losing L when stringifying wide string literals; that's
wrong.
I've patches hopefully fixing the above. Basically, I delay interpreting
escape sequences (past the bare minimum needed to find where the token ends)
until we are handling an expression with character constant or string literal
in it.
For character constants I'm keeping the token body in token->embedded -
4-character array replacing token->character. That covers practically
all realistic instances; character constant *may* be longer than that,
but it has to be something like '\x000000000000000000000000041' - sure,
that's 100% legitimate C and it's going to be the same as '\x41' on
everything, but when was the last time you've seen something like that?
So I've split TOKEN_CHAR into 5 values - TOKEN_CHAR+1--TOKEN_CHAR+4 meaning
1--4 characters kept in ->embedded[], TOKEN_CHAR itself used for extremely
rare cases longer than that (token->string holds the body in that case).
TOKEN_WIDE_CHAR got the same treatment.
AFAICS, with those fixes we get the same behaviour as in gcc for
silently ignored by cpp if the string/char constant doesn't make it
out of preprocessor. sparse still warns about those. The situation
with this one is frustrating; on one hand C99 is saying that e.g.
'\x' is not a token. Moreover, in a footnote in 6.4.4.4 it flat-out
requires diagnostics for such. On the other hand... footnotes are
informative-only and having "other character" token match ' would
puts us in nasal daemon country, so gcc is free to do whatever it feels
like doing. I think we shouldn't play that kind of standard-lawyering
*and* sparse has always warned on that, so I've left that warning
in place.
Note that real wchar_t handling is still not there; at the very least,
we need to decide what type will be used for that sucker (for gcc it's
int on all targets we care about), fix the handling of wide string literals
in initializers and evaluate_string() and stop dropping upper bits in
get_string_constant(). That would probably mean not using struct string
for wide ones, as well... Hell knows; I don't want to touch that right
now. If anything, I'd rather wait until we get to C11 support - they've
got much saner variants of wide strings there (char16_t/char32_t with
u and U as token prefix as L is used for wchar_t; there's also u8"..." for
UTF8 strings).
Diffstat (limited to 'Makefile')
| -rw-r--r-- | Makefile | 2 |
1 files changed, 1 insertions, 1 deletions
@@ -93,7 +93,7 @@ LIB_H= token.h parse.h lib.h symbol.h scope.h expression.h target.h \ LIB_OBJS= target.o parse.o tokenize.o pre-process.o symbol.o lib.o scope.o \ expression.o show-parse.o evaluate.o expand.o inline.o linearize.o \ - sort.o allocate.o compat-$(OS).o ptrlist.o \ + char.o sort.o allocate.o compat-$(OS).o ptrlist.o \ flow.o cse.o simplify.o memops.o liveness.o storage.o unssa.o dissect.o LIB_FILE= libsparse.a |
