# Clang Migration Notes

NDK r17 was the last version to include GCC. If you're upgrading from an old NDK
and need to migrate to Clang, this doc can help.

If you maintain a custom build system, see the [Build System Maintainers]
documentation.

[Build System Maintainers]: ./BuildSystemMaintainers.md

## `-Oz` versus `-Os`

[Clang Optimization Flags](https://clang.llvm.org/docs/CommandGuide/clang.html#code-generation-options)
has the full details, but if you used `-Os` to optimize your
code for size with GCC, you probably want `-Oz` when using
Clang. Although `-Os` attempts to make code small, it still
enables some optimizations that will increase code size (based on
https://stackoverflow.com/a/15548189/632035). For the smallest possible
code with Clang, prefer `-Oz`. With `-Oz`, Chromium actually saw both
size *and* performance improvements when moving to Clang compared to
`-Os` with GCC.

## `__attribute__((__aligned__))`

Normally the `__aligned__` attribute is given an explicit alignment,
but with no value it means “maximum alignment”. The interpretation of
“maximum” differs between GCC and Clang because Clang includes vector
types too. For ARM, GCC thinks the maximum alignment is 8 (for `uint64_t`), but
Clang thinks it’s 16 (because there are NEON instructions that require
16-byte alignment). Normally this shouldn’t matter because malloc is
always at least 16-byte aligned, and mmap regions are page (4096-byte)
aligned. Most code should either specify an explicit alignment or use
[alignas](http://en.cppreference.com/w/cpp/language/alignas) instead.
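
As a minimal sketch of the advice above, the following contrasts the bare attribute with an explicit `alignas`. The struct names, the buffer size, and the `implicit_alignment` helper are hypothetical and chosen only for illustration. Only the explicitly aligned type has an alignment that is stated in the source rather than left to the compiler's notion of "maximum":

```c++
#include <cstddef>
#include <cstdint>

// Hypothetical buffer type, for illustration only.
// The bare attribute asks for the compiler's "maximum" alignment, which is
// the value GCC and Clang can disagree on.
struct ImplicitlyAligned {
    std::uint8_t bytes[64];
} __attribute__((__aligned__));

// Stating the alignment explicitly removes the dependence on the compiler.
struct alignas(16) ExplicitlyAligned {
    std::uint8_t bytes[64];
};

static_assert(alignof(ExplicitlyAligned) == 16,
              "explicit alignment does not depend on the compiler");

// Reports whatever alignment the current compiler chose for the bare attribute.
constexpr std::size_t implicit_alignment() {
    return alignof(ImplicitlyAligned);
}
```

Printing `implicit_alignment()` from builds made with each compiler is one way to check whether a given type is affected by the difference.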

## `-Bsymbolic`

When targeting Android (but no other platform), GCC passed
[-Bsymbolic](ftp://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_3.html)
to the linker by default. This is not a good default, so Clang does not
do that. `-Bsymbolic` causes the following behavior change:

```c++
// foo.cpp
#include <iostream>

void foo() {
    std::cout << "Goodbye, world" << std::endl;
}

void bar() {
    foo();
}
```

```c++
// main.cpp (links against a shared library built from foo.cpp)
#include <iostream>

extern void bar();

void foo() {
    std::cout << "Hello, world\n";
}

int main(int, char**) {
    foo();  // Prints "Hello, world"
    bar();  // Without -Bsymbolic, prints "Hello, world". With -Bsymbolic, prints "Goodbye, world".
}
```

In addition to differing from the default behavior on every other
platform, this prevents symbol interposition (used by tools such
as ASan).

You might however wish to manually add `-Bsymbolic` back because it can
result in smaller ELF files, since fewer relocations are needed. If you
do want the non-`-Bsymbolic` behavior but would like fewer relocations,
that can be achieved via `-fvisibility=hidden` (and manually exporting
the symbols you want to be public, using the `JNIEXPORT` macro in JNI
code or `__attribute__ ((visibility("default")))` otherwise). Linker
version scripts are an even more powerful mechanism for controlling
exported symbols, but harder to use.
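
A minimal sketch of the `-fvisibility=hidden` approach follows. The `MYLIB_EXPORT` macro and both function names are hypothetical, made up for this illustration; JNI code would use the macro that `<jni.h>` provides instead. When the library is compiled with `-fvisibility=hidden`, only the annotated function should remain visible to other libraries:

```c++
#include <string>

// Hypothetical macro name, chosen for this sketch.
#define MYLIB_EXPORT __attribute__((visibility("default")))

// Not annotated: with -fvisibility=hidden this helper is hidden, so calls to
// it from inside the library bind directly instead of going through a
// relocation.
std::string make_greeting(const std::string& name) {
    return "Hello, " + name;
}

// Explicitly exported: stays visible to the app and to other libraries.
MYLIB_EXPORT std::string greet(const std::string& name) {
    return make_greeting(name) + "!";
}
```

Building this as a shared library with `-fvisibility=hidden` and listing the dynamic symbols (for example with `nm -D`) is one way to confirm which functions are exported.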

## Assembler issues

For many years the problem of adjusting inline assembler to work with
LLVM could be punted down the road by using `-fno-integrated-as` to fall
back to the GNU Assembler (GAS). With the removal of GNU binutils from
the NDK, such issues will now need to be addressed. We’ve collected
some of the most common issues and their solutions/workarounds here.

### `.arch` or `.arch_extension` scope with `__asm__`
GAS doesn’t scope `.arch` or `.arch_extension`, so you can have a global
`__asm__(".arch foo")` that applies to the whole C/C++ source file,
just like a bare `.arch` or `.arch_extension` directive would in a .S
file. LLVM scopes these to the specific `__asm__` in which they occur,
so you’ll need to adapt your inline assembler, or build the whole file
for the relevant arch variant.

### ARM `ADRL`
GAS lets you use the `ADRL` pseudoinstruction to get the address of
something too far away for a regular `ADR` to reference. `ADRL`
expands to two instructions, and LLVM doesn’t support it,
so you’ll need to use a macro something like this instead:
```
  .macro ADRL reg:req, label:req
  add \reg, pc, #((\label - .L_adrl_\@) & 0xff00)
  add \reg, \reg, #((\label - .L_adrl_\@) - ((\label - .L_adrl_\@) & 0xff00))
  .L_adrl_\@:
  .endm
```

### ARM assembler syntactical strictness
While GAS supports the older divided and newer unified syntax (selectable
via `.syntax unified` and `.syntax divided`), LLVM only supports the
newer unified syntax.

As an example of where this matters, `LDR` has an optional type and the
optional condition code allowed on all instructions. GAS allows these
to come in either order when using divided syntax, but LLVM only allows
them in the canonical order given in the ARM instruction reference (which
is what “unified” syntax means). So continuing this example, GAS
accepts both `LDRBEQ` and `LDREQB`, but LLVM only accepts `LDRBEQ` (with
the condition code at the end, as the instruction appears in the manual).

Most people use this order anyway, but you’ll have to rearrange
any instructions that don’t use the canonical order.

### ARM assembler implicit operands
Some ARM instructions have restrictions that make some operands
implicit. For example, the two target registers supplied to `LDREXD`
must be consecutive. GAS would allow you to write `LDREXD R1, [R4]`
because the other register _must_ be `R2`, but LLVM requires both
registers to be explicitly stated, in this case `LDREXD R1, R2, [R4]`.

### ARM `.arm` or `.code 32` alignment
Switching from Thumb to ARM mode implicitly forces 4-byte alignment
with GAS but doesn’t with LLVM. You may need to use an explicit
`.align`/`.balign`/`.p2align` directive in such cases.

### No `--defsym` command-line option
Both GAS and LLVM implement their own conditional assembly mechanism with
`.if`...`.endif` rather than the C preprocessor’s `#if`...`#endif`. The
equivalent of `-DA=B` for `.if` is `-Wa,-defsym,A=B`, but GAS allowed
`--defsym` instead of `-defsym`. LLVM requires `-defsym`.

You might also prefer to just use the C preprocessor. If your assembly
is in a .S file it is already being preprocessed. If your assembly
is in a file with any other extension (including `.s`; this is the
difference between `.s` and `.S`), you’ll need to either rename it to
`.S` or use the `-x assembler-with-cpp` flag to the compiler to override
the file extension-based guess.

### No `.func`/`.endfunc`
GAS ignores a request for obsolete STABS debugging information to be
emitted using `.func` and `.endfunc`. Neither GAS nor LLVM actually
supports STABS, but LLVM rejects these meaningless directives. The fix
is simply to remove them.
158is simply to remove them.