elf-tls.md - OpenGrok cross reference for /aosp12/bionic/docs/elf-tls.md

Lines Matching refs:TLS
1 # Android ELF TLS (Draft)
12 ELF TLS is a system for automatically allocating thread-local variables with cooperation among the
21 At run-time, TLS variables are allocated on a module-by-module basis, where a module is a shared
22 object or executable. At program startup, TLS for all initially-loaded modules comprises the "Static
23 TLS Block". TLS variables within the Static TLS Block exist at fixed offsets from an
25 few instructions. TLS variables belonging to dlopen'ed shared objects, on the other hand, may be
30 Ulrich Drepper's ELF TLS document specifies two ways of organizing memory pointed at by the
33 ![TLS Variant 1 Layout](img/tls-variant1.png)
35 ![TLS Variant 2 Layout](img/tls-variant2.png)
37 Variant 1 places the static TLS block after the TP, whereas variant 2 places it before the TP.
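
As a rough illustration of the two layouts, the sketch below computes the TP-relative offset at which the first module's TLS segment would start under each variant. The helper names (`round_up`, `variant1_offset`, `variant2_offset`) are hypothetical, and the sketch assumes a single module, a power-of-two alignment, and a TCB that begins exactly at the TP; real implementations handle more cases.

```c
#include <stddef.h>

/* Round value up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t value, size_t align) {
    return (value + align - 1) & ~(align - 1);
}

/* Variant 1: the TCB starts at the TP and the static TLS block follows it,
 * so the segment begins at the first suitably aligned offset past the TCB. */
static ptrdiff_t variant1_offset(size_t tcb_size, size_t seg_align) {
    return (ptrdiff_t)round_up(tcb_size, seg_align);
}

/* Variant 2: the static TLS block ends at the TP, so the segment begins at a
 * negative offset equal to its size rounded up to its alignment. */
static ptrdiff_t variant2_offset(size_t seg_size, size_t seg_align) {
    return -(ptrdiff_t)round_up(seg_size, seg_align);
}
```

For example, with a 16-byte TCB and a 64-byte-aligned segment, the variant 1 sketch places the segment at TP+64, while a 20-byte, 8-byte-aligned segment under variant 2 starts at TP-24.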
40 an executable, the linker needs to know where an executable's TLS segment is relative to the TP so
41 it can correctly relocate TLS accesses. Both variants are incompatible with Bionic's current
44 Each thread has a "Dynamic Thread Vector" (DTV) with a pointer to each module's TLS block (or NULL
45 if it hasn't been allocated yet). If the executable has a TLS segment, then it will always be module
57 When a C/C++ file references a TLS variable, the toolchain generates instructions to find its
58 address using a TLS "access model". The access models trade generality against efficiency. The four
66 A TLS variable may be in a different module than the reference.
70 A GD access can refer to a TLS variable anywhere. To access a variable `tls_var` using the
71 "traditional" non-TLSDESC design described in Drepper's TLS document, the compiler emits a
105 module's TLS block. Before it can do this, it ensures that the module's TLS block is allocated. A
108 1. If the current thread's DTV generation count is less than the current global TLS generation, then
116 musl, on the other hand, preallocates TLS memory in `pthread_create` and in `dlopen`, and each can report
121 LD is a specialization of GD that's useful when a function has references to two or more TLS
124 module's TLS block, then adds each variable's DTPOFF to the result.
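
The DTV lookup that GD and LD accesses depend on can be sketched as follows. This is a simplified, single-threaded model rather than Bionic's or glibc's actual code: the `Dtv` and `TlsIndex` types, the fixed `MAX_MODULES` limit, the module-size table, and `tls_get_addr_sketch` are all hypothetical names introduced for illustration, and the generation check only records the new generation instead of resizing the DTV or freeing blocks of unloaded modules.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_MODULES 8 /* hypothetical fixed limit for this sketch */

/* Argument in the style of the traditional ABI: module ID plus DTPOFF. */
typedef struct {
    size_t module_id; /* 1-based module ID */
    size_t offset;    /* offset of the variable within the module's block */
} TlsIndex;

/* Per-thread vector: a generation count and one block pointer per module. */
typedef struct {
    size_t generation;
    void *blocks[MAX_MODULES + 1]; /* NULL until the block is allocated */
} Dtv;

/* Hypothetical global state: current generation and per-module block sizes. */
static size_t g_tls_generation = 1;
static size_t g_module_size[MAX_MODULES + 1] = {0, 64, 128};

/* Return the address of the variable described by ti for this thread's DTV.
 * The caller must pass a valid module ID. */
static void *tls_get_addr_sketch(Dtv *dtv, const TlsIndex *ti) {
    if (dtv->generation < g_tls_generation) {
        /* A real implementation would refresh or resize the DTV here. */
        dtv->generation = g_tls_generation;
    }
    void **slot = &dtv->blocks[ti->module_id];
    if (*slot == NULL) {
        /* Lazily allocate a zero-filled block on first access. */
        *slot = calloc(1, g_module_size[ti->module_id]);
        if (*slot == NULL) abort();
    }
    return (char *)*slot + ti->offset;
}
```

In this model an LD access would call the function once with offset 0 to get the module's block, then add each variable's DTPOFF to the returned pointer.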
154 variable with a dynamic initializer has an associated TLS guard variable.)
158 If the variable is part of the Static TLS Block (i.e. the executable or an initially-loaded shared
186 LE is a specialization of IE. If the variable is not just part of the Static TLS Block, but is also
207 # Shared Objects with Static TLS
209 Shared objects are sometimes compiled with `-ftls-model=initial-exec` (i.e. "static TLS") for better
211 objects using static TLS can't be loaded with `dlopen` unless libc has reserved enough surplus
212 memory in the static TLS block. glibc reserves a kilobyte or two (`TLS_STATIC_SURPLUS`) with the
213 intent that only a few core system libraries would use static TLS. Non-core libraries also sometimes
216  * web search: [`"dlopen: cannot load any more object with static TLS"`][glibc-static-tls-error]
218 Neither musl nor the Bionic TLS prototype currently allocates any surplus TLS memory.
220 In general, supporting surplus TLS memory probably requires maintaining a thread list so that
221 `dlopen` can initialize the new static TLS memory in all existing threads. A thread list could be
222 omitted if the loader only allowed zero-initialized TLS segments and didn't reclaim memory on
228 …s-error]: https://www.google.com/search?q=%22dlopen:+cannot+load+any+more+object+with+static+TLS%22
230 # TLS Descriptors (TLSDESC)
232 The code fragments above match the "traditional" TLS design from Drepper's document. For the GD and
233 LD models, there is a newer, more efficient design that uses "TLS descriptors". Each TLS variable
261 The dynamic loader fills in the TLS descriptors. For a reference to a variable allocated in the
262 Static TLS Block, it can use a simple resolver function:
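
A minimal sketch of the idea, not the document's own listing: a descriptor pairs a resolver function with an argument, and for a variable in the Static TLS Block the resolver can simply return the precomputed TP-relative offset stored in the argument. The type and function names here are hypothetical, the thread pointer is passed in explicitly so the example stays portable, and the real calling convention (which passes the descriptor address in a register) is not modeled.

```c
#include <stddef.h>

/* Hypothetical descriptor: the loader fills in both fields at load time. */
typedef struct TlsDescriptor {
    ptrdiff_t (*resolver)(const struct TlsDescriptor *);
    ptrdiff_t arg;
} TlsDescriptor;

/* Static case: arg already holds the variable's offset from the TP. */
static ptrdiff_t resolve_static(const TlsDescriptor *desc) {
    return desc->arg;
}

/* What the access sequence does conceptually: call the resolver through the
 * descriptor, then add the returned offset to the thread pointer. */
static void *tlsdesc_access(const TlsDescriptor *desc, char *tp) {
    return tp + desc->resolver(desc);
}
```

Because the static resolver does no allocation and takes no locks, an access through it costs little more than an indirect call.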
289    need to use an atomic or synchronized access of the global TLS generation counter.
317 The loader needs to allocate a table of `TlsDescDynamicArg` objects for each TLS module with dynamic
322 The traditional TLS design is implemented everywhere, but the TLSDESC design has less toolchain
333 The (static) linker frequently has more information about the location of a referenced TLS variable
334 than the compiler, so it can "relax" TLS accesses to more efficient models. For example, if an
336 to IE or LE. To relax a TLS access, the linker looks for an expected sequence of instructions and
364 arm32 linkers can't relax traditional TLS accesses. BFD can relax an arm32 TLSDESC access, but LLD
369 Calling `dlsym` on a TLS variable returns the address of the current thread's variable.
377 gdbserver. We will need to implement at least 2 APIs in `libthread_db.a` to find TLS variables, and
381  * Reference: [Currently unimplemented TLS functions in Android's libthread_db][libthread_db.c]
388 LLDB more-or-less implemented Linux TLS debugging in [r192922][rL192922] ([D1944]) for x86 and
389 x86-64. [arm64 support came later][D5073]. However, the Linux TLS functionality no longer does
401 Both debuggers need metadata from the threading library (`libc.so` / `libpthread.so`) to find TLS
405 > Drepper's TLS ELF ABI document, so we can easily write code to decode it ourselves. The only
409 > libthread_db's algorithm itself. We thereby get cross-platform TLS lookup without either requiring
418 `_thread_db_dtv_t_pointer_val`    | Offset within a DTV slot to the pointer to the allocated TLS bl…
419 …_link_map_l_tls_modid` | Offset of a `link_map` field containing the module's 1-based TLS module ID
425  * Find the `link_map` object and module-relative offset for a TLS variable.
426  * Use `_thread_db_link_map_l_tls_modid` to find the TLS variable's module ID.
449 C/C++ TLS variables are declared with a specifier:
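
As a small illustration, the sketch below uses the C11 `_Thread_local` specifier (GCC and Clang also accept `__thread`, and C++11 spells it `thread_local`). The variable and function names are hypothetical. Each thread gets its own copy of `tls_counter`, so increments made in a worker thread are not visible to the thread that created it.

```c
#include <pthread.h>
#include <stddef.h>

/* One independent copy of this variable exists per thread. */
static _Thread_local int tls_counter = 0;

/* Increment this thread's copy three times and report its final value. */
static void *worker(void *arg) {
    int *result = arg;
    for (int i = 0; i < 3; i++) {
        tls_counter++;
    }
    *result = tls_counter;
    return NULL;
}

/* Run worker on a new thread; returns its final counter, or -1 on error. */
static int run_worker(void) {
    pthread_t thread;
    int result = -1;
    if (pthread_create(&thread, NULL, worker, &result) != 0) {
        return -1;
    }
    pthread_join(thread, NULL);
    return result;
}
```

If the calling thread sets `tls_counter` to some value before calling `run_worker()`, the worker still starts from the initializer (0), and the caller's copy is unchanged afterwards.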
458 TLS (or emutls), so this design document mostly ignores it. Like emutls, ELF TLS variables either
462 efficient in C++ than `thread_local` when the compiler cannot see the definition of a declared TLS
468 ELF TLS isn't implemented on older Android platforms, so dynamic executables and shared objects
472 Static executables aren't a problem--the necessary runtime support is part of the executable, so TLS
476  * On arm32, x86, and x86_64, the loader [should reject a TLS relocation]. (XXX: I haven't verified
478  * On arm64, the primary TLS relocation (R_AARCH64_TLSDESC) is [confused with an obsolete
480  * Android P [added compatibility checks] for TLS symbols and `DT_TLSDESC_{GOT|PLT}` entries.
482 XXX: A dynamic executable using ELF TLS would have a PT_TLS segment and no other distinguishing
487 [should reject a TLS relocation]: https://android.googlesource.com/platform/bionic/+/android-8.1.0_…
494 There is an [ELF TLS prototype] uploaded on Gerrit. It implements:
495  * Static TLS Block allocation for static and dynamic executables
496  * TLS for dynamically loaded and unloaded modules (`__tls_get_addr`)
500  * `dlsym` of a TLS variable
503 [ELF TLS prototype]: https://android-review.googlesource.com/q/topic:%22elf-tls-prototype%22+(statu…
507 The loader exposes a list of TLS modules ([`struct TlsModules`][TlsModules]) to `libc.so` using the
510 iterates its module list to lazily allocate and free TLS blocks.
516 ## TLS Allocator
518 The prototype currently allocates a `pthread_internal_t` object and static TLS in a single mmap'ed
519 region, along with a thread's stack if it needs one allocated. It doesn't place TLS memory on a
526 There are three "entry points" to dynamically locate a TLS variable's address:
535 The prototype currently allows for arbitrarily-large TLS variable alignment. IIRC, different
536 implementations (glibc, musl, FreeBSD) vary in their level of respect for TLS alignment. It looks
553 The prototype lazily allocates TLS memory for dlopen'ed modules (see `__tls_get_addr`), and an
554 out-of-memory error on a TLS access aborts the process. musl, on the other hand, preallocates TLS
563 solib's TLS variables. Drepper makes this argument in his TLS document:
575 ## ELF TLS Not Usable in libc
577 The dynamic loader currently can't use ELF TLS, so any part of libc linked into the loader (i.e.
578 most of it) also can't use ELF TLS. It might be possible to lift this restriction, perhaps with
583 ## Bionic Memory Layout Conflicts with Common TLS Layout
585 Bionic already allocates thread-specific data in a way that conflicts with TLS variants 1 and 2:
586 ![Bionic TLS Layout in Android P](img/bionic-tls-layout-in-p.png)
588 TLS variant 1 allocates everything after the TP to ELF TLS (except the first two words), and variant
610    by GCC on x86 (and x86-64), where it is compatible with x86's TLS variant 2. We [modified Clang
635     * [On x86/x86-64 Darwin, Go uses a TLS slot reserved for both Go and Wine][go-darwin-x86] (On
641    into the host architecture. TLS accesses in the app solib (whether ELF TLS, Bionic slots, or
642    `pthread_internal_t` fields) become host accesses. Laying out TLS memory differently across
652 The TLS prototype currently uses a patched LLD that uses a variant 1 TLS layout with a 16-word TCB
655 Aside: gcc's arm64ilp32 target uses a 32-bit unsigned offset for a TLS IE access
657 variant 2 TLS, we might need to change the compiler to emit a sign-extending load.
673  * arm64: requires either subtle reinterpretation of a TLS relocation or addition of a new
675  * arm64: a new TLS relocation reduces compiler/assembler compatibility with non-Android
680  * When linking an executable, the static linker needs to know how TLS is allocated because it
723 `R_AARCH64_TLSLE_SUB_TPREL_HI12` relocation, and Clang would use a different TLS LE instruction
730    We might want to mark the binaries somehow to indicate the non-standard TLS ABI. Suggestion:
744 Pros: Minimal linker change, no change to TLS relocations.
750 TP-to-TLS-segment offset.
780 The layout conflict is apparently only a problem because an executable assumes that its TLS segment
782 shared object can still use the efficient IE access model, but its TLS segment offset is known at
784 LE, then the Bionic loader can place the executable's TLS segment at any offset from the TP, leaving
798  * LLD should abort if it sees a TLS LE relocation.
812 As a temporary compatibility hack, we might try to keep these programs running by reserving a TLS
822 ### Workaround for Go: place pthread keys after the executable's TLS
825 AOSP hikey960 build, only `/system/bin/netd` has a TLS segment, and it's only 32 bytes. As long as
826 `/system/bin/app_process{32,64}` limits its use of TLS memory, then the pthread keys could be
827 allocated after `app_process`' TLS segment, and Go will still find them.
830 keys (2 words per key), then `app_process` can use at most 108 words of TLS memory.
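
The 108-word figure can be reproduced with simple arithmetic. Only the 2-words-per-key factor and the 108-word result appear in the lines shown here; the 384-word scan window, the 16 reserved words, and the 130-key count are assumptions chosen to be consistent with that result, and the names are hypothetical.

```c
enum {
    GO_SCAN_WORDS = 384,    /* assumption: words Go scans from the TP */
    BIONIC_SLOT_WORDS = 16, /* assumption: words reserved for Bionic slots */
    PTHREAD_KEY_COUNT = 130, /* assumption: number of pthread keys */
    WORDS_PER_KEY = 2       /* from the text: 2 words per key */
};

/* Words left for app_process's TLS segment if the pthread keys must still
 * fall inside the window that Go scans. */
static int app_process_tls_budget_words(void) {
    return GO_SCAN_WORDS - BIONIC_SLOT_WORDS
           - PTHREAD_KEY_COUNT * WORDS_PER_KEY;
}
```

Under these assumptions the budget is 384 - 16 - 260 = 108 words.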
834 somewhere (a global variable, another TLS slot, ...). `__get_thread()` already uses a TLS slot to
845 XXX: Maybe a sanitizer would want to intercept allocations of TLS memory, and that could be hard if
853  * Ulrich Drepper's TLS document, ["ELF Handling For Thread-Local Storage."][drepper] Describes the
854    overall ELF TLS design and ABI details for x86 and x86-64 (as well as several other architectures
864    "Addendum: Thread Local Storage" has details for arm32 non-TLSDESC ELF TLS.
866  * ["ELF for the ARM® Architecture."][arm-elf] Lists TLS relocations (traditional and TLSDESC).
871  * ["ELF for the ARM® 64-bit Architecture (AArch64)."][arm64-elf] Lists TLS relocations (traditional
875 [tlsdesc-x86]: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
877 [tlsdesc-arm]: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt