TLS and shared library initialization
For my work on Rig, the ability for variables to be thread-local is of critical importance.
Both Pthreads in the POSIX world and Windows offer ways to allocate a thread-local key, from which to get/set
a pointer(-sized) value.
Using this, you can easily make any piece of data thread-local: you just allocate it on the heap and store
the address under the thread-local key. This is usually called thread-local data or run-time TLS
(thread-local storage).
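
To make the run-time approach concrete, here is a minimal sketch using the Pthreads key functions. The names counter_key, get_counter and counter_worker are made up for illustration, and error handling is kept to a minimum.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names, for illustration only. */
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

/* Key destructor: run at thread exit for every thread whose value is non-NULL. */
static void free_counter(void *p)
{
    free(p);
}

/* Creates the key exactly once, no matter how many threads race to get here. */
static void make_counter_key(void)
{
    (void)pthread_key_create(&counter_key, free_counter);
}

/* Returns the calling thread's private counter, allocating it on first use. */
int *get_counter(void)
{
    int *p;

    pthread_once(&counter_key_once, make_counter_key);
    p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p != NULL && pthread_setspecific(counter_key, p) != 0) {
            free(p);
            p = NULL;
        }
    }
    return p;
}

/* Thread entry point: increments its own counter and returns the new value. */
void *counter_worker(void *unused)
{
    int *c = get_counter();

    (void)unused;
    if (c == NULL)
        return (void *)(intptr_t)-1;
    *c += 1;
    return (void *)(intptr_t)*c;
}
```

Each thread that calls get_counter() gets its own heap-allocated int, so a value written by one thread is never seen by another. Note the pthread_once() call: it is exactly the kind of per-call initialization check that a load-time hook could make unnecessary.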
But several compilers and operating systems support extensions to the C language, so that one can declare a
variable, with some restrictions, thread-local, and access it as one usually would, without the need to
use any specific library functions or API. This is what people normally refer to as TLS (or load-time TLS if
one wants to be more precise).
If TLS is available, it is clearly the preferred alternative, as it's much easier to use, doesn't need any
special kind of initialization, and is usually faster (due to possible compiler optimizations) or at the very
least as fast as thread-local data.
The following table lists which OSes support load-time TLS, with which compilers (and their minimum versions),
and exactly which keyword is needed.
OS | load-time TLS | supported compilers | run-time TLS |
---|---|---|---|
Windows | __declspec(thread) | MSVC 2005, ICC 9.0 | Tls{Alloc,Free} |
Linux | __thread | GCC 4.1.X, Clang 2.8, ICC 9.0, Solaris Studio 12 | pthread_key_{create,delete} |
FreeBSD | __thread | GCC 4.1.X, Clang 2.8 | pthread_key_{create,delete} |
MacOS X | None | None (unsupported) | pthread_key_{create,delete} |
Solaris | __thread | GCC 3.4.X, Solaris Studio 12 | pthread_key_{create,delete} |
OpenBSD | None | None (unsupported) | pthread_key_{create,delete} |
NetBSD | None | None (segfaults) | pthread_key_{create,delete} |
AIX | __thread | IBM XL C 11.X | pthread_key_{create,delete} |
GCC, ICC, Solaris Studio and Clang all support __thread.
IBM's XL C compiler on AIX supports __thread with the -qtls option.
On Windows, VC++ and ICC both support __declspec(thread).
Support is needed in the compiler, the linker, and the C library/thread library for this to work correctly.
Here is a snippet of code: a few C defines that try to safely tell whether TLS is available.
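
What follows is a reconstruction sketch based on the table above, not necessarily the exact defines used in Rig. The macro names RIG_TLS, RIG_HAVE_TLS and RIG_TLS_OS_UNSUPPORTED are hypothetical, and the version-macro thresholds (for example _MSC_VER 1400 for MSVC 2005, __SUNPRO_C 0x590 for Solaris Studio 12, __xlC__ 0x0B00 for XL C 11) are my own mapping of the table's minimum versions, so treat them as assumptions.

```c
/* OSes for which the table lists no load-time TLS support. */
#if defined(__APPLE__) || defined(__OpenBSD__) || defined(__NetBSD__)
#  define RIG_TLS_OS_UNSUPPORTED 1
#endif

#if defined(_WIN32)
   /* MSVC 2005 or ICC 9.0 and later */
#  if (defined(_MSC_VER) && _MSC_VER >= 1400) || \
      (defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 900)
#    define RIG_TLS __declspec(thread)
#  endif
#elif !defined(RIG_TLS_OS_UNSUPPORTED)
   /* Clang and ICC also define __GNUC__, so they are checked first. */
#  if defined(__clang__)
#    if __clang_major__ > 2 || (__clang_major__ == 2 && __clang_minor__ >= 8)
#      define RIG_TLS __thread
#    endif
#  elif defined(__INTEL_COMPILER)
#    if __INTEL_COMPILER >= 900
#      define RIG_TLS __thread
#    endif
#  elif defined(__GNUC__)
     /* GCC 4.1 and later in general, GCC 3.4 and later on Solaris */
#    if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 1) || \
        (defined(__sun) && __GNUC__ == 3 && __GNUC_MINOR__ >= 4)
#      define RIG_TLS __thread
#    endif
#  elif defined(__SUNPRO_C)
#    if __SUNPRO_C >= 0x590
#      define RIG_TLS __thread
#    endif
#  elif defined(__xlC__)
     /* IBM XL C additionally needs the -qtls option. */
#    if __xlC__ >= 0x0B00
#      define RIG_TLS __thread
#    endif
#  endif
#endif

#ifdef RIG_TLS
#  define RIG_HAVE_TLS 1
/* One instance of this variable exists per thread. */
static RIG_TLS int rig_last_error;
#else
#  define RIG_HAVE_TLS 0
/* No load-time TLS: fall back to Tls{Alloc,Free} or pthread_key_{create,delete}. */
#endif
```

Code that needs a thread-local variable can then test RIG_HAVE_TLS and either declare it with RIG_TLS or go through the run-time key functions.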
Another question that came up right alongside the availability of TLS was how to have code executed correctly
when a shared library is loaded (at load-time before main() is entered, or at run-time when dlopen() or
LoadLibrary() is called) and unloaded (when returning from main() or calling exit(), or at run-time when
calling dlclose() or FreeLibrary()). Initialization and destruction code could then be run safely, both for
miscellaneous purposes, such as more involved initialization of global data, and to correctly initialize
thread-local keys using the OS/thread library functions in case TLS isn't available.
At first I used custom synchronization code, based on atomic operations, to do this implicitly: every call to
something that required initialization checked a shared variable indicating whether it had already been done.
But this is error-prone, hurts performance, and leaves the question of cleanup at unload open. Another way
would be to explicitly require the user to call an initialization routine before using any library
functionality, and a destruction routine when finished, but that approach is tedious and error-prone too.
So I figured there must be a better, standard way to do this; it seemed like such a useful and common
requirement that it would have been strange if there were no usable solution out there...
The dlopen(3) man page got me started, explaining that recent versions of GCC support the two function
attributes "constructor" and "destructor" to define initialization and destruction functions, which replace the
old approach of having functions named _init and _fini. Using function attributes also enables you to define
multiple initialization and destruction functions. Furthermore, with GCC, you can specify a priority to control
the order of execution. Clang does not support this, and I couldn't find anything indicating that Solaris
Studio or ICC support it either, so it's probably better anyway not to depend on the calling order of different
constructor/destructor functions, and to keep them independent of each other.
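
As a minimal sketch of the attribute approach, assuming a compiler that accepts the GCC __attribute__ syntax (the names rig_initialized, rig_init, rig_fini and rig_is_initialized are hypothetical):

```c
/* Hypothetical state flag: 1 between library load and unload. */
static int rig_initialized;

/* Runs when the library is loaded, before main() or before dlopen() returns. */
__attribute__((constructor))
static void rig_init(void)
{
    rig_initialized = 1;
}

/* Runs when the library is unloaded, at process exit or on dlclose(). */
__attribute__((destructor))
static void rig_fini(void)
{
    rig_initialized = 0;
}

/* Lets callers observe whether the constructor has already run. */
int rig_is_initialized(void)
{
    return rig_initialized;
}
```

The same attributes also work in an ordinary executable, where the constructor runs before main() is entered, which makes the sketch easy to try out without building a shared library first.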
Windows provides a similar mechanism using DllMain(), in which you can put code you want called at various
relevant events.
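
A rough sketch of how DllMain() can dispatch to the same kind of load/unload hooks follows. The hook names rig_on_load, rig_on_unload and rig_is_loaded are hypothetical, and the Windows-specific part is only compiled when _WIN32 is defined.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical flag: 1 while the library is initialized. */
static int rig_loaded;

/* Platform-neutral load hook. Returns 1 on success, 0 on failure. */
int rig_on_load(void)
{
    rig_loaded = 1;   /* e.g. allocate thread-local keys here */
    return 1;
}

/* Platform-neutral unload hook. */
void rig_on_unload(void)
{
    rig_loaded = 0;   /* e.g. free thread-local keys here */
}

int rig_is_loaded(void)
{
    return rig_loaded;
}

#ifdef _WIN32
BOOL WINAPI DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved)
{
    (void)instance;
    (void)reserved;

    switch (reason) {
    case DLL_PROCESS_ATTACH:
        /* Returning FALSE here makes loading the DLL fail. */
        return rig_on_load() ? TRUE : FALSE;
    case DLL_PROCESS_DETACH:
        rig_on_unload();
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
        /* Per-thread setup and cleanup would go here. */
        break;
    }
    return TRUE;
}
#endif
```

Keeping the actual work in platform-neutral hooks means the same two functions can be called from DllMain() on Windows and from constructor/destructor functions elsewhere.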
I also checked whether cleanup functions registered with atexit() would be called together with the destructors.
While this is non-standard, it can be useful and is supported by a few major C libraries.
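
The kind of registration I mean looks roughly like this: a library constructor that registers a cleanup function with atexit(). The names are hypothetical, and whether rig_cleanup() also runs when the library is unloaded with dlclose(), rather than only at process exit, depends on the C library.

```c
#include <stdlib.h>

/* Result of the atexit() call; stays -1 until the constructor has run. */
static int rig_atexit_status = -1;

/* Cleanup function that the C library is asked to call later. */
static void rig_cleanup(void)
{
    /* release global resources here */
}

/* Registers the cleanup function as soon as the library is loaded. */
__attribute__((constructor))
static void rig_register_cleanup(void)
{
    rig_atexit_status = atexit(rig_cleanup);   /* atexit() returns 0 on success */
}

/* Returns 1 if the cleanup function was registered successfully. */
int rig_cleanup_registered(void)
{
    return rig_atexit_status == 0;
}
```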
The next table summarizes my findings on all this in an easily readable format.
OS | initialization (load&run-time) | destruction (load&run-time) | atexit() on process exit | atexit() on library unload |
---|---|---|---|---|
Windows | DllMain (DLL_PROCESS_ATTACH) | DllMain (DLL_PROCESS_DETACH) | Yes | Yes |
Linux | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes | Yes (since glibc 2.2.3) |
FreeBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes | No |
MacOS X | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes | No |
Solaris | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes | Yes (since Solaris 8) |
OpenBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes | No |
NetBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes | No |
GCC, Clang and ICC directly support the __attribute__ syntax. Solaris Studio 12 does too (and seems to
translate it to the corresponding #pragma it supports), though older versions of Sun Studio may only support
#pragma init() / #pragma fini().
This requires support from both the compiler and the linker; on all tested platforms this was the case.