TLS and shared library initialization

For my work on Rig, the ability for variables to be thread-local is of critical importance.
Both Pthreads in the POSIX world and the Windows API offer ways to allocate a thread-local key, through which each thread can get and set its own pointer-sized value.
Using this, you can easily make any piece of data thread-local: just allocate it on the heap and store its address under the thread-local key. This is usually called thread-local data or run-time TLS (thread-local storage).
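As a minimal sketch of the run-time approach with Pthreads, the following keeps one heap-allocated int per thread behind a key. The names counter_key and counter_get() are made up for illustration and are not taken from Rig.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names, for illustration only. */
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

/* Key destructor: run at thread exit for each thread whose value is non-NULL. */
static void counter_free(void *p)
{
    free(p);
}

/* Creates the key exactly once, no matter how many threads race here. */
static void counter_key_init(void)
{
    if (pthread_key_create(&counter_key, counter_free) != 0)
        abort();
}

/* Returns the calling thread's private counter, allocating it on first use.
   Returns NULL if the allocation or the key store fails. */
int *counter_get(void)
{
    int *p;

    pthread_once(&counter_once, counter_key_init);
    p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p == NULL || pthread_setspecific(counter_key, p) != 0) {
            free(p);
            return NULL;
        }
    }
    return p;
}
```

Every access goes through a function call and a possible allocation, which is the overhead that load-time TLS avoids.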
But several compilers and operating systems support extensions to the C language that let you declare a variable as thread-local, with some restrictions, and access it like any other variable, without going through any specific library functions or API. This is what people normally refer to as TLS (or load-time TLS, to be more precise).
If TLS is available, it is clearly the preferred alternative: it's much easier to use, doesn't need any special kind of initialization, and is usually faster (due to possible compiler optimizations), or at the very least as fast as thread-local data.
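A small sketch of load-time TLS using the __thread keyword as spelled by GCC and Clang (on Windows it would be __declspec(thread) instead). The names tls_counter, tls_worker() and tls_demo() are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own zero-initialized copy of this variable. */
static __thread int tls_counter;

/* Hypothetical worker: modifies only the copy of the thread it runs in. */
static void *tls_worker(void *arg)
{
    (void)arg;
    tls_counter += 10;
    return NULL;
}

/* Sets the caller's copy to 1, lets another thread change its own copy,
   then returns the caller's copy. Returns -1 if no thread could be created. */
int tls_demo(void)
{
    pthread_t t;

    tls_counter = 1;
    if (pthread_create(&t, NULL, tls_worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return tls_counter;
}
```

The variable is read and written like any ordinary static, with no key creation and no heap allocation.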
The following table lists which OSes support TLS, with which compilers (at their minimum version), and exactly which keyword is needed.

OS      | load-time TLS      | supported compilers                              | run-time TLS
--------+--------------------+--------------------------------------------------+----------------------------
Windows | __declspec(thread) | MSVC 2005, ICC 9.0                               | Tls{Alloc,Free}
Linux   | __thread           | GCC 4.1.X, Clang 2.8, ICC 9.0, Solaris Studio 12 | pthread_key_{create,delete}
FreeBSD | __thread           | GCC 4.1.X, Clang 2.8                             | pthread_key_{create,delete}
MacOS X | None               | None (unsupported)                               | pthread_key_{create,delete}
Solaris | __thread           | GCC 3.4.X, Solaris Studio 12                     | pthread_key_{create,delete}
OpenBSD | None               | None (unsupported)                               | pthread_key_{create,delete}
NetBSD  | None               | None (segfaults)                                 | pthread_key_{create,delete}
AIX     | __thread           | IBM XL C 11.X                                    | pthread_key_{create,delete}

GCC, ICC, Solaris Studio and Clang all support __thread.
IBM's XL C compiler on AIX supports __thread with the -qtls option.
On Windows, VC++ and ICC both support __declspec(thread).
Support is needed in the compiler, the linker and the C library/thread library for this to work correctly.
Here is a snippet of code: a few C defines that try to safely tell whether TLS is available.
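A rough sketch of what such defines could look like, derived only from the table above. The macro names RIG_HAVE_TLS and RIG_TLS are assumptions, and the minimum compiler versions from the table are deliberately not checked here, so treat this as a starting point rather than a complete test.

```c
/* Hypothetical macro names; compiler version checks are omitted. */
#if defined(_WIN32) && (defined(_MSC_VER) || defined(__INTEL_COMPILER))
#  define RIG_HAVE_TLS 1
#  define RIG_TLS __declspec(thread)
#elif (defined(__linux__) || defined(__FreeBSD__) || \
       defined(__sun) || defined(_AIX)) && \
      (defined(__GNUC__) || defined(__clang__) || \
       defined(__INTEL_COMPILER) || defined(__SUNPRO_C) || defined(__IBMC__))
#  define RIG_HAVE_TLS 1
#  define RIG_TLS __thread
#else
/* MacOS X, OpenBSD, NetBSD and anything unknown: no load-time TLS.
   RIG_TLS is left undefined on purpose so misuse fails to compile. */
#  define RIG_HAVE_TLS 0
#endif

#if RIG_HAVE_TLS
static RIG_TLS int rig_example_value;  /* one copy per thread */
#else
static int rig_example_value;          /* shared; real code would use run-time TLS here */
#endif

/* Hypothetical user of the macros above. */
int rig_example_bump(void)
{
    return ++rig_example_value;
}
```

Leaving RIG_TLS undefined in the fallback branch is a design choice: an empty definition would silently turn a thread-local variable into a shared one.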

Another question that came up right alongside the availability of TLS was how to have code executed when a shared library is loaded (at load-time before main() is entered, or at run-time when dlopen() or LoadLibrary() is called) and unloaded (when returning from main() or calling exit(), or at run-time when calling dlclose() or FreeLibrary()). This would let initialization and destruction code run safely, both for miscellaneous purposes, such as more involved initialization of global data, and to correctly initialize thread-local keys using the OS/thread library functions in case TLS isn't available.
At first I used custom synchronization code, based on atomic operations, to do this implicitly: every call to something that required initialization checked a shared variable indicating whether it had already been done. But this is error-prone, hurts performance, and leaves the question of cleanup at unload open. Another way would be to explicitly require users to call an initialization routine before they use any library functionality, and a destruction routine when they're finished, but that approach is tedious and error-prone too.
So I figured there must be a better, standard way to do this; it seemed like such a useful and common requirement that it would have been strange if no usable solution existed.
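To make the implicit approach described above concrete, here is a sketch of a check-on-every-call initializer. It is not the original Rig code: it assumes C11 <stdatomic.h> (the original would have used other atomic primitives), and all names are hypothetical.

```c
#include <stdatomic.h>

enum { INIT_NONE, INIT_RUNNING, INIT_DONE };

static atomic_int init_state = INIT_NONE;
static int init_calls;  /* counts how often the real setup ran */

/* Stand-in for the real, more involved initialization. */
static void lib_do_init(void)
{
    init_calls++;
}

/* Must be called at the top of every function that needs initialized state. */
static void lib_ensure_init(void)
{
    int expected = INIT_NONE;

    /* Fast path: already initialized. */
    if (atomic_load_explicit(&init_state, memory_order_acquire) == INIT_DONE)
        return;

    if (atomic_compare_exchange_strong(&init_state, &expected, INIT_RUNNING)) {
        /* This thread won the race and performs the setup. */
        lib_do_init();
        atomic_store_explicit(&init_state, INIT_DONE, memory_order_release);
    } else {
        /* Another thread is initializing; spin until it is finished. */
        while (atomic_load_explicit(&init_state, memory_order_acquire) != INIT_DONE)
            ;
    }
}

/* Hypothetical library entry point; returns how often the setup has run. */
int lib_function(void)
{
    lib_ensure_init();
    return init_calls;
}
```

Even in this small form the drawbacks are visible: every entry point pays for the check, forgetting it in one place is a bug, and nothing here ever tears the state down again.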
The dlopen(3) man page got me started, explaining that recent GCC versions support the two function attributes "constructor" and "destructor" to define initialization and destruction functions, which replace the old approach of having functions named _init and _fini. Using function attributes also lets you define multiple initialization and destruction functions. Furthermore, with GCC you can specify a priority to control the order of execution. Clang does not support this, and I couldn't find anything indicating that Solaris Studio or ICC support it either, so it's probably better anyway not to depend on the calling order of different constructor/destructor functions, and to keep them independent of each other.
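A minimal sketch of the attribute syntax as accepted by GCC and Clang, without priorities. The function and variable names are hypothetical.

```c
static int lib_ready;  /* set by the constructor, cleared by the destructor */

/* Runs when the shared library is loaded: before main() for a load-time
   dependency, or inside dlopen() at run-time. */
__attribute__((constructor))
static void lib_init(void)
{
    lib_ready = 1;
}

/* Runs when the library is unloaded: at dlclose() or at process exit. */
__attribute__((destructor))
static void lib_fini(void)
{
    lib_ready = 0;
}

/* Hypothetical accessor so callers can observe the state. */
int lib_is_ready(void)
{
    return lib_ready;
}
```

The same attributes also work when the code is linked directly into an executable, in which case the constructor runs before main().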
Windows provides a similar mechanism using DllMain(), in which you can put code you want called at various relevant events.
I also checked whether cleanup functions registered with atexit() are called together with the destructors. Calling them on library unload is non-standard, but it can be useful and is supported by a few major C libraries.
The next table summarizes my findings on all this in an easily readable format.

OS      | initialization (load & run-time)      | destruction (load & run-time)        | atexit() on process exit | atexit() on library unload
--------+---------------------------------------+--------------------------------------+--------------------------+---------------------------
Windows | DllMain (DLL_PROCESS_ATTACH)          | DllMain (DLL_PROCESS_DETACH)         | Yes                      | Yes
Linux   | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | Yes (since glibc 2.2.3)
FreeBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No
MacOS X | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No
Solaris | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | Yes (since Solaris 8)
OpenBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No
NetBSD  | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No

GCC, Clang and ICC directly support the __attribute__ syntax. Solaris Studio 12 does too (and seems to translate it to the corresponding #pragma it supports); older versions or Sun Studio may only support #pragma init() / #pragma fini() though.
This requires support from both the compiler and the linker to work; on all tested platforms this was the case.

Posted by Luca Longinotti on 24 Feb 2011 at 18:00
Categories: C99, Programming

