llongi's blog

TLS and shared library initialization

For my work on Rig, the ability for variables to be thread-local is of critical importance.
Both Pthreads in the POSIX world and Windows offer ways to allocate a thread-local key, from which to get/set a pointer(-sized) value.
Using this, you can easily make any piece of data thread-local: just allocate it on the heap and store its address in the thread-local key. This is usually called thread-local data or run-time TLS (thread-local storage).
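As a rough illustration of the run-time approach on the Pthreads side, here is a minimal sketch. The names counter_key, get_counter() and counter_worker() are made up for this example, and using pthread_once() to create the key is just one possible choice, not necessarily what Rig does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical key standing in for one logical thread-local variable. */
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

static void counter_key_init(void)
{
    /* free() is called on each thread's non-NULL value when that thread exits. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's private counter, allocating it on first use. */
int *get_counter(void)
{
    pthread_once(&counter_key_once, counter_key_init);

    int *counter = pthread_getspecific(counter_key);
    if (counter == NULL) {
        counter = calloc(1, sizeof *counter);
        if (counter != NULL && pthread_setspecific(counter_key, counter) != 0) {
            free(counter);
            counter = NULL;
        }
    }
    return counter;
}

/* Thread body: add *arg to this thread's counter and return the new value. */
void *counter_worker(void *arg)
{
    int *counter = get_counter();
    if (counter == NULL)
        return (void *)(intptr_t)-1;
    *counter += *(const int *)arg;
    return (void *)(intptr_t)*counter;
}
```

Every access goes through get_counter(), which is the extra indirection and bookkeeping that run-time TLS requires.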
However, several compilers and operating systems support extensions to the C language that let you declare a variable as thread-local, with some restrictions, and access it as you would any other variable, without needing any specific library functions or API. This is what people normally mean by TLS (or load-time TLS, to be more precise).
If TLS is available, it is clearly the preferred alternative: it's much easier to use, doesn't need any special initialization, and is usually faster (due to possible compiler optimizations), or at the very least as fast as thread-local data.
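For comparison, a minimal sketch of load-time TLS with the __thread keyword as accepted by GCC and Clang. The variable and function names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* One zero-initialized copy of this variable exists per thread. */
static __thread int counter;

/* Thread body: add *arg to this thread's copy and return the new value. */
void *tls_worker(void *arg)
{
    assert(arg != NULL);
    counter += *(const int *)arg;
    return (void *)(intptr_t)counter;
}

/* Small accessors that read and write the calling thread's copy. */
void tls_set(int value) { counter = value; }
int tls_get(void) { return counter; }
```

No key creation, heap allocation or lookup call is needed: the variable is used like any other static variable.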
The following table lists which OSes support TLS, with which compilers (minimum versions) and which keyword is needed.

OS	load-time TLS	supported compilers	run-time TLS
Windows	__declspec(thread)	MSVC 2005, ICC 9.0	Tls{Alloc,Free}
Linux	__thread	GCC 4.1.X, Clang 2.8, ICC 9.0, Solaris Studio 12	pthread_key_{create,delete}
FreeBSD	__thread	GCC 4.1.X, Clang 2.8	pthread_key_{create,delete}
MacOS X	None	None (unsupported)	pthread_key_{create,delete}
Solaris	__thread	GCC 3.4.X, Solaris Studio 12	pthread_key_{create,delete}
OpenBSD	None	None (unsupported)	pthread_key_{create,delete}
NetBSD	None	None (segfaults)	pthread_key_{create,delete}
AIX	__thread	IBM XL C 11.X	pthread_key_{create,delete}

GCC, ICC, Solaris Studio and Clang all support __thread.
IBM's XL C compiler on AIX supports __thread with the -qtls option.
On Windows, VC++ and ICC both support __declspec(thread).
Support is needed in the compiler, the linker and the C library/thread library for this to work correctly.
Here is a snippet of code: a few C defines that try to safely tell whether TLS is available.
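The original defines are not reproduced at this point, so what follows is only a sketch of how such a check might look, derived from the table above. The EXAMPLE_* macro names are invented for this illustration, and AIX with XL C is left out. It assumes the usual predefined macros: _MSC_VER 1400 for MSVC 2005, __INTEL_COMPILER 900 for ICC 9.0, and __SUNPRO_C 0x590 for Studio 12.

```c
#include <assert.h>

/* Version helpers; an undefined macro evaluates to 0 inside #if. */
#define EXAMPLE_GCC_GE(maj, min) \
    (__GNUC__ > (maj) || (__GNUC__ == (maj) && __GNUC_MINOR__ >= (min)))
#define EXAMPLE_CLANG_GE(maj, min) \
    (__clang_major__ > (maj) || (__clang_major__ == (maj) && __clang_minor__ >= (min)))

#if defined(_WIN32)
#  if (defined(_MSC_VER) && _MSC_VER >= 1400) || \
      (defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 900)
#    define EXAMPLE_TLS __declspec(thread)
#  endif
#elif defined(__linux__) || defined(__FreeBSD__) || defined(__sun)
   /* Clang and ICC also define __GNUC__, so they are tested before GCC. */
#  if defined(__clang__)
#    if !defined(__sun) && EXAMPLE_CLANG_GE(2, 8)
#      define EXAMPLE_TLS __thread
#    endif
#  elif defined(__INTEL_COMPILER)
#    if defined(__linux__) && __INTEL_COMPILER >= 900
#      define EXAMPLE_TLS __thread
#    endif
#  elif defined(__SUNPRO_C)
#    if !defined(__FreeBSD__) && __SUNPRO_C >= 0x590
#      define EXAMPLE_TLS __thread
#    endif
#  elif defined(__GNUC__)
#    if (defined(__sun) && EXAMPLE_GCC_GE(3, 4)) || EXAMPLE_GCC_GE(4, 1)
#      define EXAMPLE_TLS __thread
#    endif
#  endif
#endif

#ifdef EXAMPLE_TLS
#  define EXAMPLE_HAVE_TLS 1
#else
#  define EXAMPLE_HAVE_TLS 0   /* fall back to pthread_key_* or Tls* */
#endif

#if EXAMPLE_HAVE_TLS
/* Usage: a per-thread nesting depth, declared like an ordinary variable. */
static EXAMPLE_TLS int example_depth;
void example_enter(void) { example_depth++; }
void example_leave(void) { assert(example_depth > 0); example_depth--; }
int example_get_depth(void) { return example_depth; }
#endif
```

EXAMPLE_TLS is deliberately left undefined when support is unknown, so that code cannot silently end up with an ordinary shared global.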

Another question that came up right alongside the availability of TLS was how to correctly have code executed when a shared library is loaded (at load-time before main() is entered, or at run-time when dlopen() or LoadLibrary() is called) and unloaded (when returning from main(), calling exit(), or at run-time when calling dlclose() or FreeLibrary()). This would allow initialization and destruction code to run safely, both for miscellaneous purposes, such as more involved initialization of global data, and to correctly initialize thread-local keys using the OS/thread library functions in case TLS isn't available. At first I used custom synchronization code, based on atomic operations, to do this implicitly (by checking, on each call to something that required it, a shared variable indicating whether initialization was already done), but this is error-prone, hurts performance, and leaves the question of cleanup at unload open. Another way would be to explicitly require the user to call an initialization routine before using any library functionality, and a destruction routine when finished, but that approach is tedious and error-prone too. So I figured there must be a better, standard way to do this: it seemed like such a useful and common requirement that it would have been strange if there were no usable solution out there.
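To make the implicit approach concrete, here is a sketch of what such an atomics-based check might look like using C11 <stdatomic.h>. This is not the original Rig code; all names are hypothetical, and the spin-with-yield wait is just one simple choice.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

enum { INIT_NONE, INIT_RUNNING, INIT_DONE };

static atomic_int init_state = INIT_NONE;
static int init_runs;   /* how many times the real initialization ran */

static void do_real_init(void)
{
    /* Only the thread that won the race below may get here. */
    assert(atomic_load(&init_state) == INIT_RUNNING);
    init_runs++;
}

/* Has to be called at the top of every public entry point. */
static void ensure_initialized(void)
{
    int expected = INIT_NONE;

    /* Fast path: initialization already finished. */
    if (atomic_load_explicit(&init_state, memory_order_acquire) == INIT_DONE)
        return;

    if (atomic_compare_exchange_strong(&init_state, &expected, INIT_RUNNING)) {
        /* This thread won the race and performs the initialization. */
        do_real_init();
        atomic_store_explicit(&init_state, INIT_DONE, memory_order_release);
    } else {
        /* Another thread is initializing (or just finished): wait for it. */
        while (atomic_load_explicit(&init_state, memory_order_acquire) != INIT_DONE)
            sched_yield();
    }
}

/* Example public function: pays for the check on every single call. */
int example_api_call(void)
{
    ensure_initialized();
    return init_runs;
}
```

The sketch shows the drawbacks mentioned above: every entry point needs the check, and nothing here ever runs cleanup at unload.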
The dlopen(3) man page got me started, explaining that recent GCC versions support the two function attributes "constructor" and "destructor" to define initialization and destruction functions, which replace the old approach of having functions named _init and _fini. Using function attributes also lets you define multiple initialization and destruction functions. Furthermore, with GCC, you can specify a priority to control the order of execution. Clang does not support this, and I couldn't find anything indicating that Solaris Studio or ICC support it either, so it's probably better anyway not to depend on the calling order of different constructor/destructor functions, and to keep them independent of each other.
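A minimal sketch of the attribute-based approach, assuming a compiler that accepts the GCC __attribute__ syntax. The lib_* names and the small table are hypothetical; no priorities are used, in line with the advice above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical library-global state that needs non-trivial setup. */
static int *lib_table;

/* Runs when the library is loaded: before main(), or inside dlopen(). */
__attribute__((constructor))
static void lib_init(void)
{
    lib_table = calloc(16, sizeof *lib_table);
}

/* Runs when the library is unloaded: dlclose(), return from main(), exit(). */
__attribute__((destructor))
static void lib_fini(void)
{
    free(lib_table);
    lib_table = NULL;
}

/* Public entry points can rely on lib_init() having already run. */
int lib_ready(void)
{
    return lib_table != NULL;
}

int lib_lookup(unsigned index)
{
    assert(lib_table != NULL && index < 16);
    return lib_table[index];
}
```

The same attributes also work in an ordinary executable, where the constructor runs before main() and the destructor after it returns.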
Windows provides a similar mechanism using DllMain(), in which you can put code you want called at various relevant events.
I also checked whether cleanup functions registered with atexit() would be called together with the destructors. While this is non-standard, it can be useful and is supported by a few major C libraries.
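The kind of setup being tested can be sketched as follows: a constructor that registers its cleanup through atexit() rather than a destructor. All names are hypothetical, and whether the handler also runs at library unload depends on the C library, as noted above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global buffer owned by the library. */
static char *scratch;

/* Cleanup handler: always runs at process exit, possibly also at unload. */
static void lib_cleanup(void)
{
    free(scratch);
    scratch = NULL;
}

/* Constructor that sets up the buffer and registers the handler. */
__attribute__((constructor))
static void lib_setup(void)
{
    scratch = malloc(4096);
    int rc = atexit(lib_cleanup);
    assert(rc == 0);
    (void)rc;
}

int lib_has_scratch(void)
{
    return scratch != NULL;
}
```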
The next table summarizes my findings on all this in an easily readable format.

OS	initialization (load&run-time)	destruction (load&run-time)	atexit() on process exit	atexit() on library unload
Windows	DllMain (DLL_PROCESS_ATTACH)	DllMain (DLL_PROCESS_DETACH)	Yes	Yes
Linux	function __attribute__((constructor))	function __attribute__((destructor))	Yes	Yes (since glibc 2.2.3)
FreeBSD	function __attribute__((constructor))	function __attribute__((destructor))	Yes	No
MacOS X	function __attribute__((constructor))	function __attribute__((destructor))	Yes	No
Solaris	function __attribute__((constructor))	function __attribute__((destructor))	Yes	Yes (since Solaris 8)
OpenBSD	function __attribute__((constructor))	function __attribute__((destructor))	Yes	No
NetBSD	function __attribute__((constructor))	function __attribute__((destructor))	Yes	No

GCC, Clang and ICC directly support the __attribute__ syntax. Solaris Studio 12 does too (and seems to translate it to the corresponding #pragma it supports), though older Sun Studio versions may only support #pragma init() / #pragma fini().
This requires support from both the compiler and the linker; that was the case on all tested platforms.

Posted by Luca Longinotti on 24 Feb 2011 at 18:00
Categories: C99, Programming
