TLS and shared library initialization

For my work on Rig, the ability for variables to be thread-local is of critical importance.
Both Pthreads in the POSIX world and Windows offer ways to allocate a thread-local key, from which to get/set a pointer(-sized) value.
Using this you can easily make any piece of data thread-local, you'd just allocate it on the heap and store the address in the thread-local key variable. This is usually called thread-local data or run-time TLS (thread-local-storage).
But several compilers and operating systems support extensions to the C language, so that one can declare a variable, with some restrictions, thread-local, and access it as one usually would, without the need to use any specific library functions or API. This is what people normally refer to as TLS (or load-time TLS if one wants to be more precise).
If TLS is available, it clearly is the preferred alternative, as it's much easier to use, doesn't need any special kind of initialization, and is usually faster (due to possible compiler optimizations) or at the very least as-fast as thread-local data.
Following is a table of which OSes support TLS, using which compilers (in their minimum version) and what keyword is exactly needed.

OS load-time TLS supported compilers run-time TLS
Windows __declspec(thread) MSVC 2005, ICC 9.0 Tls{Alloc,Free}
Linux __thread GCC 4.1.X, Clang 2.8, ICC 9.0, Solaris Studio 12 pthread_key_{create,delete}
FreeBSD __thread GCC 4.1.X, Clang 2.8 pthread_key_{create,delete}
MacOS X None None (unsupported) pthread_key_{create,delete}
Solaris __thread GCC 3.4.X, Solaris Studio 12 pthread_key_{create,delete}
OpenBSD None None (unsupported) pthread_key_{create,delete}
NetBSD None None (segfaults) pthread_key_{create,delete}
AIX __thread IBM XL C 11.X pthread_key_{create,delete}

GCC, ICC and Solaris Studio all support __thread, Clang does so as well.
IBM's XL C compiler on AIX supports __thread with the -qtls option.
On Windows, VC++ and ICC both support __declspec(thread).
Support is needed in both the compiler, the linker and the C library/thread library for this to correctly work.
Here a snippet of code, a few C defines that try to safely tell if TLS is available.

Another question that came up right alongside the availability of TLS was how to correctly have code executed when a shared library is loaded (at load-time before main() is entered, at run-time when dlopen() or LoadLibrary() are called) and unloaded (when returning from main(), or calling exit(), or at run-time when calling dlclose() or FreeLibrary()), so that initialization and destruction code could be safely run, both for miscellaneous purposes, such as more involved initialization of global data, and to correctly initialize thread-local keys using the OS/thread library functions, in case TLS wasn't available. At first I used custom synchronization code, based on atomic operations, to implicitly do this (by checking a shared variable that indicated if initialization was already done on each call to something that required it), but this is error-prone, hits performance, and leaves the question of cleanup at unload open. Another way to do it would be to explicitly require the user to call an initialization routine before he uses any library functionality, and a destruction one when he's finished, but that approach is tedious and error-prone too; so I figured there must be a better, standard way to do this, it seemed like such an useful and common functionality requirement, that it would've been strange that there was no useable solution out there...
The dlopen(3) man-page got me started, explaining that recent GCC's support the two function attributes "constructor" and "destructor" to define initialization and destruction functions, which substitute the old approach of having functions named __init and __fini. Using function attributes also enables you to define multiple initialization and destruction functions. Furthermore, with GCC, you can specify a priority to control the order of execution. Clang does not support this, and I couldn't find anything indicating Solaris Studio or ICC to support it either, so it's probably better anyway to not depend on the calling order of different constructor/destructor functions, and keep them independant from eachother.
Windows provides a similar mechanism using DllMain(), in which you can put code you want called at various relevant events.
I also checked if cleanup functions registered with atexit() would be called together with the destructors, while this is non-standard, it can be useful and is supported by a few, major C libraries.
The next table summarizes my findings on all this in an easily readable format.

OS initialization (load&run-time) destruction (load&run-time) atexit() on process exit atexit() on library unload
Windows DllMain
Yes Yes
Linux function __attribute__((constructor)) function __attribute__((destructor)) Yes Yes (since glibc 2.2.3)
FreeBSD function __attribute__((constructor)) function __attribute__((destructor)) Yes No
MacOS X function __attribute__((constructor)) function __attribute__((destructor)) Yes No
Solaris function __attribute__((constructor)) function __attribute__((destructor)) Yes Yes (since Solaris 8)
OpenBSD function __attribute__((constructor)) function __attribute__((destructor)) Yes No
NetBSD function __attribute__((constructor)) function __attribute__((destructor)) Yes No

GCC, Clang and ICC directly support the __attribute__ syntax, Solaris Studio 12 does too (and seems to translate it to the corresponding #pragma it supports), older versions or Sun Studio may only support #pragma init() / #pragma fini() though.
This necessitates support from both the compiler and the linker to work, on all tested platforms this was the case.

Posted by Luca Longinotti on 24 Feb 2011 at 18:00
Categories: C99, Programming Comments

Am I main?, a tale of TIDs

During my work on Rig, which will also include thread abstraction, I stumbled upon the problem of getting some kind of ID to identify a running thread, I wanted to be able to do something akin to getpid() (or GetCurrentProcessId() for Windows), but for threads, not processes. Solving this on Windows was easy, the Unix world is another story.
Now, the Pthreads API doesn't offer this functionality, the closest is pthread_self(), which returns an opaque type pthread_t, which can't (safely) be used directly to differentiate between threads. Which means that to solve this, I needed to enter the world of non-portable, OS-specific functionality: one of the reasons I wanted to use VMs in my previous post was in fact to try this out.
After reading a lot of documentation and trying out a few things, it became clear that each OS had a different way of getting this information.
Coincidentally, the next day someone on StackOverflow asked an interesting question that turned out to be related: "How to determine if the current thread is the main one?", which I set out to answer. My answer already contains a good explanation of how to approach that problem, so I won't reiterate it here, and simply offer a helpful reference of my overall findings.

OS Thread ID Is thread main?
Windows tid = GetCurrentThreadId(); ???
Linux tid = syscall(SYS_gettid); tid == getpid()
FreeBSD long lwpid;
tid = lwpid;
pthread_main_np() != 0
MacOS X tid = pthread_mach_thread_np(pthread_self()); pthread_main_np() != 0
Solaris tid = pthread_self(); tid == 1
OpenBSD Not available. pthread_main_np() != 0
NetBSD tid = _lwp_self(); tid == 1

Posted by Luca Longinotti on 09 Feb 2011 at 17:00
Categories: C99, Programming Comments


A few computer & programming books suggestions:

  • C Programming: A Modern Approach, 2nd edition, by K.N. King - awesome C book, it really goes over the whole language logically, is updated to the C99 standard, and is awesome as a reference too, I consider this my bible
  • The Art of Multiprocessor Programming, 1st edition, by M. Herlihy & N. Shavit - all you ever wanted to know about parallel programming, including lock-free data structures, examples are in Java
  • Digital Design and Computer Architecture, 1st edition, by D. Harris and S.Harris - great book about how your processor is actually designed, what it's made up of, ...
  • Algorithms in a Nutshell: A Desktop Quick Reference, 1st edition, by G. Heineman, G. Pollice and S. Selkow - great reference book for most common algorithms
  • Thinking in Java, 4th edition, by Bruce Eckel - great Java programming book, covers Java 1.5/1.6, a free version of the 3rd edition is available
  • Effective Java: A Programming Language Guide, 2nd edition, by Joshua Bloch - tips & tricks to become a better Java developer, very recommended

And a few fantasy books suggestions, as we all like to relax from time to time and escape into slightly more adventurous and magical worlds:

  • The Lord of the Rings Trilogy, by J.R.R. Tolkien - seriously, if you've never read this, buy it immediately, it's one of the best books ever written
  • The Mistborn Trilogy, by Brandon Sanderson - awesome, totally new supernatural way to do things, kick-ass heroine, and surprising twists at every turn of the page
  • The Belgariad and The Mallorean, by David Eddings - 10 books total, a classic epic fantasy series, with magic, evil gods, a company of heroes, ...
  • The Dresden Files, by Jim Butcher - contemporary fantasy series, set in Chicago, lots of creatures populate our world here, currently book 12 is out, up to 11 in paperback

Posted by Luca Longinotti on 25 Sep 2010 at 19:05
Categories: Longi, Programming, C99 Comments

GCC's __attribute__ ((nonnull (...))) not helpful at all

Today I encountered some really interesting and disconcerting GCC behaviour...
I'm currently writing a library with some data structures in it, and basically have functions such as the following:

bool queue_get(QUEUE q, void *item);

where item is an address to allocated space on which to operate, this basically is the way you do generics in pure C, using void pointers to memory locations you copy from/store to, based on some size you know depending on your usage pattern.
Now, the first thing any security conscious (and sane) programmer would do is to check that both queue and item are not NULL and act accordingly, either returning false to the user, or in my case aborting completely as I'm of the opinion this is a serious error on the programmers part that he must fix, still I digress, the fact is: there should be some form of NULL check there, for example:

bool queue_get(QUEUE q, void *item) {
    if (q == NULL || item == NULL) {

This has always worked beautifully. Now the other day I was reading the GCC manual on function attributes and stumbled upon attribute "nonnull", documented as such:

nonnull (arg-index, ...)
The nonnull attribute specifies that some function parameters should be non-null pointers. For instance, the declaration:

extern void *
my_memcpy (void *dest, const void *src, size_t len)
__attribute__ ((nonnull (1, 2)));

causes the compiler to check that, in calls to my_memcpy, arguments dest and src are non-null. If the compiler determines that a null pointer is passed in an argument slot marked as non-null, and the -Wnonnull option is enabled, a warning is issued. The compiler may also choose to make optimizations based on the knowledge that certain function arguments will not be null.

Note the bold sentence, we'll come back to it later on.
Now, I thought this looked quite useful and changed the prototypes of my functions to accordingly hint to the compiler when I expected a parameter to be nonnull, so for example I changed my queue_get prototype to look like this:

bool queue_get(QUEUE q, void *item) __attribute__ ((nonnull (1, 2)));

And I then started testing this throughly, and was surprised by what I found:
GCC's enforcing of this is more or less useless, it only emits a warning if you pass NULL directly as a parameter to the annotated function, meaning only an erroneous call of the form "queue_get(q, NULL);" would for example be detected, anything else, like "void *p = NULL; queue_get(q, p);" or "queue_get(q, myfunc());" where myfunc() may return NULL, are not detected. There is an enhancement bug open at GCC (bug 17308) to improve it to handle more cases, but it hasn't seen much activity.
Still, I thought even if it was only that one warning, annotating the function this way would surely help and also work as a form of documentation for clients later reading the librarie's header file, so no harm in leaving it there, because the explicit NULL checks inside the library's functions would anyway correctly catch any passed NULL pointer and abort, right? WRONG! COMPLETELY WRONG!
Let's return to that quote from the GCC docs and the last, seemingly innocent line there: "The compiler may also choose to make optimizations based on the knowledge that certain function arguments will not be null.", what this actually means is that at any meaningful optimization level, such as -O2, any NULL-pointer-checks against parameters of a function declared with the nonnull attribute will be simply optimized away silently, so suddenly calling the function while passing NULL during testing always led to a segmentation fault. Whoa, that's somewhat unexpected! And passing "-fno-delete-null-pointer-checks" didn't change anything.
Searching GCC's bugzilla I found bug 36166, which discussed the exact same thing I was observing, marked as RESO INVALID... Basically it boils down to the fact GCC does the detection for warnings and the optimization of code in separate parts, and they don't quite seem to agree, meaning that you only get a warning in one particular case of passing NULL, and even then only if the appropriate warning level and options are set, while it ALWAYS optimizes out NULL checks from the code, resulting in quite broken code. The GCC developers seem to regard this as correct, the reporter disagrees (as do I), comment 7 summarizes the core of the problem quite well, as well as both positions. RESO INVALID makes it quite clear who won the dispute, and in the end we get a mostly useless function attribute that silently completely breaks code, based on a somewhat unclear line of documentation, that was never updated to be clearer (and at least that would have been easy and helpful to do!).
Conclusion: all the __attribute__ nonnull's have been removed from my code again, and what looked like a great idea, completely failed in practice in my opinion.

Posted by Luca Longinotti on 19 Apr 2010 at 21:22
Categories: C99, Programming Comments

(Page 1 of 1)