Rig 2011/09/16

Second year at UZH looms on the horizon... Let's hope it's gonna be FUN! ;)

I haven't worked on Rig as much as I'd have liked this summer, but here's a quick summary of what did happen:

  • rig_str moved outside main source tree as its own module
  • further work went into testing
  • BS.txt's added, prototype configuration files for a new build-files generator, to maybe substitute CMake
  • improved support for Eclipse CDT 8.0 Static Analysis (awesome feature!)
  • fixed a bug in MurmurHash 3 (from upstream)
  • SMR using HPs was better name-spaced and the retire_mem function split up, to accommodate alternative SMR methods more easily
  • rig_mem_alloc_aligned() now supports additional flags; "RIG_MEM_ALLOC_ALIGN_PAD: pad allocation to next alignment boundary" was added (mostly for cache-line alignment to avoid false-sharing)
  • recursion was made optional for the MLock and the MRWLock, controlled by a flag. MRWLock now fully supports recursion, in both the read and write sides. This requires TLS-based owner checking to be performed, so it's not possible to disable that at compile-time for the MRWLock anymore. Non-recursive locks are the default now.

Next I'll be working on an implementation of Epoch-based SMR, which should offer better performance while traversing lists and ease of usage than HP-based SMR (as well as not having any patents I know of on it).

Posted by Luca Longinotti on 16 Sep 2011 at 14:37
Categories: Rig, C99 Comments



Rig 2011/05/02

This week's Rig status update:

  • MLOCK/MRWLOCK: added functions to discover if a lock is held or not, clarified documentation wrt recursive locking, and fixed a possible wrap-around bug in MLOCK and an incorrect error return in MRWLOCK
  • SMR: ported it to use MLOCK instead of its own micro-lock scheme
  • Changed rig_thread_id() to start at 1 for the main thread and not to use 0 at all. This aligns it with some OS implementations, leads to better performance, and fixes a bug in MLOCK, which uses the 0 ID to signal "no-thread-owns-me"
  • Changed default hash algorithm to MurmurHash 3
  • Added support for tests, using the great Check framework for C Unit-Testing, and using CMake's CTest to run them
  • Added optional testing support to the Gentoo ebuild (USE=test + FEATURES=test)

Posted by Luca Longinotti on 02 May 2011 at 08:00
Categories: Rig, C99 Comments



Micro-Lock, MurmurHash 3, ...

A quick summary of work done on Rig in the past weeks since the release of 0.4.0:

  • added Micro-Lock, a Mutual Exclusion Lock, based on CAS like MRWLock, with support for ownership checking and recursive locking
  • added MurmurHash 3, the new version of this well known hash algorithm, even faster than its predecessor, especially on older systems
  • Atomic_Ops: added support for the SPARCv9 architecture, as well as support for GCC intrinsics (though the membar situation still needs some work there), and further support for emulating certain atomic primitives using others (there's a graph of how this works)

Posted by Luca Longinotti on 26 Apr 2011 at 22:00
Categories: Rig, C99 Comments



Rig 0.4.0 released

Finally, after years of development, the first release of the Rig library is ready.
Rig started out as a safe strings C99 library, but grew to encompass lock-free data structures and other helpful, multiprocessor oriented features. A quick overview:

  • lock-free stack, queue and ordered list, all with the ability to iterate over them, O(1) size
  • memory is correctly reclaimed through the use of the SMR concept, working correctly in the absence of GC (Garbage Collection is not part of the C language)
  • hashing functions
  • atomic counter
  • thread abstraction (currently supporting POSIX Threads, Win32 Threads will follow)
  • Micro-ReadWrite Lock, a minimal RW-Lock, following the concept of Windows's Slim-RW-Locks, fully based on atomic operations, but also taking advantage of TLS to implement advanced features, like owner-checking
  • string/buffer functions (still incomplete, don't use those yet!)

Focus was set on modularity and reusability, meaning some parts of Rig are segregated and can be used by other projects too:

  • Atomic_Ops: atomic operations headers, providing atomic load, store, add, fetch-and-add, CAS (returning the old value or a boolean), swap and various memory barrier types. All operations expect an explicit memory barrier specification, forcing the programmer to think about them.
    A flag-pointer is also provided, an atomic pointer which uses its last bit to save a boolean flag.
    Modern x86-64 (SSE2 and newer) is the only currently supported architecture, others will follow as soon as I get access to hardware for testing (offers are gladly accepted!).
    This work was inspired by OpenPA.
  • SupportDS: support data structures, non-multi-threading-safe, currently there's a hybrid stack.
  • System Includes: a series of headers to parse the predefined macros on your system to discover things like which OS, which compiler, which libc and which architecture you're running on, to influence compilation, and define a few other features in an as portable way as possible, like TLS support or alignment specification.
    This work was inspired by predef.

Rig currently requires GCC and Cmake to compile, as well as the PThreads library.
It can be downloaded here, and following is its SHA256 checksum:
SHA256 0e95e7c643631f378b46c4b9c948f59b48927d5b249e7bb885623a7491ad45ba rig-0.4.0.tar.bz2

Posted by Luca Longinotti on 10 Apr 2011 at 21:20
Categories: Rig, C99 Comments



TLS and shared library initialization

For my work on Rig, the ability for variables to be thread-local is of critical importance.
Both Pthreads in the POSIX world and Windows offer ways to allocate a thread-local key, from which to get/set a pointer(-sized) value.
Using this you can easily make any piece of data thread-local, you'd just allocate it on the heap and store the address in the thread-local key variable. This is usually called thread-local data or run-time TLS (thread-local-storage).
But several compilers and operating systems support extensions to the C language, so that one can declare a variable, with some restrictions, thread-local, and access it as one usually would, without the need to use any specific library functions or API. This is what people normally refer to as TLS (or load-time TLS if one wants to be more precise).
If TLS is available, it clearly is the preferred alternative, as it's much easier to use, doesn't need any special kind of initialization, and is usually faster (due to possible compiler optimizations) or at the very least as-fast as thread-local data.
Following is a table of which OSes support TLS, using which compilers (in their minimum version) and what keyword is exactly needed.

OS load-time TLS supported compilers run-time TLS
Windows __declspec(thread) MSVC 2005, ICC 9.0 Tls{Alloc,Free}
Linux __thread GCC 4.1.X, Clang 2.8, ICC 9.0, Solaris Studio 12 pthread_key_{create,delete}
FreeBSD __thread GCC 4.1.X, Clang 2.8 pthread_key_{create,delete}
MacOS X None None (unsupported) pthread_key_{create,delete}
Solaris __thread GCC 3.4.X, Solaris Studio 12 pthread_key_{create,delete}
OpenBSD None None (unsupported) pthread_key_{create,delete}
NetBSD None None (segfaults) pthread_key_{create,delete}
AIX __thread IBM XL C 11.X pthread_key_{create,delete}

GCC, ICC and Solaris Studio all support __thread, Clang does so as well.
IBM's XL C compiler on AIX supports __thread with the -qtls option.
On Windows, VC++ and ICC both support __declspec(thread).
Support is needed in both the compiler, the linker and the C library/thread library for this to correctly work.
Here a snippet of code, a few C defines that try to safely tell if TLS is available.

Another question that came up right alongside the availability of TLS was how to correctly have code executed when a shared library is loaded (at load-time before main() is entered, at run-time when dlopen() or LoadLibrary() are called) and unloaded (when returning from main(), or calling exit(), or at run-time when calling dlclose() or FreeLibrary()), so that initialization and destruction code could be safely run, both for miscellaneous purposes, such as more involved initialization of global data, and to correctly initialize thread-local keys using the OS/thread library functions, in case TLS wasn't available. At first I used custom synchronization code, based on atomic operations, to implicitly do this (by checking a shared variable that indicated if initialization was already done on each call to something that required it), but this is error-prone, hits performance, and leaves the question of cleanup at unload open. Another way to do it would be to explicitly require the user to call an initialization routine before he uses any library functionality, and a destruction one when he's finished, but that approach is tedious and error-prone too; so I figured there must be a better, standard way to do this, it seemed like such an useful and common functionality requirement, that it would've been strange that there was no useable solution out there...
The dlopen(3) man-page got me started, explaining that recent GCC's support the two function attributes "constructor" and "destructor" to define initialization and destruction functions, which substitute the old approach of having functions named __init and __fini. Using function attributes also enables you to define multiple initialization and destruction functions. Furthermore, with GCC, you can specify a priority to control the order of execution. Clang does not support this, and I couldn't find anything indicating Solaris Studio or ICC to support it either, so it's probably better anyway to not depend on the calling order of different constructor/destructor functions, and keep them independant from eachother.
Windows provides a similar mechanism using DllMain(), in which you can put code you want called at various relevant events.
I also checked if cleanup functions registered with atexit() would be called together with the destructors, while this is non-standard, it can be useful and is supported by a few, major C libraries.
The next table summarizes my findings on all this in an easily readable format.

OS initialization (load&run-time) destruction (load&run-time) atexit() on process exit atexit() on library unload
Windows DllMain
(DLL_PROCESS_ATTACH)
DllMain
(DLL_PROCESS_DETACH)
Yes Yes
Linux function __attribute__((constructor)) function __attribute__((destructor)) Yes Yes (since glibc 2.2.3)
FreeBSD function __attribute__((constructor)) function __attribute__((destructor)) Yes No
MacOS X function __attribute__((constructor)) function __attribute__((destructor)) Yes No
Solaris function __attribute__((constructor)) function __attribute__((destructor)) Yes Yes (since Solaris 8)
OpenBSD function __attribute__((constructor)) function __attribute__((destructor)) Yes No
NetBSD function __attribute__((constructor)) function __attribute__((destructor)) Yes No

GCC, Clang and ICC directly support the __attribute__ syntax, Solaris Studio 12 does too (and seems to translate it to the corresponding #pragma it supports), older versions or Sun Studio may only support #pragma init() / #pragma fini() though.
This necessitates support from both the compiler and the linker to work, on all tested platforms this was the case.

Posted by Luca Longinotti on 24 Feb 2011 at 18:00
Categories: C99, Programming Comments



Am I main?, a tale of TIDs

During my work on Rig, which will also include thread abstraction, I stumbled upon the problem of getting some kind of ID to identify a running thread, I wanted to be able to do something akin to getpid() (or GetCurrentProcessId() for Windows), but for threads, not processes. Solving this on Windows was easy, the Unix world is another story.
Now, the Pthreads API doesn't offer this functionality, the closest is pthread_self(), which returns an opaque type pthread_t, which can't (safely) be used directly to differentiate between threads. Which means that to solve this, I needed to enter the world of non-portable, OS-specific functionality: one of the reasons I wanted to use VMs in my previous post was in fact to try this out.
After reading a lot of documentation and trying out a few things, it became clear that each OS had a different way of getting this information.
Coincidentally, the next day someone on StackOverflow asked an interesting question that turned out to be related: "How to determine if the current thread is the main one?", which I set out to answer. My answer already contains a good explanation of how to approach that problem, so I won't reiterate it here, and simply offer a helpful reference of my overall findings.

OS Thread ID Is thread main?
Windows tid = GetCurrentThreadId(); ???
Linux tid = syscall(SYS_gettid); tid == getpid()
FreeBSD long lwpid;
thr_self(&lwpid);
tid = lwpid;
pthread_main_np() != 0
MacOS X tid = pthread_mach_thread_np(pthread_self()); pthread_main_np() != 0
Solaris tid = pthread_self(); tid == 1
OpenBSD Not available. pthread_main_np() != 0
NetBSD tid = _lwp_self(); tid == 1

Posted by Luca Longinotti on 09 Feb 2011 at 17:00
Categories: C99, Programming Comments



Books!

A few computer & programming books suggestions:

  • C Programming: A Modern Approach, 2nd edition, by K.N. King - awesome C book, it really goes over the whole language logically, is updated to the C99 standard, and is awesome as a reference too, I consider this my bible
  • The Art of Multiprocessor Programming, 1st edition, by M. Herlihy & N. Shavit - all you ever wanted to know about parallel programming, including lock-free data structures, examples are in Java
  • Digital Design and Computer Architecture, 1st edition, by D. Harris and S.Harris - great book about how your processor is actually designed, what it's made up of, ...
  • Algorithms in a Nutshell: A Desktop Quick Reference, 1st edition, by G. Heineman, G. Pollice and S. Selkow - great reference book for most common algorithms
  • Thinking in Java, 4th edition, by Bruce Eckel - great Java programming book, covers Java 1.5/1.6, a free version of the 3rd edition is available
  • Effective Java: A Programming Language Guide, 2nd edition, by Joshua Bloch - tips & tricks to become a better Java developer, very recommended

And a few fantasy books suggestions, as we all like to relax from time to time and escape into slightly more adventurous and magical worlds:

  • The Lord of the Rings Trilogy, by J.R.R. Tolkien - seriously, if you've never read this, buy it immediately, it's one of the best books ever written
  • The Mistborn Trilogy, by Brandon Sanderson - awesome, totally new supernatural way to do things, kick-ass heroine, and surprising twists at every turn of the page
  • The Belgariad and The Mallorean, by David Eddings - 10 books total, a classic epic fantasy series, with magic, evil gods, a company of heroes, ...
  • The Dresden Files, by Jim Butcher - contemporary fantasy series, set in Chicago, lots of creatures populate our world here, currently book 12 is out, up to 11 in paperback

Posted by Luca Longinotti on 25 Sep 2010 at 19:05
Categories: Longi, Programming, C99 Comments



GCC's __attribute__ ((nonnull (...))) not helpful at all

Today I encountered some really interesting and disconcerting GCC behaviour...
I'm currently writing a library with some data structures in it, and basically have functions such as the following:

bool queue_get(QUEUE q, void *item);

where item is an address to allocated space on which to operate, this basically is the way you do generics in pure C, using void pointers to memory locations you copy from/store to, based on some size you know depending on your usage pattern.
Now, the first thing any security conscious (and sane) programmer would do is to check that both queue and item are not NULL and act accordingly, either returning false to the user, or in my case aborting completely as I'm of the opinion this is a serious error on the programmers part that he must fix, still I digress, the fact is: there should be some form of NULL check there, for example:

bool queue_get(QUEUE q, void *item) {
    if (q == NULL || item == NULL) {
        exit(EXIT_FAILURE);
    }
    ...
}

This has always worked beautifully. Now the other day I was reading the GCC manual on function attributes and stumbled upon attribute "nonnull", documented as such:

nonnull (arg-index, ...)
The nonnull attribute specifies that some function parameters should be non-null pointers. For instance, the declaration:

extern void *
my_memcpy (void *dest, const void *src, size_t len)
__attribute__ ((nonnull (1, 2)));

causes the compiler to check that, in calls to my_memcpy, arguments dest and src are non-null. If the compiler determines that a null pointer is passed in an argument slot marked as non-null, and the -Wnonnull option is enabled, a warning is issued. The compiler may also choose to make optimizations based on the knowledge that certain function arguments will not be null.

Note the bold sentence, we'll come back to it later on.
Now, I thought this looked quite useful and changed the prototypes of my functions to accordingly hint to the compiler when I expected a parameter to be nonnull, so for example I changed my queue_get prototype to look like this:

bool queue_get(QUEUE q, void *item) __attribute__ ((nonnull (1, 2)));

And I then started testing this throughly, and was surprised by what I found:
GCC's enforcing of this is more or less useless, it only emits a warning if you pass NULL directly as a parameter to the annotated function, meaning only an erroneous call of the form "queue_get(q, NULL);" would for example be detected, anything else, like "void *p = NULL; queue_get(q, p);" or "queue_get(q, myfunc());" where myfunc() may return NULL, are not detected. There is an enhancement bug open at GCC (bug 17308) to improve it to handle more cases, but it hasn't seen much activity.
Still, I thought even if it was only that one warning, annotating the function this way would surely help and also work as a form of documentation for clients later reading the librarie's header file, so no harm in leaving it there, because the explicit NULL checks inside the library's functions would anyway correctly catch any passed NULL pointer and abort, right? WRONG! COMPLETELY WRONG!
Let's return to that quote from the GCC docs and the last, seemingly innocent line there: "The compiler may also choose to make optimizations based on the knowledge that certain function arguments will not be null.", what this actually means is that at any meaningful optimization level, such as -O2, any NULL-pointer-checks against parameters of a function declared with the nonnull attribute will be simply optimized away silently, so suddenly calling the function while passing NULL during testing always led to a segmentation fault. Whoa, that's somewhat unexpected! And passing "-fno-delete-null-pointer-checks" didn't change anything.
Searching GCC's bugzilla I found bug 36166, which discussed the exact same thing I was observing, marked as RESO INVALID... Basically it boils down to the fact GCC does the detection for warnings and the optimization of code in separate parts, and they don't quite seem to agree, meaning that you only get a warning in one particular case of passing NULL, and even then only if the appropriate warning level and options are set, while it ALWAYS optimizes out NULL checks from the code, resulting in quite broken code. The GCC developers seem to regard this as correct, the reporter disagrees (as do I), comment 7 summarizes the core of the problem quite well, as well as both positions. RESO INVALID makes it quite clear who won the dispute, and in the end we get a mostly useless function attribute that silently completely breaks code, based on a somewhat unclear line of documentation, that was never updated to be clearer (and at least that would have been easy and helpful to do!).
Conclusion: all the __attribute__ nonnull's have been removed from my code again, and what looked like a great idea, completely failed in practice in my opinion.

Posted by Luca Longinotti on 19 Apr 2010 at 21:22
Categories: C99, Programming Comments




(Page 1 of 1)