Micro-Lock, MurmurHash 3, ...

A quick summary of work done on Rig in the past weeks since the release of 0.4.0:

  • added Micro-Lock, a Mutual Exclusion Lock, based on CAS like MRWLock, with support for ownership checking and recursive locking
  • added MurmurHash 3, the new version of this well-known hash algorithm, even faster than its predecessor, especially on older systems
  • Atomic_Ops: added support for the SPARCv9 architecture, as well as support for GCC intrinsics (though the membar situation still needs some work there), and further support for emulating certain atomic primitives using others (there's a graph of how this works)
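
To make the idea of a CAS-based lock with ownership checking and recursive locking concrete, here is a minimal sketch. It is not Rig's Micro-Lock: it assumes C11 `<stdatomic.h>` and `_Thread_local` as stand-ins for Atomic_Ops and TLS, all names (`sketch_lock` and friends) are hypothetical, and it simply busy-waits instead of backing off.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; this is NOT Rig's Micro-Lock API. */
typedef struct {
    _Atomic(void *) owner; /* NULL when unlocked, else the owner's identity token */
    unsigned depth;        /* recursion depth, only touched by the current owner */
} sketch_lock;

#define SKETCH_LOCK_INIT { NULL, 0 }

/* The address of a thread-local variable is unique per live thread,
 * so it can serve as an identity token (assumption: C11 _Thread_local). */
static _Thread_local char self_tag;

/* Ownership check: does the calling thread hold the lock? */
static bool sketch_lock_owned(sketch_lock *l) {
    return atomic_load_explicit(&l->owner, memory_order_relaxed) == (void *)&self_tag;
}

static void sketch_lock_acquire(sketch_lock *l) {
    if (sketch_lock_owned(l)) { /* recursive acquisition by the owner */
        l->depth++;
        return;
    }
    void *expected = NULL;
    /* Spin until the CAS swaps NULL for our token. */
    while (!atomic_compare_exchange_weak_explicit(&l->owner, &expected,
                                                  (void *)&self_tag,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = NULL; /* a failed CAS stored the current owner here; reset it */
    }
    l->depth = 1;
}

/* Returns false (and does nothing) if the caller is not the owner. */
static bool sketch_lock_release(sketch_lock *l) {
    if (!sketch_lock_owned(l))
        return false;
    if (--l->depth == 0)
        atomic_store_explicit(&l->owner, NULL, memory_order_release);
    return true;
}
```

A thread may call `sketch_lock_acquire()` several times in a row; the lock is only handed back once `sketch_lock_release()` has been called the same number of times.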

Posted by Luca Longinotti on 26 Apr 2011 at 22:00
Categories: Rig, C99



Rig 0.4.0 released

Finally, after years of development, the first release of the Rig library is ready.
Rig started out as a safe strings C99 library, but grew to encompass lock-free data structures and other helpful, multiprocessor-oriented features. A quick overview:

  • lock-free stack, queue and ordered list, all with the ability to iterate over them, O(1) size
  • memory is correctly reclaimed through the use of the SMR concept, working correctly in the absence of GC (Garbage Collection is not part of the C language)
  • hashing functions
  • atomic counter
  • thread abstraction (currently supporting POSIX Threads, Win32 Threads will follow)
  • Micro-ReadWrite Lock, a minimal RW-Lock, following the concept of Windows's Slim-RW-Locks, fully based on atomic operations, but also taking advantage of TLS to implement advanced features, like owner-checking
  • string/buffer functions (still incomplete, don't use those yet!)

Focus was set on modularity and reusability, meaning some parts of Rig are segregated and can be used by other projects too:

  • Atomic_Ops: atomic operations headers, providing atomic load, store, add, fetch-and-add, CAS (returning the old value or a boolean), swap and various memory barrier types. All operations expect an explicit memory barrier specification, forcing the programmer to think about them.
    A flag-pointer is also provided, an atomic pointer which uses its last bit to save a boolean flag.
    Modern x86-64 (SSE2 and newer) is the only currently supported architecture, others will follow as soon as I get access to hardware for testing (offers are gladly accepted!).
    This work was inspired by OpenPA.
  • SupportDS: support data structures, non-multi-threading-safe, currently there's a hybrid stack.
  • System Includes: a series of headers that parse the predefined macros on your system to discover things like which OS, compiler, libc and architecture you're running on, to influence compilation, and that define a few other features in as portable a way as possible, like TLS support or alignment specification.
    This work was inspired by predef.
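
As an illustration of the flag-pointer idea mentioned above (an atomic pointer whose last bit stores a boolean), here is a minimal sketch. It uses C11 `<stdatomic.h>` rather than the Atomic_Ops headers, and the type and function names are hypothetical. Marking a pointer this way is commonly used in lock-free lists to flag a node as logically deleted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type and names; NOT the Atomic_Ops interface. */
typedef struct { _Atomic uintptr_t bits; } flag_ptr;

/* Pack pointer and flag into one word. Assumes p is at least
 * 2-byte aligned, so that bit 0 is always free for the flag. */
static uintptr_t fp_pack(void *p, bool flag) {
    return (uintptr_t)p | (flag ? (uintptr_t)1 : (uintptr_t)0);
}

static void *fp_pointer(uintptr_t bits) { return (void *)(bits & ~(uintptr_t)1); }
static bool  fp_flag(uintptr_t bits)    { return (bits & (uintptr_t)1) != 0; }

static void fp_init(flag_ptr *fp, void *p, bool flag) {
    atomic_init(&fp->bits, fp_pack(p, flag));
}

/* Load with an explicit barrier, mirroring the "always specify it" rule. */
static uintptr_t fp_load(flag_ptr *fp) {
    return atomic_load_explicit(&fp->bits, memory_order_acquire);
}

/* Boolean-returning CAS: set the flag only if the word still holds
 * expected_ptr with the flag cleared. */
static bool fp_try_mark(flag_ptr *fp, void *expected_ptr) {
    uintptr_t expected = fp_pack(expected_ptr, false);
    return atomic_compare_exchange_strong_explicit(&fp->bits, &expected,
                                                   fp_pack(expected_ptr, true),
                                                   memory_order_acq_rel,
                                                   memory_order_acquire);
}
```

Because pointer and flag live in a single word, one CAS changes or checks both at once, which is the whole point of the construct.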

Rig currently requires GCC and CMake to compile, as well as the PThreads library.
It can be downloaded here, and following is its SHA256 checksum:
SHA256 0e95e7c643631f378b46c4b9c948f59b48927d5b249e7bb885623a7491ad45ba rig-0.4.0.tar.bz2

Posted by Luca Longinotti on 10 Apr 2011 at 21:20
Categories: Rig, C99



Within Temptation @ Zürich

The new Within Temptation album "The Unforgiving" is really awesome music, so now that they're finally giving a concert in Switzerland, I couldn't resist going!
The concert will be on the 18th of October at the Volkshaus in Zürich. Get tickets while you can, it'll be worth it! ;) See ya there!

Posted by Luca Longinotti on 02 Apr 2011 at 14:06
Categories: Longi



TLS and shared library initialization

For my work on Rig, the ability for variables to be thread-local is of critical importance.
Both Pthreads in the POSIX world and Windows offer ways to allocate a thread-local key, from which to get/set a pointer(-sized) value.
Using this you can easily make any piece of data thread-local: you just allocate it on the heap and store the address in the thread-local key variable. This is usually called thread-local data or run-time TLS (thread-local storage).
But several compilers and operating systems support extensions to the C language, so that one can declare a variable, with some restrictions, thread-local, and access it as one usually would, without the need to use any specific library functions or API. This is what people normally refer to as TLS (or load-time TLS if one wants to be more precise).
If TLS is available, it is clearly the preferred alternative, as it's much easier to use, doesn't need any special kind of initialization, and is usually faster (due to possible compiler optimizations) or at the very least as fast as thread-local data.
Following is a table of which OSes support TLS, with which compilers (in their minimum version), and exactly which keyword is needed.

OS      | load-time TLS      | supported compilers                               | run-time TLS
--------+--------------------+---------------------------------------------------+----------------------------
Windows | __declspec(thread) | MSVC 2005, ICC 9.0                                | Tls{Alloc,Free}
Linux   | __thread           | GCC 4.1.X, Clang 2.8, ICC 9.0, Solaris Studio 12  | pthread_key_{create,delete}
FreeBSD | __thread           | GCC 4.1.X, Clang 2.8                              | pthread_key_{create,delete}
MacOS X | None               | None (unsupported)                                | pthread_key_{create,delete}
Solaris | __thread           | GCC 3.4.X, Solaris Studio 12                      | pthread_key_{create,delete}
OpenBSD | None               | None (unsupported)                                | pthread_key_{create,delete}
NetBSD  | None               | None (segfaults)                                  | pthread_key_{create,delete}
AIX     | __thread           | IBM XL C 11.X                                     | pthread_key_{create,delete}

GCC, ICC and Solaris Studio all support __thread, Clang does so as well.
IBM's XL C compiler on AIX supports __thread with the -qtls option.
On Windows, VC++ and ICC both support __declspec(thread).
Support is needed in the compiler, the linker and the C library/thread library for this to work correctly.
Here is a snippet of code: a few C defines that try to safely tell if TLS is available.
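
A minimal sketch of such defines, reconstructed from the table above rather than copied from Rig. The macro names (`SKETCH_HAVE_TLS`, `SKETCH_TLS`) and the example counter are hypothetical, and the compiler version checks are omitted for brevity. When load-time TLS is not detected, the sketch falls back to run-time TLS through a Pthreads key.

```c
/* Hypothetical macro names; minimum compiler versions are NOT checked here. */
#if defined(_WIN32) && (defined(_MSC_VER) || defined(__INTEL_COMPILER))
#  define SKETCH_HAVE_TLS 1
#  define SKETCH_TLS __declspec(thread)
#elif (defined(__linux__) || defined(__FreeBSD__) || defined(__sun)) && \
      (defined(__GNUC__) || defined(__clang__) || \
       defined(__INTEL_COMPILER) || defined(__SUNPRO_C))
#  define SKETCH_HAVE_TLS 1
#  define SKETCH_TLS __thread
#elif defined(_AIX) && defined(__IBMC__)
#  define SKETCH_HAVE_TLS 1
#  define SKETCH_TLS __thread /* requires the -qtls option */
#else
#  define SKETCH_HAVE_TLS 0   /* e.g. MacOS X, OpenBSD, NetBSD in the table */
#endif

#if SKETCH_HAVE_TLS
/* Load-time TLS: declare the variable and use it like any other. */
static SKETCH_TLS int call_count;

static int bump_call_count(void) { return ++call_count; }
#else
/* Run-time TLS: store a heap pointer under a thread-local key. */
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t call_count_key;
static pthread_once_t call_count_once = PTHREAD_ONCE_INIT;

static void make_key(void) { pthread_key_create(&call_count_key, free); }

static int bump_call_count(void) {
    pthread_once(&call_count_once, make_key);
    int *p = pthread_getspecific(call_count_key);
    if (p == NULL) {            /* first call in this thread */
        p = calloc(1, sizeof *p);
        if (p == NULL)
            return -1;
        pthread_setspecific(call_count_key, p);
    }
    return ++*p;
}
#endif
```

Either way, `bump_call_count()` returns a per-thread count. The fallback branch shows the extra initialization and allocation work that load-time TLS avoids.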

Another question that came up right alongside the availability of TLS was how to correctly have code executed when a shared library is loaded (at load-time before main() is entered, at run-time when dlopen() or LoadLibrary() are called) and unloaded (when returning from main(), or calling exit(), or at run-time when calling dlclose() or FreeLibrary()). That way initialization and destruction code could be run safely, both for miscellaneous purposes, such as more involved initialization of global data, and to correctly initialize thread-local keys using the OS/thread library functions in case TLS wasn't available.
At first I used custom synchronization code, based on atomic operations, to do this implicitly (by checking, on each call to something that required it, a shared variable that indicated whether initialization was already done), but this is error-prone, hurts performance, and leaves the question of cleanup at unload open. Another way would be to explicitly require the user to call an initialization routine before using any library functionality, and a destruction routine when finished, but that approach is tedious and error-prone too. So I figured there must be a better, standard way to do this: it seemed like such a useful and common requirement that it would've been strange if there were no usable solution out there...
The dlopen(3) man page got me started, explaining that recent GCC versions support the two function attributes "constructor" and "destructor" to define initialization and destruction functions, which replace the old approach of having functions named _init and _fini. Using function attributes also enables you to define multiple initialization and destruction functions. Furthermore, with GCC, you can specify a priority to control the order of execution. Clang does not support this, and I couldn't find anything indicating that Solaris Studio or ICC support it either, so it's probably better anyway not to depend on the calling order of different constructor/destructor functions, and to keep them independent of each other.
Windows provides a similar mechanism using DllMain(), in which you can put code you want called at various relevant events.
I also checked whether cleanup functions registered with atexit() would be called together with the destructors. While this is non-standard, it can be useful and is supported by a few major C libraries.
The next table summarizes my findings on all this in an easily readable format.

OS      | initialization (load & run-time)      | destruction (load & run-time)        | atexit() on process exit | atexit() on library unload
--------+---------------------------------------+--------------------------------------+--------------------------+---------------------------
Windows | DllMain (DLL_PROCESS_ATTACH)          | DllMain (DLL_PROCESS_DETACH)         | Yes                      | Yes
Linux   | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | Yes (since glibc 2.2.3)
FreeBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No
MacOS X | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No
Solaris | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | Yes (since Solaris 8)
OpenBSD | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No
NetBSD  | function __attribute__((constructor)) | function __attribute__((destructor)) | Yes                      | No

GCC, Clang and ICC directly support the __attribute__ syntax, and Solaris Studio 12 does too (it seems to translate it to the corresponding #pragma it supports). Older versions, or Sun Studio, may only support #pragma init() / #pragma fini() though.
This requires support from both the compiler and the linker to work; on all tested platforms this was the case.
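
A minimal sketch of the two mechanisms from the table: the constructor/destructor function attributes for GCC-compatible compilers, and DllMain() on Windows. The names `lib_ready`, `lib_init`, `lib_fini` and `lib_is_ready` are hypothetical; a real library would put its key allocation and cleanup where the flag is set and cleared.

```c
/* Hypothetical library-private state; all names are illustrative only. */
static int lib_ready;

#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)

/* Runs before main() at load-time, or when dlopen() loads the library. */
__attribute__((constructor))
static void lib_init(void) { lib_ready = 1; }

/* Runs at process exit, or when dlclose() unloads the library. */
__attribute__((destructor))
static void lib_fini(void) { lib_ready = 0; }

#elif defined(_WIN32)

#include <windows.h>

/* Windows calls DllMain() with a reason code for each relevant event. */
BOOL WINAPI DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved) {
    (void)instance;
    (void)reserved;
    if (reason == DLL_PROCESS_ATTACH)
        lib_ready = 1;
    else if (reason == DLL_PROCESS_DETACH)
        lib_ready = 0;
    return TRUE;
}

#endif

/* Callers can rely on initialization having happened already. */
int lib_is_ready(void) { return lib_ready; }
```

No priorities are used, in line with the advice not to depend on the calling order of several constructors or destructors.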

Posted by Luca Longinotti on 24 Feb 2011 at 18:00
Categories: C99, Programming



Interview with Linus Torvalds

Great interview with Linus Torvalds by ITWire I wanted to make sure to share.
Especially the non-IT-related questions give some very interesting insights into the man behind Linux.

Posted by Luca Longinotti on 10 Feb 2011 at 01:16
Categories: CompSci



Am I main?, a tale of TIDs

During my work on Rig, which will also include thread abstraction, I stumbled upon the problem of getting some kind of ID to identify a running thread. I wanted to be able to do something akin to getpid() (or GetCurrentProcessId() for Windows), but for threads, not processes. Solving this on Windows was easy; the Unix world is another story.
Now, the Pthreads API doesn't offer this functionality. The closest is pthread_self(), which returns an opaque type pthread_t, which can't (safely) be used directly to differentiate between threads. This means that to solve this, I needed to enter the world of non-portable, OS-specific functionality: one of the reasons I wanted to use VMs in my previous post was in fact to try this out.
After reading a lot of documentation and trying out a few things, it became clear that each OS had a different way of getting this information.
Coincidentally, the next day someone on StackOverflow asked an interesting question that turned out to be related: "How to determine if the current thread is the main one?", which I set out to answer. My answer already contains a good explanation of how to approach that problem, so I won't reiterate it here, and simply offer a helpful reference of my overall findings.

OS      | Thread ID                                      | Is thread main?
--------+------------------------------------------------+-----------------------
Windows | tid = GetCurrentThreadId();                    | ???
Linux   | tid = syscall(SYS_gettid);                     | tid == getpid()
FreeBSD | long lwpid; thr_self(&lwpid); tid = lwpid;     | pthread_main_np() != 0
MacOS X | tid = pthread_mach_thread_np(pthread_self());  | pthread_main_np() != 0
Solaris | tid = pthread_self();                          | tid == 1
OpenBSD | Not available.                                 | pthread_main_np() != 0
NetBSD  | tid = _lwp_self();                             | tid == 1
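
As a sketch of how a few rows of this table could be wrapped behind one interface, the following covers only Linux, MacOS X and Windows. The function names `sketch_thread_id()` and `sketch_is_main_thread()` are hypothetical, and -1 is used to mean "unknown" where the table has no answer.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes glibc declare syscall() */
#endif

#if defined(__linux__)

#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long sketch_thread_id(void) { return (long)syscall(SYS_gettid); }

/* On Linux the main thread's TID equals the process ID. */
static int sketch_is_main_thread(void) {
    return sketch_thread_id() == (long)getpid();
}

#elif defined(__APPLE__)

#include <pthread.h>

static long sketch_thread_id(void) {
    return (long)pthread_mach_thread_np(pthread_self());
}

static int sketch_is_main_thread(void) { return pthread_main_np() != 0; }

#elif defined(_WIN32)

#include <windows.h>

static long sketch_thread_id(void) { return (long)GetCurrentThreadId(); }

/* "???" in the table: no answer here, so report "unknown". */
static int sketch_is_main_thread(void) { return -1; }

#else

/* Other systems from the table are left out of this sketch. */
static long sketch_thread_id(void) { return -1; }
static int sketch_is_main_thread(void) { return -1; }

#endif
```

Called from main(), `sketch_is_main_thread()` returns 1 on Linux and MacOS X; called from a thread started with pthread_create(), it returns 0 there.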

Posted by Luca Longinotti on 09 Feb 2011 at 17:00
Categories: C99, Programming



KVM, slow IO and strange options

In my quest for portability, I wanted to test a few things on several operating systems, mostly BSDs and Oracle (formerly Sun) Solaris.
Seeing as virtualization is the current hype, I decided to give Linux KVM a try, as it promised to be the more open solution while requiring less effort to set up. In my case, for a few dev-VMs to try stuff on, that's kinda important: I don't want to spend hours maintaining this setup, but I also don't expect stellar performance to run heavy workloads on it.
Gentoo makes the installation quite easy, all you need is to enable KVM in your kernel and emerge app-emulation/qemu-kvm.

  • clearly the kernel needs to have KVM support enabled for your CPU. I have all the VirtIO stuff disabled: I don't need it, and when I tried VirtIO-blk to speed up IO performance I didn't notice any difference. It probably doesn't do much when you only have 1-2, max. 3 VMs running at any time, with not that much going on in them, for development.
  • qemu-kvm, careful of the USE flags and the QEMU_*_TARGETS!

package.use entries:

media-libs/libsdl X audio video opengl xv
app-emulation/qemu-kvm aio sdl
# remember "alsa" if you use it, for both packages!

make.conf entries:

QEMU_SOFTMMU_TARGETS="arm i386 ppc ppc64 sparc sparc64 x86_64"
QEMU_USER_TARGETS="${QEMU_SOFTMMU_TARGETS}"

'aio' is important for native AsyncIO support and 'sdl' to get a window with your VM in it (unless you always want to use VNC to connect). Most people can also probably reduce QEMU_SOFTMMU_TARGETS to "i386 x86_64", but I wanted to keep the option to emulate some alternative architectures.
Once that was all done, KVM worked perfectly, and I started installing a Xubuntu image just to test it, but noticed that IO was incredibly slow. I set out to find out how to improve its performance, and ended up with the following two Bash functions to install VMs from ISOs and start them with somewhat usable performance. The options are explained below.

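# KVM support
# NOTE: "macaddr=random" is a placeholder, replace it with a valid, random MAC address.
# Start an existing VM; $1 is the disk image.
kvm-start() {
    /usr/bin/kvm -net nic,macaddr=random -net user -cpu host -smp 4 -m 768 -usb \
        -usbdevice tablet -vga cirrus -drive file="$1",cache=writeback,aio=native
}
# Create a 6G raw disk image ($1) and boot the installer from the ISO ($2).
kvm-install() {
    /usr/bin/qemu-img create -f raw "$1" 6G
    /usr/bin/kvm -net nic,macaddr=random -net user -cpu host -smp 4 -m 768 -usb \
        -usbdevice tablet -vga cirrus -drive file="$1",cache=writeback,aio=native \
        -cdrom "$2" -boot d
}
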
  • -drive's cache=writeback,aio=native are crucial for storage performance. While aio helped just a little, changing the cache mode to writeback massively improved IO performance! Also, raw disk images do perform better than qcow2!
  • -cpu host passes along all available host CPU features, -smp 4 gives the guest four virtual CPUs, and -m 768 raises memory from the default 128 MB, which helps too.
  • -usb -usbdevice tablet was needed to fix the broken mouse (it just didn't react at all in my case!). It also makes it possible to move the mouse off the screen of the VM and back without always having to press CTRL+ALT, but this also kinda depends on the OS you're emulating.
  • -vga cirrus enables support for resolutions up to 1024x768 and has very good compatibility all around. You could use -vga vmware for Linux guests to get very high resolutions, but it doesn't work that well with other (especially older) operating systems.
  • -net nic,macaddr=random -net user is for the standard, software routed networking, documented as "slow", but more than fast enough for development work (of course not for some kind of high-traffic thousands-of-connections server). Remember to set a valid, random MAC address!
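
Since the functions above need a valid, random MAC address, here is one way to generate it. This is a sketch, not part of the original setup: the helper name `random_mac` is hypothetical, and it assumes Bash (for `$RANDOM`) and the 52:54:00 prefix conventionally used for QEMU/KVM guests.

```shell
# Hypothetical helper: print a random MAC address with the 52:54:00
# prefix conventionally used for QEMU/KVM guests.
random_mac() {
    printf '52:54:00:%02x:%02x:%02x\n' \
        $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))
}
```

The result can be substituted for the placeholder, e.g. `-net nic,macaddr="$(random_mac)"`. Generating it once per VM and reusing it keeps the guest's address stable across restarts.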

Posted by Luca Longinotti on 08 Feb 2011 at 17:40
Categories: Gentoo, Software



Books for sale!

I've been cleaning out my library, making space for new, exciting books, and in the process found several books and materials that I got for ETHZ classes back then, or even now at UZH, but mostly never even opened ...
All are in good condition; some show traces of use, and very rarely a handwritten annotation graces the pages.
The books for sale are the following, mostly in English and German, a few in Italian; prices are negotiable:

Books:

  • Maurice Herlihy/Nir Shavit, "The Art of Multiprocessor Programming", 1. Edition (English, 15 CHF)
  • David Harris/Sarah Harris, "Digital Design and Computer Architecture", 1. Edition (English, 15 CHF)
  • David Kirk/Wen-mei Hwu, "Programming Massively Parallel Processors: A Hands-On Approach", 1. Edition (English, 30 CHF)
  • Mark Lutz/David Ascher, "Learning Python", 2. Edition (O'Reilly) (English, 10 CHF)
  • Larry Wall/Tom Christiansen/Jon Orwant, "Programming Perl", 3. Edition (O'Reilly) (English, 10 CHF)
  • Jennifer Robbins, "HTML & XHTML Pocket Reference", 3. Edition (O'Reilly) (English, 5 CHF)
  • Eric Meyer, "CSS Pocket Reference", 3. Edition (O'Reilly) (English, 5 CHF)
  • Lothar Papula, "Mathematik für Ingenieure und Naturwissenschaftler, Band 1", 11. Auflage (German, 30 CHF)
  • Lothar Papula, "Mathematik für Ingenieure und Naturwissenschaftler, Band 2", 11. Auflage (German, 30 CHF)
  • Lothar Papula, "Mathematik für Ingenieure und Naturwissenschaftler, Band 3", 5. Auflage (German, 30 CHF)
  • Klett Verlag, "Physikalische Formeln und Daten", 1. Auflage (German, 5 CHF)
  • Hansen/Neumann, "Wirtschaftsinformatik 1: Grundlagen und Anwendungen", 10. Auflage (German, 25 CHF)
  • Schreyögg/Koch, "Grundlagen des Managements: Basiswissen für Studium und Praxis", 2. Auflage (German, 25 CHF)
  • Howard Anton, "Lineare Algebra: Einführung, Grundlagen, Übungen", 1. Auflage (German, 20 CHF)
  • Tim Converse/Joyce Park/Clark Morgan, "PHP5 & MySQL: La Guida", McGraw-Hill (Italian, 5 CHF)
  • Bergamaschini/Marazzini/Mazzoni, "L'indagine del mondo fisico", Volumi A-F (Italian, 30 CHF)
  • Giuseppe Ruffo, "Fisica per Moduli", Volume Unico (Italian, 15 CHF)
  • Amartya Sen, "Globalizzazione e Libertà", Mondadori (Italian, 5 CHF)

Materials:

  • KKarten, "BWL I - UniZH", HS 2011 (German, 15 CHF)
  • KKarten, "BWL II - UniZH", FS 2011 (German, 15 CHF)
  • Scherer, "BWL I - Grundlagen des Managements Script", HS 2010 (German, 5 CHF)
  • Wehrli, "BWL I - Einführung Marketing Script", 10. Auflage 2010 (German, 5 CHF)
  • Bernstein, "Informatik im Unternehmen/für Ökonomen I Script", HS 2010 (German, 5 CHF)

Posted by Luca Longinotti on 21 Jan 2011 at 18:02
Categories: Longi, UZH



Nouveau ++ and HAL --

I finally did it: I tried out Nouveau, the open-source driver for Nvidia graphics cards, and everything went well. My dual-head setup works as before, thanks also to XMonad, which is one of the few window managers that implement virtual desktop management and multi-head setups the right way.
I waited this long to be sure it all worked and had been tested by lots of other people before me, as I simply can't have my main workstation not displaying anything and spend days getting stuff from Git repositories to try out fixes.
I needed a moment to get how XRandr wants the position of monitors specified in xorg.conf, but in the end everything worked out well, and I also managed to massively slim down my Xorg configuration.
So now I have a kernel with no proprietary drivers, and that also means I can finally build a monolithic hardened kernel, without any modules. Works great!
2.6.37 will also bring temperature sensor support to Nouveau from what I'm told; I'm waiting on that!
This also brings a fully hardened desktop a little bit closer, as every binary piece of software gone is one problem less.

I also got rid of HAL completely, since it's being deprecated. Thanks to uam and pmount I can still mount/unmount USB drives with only udev running, and I also don't need any of the Policy/Console/Udisk-Kit stuff, which I hope never to have to install.
And I'm taking Midori for a test-drive, looking for a good alternative browser to Firefox. Maybe it will be, maybe it won't.

Posted by Luca Longinotti on 04 Jan 2011 at 17:29
Categories: Gentoo, Software



Berlin: the aftermath

Got back from Berlin on Sunday evening. This time the train was only five minutes late, so I caught the connecting trains just fine.
Looking back at my stay in Berlin: 27C3 was great (read the other blog posts for a summary), and the hotel we stayed at, Agon am Alexanderplatz, was great too: normal prices, clean, very spacious rooms. They were in fact old apartments converted to hotel rooms, so we had a big bedroom, bathroom, and a kitchen/living room area too, great to keep drinks chilled! I definitely recommend the place.
Food in Berlin was also usually great, be it from the BCC's catering, Las Malvinas (steak-house), Piazza Rossa (pizzeria, Italian restaurant), or other places we went to. I wanted to mention those two in particular as they were really good and relatively cheap, at least compared to our prices here in Switzerland.
On Friday we mostly relaxed at the hotel, seeing as the weather didn't really encourage much sightseeing, and then we went to the big New Year's Eve party at the Brandenburger Tor, which was really awesome. The Hermes House Band was great ("Country Roads, take me home, to the place, I belong, West Virginia, ..." ;) ). The fireworks weren't that impressive in my opinion; I expected much more from such an event. Lugano's are both bigger and last longer, and we're just a small Swiss town!
I've uploaded a few photos I took with my new digital camera.
I spent most of Saturday in bed, as I managed to catch a cold, but that's getting better now, slowly. Watched a few of the talks I didn't see at CCC on the laptop. The official recordings are now being published, so if any talk interests you, download and watch it, it's worth it.

  • "Hackers and Computer Science" is really awesome, watch that one!
  • "Reverse Engineering a real-world RFID payment system" was very interesting too, as was
  • "Chip and PIN is Broken", both concerning the security of widely used payment systems.
  • "Rootkits and Trojans on Your SAP Landscape" was kinda scary, this one is probably very interesting for any Wirtschafts-Informatik student.
  • Finally managed to watch "Console Hacking 2010", which brings great news: it will soon be possible to install Linux on the PS3 again, and boot it directly.
  • "Data Recovery Techniques" was a very informative talk about storage media and how and when they can be recovered in practice.
  • "USB and libusb" gives a great overview of what USB is, how it works, and how to program it under Linux, really worth a viewing if you plan to do anything with USB.

Posted by Luca Longinotti on 04 Jan 2011 at 16:32
Categories: Longi, CCC



