I'm looking for a proper implementation of a work stealing queue in C/CPP. I've looked around Google but haven't found anything useful.
Perhaps someone is familiar with a good open-source implementation? (I prefer not to implement the pseudo-code taken from the original academic papers).
No free lunch.
Please take a look the original work stealing paper. This paper is hard to understand. I know that paper contains theoretical proof rather than pseudo code. However, there is simply no such much more simple version than TBB. If any, it won't give optimal performance. Work stealing itself incurs some amount of overhead, so optimizations and tricks are quite important. Especially, dequeues are must be thread-safe. Implementing highly scalable and low-overhead synchronizations are challenging.
I'm really wondering why you need it. I think that proper implementation means something like TBB and Cilk. Again, work stealing is hard to implement.