It is easy to set memory barriers on the kernel side: the macros mb, wmb, rmb, etc. are always in place thanks to the Linux kernel headers.
How to accomplish this on the user side?
You are looking for the full memory barrier atomic builtins of gcc.
Please note the detail on the reference i gave here says,
The [following] builtins are intended to be compatible with those described in the Intel Itanium Processor-specific Application Binary Interface, section 7.4. As such, they depart from the normal GCC practice of using the “__builtin_” prefix, and further that they are overloaded such that they work on multiple types.