I was implementing a FIFO queue of requests instances (preallocated request objects for speed) and started with using the "synchronized" keyword on the add method. The method was quite short (check if room in fixed size buffer, then add value to array). Using visualVM it appeared the thread was blocking more often than I liked ("monitor" to be precise). So I converted the code over to use AtomicInteger values for things such as keeping track of the current size, then using compareAndSet() in while loops (as AtomicInteger does internally for methods such as incrementAndGet()). The code now looks quite a bit longer.
What I was wondering is what is the performance overhead of using synchronized and shorter code versus longer code without the synchronized keyword (so should never block on a lock).
Here is the old get method with the synchronized keyword:
public synchronized Request get()
{
if (head == tail)
{
return null;
}
Request r = requests[head];
head = (head + 1) % requests.length;
return r;
}
Here is the new get method without the synchronized keyword:
public Request get()
{
while (true)
{
int current = size.get();
if (current <= 0)
{
return null;
}
if (size.compareAndSet(current, current - 1))
{
break;
}
}
while (true)
{
int current = head.get();
int nextHead = (current + 1) % requests.length;
if (head.compareAndSet(current, nextHead))
{
return requests[current];
}
}
}
My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc), even though the code is shorter.
Thanks!
My guess was the synchronized keyword is worse because of the risk of blocking on the lock (potentially causing thread context switches etc)
Yes, in the common case you are right. Java Concurrency in Practice discusses this in section 15.3.2:
[...] at high contention levels locking tends to outperform atomic variables, but at more realistic contention levels atomic variables outperform locks. This is because a lock reacts to contention by suspending threads, reducing CPU usage and synchronization traffic on the shared memory bus. (This is similar to how blocking producers in a producer-consumer design reduces the load on consumers and thereby lets them catch up.) On the other hand, with atomic variables, contention management is pushed back to the calling class. Like most CAS-based algorithms,
AtomicPseudoRandom
reacts to contention by trying again immediately, which is usually the right approach but in a high-contention environment just creates more contention.Before we condemn
AtomicPseudoRandom
as poorly written or atomic variables as a poor choice compared to locks, we should realize that the level of contention in Figure 15.1 is unrealistically high: no real program does nothing but contend for a lock or atomic variable. In practice, atomics tend to scale better than locks because atomics deal more effectively with typical contention levels.The performance reversal between locks and atomics at differing levels of contention illustrates the strengths and weaknesses of each. With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance. (CAS-based algorithms also outperform lock-based ones on single-CPU systems, since a CAS always succeeds on a single-CPU system except in the unlikely case that a thread is preempted in the middle of the read-modify-write operation.)
(On the figures referred to by the text, Figure 15.1 shows that the performance of AtomicInteger and ReentrantLock is more or less equal when contention is high, while Figure 15.2 shows that under moderate contention the former outperforms the latter by a factor of 2-3.)
As others have noted, nonblocking algorithms, although potentially faster, are more complex, thus more difficult to get right. A hint from section 15.4 of JCiA:
Good nonblocking algorithms are known for many common data structures, including stacks, queues, priority queues, and hash tables, though designing new ones is a task best left to experts.
Nonblocking algorithms are considerably more complicated than their lock-based equivalents. The key to creating nonblocking algorithms is figuring out how to limit the scope of atomic changes to a single variable while maintaining data consistency. In linked collection classes such as queues, you can sometimes get away with expressing state transformations as changes to individual links and using an
AtomicReference
to represent each link that must be updated atomically.