I wrote the same aggregation, using a custom collector, with both Java 8 streams and parallel streams.
When I look at CPU usage in htop, it shows all CPU cores being used for both the stream and the parallel stream version. So it seems that list.stream() also uses all CPUs. What exactly is the difference between parallelStream() and stream() in terms of multi-core usage?
Consider the following program:
import java.util.ArrayList;
import java.util.List;

public class Foo {
    public static void main(String... args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        list.stream().forEach(System.out::println);
    }
}
You will notice that this program outputs the numbers from 0 to 999 sequentially, in the order in which they appear in the list. If we change stream() to parallelStream(), this is no longer the case (at least on my computer): all numbers are still written, but in a different order. So parallelStream() does indeed use multiple threads.
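One way to confirm this beyond the output order is to record which threads actually handle the elements. The following is only a minimal sketch; the class name ThreadNames and the helper threadsUsed are made up for illustration:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ThreadNames {
    // Hypothetical helper: returns the names of all threads that
    // processed at least one element of the list.
    static Set<String> threadsUsed(List<Integer> list, boolean parallel) {
        // Thread-safe set, because several threads may add to it concurrently.
        Set<String> names = ConcurrentHashMap.newKeySet();
        (parallel ? list.parallelStream() : list.stream())
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String... args) {
        List<Integer> list = IntStream.range(0, 1000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println("stream():         " + threadsUsed(list, false));
        System.out.println("parallelStream(): " + threadsUsed(list, true));
    }
}
```

With stream(), the set contains only the calling thread. With parallelStream() on a multi-core machine, it typically also contains several ForkJoinPool.commonPool-worker threads, although the exact set varies from run to run.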
The htop output is explained by the fact that most modern operating systems spread even single-threaded applications over multiple cores (the same thread may run on several cores over time, but of course not on more than one at the same time). So seeing a process use more than one core does not necessarily mean that the program uses multiple threads.
Also, performance may not improve when using multiple threads: the cost of synchronization may cancel out the gains. For simple test scenarios this is often the case. In the example above, System.out is synchronized, so effectively only one number can be written at a time, even though multiple threads are used.
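To see a difference between the two variants, the per-element work should be CPU-bound and should not touch a shared lock such as System.out. The sketch below is an assumption-laden illustration, not a proper benchmark: the class CpuBound and its work method are invented for this example, and the timing is naive (no JIT warm-up; a harness such as JMH would be more reliable).

```java
import java.util.stream.LongStream;

public class CpuBound {
    // Hypothetical workload: deliberately expensive, lock-free work per element.
    static long work(long n) {
        long x = n;
        for (int i = 0; i < 10_000; i++) {
            x = x * 31 + i; // overflow simply wraps around, which is fine here
        }
        return x & 0xFF;
    }

    static long sumSequential(int count) {
        return LongStream.range(0, count).map(CpuBound::work).sum();
    }

    static long sumParallel(int count) {
        return LongStream.range(0, count).parallel().map(CpuBound::work).sum();
    }

    public static void main(String... args) {
        int count = 100_000;
        long t0 = System.nanoTime();
        long a = sumSequential(count);
        long t1 = System.nanoTime();
        long b = sumParallel(count);
        long t2 = System.nanoTime();
        // Both sums are identical; only the elapsed time should differ.
        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms, sum = " + a);
        System.out.println("parallel:   " + (t2 - t1) / 1_000_000 + " ms, sum = " + b);
    }
}
```

Because nothing is shared between elements, the parallel version has no lock to wait on, so on a multi-core machine it will usually finish sooner. How much sooner depends on the hardware and on how expensive each element is.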