Java 8 Stream: difference between limit() and skip()

Luigi Cortese · Sep 5, 2015

Talking about Streams, when I execute this piece of code

import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .limit(3)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));
    }
}

I get this output

A1B1C1
A2B2C2
A3B3C3

because limiting my stream to the first three elements means actions A, B and C are executed only three times.

Trying to perform an analogous computation on the last three elements using the skip() method shows different behaviour. This

import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .skip(6)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));
    }
}

outputs this

A1
A2
A3
A4
A5
A6
A7B7C7
A8B8C8
A9B9C9

Why, in this case, are actions A1 to A6 executed? It must have something to do with the fact that limit is a short-circuiting stateful intermediate operation, while skip is a stateful intermediate operation that is not short-circuiting, but I don't understand the practical implications of this property. Is it just that "every action before skip is executed, while not every action before limit is"?

Answer

RealSkeptic · Sep 5, 2015

What you have here are two stream pipelines.

These stream pipelines each consist of a source, several intermediate operations, and a terminal operation.

But intermediate operations are lazy. This means that nothing happens unless a downstream operation requires an item. When one does, the intermediate operation does whatever it needs to produce that item, and then waits again until another item is requested, and so on.
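A minimal sketch of that laziness, assuming a hypothetical class `LazinessDemo` that records peek calls in a list instead of printing them: building the pipeline alone runs nothing, and only attaching a terminal operation (here `collect`) makes the peek action fire.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the question's code.
final class LazinessDemo {
    // Builds a one-stage pipeline; attaches a terminal operation only when asked to.
    static List<String> run(boolean attachTerminal) {
        List<String> events = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .peek(x -> events.add("A" + x)); // intermediate: only registers the action
        if (attachTerminal) {
            // Terminal operation: this is what actually requests the items.
            pipeline.collect(Collectors.toList());
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // []
        System.out.println(run(true));  // [A1, A2, A3]
    }
}
```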

Terminal operations are usually "eager": they ask for all the items in the stream that they need in order to complete.
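To see the "usually" in action, here is a sketch (class and method names are illustrative, not from the question) that counts how many source elements two terminal operations request. `forEach` needs every element, while `findFirst` is documented as a short-circuiting terminal operation and stops after the first one.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class comparing an eager and a short-circuiting terminal operation.
final class TerminalDemo {
    // forEach needs every element, so the source is asked for all nine.
    static int pulledByForEach() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
                .peek(x -> pulled.incrementAndGet())
                .forEach(x -> { });
        return pulled.get();
    }

    // findFirst is short-circuiting: once it has one element it stops asking.
    static int pulledByFindFirst() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
                .peek(x -> pulled.incrementAndGet())
                .findFirst();
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println(pulledByForEach());   // 9
        System.out.println(pulledByFindFirst()); // 1
    }
}
```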

So you should really think of the pipeline as the forEach asking the stream behind it for the next item, and that stream asks the stream behind it, and so on, all the way to the source.
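One way to make these requests visible is `iterator()`, itself a terminal operation that hands out items only when asked. In the sketch below (the class `PullDemo` is hypothetical), each call to `next()` acts as one request sent back through the pipeline; on OpenJDK's sequential implementation the trace shows exactly one "A" event per request. That exact granularity is an implementation detail, not something the Stream specification promises.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: drives the pipeline one request at a time.
final class PullDemo {
    // Makes `requests` explicit requests and records when the "A" peek runs.
    static List<String> pull(int requests) {
        List<String> events = new ArrayList<>();
        Iterator<Integer> it = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
                .peek(x -> events.add("A" + x))
                .iterator(); // terminal operation that hands out items on demand
        for (int i = 0; i < requests; i++) {
            // next() is evaluated first, so its "A" event is recorded before "got".
            events.add("got " + it.next());
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(pull(2)); // [A1, got 1, A2, got 2]
    }
}
```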

With that in mind, let's see what we have with your first pipeline:

Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .limit(3)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));

So, the forEach asks for the first item. That means the "B" peek needs an item and asks the limit output stream for it, which means limit needs to ask the "A" peek, which goes to the source. An item is given and travels all the way down to the forEach, and you get your first line:

A1B1C1

The forEach asks for another item, then another. Each time, the request is propagated up the pipeline and carried out. But when forEach asks for the fourth item and that request reaches limit, limit knows it has already handed out all the items it is allowed to give.
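The same request sequence can be driven by hand with an iterator standing in for forEach (the class name `LimitRequests` is illustrative, and events are collected in a list rather than printed). The loop's fourth `hasNext()` is the "fourth request": on OpenJDK it returns false without the "A" peek ever seeing 4.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: plays the role of forEach for the limit pipeline.
final class LimitRequests {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Iterator<Integer> it = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
                .peek(x -> events.add("A" + x))
                .limit(3)
                .peek(x -> events.add("B" + x))
                .iterator();
        // Keep asking until limit reports that it has nothing more to give.
        // The fourth hasNext() is answered by limit without asking "A" again.
        while (it.hasNext()) {
            events.add("C" + it.next());
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [A1, B1, C1, A2, B2, C2, A3, B3, C3]
    }
}
```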

So it does not ask the "A" peek for another item. It immediately indicates that its items are exhausted; no more actions are performed, and forEach terminates.
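A practical consequence of limit being short-circuiting is that it can end the traversal even when the source would never run out. A sketch, assuming an infinite `Stream.iterate` source and a hypothetical counter class, shows that the source is asked for exactly three items:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class: limit on an infinite source.
final class LimitOnInfinite {
    // Counts how many elements the infinite source hands out before the pipeline stops.
    static int pulledFromSource() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.iterate(1, n -> n + 1)            // infinite source: 1, 2, 3, ...
                .peek(x -> pulled.incrementAndGet())
                .limit(3)                        // stops asking upstream after three items
                .forEach(x -> { });
        return pulled.get();
    }

    public static void main(String[] args) {
        System.out.println(pulledFromSource()); // 3
    }
}
```

Without the limit, the same forEach would never terminate.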

What happens in the second pipeline?

Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .skip(6)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));

Again, forEach asks for the first item, and the request propagates back. But when it reaches skip, skip knows it has to take 6 items from upstream before it can pass one downstream. So it requests an item from the "A" peek, consumes it without passing it downstream, makes another request, and so on. The "A" peek therefore gets 6 requests and produces 6 prints, but none of those items is passed down.

A1
A2
A3
A4
A5
A6

On the 7th request made by skip, the item is passed down to the "B" peek and from it to the forEach, so the full print is done:

A7B7C7
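That first request can be isolated by using an iterator as a stand-in for forEach (the class name `SkipFirstRequest` is an assumption for illustration). On OpenJDK, a single `next()` call makes skip drain six items from upstream before the seventh reaches "B":

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: one downstream request through the skip pipeline.
final class SkipFirstRequest {
    // Sends a single request and records every action it triggers.
    static List<String> firstRequest() {
        List<String> events = new ArrayList<>();
        Iterator<Integer> it = Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
                .peek(x -> events.add("A" + x))
                .skip(6)                          // discards six items before passing one on
                .peek(x -> events.add("B" + x))
                .iterator();
        events.add("got " + it.next());          // exactly one request from downstream
        return events;
    }

    public static void main(String[] args) {
        System.out.println(firstRequest()); // [A1, A2, A3, A4, A5, A6, A7, B7, got 7]
    }
}
```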

Then it's just like before. From now on, whenever skip gets a request, it asks for an item upstream and passes it downstream, since it "knows" it has already done its skipping. So the remaining items go through the entire pipeline until the source is exhausted.
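Since skip is not short-circuiting, it never tells its upstream to stop; only the source running out, or a short-circuiting stage further down, ends the traversal. A sketch with an infinite source and a `limit(3)` placed after `skip(6)` (class name hypothetical) shows the source being asked for nine items: six that skip discards and three that reach the terminal operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: skip followed by a short-circuiting limit.
final class SkipThenLimit {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Stream.iterate(1, n -> n + 1)            // infinite source: 1, 2, 3, ...
                .peek(x -> events.add("A" + x))
                .skip(6)                         // pulls and drops 1..6 without forwarding them
                .limit(3)                        // short-circuits after forwarding 7, 8, 9
                .forEach(x -> events.add("C" + x));
        return events;
    }

    public static void main(String[] args) {
        // [A1, A2, A3, A4, A5, A6, A7, C7, A8, C8, A9, C9]
        System.out.println(run());
    }
}
```

So, yes: every item that skip discards still passes through the stages before it, whereas limit stops those stages from running at all once it is satisfied.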