Example of the Scala aggregate function

christangrant picture christangrant · Aug 3, 2011 · Viewed 38.3k times · Source

I have been looking and I cannot find an example or discussion of the aggregate function in Scala that I can understand. It seems pretty powerful.

Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:

val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))

After applying aggregate:

Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv"))

Also, can you give example of parameters z, segop, and combop? I'm unclear on what these parameters do.

Answer

Daniel C. Sobral picture Daniel C. Sobral · Aug 4, 2011

Let's see if some ascii art doesn't help. Consider the type signature of aggregate:

def aggregate [B] (z: B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B

Also, note that A refers to the type of the collection. So, let's say we have 4 elements in this collection, then aggregate might work like this:

z   A   z   A   z   A   z   A
 \ /     \ /seqop\ /     \ /    
  B       B       B       B
    \   /  combop   \   /
      B _           _ B
         \ combop  /
              B

Let's see a practical example of that. Say I have a GenSeq("This", "is", "an", "example"), and I want to know how many characters there are in it. I can write the following:

Note the use of par in the below snippet of code. The second function passed to aggregate is what is called after the individual sequences are computed. Scala is only able to do this for sets that can be parallelized.

import scala.collection.GenSeq
val seq = GenSeq("This", "is", "an", "example")
val chars = seq.par.aggregate(0)(_ + _.length, _ + _)

So, first it would compute this:

0 + "This".length     // 4
0 + "is".length       // 2
0 + "an".length       // 2
0 + "example".length  // 7

What it does next cannot be predicted (there are more than one way of combining the results), but it might do this (like in the ascii art above):

4 + 2 // 6
2 + 7 // 9

At which point it concludes with

6 + 9 // 15

which gives the final result. Now, this is a bit similar in structure to foldLeft, but it has an additional function (B, B) => B, which fold doesn't have. This function, however, enables it to work in parallel!

Consider, for example, that each of the four computations initial computations are independent of each other, and can be done in parallel. The next two (resulting in 6 and 9) can be started once their computations on which they depend are finished, but these two can also run in parallel.

The 7 computations, parallelized as above, could take as little as the same time 3 serial computations.

Actually, with such a small collection the cost in synchronizing computation would be big enough to wipe out any gains. Furthermore, if you folded this, it would only take 4 computations total. Once your collections get larger, however, you start to see some real gains.

Consider, on the other hand, foldLeft. Because it doesn't have the additional function, it cannot parallelize any computation:

(((0 + "This".length) + "is".length) + "an".length) + "example".length

Each of the inner parenthesis must be computed before the outer one can proceed.