Trouble with goroutines in a for loop

Ghost Bunnies picture Ghost Bunnies · Jul 9, 2016 · Viewed 7.1k times · Source

I am trying to solve this problem on Exercism:

Write a program that counts the frequency of letters in texts using parallel computation.

Basically, I have a FreqMap type:

type FreqMap map[rune]int

And a Frequency function:

func Frequency(s string) FreqMap {
    m := make(FreqMap)
    for _, v := range s {
        m[v]++
    }
    return m
}

Exercism provides an example implementing a concurrent version using recursion, but I would like to implement my own version using a for loop. I came up with the following solution, which does not work:

func ConcurrentFrequency(l []string) FreqMap {
    c := make(chan FreqMap)
    for i := 0; i < len(l); i++ {
        go func(i int) {
            c <- Frequency(l[i])
        }(i)
    }
    return <- c
}

This seems to return after only 1 iteration, c seems to contain the result of only 1 goroutine; I get the same result if I add a sync.WaitGroup.

Could you please explain what I am missing here?

Thank you in advance for your help!

Answer

ain picture ain · Jul 9, 2016

Your code seems to make only one iteration because ConcurrentFrequency returns the first value from the channel and thats it. I quess you want something like this:

func ConcurrentFrequency(l []string) chan FreqMap {
    c := make(chan FreqMap)
    go func() {
        var wg sync.WaitGroup
        wg.Add(len(l))
        for _, s := range l {
            go func(s string) {
                defer wg.Done()
                c <- Frequency(s)
            }(s)
        }
        wg.Wait()
        close(c)
    }()
    return c
}

Now it returns the channel of maps, these you probaly want to combine into sigle map:

func main() {
    m := make(FreqMap)
    for v := range ConcurrentFrequency([]string{"foo", "bar","zoo"}) {
        for k, v := range v {
            m[k] += v
        }
    }
    fmt.Println(m)
}

Longer explanation that doesn't fit into comment:

In the for _, s := range l loop all goroutines write into the same channel, but as that channel is not buffered, as soon as the first value is written into it, it is "full", meaning that no other values can be written into it. So only one goroutine in the loop can complete, and wg.Done is called only once. Thus if the source array has more than one string, the rest of the gorutines can't complete until something starts to consume values from the channel. But in your version it would be stuck in wg.Wait as not all goroutines are done and thus the ConcurrentFrequency can't return the channel to the consumer. In the way I wrote the ConcurrentFrequency, the cannel can be returned to the consumer and this (reading from the channel) enables the other Frequency(s) calls to write into channel.