How and When to Use @async and @sync in Julia

Michael Ohlrogge · May 18, 2016

I have read the documentation for the @async and @sync macros but still cannot figure out how and when to use them, nor can I find many resources or examples for them elsewhere on the internet.

My immediate goal is to find a way to set several workers to do work in parallel and then wait until they have all finished before proceeding in my code. The post "Waiting for a task to be completed on remote processor in Julia" describes one successful way to accomplish this. I had thought it should be possible using the @async and @sync macros, but my initial failures made me wonder whether I properly understand how and when to use these macros.

Answer

Michael Ohlrogge · May 18, 2016

According to the documentation under ?@async, "@async wraps an expression in a Task." This means that Julia schedules whatever falls within its scope to run as a task, then proceeds to whatever comes next in the script without waiting for that task to complete. Thus, for instance, without the macro you will get:

julia> @time sleep(2)
  2.005766 seconds (13 allocations: 624 bytes)

But with the macro, you get:

julia> @time @async sleep(2)
  0.000021 seconds (7 allocations: 657 bytes)
Task (waiting) @0x0000000112a65ba0


Julia thus allows the script to proceed (and the @time macro to fully execute) without waiting for the task (in this case, sleeping for two seconds) to complete.
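The returned Task is a handle you can hold on to. A minimal local sketch (no workers; the 0.5-second sleep and the value 42 are arbitrary placeholders) keeps the handle and later calls fetch, which blocks until the task finishes and returns the value of its last expression:

```julia
# @async returns a Task immediately; the body runs when the scheduler gets to it.
t = @async begin
    sleep(0.5)   # stand-in for some slow work
    42           # value produced by the task
end

println(istaskdone(t))   # false: the 0.5 s sleep cannot have finished yet
result = fetch(t)        # blocks the current task until t finishes
println(result)          # 42
```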

The @sync macro, by contrast, will "Wait until all dynamically-enclosed uses of @async, @spawn, @spawnat and @parallel are complete" (according to the documentation under ?@sync; in Julia 1.x the docstring says "lexically-enclosed", and @parallel has been renamed @distributed). Thus, we see:

julia> @time @sync @async sleep(2)
  2.002899 seconds (47 allocations: 2.986 KB)
Task (done) @0x0000000112bd2e00

In this simple example, then, there is no point in pairing a single @async with @sync. Where @sync becomes useful is when @async is applied to several operations that you want to start all at once, without waiting for each one to finish, while still waiting for all of them to finish before moving on.
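A minimal local sketch of that pattern, with sleep standing in for real work (four tasks of one second each are arbitrary choices): because every @async returns immediately, all four sleeps overlap, and the enclosing @sync waits only about as long as the longest one.

```julia
# Start four tasks, each sleeping 1 second, inside one @sync block.
elapsed = @elapsed @sync for i in 1:4
    @async sleep(1)   # schedules the task and moves straight on to the next i
end

# @sync waited for all four, but because the sleeps overlapped the total is
# about 1 s (plus any compilation time), not the 4 s a plain loop would need.
println(round(elapsed; digits = 1))
```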

For example, suppose we have multiple workers and we'd like to start each of them working on a task simultaneously and then fetch the results from those tasks. An initial (but incorrect) attempt might be:

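using Distributed

# `cell` no longer exists in Julia; this helper recreates it as an uninitialized Vector{Any}
cell(N) = Vector{Any}(undef, N)

addprocs(2)  # start two local worker processes
@time begin
    a = cell(nworkers())
    for (idx, pid) in enumerate(workers())
        # remotecall_fetch blocks until worker `pid` has finished sleep(2)
        a[idx] = remotecall_fetch(sleep, pid, 2)
    end
end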
## 4.011576 seconds (177 allocations: 9.734 KB)

The problem here is that each remotecall_fetch() call blocks until its worker has finished its work (in this case, sleeping for 2 seconds) before the loop moves on to start the next remotecall_fetch(). In practice, this means we get none of the benefits of parallelism: the workers do their work (sleeping) one after another rather than simultaneously, so the loop takes about 4 seconds instead of 2.

We can correct this, however, by using a combination of the @async and @sync macros:

@time begin
    a = cell(nworkers())
    @sync for (idx, pid) in enumerate(workers())
        @async a[idx] = remotecall_fetch(sleep, pid, 2)
    end
end
## 2.009416 seconds (274 allocations: 25.592 KB)

Now, if we count each step of the loop as a separate operation, then with two workers there are two operations preceded by the @async macro. The macro lets each of these start and lets the code continue (in this case, to the next step of the loop) before it finishes. But the @sync macro, whose scope covers the whole loop, keeps the script from proceeding past the loop until all of the operations preceded by @async have completed.
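Conceptually, @sync does some bookkeeping for you: it remembers the tasks created by the @async calls inside it and waits for all of them before moving on. The sketch below does that bookkeeping by hand, run locally with sleep(1) standing in for remotecall_fetch(sleep, pid, 2) and a hypothetical `ids` range standing in for workers(). It ends with Base's asyncmap, a swapped-in alternative (not used in the answer) that runs a function asynchronously for every element and returns the results in order.

```julia
# Hypothetical job ids standing in for workers(); sleep(1) stands in for remotecall_fetch.
ids = 1:3

t0 = time()
# Start one task per id without waiting, and keep the Task handles.
tasks = [@async (sleep(1); id) for id in ids]

# Roughly what @sync does for you: block until every task has finished.
foreach(wait, tasks)
elapsed = time() - t0      # well under the 3 s a sequential loop would take

a = fetch.(tasks)          # each task's return value, in the order started

# Base's asyncmap bundles the same start-all, wait-for-all pattern into one call.
b = asyncmap(id -> (sleep(1); id), ids)
```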

We can get an even clearer understanding of how these macros operate by tweaking the @sync/@async remotecall_fetch() example and seeing how its behavior changes. For instance, suppose we keep the @async but drop the @sync:

@time begin
    a = cell(nworkers())
    for (idx, pid) in enumerate(workers())
        println("sending work to $pid")
        @async a[idx] = remotecall_fetch(sleep, pid, 2)
    end
end
## 0.001429 seconds (27 allocations: 2.234 KB)

Here, the @async macro allows us to continue in our loop even before each remotecall_fetch() operation finishes executing. But, for better or worse, we have no @sync macro to prevent the code from continuing past this loop until all of the remotecall_fetch() operations finish.

Nevertheless, each remotecall_fetch() operation keeps running in parallel even after the script has moved past the loop. We can see this because, if we wait two seconds, the array a will contain the results:

julia> sleep(2)

julia> a
2-element Array{Any,1}:
 nothing
 nothing

(Each nothing is the value returned by the sleep function, which has no meaningful return value; its presence shows that the fetch completed successfully.)

We can also see that the two remotecall_fetch() operations start at essentially the same time, because the println calls that precede them execute in rapid succession (their output is not shown here). Contrast this with the next example, where the println calls execute 2 seconds apart.

If we put the @async macro on the whole loop (instead of just on its inner step), our script again continues immediately without waiting for the remotecall_fetch() operations to finish. Now, however, only the script as a whole is allowed to continue past the loop; each individual step of the loop still waits for the previous one to finish. As a result, unlike in the previous example, two seconds after the script moves past the loop, the results array still has one #undef element, indicating that the second remotecall_fetch() operation has not yet completed.

@time begin
    a = cell(nworkers())
    @async for (idx, pid) in enumerate(workers())
        println("sending work to $pid")
        a[idx] = remotecall_fetch(sleep, pid, 2)
    end
end
## 0.001279 seconds (328 allocations: 21.354 KB)
## Task (waiting) @0x0000000115ec9120
## The script continues while the task runs the loop in the background

julia> sleep(2)

julia> a
2-element Array{Any,1}:
 nothing
 #undef
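Because @async wraps the whole loop here, it returns a single Task (the Task (waiting) printed above) representing the entire loop, whose iterations still run one after another. A minimal local sketch, with sleep(0.5) standing in for each remotecall_fetch and three iterations chosen arbitrarily, keeps that handle so the script can wait for the loop explicitly rather than guessing how long to sleep:

```julia
a = Vector{Any}(undef, 3)

# One Task for the whole loop; the iterations inside it still run one after another.
loop_task = @async for i in 1:3
    sleep(0.5)     # stand-in for remotecall_fetch(sleep, pid, 2)
    a[i] = i
end

println(istaskdone(loop_task))   # false: the loop has only just been scheduled
t0 = time()
wait(loop_task)                  # block until every iteration has run
elapsed = time() - t0            # roughly 3 x 0.5 s, since the iterations are sequential
```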

And, not surprisingly, if we put @sync and @async right next to each other, each remotecall_fetch() runs sequentially (rather than simultaneously), and the code does not continue until all of them have finished. In other words, this is, I believe, essentially equivalent to having neither macro in place, just as sleep(2) behaves essentially identically to @sync @async sleep(2).

@time begin
    a = cell(nworkers())
    @sync @async for (idx, pid) in enumerate(workers())
        a[idx] = remotecall_fetch(sleep, pid, 2)
    end
end
## 4.019500 seconds (4.20 k allocations: 216.964 KB)
## Task (done) @0x0000000115e52a10
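The three arrangements can be compared locally with a small timing sketch (sleep(0.5) stands in for each remotecall_fetch; three iterations is an arbitrary choice). One would expect the plain loop and the @sync @async loop to each take about 1.5 seconds, while putting @async on each iteration inside @sync brings the total down to about 0.5 seconds.

```julia
# Plain loop: each sleep finishes before the next one starts.
t_plain = @elapsed for i in 1:3
    sleep(0.5)
end

# @sync @async around the whole loop: one task still runs the iterations in order.
t_outer = @elapsed @sync @async for i in 1:3
    sleep(0.5)
end

# @async on each iteration inside @sync: the three sleeps overlap.
t_inner = @elapsed @sync for i in 1:3
    @async sleep(0.5)
end

println((t_plain, t_outer, t_inner))
```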

Note also that it is possible to have more complicated operations inside the scope of the @async macro. The documentation gives an example containing an entire loop within the scope of @async.

Update: Recall that the help for the @sync macro states that it will "Wait until all dynamically-enclosed uses of @async, @spawn, @spawnat and @parallel are complete." What counts as "complete" depends on how you define the tasks within the scope of the @sync and @async macros. Consider the example below, which is a slight variation on one of the examples given above:

@time begin
    a = cell(nworkers())
    @sync for (idx, pid) in enumerate(workers())
        @async a[idx] = remotecall(sleep, pid, 2)
    end
end
## 0.172479 seconds (93.42 k allocations: 3.900 MB)

julia> a
2-element Array{Any,1}:
 RemoteRef{Channel{Any}}(2,1,3)
 RemoteRef{Channel{Any}}(3,1,4)

The earlier example took roughly 2 seconds to execute, indicating that the two tasks ran in parallel and that the script waited for each of them to finish executing its function before proceeding. This example, however, finishes much faster. The reason is that, for the purposes of @sync, a remotecall() operation has "finished" once it has sent the job to the worker. (Note that the resulting array a here contains only RemoteRef objects, handles to a result on a particular process that could in principle be fetched at some point in the future; since Julia 0.5, remotecall returns a Future instead.) By contrast, a remotecall_fetch() operation has only "finished" when it gets the message from the worker that its task is complete.
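To make the script wait for the work itself when using remotecall, you have to wait on (or fetch) the Futures it returns. A minimal sketch, assuming Julia 1.x with the Distributed standard library and two local workers (the 1-second sleep is arbitrary): issuing the remotecalls returns almost at once, isready reports whether a result has arrived yet, and wait blocks until each worker has actually finished.

```julia
using Distributed
addprocs(2)   # two local worker processes, as in the examples above

t0 = time()
# remotecall sends the job and returns a Future right away, without waiting.
futures = [remotecall(sleep, pid, 1) for pid in workers()]
t_send = time() - t0

# isready is false while a worker is still running its job.
println(isready.(futures))

# Waiting on each Future is what blocks until the workers have actually finished.
foreach(wait, futures)
t_done = time() - t0

results = fetch.(futures)   # sleep returns nothing, so each result is nothing
```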

Thus, if you want to ensure that certain operations on workers have completed before moving on in your script (as discussed, for instance, in the post "Waiting for a task to be completed on remote processor in Julia"), you need to think carefully about what counts as "complete" and how you will detect and enforce that in your script.