Let's say we have a dummy function:
async def foo(arg):
    result = await some_remote_call(arg)
    return result.upper()
What's the difference between:
import asyncio

coros = []
for i in range(5):
    coros.append(foo(i))

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(coros))
And:
import asyncio

futures = []
for i in range(5):
    futures.append(asyncio.ensure_future(foo(i)))

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(futures))
Note: The example returns a result, but this isn't the focus of the question. When the return value matters, use gather() instead of wait().
Regardless of return value, I'm looking for clarity on ensure_future(). wait(coros) and wait(futures) both run the coroutines, so when and why should a coroutine be wrapped in ensure_future?
Basically, what's the Right Way (tm) to run a bunch of non-blocking operations using Python 3.5's async?
For extra credit, what if I want to batch the calls? For example, I need to call some_remote_call(...) 1000 times, but I don't want to crush the web server/database/etc. with 1000 simultaneous connections. This is doable with a thread or process pool, but is there a way to do this with asyncio?
2020 update (Python 3.7+): Don't use these snippets. Instead use:
import asyncio

async def do_something_async():
    tasks = []
    for i in range(5):
        tasks.append(asyncio.create_task(foo(i)))
    await asyncio.gather(*tasks)

def do_something():
    # asyncio.run() expects a coroutine object, so the function must be called
    asyncio.run(do_something_async())
Also consider using Trio, a robust 3rd party alternative to asyncio.
A coroutine is a generator function that can both yield values and accept values from the outside. The benefit of using a coroutine is that we can pause the execution of a function and resume it later. In case of a network operation, it makes sense to pause the execution of a function while we're waiting for the response. We can use the time to run some other functions.
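To make the pause-and-resume idea concrete, here is a minimal sketch. The names worker and demo are made up for illustration, and asyncio.sleep stands in for a real network wait. While one coroutine is suspended at its await, the event loop runs the other one.

```python
import asyncio

async def worker(name, delay, log):
    # Record progress so the interleaving is visible afterwards
    log.append(f"{name} started")
    # Awaiting suspends this coroutine and lets the event loop run others
    await asyncio.sleep(delay)
    log.append(f"{name} finished")

async def demo():
    log = []
    # Both workers run concurrently on one thread; "slow" is scheduled first
    await asyncio.gather(
        worker("slow", 0.1, log),
        worker("fast", 0.05, log),
    )
    return log

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both workers start before either finishes, and the one with the shorter wait finishes first even though it was scheduled second.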
A future is like the Promise objects from JavaScript. It is like a placeholder for a value that will be materialized in the future. In the above-mentioned case, while waiting on network I/O, a function can give us a container, a promise that it will fill the container with the value when the operation completes. We hold on to the future object and when it's fulfilled, we can call a method on it to retrieve the actual result.
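The placeholder idea can be sketched with a bare Future. This is only an illustration: the function name fulfil_later and the "response" value are invented, and loop.call_later simulates an I/O operation completing a little later.

```python
import asyncio

async def fulfil_later():
    loop = asyncio.get_running_loop()
    # An empty container: it has no value yet
    future = loop.create_future()
    was_done_before = future.done()
    # Simulate an operation that fills the container after 50 ms
    loop.call_later(0.05, future.set_result, "response")
    # Awaiting the future suspends until set_result() has been called
    value = await future
    return was_done_before, value, future.result()

if __name__ == "__main__":
    print(asyncio.run(fulfil_later()))
```

Once the future is fulfilled, result() returns the stored value as often as you ask for it.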
Direct Answer: You don't need ensure_future if you don't need the results. It is useful when you need the results or want to retrieve any exceptions that occurred.
Extra Credit: I would use run_in_executor and pass an Executor instance to control the maximum number of workers.
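A rough sketch of the run_in_executor approach follows, assuming the remote call is a blocking function. Here blocking_remote_call, run_batched and the stats dictionary are hypothetical names used only for this example, and time.sleep stands in for the real network call. The pool's max_workers caps how many calls are in flight at once.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Shared counters, used only to show that the cap is respected
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()

def blocking_remote_call(arg):
    """Hypothetical stand-in for a blocking network or database call."""
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend to wait on the remote side
    with stats_lock:
        stats["active"] -= 1
    return str(arg).upper()

async def run_batched(args, max_workers=5):
    loop = asyncio.get_running_loop()
    # At most max_workers calls run at the same time; the rest queue up
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [loop.run_in_executor(pool, blocking_remote_call, a) for a in args]
        # gather() returns the results in the same order as args
        return await asyncio.gather(*futures)

if __name__ == "__main__":
    results = asyncio.run(run_batched(range(20), max_workers=4))
    print(results, "peak concurrency:", stats["peak"])
```

If the remote call is already a coroutine rather than a blocking function, an asyncio.Semaphore is another common way to cap concurrency without threads.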
In the first example, you are using coroutines. The wait function takes a bunch of coroutines and combines them together. So wait() finishes when all the coroutines are exhausted (completed/finished returning all the values).
from asyncio import get_event_loop, wait

loop = get_event_loop()
loop.run_until_complete(wait(coros))
The run_until_complete method makes sure that the loop stays alive until the execution is finished. Note that you are not getting the results of the async execution in this case.
In the second example, you are using the ensure_future function to wrap a coroutine and return a Task object, which is a kind of Future. The coroutine is scheduled to be executed in the main event loop when you call ensure_future. The returned future/task object doesn't yet have a value, but over time, when the network operations finish, the future object will hold the result of the operation.
from asyncio import ensure_future, get_event_loop, wait

futures = []
for i in range(5):
    futures.append(ensure_future(foo(i)))

loop = get_event_loop()
loop.run_until_complete(wait(futures))
So in this example, we're doing the same thing except we're using futures instead of just using coroutines.
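Holding on to the task objects is what makes the results reachable afterwards. The sketch below assumes a fake some_remote_call built on asyncio.sleep (the real one is not shown in the question) and reads each result back once wait() has finished. Note that on Python 3.11 and later, asyncio.wait() no longer accepts bare coroutine objects, so wrapping them in tasks is required there.

```python
import asyncio

async def some_remote_call(arg):
    # Hypothetical stand-in for the remote call from the question
    await asyncio.sleep(0.01)
    return f"value-{arg}"

async def foo(arg):
    result = await some_remote_call(arg)
    return result.upper()

async def main():
    # Wrapping each coroutine gives us a Task we can inspect later
    tasks = [asyncio.ensure_future(foo(i)) for i in range(5)]
    done, pending = await asyncio.wait(tasks)
    # wait() returns unordered sets, so read results from our own ordered list
    return [task.result() for task in tasks], len(pending)

if __name__ == "__main__":
    print(asyncio.run(main()))
```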
Let's look at an example of how to use asyncio/coroutines/futures:
import asyncio

async def slow_operation():
    await asyncio.sleep(1)
    return 'Future is done!'

def got_result(future):
    print(future.result())
    # We have result, so let's stop
    loop.stop()

loop = asyncio.get_event_loop()
task = loop.create_task(slow_operation())
task.add_done_callback(got_result)

# We run forever
loop.run_forever()
Here, we have used the create_task method on the loop object. ensure_future would schedule the task in the main event loop, whereas create_task lets us schedule a coroutine on a loop we choose.
We also see the concept of adding a callback using the add_done_callback method on the task object.
A Task is done when the coroutine returns a value, raises an exception, or gets cancelled. There are methods to check each of these outcomes.
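As an illustration of those checks, the sketch below runs three small coroutines (their names are invented for this example) that end in the three possible ways, then inspects each task with done(), result(), exception() and cancelled().

```python
import asyncio

async def returns_value():
    return "ok"

async def raises_error():
    raise ValueError("boom")

async def sleeps_long():
    await asyncio.sleep(10)

async def inspect_tasks():
    ok = asyncio.create_task(returns_value())
    failed = asyncio.create_task(raises_error())
    cancelled = asyncio.create_task(sleeps_long())
    # Yield once so every task gets a chance to start
    await asyncio.sleep(0)
    cancelled.cancel()
    # return_exceptions=True collects failures instead of raising them here
    await asyncio.gather(ok, failed, cancelled, return_exceptions=True)
    return {
        "ok": (ok.done(), ok.result()),
        "failed": (failed.done(), type(failed.exception()).__name__),
        "cancelled": (cancelled.done(), cancelled.cancelled()),
    }

if __name__ == "__main__":
    print(asyncio.run(inspect_tasks()))
```

All three tasks report done() as true; only the way they finished differs.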
I have written some blog posts on these topics which might help:
Of course, you can find more details in the official documentation: https://docs.python.org/3/library/asyncio.html