I have an SQS queue that is constantly being populated by a data consumer and I am now trying to create the service that will pull this data from SQS using Python's boto.
The way I designed it is that I will have 10-20 threads all trying to read messages from the SQS queue and then doing what they have to do on the data (business logic), before going back to the queue to get the next batch of data once they're done. If there's no data they will just wait until some data is available.
I have two areas I'm not sure about with this design
Thanks
The long-polling capability of the receive_message() method is the most efficient way to poll SQS. If it returns without any messages, I would recommend a short delay before retrying, especially if you have multiple readers. You may even want to use an incremental delay, so that each subsequent empty read waits a bit longer, just so you don't end up getting throttled by AWS.
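As a rough sketch of that polling loop: the code below assumes a boto3-style SQS client (created with something like `boto3.client("sqs")`), whose `receive_message()` takes `QueueUrl`, `MaxNumberOfMessages` and `WaitTimeSeconds`. The function name `poll_with_backoff`, the `handle` callback, and the delay values are all made up for illustration; tune them to your workload.

```python
import time


def poll_with_backoff(sqs, queue_url, handle, max_empty_polls=None,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Long-poll a queue, waiting longer after each consecutive empty read.

    `sqs` is assumed to be a boto3-style SQS client. `handle` is a
    hypothetical callback that receives one message dict at a time.
    With max_empty_polls=None the loop runs forever.
    """
    delay = base_delay
    empty_polls = 0
    while max_empty_polls is None or empty_polls < max_empty_polls:
        # WaitTimeSeconds > 0 turns on long polling (20 is the SQS maximum).
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
        )
        # The "Messages" key is absent when nothing was received.
        messages = response.get("Messages", [])
        if messages:
            # Got work: reset the backoff and process the batch.
            empty_polls = 0
            delay = base_delay
            for message in messages:
                handle(message)
        else:
            # Empty read: sleep, then double the delay up to max_delay.
            empty_polls += 1
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Each of your 10-20 worker threads could run this loop against the same queue; the growing delay only kicks in when the queue stays empty, so a busy queue is still drained at full speed.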
And yes, you do have to delete the message after you have read it, or it will reappear in the queue. This can actually be very useful when a worker reads a message and then fails before it can fully process it. In that case, the message is re-queued and read by another worker. You also want to make sure the invisibility timeout of the messages is long enough that the worker has time to process the message before it automatically reappears on the queue. If necessary, your workers can adjust the timeout while they are processing, if it is taking longer than expected.
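Here is a minimal sketch of the delete-after-processing pattern, again assuming a boto3-style client with `delete_message()` and `change_message_visibility()`. The names `process_and_delete`, `work` and `extend_to` are hypothetical, and a real worker would probably log the failure rather than swallow it silently.

```python
def process_and_delete(sqs, queue_url, message, work, extend_to=None):
    """Run `work` on a message and delete it only if processing succeeds.

    `sqs` is assumed to be a boto3-style SQS client. `work` is a
    hypothetical callable holding the business logic. `extend_to`
    optionally sets a new visibility timeout in seconds before
    processing starts. Returns True if the message was deleted.
    """
    receipt = message["ReceiptHandle"]
    if extend_to is not None:
        # Give this worker more time before the message becomes
        # visible to other readers again.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt,
            VisibilityTimeout=extend_to,
        )
    try:
        work(message["Body"])
    except Exception:
        # Not deleted: the message reappears on the queue once its
        # visibility timeout expires, so another worker can retry it.
        return False
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt)
    return True
```

Because the delete only happens after `work` returns, a worker that crashes or raises mid-way leaves the message in place for a retry, which is the behaviour described above.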