Google Cloud Pub/Sub Retry Count

T.Okahara · Jul 26, 2016 · Viewed 7.5k times

We are moving from an unstable messaging queue service to Google Cloud Pub/Sub in Node.js. It seems to work well, but we would like to add error handling.

We would like to limit the number of retries for a particular message, say to 10 in our test environment and 100 in production. If a message fails 10 times (in test), then instead of letting it sit in our queue, being processed and failing for 7 days, we would like to move it to a separate error queue and have an email sent to us.

We currently have all of this set up in our previous messaging queue but we have yet to find Google's Pub Sub retry count attribute for each message. Does anyone know if this exists?

We do use task queues in Google App Engine, and they have everything we would need, but Pub/Sub seems to be missing a lot. Any solution must work in Node.js.

Answer

Kamal Aboul-Hosn · Jul 27, 2016

Update 04/21/2020: As of today, the dead letter queue feature for Cloud Pub/Sub has been released. It lets you set the maximum number of times delivery of a message should be attempted and specify a topic to which messages delivered more than that number of times are published. When enabled, the feature also exposes the number of delivery attempts as a field. In Node.js, for example, it is available as the deliveryAttempt property on the message passed into the subscriber callback.
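As a rough illustration of reading that field in Node.js, here is a minimal sketch of a subscriber callback. Only deliveryAttempt, id, ack() and nack() come from the Pub/Sub message; makeHandler, processMessage, warnAfter and log are hypothetical names made up for this example. In a real subscriber the returned function would be registered with subscription.on('message', handler). The fallback to 0 assumes the field may be unset when no dead letter policy is configured.

```javascript
// Hypothetical helper (not part of the Pub/Sub client): builds a 'message'
// callback that logs once a message has been delivered `warnAfter` times.
// `processMessage` is your own (assumed async) processing function.
function makeHandler(processMessage, warnAfter, log = console.warn) {
  return async function handler(message) {
    // Assumption: treat a missing deliveryAttempt as 0.
    const attempt = message.deliveryAttempt || 0;
    if (attempt >= warnAfter) {
      log(`message ${message.id} is on delivery attempt ${attempt}`);
    }
    try {
      await processMessage(message);
      message.ack(); // processed successfully, stop redelivery
    } catch (err) {
      message.nack(); // failed, ask Pub/Sub to deliver it again
    }
  };
}
```

With a dead letter policy on the subscription, the retry limit itself is enforced by Pub/Sub; the callback only needs to ack on success and nack on failure.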

Previous answer

Cloud Pub/Sub does not have a limit to the number of times it will retry delivery of a message to a subscriber. If your subscriber never acknowledges the message within the ack deadline, it will be redelivered until the message expires 7 days later.

If you want to stop receiving these messages, then you will need to ack them at some point. If you want to protect against "messages of death" that cannot be processed by your subscribers, I recommend the following:

  1. Keep track of message failure counts in a database, keyed by message id. Hopefully, failures are not frequent, so this database should not be too large and queries to it will only be made when there is actually a failure.

  2. When a message fails, query the database to see how many failures have occurred before, and increment the counter. If the count is still below your threshold, do not acknowledge the message, so that it is redelivered.

  3. If a message fails more times than your threshold, publish the message to a separate "failed messages" topic, send an email, and acknowledge the message.

  4. If necessary, have a means by which to publish messages from the "failed messages" topic back to your main topic when the problems that caused the message to fail in the first place have been remedied.

You now have the message saved in a separate topic (for 7 days or until you ack it) and the message won't be redelivered to the subscribers on your main topic.
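Steps 1 to 3 above could be sketched in Node.js roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: an in-memory Map stands in for the database, and onFailure, publishFailed, sendEmail and threshold are hypothetical names. In practice publishFailed would publish to your "failed messages" topic and the counts would live in a store shared by all subscribers. The sketch moves the message once the failure count reaches the threshold, and it uses nack() to request redelivery right away rather than waiting for the ack deadline to pass.

```javascript
// In-memory stand-in for the database of failure counts, keyed by message id.
// Assumption for this sketch only: a real setup needs a persistent, shared store.
const failureCounts = new Map();

// Hypothetical helper: call this when processing `message` has failed.
// `publishFailed` and `sendEmail` are injected placeholders for publishing to
// the "failed messages" topic and for sending the notification email.
async function onFailure(message, { threshold, publishFailed, sendEmail }) {
  const failures = (failureCounts.get(message.id) || 0) + 1;
  failureCounts.set(message.id, failures);

  if (failures < threshold) {
    // Below the threshold: do not ack, so the message is delivered again.
    message.nack();
    return 'retry';
  }

  // Threshold reached: publish to the failed topic first, then notify,
  // and only ack once the copy has been published.
  await publishFailed(message.data, message.attributes);
  await sendEmail(`Message ${message.id} failed ${failures} times`);
  message.ack();
  failureCounts.delete(message.id);
  return 'parked';
}
```

The threshold can then be set per environment, for example 10 in test and 100 in production.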