How to process SQS queue with lambda function (not via scheduled events)?

xtx · Jan 8, 2016 · Viewed 45.8k times

Here is the simplified scheme I am trying to make work:

http requests --> (Gateway API + lambda A) --> SQS --> (lambda B ?????) --> DynamoDB

So it should work as shown: data coming from many HTTP requests (up to 500 per second, for example) is placed into an SQS queue by my lambda function A. The other function, B, then processes the queue: it reads up to 10 items (on some periodic basis) and writes them to DynamoDB with BatchWriteItem.
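As a sketch, lambda A's side of this could look like the following (the queue URL, handler names, and record shape are placeholders I've made up for illustration, not anything from the actual setup):

```python
import json

try:
    import boto3  # available in the Lambda runtime
except ImportError:
    boto3 = None  # lets the pure helper below be used without AWS installed

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ingest"  # placeholder


def to_message_body(record):
    """Serialize one incoming record as a compact SQS message body."""
    return json.dumps(record, separators=(",", ":"))


def handler_a(event, context):
    """Lambda A: enqueue the request payload and return 200 immediately,
    so the HTTP client never waits on DynamoDB."""
    sqs = boto3.client("sqs")
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=to_message_body(event))
    return {"statusCode": 200, "body": "OK"}
```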

The problem is that I can't figure out how to trigger the second lambda function. It should be called frequently, multiple times per second (or at least once per second), because I need all the data from the queue to get into DynamoDB ASAP (that's why calling lambda function B via scheduled events as described here is not an option).


Why don't I want to write directly into DynamoDB, without SQS?

It would be great to avoid using SQS at all. The problem I am trying to address with SQS is DynamoDB throttling: not the throttling itself, but the way the AWS SDK handles it while writing to DynamoDB. When records are written one by one and get throttled, the SDK silently retries, which increases the request processing time from the HTTP client's point of view.

So I would like to temporarily store data in the queue, send a "200 OK" response back to the client, and then have the queue processed by a separate function that writes multiple records with a single DynamoDB BatchWriteItem call (which returns UnprocessedItems instead of retrying automatically when throttled). I would even prefer to lose some records rather than increase the lag between a record being received and stored in DynamoDB.
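A minimal sketch of that batch-writing step, assuming boto3 and a hypothetical `events` table. The point is that BatchWriteItem accepts at most 25 put/delete requests per call, and that `UnprocessedItems` is surfaced to the caller instead of being retried silently:

```python
try:
    import boto3  # available in the Lambda runtime
except ImportError:
    boto3 = None  # lets the pure helper below be used without AWS installed

TABLE = "events"  # placeholder table name


def chunk(items, size=25):
    """Split items into lists of at most `size`; BatchWriteItem
    accepts no more than 25 put/delete requests per call."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_put(items):
    """Write items with BatchWriteItem and return whatever DynamoDB
    left unprocessed, so the caller decides what to do (re-queue,
    drop, back off) rather than the SDK retrying behind its back."""
    client = boto3.client("dynamodb")
    unprocessed = []
    for batch in chunk(items):
        resp = client.batch_write_item(
            RequestItems={TABLE: [{"PutRequest": {"Item": it}} for it in batch]}
        )
        unprocessed.extend(resp.get("UnprocessedItems", {}).get(TABLE, []))
    return unprocessed
```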

Answer

Eric Hammond · Jan 9, 2016

[This doesn't directly answer your explicit question, so in my experience it will be downvoted :) However, I will answer the fundamental problem you are trying to solve.]

The way we take a flood of incoming requests and feed them to AWS Lambda functions for writing in a paced manner to DynamoDB is to replace SQS in the proposed architecture with Amazon Kinesis streams.

Kinesis streams can drive AWS Lambda functions.

Kinesis streams guarantee ordering of the delivered messages for any given partition key (nice for ordered database operations).

Kinesis streams let you specify how many AWS Lambda functions can be run in parallel (one per shard), which can be coordinated with your DynamoDB write capacity.

Kinesis streams can pass multiple available messages in one AWS Lambda function invocation, allowing for further optimization.

Note: It's really the AWS Lambda service that reads from the Amazon Kinesis stream and then invokes the function, rather than Kinesis streams directly invoking AWS Lambda; but sometimes it's easier to visualize it as Kinesis driving the function. The result to the user is nearly the same.
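For illustration, a Lambda handler consuming a Kinesis batch might look like this. The record shape follows the standard Kinesis event format delivered to Lambda (base64-encoded data under `kinesis.data`); the handler name and the DynamoDB step are placeholders:

```python
import base64
import json


def decode_record(record):
    """Kinesis delivers the payload base64-encoded inside the Lambda event."""
    return json.loads(base64.b64decode(record["kinesis"]["data"]))


def handler_b(event, context):
    """Lambda B: one invocation receives a batch of records,
    delivered in order for each shard."""
    items = [decode_record(r) for r in event["Records"]]
    # ... write `items` to DynamoDB with BatchWriteItem here,
    # sized against the table's provisioned write capacity ...
    return {"batchSize": len(items)}
```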