I am trying to configure a Kinesis Analytics application with the following settings:
Later down the line, I will import the contents of the S3 bucket using Hive + JSONSERDE, which expects each JSON record to live on its own line. The Firehose output just concatenates all of the JSON records with no delimiter, which breaks JSONSERDE.
I could attach an AWS Lambda data formatter to the output stream, but that seems expensive. All I want is to terminate each record with a newline.
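For reference, the formatter itself would be small. Here is a minimal sketch in Python, assuming the usual Firehose data-transformation event shape (a `records` list whose items carry a `recordId` and base64-encoded `data`). The handler name is just a placeholder, and I have not deployed this:

```python
import base64


def lambda_handler(event, context):
    """Append a newline to each Firehose record so every JSON object ends up on its own line."""
    output = []
    for record in event["records"]:
        # Firehose hands the record payload to the function base64-encoded.
        payload = base64.b64decode(record["data"])
        # Avoid doubling the newline if the producer already added one.
        if not payload.endswith(b"\n"):
            payload += b"\n"
        output.append({
            "recordId": record["recordId"],  # must echo the incoming id
            "result": "Ok",                  # mark the record as successfully transformed
            "data": base64.b64encode(payload).decode("ascii"),
        })
    return {"records": output}
```

The cost concern still stands, since this function would be invoked for every buffered batch.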
If I were doing this without an Analytics app, I would append the newline to each Firehose record myself. It seems strange that there is no way to do that in the app's SQL:
CREATE OR REPLACE STREAM "STREAM_OUT" (
    a VARCHAR(4),
    b VARCHAR(4),
    c VARCHAR(4)
);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "STREAM_OUT"
    SELECT STREAM
        "a",
        "b",
        "c"
    FROM "SOURCE_SQL_STREAM_001";
Is the best answer to add the Lambda data formatter? I'd really like to avoid this.
I had a similar requirement to add newlines to the Firehose-generated files. In our application, Firehose is invoked via API Gateway.
The newline is added in the Body Mapping Templates under the Integration Request section.
The following mapping template in API Gateway appends a newline to each Kinesis Firehose record. The line break inside the quoted string of the #set directive is intentional: it is the newline that gets appended.
Method 1:
#set($payload="$input.path('$.Record.Data')
")
{
"DeliveryStreamName": "$input.path('$.DeliveryStreamName')",
"Record": {
"Data": "$util.base64Encode($payload)"
}
}
This works perfectly if you are invoking Firehose via API Gateway.
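To check what the template produces, the same transformation can be sketched in Python: append a newline to the payload, then base64-encode it into the request body. This is only an illustration. The function name and the stream name in the usage lines are made up, and it builds the request body without calling Firehose.

```python
import base64
import json


def build_put_record_body(delivery_stream_name: str, data: str) -> dict:
    """Build the request body the mapping template emits for one record."""
    # Same idea as the #set line in the template: payload = data + newline.
    payload = data + "\n"
    return {
        "DeliveryStreamName": delivery_stream_name,
        "Record": {
            # The Data field of the request body is base64-encoded.
            "Data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        },
    }


# Usage with a hypothetical stream name and record.
body = build_put_record_body("example-stream", json.dumps({"a": "x"}))
decoded = base64.b64decode(body["Record"]["Data"]).decode("utf-8")
print(repr(decoded))
```

Decoding the `Data` field should give back the original JSON followed by a single trailing newline, which is what puts each record on its own line in the S3 file.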
Thanks & Regards, Srivignesh KN