Spring Batch - Read files from AWS S3

sve · Jun 14, 2015 · Viewed 10.6k times

I am trying to read files from AWS S3 and process them with Spring Batch.

Can a Spring ItemReader handle this task? If so, how do I pass the credentials to the S3 client and configure my Spring XML to read one or more files?

<bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="${aws.file.name}" />
</bean>

Answer

mtoutcalt · Jul 17, 2015

Update: With Spring Cloud AWS you still use the FlatFileItemReader, but you no longer need to write a custom Resource subclass.

Instead, you set up an aws-context resource loader and give it your S3 client bean:

    <aws-context:context-resource-loader amazon-s3="amazonS3Client"/>

The reader is set up like any other reader. The only difference is that you now autowire your ResourceLoader:

@Autowired
private ResourceLoader resourceLoader;

and then use that ResourceLoader to resolve the resource you set on the reader:

@Bean
public FlatFileItemReader<Map<String, Object>> awsItemReader() {
    FlatFileItemReader<Map<String, Object>> reader = new FlatFileItemReader<>();
    reader.setLineMapper(new JsonLineMapper());
    reader.setRecordSeparatorPolicy(new JsonRecordSeparatorPolicy());
    reader.setResource(resourceLoader.getResource("s3://" + amazonS3Bucket + "/" + file));
    return reader;
}
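
The location passed to `getResource` above is built by plain string concatenation, so a stray slash in the configured bucket or file name produces something like `s3://bucket//file`. A minimal sketch of a helper that normalizes both parts first; `S3Locations` and `toLocation` are hypothetical names, not part of Spring Cloud AWS, and the sketch assumes object keys in your bucket never legitimately start with a slash:

```java
import java.util.Objects;

/** Hypothetical helper; not part of Spring Cloud AWS or the AWS SDK. */
final class S3Locations {

    private static final String PREFIX = "s3://";

    private S3Locations() {
    }

    /**
     * Builds an "s3://bucket/key" location string.
     * Assumption: keys never start with '/', so leading slashes on the
     * key are treated as accidental and dropped.
     */
    static String toLocation(String bucket, String key) {
        Objects.requireNonNull(bucket, "bucket");
        Objects.requireNonNull(key, "key");

        String cleanBucket = bucket.trim();
        // Accept a bucket that was configured with the scheme already attached.
        if (cleanBucket.startsWith(PREFIX)) {
            cleanBucket = cleanBucket.substring(PREFIX.length());
        }
        // Drop trailing slashes from the bucket and leading slashes from the key.
        cleanBucket = cleanBucket.replaceAll("/+$", "");
        String cleanKey = key.trim().replaceAll("^/+", "");

        if (cleanBucket.isEmpty() || cleanKey.isEmpty()) {
            throw new IllegalArgumentException("bucket and key must not be empty");
        }
        return PREFIX + cleanBucket + "/" + cleanKey;
    }
}
```

With this in place, the reader line would read `reader.setResource(resourceLoader.getResource(S3Locations.toLocation(amazonS3Bucket, file)));`.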

Without Spring Cloud AWS, I would still use the FlatFileItemReader; the customization needed is your own S3 Resource class. Extend Spring's AbstractResource to create an AWS resource that holds the AmazonS3 client, the bucket, the file path, and so on.

For getInputStream, use the AWS SDK for Java:

        S3Object object = s3Client.getObject(new GetObjectRequest(bucket, awsFilePath));
        return object.getObjectContent();

Then for contentLength:

return s3Client.getObjectMetadata(bucket, awsFilePath).getContentLength();

and for lastModified, use the same metadata call:

return s3Client.getObjectMetadata(bucket, awsFilePath).getLastModified().getTime();

The Resource you create holds the AmazonS3Client, which carries everything your Spring Batch app needs to communicate with S3, such as the credentials it was constructed with. Here is what it could look like with Java config:

    reader.setResource(new AmazonS3Resource(amazonS3Client, amazonS3Bucket, inputFile));
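
The three methods described above (getInputStream, contentLength, lastModified) can be put together roughly as follows. This is a dependency-free sketch of the delegation only, not the actual `AmazonS3Resource`: `S3ResourceSketch`, `S3Access` and `InMemoryS3Access` are hypothetical names, `S3Access` stands in for the AmazonS3 client, and a real version would extend Spring's AbstractResource rather than being a plain class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Date;

/** Hypothetical stand-in for the S3 client calls quoted in the answer. */
interface S3Access {
    InputStream open(String bucket, String key);   // cf. getObject(...).getObjectContent()
    long length(String bucket, String key);        // cf. getObjectMetadata(...).getContentLength()
    Date modified(String bucket, String key);      // cf. getObjectMetadata(...).getLastModified()
}

/**
 * Shape of the custom resource. A real version would extend Spring's
 * AbstractResource and hold an AmazonS3 client instead of S3Access.
 */
final class S3ResourceSketch {
    private final S3Access s3;
    private final String bucket;
    private final String key;

    S3ResourceSketch(S3Access s3, String bucket, String key) {
        this.s3 = s3;
        this.bucket = bucket;
        this.key = key;
    }

    /** Human-readable description of the object this resource points at. */
    String getDescription() {
        return "S3 object [bucket=" + bucket + ", key=" + key + "]";
    }

    /** Delegates to the client on every call instead of caching a stream. */
    InputStream getInputStream() throws IOException {
        return s3.open(bucket, key);
    }

    long contentLength() throws IOException {
        return s3.length(bucket, key);
    }

    /** Last-modified time in epoch milliseconds. */
    long lastModified() throws IOException {
        return s3.modified(bucket, key).getTime();
    }
}

/** In-memory fake so the sketch runs without AWS; ignores bucket and key. */
final class InMemoryS3Access implements S3Access {
    private final byte[] content;
    private final Date modified;

    InMemoryS3Access(String text, Date modified) {
        this.content = text.getBytes(StandardCharsets.UTF_8);
        this.modified = modified;
    }

    @Override
    public InputStream open(String bucket, String key) {
        return new ByteArrayInputStream(content);
    }

    @Override
    public long length(String bucket, String key) {
        return content.length;
    }

    @Override
    public Date modified(String bucket, String key) {
        return modified;
    }
}
```

In the real class, `open` becomes the `getObject` call shown earlier and the other two become the `getObjectMetadata` calls. AbstractResource itself only requires `getDescription()` and `getInputStream()` to be implemented; overriding `contentLength()` and `lastModified()` lets them be answered from S3 metadata.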