I'm using the following code to create ParquetWriter and to write records to it.
ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);
final GenericRecord record = new GenericData.Record(avroSchema);
parquetWriter.write(record);
But it only allows to create new files(at the specfied path). Is there a way to append data to an existing parquet file (at path)? Caching parquetWriter is not feasible in my case.
There is a Spark API SaveMode called append: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/SaveMode.html which I believe solves your problem.
Example of use:
df.write.mode('append').parquet('parquet_data_file')