How to generate schema-less Avro files using Apache Avro?

mintra · Mar 2, 2015 · Viewed 9.1k times

I am using Apache Avro for data serialization. Since the data has a fixed schema, I do not want the schema to be part of the serialized data. In the following example, the schema is part of the Avro file "users.avro".

import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// User is the class generated by Avro from the schema
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
User user2 = new User("Ben", 7, "red");
User user3 = User.newBuilder()
         .setName("Charlie")
         .setFavoriteColor("blue")
         .setFavoriteNumber(null)
         .build();

// Serialize user1, user2 and user3 to disk
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
// create() writes the schema into the file header
dataFileWriter.create(user1.getSchema(), file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();

Can anyone please tell me how to store Avro files without the schema embedded in them?

Answer

Paolo Maresca · Nov 3, 2015

Here you can find a comprehensive how-to in which I explain how to achieve schema-less serialization using Apache Avro. A companion test campaign gives some figures on the performance that you might expect.

The code is on GitHub: the example and test classes show how to use the Data Reader and Writer with a stub class generated by Avro itself.
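
For illustration, here is a minimal sketch of the general technique, which is not necessarily the code from the repository mentioned above. DataFileWriter always writes the schema into the file header, whereas writing a datum directly through a BinaryEncoder emits only the encoded field values, so the reader has to be given the same schema out of band. To keep the sketch self-contained it uses GenericRecord and an inline schema instead of the generated User class; the class name SchemalessAvroSketch and the field names in the schema (name, favorite_number, favorite_color) are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemalessAvroSketch {

    // Assumed schema for this sketch; writer and reader must agree on it,
    // because it is NOT stored alongside the serialized bytes.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},"
        + "{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}");

    // Encode one record to raw Avro binary: field values only, no header, no schema.
    static byte[] serialize(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(SCHEMA);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush(); // the encoder buffers, so flush before reading the bytes
        return out.toByteArray();
    }

    // Decode raw Avro binary using the schema supplied by the caller's code.
    static GenericRecord deserialize(byte[] bytes) throws IOException {
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(SCHEMA);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("name", "Alyssa");
        user.put("favorite_number", 256);
        // favorite_color is left unset (null), which the ["string","null"] union accepts

        byte[] bytes = serialize(user);
        GenericRecord copy = deserialize(bytes);
        System.out.println(bytes.length + " bytes, round trip: " + copy);
    }
}
```

With a generated class such as User, the same pattern should work by swapping in new SpecificDatumWriter<User>(User.class) and new SpecificDatumReader<User>(User.class). Note that a stream of records written this way has no block markers or sync points, so if several records go into one file the reader has to decode them sequentially until the input is exhausted.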