How to extract schema from an avro file in Java

mba12 · Aug 4, 2017

How do you extract first the schema and then the data from an Avro file in Java? It is the same as this question, except in Java.

I've seen examples of how to get the schema from an .avsc file, like the one below, but not from an Avro data file. What direction should I be looking in?

Schema schema = new Schema.Parser().parse(
    new File("/home/Hadoop/Avro/schema/emp.avsc")
);

Answer

Helder Pereira · Aug 12, 2017

If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use the GenericDatumReader:

// No schema is passed to the reader: it is taken from the file itself.
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
// The schema the file was written with.
Schema schema = dataFileReader.getSchema();
System.out.println(schema);

And then you can read the data inside the file:

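GenericRecord record = null;
while (dataFileReader.hasNext()) {
    // Passing the previous record back in lets Avro reuse the object.
    record = dataFileReader.next(record);
    System.out.println(record);
}
// Release the underlying file handle when done.
dataFileReader.close();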
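
The reason no .avsc is needed is that an Avro data file (an "object container file") carries the writer's schema in its header, as JSON under the metadata key avro.schema. As a rough illustration of what getSchema() relies on, here is a minimal sketch that pulls that JSON out using only the JDK. The class and method names (AvroHeaderSketch, readMetadata, readSchemaJson) are made up for this example and are not part of the Avro API. It only reads the header and never decodes records, so in real code DataFileReader is still the right tool.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Avro library.
public class AvroHeaderSketch {

    /** Reads the header metadata (key -> raw bytes) of an Avro object container file. */
    static Map<String, byte[]> readMetadata(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] magic = new byte[4];
        data.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("Not an Avro object container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // The metadata is an Avro map: a series of blocks, ended by a block with count 0.
        for (long count = readLong(data); count != 0; count = readLong(data)) {
            if (count < 0) {
                // A negative count is followed by the block size in bytes (unused here).
                count = -count;
                readLong(data);
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(data), StandardCharsets.UTF_8);
                meta.put(key, readBytes(data));
            }
        }
        return meta;
    }

    /** Returns the writer's schema as JSON text, or null if there is no avro.schema entry. */
    static String readSchemaJson(InputStream in) throws IOException {
        byte[] schema = readMetadata(in).get("avro.schema");
        return schema == null ? null : new String(schema, StandardCharsets.UTF_8);
    }

    // Avro long: variable-length, 7 bits per byte (low bits first), zigzag-encoded.
    private static long readLong(DataInputStream data) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = data.readUnsignedByte();
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // undo zigzag
            }
        }
        throw new IOException("Invalid variable-length long");
    }

    // Avro bytes/string: a long length followed by that many bytes.
    private static byte[] readBytes(DataInputStream data) throws IOException {
        long length = readLong(data);
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Invalid length: " + length);
        }
        byte[] bytes = new byte[(int) length];
        data.readFully(bytes);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AvroHeaderSketch <file.avro>");
            return;
        }
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

Run it as java AvroHeaderSketch file.avro to print the embedded schema JSON. The same metadata map also holds avro.codec when the file is compressed, which is why the sketch returns the whole map from readMetadata rather than just the schema.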