How do you extract first the schema and then the data from an Avro file in Java? This is identical to this question, except in Java.
I've seen examples of how to get the schema from an avsc file but not an avro file. What direction should I be looking in?
Schema schema = new Schema.Parser().parse(
new File("/home/Hadoop/Avro/schema/emp.avsc")
);
If you want to know the schema of an Avro file without having to generate the corresponding classes or care about which class the file belongs to, you can use GenericDatumReader:
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(new File("file.avro"), datumReader);
Schema schema = dataFileReader.getSchema();
System.out.println(schema);
And then you can read the data inside the file:
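// Reuse one record object across iterations instead of allocating per row.
GenericRecord record = null;
while (dataFileReader.hasNext()) {
    record = dataFileReader.next(record);
    System.out.println(record);
}
dataFileReader.close(); // release the underlying file handle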
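The reason getSchema() works without any .avsc file is that an Avro object container file carries its writer schema in the file header, as JSON under the metadata key avro.schema. To make that concrete, here is a minimal sketch that reads the header by hand using only the JDK. The class name AvroHeaderPeek and its methods are hypothetical helpers written for illustration, not part of the Avro library, and the sketch stops at the header (it does not decode data blocks). In real code, DataFileReader does all of this for you.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Avro library.
public class AvroHeaderPeek {

    // Decodes one Avro "long": variable-length, zig-zag encoded.
    public static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;      // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("varint too long");
        }
        return (raw >>> 1) ^ -(raw & 1);     // undo zig-zag
    }

    // Avro "bytes" and "string" values are a long length followed by that many bytes.
    public static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    // Reads the magic bytes and the metadata map from the start of a container file.
    public static Map<String, byte[]> readMetadata(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        Map<String, byte[]> meta = new LinkedHashMap<>();
        // A map is written as blocks of key/value pairs, ended by a block count of 0.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {                 // negative count: block byte size follows
                count = -count;
                readLong(in);                // skip the byte size
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, readBytes(in));
            }
        }
        return meta;
    }

    // Returns the embedded schema as a JSON string.
    public static String readSchemaJson(InputStream raw) throws IOException {
        byte[] schema = readMetadata(raw).get("avro.schema");
        if (schema == null) throw new IOException("header has no avro.schema entry");
        return new String(schema, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an .avro file, e.g. "file.avro"
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

The JSON string this returns is the same kind of text you would otherwise keep in an .avsc file, so it can be passed to new Schema.Parser().parse(String) if you need a Schema object. For everyday use, prefer the DataFileReader approach above.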