Avro schema definition nesting types

derdc · Mar 26, 2015

I am fairly new to Avro and am going through the documentation for nested types. I have the example below working nicely, but many different types within the model will have addresses. Is it possible to define an address.avsc file and reference it as a nested type? If so, can you take it a step further and have a list of Addresses for a Customer? Thanks in advance.

{"namespace": "com.company.model",
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": "string"},
    {"name": "address", "type":
      {"type": "record",
       "name": "AddressRecord",
       "fields": [
         {"name": "streetaddress", "type": "string"},
         {"name": "city", "type": "string"},
         {"name": "state", "type": "string"},
         {"name": "zip", "type": "string"}
       ]}
    }
  ]
}

Answer

Princey James · Apr 7, 2015

There are 4 possible ways:

  1. Include it in the pom file, as mentioned in this ticket.
  2. Declare all your types in a single avsc file.
  3. Use a single static parser that first parses all the imported types and then parses the types that depend on them.
  4. (This is a hack) Use an avdl file and IDL imports, even though IDL is intended for RPC calls: https://avro.apache.org/docs/1.7.7/idl.html#imports

Example for 2: declare all your types in a single avsc file. This also shows how to declare an array of addresses.

[
{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {
            "name": "streetaddress",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "zip",
            "type": "string"
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "phone",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer2",
    "fields": [
        {
            "name": "x",
            "type": "string"
        },
        {
            "name": "y",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}
]
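
In the schema above, "com.company.model.AddressRecord" refers to the record by its full name, that is, namespace plus name. As far as the Avro specification describes it, a name that contains a dot is taken as already fully qualified, while a name without a dot is resolved against the namespace of the enclosing definition. A rough sketch of that rule in plain Java follows; the class NameResolution and the method fullName are hypothetical and not part of the Avro API.

```java
class NameResolution {
    // Hypothetical helper (not Avro API) mirroring the naming rule:
    // dotted names are treated as full names; undotted names get the
    // enclosing namespace prepended when there is one.
    static String fullName(String name, String enclosingNamespace) {
        if (name.contains(".") || enclosingNamespace == null || enclosingNamespace.isEmpty()) {
            return name;
        }
        return enclosingNamespace + "." + name;
    }

    public static void main(String[] args) {
        // Both references should resolve to the same full name.
        System.out.println(fullName("AddressRecord", "com.company.model"));
        System.out.println(fullName("com.company.model.AddressRecord", "com.company.model"));
    }
}
```

Under that rule, "items": "AddressRecord" should resolve to the same type inside Customer, since Customer declares the same namespace. Spelling out the full name, as the example does, is simply the more explicit choice.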

Example for 3: use a single static parser.

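import java.io.File;
import org.apache.avro.Schema;

// Make the parser static and reuse it, so types parsed earlier stay known to it.
Schema.Parser parser = new Schema.Parser();
parser.parse(new File("address.avsc"));   // placeholder path; parse the shared type first
parser.parse(new File("customer.avsc"));  // placeholder path; can now refer to AddressRecord by name
parser.parse(new File("customer2.avsc")); // placeholder path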

To get hold of a Schema, for example to create new records, either call the parser's getTypes() method (https://avro.apache.org/docs/1.5.4/api/java/org/apache/avro/Schema.Parser.html#getTypes()), which returns the named types the parser knows about, or keep the Schema returned by each parse call:

import java.io.File;
import org.apache.avro.Schema;

Schema.Parser parser = new Schema.Parser(); // Make this static and reuse
Schema addressSchema = parser.parse(new File("address.avsc"));     // placeholder path
Schema customerSchema = parser.parse(new File("customer.avsc"));   // placeholder path
Schema customer2Schema = parser.parse(new File("customer2.avsc")); // placeholder path
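
The order of the parse calls matters: a schema that refers to AddressRecord by name can only be parsed after the parser has seen the address file. With more than a handful of files, one option is to work out a dependency-first order before handing the files to the parser. The following is only a sketch using the Java standard library; the class SchemaOrder, the method parseOrder, and the hand-written dependency map are hypothetical and not something Avro provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SchemaOrder {
    // Hypothetical helper: returns the file names ordered so that every
    // file comes after the files it depends on (depth-first ordering).
    static List<String> parseOrder(Map<String, List<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String file : dependsOn.keySet()) {
            visit(file, dependsOn, done, inProgress);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String file, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(file)) {
            return; // already placed in the order
        }
        if (!inProgress.add(file)) {
            throw new IllegalStateException("Circular dependency involving " + file);
        }
        List<String> deps = dependsOn.getOrDefault(file, Collections.<String>emptyList());
        for (String dep : deps) {
            visit(dep, dependsOn, done, inProgress); // dependencies first
        }
        inProgress.remove(file);
        done.add(file);
    }

    public static void main(String[] args) {
        // Assumed dependencies, written by hand for illustration.
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("customer.avsc", Arrays.asList("address.avsc"));
        dependsOn.put("customer2.avsc", Arrays.asList("address.avsc"));
        dependsOn.put("address.avsc", Collections.<String>emptyList());
        System.out.println(parseOrder(dependsOn));
    }
}
```

Each name in the resulting list could then be passed to the shared parser in turn, so that address.avsc is always parsed before the customer schemas that refer to it.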