According to this question on nesting Avro schemas, the right way to nest a record schema is as follows:
{
"name": "person",
"type": "record",
"fields": [
{"name": "firstname", "type": "string"},
{"name": "lastname", "type": "string"},
{
"name": "address",
"type": {
"type" : "record",
"name" : "AddressUSRecord",
"fields" : [
{"name": "streetaddress", "type": "string"},
{"name": "city", "type": "string"}
]
}
}
]
}
I don't like giving the field the name address and having to give a different name (AddressUSRecord) to the field's schema. Can I give the field and schema the same name, address?
What if I want to use the AddressUSRecord schema in multiple other schemas, not just person? If I want to use AddressUSRecord in another schema, let's say business, do I have to name it something else?
Ideally, I'd like to define AddressUSRecord in a separate schema, then let the type of address reference AddressUSRecord. However, it's not clear that Avro 1.8.1 supports this out-of-the-box. This 2014 article shows that sub-schemas need to be handled with custom code. What is the best way to define reusable schemas in Avro 1.8.1?
Note: I'd like a solution that works with Confluent Inc.'s Schema Registry. There's a Google Groups thread that seems to suggest that Schema Registry does not play nice with schema references.
Can I give the field and schema the same name, address?
Yes, you can name the record with the same name as the field name.
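As a small illustration, here is the question's schema with the nested record renamed to address, loaded with Python's standard json module. This only confirms the document is well-formed JSON and that the field and record names match; it does not run an Avro parser, so treat it as a sketch rather than a validation.

```python
import json

# The question's schema, with the nested record renamed to match the field.
person_schema = json.loads("""
{
  "name": "person",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "address",
        "fields": [
          {"name": "streetaddress", "type": "string"},
          {"name": "city", "type": "string"}
        ]
      }
    }
  ]
}
""")

# The field name and the name of the record it holds are now identical.
address_field = person_schema["fields"][2]
same_name = address_field["name"] == address_field["type"]["name"]
```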
What if I want to use the AddressUSRecord schema in multiple other schemas, not just person?
You can reuse a schema across multiple schemas with a couple of techniques. The Avro schema parser clients (JVM and others) allow you to specify multiple schemas, usually through the names parameter (the Java Schema$Parser/parse method allows multiple schema String arguments).
You can then specify dependent schemas as a named type:
{
"type": "record",
"name": "Address",
"fields": [
{
"name": "streetaddress",
"type": "string"
},
{
"name": "city",
"type": "string"
}
]
}
And run this through the parser before the parent schema:
{
"name": "person",
"type": "record",
"fields": [
{
"name": "firstname",
"type": "string"
},
{
"name": "lastname",
"type": "string"
},
{
"name": "address",
"type": "Address"
}
]
}
Incidentally, this allows you to parse from separate files.
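To make the name resolution concrete, here is a minimal Python sketch (standard library only) of what it means for "Address" to resolve against a previously parsed schema. It is not the Avro parser: inline_named_types is a hypothetical helper written for this illustration, and it ignores namespaces and aliases. It inlines each named type at its first use only, since Avro lets a name be defined once per schema and later uses must stay plain name references.

```python
import json

NAMED_KINDS = ("record", "enum", "fixed")

def inline_named_types(schema, known, defined=None):
    """Return a copy of `schema` with references to `known` types inlined.

    `known` maps a type name to its parsed schema. A name is inlined at its
    first use only; later uses are left as plain name references.
    Hypothetical helper: namespaces and aliases are ignored.
    """
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in known and schema not in defined:
            defined.add(schema)
            return inline_named_types(known[schema], known, defined)
        return schema
    if isinstance(schema, list):  # a union: resolve each branch
        return [inline_named_types(s, known, defined) for s in schema]
    result = dict(schema)
    if result.get("type") in NAMED_KINDS and "name" in result:
        defined.add(result["name"])  # this dict is itself a definition
    elif "type" in result:
        result["type"] = inline_named_types(result["type"], known, defined)
    if "fields" in result:
        result["fields"] = [
            {**field, "type": inline_named_types(field["type"], known, defined)}
            for field in result["fields"]
        ]
    for key in ("items", "values"):  # array items and map values
        if key in result:
            result[key] = inline_named_types(result[key], known, defined)
    return result

address = json.loads("""
{"type": "record", "name": "Address", "fields": [
  {"name": "streetaddress", "type": "string"},
  {"name": "city", "type": "string"}]}
""")
person = json.loads("""
{"type": "record", "name": "person", "fields": [
  {"name": "firstname", "type": "string"},
  {"name": "lastname", "type": "string"},
  {"name": "address", "type": "Address"}]}
""")

# Address is "parsed" first, so person's reference to it can be resolved.
resolved = inline_named_types(person, {"Address": address})
```

The resolved result has the same nested shape as the schema in the question, which is roughly what a parser ends up with after it has seen Address before person.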
Alternatively, you can parse a single union schema that references schemas in the same way:
[
{
"type": "record",
"name": "Address",
"fields": [
{
"name": "streetaddress",
"type": "string"
},
{
"name": "city",
"type": "string"
}
]
},
{
"type": "record",
"name": "person",
"fields": [
{
"name": "firstname",
"type": "string"
},
{
"name": "lastname",
"type": "string"
},
{
"name": "address",
"type": "Address"
}
]
}
]
I'd like a solution that works with Confluent Inc.'s Schema Registry.
The schema registry does not support parsing schemas separately, but it does support the latter example of parsing into a union type.
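If you keep Address and person in separate files, one way to get to the union form is to combine them just before registering. The following is a rough Python sketch using only the standard library; build_union and referenced_names are hypothetical helpers, not part of Avro or Schema Registry, and they ignore namespaces. The sketch orders the schemas so that each one is defined before anything that references it.

```python
import json

def referenced_names(schema, names):
    """Collect which of `names` are referenced anywhere inside `schema`."""
    found = set()
    if isinstance(schema, str):
        if schema in names:
            found.add(schema)
    elif isinstance(schema, list):
        for branch in schema:
            found |= referenced_names(branch, names)
    elif isinstance(schema, dict):
        for key in ("type", "items", "values"):
            if key in schema:
                found |= referenced_names(schema[key], names)
        for field in schema.get("fields", []):
            found |= referenced_names(field["type"], names)
    return found

def build_union(schemas):
    """Order top-level named schemas so dependencies come first."""
    by_name = {s["name"]: s for s in schemas}
    ordered, done = [], set()

    def visit(name):
        if name in done:
            return
        done.add(name)  # marked early so a cycle cannot recurse forever
        others = by_name.keys() - {name}
        for dep in sorted(referenced_names(by_name[name], others)):
            visit(dep)
        ordered.append(by_name[name])

    for s in schemas:
        visit(s["name"])
    return ordered

address = {"type": "record", "name": "Address", "fields": [
    {"name": "streetaddress", "type": "string"},
    {"name": "city", "type": "string"}]}
person = {"type": "record", "name": "person", "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "address", "type": "Address"}]}

# The input order does not matter; Address is moved ahead of person.
union_schema = build_union([person, address])
union_json = json.dumps(union_schema, indent=2)
```

The union_json string has the same shape as the union example above and is what you would submit as the schema. Whether your registry version accepts it is worth checking against its documentation.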