How to fix Expected start-union. Got VALUE_NUMBER_INT when converting JSON to Avro on the command line?

Emre Sevinç picture Emre Sevinç · Dec 15, 2014 · Viewed 31.7k times · Source

I'm trying to validate a JSON file using an Avro schema and write the corresponding Avro file. First, I've defined the following Avro schema named user.avsc:

{"namespace": "example.avro",
 "type": "record",
 "name": "user",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

Then created a user.json file:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

And then tried to run:

java -jar ~/bin/avro-tools-1.7.7.jar fromjson --schema-file user.avsc user.json > user.avro

But I get the following exception:

Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:290)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
    at org.apache.avro.tool.Main.run(Main.java:84)
    at org.apache.avro.tool.Main.main(Main.java:73)

Am I missing something? Why do I get "Expected start-union. Got VALUE_NUMBER_INT".

Answer

Emre Sevinç picture Emre Sevinç · Dec 16, 2014

According to the explanation by Doug Cutting,

Avro's JSON encoding requires that non-null union values be tagged with their intended type. This is because unions like ["bytes","string"] and ["int","long"] are ambiguous in JSON, the first are both encoded as JSON strings, while the second are both encoded as JSON numbers.

http://avro.apache.org/docs/current/spec.html#json_encoding

Thus your record must be encoded as:

{"name": "Alyssa", "favorite_number": {"int": 7}, "favorite_color": null}