How to generate fields of type String instead of CharSequence using Avro?

Shekhar · Aug 4, 2014

I wrote an Avro schema in which some of the fields need to be of type String, but Avro has generated those fields as CharSequence.

I am not able to find any way to tell Avro to make those fields of type String.

I tried to use

"fields": [
    {
        "name":"startTime",
        "type":"string",
        "avro.java.stringImpl":"String"
    },
    {
        "name":"endTime",
        "type":"string",
        "avro.java.string":"String"
    }
]

but for both fields Avro still generates type CharSequence.

Is there any other way to make those fields of type String?

Answer

Clément MATHIEU · Aug 14, 2014

If you want all your string fields to be instances of java.lang.String, then you only have to configure the compiler:

java -jar /path/to/avro-tools-1.7.7.jar compile -string schema <schema-file> <output-dir>

or if you are using the Maven plugin

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.7.7</version>
  <configuration>
    <stringType>String</stringType>
  </configuration>
  [...]
</plugin>        

If you want one specific field to be of type java.lang.String then... you can't. It is not supported by the compiler. You can use "java-class" with the reflect API but the compiler does not care.

If you want to learn more, you can set a breakpoint in SpecificCompiler at line 372 (Avro 1.7.7). You can see that before the call to addStringType() the schema has the required information in its props field. If you passed this schema to SpecificCompiler.javaType(), it would do what you want. But addStringType() then replaces your schema with a static one. I will most likely ask about this on the mailing list, since I don't see the point.
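For the single-field case, where the compiler offers no switch, one possible workaround is to leave the generated type as CharSequence and convert at the call site. The sketch below uses only the JDK; the class name CharSequenceFields and the helpers asString and sameText are hypothetical names made up for illustration, and a StringBuilder stands in for whatever CharSequence implementation the generated getter returns at runtime (with Avro this is often org.apache.avro.util.Utf8).

```java
public class CharSequenceFields {

    // Hypothetical helper: turn any CharSequence (String, StringBuilder,
    // Avro's Utf8, ...) into a java.lang.String, passing null through.
    public static String asString(CharSequence value) {
        return value == null ? null : value.toString();
    }

    // Hypothetical helper: compare a CharSequence field with an expected
    // String by content, without relying on equals() across different types.
    public static boolean sameText(CharSequence field, String expected) {
        return field != null && expected.contentEquals(field);
    }

    public static void main(String[] args) {
        // Stand-in for a value returned by a generated getter such as getStartTime().
        CharSequence startTime = new StringBuilder("2014-08-04T10:00");

        // equals() between different CharSequence implementations compares
        // objects, not text, so this is false even though the characters match.
        System.out.println(startTime.equals("2014-08-04T10:00"));

        // Comparing by content, or converting first, gives the expected result.
        System.out.println(sameText(startTime, "2014-08-04T10:00"));
        System.out.println(asString(startTime).equals("2014-08-04T10:00"));
    }
}
```

This does not change the generated field type; it only keeps String-based code working against a CharSequence getter. If every string field should be a String, the compiler option above is the simpler route.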