I am using Spark 1.6 and I want to create an external Hive table, the same way I would in a Hive script. To do this, I first read in the partitioned Avro file and get its schema. This is where I am stuck: I have no idea how to apply this schema to the table I am creating. I am using Scala. Any help is appreciated.
In the end I solved it myself the old-fashioned way, with the code below:
import com.databricks.spark.avro._  // spark-avro package; needed for .avro(...) on Spark 1.6

val rawSchema = sqlContext.read.avro("Path").schema

// Build the column list for the DDL: strip a leading underscore from column names
// (Hive rejects them) and map Spark's "integer" type name to Hive's "int".
val schemaString = rawSchema.fields.map { field =>
  val columnName = field.name.replaceAll("""^_""", "")
  val columnType = field.dataType.typeName match {
    case "integer" => "int"
    case other     => other
  }
  s"$columnName $columnType"
}.mkString(",\n")

// STORED AS AVRO replaces the explicit clauses:
//   INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
//   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
val ddl =
  s"""
     |CREATE EXTERNAL TABLE $tablename ($schemaString)
     |PARTITIONED BY (y int, m int, d int, hh int, mm int)
     |STORED AS AVRO
     |LOCATION 'hdfs://$path'
   """.stripMargin
Take care that no column name can start with _, and that Hive can't parse integer (hence the mapping to int above). I would say this approach is not flexible, but it works. If anyone has a better idea, please comment.
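For completeness: the DDL string still has to be submitted through a HiveContext, and since the table is external and partitioned, Hive won't see the existing partition directories until they are registered. A minimal sketch of those two steps, assuming sqlContext is a HiveContext and the directories under the table location already follow the y=/m=/d=/hh=/mm= layout:

// Submit the CREATE EXTERNAL TABLE statement built above.
sqlContext.sql(ddl)

// Register the partition directories that already exist under the table location.
// MSCK REPAIR TABLE scans the location and adds matching y=/m=/d=/hh=/mm= folders;
// alternatively, partitions can be added one at a time, e.g.
//   ALTER TABLE <tablename> ADD PARTITION (y=2016, m=1, d=1, hh=0, mm=0)
sqlContext.sql(s"MSCK REPAIR TABLE $tablename")

// Sanity check: the table should now be queryable.
sqlContext.sql(s"SELECT * FROM $tablename LIMIT 10").show()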