I have previously registered a UDF with Hive. It is permanent, not TEMPORARY. It works in beeline.
CREATE FUNCTION normaliseURL AS 'com.example.hive.udfs.NormaliseURL' USING JAR 'hdfs://udfs/hive-udfs.jar';
I have Spark configured to use the Hive metastore. The config is working, as I can query Hive tables. I can also see the UDF:
In [9]: spark.sql('describe function normaliseURL').show(truncate=False)
+-------------------------------------------+
|function_desc |
+-------------------------------------------+
|Function: default.normaliseURL |
|Class: com.example.hive.udfs.NormaliseURL |
|Usage: N/A. |
+-------------------------------------------+
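For context, the session is created roughly like this (a minimal sketch; the app name and metastore URI are placeholders, not my real values):

from pyspark.sql import SparkSession

# Sketch of the driver setup; enableHiveSupport() is what points Spark at the Hive metastore.
spark = (SparkSession.builder
         .appName("udf-test")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

# Metastore connectivity works: Hive tables and function metadata are visible.
spark.sql("show tables").show()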
However, I cannot use the UDF in a SQL statement:
spark.sql('SELECT normaliseURL("value")')
AnalysisException: "Undefined function: 'default.normaliseURL'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7"
If I attempt to register the UDF with Spark directly (bypassing the metastore), that also fails, indicating that the function already exists:
In [12]: spark.sql("create function normaliseURL as 'com.example.hive.udfs.NormaliseURL'")
AnalysisException: "Function 'default.normaliseURL' already exists in database 'default';"
I'm using Spark 2.0 and Hive metastore 1.1.0. The UDF is written in Scala; my Spark driver code is Python.
I'm stumped.
The issue is that Spark 2.0 is not able to execute functions whose JARs are located on HDFS. See: Spark SQL: Thriftserver unable to run a registered Hive UDTF
One workaround is to define the function as a temporary function in the Spark job, with the JAR path pointing to a local edge-node path, and then call the function in the same Spark job:
CREATE TEMPORARY FUNCTION functionName AS 'com.test.HiveUDF' USING JAR '/user/home/dir1/functions.jar';
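From the PySpark driver, the workaround would look roughly like this (the local JAR path is a placeholder; adjust it to wherever the JAR actually lives on the edge node):

# Register the temporary function with a local JAR path instead of HDFS,
# then call it in the same Spark job.
spark.sql(
    "CREATE TEMPORARY FUNCTION normaliseURL AS 'com.example.hive.udfs.NormaliseURL' "
    "USING JAR '/home/user/hive-udfs.jar'"
)
spark.sql("SELECT normaliseURL('value')").show(truncate=False)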