We have been trying to create a simple Hive UDF to mask some fields in a Hive table. We are using an external file (placed on HDFS) to read a piece of text that is used as a salt in the masking process. It seems we are doing everything right, but when we try to create the function it throws this error:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask
This is our code for the UDF:
package co.company;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.commons.codec.digest.DigestUtils;

@Description(
    name = "masker",
    value = "_FUNC_(str) - mask a string",
    extended = "Example: \n" +
        " SELECT masker(column) FROM hive_table; "
)
public class Mask extends UDF {
    private static final String arch_clave = "/user/username/filename.dat";
    private static String clave = null;

    public static String getFirstLine( String arch ) {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path(arch));
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String ret = br.readLine();
            br.close();
            return ret;
        } catch (Exception e) {
            System.out.println("out: Error Message: " + arch + " exc: " + e.getMessage());
            return null;
        }
    }

    public Text evaluate(Text s) {
        clave = getFirstLine( arch_clave );
        Text to_value = new Text( DigestUtils.shaHex( s + clave) );
        return to_value;
    }
}
We are uploading the jar file and creating the UDF through Hue's interface (sadly, we don't yet have console access to the Hadoop cluster).
In Hue's Hive interface, our commands are:
add jar hdfs:///user/my_username/myJar.jar
And then, to create the function, we execute:
CREATE TEMPORARY FUNCTION masker as 'co.company.Mask';
Sadly, the error thrown when we try to create the UDF is not very helpful. This is the log for the creation of the UDF. Any help is greatly appreciated. Thank you very much.
This issue was solved, but it wasn't related to the code. The code above is fine for reading a file in HDFS from a Hive UDF (awfully inefficient, because it reads the file each time the evaluate function is called, but it does manage to read the file).
It turns out that when you create a Hive UDF through Hue, you upload the jar and then create the function. However, if you change your function and re-upload the jar, the previous definition of the function is still kept.
We defined the same UDF class in another package in the jar, dropped the original function in Hive, and created the function again (with the new class) through Hue:
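On the inefficiency mentioned above: one way to avoid re-reading the file on every call is to load the salt once and cache it. The following is only a minimal sketch using the standard library, not the code we actually deployed. The class name SaltedMasker and the Supplier-based loader are hypothetical; inside the UDF the loader would presumably wrap getFirstLine(arch_clave). It hashes with SHA-1 over UTF-8 bytes, which I believe matches what DigestUtils.shaHex does, and, unlike the original, it fails loudly if the salt cannot be read instead of hashing with a null salt.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Supplier;

// Hypothetical helper (not part of the original UDF): loads the salt once and reuses it.
final class SaltedMasker {
    // Assumption: in the UDF this would be something like () -> Mask.getFirstLine(arch_clave).
    private final Supplier<String> saltLoader;
    // Cached salt; stays null until the first successful load.
    private volatile String salt;

    SaltedMasker(Supplier<String> saltLoader) {
        this.saltLoader = saltLoader;
    }

    // Returns the cached salt, calling the loader only on first use.
    private String salt() {
        String s = salt;
        if (s == null) {
            synchronized (this) {
                s = salt;
                if (s == null) {
                    s = saltLoader.get();
                    if (s == null) {
                        // Design choice for this sketch: do not silently hash with a missing salt.
                        throw new IllegalStateException("salt could not be loaded");
                    }
                    salt = s;
                }
            }
        }
        return s;
    }

    // Returns the lowercase hex SHA-1 of value + salt, or null for null input.
    String mask(String value) {
        if (value == null) {
            return null;
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest((value + salt()).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

In evaluate(), the idea would be to keep one such instance in a field and return new Text(masker.mask(s.toString())), so the HDFS read happens once per UDF instance rather than once per row.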
add jar hdfs:///user/my_username/myJar2.jar;
drop function if exists masker;
create temporary function masker as 'co.company.otherpackage.Mask';
It seems a bug report is needed for Hive (or Hue? Thrift?); I still need to understand better which part of the system is at fault.
I hope it helps someone in the future.