I have an Oozie workflow running on a CDH4 cluster of four machines (one master for everything, three "dumb" workers). The Hive metastore runs on the master using MySQL (the driver is present), and the Oozie server also runs on the master, also backed by MySQL. Using the web interface I can import and query Hive as expected, but when I run the same queries within an Oozie workflow, they fail. Even adding "IF EXISTS" leads to the error below. I tried adding the connection information as properties to the Hive job, without any success.
Can anybody give me a hint? Did I miss anything? Is any further information needed?
This is the output of the job's log:
Script [drop.sql] content:
------------------------
DROP TABLE IF EXISTS performance_log;
------------------------
Hive command arguments :
-f
drop.sql
=================================================================
>>> Invoking Hive command line now >>>
Intercepting System.exit(10001)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
Oozie Launcher failed, finishing Hadoop job gracefully
And this is the error message:
FAILED: SemanticException [Error 10001]: Table not found performance_log
Intercepting System.exit(10001)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
The problem is that the other nodes don't know where your MySQL metastore is. The Hive action runs on one of the workers, which cannot find your metastore, so it reports "Table not found".
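Hive reads the metastore connection settings from hive-site.xml. A minimal sketch of the relevant properties follows. It assumes a MySQL server on a host named "master" with a database named "metastore" and a user named "hive"; these names and the password are hypothetical placeholders. Copying the real hive-site.xml from your master is the safer option.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: MySQL on host "master", database "metastore" (hypothetical names) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- Hypothetical credentials: use the ones from the master's hive-site.xml -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>CHANGE_ME</value>
  </property>
</configuration>
```

If your master instead runs a remote metastore service, setting hive.metastore.uris (e.g. thrift://master:9083, again assuming the host name) is the usual alternative to the JDBC properties.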
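You need to do two things:
1. Copy the hive-site.xml from the master (the one containing the MySQL metastore settings) into the workflow's directory on HDFS, next to workflow.xml.
2. Reference it from the Hive action with <job-xml>.
Something like this: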
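<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml> <!-- copied next to workflow.xml on HDFS -->
        <script>drop.sql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>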
This should work.
Thanks