I'm trying to execute a shell script through Oozie, but I'm having some issues.
I have a property file like this (import.properties):
startIndex=2000
chunkSize=2000
The idea is that on every execution, the startIndex value is incremented by the chunk size. So after I execute it once, the file should contain:
startIndex=4000
chunkSize=2000
I have tested the script separately and it works fine. Here are my other related files.
job.properties
nameNode=hdfs://192.168.56.101:8020
jobTracker=192.168.56.101:50300
wfeRoot=wfe
queueName=default
EXEC=script.sh
propertyLoc=import.properties
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${wfeRoot}/coordinator
workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.2' name='shell-wf'>
<start to='shell1' />
<action name='shell1'>
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<file>${EXEC}#${EXEC}</file>
<file>${propertyLoc}#${propertyLoc}</file>
</shell>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name='end' />
</workflow-app>
script.sh
#!/bin/sh
# load startIndex and chunkSize from the property file
file=import.properties
. $file
# resolve where import.properties actually lives (printed for debugging)
SCRIPT=$(readlink -f $file)
SCRIPTPATH=$(dirname $SCRIPT)
echo $SCRIPTPATH
# bump startIndex by chunkSize and rewrite that line in place
newStartIndex=`expr $chunkSize + $startIndex`
newStartIndexStr=startIndex=$newStartIndex
oldStartIndexStr=startIndex=$startIndex
chunkSizeStr=chunkSize=$chunkSize
sed -i "s|$oldStartIndexStr|$newStartIndexStr|g" $file
And I put all these files inside my HDFS working directory:
[ambari_qa@sandbox coordinator]$ hadoop fs -lsr /user/ambari_qa/wfe/coordinator
-rw-rw-rw- 1 ambari_qa hdfs 32 2013-05-09 00:12 /user/ambari_qa/wfe/coordinator/import.properties
-rw-rw-rw- 1 ambari_qa hdfs 533 2013-05-09 01:19 /user/ambari_qa/wfe/coordinator/script.sh
-rw------- 1 ambari_qa hdfs 852 2013-05-09 00:50 /user/ambari_qa/wfe/coordinator/workflow.xml
I was expecting the import.properties file to change after each execution, but it isn't changing even though the Oozie job succeeds. For debugging purposes, I printed out the location of the file during execution and found that it was copied to another location (from the log):
>>> Invoking Shell command line now >>
Stdoutput /hadoop/mapred/taskTracker/ambari_qa/distcache/-5756672768810005023_889271025_125659265/192.168.56.101/user/ambari_qa/wfe/coordinator
Stdoutput startIndex=4000
Stdoutput startIndex=2000
Exit code of the Shell command 0
<<< Invocation of Shell command completed <<<
What do I need to do so that the change affects the file in the HDFS working directory? Thanks in advance.
Update:
After changing the script based on Chris's suggestion, the last 3 lines become:
hadoop fs -rm hdfs://ip-10-0-0-92:8020/user/ambari_qa/wfe/shell-oozie/$file
sed -i "s|$oldStartIndexStr|$newStartIndexStr|g" $file
hadoop fs -put $file /user/ambari_qa/wfe/shell-oozie
But then I started facing a permission issue, even though I gave write permission on that file and folder.
[ambari_qa@ip-10-0-0-91 shell-oozie]$ hadoop fs -ls /user/ambari_qa/wfe/shell-oozie
Found 3 items
-rw-rw-rw- 3 ambari_qa hdfs 32 2013-05-10 16:55 /user/ambari_qa/wfe/shell-oozie/import.properties
-rw-rw-rw- 3 ambari_qa hdfs 540 2013-05-10 16:48 /user/ambari_qa/wfe/shell-oozie/script.sh
-rw-rw-rw- 3 ambari_qa hdfs 826 2013-05-10 15:29 /user/ambari_qa/wfe/shell-oozie/workflow.xml
Here is the error log:
rm: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=EXECUTE, inode="ambari_qa":ambari_qa:hdfs:rwxrwx---
put: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=EXECUTE, inode="ambari_qa":ambari_qa:hdfs:rwxrwx---
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
sed is running on the local distributed-cache copy of the file - you'll need to pipe the output of sed back to HDFS via the hadoop fs shell (remembering to delete the existing file before uploading), something like:
hadoop fs -rm /user/ambari_qa/wfe/coordinator/$file
sed "s|$oldStartIndexStr|$newStartIndexStr|g" $file \
hadoop fs -put - /user/ambari_qa/wfe/coordinator/$file
There are probably ways you can find the coordinator path in hdfs rather than hard coding it into the script.
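For example (a rough sketch, assuming your Oozie version supports the shell action's <env-var> element and the wf:appPath() EL function; APP_PATH is just a name I've picked here), you could pass the application path into the script as an environment variable:
<exec>${EXEC}</exec>
<env-var>APP_PATH=${wf:appPath()}</env-var>
<file>${EXEC}#${EXEC}</file>
<file>${propertyLoc}#${propertyLoc}</file>
and then reference it in script.sh instead of the hard-coded path:
hadoop fs -rm $APP_PATH/$file
sed "s|$oldStartIndexStr|$newStartIndexStr|g" $file | \
hadoop fs -put - $APP_PATH/$file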
Update
The permission problem is because the Oozie job is running as the mapred user, yet the file only has rwx permissions for the user ambari_qa and group hdfs:
user=mapred, access=EXECUTE, inode="ambari_qa":ambari_qa:hdfs:rwxrwx---
I would either amend the permissions on the file and its parent folders so that the mapred user can delete / replace the file, or look into masquerading as a user that does have the correct permissions.
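For example (a sketch only - adjust the paths, and decide how open you really want these permissions to be), something along these lines would let the mapred user traverse the directories and replace the file:
hadoop fs -chmod o+x /user/ambari_qa /user/ambari_qa/wfe
hadoop fs -chmod o+rwx /user/ambari_qa/wfe/shell-oozie
Alternatively, on a cluster without Kerberos, and depending on your Hadoop version, exporting HADOOP_USER_NAME in script.sh before the hadoop fs calls may make the client act as that user:
export HADOOP_USER_NAME=ambari_qa
hadoop fs -rm /user/ambari_qa/wfe/shell-oozie/$file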