Running a shell script through Oozie

dreamer · May 9, 2013 · Viewed 14.7k times

I'm trying to execute a shell script through Oozie, but I'm having some issues.

I have a property file like this (import.properties):

startIndex=2000
chunkSize=2000

The idea is that on every execution the startIndex value is incremented by the chunk size. So after one execution it should contain:

startIndex=4000
chunkSize=2000

I have tested the script separately and it works fine. Here are my other related files.

job.properties

nameNode=hdfs://192.168.56.101:8020
jobTracker=192.168.56.101:50300
wfeRoot=wfe
queueName=default
EXEC=script.sh
propertyLoc=import.properties

oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${wfeRoot}/coordinator

workflow.xml

<workflow-app xmlns='uri:oozie:workflow:0.2' name='shell-wf'>
    <start to='shell1' />
    <action name='shell1'>
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC}</exec>
            <file>${EXEC}#${EXEC}</file>
            <file>${propertyLoc}#${propertyLoc}</file>
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
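(The name#symlink form in the <file> elements tells Oozie to ship each HDFS file through the distributed cache and expose it under that symlink name in the action's working directory; this turns out to matter below.)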

script.sh

#!/bin/sh
# property file localized next to the script via the <file> tag in workflow.xml
file=import.properties
. $file

# print the directory the script is actually running from (for debugging)
SCRIPT=$(readlink -f $file)
SCRIPTPATH=$(dirname $SCRIPT)
echo $SCRIPTPATH

# compute the next start index and swap it into the property file in place
newStartIndex=`expr $chunkSize + $startIndex`
newStartIndexStr=startIndex=$newStartIndex

oldStartIndexStr=startIndex=$startIndex
chunkSizeStr=chunkSize=$chunkSize

sed -i "s|$oldStartIndexStr|$newStartIndexStr|g" $file

And I put all these files inside my HDFS working directory:

[ambari_qa@sandbox coordinator]$ hadoop fs -lsr /user/ambari_qa/wfe/coordinator
-rw-rw-rw-   1 ambari_qa hdfs         32 2013-05-09 00:12 /user/ambari_qa/wfe/coordinator/import.properties
-rw-rw-rw-   1 ambari_qa hdfs        533 2013-05-09 01:19 /user/ambari_qa/wfe/coordinator/script.sh
-rw-------   1 ambari_qa hdfs        852 2013-05-09 00:50 /user/ambari_qa/wfe/coordinator/workflow.xml

I was expecting the import.properties file to change after each execution, but it doesn't, even though the Oozie job succeeds. For debugging purposes, I printed the location of the file during execution and found that it had been copied to another location (from the log):

>>> Invoking Shell command line now >>

Stdoutput /hadoop/mapred/taskTracker/ambari_qa/distcache/-5756672768810005023_889271025_125659265/192.168.56.101/user/ambari_qa/wfe/coordinator
Stdoutput startIndex=4000
Stdoutput startIndex=2000
Exit code of the Shell command 0
<<< Invocation of Shell command completed <<<
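
(For reference, a couple of extra debugging lines like these in script.sh make the same check; they are illustrative only, not part of the original script:)

# debugging only: show the launcher's working directory and which
# copy of import.properties the script is actually operating on
echo "cwd: $(pwd)"
ls -l import.properties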

What do I need to do so that the change affects the file in my HDFS working directory? Thanks in advance.

Update:

After changing the script based on Chris's suggestion, the last 3 lines became:

hadoop fs -rm hdfs://ip-10-0-0-92:8020/user/ambari_qa/wfe/shell-oozie/$file
sed -i "s|$oldStartIndexStr|$newStartIndexStr|g" $file
hadoop fs -put $file /user/ambari_qa/wfe/shell-oozie

But then I started facing a permission issue, even though I had given write permission on that file and folder.

[ambari_qa@ip-10-0-0-91 shell-oozie]$ hadoop fs -ls /user/ambari_qa/wfe/shell-oozie
Found 3 items
-rw-rw-rw-   3 ambari_qa hdfs         32 2013-05-10 16:55 /user/ambari_qa/wfe/shell-oozie/import.properties
-rw-rw-rw-   3 ambari_qa hdfs        540 2013-05-10 16:48 /user/ambari_qa/wfe/shell-oozie/script.sh
-rw-rw-rw-   3 ambari_qa hdfs        826 2013-05-10 15:29 /user/ambari_qa/wfe/shell-oozie/workflow.xml

Here is the error log:

rm: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=EXECUTE, inode="ambari_qa":ambari_qa:hdfs:rwxrwx---
put: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=EXECUTE, inode="ambari_qa":ambari_qa:hdfs:rwxrwx---
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Answer

Chris White · May 10, 2013

sed is running on the local distributed cache version of the file - you'll need to pipe the output of sed back to HDFS via the hadoop fs shell (remembering to delete the existing file in HDFS before uploading), something like:

hadoop fs -rm /user/ambari_qa/wfe/coordinator/$file

sed "s|$oldStartIndexStr|$newStartIndexStr|g" $file | \
    hadoop fs -put - /user/ambari_qa/wfe/coordinator/$file
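
Here hadoop fs -put - reads from standard input, so the updated file streams straight from sed into HDFS without needing a temporary local copy.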

There are probably ways you can find the coordinator path in HDFS rather than hard-coding it into the script.
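
For example (a sketch I haven't tested with this workflow): Oozie's workflow EL function wf:appPath() returns the application path, so adding <argument>${wf:appPath()}</argument> after the <exec> element in workflow.xml would hand it to the script, which could then use it in place of the hard-coded path:

# sketch only: assumes workflow.xml passes ${wf:appPath()} as the first argument
appPath=$1
hadoop fs -rm $appPath/$file
sed "s|$oldStartIndexStr|$newStartIndexStr|g" $file | \
    hadoop fs -put - $appPath/$file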

Update

The permission problem is because the Oozie job is running as the mapred user, yet the inode it needs to traverse only has rwx permissions for the user ambari_qa and group hdfs, with no permissions for others:

user=mapred, access=EXECUTE, inode="ambari_qa":ambari_qa:hdfs:rwxrwx---

I would either amend the permissions on the file and its parent folders so that the mapred user can delete and replace the file, or look into masquerading as a user that does have the correct permissions.
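
For the first option, something like this should do it (a sketch using the paths from the listing above; the denied access was EXECUTE on the /user/ambari_qa directory itself, so the parent directories must be traversable by mapred):

# let 'other' users (which includes mapred here) traverse the parent directories
hadoop fs -chmod o+x /user/ambari_qa /user/ambari_qa/wfe
# and write into the job directory so the script's -rm / -put succeed
hadoop fs -chmod o+wx /user/ambari_qa/wfe/shell-oozie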