Oozie SSH Action

Kasa · Oct 9, 2013

Oozie SSH Action Issue:

Issue: We are trying to run a few commands on a particular host in our cluster, and we chose the SSH action for this. We have been hitting the SSH error below for some time now. What might the real issue be? Please point me towards a solution.

Logs:

AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added host,1.2.3.4 (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

org.apache.oozie.action.ActionExecutorException: AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added 1.2.3.4,192.168.34.208 (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:589)
at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:204)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59)
at org.apache.oozie.command.XCommand.call(XCommand.java:277)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Caused by: java.io.IOException: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added '1.2.3.4,1.2.3.4' (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).

at org.apache.oozie.action.ssh.SshActionExecutor.executeCommand(SshActionExecutor.java:340)
at org.apache.oozie.action.ssh.SshActionExecutor.setupRemote(SshActionExecutor.java:373)
at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:206)
at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:204)
at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:547)
... 10 more

2013-10-09 12:48:25,982 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[0000000-131008185935754-oozie-oozi-W@action1] Suspending Workflow Job id=0000000-131008185935754-oozie-oozi-W
2013-10-09 12:48:27,204 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[0000000-131008185935754-oozie-oozi-W@action1] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
2013-10-09 12:59:57,477 INFO org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] STARTED WorkflowKillXCommand for jobId=0000000-131008185935754-oozie-oozi-W
2013-10-09 12:59:57,685 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100
2013-10-09 12:59:57,686 INFO org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] ENDED WorkflowKillXCommand for jobId=0000000-131008185935754-oozie-oozi-W
2013-10-09 13:41:32,654 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 0000000-131008185935754-oozie-oozi-W, Error Code: E0725
2013-10-09 13:41:45,199 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 0000000-131008185935754-oozie-oozi-W, Error Code: E0725
2013-10-09 13:42:04,869 WARN org.apache.oozie.command.wf.ResumeXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [workflow's status is KILLED is not SUSPENDED], Error Code: E1100
2013-10-09 13:45:56,357 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 0000000-131008185935754-oozie-oozi-W, Error Code: E0725

Approaches tried:

  1. Set up passwordless SSH
  2. Set up user proxies
  3. Granted permissions on the required folders

Thanks,

Kasa.

Answer

quux00 · Oct 24, 2013

I just hit a similar problem. I had a case where I could run the following as USER:

ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/

by hand on the command line and it worked, but when launched via Oozie as USER it failed.

In my case, it failed because I had set up passwordless SSH between USER on the Oozie server and USER on the remote machine. What you actually need is passwordless SSH between the oozie user on the Oozie server and USER on the remote machine, because the Oozie server process is the one that runs ssh. In other words, su to oozie on the Oozie server and run the above command by hand. If it fails there, it will fail in Oozie; if it works, it should work in Oozie as well (assuming everything else is correct, such as directory permissions).
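A minimal sketch of that check, assuming the Oozie server's OS account is named `oozie` (verify that first). `probe_as_oozie_user` is a hypothetical helper name and the `USER@host` target is a placeholder. It reuses the exact `-o` options from the failing command in the log but runs `true` instead of `mkdir`, so the probe has no side effects. It uses `sudo -u` rather than `su` because service accounts often have no login shell.

```shell
#!/bin/sh
# Sketch: repeat the non-interactive ssh probe that Oozie's SSH action
# performs, but as the OS account the Oozie server runs under.
# The default account name "oozie" is an assumption; adjust as needed.

# probe_as_oozie_user TARGET [OOZIE_OS_USER]   (hypothetical helper)
#   TARGET is the USER@host the SSH action connects to (placeholder).
probe_as_oozie_user() {
  target=$1
  os_user=${2:-oozie}
  # Same -o options as the failing command in the Oozie log;
  # "true" replaces the mkdir so nothing is created on the remote side.
  if sudo -u "$os_user" ssh \
       -o PasswordAuthentication=no \
       -o KbdInteractiveDevices=no \
       -o StrictHostKeyChecking=no \
       -o ConnectTimeout=20 \
       "$target" true; then
    echo "OK: $os_user -> $target works without a password"
  else
    echo "FAIL: $os_user -> $target; the Oozie SSH action will fail the same way"
    return 1
  fi
}
```

For example, `probe_as_oozie_user USER@remote-host` (placeholder host) from an account allowed to use sudo; a FAIL line reproduces the AUTH_FAILED error without going through a workflow.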

Check which user your Oozie server is running as:

ps -ef | grep oozie
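A slightly more targeted variant of that `ps` check, as a sketch: it prints only the distinct owning accounts and skips the grep process itself. Depending on the install, Oozie may run inside Tomcat under an account not literally named `oozie`, which is exactly why this is worth checking. `oozie_users_from_ps` is a hypothetical helper; it reads `ps -eo user=,args=` output on stdin so it can be tried on canned input.

```shell
#!/bin/sh
# Sketch: print each distinct account owning a process whose command line
# mentions "oozie". Input is "user args" lines, i.e. `ps -eo user=,args=`.
oozie_users_from_ps() {
  # Writing the pattern as [o]ozie keeps it from matching the awk command
  # line itself. Unrelated processes whose arguments mention oozie (e.g. a
  # script named after it) still match, so treat the output as candidates.
  awk '/[o]ozie/ { print $1 }' | sort -u
}

# On the Oozie server:
#   ps -eo user=,args= | oozie_users_from_ps
```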

Whichever user that is needs passwordless SSH to USER on the remote machine.
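One way to set that up, sketched under assumptions: the account is `oozie` (substitute whatever the `ps` check showed), `getent` is available (glibc-based Linux), and OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed. `setup_oozie_ssh_key` is a hypothetical helper name. The key goes in the account's default identity path so a plain `ssh` run by the Oozie server picks it up, and it has no passphrase because Oozie cannot type one.

```shell
#!/bin/sh
# Sketch: give the Oozie server's OS account (assumed "oozie") a
# passphrase-less key pair and install the public key for TARGET (USER@host).

# setup_oozie_ssh_key TARGET [OOZIE_OS_USER]   (hypothetical helper)
setup_oozie_ssh_key() {
  target=$1
  os_user=${2:-oozie}
  # Look up the account's home directory rather than assuming /home/<user>.
  home=$(getent passwd "$os_user" | cut -d: -f6)
  [ -n "$home" ] || { echo "unknown user: $os_user" >&2; return 1; }
  key=$home/.ssh/id_rsa   # default identity file, so plain ssh uses it
  sudo -u "$os_user" mkdir -p -m 700 "$home/.ssh" || return 1
  # Generate a key only if the account has none; -N '' means no passphrase,
  # since Oozie runs ssh non-interactively.
  if ! sudo -u "$os_user" test -f "$key"; then
    sudo -u "$os_user" ssh-keygen -t rsa -N '' -f "$key" || return 1
  fi
  # Appends the public key to USER's ~/.ssh/authorized_keys on the remote
  # host; this asks once for USER's password there.
  sudo -u "$os_user" ssh-copy-id -i "$key.pub" "$target"
}
```

After this, the same non-interactive ssh command run as the oozie account should succeed without a password prompt.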