Monit fails to start process

João Daniel · May 13, 2013 · Viewed 14.2k times

I've written a script that works fine to start and stop a server.

#!/bin/bash

PID_FILE='/var/run/rserve.pid'

start() {
    touch "$PID_FILE"
    /usr/bin/R CMD Rserve
    # Look up the PID of the Rserve process that was just launched
    PID=$(ps aux | grep Rserve | grep -v grep | awk '{print $2}')
    echo "Starting Rserve with PID $PID"
    echo "$PID" > "$PID_FILE"
}

stop() {
    pkill Rserve
    rm -f "$PID_FILE"
    echo "Stopping Rserve"
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    *)
        echo "usage: rserve {start|stop}"
        ;;
esac
exit 0

If I start it by running

rserve start

and then start Monit, it correctly picks up the PID and the server:

The Monit daemon 5.3.2 uptime: 0m 

Remote Host 'localhost'
  status                            Online with all services
  monitoring status                 Monitored
  port response time                0.000s to localhost:6311 [DEFAULT via TCP]
  data collected                    Mon, 13 May 2013 20:03:50

System 'system_gauss'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.37] [0.29] [0.25]
  cpu                               0.0%us 0.2%sy 0.0%wa
  memory usage                      524044 kB [25.6%]
  swap usage                        4848 kB [0.1%]
  data collected                    Mon, 13 May 2013 20:03:50

If I stop it through Monit, it properly kills the process and unmonitors it. However, if I then start it through Monit, the server does not come back up:

ps ax | grep Rserve | grep -vc grep
1
monit stop localhost
ps ax | grep Rserve | grep -vc grep
0
monit start localhost

[UTC May 13 20:07:24] info     : 'localhost' start on user request
[UTC May 13 20:07:24] info     : monit daemon at 4370 awakened
[UTC May 13 20:07:24] info     : Awakened by User defined signal 1
[UTC May 13 20:07:24] info     : 'localhost' start: /usr/bin/rserve
[UTC May 13 20:07:24] info     : 'localhost' start action done
[UTC May 13 20:07:34] error    : 'localhost' failed, cannot open a connection to INET[localhost:6311] via TCP

Here is the monitrc:

check host localhost with address 127.0.0.1
  start = "/usr/bin/rserve start"
  stop = "/usr/bin/rserve stop"
  if failed host localhost port 6311 type tcp with timeout 15 seconds for 5 cycles
    then restart

Answer

Green Su · Oct 25, 2013

I also had problems starting and stopping a process through a shell script. One solution might be to add "/bin/bash" to the config, like this:

start program = "/bin/bash /usr/bin/rserve start"
stop program = "/bin/bash /usr/bin/rserve stop"

It worked for me.
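
To check whether the environment is what makes the difference, it may help to run the script by hand under conditions close to Monit's. As far as I understand, Monit executes start and stop programs directly, not through a login shell, and with a stripped-down environment. The sketch below assumes that environment is essentially empty apart from a PATH of /bin:/usr/bin:/sbin:/usr/sbin; the helper name run_like_monit is made up for illustration and is not part of Monit.

```shell
#!/bin/sh
# run_like_monit is a hypothetical helper, not a Monit command.
# It runs the given command with an emptied environment and a fixed PATH,
# which is an assumption about how Monit launches start/stop programs.
run_like_monit() {
    env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin "$@"
}

# Example invocations (paths taken from the question):
#   run_like_monit /usr/bin/rserve start
#   run_like_monit /bin/bash /usr/bin/rserve start
```

If the script fails when launched this way but works from a normal terminal, it probably relies on something from the interactive environment (for example HOME or a longer PATH), and setting that explicitly at the top of the script may be more robust than relying on the wrapper alone.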