Bash script to monitor a process and send mail if it fails

JeremyA1 · Jun 18, 2014

I realize that I can't reliably count on ps | grep or its variants to tell me accurately which PID was started. However, I know what I need as a stopgap until this problem is fixed in the next release.
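The grep patterns in the scripts here are written as "[P]ROCESS TEST1" rather than "PROCESS TEST1". That works around one common way ps | grep misleads you: grep can find its own command line in the process table. Below is a minimal sketch that uses made-up listings in place of real ps output (count_matches is a hypothetical helper). It shows the difference when TEST1 has died and only grep's own entry is left.

```bash
#!/usr/bin/env bash
# count_matches LISTING PATTERN: how many lines of LISTING match PATTERN.
count_matches() {
    # grep -c exits 1 when the count is 0; "|| true" keeps that from
    # being treated as a failure.
    printf '%s\n' "$1" | grep -c "$2" || true
}

# Made-up listings for the case where TEST1 has died, so the only entry left
# is grep's own command line (hypothetical PID 202).
dead_plain='root 202 grep PROCESS TEST1'        # from: grep "PROCESS TEST1"
dead_bracketed='root 202 grep [P]ROCESS TEST1'  # from: grep "[P]ROCESS TEST1"

# The plain pattern matches grep itself, so a dead process still looks alive.
echo "plain:     $(count_matches "$dead_plain" "PROCESS TEST1")"       # plain:     1
# "[P]" matches the character P, but not the literal text "[P]" on grep's
# own command line, so the count correctly drops to 0.
echo "bracketed: $(count_matches "$dead_bracketed" "[P]ROCESS TEST1")" # bracketed: 0
```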

I have a parent process named Foo with two child processes, TEST1 and TEST2. If TEST1 and/or TEST2 dies, Foo keeps running but does not respawn them, and both are needed for it to function properly. I know this because restarting TEST1 and/or TEST2 requires restarting Foo first.

So I want to monitor the child processes. If one has failed, the script should send an email saying so, then restart the service and send another email saying it has started again. I plan to run the script from cron every 5 minutes.
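Running the check every 5 minutes could look like the crontab entry below. This is a sketch that assumes the script is saved as /usr/local/bin/check_foo.sh (a hypothetical path) and is executable. The minutes are listed explicitly because older cron implementations, including Solaris 10's, do not accept the */5 step syntax. Redirecting the output keeps cron from mailing whatever the script prints, such as the sendmail -v transcript.

```
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/check_foo.sh >/dev/null 2>&1
```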

The check works on its own, and so does the sendmail part. The problem appears when I combine them in an if/else statement: when TEST1 or TEST2 dies, the script still logs that it is running when it is not. Can someone help me with this, please?

```bash
#!/bin/bash
#Check if process is running
VAL1=`/usr/ucb/ps aux | grep "[P]ROCESS TEST1" >/dev/null`
VAL2=`/usr/ucb/ps aux | grep "[P]ROCESS TEST2" >/dev/null`
if $VAL1 && $VAL2; then
echo "$(date) - $VAL1 & $VAL2 is Running" >> /var/tmp/Log.txt;
else
SUBJ="Process has stopped"
FROM="Server"
TO="admin@example.com"
(
cat << !
To : ${TO}
From : ${FROM}
Subject : ${SUBJ}
!
cat << !
The $VAL1 and $VAL2 went down at $(date) please login to the server to restart
!
) | sendmail -v ${TO}
elseif
/usr/sbin/svcadm disable Foo;
wait 10;
/usr/sbin/svcadm enable Foo; 
fi
```

Answer

Tim Kennedy · Jun 18, 2014

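So, one thing about your tests is that you're sending the output to /dev/null, which means that VAL1 and VAL2 will always be empty. An unquoted empty variable expands to no command at all, and an empty command exits with status 0, so "if $VAL1 && $VAL2" is always true. That is why the script logs "is Running" even after a process has died.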
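To see both halves of this, here is a minimal sketch that feeds a made-up listing through the same pipeline. The listing is a hypothetical stand-in for /usr/ucb/ps aux output, so the sketch runs anywhere. It uses $(...) in place of the question's backticks, which behaves the same here. The fix is to let grep's exit status be the if condition instead of trying to capture its output.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for "/usr/ucb/ps aux" output: TEST1 is up, TEST2 is not.
listing='root 101 1 PROCESS TEST1'

# Original pattern: grep writes to /dev/null, so the substitution captures "".
VAL1=$(printf '%s\n' "$listing" | grep "[P]ROCESS TEST1" >/dev/null)
VAL2=$(printf '%s\n' "$listing" | grep "[P]ROCESS TEST2" >/dev/null)

# Both variables are empty, so "$VAL1 && $VAL2" runs two empty commands,
# each of which exits 0: the condition is true even though TEST2 is missing.
if $VAL1 && $VAL2; then
    echo "original test: reported as running"
fi

# Fix: make grep's exit status the condition instead of capturing its output.
if printf '%s\n' "$listing" | grep "[P]ROCESS TEST2" >/dev/null; then
    fixed="TEST2 running"
else
    fixed="TEST2 down"
fi
echo "fixed test: $fixed"
```

On the server, /usr/ucb/ps aux would be piped into grep in place of the printf.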

Secondly, you don't need the elseif (bash spells that keyword elif; elseif on its own line is just run as an ordinary command). You have two basic conditions: either things are running or they are not. If anything is not running, send an email. You could do some additional testing to determine whether it's PROCESS TEST1 or PROCESS TEST2 that died, but that isn't strictly necessary.
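If you do want to know which process died, one option is to loop over the names rather than write a separate if block per process. This is an alternative to the err counter in the script that follows, not what that script does. The minimal sketch below assumes the ps output is captured once into a variable, and find_down is a hypothetical helper. Because grep then searches that saved snapshot rather than the live process table, it cannot match its own command line, so the [P] trick is not needed.

```bash
#!/usr/bin/env bash
# find_down LISTING NAME...
# Print each NAME that does not appear anywhere in LISTING, one per line.
# LISTING is assumed to be captured once, e.g. listing=$(/usr/ucb/ps aux).
find_down() {
    local listing=$1 name
    shift
    for name in "$@"; do
        # grep's exit status (not its output) decides whether NAME was found.
        if ! printf '%s\n' "$listing" | grep "$name" >/dev/null; then
            printf '%s\n' "$name"
        fi
    done
}

# Hypothetical listing in which only TEST1 is alive.
listing='root 101 1 PROCESS TEST1'
down=$(find_down "$listing" "PROCESS TEST1" "PROCESS TEST2")
if [ -n "$down" ]; then
    echo "Down: $down"     # Down: PROCESS TEST2
fi
```

The names of the dead processes end up in $down, ready to drop into the email body.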

Here's how I might write a script to do the same thing.

```bash
#!/usr/bin/env bash

# Check if each child process is running ("[P]" keeps grep from matching
# itself). PID1/PID2 stay empty if nothing matches.
PID1=$(/usr/ucb/ps aux | grep "[P]ROCESS TEST1" | awk '{print $2}')
PID2=$(/usr/ucb/ps aux | grep "[P]ROCESS TEST2" | awk '{print $2}')

# err is a bitmask: 1 = TEST1 down, 2 = TEST2 down, 3 = both down.
err=0

if [ -z "$PID1" ]; then
        # PROCESS TEST1 died
        err=$(( err + 1 ))
else
        echo "$(date) - PROCESS TEST1 is Running" >> /var/tmp/Log.txt
fi

if [ -z "$PID2" ]; then
        # PROCESS TEST2 died
        err=$(( err + 2 ))
else
        echo "$(date) - PROCESS TEST2 is Running" >> /var/tmp/Log.txt
fi

if (( err > 0 )); then
        # identify which PROCESS TEST had the problem.
        if (( err == 1 )); then
                condition="PROCESS TEST1 is down"
        elif (( err == 2 )); then
                condition="PROCESS TEST2 is down"
        else
                condition="PROCESS TEST1 and PROCESS TEST2 are down"
        fi

        # let's send an email to get eyes on the issue, but we will restart the process after
        # we send the email.
        SUBJ="Process Error Detected"
        FROM="Server"
        TO="admin@example.com"  # placeholder: set this to the real recipient
        # The here-document is deliberately unindented: "<<-" strips only
        # leading tabs, so a space-indented EOT would never end the message.
        sendmail -v "$TO" <<EOT
To: ${TO}
From: ${FROM}
Subject: ${SUBJ}

$condition at $(date). Please log in to the server to check that the processes were restarted successfully.
EOT

        # we reached an error condition, and we sent mail
        # now let's restart the svc.
        /usr/sbin/svcadm restart Foo
fi
```
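The question also asked for a second email once the service is back. The script above restarts Foo but neither confirms that the children came back nor sends that mail. Below is a minimal sketch of the missing pieces. It assumes the children take a little while to reappear after svcadm restart (the 30-second wait is a guess, not something from the question). mail_message, children_up and follow_up_subject are hypothetical helpers, and the server-specific steps are left as comments so the functions can be tried anywhere. mail_message writes headers in the standard "Name: value" form followed by a blank line, which is the layout sendmail reads from standard input.

```bash
#!/usr/bin/env bash
# mail_message TO FROM SUBJECT BODY
# Print a message sendmail can read on stdin: headers, a blank line, the body.
mail_message() {
    printf 'To: %s\nFrom: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3" "$4"
}

# children_up LISTING: succeed only if both children appear in LISTING.
# LISTING is a saved snapshot, so grep cannot match its own command line.
children_up() {
    printf '%s\n' "$1" | grep "PROCESS TEST1" >/dev/null &&
        printf '%s\n' "$1" | grep "PROCESS TEST2" >/dev/null
}

# follow_up_subject LISTING: choose the subject for the second email.
follow_up_subject() {
    if children_up "$1"; then
        echo "Foo restarted: TEST1 and TEST2 are running"
    else
        echo "Foo restarted, but a child process is still down"
    fi
}

# Assumed use on the server, after "/usr/sbin/svcadm restart Foo" (not run here):
#   sleep 30                            # hypothetical settle time
#   listing=$(/usr/ucb/ps aux)
#   subj=$(follow_up_subject "$listing")
#   mail_message "$TO" "$FROM" "$subj" "Checked at $(date)" | sendmail "$TO"
```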