I realize that I can’t reliably count on ps | grep or its variants to tell me accurately which PID was started. However, I know what I need in the interim until this problem is resolved in the next release.
I have a parent process named Foo with two child processes, TEST1 and TEST2. If TEST1 and/or TEST2 dies, Foo continues to run and does not respawn them, even though both are needed for it to function properly. I know this because restarting TEST1 and/or TEST2 requires Foo to be restarted first.
So I want to monitor the child processes: if one has failed, send an email saying that it failed, then restart the service and send another email saying that it has started again. I plan to run the script via cron every 5 minutes.
The check works on its own, and so does the sendmail call. The problem appears when I combine them in an if/else statement: when TEST1 or TEST2 dies, the script still logs that it is running when it is not. Can someone help me with this, please?
#!/bin/bash
#Check if process is running
VAL1=`/usr/ucb/ps aux | grep "[P]ROCESS TEST1" >/dev/null`
VAL2=`/usr/ucb/ps aux | grep "[P]ROCESS TEST2" >/dev/null`
if $VAL1 && $VAL2; then
echo "$(date) - $VAL1 & $VAL2 is Running" >> /var/tmp/Log.txt;
else
SUBJ="Process has stopped"
FROM="Server"
TO="[email protected]"
(
cat << !
To : ${TO}
From : ${FROM}
Subject : ${SUBJ}
!
cat << !
The $VAL1 and $VAL2 went down at $(date) please login to the server to restart
!
) | sendmail -v ${TO}
elseif
/usr/sbin/svcadm disable Foo;
wait 10;
/usr/sbin/svcadm enable Foo;
fi
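The first problem with your tests is that you're sending grep's output to /dev/null, which means that VAL1 and VAL2 will always be empty. In bash, a command that expands to nothing succeeds, so the test "if $VAL1 && $VAL2" is always true and the script logs the processes as running even when they are not.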
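To see why the original condition never fails, here is a small self-contained sketch. The listing variable is made-up sample data standing in for the ps output, so the snippet can run anywhere. It contrasts the empty-variable test with testing the pipeline's exit status directly.

```shell
#!/usr/bin/env bash
# Made-up sample data standing in for the ps listing; TEST1 is absent on purpose.
listing="root 123 0.0 0.1 PROCESS TEST2"

# grep's output goes to /dev/null, so the substitution captures nothing.
VAL1=$(printf '%s\n' "$listing" | grep "[P]ROCESS TEST1" >/dev/null)

# $VAL1 expands to an empty command, which succeeds, so this branch always runs.
if $VAL1; then
    empty_result="reported running"
else
    empty_result="reported down"
fi

# Testing the exit status of the pipeline itself gives the real answer.
if printf '%s\n' "$listing" | grep "[P]ROCESS TEST1" >/dev/null; then
    status_result="running"
else
    status_result="down"
fi

echo "empty variable: $empty_result; exit status: $status_result"
```

The second form needs no variable at all, which is probably the simplest fix if you only care whether the process exists and not what its PID is.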
Secondly, you don't need the elseif branch (the bash keyword is elif, and it cannot follow else in any case). You have two basic conditions: either everything is running, or something is not. If anything is not running, send an email and restart the service. You could do some additional testing to determine whether it's PROCESS TEST1 or PROCESS TEST2 that died, but that isn't strictly necessary.
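If you do want to know which process died, one option is to loop over the process names and collect the missing ones. This is only a sketch under assumptions: find_down is a hypothetical helper, and sample_listing is made-up data standing in for the output of /usr/ucb/ps aux.

```shell
#!/usr/bin/env bash
# find_down LISTING NAME...: print each NAME that does not appear in LISTING.
# (Hypothetical helper, for illustration only.)
find_down() {
    local listing=$1 name
    shift
    for name in "$@"; do
        # Plain substring match; the quoted name is not treated as a pattern.
        if [[ $listing != *"$name"* ]]; then
            printf '%s\n' "$name"
        fi
    done
}

# Made-up sample data standing in for the real ps listing.
sample_listing="root  101  0.0  0.1 PROCESS TEST1"

down=$(find_down "$sample_listing" "PROCESS TEST1" "PROCESS TEST2")

if [ -z "$down" ]; then
    echo "all processes are running"
else
    echo "down: $down"
fi
```

Either branch of the final test maps onto the two basic conditions: log when nothing is missing, otherwise report and restart.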
Here's how I might write a script to do the same thing.
#!/usr/bin/env bash
#Check if process is running
PID1=$(/usr/ucb/ps aux | grep "[P]ROCESS TEST1" | awk '{print $2}')
PID2=$(/usr/ucb/ps aux | grep "[P]ROCESS TEST2" | awk '{print $2}')
err=0
if [ -z "$PID1" ]; then
# PROCESS TEST1 died
err=$(( err + 1 ))
else
echo "$(date) - PROCESS TEST1 is Running" >> /var/tmp/Log.txt;
fi
if [ -z "$PID2" ]; then
# PROCESS TEST2 died
err=$(( err + 2 ))
else
echo "$(date) - PROCESS TEST2 is Running" >> /var/tmp/Log.txt;
fi
if (( $err > 0 )); then
# identify which PROCESS TEST had the problem.
if (( err == 1 )); then
condition="PROCESS TEST1 is down"
elif (( $err == 2 )); then
condition="PROCESS TEST2 is down"
else
condition="PROCESS TEST1 and PROCESS TEST2 are down"
fi
# let's send an email to get eyes on the issue, but we will restart the process after
# we send the email.
SUBJ="Process Error Detected"
FROM="Server"
TO="[email protected]"
(
cat <<-EOT
To: ${TO}
From: ${FROM}
Subject: ${SUBJ}

$condition at $(date). Please log in to the server to check that the processes were restarted successfully.
EOT
) | sendmail -v ${TO}
# we reached an error condition, and we sent mail
# now let's restart the svc.
/usr/sbin/svcadm restart Foo
fi
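The mail part depends on the message layout: header lines first, then one empty line, then the body. Here is a minimal sketch of that layout in isolation. build_message is a hypothetical helper and the address is a placeholder, not something from the original script.

```shell
#!/usr/bin/env bash
# build_message TO FROM SUBJECT BODY: print headers, an empty line, then the body.
# (Hypothetical helper; the address used below is a placeholder.)
build_message() {
    printf 'To: %s\nFrom: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3" "$4"
}

message=$(build_message "ops@example.com" "Server" "Process Error Detected" \
    "PROCESS TEST1 is down at $(date)")

# Print the message for inspection. In the real script it would be piped to
# sendmail instead, for example: build_message ... | sendmail -t
printf '%s\n' "$message"
```

With -t, sendmail takes the recipients from the message headers, so the address only has to be written once.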