I've seen monitoring programs written either as scripts that periodically check process status using 'ps' or 'service status' (on Linux), or in C/C++, where the monitor forks and waits on the process.
I wonder whether it is possible to use bash with trap and restart the sub-process when SIGCHLD is received.
I have tested a basic script on RedHat Linux with the following idea (and certainly it didn't work):
#!/bin/bash
set -o monitor # can someone explain this? discussion on Internet say this is needed
trap startProcess SIGCHLD
startProcess() {
/path/to/another/bash/script.sh & # the one to restart
while [ 1 ]
do
sleep 60
done
}
startProcess
For now, the bash script being started just sleeps for a few seconds and exits.
several issues observed:
... anyway, this does not work at all. I have to say I know too little about this topic. Can someone suggest or give some working examples? Are there scripts for such use?
How about using wait in bash, then?
Thanks
I can try to answer some of your questions but not all based on what I know.
The line set -o monitor (or equivalently, set -m) turns on job control, which is only on by default for interactive shells. This seems to be required for bash to run the SIGCHLD trap. However, job control is more of an interactive feature and not really meant to be used in shell scripts (see also this question).
Also keep in mind this is probably not what you intended, because once you enable job control, SIGCHLD will be sent for every external command that exits (e.g. every time you run ls or grep or anything else, a SIGCHLD will fire when that command completes and your trap will run).
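To see this for yourself, here is a minimal sketch. The count variable and the use of sleep 0 as a stand-in external command are my own choices for illustration, not part of your script. It enables job control, installs a SIGCHLD trap that bumps a counter, and runs three ordinary foreground commands. I would expect the trap to run for each of them, but the exact count may depend on your bash version, so the script only prints whatever it observed.

```shell
#!/bin/bash
# Sketch: count how often the SIGCHLD trap runs once job control is on.
# "count" and "sleep 0" are illustrative choices, not from the original script.
count=0
set -o monitor                      # enable job control (same as set -m)
trap 'count=$((count + 1))' CHLD    # bump the counter each time the trap runs

sleep 0    # three ordinary foreground external commands
sleep 0
sleep 0

trap - CHLD                         # restore the default SIGCHLD handling
set +o monitor                      # turn job control back off
echo "SIGCHLD trap ran $count time(s)"
```

If the counter is non-zero here, a trap that restarts a process would also be triggered by unrelated commands, which is why this approach tends to misfire.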
I suspect the reason the SIGCHLD trap only appears to run once is that your trap handler contains a foreground infinite loop, so your script gets stuck inside the trap handler. There doesn't seem to be a point to that loop anyway, so you could simply remove it.
The script's "immunity" to SIGINT seems to be an effect of enabling job control (the monitor part). My hunch is that with job control turned on, the sub-instance of bash that runs your script no longer terminates itself in response to a SIGINT but instead passes the SIGINT through to its foreground child process. In your script, ^C (i.e. SIGINT) simply acts like a continue statement in other programming languages, since SIGINT will just kill the currently running sleep 60, whereupon the while loop will immediately run a new sleep 60.
When I tried running your script and then killing it (from another terminal), all I ended up with were two stray sleep processes.
Backgrounding that script also kills my shell for me, although the behavior is not terribly consistent (sometimes it happens immediately, other times not at all). It seems typing any keys other than enter causes an EOF to get sent somehow. Even after the terminal exits the script continues to run in the background. I have no idea what is going on here.
Being more specific about what you want to accomplish would help. If you just want a command to run continuously for the lifetime of your script, you could run an infinite loop in the background, like
while true; do
some-command
echo some-command finished
echo restarting some-command ...
done &
Note the & after the done.
For other tasks, wait is probably a better idea than using job control in a shell script. Again, it would depend on what exactly you are trying to do.
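As a rough illustration of the wait approach, here is a minimal sketch of a restart loop that needs neither set -m nor a SIGCHLD trap. The function name supervise, its max_runs argument, and the sh -c 'sleep 1; exit 7' stand-in command are all hypothetical; I capped the number of runs only so the example terminates. It starts the command in the background, blocks in wait until that particular child exits, reports the exit status, and starts it again.

```shell
#!/bin/bash
# Sketch: restart a command each time it exits, using wait instead of job control.
# "supervise" and "max_runs" are hypothetical names chosen for this example.
supervise() {
    local max_runs=$1
    shift
    local runs=0 status=0 pid
    while [ "$runs" -lt "$max_runs" ]; do
        "$@" &               # start the child in the background
        pid=$!               # remember its PID
        wait "$pid"          # block until this child exits
        status=$?            # wait returns the child's exit status
        runs=$((runs + 1))
        echo "run $runs: pid $pid exited with status $status"
    done
}

# Stand-in for the script to restart: sleeps briefly, then exits with status 7.
supervise 3 sh -c 'sleep 1; exit 7'
```

For a real monitor you would presumably replace the run cap with while true and add a short sleep before restarting, so that a command which fails immediately does not spin the loop.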