bash restart sub-process using trap SIGCHLD?

X.M. · Jul 21, 2011

I've seen monitoring programs written either as scripts that periodically check process status using 'ps' or 'service status' (on Linux), or in C/C++ as programs that fork and wait on the process...

I wonder whether it is possible to use trap in bash to restart the sub-process when SIGCHLD is received.

I tested a basic setup on Red Hat Linux with the following idea (and it certainly didn't work...):

#!/bin/bash
set -o monitor # can someone explain this? Discussions on the Internet say this is needed
trap startProcess SIGCHLD
startProcess() { 
  /path/to/another/bash/script.sh & # the one to restart
  while [ 1 ]
  do
    sleep 60
  done
}
startProcess

For now, the bash script being started just sleeps for a few seconds and exits.

Several issues observed:

  • When the script starts in the foreground, SIGCHLD is handled only once. Does trap reset the signal handling the way signal() does?
  • The script and its child seem to be immune to SIGINT, which means they cannot be stopped with ^C.
  • Since it could not be stopped, I closed the terminal. The script seems to get a SIGHUP, and many zombie children are left behind.
  • When run in the background, the script caused the terminal to die.

... anyway, this does not work at all. I have to admit I know too little about this topic. Can someone offer suggestions or give some working examples? Are there existing scripts for this kind of use?

How about using wait in bash, then?

Thanks

Answer

jw013 · Jul 21, 2011

Based on what I know, I can try to answer some of your questions, but not all of them.

  1. The line set -o monitor (or equivalently, set -m) turns on job control, which is on by default only for interactive shells. This seems to be required for the SIGCHLD trap to run. However, job control is mostly an interactive feature and not really meant to be used in shell scripts (see also this question).

    Also keep in mind that this is probably not what you intended, because once you enable job control, the SIGCHLD trap runs for every external command that exits (e.g. every time you run ls or grep or anything else, SIGCHLD fires when that command completes and your trap runs).

  2. I suspect the SIGCHLD trap only appears to run once because your trap handler contains a foreground infinite loop, so your script gets stuck in the trap handler. There doesn't seem to be a point to that loop anyway, so you could simply remove it.

  3. The script's "immunity" to SIGINT seems to be an effect of enabling job control (the monitor part). My hunch is that with job control turned on, the sub-instance of bash that runs your script no longer terminates itself in response to a SIGINT but instead passes the SIGINT through to its foreground child process. In your script, ^C (i.e. SIGINT) then simply acts like a continue statement in other programming languages: SIGINT kills only the currently running sleep 60, whereupon the while loop immediately starts a new sleep 60.

  4. When I tried running your script and then killing it (from another terminal), all I ended up with were two stray sleep processes.

  5. Backgrounding that script also kills my shell, although the behavior is not terribly consistent (sometimes it happens immediately, other times not at all). It seems that typing any key other than Enter somehow causes an EOF to be sent. Even after the terminal exits, the script continues to run in the background. I have no idea what is going on here.

Being more specific about what you want to accomplish would help. If you just want a command to run continuously for the lifetime of your script, you could run an infinite loop in the background, like

while true; do
    some-command
    echo some-command finished
    echo restarting some-command ...
done &

Note the & after the done.
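A minimal sketch of the same idea, assuming you also want a way to stop the loop: $! holds the PID of the background subshell that runs the loop, and an EXIT trap kills that subshell when the main script ends. some_command is a hypothetical placeholder function, and the final sleep 3 only stands in for whatever else your script does.

```bash
#!/usr/bin/env bash
# Sketch: keep restarting a command in a background loop, and remember the
# loop's PID so the main script can stop it.
# some_command is a hypothetical placeholder for the real program.

some_command() {
    sleep 1   # pretend to work for a second, then exit
}

while true; do
    some_command
    echo "some_command finished, restarting ..."
done &
loop_pid=$!   # PID of the background subshell that runs the loop

# When this script exits, kill the loop so it does not outlive the script.
# A some_command that is already running is not signalled by this.
trap 'kill "$loop_pid" 2>/dev/null' EXIT

# The main script carries on with other work; this sleep stands in for it.
sleep 3
```

Killing the subshell stops further restarts, but an instance of some_command that is already running finishes on its own; if it must die as well, the loop would need its own trap that forwards the signal to it.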

For other tasks, wait is probably a better idea than using job control in a shell script. Again, it would depend on what exactly you are trying to do.
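A minimal sketch of the wait approach, under some assumptions: run_child is a hypothetical stand-in for /path/to/another/bash/script.sh, and MAX_RESTARTS and RESTART_DELAY are made-up knobs so the demo terminates (drop the cap to supervise forever). The child is started in the background, and the script then blocks in wait, which returns the child's exit status, so the restart decision happens in ordinary code rather than inside a signal handler.

```bash
#!/usr/bin/env bash
# Minimal sketch: supervise a child with `wait` instead of a SIGCHLD trap.
# Hypothetical names: run_child stands in for /path/to/another/bash/script.sh;
# MAX_RESTARTS and RESTART_DELAY are illustrative knobs, not standard settings.

MAX_RESTARTS=2   # cap so the demo terminates; remove it to restart forever
RESTART_DELAY=1  # seconds to pause between restarts, avoids a tight loop

run_child() {
    # Placeholder workload: "work" for a second, then fail with status 3.
    sleep 1
    return 3
}

child_pid=
restarts=0
last_status=

# On Ctrl-C or TERM, send TERM to the current child (background commands
# ignore SIGINT when job control is off), then exit with a non-zero status.
cleanup() {
    if [ -n "$child_pid" ]; then
        kill "$child_pid" 2>/dev/null
    fi
    exit 1
}
trap cleanup INT TERM

while :; do
    run_child &                            # start the child in the background
    child_pid=$!
    last_status=0
    wait "$child_pid" || last_status=$?    # block until it exits; keep its status
    child_pid=
    echo "child exited with status $last_status"
    if [ "$restarts" -ge "$MAX_RESTARTS" ]; then
        break
    fi
    restarts=$((restarts + 1))
    echo "restarting child (restart #$restarts) ..."
    sleep "$RESTART_DELAY"
done
```

Running the child with & followed by wait, rather than in the foreground, matters for the traps: bash defers a trap until a foreground command finishes, but a trapped signal makes the wait builtin return immediately, so ^C reaches cleanup right away instead of being held until the child exits.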