I have an system running embedded linux and it is critical that it runs continuously. Basically it is a process for communicating to sensors and relaying that data to database and web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling(eg sockets & uart communications). How do I ensure none of the threads get hung up or exit unexpectedly? Is there an easy to use watchdog that is threading friendly?
You can seamlessly restart your process as it dies with fork
and waitpid
as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but a yet easier solution would be to use the alarm
syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm
(i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.