One of the projects I have running is a temperature monitor. I built a QK145 temperature monitor kit a few years ago and have been monitoring local temperature data for a few years now. However, the process I have monitoring and logging the temperature occasionally crashes with a bus error or illegal instruction or some other weird error. I don't know whether it's related to the old version of FreeBSD running the server, or dodgy hardware (memory?), a broken driver, or what. It doesn't happen often enough to get concerned about (yet).
Anyway, I got annoyed with having to restart the monitor if it happened to fall over. The result is relight, a small Python script that automatically restarts a process that crashes occasionally. Here's the usage:
Usage: ./relight.py [options] command args ... -n restarts number of restarts within a minute before we give up (default 5) -l logfile name of log file (default relight.log) -w wait seconds to wait between restarts (default 5)
Relight also comes with complete unit tests! It was a bit tricky to write automated tests that deliberately killed a process spawned by a child process (a "grandchild" process). But by having the grandchild process echo its own pid, the test code was able to read that and send a SIGKILL to the correct process.
2009-12-25T14:42:11Z