Date: 2009-12-18 00:09:00
relight: restart a crashing process

One of the projects I have running is a temperature monitor. I built a QK145 temperature monitor kit a few years ago and have been monitoring local temperature data for a few years now. However, the process I have monitoring and logging the temperature occasionally crashes with a bus error or illegal instruction or some other weird error. I don't know whether it's related to the old version of FreeBSD running the server, or dodgy hardware (memory?), a broken driver, or what. It doesn't happen often enough to get concerned about (yet).

Anyway, I got annoyed with having to restart the monitor if it happened to fall over. The result is relight, a small Python script that automatically restarts a process that crashes occasionally. Here's the usage:

Usage: ./ [options] command args ...

        -n restarts
            number of restarts within a minute before we give up (default 5)
        -l logfile
            name of log file (default relight.log)
        -w wait
            seconds to wait between restarts (default 5)

Relight also comes with complete unit tests! It was a bit tricky to write automated tests that deliberately killed a process spawned by a child process (a "grandchild" process). But by having the grandchild process echo its own pid, the test code was able to read that and send a SIGKILL to the correct process.

For this scenario, I typically run the process in do-not-daemonize mode (when possible), and launch it from init (/etc/inittab).

- d.
Greg Hewgill <>