Systemd watchdog

I haven’t been posting much lately, not since I switched jobs. My silence is partly because the new job is more fun than the previous one – hence less need to escape into the sanity offered by Linux, FP (Haskell, OCaml, …), and the other stuff I like to fill my free time with. Now that I’ve started to settle into the job, I hope I’ll find stuff to write about and the time to do it.

Lately I’ve spent a lot of time reading up on and playing with systemd. It’s really a very impressive piece of software. I found the series “systemd for Administrators” by Lennart Poettering a very good introduction.

There’s also a (much shorter) “systemd for Developers” series.

Watchdog in systemd

One of the nice things offered by systemd is watchdog functionality. For work I needed to try it out, partly for my own sake (I’ve never really had a reason to play with watchdog functionality before) and partly to show the rest of the team how they can modify their parts to integrate with systemd.

As described in part 15 of the “systemd for Administrators” series, the watchdog period is communicated via the environment variable WATCHDOG_USEC. As the name suggests, it holds the period in microseconds. To tickle the watchdog one then calls sd_notify(0, "WATCHDOG=1"). To test all this I wrote the following bit of C (sd-watch.c):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
#include <systemd/sd-daemon.h>

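int main(int ac, char **av)
{
  if(ac < 2) {
    fprintf(stderr, "Usage: %s <file>\n", av[0]);
    exit(1);
  }

  /* Under systemd stdout is not a terminal; line-buffer it so that
     messages show up in the journal as they are printed. */
  setvbuf(stdout, NULL, _IOLBF, 0);

  /* systemd passes the watchdog period (in microseconds) in the environment. */
  char *e = getenv("WATCHDOG_USEC");
  if(!e) {
    printf("No WATCHDOG_USEC set!\n");
    exit(1);
  } else
    printf("WATCHDOG_USEC: %s\n", e);

  /* Bark at half the period to leave some margin. */
  int i = atoi(e);
  i /= 2;
  printf("* Barking every %i usec.\n", i);

  bool cont = true;
  while(cont) {
    usleep(i);

    /* As long as the file named on the command line exists, stay silent
       so that the watchdog eventually fires; otherwise tickle it. */
    struct stat b;
    int r = stat(av[1], &b);
    if(0 == r) {
      continue;
    } else {
      r = sd_notify(0, "WATCHDOG=1");
      printf("Barked!!! (%i)\n", r);
    }
  }

  return 0;
}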

Compile it with gcc -o sd-watch sd-watch.c $(pkg-config --libs libsystemd).
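The atoi call in the test program is fine for a quick experiment, but it silently turns garbage into 0. A slightly more careful way to compute the bark interval might look like the sketch below. The helper name watchdog_half_interval is made up for illustration (it is not part of libsystemd), and the sketch assumes only that WATCHDOG_USEC holds a plain decimal number of microseconds:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of systemd: parse the value of
 * WATCHDOG_USEC and store half of it in *half_usec.
 * Returns 0 on success, -1 if the string is not a positive decimal number. */
int watchdog_half_interval(const char *value, uint64_t *half_usec)
{
  /* Reject NULL, empty strings, signs and leading whitespace up front,
     since strtoull would happily accept the latter two. */
  if (!value || value[0] < '0' || value[0] > '9')
    return -1;

  errno = 0;
  char *end = NULL;
  unsigned long long usec = strtoull(value, &end, 10);

  /* Fail on overflow, trailing junk (e.g. "5s") or a zero period. */
  if (errno != 0 || *end != '\0' || usec == 0)
    return -1;

  *half_usec = usec / 2;
  return 0;
}
```

In the test program this would be called as watchdog_half_interval(getenv("WATCHDOG_USEC"), &half), exiting if it returns -1.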

Together with this comes a service description:

[Unit]
Description=Test for systemd watchdog

[Service]
ExecStart=/usr/local/bin/sd-watch /tmp/foobar
WatchdogSec=5s
Restart=on-failure
StartLimitInterval=1min
StartLimitBurst=2
StartLimitAction=reboot

As you can see, I placed the binary in /usr/local/bin and the service description in /usr/local/lib/systemd/system/sd-watch.service. The last piece is a modification to /etc/systemd/system.conf: uncomment RuntimeWatchdogSec and set it to a suitable value. I chose 60:

RuntimeWatchdogSec=60

Then just reboot, log in, start the service, and follow the log:

# systemctl start sd-watch
# journalctl -f

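In another terminal create the file with touch /tmp/foobar and watch what’s written to the log. Once the file exists the barking stops, systemd restarts the service each time the 5-second watchdog expires, and after a little while the start limit is exceeded and the system should reboot.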
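The number in parentheses on each “Barked!!!” line in the log is the return value of sd_notify. According to its documentation, a positive value means the message was sent, 0 means NOTIFY_SOCKET was not set so nothing was sent, and a negative value is a negated errno code. A small helper for making the log easier to read could look like this sketch (the name describe_notify_result is hypothetical, not a libsystemd function):

```c
#include <string.h>

/* Hypothetical helper: turn the return value of sd_notify() into text. */
const char *describe_notify_result(int r)
{
  if (r > 0)
    return "notification sent";
  if (r == 0)
    return "NOTIFY_SOCKET not set, nothing sent";
  /* Negative values are negated errno codes. */
  return strerror(-r);
}
```

Seeing 0 here usually means the program was started from a shell rather than by systemd.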
