We’re using the excellent (Ruby) Sneakers library for running background processes, and Consul and envconsul for handling environment variables. Turns out that getting all of this to work, along with a service to keep the processes up, is harder than it seems.
Installing Sneakers and envconsul is easy enough: use RVM to install Ruby and Sneakers, compile envconsul, and run everything together:
    export WORKERS=MyWorker,MyOtherWorker
    envconsul -config=/etc/envconsul.hcl bundle exec sneakers work $WORKERS --require ./config/environment.rb
Cool – so far, so good. Next, put this into a bash script. Then write the supervisor conf file to handle this:
    [program:sneakers]
    command=/app/run-workers.sh
    directory=/app
    user=sneakers
    stopsignal=SIGTERM
And at a basic level, this works. When you start the program in supervisor, it comes up. Great! But when you stop the service, all of the child processes spawned by sneakers get orphaned. This happens because of the joy of PIDs.
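The orphaning can be reproduced without supervisor or envconsul. What follows is a minimal sketch, not the real setup: `sleep` stands in for the envconsul and sneakers chain, a plain `kill -TERM` stands in for supervisor stopping the program, and the file names are made up for the demo.

```shell
#!/bin/bash
# Hypothetical demo: `sleep` stands in for envconsul + sneakers,
# and `kill -TERM` stands in for supervisor stopping the program.
tmp=$(mktemp -d)

# A wrapper with no trap, shaped like a naive run-workers.sh.
cat > "$tmp/wrapper.sh" <<'EOF'
#!/bin/bash
# The child records its own PID, then becomes `sleep` (same PID).
bash -c 'echo $$ > "$1"; exec sleep 30' _ "$1"
echo "child exited"
EOF

bash "$tmp/wrapper.sh" "$tmp/child.pid" &
wrapper=$!

# Wait until the child has written its PID.
while [ ! -s "$tmp/child.pid" ]; do sleep 0.1; done
child=$(cat "$tmp/child.pid")

# Signal only the wrapper's PID, the only one a supervisor would know.
kill -TERM "$wrapper"
wait "$wrapper" 2>/dev/null || true

# The wrapper is gone, but was the signal passed on to the child?
if kill -0 "$child" 2>/dev/null; then
  orphaned=yes
else
  orphaned=no
fi
echo "child still running after wrapper stopped: $orphaned"

# Clean up the stand-in child and the temp files.
kill "$child" 2>/dev/null || true
rm -rf "$tmp"
```

The wrapper dies on TERM without telling its child anything, so the stand-in child should still be running afterwards, which is the same thing that happens to the sneakers workers.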
First: envconsul has a setting for passing a kill signal along to its child process, which is critical to this. Next: when supervisor starts a program, that program gets a PID that supervisor holds onto. So when you tell supervisor to stop, it sends the stop signal (TERM by default; I set it explicitly to SIGTERM, which is the same signal, for Sneakers) to that PID. If the supervisor command were the envconsul line itself, this would work seamlessly, because envconsul would receive the signal and pass it on. But since a bash script is running envconsul, the script’s PID is the one supervisor knows, so bash gets the signal, and bash does not forward signals to its children on its own. So the signal needs to be propagated down the chain. Here’s a fun diagram to explain:
So how do we fix this? It’s all about the bash script. I spent a lot of time reading how to do this, but this blog post helped tremendously. Here’s what I’m running:
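    #!/bin/bash

    # Forward a stop signal from supervisor to envconsul.
    trap 'kill -SIGTERM $PID' SIGTERM TERM INT

    export WORKERS=MyWorker,MyOtherWorker
    cd /app

    # Run envconsul in the background so bash can handle signals while it waits.
    envconsul -config=/etc/envconsul.hcl --kill-signal SIGTERM bundle exec sneakers work $WORKERS --require ./config/environment.rb &
    PID=$!

    # The first wait returns as soon as the trap fires.
    wait $PID
    trap - SIGTERM TERM INT
    # The second wait blocks until envconsul has actually exited.
    wait $PID
    EXIT_STATUS=$?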
With this, everything should work fine. Starting and stopping the process in supervisor works because the SIGTERM signal gets propagated down. One thing to note: the wait lines in the bash script are necessary. Without them, the bash script finishes before envconsul stops running, and you’ll end up with orphaned processes.
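To see the trap and the two waits doing their job, here is a small sketch of the same pattern with stand-ins. The child script, its one-second "graceful shutdown", and the marker files are all hypothetical; only the trap/wait structure mirrors the script above.

```shell
#!/bin/bash
# Hypothetical demo of the trap-and-wait pattern with a stand-in child.
tmp=$(mktemp -d)

# Stand-in worker: on TERM it takes a second to shut down cleanly.
cat > "$tmp/child.sh" <<'EOF'
#!/bin/bash
trap 'sleep 1; touch "$1/done"; exit 0' TERM
touch "$1/ready"
while :; do sleep 0.2; done
EOF

# Wrapper with the same trap/wait structure as the post's script.
cat > "$tmp/wrapper.sh" <<'EOF'
#!/bin/bash
trap 'kill -TERM $PID' TERM INT
bash "$1/child.sh" "$1" &
PID=$!
wait $PID            # interrupted when the trap fires
trap - TERM INT
wait $PID            # blocks until the child has really exited
exit $?
EOF

bash "$tmp/wrapper.sh" "$tmp" &
wrapper=$!

# Wait until the child has installed its trap, then stop the wrapper.
while [ ! -e "$tmp/ready" ]; do sleep 0.1; done
sleep 0.2
kill -TERM "$wrapper"

status=0
wait "$wrapper" || status=$?

# If the wrapper waited properly, the child finished before it returned.
if [ -e "$tmp/done" ]; then child_finished=yes; else child_finished=no; fi
echo "wrapper exit status: $status, child finished cleanly: $child_finished"
rm -rf "$tmp"
```

Removing the second `wait` from the wrapper should make it return while the child is still shutting down, so the `done` marker would not exist yet when the check runs. That is the orphaning case again.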