Recently, I’ve noticed that a lot of folks using Node Resque have been reporting similar problems related to shutting down your node application and properly handling uncaught exceptions and unix signals. These problems are exacerbated with deployments involving Docker or a platform like Heroku, which uses Docker under the hood. However, if you keep these tips in mind, it’s easy to have your app work exactly like you want it to… even when things are going wrong!
I’ve added a Docker-specific example to Node Resque which you can check out here https://github.com/actionhero/node-resque/tree/master/examples/docker, and this blog post will dive deeper into the 3 areas that the example focuses on. Node Resque is a background-job processing framework for Node & Typescript which stores jobs in Redis. It supports delayed and recurring jobs, plugins, and more. Node Resque is a core component of the Actionhero framework.
You shouldn’t be using NPM, YARN, PM2 or any other tool to "run" your application inside of your Docker images. You should be calling only the node executable and the file you want to run. This is important so that the signals Docker wants to pass to your application actually get to your app!
There are lots of Unix signals that all mean different things, but in a nutshell a signal is a way for the Operating System (OS) to tell your application to do something, usually implying that it should change its lifecycle state (stop, reboot, etc). For web servers, the most common signals include SIGTERM (terminate) and SIGKILL (kill, aka: "no really stop right now I don’t care what you are working on").
Docker, assuming your base OS is a *NIX operating system like Ubuntu, Red Hat, Debian, Alpine, etc, uses these signals too. For example, when you tell a running Docker container to stop with docker stop, it will send SIGTERM to your application, wait some amount of time for it to shut down, and then do a hard stop with SIGKILL. That hard stop is the same thing that would happen with docker kill, which sends SIGKILL too. What are the differences between stop and kill? That depends on how you write your application! We’ll cover that more in section #2.
So how do you start your node application directly? Assuming you can run your application on your development machine with node ./dist/server.js, your docker file might look like this:
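A minimal sketch of such a Dockerfile. The base image, the /app working directory, and the presence of a "build" script that emits ./dist are assumptions made for illustration; only the final CMD line reflects the command named above.

```dockerfile
# Assumed base image; pick the Node version your project targets
FROM node:lts-alpine

WORKDIR /app

# Install dependencies inside the image rather than copying local node_modules
# (npm ci assumes a package-lock.json is present)
COPY package*.json ./
RUN npm ci

# Copy the source and compile it (assumes an npm "build" script that writes ./dist)
COPY . .
RUN npm run build

# Exec (JSON array) form: node is started directly, with no /bin/sh -c wrapper
CMD ["node", "./dist/server.js"]
```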
And be sure you don’t copy your local node_modules into the image; a .dockerignore file is the usual way to exclude them.
We are using the exec form of the CMD directive (a JSON array of arguments) because we don’t want Docker to use a subshell. The shell form of CMD or ENTRYPOINT (a single command string) works by calling /bin/sh -c and then your command… and the shell can receive the signals itself and not pass them on to your application. If you used a process runner like npm start, the same thing could happen.
You can learn more about docker signals & node here https://hynek.me/articles/docker-signals/
Ok, so we are sure we will get the signals from the OS and Docker… how do we handle them? Node makes it really easy to listen for these signals in your app via:
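A minimal sketch of such a listener, using Node's built-in process events. The `onSignal` helper name and the log format are assumptions made for illustration, not part of any library.

```javascript
// Hypothetical helper: log the signal, then run whatever cleanup is passed in
function onSignal(signal, cleanup) {
  process.on(signal, async () => {
    console.log(`[ SIGNAL ] - ${signal}`);
    await cleanup();
  });
}

// Once a listener is registered, Node no longer exits by default on this signal
onSignal("SIGTERM", async () => {
  // application-specific shutdown logic goes here
});
```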
This will prevent Node.js from stopping your application outright, and will give you an event so you can do something about it.
… but what should you do? If your application is a web server, you might toggle your health check (e.g. GET /status) to return `false` so the load balancer will stop sending traffic to this instance.
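One way the health-check toggle could look, sketched with Node's built-in http module. The `/status` response shape, the 503 status code, and the helper names (`beginShutdown`, `statusResponse`, `stopServer`) are assumptions for illustration; your load balancer may expect something different.

```javascript
const http = require("node:http");

// Flipped to true as soon as a stop signal arrives
let shuttingDown = false;

function beginShutdown() {
  shuttingDown = true;
}

// What GET /status should report right now
function statusResponse() {
  return { code: shuttingDown ? 503 : 200, body: { healthy: !shuttingDown } };
}

const server = http.createServer((req, res) => {
  if (req.method === "GET" && req.url === "/status") {
    const { code, body } = statusResponse();
    res.writeHead(code, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("hello");
});

// Report unhealthy, stop accepting new connections, and resolve
// once the server has fully closed
function stopServer() {
  beginShutdown();
  return new Promise((resolve, reject) => {
    server.close((error) => (error ? reject(error) : resolve()));
  });
}

// server.listen(8080) would start the server;
// call stopServer() from your signal listener.
```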
If your application uses Node Resque, you should call await scheduler.end() etc. This will tell the rest of the cluster that this worker is shutting down.
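For Node Resque, that step can be sketched as a small helper. It assumes `worker` and `scheduler` are Node Resque Worker and Scheduler instances that your app created and connected elsewhere; the `stopResque` name is hypothetical.

```javascript
// `worker` and `scheduler` are assumed to be already-running node-resque
// Worker and Scheduler instances created elsewhere in your app
async function stopResque(worker, scheduler) {
  await scheduler.end(); // stop the scheduler first
  await worker.end(); // then stop the worker
}
```

You would call `await stopResque(worker, scheduler)` from your signal listener before the process exits.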
In Actionhero we manage this at the application level (await actionhero.process.stop()) and allow all of the sub-systems (initializers) to gracefully shut down: servers, task workers, cache, chat rooms, etc. It’s important to hand off work to other members in the cluster and/or let connected clients know what to do.
A robust collection of process events for your node app might look like:
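A sketch of what such a collection could look like, using only Node's built-in process events. The body of `shutdown`, the 30 second default timeout, the log messages, and the choice of SIGINT and SIGTERM are assumptions for illustration; SHUTDOWN_TIMEOUT is read as milliseconds here.

```javascript
// Application-specific shutdown logic (placeholder body, an assumption)
async function shutdown() {
  // e.g. stop servers, end workers, close database connections
  console.log("processes gracefully stopped");
}

// Fallback: force an exit if graceful shutdown takes too long.
// The 30 second default is an assumption; override with SHUTDOWN_TIMEOUT (ms).
function awaitHardStop() {
  const timeout = process.env.SHUTDOWN_TIMEOUT
    ? parseInt(process.env.SHUTDOWN_TIMEOUT, 10)
    : 30 * 1000;
  return setTimeout(() => {
    console.error(`Process did not terminate within ${timeout}ms. Stopping now!`);
    process.exit(1);
  }, timeout);
}

// Crashes: log the problem, then exit with a non-zero code
process.on("uncaughtException", (error) => {
  console.error(error);
  process.exit(1);
});
process.on("unhandledRejection", (reason) => {
  console.error(reason);
  process.exit(1);
});

// Signals: start the hard-stop timer, shut down, then cancel the timer
for (const signal of ["SIGINT", "SIGTERM"]) {
  process.on(signal, async () => {
    console.log(`[ SIGNAL ] - ${signal}`);
    const timer = awaitHardStop();
    await shutdown();
    clearTimeout(timer);
  });
}
```

Once `shutdown` has closed everything and the timer is cleared, nothing keeps the event loop busy and the process exits on its own.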
Let's walk through this:
First, there is shutdown, which contains our application-specific shutdown logic.
Next, there is awaitHardStop. This is to help with situations where an exception might happen during your shutdown behavior, a background task is taking too long, a timer doesn’t resolve, you can’t close your database connection… there are lots of things that could go wrong. We also use an Environment Variable to customize how long we wait (process.env.SHUTDOWN_TIMEOUT), which you can configure via Docker. If the app doesn’t exit in this time, we forcibly exit the program with `1`, indicating a crash or error.
We can listen for any unix signal we want, but we should never try to listen for SIGKILL. The operating system promises that SIGKILL will immediately end any process, so it cannot be caught, and Node.js does not allow a listener to be installed for it.
Finally, log the heck out of signaling behavior in your application. It’s innately hard to debug this type of thing, as you are telling your app to stop… but you haven’t yet stopped. Even after docker stop, logs are still generated and stored… and you might need them!
In the Node Resque examples, we log all the stop events and when the application finally exits:
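A small sketch of this kind of logging, not the Node Resque example itself. The `stopLog` helper and its line format are assumptions for illustration; the "exit" event is Node's own.

```javascript
// Hypothetical helper: one timestamped line per stop-related event
function stopLog(event, detail) {
  const line = `${new Date().toISOString()} [ ${event} ] - ${detail}`;
  console.log(line);
  return line;
}

// "exit" fires when the process is about to end;
// only synchronous work is safe in this listener
process.on("exit", (code) => {
  stopLog("EXIT", `code ${code}`);
});
```

Calling `stopLog("SIGNAL", "SIGTERM")` at the top of each signal listener gives you a record of what arrived and when, followed by a final line when the process actually ends.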
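So, if you call the node executable directly, listen for signals and shut down gracefully, and log your shutdown behavior, you should have no problem creating robust node applications that are deployed via Docker, and are a pleasure to monitor and debug.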