Graceful Shutdown and SIGTERM: Deploy Without Dropping Requests

Every deploy, every autoscale-down, every pod reschedule stops a running process. The question is how it stops. If your service is killed mid-request, users get dropped connections, half-written data, and 502s — during what was supposed to be a routine, zero-downtime deploy. The difference between a clean release and a wave of errors is whether your app shuts down gracefully: finishing what it's doing before it exits.

This guide explains how process termination actually works, what graceful shutdown means, and how to implement it so deploys and restarts don't drop a single request.

How a Process Gets Stopped: SIGTERM vs SIGKILL

When an orchestrator, init system, or docker stop wants a process gone, it doesn't just yank the plug — it asks first:

SIGTERM — the "please shut down" signal. It's a polite request: the process can catch it, stop accepting new work, finish in-flight requests, flush buffers, close connections, and exit cleanly. This is the signal graceful shutdown is built around.
A grace period — the system then waits a set amount of time (Kubernetes defaults to 30 seconds via terminationGracePeriodSeconds).
SIGKILL — if the process hasn't exited by the end of the grace period, it's force-killed. SIGKILL can't be caught or handled; the process dies instantly, mid-whatever-it-was-doing.

The entire goal of graceful shutdown is to do all your cleanup during the SIGTERM window so SIGKILL never has to fire. An app that ignores SIGTERM simply runs out the clock: it carries on as if nothing happened until the grace period expires, then takes a hard kill that drops every request still in flight.
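A minimal Python sketch of the difference between the two signals, assuming a POSIX system and the standard signal module. The names shutdown_requested, handle_sigterm, and install_handlers are illustrative. The process installs a handler for SIGTERM that only records the request, then tries to do the same for SIGKILL, which the kernel rejects (Python surfaces the rejection as an OSError).

```python
import os
import signal

# Illustrative flag: set by the SIGTERM handler and checked by the main loop,
# which would then run the actual shutdown sequence.
shutdown_requested = False


def handle_sigterm(signum, frame):
    """SIGTERM is catchable: record the request and return quickly."""
    global shutdown_requested
    shutdown_requested = True


def install_handlers():
    """Install the SIGTERM handler; return True only if SIGKILL could be caught too."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    try:
        signal.signal(signal.SIGKILL, handle_sigterm)
    except OSError:
        # The kernel refuses: SIGKILL cannot be caught, blocked, or ignored.
        return False
    return True


sigkill_catchable = install_handlers()
# Simulate the orchestrator's polite request (like `kill -TERM <pid>`).
os.kill(os.getpid(), signal.SIGTERM)
```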

What Graceful Shutdown Actually Does

A well-behaved service, on receiving SIGTERM, runs an ordered shutdown sequence:

Stop accepting new work. Close the listening socket / stop pulling from the queue so no new requests arrive. Existing ones continue.
Finish in-flight requests. Give active requests a chance to complete (within the grace period). This is the part that prevents dropped connections.
Drain and close resources. Finish or safely abort in-progress jobs, flush logs and buffers, commit or roll back transactions, and close database connections and pools cleanly.
Exit with status 0. A clean exit tells the orchestrator the shutdown succeeded.

Done right, a deploy looks like this from the outside: old instances quietly finish their work and bow out while new instances take over — and no user ever notices.
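The four steps can be sketched with Python's standard http.server and socketserver modules. This is an illustration under assumptions, not a production server: SlowHandler, DrainingServer, close_resources, and run are hypothetical names, and the half-second sleep stands in for real in-flight work.

```python
import signal
import socketserver
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical handler whose requests take ~0.5s, so some are in flight at shutdown."""

    def do_GET(self):
        time.sleep(0.5)  # stand-in for real work
        body = b"done\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


class DrainingServer(socketserver.ThreadingMixIn, HTTPServer):
    # Non-daemon request threads are tracked, and with block_on_close
    # server_close() joins them, i.e. waits for in-flight requests to finish.
    daemon_threads = False
    block_on_close = True


def close_resources():
    """Placeholder for step 3: flush logs, finish transactions, close DB pools."""


def run(host="127.0.0.1", port=8000):
    server = DrainingServer((host, port), SlowHandler)

    def on_sigterm(signum, frame):
        # Step 1: stop the accept loop. shutdown() blocks until serve_forever()
        # returns, and this handler runs on the main thread that is serving,
        # so call it from a helper thread to avoid a deadlock.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()   # returns once shutdown() has stopped the accept loop
    server.server_close()    # step 2: close the listener, then wait for in-flight requests
    close_resources()        # step 3
    return 0                 # step 4: the caller exits with this, e.g. sys.exit(run())
```

The custom server class matters: the standard library's ThreadingHTTPServer sets daemon_threads = True, so its server_close() does not wait for requests that are still running.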

The Load Balancer Race Condition

Here's the subtle part that trips up even teams who have graceful shutdown: a service can shut down perfectly and still drop requests, because of a race with the load balancer.

When a pod is terminating, two things happen roughly in parallel: it receives SIGTERM, and it gets removed from the load balancer / service endpoints. These are not synchronized. If the app stops accepting connections the instant it gets SIGTERM, but the load balancer hasn't yet noticed the pod is leaving, it keeps routing new requests to a socket that's already closed — connection refused, 502s, errors.

The fix is a deliberate, slightly counterintuitive step: pause briefly before you stop accepting connections. In Kubernetes this is often a preStop hook that sleeps a few seconds (the hook runs before SIGTERM is sent, so the signal itself arrives later); alternatively, the app keeps serving for a few seconds after SIGTERM while its readiness probe reports failure. That pause lets the load balancer deregister the instance before it stops accepting connections, so new traffic stops arriving on its own. Only then do you drain in-flight requests and exit. Readiness-gating plus a short drain delay is what makes shutdown truly seamless, and it's the missing piece behind a lot of "but we have graceful shutdown and still see 502s on deploy" mysteries.
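A sketch of the in-app variant (fail readiness, wait, then stop accepting), again using only Python's standard library and again hypothetical: the /readyz path, DRAIN_DELAY_SECONDS, and drain_then_stop are illustrative names, and the 5-second default is a placeholder to tune against how quickly your load balancer actually deregisters instances. The signal handler only starts a helper thread, so the process keeps serving normal traffic while readiness reports failure.

```python
import signal
import socketserver
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DRAIN_DELAY_SECONDS = 5.0  # hypothetical; must exceed your LB's deregistration time

draining = threading.Event()  # set as soon as SIGTERM arrives


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/readyz":  # hypothetical readiness-probe path
            # Fail readiness while draining so the load balancer stops sending traffic.
            status = 503 if draining.is_set() else 200
        else:
            status = 200  # ordinary requests are still served during the delay
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


class Server(socketserver.ThreadingMixIn, HTTPServer):
    daemon_threads = False  # server_close() waits for in-flight request threads
    block_on_close = True


def drain_then_stop(server, delay):
    draining.set()      # 1. readiness now fails, but the socket stays open
    time.sleep(delay)   # 2. give the load balancer time to deregister this instance
    server.shutdown()   # 3. only then stop accepting connections


def run(host="127.0.0.1", port=8000, delay=DRAIN_DELAY_SECONDS):
    server = Server((host, port), Handler)

    def on_sigterm(signum, frame):
        # Run the delay off the main thread so serve_forever() keeps answering meanwhile.
        threading.Thread(target=drain_then_stop, args=(server, delay)).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()
    server.server_close()  # finish in-flight requests
    return 0
```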

Common Pitfalls

Not handling SIGTERM. In many runtimes the default reaction to SIGTERM is to exit immediately, with no draining. You have to explicitly handle the signal.
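PID 1 in containers. The kernel doesn't apply default signal actions to PID 1, so an app running as PID 1 without its own SIGTERM handler just keeps running until SIGKILL arrives. Shell-form Docker entrypoints make it worse: the shell becomes PID 1 and can swallow SIGTERM so it never reaches your app. Use exec-form CMD or an init shim (tini) so signals propagate.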
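Grace period too short. If your longest legitimate request takes 45s but the grace period is 30s, SIGKILL cuts it off. Size terminationGracePeriodSeconds to your real request durations plus any drain delay: Kubernetes starts the grace-period countdown before a preStop hook runs, so a preStop sleep uses up part of it.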
Skipping the load-balancer drain delay. The race described above is the single most common reason "graceful" shutdowns still drop traffic.
Long-running jobs treated like web requests. Background jobs need their own strategy: checkpoint and requeue, or make them idempotent so a killed job can be safely retried.
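One way to sketch the checkpoint-and-requeue strategy in Python, under loud assumptions: jobs arrive as (job_id, items) tuples on a queue.Queue, progress lives in a plain dict, and run_worker, checkpoints, and handle_item are hypothetical names. A real system would keep both the queue and the checkpoints in durable storage (a message broker, a database), and because a SIGKILL can still land mid-item, handle_item should also be idempotent.

```python
import queue
import signal
import threading

# Set by SIGTERM: finish the current item, then stop taking work.
stop = threading.Event()


def on_sigterm(signum, frame):
    stop.set()


def run_worker(jobs, checkpoints, handle_item):
    """Process jobs until `stop` is set or the queue is empty.

    jobs:        queue.Queue of (job_id, items) tuples
    checkpoints: dict mapping job_id -> number of items already handled
    handle_item: does the work for one item; should be idempotent, since an
                 item interrupted by SIGKILL is retried on the next run
    """
    while not stop.is_set():
        try:
            job_id, items = jobs.get_nowait()
        except queue.Empty:
            return
        start = checkpoints.get(job_id, 0)  # resume where an earlier run stopped
        for i in range(start, len(items)):
            if stop.is_set():
                jobs.put((job_id, items))   # requeue; the checkpoint records progress
                return
            handle_item(items[i])
            checkpoints[job_id] = i + 1     # checkpoint after each completed item


signal.signal(signal.SIGTERM, on_sigterm)  # must be called from the main thread
```

If stop is set after two of four items, the worker leaves the checkpoint at 2 and puts the job back on the queue; the next worker resumes at the third item instead of starting over.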

How Webalert Helps

Graceful shutdown is implemented inside your app — but the proof that it works is external: do deploys and restarts cause user-visible errors, or not? That's exactly what outside-in monitoring measures:

Deploy-time verification that confirms, from multiple regions, whether releases cause a blip of 502s and timeouts or stay clean — the real test of your shutdown logic.
Continuous uptime checks that catch the dropped-connection errors a botched termination produces, the moment they happen.
Response-time and error tracking across a rollout, so a regression in shutdown behavior shows up as data, not as a customer complaint.
Sustained-failure alerting that tells real deploy-induced downtime apart from a single expected blip, without noise.
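To make the idea of outside-in deploy verification concrete, here is a crude single-location Python sketch. It is not how Webalert works internally and has none of its multi-region coverage; probe, sustained_failure, watch_deploy, and the threshold of three consecutive failures are hypothetical choices for illustration.

```python
import http.client
import time
import urllib.request


def probe(url, timeout=2.0):
    """One outside-in check: True if the URL answered 2xx/3xx within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (OSError subclasses) cover refused connections,
        # timeouts, and 4xx/5xx; HTTPException covers malformed responses.
        return False


def sustained_failure(results, threshold=3):
    """True if `results` (oldest first) contains `threshold` failures in a row.
    A lone failed check is treated as a blip rather than an outage."""
    streak = 0
    for ok in results:
        streak = 0 if ok else streak + 1
        if streak >= threshold:
            return True
    return False


def watch_deploy(url, duration=60.0, interval=1.0):
    """Probe `url` every `interval` seconds for `duration` seconds, e.g. during a rollout."""
    results = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        results.append(probe(url))
        time.sleep(interval)
    return results
```

Running watch_deploy against a health URL for the length of a rollout and passing the results to sustained_failure gives a rough pass/fail for that one vantage point.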

Webalert answers the question your CI/CD pipeline can't: did real users stay connected through the deploy?

Summary

Graceful shutdown is how a service stops cleanly when SIGTERM arrives — stop taking new work, finish in-flight requests, drain resources, and exit before the grace period runs out and SIGKILL force-kills it. Skip it and every deploy, autoscale, or pod reschedule risks dropped connections and 502s.

Handle SIGTERM explicitly, make sure signals actually reach your app (mind PID 1 and Docker entrypoint form), size the grace period to your longest real request, and — the step most teams miss — add a short drain delay so the load balancer deregisters the instance before it stops accepting connections. Give background jobs their own checkpoint-or-idempotent strategy. Then verify it works the only way that counts: outside-in monitoring confirming that real users sail through your deploys without a single dropped request.

Prove your deploys are truly zero-downtime

Start monitoring with Webalert ->

See features and pricing. No credit card required.
