Docker Container Unhealthy: How to Debug Health Checks

You run docker ps and there it is, next to your container name: (unhealthy). The container is still running, your app might even be serving traffic, but Docker has decided something is wrong — and if you're using an orchestrator, it may be about to kill and restart the container. The frustrating part is that unhealthy tells you that something failed, not what.

This guide is a practical walkthrough for debugging an unhealthy Docker container: what the status actually means, how to read the health check logs Docker hides, the most common root causes, and how to fix them. For the bigger picture on why in-container health checks alone aren't enough, see our Docker container monitoring guide.

What "Unhealthy" Actually Means

Docker's health status doesn't come from Docker guessing — it comes from a HEALTHCHECK instruction that you (or a base image) defined. Docker runs that command on a schedule inside the container and tracks the exit code:

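Exit 0 → the probe passed
Exit 1 → the probe failed (Docker's documentation reserves exit code 2, but in practice any non-zero exit is treated as a failure)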

A container starts as health: starting. After the health check passes once, it becomes healthy. If the check fails --retries times in a row (3 by default), the status flips to unhealthy. Crucially, unhealthy is a report, not an action: plain Docker won't restart the container on its own. Docker Swarm does act on it and replaces unhealthy tasks. Kubernetes ignores the Docker HEALTHCHECK and relies on its own liveness, readiness, and startup probes, which play the equivalent role.

So the first mental shift: unhealthy means your health check command is exiting non-zero, repeatedly. Everything below is about finding out why.
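The status transitions described above can be modelled in a few lines. This is a simplified sketch rather than Docker's actual implementation: the function name track_health is made up for illustration, and the model ignores the start period and probe timeouts.

```python
def track_health(exit_codes, retries=3):
    """Return the health status after each probe, given the probes' exit codes."""
    status = "starting"
    failing_streak = 0
    history = []
    for code in exit_codes:
        if code == 0:
            # A single pass resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        else:
            # Failures only flip the status once they reach `retries` in a row.
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history
```

For example, two failures, one pass, then three failures in a row only produce unhealthy on the very last probe, because the pass in the middle resets the streak.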

Step 1: Read the Health Check Logs

Docker stores the output of the last few health check runs — but docker ps and docker logs don't show them. This is the single most useful and most overlooked debugging step. Use docker inspect:

docker inspect --format='{{json .State.Health}}' <container> | jq

This returns the current Status, the FailingStreak, and a Log array containing the exit code and stdout/stderr of recent probes. That captured output is usually the smoking gun — a connection-refused message, a 500 body, a "command not found", or a timeout.

A few things to know about this log:

Docker keeps only the last 5 results by default, and stores only the first 4096 bytes of each probe's output. If your check prints a lot, an error printed at the end may be cut off, so make the command terse.
The ExitCode field tells you whether the command ran and failed (e.g. 1) versus couldn't run at all (127 = not found, 126 = not executable).
If docker logs and .State.Health.Log disagree, trust the health log — it reflects what the probe actually saw.
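If you'd rather not eyeball raw JSON, the same data can be condensed with a short script. The sketch below assumes the docker CLI is on your PATH; the helper names (inspect_health, summarize_health, explain_exit_code) are illustrative, while Status, FailingStreak, Log, End, ExitCode, and Output are the field names Docker reports.

```python
import json
import subprocess


def explain_exit_code(code):
    """Map a probe exit code to the cases described above."""
    if code == 0:
        return "passed"
    if code == 127:
        return "command not found"
    if code == 126:
        return "command found but not executable"
    return "command ran and failed"


def summarize_health(health):
    """Turn the parsed .State.Health object into one line per probe."""
    if not health:
        # `docker inspect` prints null when no HEALTHCHECK is defined.
        return ["no health check configured"]
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:
        output = entry["Output"].strip()
        first_line = output.splitlines()[0] if output else ""
        code = entry["ExitCode"]
        lines.append(f"{entry['End']} exit={code} ({explain_exit_code(code)}): {first_line}")
    return lines


def inspect_health(container):
    """Fetch .State.Health for a container via the docker CLI."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling summarize_health(inspect_health("<container>")) prints the status line first, followed by one line per stored probe.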

Step 2: Run the Health Check Command Manually

Once you know what command Docker is running, run it yourself inside the container. Find the command first:

docker inspect --format='{{json .Config.Healthcheck.Test}}' <container>

Then exec into the container and run it exactly as written:

docker exec -it <container> sh -c "curl -f http://localhost:8080/healthz"
echo "exit code: $?"

This collapses most mysteries instantly. You'll typically discover one of:

The endpoint returns a 4xx or 5xx status, so curl -f exits non-zero, typically with exit code 22 (see HTTP status codes).
The tool the check depends on (curl, wget, nc) isn't installed in the image — extremely common on slim/Alpine images.
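The check targets an address the app isn't listening on: for example, localhost resolves to the IPv6 address ::1 while the app listens only on IPv4, or the app is bound to a specific non-loopback interface. (An app bound to 127.0.0.1 is still reachable from inside the container, so the check passes, but nothing outside the container can reach it.)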
The check works, but only after the app has finished a slow startup.
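One detail matters when reproducing the probe: the Test array comes in two forms. A first element of CMD-SHELL means the rest is a single string run through the container's shell, while CMD means the remaining elements are executed directly. The sketch below turns either form into a docker exec command line. The function name exec_argv is hypothetical, and using sh as the shell is an assumption (an image can override its default shell).

```python
def exec_argv(container, test):
    """Build a `docker exec` argv that mirrors a Healthcheck.Test array."""
    if not test or test[0] == "NONE":
        raise ValueError("no health check is configured for this container")
    kind, args = test[0], test[1:]
    if kind == "CMD-SHELL":
        # Shell form: one string, interpreted by the shell (so || and && work).
        return ["docker", "exec", container, "sh", "-c", args[0]]
    if kind == "CMD":
        # Exec form: arguments are passed as-is, with no shell involved.
        return ["docker", "exec", container, *args]
    raise ValueError(f"unexpected health check form: {kind!r}")
```

You can pass the result to subprocess.run(...) and read its returncode, which is the exit code of the command inside the container.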

The Most Common Causes (and Fixes)

1. The health check tool isn't in the image

A HEALTHCHECK using curl on an image that doesn't ship curl fails with exit code 127. Either install the tool, or use something already present. Some images ship a small purpose-built check binary, or you can use the language runtime directly (e.g. a short Node/Python request) instead of adding curl.
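As a rough illustration of the "use the runtime" option, here is a minimal probe that needs nothing beyond the Python standard library. It assumes the image already contains Python and that the app exposes the /healthz endpoint on port 8080 used elsewhere in this guide; the function name check is illustrative.

```python
import sys
import urllib.request


def check(url, timeout=2.0):
    """Return 0 if the URL answers with a 2xx status, 1 otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except Exception as exc:
        # urlopen raises for 4xx/5xx responses, refused connections, and
        # timeouts. Keep the message short: Docker stores it in the health log.
        print(f"health check failed: {exc}", file=sys.stderr)
        return 1
```

Saved as a script (say, a hypothetical healthcheck.py) ending with sys.exit(check("http://localhost:8080/healthz")), it can be used as the HEALTHCHECK command in place of curl.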

2. Start period too short

If your app takes 30 seconds to boot but the health check starts probing at second 5 and gives up after 3 retries, the container is marked unhealthy before it ever had a chance. Use --start-period (or start_period in compose) to give the app a grace window during which failures don't count against the retry budget:

HEALTHCHECK --interval=10s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8080/healthz || exit 1

This is the same principle as a Kubernetes startup probe — see health check endpoint design for getting livez/readyz right.
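To see why the start period matters, it helps to do the arithmetic. The function below is a deliberately simplified model, not Docker's scheduler: it assumes probes fire at exact multiples of the interval, complete instantly, and all fail, and that a probe counts against the retry budget only if it starts at or after the end of the start period. The name earliest_unhealthy is made up for this sketch.

```python
def earliest_unhealthy(interval, retries, start_period=0):
    """Seconds after container start at which a never-passing check turns unhealthy."""
    if interval <= 0 or retries <= 0:
        raise ValueError("interval and retries must be positive")
    counted_failures = 0
    t = 0
    while True:
        t += interval  # the next probe fires one interval later
        if t >= start_period:
            # Only failures outside the start period count toward --retries.
            counted_failures += 1
            if counted_failures >= retries:
                return t
```

Under this model, a 5-second interval with 3 retries and no start period gives up at 15 seconds, well before a 30-second boot finishes. With the settings from the HEALTHCHECK above (interval 10, retries 3, start period 40), the app has until roughly the 60-second mark.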

3. Timeout too aggressive

If the check's --timeout is shorter than your endpoint's real response time under load, probes time out and count as failures. A health endpoint that does heavy work (checking every dependency, running queries) can be slow precisely when the system is busy — measure its latency and set the timeout with headroom.
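One way to pick a timeout is to sample the endpoint's latency (ideally while the system is under load) and add headroom. The sketch below is one possible approach, with assumptions stated up front: the helper names are illustrative, the slowest sample is used as the baseline, and the 2x headroom factor is an arbitrary starting point rather than a recommendation from Docker.

```python
import math
import time
import urllib.request


def measure_latency(url, samples=20, timeout=10.0):
    """Time `samples` sequential requests to the health endpoint, in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
        except Exception:
            pass  # a failed request still tells us how long the probe waited
        timings.append(time.perf_counter() - start)
    return timings


def suggest_timeout(timings, headroom=2.0):
    """Suggest a --timeout in whole seconds: slowest sample times headroom, rounded up."""
    if not timings:
        raise ValueError("need at least one latency sample")
    return math.ceil(max(timings) * headroom)
```

For instance, if the slowest of your samples took 0.9 seconds, this suggests --timeout=2s.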

4. The health endpoint is too strict

A health check that fails if any downstream dependency is unavailable will mark your container unhealthy during a transient blip in a service that isn't even critical. Separate liveness ("is the process alive?") from readiness ("can it serve traffic?") and don't fail liveness on optional dependencies.
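A sketch of that separation, using only the standard library, might look like the following. Everything here is hypothetical: the /livez and /readyz paths, the dependency names, and the check_dependencies placeholder stand in for whatever your application actually uses.

```python
from http.server import BaseHTTPRequestHandler

# Only these dependencies are allowed to fail readiness (hypothetical names).
CRITICAL_DEPENDENCIES = {"database"}


def check_dependencies():
    """Placeholder: replace with real checks returning {name: is_ok}."""
    return {"database": True, "recommendations": True}


def liveness_status():
    """Liveness: if the process can answer at all, it is alive."""
    return 200


def readiness_status(dependency_ok):
    """Readiness: fail only when a critical dependency is down."""
    down = [name for name, ok in dependency_ok.items()
            if not ok and name in CRITICAL_DEPENDENCIES]
    return 503 if down else 200


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/livez":
            status = liveness_status()
        elif self.path == "/readyz":
            status = readiness_status(check_dependencies())
        else:
            status = 404
        self.send_response(status)
        self.end_headers()
```

With this split, an outage in the optional recommendations service leaves both endpoints returning 200, while a database outage fails /readyz but not /livez. The handler can be served with http.server.HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever().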

5. The app genuinely is broken

Sometimes unhealthy is correct: the process is deadlocked, out of memory, or crash-looping. Check the docker inspect output for .State.OOMKilled being true and for a non-zero .RestartCount. In that case the health check did its job, and what remains is an application problem, not a health-check-config problem.
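These signals can be pulled out of the inspect output programmatically. The sketch below takes one container object as parsed from docker inspect (which returns a JSON array, so pass its first element). State.OOMKilled, State.Status, State.ExitCode, and RestartCount are the fields Docker reports; the function name diagnose and the wording of the findings are illustrative.

```python
def diagnose(container_doc):
    """List signs that the application itself, not the check, is broken."""
    state = container_doc.get("State", {})
    findings = []
    if state.get("OOMKilled"):
        findings.append("killed by the kernel OOM killer")
    restarts = container_doc.get("RestartCount", 0)
    if restarts > 0:
        findings.append(f"restarted {restarts} time(s), possibly crash-looping")
    if state.get("Status") == "exited":
        findings.append(f"exited with code {state.get('ExitCode')}")
    return findings
```

An empty list means none of these signals fired, which points back toward the health check configuration rather than the app.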

Debugging docker-compose Health Checks

docker-compose adds a layer worth calling out, because it's a frequent source of confusion:

See status: docker compose ps shows a health column; docker inspect on the underlying container still gives you the full log.
depends_on with condition: service_healthy means a dependent service won't start until this one is healthy. A misconfigured health check here doesn't just mark one container unhealthy — it blocks your whole stack from starting, which often surfaces as "compose hangs."
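Compose health checks use the same fields (test, interval, timeout, retries, start_period); a common mistake is mixing up the two forms of test. A plain string, or a list starting with CMD-SHELL, is run through the container's shell. A list starting with CMD is executed directly with no shell, so operators such as || and && are passed to the program as literal arguments.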

If a compose stack won't come up, check whether a service_healthy dependency is stuck unhealthy first — that's usually the root cause.
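Finding the stuck dependency can be scripted once you have the compose file parsed into a dictionary and a mapping of service name to health status. The sketch below assumes both inputs are already available (loading YAML and collecting statuses are left out), and the function name blocked_services is hypothetical.

```python
def blocked_services(compose, health):
    """Map each service to the service_healthy dependencies it is still waiting on."""
    blocked = {}
    for name, service in compose.get("services", {}).items():
        depends_on = (service or {}).get("depends_on", {})
        if not isinstance(depends_on, dict):
            continue  # short list form carries no health condition
        waiting = [
            dep for dep, options in depends_on.items()
            if (options or {}).get("condition") == "service_healthy"
            and health.get(dep) != "healthy"
        ]
        if waiting:
            blocked[name] = waiting
    return blocked
```

Any service that shows up in the result will not start until the listed dependencies report healthy, so those dependencies are the ones to debug first.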

Why Internal Health Checks Aren't the Whole Story

Here's the limitation that catches teams out: a Docker health check runs inside the container, probing localhost. It can pass while users still can't reach your service — because it never tests DNS, the host's port mapping, the load balancer, TLS, or the network path. A container can be healthy and your site can still be down from the outside.

This is the classic black-box vs white-box gap. The internal HEALTHCHECK is white-box: useful for orchestration decisions, blind to everything outside the container. You also need an outside-in check that hits the service the way a real user does.
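To make the gap concrete, here is a rough sketch of an outside-in probe that reports which layer failed: DNS, the TCP connection, or TLS. It is meant to run from a machine outside the host, against your public hostname. The function name outside_in_probe and the stage labels are made up for illustration, and a real monitor would do considerably more (redirects, content checks, multiple regions).

```python
import socket
import ssl


def outside_in_probe(host, port=443, timeout=5.0):
    """Return (stage, detail), where stage is 'dns', 'tcp', 'tls', or 'ok'."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))
    try:
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host) as tls:
            request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode())
            # The first line of the response is the HTTP status line.
            status_line = tls.recv(1024).split(b"\r\n", 1)[0].decode(errors="replace")
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        sock.close()
        return ("tls", str(exc))
    return ("ok", status_line)
```

None of the first three stages is exercised by a HEALTHCHECK that curls localhost from inside the container.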

How Webalert Helps

Webalert is the outside-in half your HEALTHCHECK can't provide:

External uptime checks that hit your service through DNS, the host port mapping, and TLS — catching the failures a localhost probe inside the container will never see.
Content validation, so a container reporting healthy while serving a broken page gets flagged anyway.
Multi-region monitoring to distinguish "the container is fine but a region can't reach it" from a real outage.
Alerting that complements orchestration: Docker restarts the container; Webalert confirms users can actually reach it again.

Use Docker's health checks for orchestration decisions, and Webalert to confirm the result is true from where your customers sit.

Summary

An unhealthy Docker container means its HEALTHCHECK command is exiting non-zero repeatedly. The fastest path to a fix is almost always the same: read .State.Health.Log with docker inspect, then run the check command manually inside the container. From there, the usual culprits are a missing tool in the image, a start period that's too short, an overly aggressive timeout, an over-strict endpoint, or a genuinely broken app.

Get the health check config right, separate liveness from readiness, and remember that a healthy container only proves the inside is fine — pair it with outside-in monitoring to know your users can actually reach it.
