
You run docker ps and there it is, next to your container name: (unhealthy). The container is still running, your app might even be serving traffic, but Docker has decided something is wrong — and if you're using an orchestrator, it may be about to kill and restart the container. The frustrating part is that unhealthy tells you that something failed, not what.
This guide is a practical walkthrough for debugging an unhealthy Docker container: what the status actually means, how to read the health check logs Docker hides, the most common root causes, and how to fix them. For the bigger picture on why in-container health checks alone aren't enough, see our Docker container monitoring guide.
What "Unhealthy" Actually Means
Docker's health status doesn't come from Docker guessing — it comes from a HEALTHCHECK instruction that you (or a base image) defined. Docker runs that command on a schedule inside the container and tracks the exit code:
- Exit 0 → healthy
- Exit 1 (or any non-zero) → unhealthy
A container starts as health: starting. After the health check passes once, it becomes healthy. If the check fails --retries times in a row, the status flips to unhealthy. Crucially, unhealthy is a report, not an action: plain Docker won't restart the container on its own. Orchestrators do act on it, though. Docker Swarm replaces unhealthy service tasks, and Kubernetes, which ignores the image's HEALTHCHECK, makes the same kind of decision from its own liveness and readiness probes.
So the first mental shift: unhealthy means your health check command is exiting non-zero, repeatedly. Everything below is about finding out why.
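The status transitions described in this section can be sketched as a small simulation. This is an illustrative model, not Docker's own code: the function name simulate_health and the (seconds, passed) probe format are assumptions made for the example, and start_period stands in for the --start-period option covered later in this guide.

```python
def simulate_health(probes, retries=3, start_period=0.0):
    """Return the health status after each probe.

    probes: list of (seconds_since_container_start, passed) tuples.
    """
    status = "starting"
    failing_streak = 0
    history = []
    for elapsed, passed in probes:
        if passed:
            # A single success resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        else:
            # Failures inside the start period are ignored, but only while
            # the container has never passed a check.
            in_grace = status == "starting" and elapsed < start_period
            if not in_grace:
                failing_streak += 1
                if failing_streak >= retries:
                    status = "unhealthy"
        history.append(status)
    return history


# Three consecutive failures with retries=3 flip the status.
print(simulate_health([(10, False), (20, False), (30, False)]))
# The same failures inside a 40 s start period are not counted.
print(simulate_health([(10, False), (20, False), (30, False), (40, True)],
                      start_period=40))
```

Note that a single passing probe is enough to return to healthy, while it takes a full run of consecutive failures to become unhealthy.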
Step 1: Read the Health Check Logs
Docker stores the output of the last few health check runs — but docker ps and docker logs don't show them. This is the single most useful and most overlooked debugging step. Use docker inspect:
docker inspect --format='{{json .State.Health}}' <container> | jq
This returns the current Status, the FailingStreak, and a Log array containing the exit code and stdout/stderr of recent probes. That captured output is usually the smoking gun — a connection-refused message, a 500 body, a "command not found", or a timeout.
A few things to know about this log:
- Docker keeps only the last 5 results by default, and truncates each output to 4 KB. If your check prints a lot, the real error may be cut off — make the command terse.
- The ExitCode field tells you whether the command ran and failed (e.g. 1) versus couldn't run at all (127 = not found, 126 = not executable).
- If docker logs and .State.Health.Log disagree, trust the health log; it reflects what the probe actually saw.
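To make the shape of that record concrete, here is a minimal sketch that turns the JSON from the docker inspect command above into a few readable lines. The field names (Status, FailingStreak, Log, ExitCode, Output) are the ones Docker reports; the sample payload, its timestamps, and the helper name summarize_health are invented for illustration.

```python
import json

# Rough meaning of common probe exit codes (shell conventions).
EXIT_HINTS = {
    0: "check passed",
    126: "command found but not executable",
    127: "command not found in the image",
}


def summarize_health(raw_json):
    """Summarize `docker inspect --format='{{json .State.Health}}'` output."""
    health = json.loads(raw_json)
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:
        code = entry["ExitCode"]
        hint = EXIT_HINTS.get(code, "probe failed")
        lines.append(f"exit={code} ({hint}): {entry['Output'].strip()}")
    return lines


# Invented sample data, shaped like Docker's health record.
SAMPLE = json.dumps({
    "Status": "unhealthy",
    "FailingStreak": 3,
    "Log": [
        {"Start": "2024-01-01T00:00:00Z", "End": "2024-01-01T00:00:00Z",
         "ExitCode": 127, "Output": "/bin/sh: curl: not found\n"},
    ] * 3,
})

for line in summarize_health(SAMPLE):
    print(line)
```

In practice the input would come from piping the docker inspect output into the script rather than from a hard-coded sample.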
Step 2: Run the Health Check Command Manually
Once you know what command Docker is running, run it yourself inside the container. Find the command first:
docker inspect --format='{{json .Config.Healthcheck.Test}}' <container>
Then exec into the container and run it exactly as written:
docker exec -it <container> sh -c "curl -f http://localhost:8080/healthz"
echo "exit code: $?"
This collapses most mysteries instantly. You'll typically discover one of:
- The endpoint returns an error status (4xx or 5xx), so curl -f exits non-zero (see HTTP status codes).
- The tool the check depends on (curl, wget, nc) isn't installed in the image, which is extremely common on slim/Alpine images.
- The app listens on a different address or port than the check targets (for example 127.0.0.1 vs the container's own IP), so the probe can't connect even from inside the container.
- The check works, but only after the app has finished a slow startup.
The Most Common Causes (and Fixes)
1. The health check tool isn't in the image
A HEALTHCHECK using curl on an image that doesn't ship curl fails with exit code 127. Either install the tool, or use something already present. Many modern images include a tiny native check binary, or you can use the language runtime directly (e.g. a one-line Node/Python request) instead of adding curl.
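As one way to use the language runtime instead of curl, the sketch below probes an HTTP endpoint with nothing but Python's standard library. It assumes the image already ships Python; the default URL is the one used in this guide's examples, and the function names are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def check(url, timeout=3.0):
    """Return 0 if `url` answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP 4xx/5xx, refused connections, timeouts, and malformed URLs
        # all count as a failed probe.
        return 1


def main(argv):
    """Probe argv[1] if given, else the URL used in this guide's examples."""
    target = argv[1] if len(argv) > 1 else "http://localhost:8080/healthz"
    return check(target)
```

Saved as a hypothetical /healthcheck.py that imports sys and ends with sys.exit(main(sys.argv)), this could stand in for the curl command, for example as HEALTHCHECK CMD ["python", "/healthcheck.py"].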
2. Start period too short
If your app takes 30 seconds to boot but the health check starts probing at second 5 and gives up after 3 retries, the container is marked unhealthy before it ever had a chance. Use --start-period (or start_period in compose) to give the app a grace window during which failures don't count against the retry budget:
HEALTHCHECK --interval=10s --timeout=3s --start-period=40s --retries=3 \
CMD curl -f http://localhost:8080/healthz || exit 1
This is the same principle as a Kubernetes startup probe — see health check endpoint design for getting livez/readyz right.
3. Timeout too aggressive
If the check's --timeout is shorter than your endpoint's real response time under load, probes time out and count as failures. A health endpoint that does heavy work (checking every dependency, running queries) can be slow precisely when the system is busy — measure its latency and set the timeout with headroom.
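A rough way to turn "measure and add headroom" into a number is sketched below. The choice of the 99th percentile and a 2x multiplier are assumptions for illustration, not Docker recommendations, and measure and suggest_timeout are hypothetical helper names.

```python
import statistics
import time


def measure(probe, samples=20):
    """Time a zero-argument callable `probe`; return latencies in seconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        latencies.append(time.perf_counter() - start)
    return latencies


def suggest_timeout(latencies_s, headroom=2.0):
    """Suggest a --timeout value (seconds) from measured latencies."""
    if len(latencies_s) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(latencies_s, n=100, method="inclusive")[98]
    return round(p99 * headroom, 2)
```

Here probe would be whatever call hits the health endpoint. The samples should be taken while the system is under realistic load, since that is when the endpoint is slowest.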
4. The health endpoint is too strict
A health check that fails if any downstream dependency is unavailable will mark your container unhealthy during a transient blip in a service that isn't even critical. Separate liveness ("is the process alive?") from readiness ("can it serve traffic?") and don't fail liveness on optional dependencies.
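The split between liveness and readiness might look like the following sketch. The Dependency type and the two functions are hypothetical, and returning HTTP-style status codes (200 or 503) is an assumption about how the endpoints would report their result.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    name: str
    healthy: bool
    critical: bool  # True if the service cannot serve traffic without it


def liveness_status(process_responsive):
    """Liveness only asks whether the process itself can answer."""
    return 200 if process_responsive else 503


def readiness_status(dependencies):
    """Readiness fails only when a critical dependency is down."""
    down = [d.name for d in dependencies if d.critical and not d.healthy]
    return (503, down) if down else (200, [])
```

With a critical database and an optional recommendations service, for example, an outage of the optional service leaves both results at 200, while a database outage fails readiness only and names the dependency that caused it.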
5. The app genuinely is broken
Sometimes unhealthy is correct: the process is deadlocked, out of memory, or crash-looping. Check docker inspect for OOMKilled: true and look at restart counts. In that case the health check did its job — now it's an application problem, not a health-check-config problem.
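A quick triage of those fields could be scripted along these lines. The sketch reads the JSON array that docker inspect prints for a container and looks at State.OOMKilled, State.Running, State.ExitCode, and RestartCount; the sample data and the triage helper are made up for illustration.

```python
import json


def triage(inspect_json):
    """Collect hints from `docker inspect <container>` output (a JSON array)."""
    container = json.loads(inspect_json)[0]
    state = container["State"]
    hints = []
    if state.get("OOMKilled"):
        hints.append("killed by the OOM killer: check memory limits and usage")
    restarts = container.get("RestartCount", 0)
    if restarts > 0:
        hints.append(f"restarted {restarts} times: possibly crash-looping")
    if not state.get("Running", False):
        hints.append(f"not running, last exit code {state.get('ExitCode')}")
    return hints


# Invented sample, trimmed to the fields the function reads.
SAMPLE_INSPECT = json.dumps([{
    "RestartCount": 4,
    "State": {"Running": False, "OOMKilled": True, "ExitCode": 137},
}])

for hint in triage(SAMPLE_INSPECT):
    print(hint)
```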
Debugging docker-compose Health Checks
docker-compose adds a layer worth calling out, because it's a frequent source of confusion:
- See status: docker compose ps shows each service's health in its status column; docker inspect on the underlying container still gives you the full log.
- depends_on with condition: service_healthy means a dependent service won't start until this one is healthy. A misconfigured health check here doesn't just mark one container unhealthy; it blocks every service that depends on it from starting, which often surfaces as "compose hangs."
- Compose health checks use the same fields (test, interval, timeout, retries, start_period); a common mistake is mixing up the two forms of test: a list must start with CMD or CMD-SHELL, while a plain string is treated as CMD-SHELL.
If a compose stack won't come up, check whether a service_healthy dependency is stuck unhealthy first — that's usually the root cause.
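The rule for the test field can be sketched as a small validator that operates on the value after the YAML has been parsed. It follows the Compose convention that a string is shorthand for CMD-SHELL and that a list must begin with NONE, CMD, or CMD-SHELL; the function name normalize_test is hypothetical.

```python
VALID_PREFIXES = ("NONE", "CMD", "CMD-SHELL")


def normalize_test(test):
    """Return the list form of a Compose healthcheck `test` value.

    Raises ValueError for shapes Compose would not accept.
    """
    if isinstance(test, str):
        # A plain string is run through the container's shell.
        return ["CMD-SHELL", test]
    if isinstance(test, list) and test and test[0] in VALID_PREFIXES:
        return list(test)
    raise ValueError(
        "test must be a string or a list starting with NONE, CMD, or CMD-SHELL"
    )


print(normalize_test("curl -f http://localhost:8080/healthz || exit 1"))
print(normalize_test(["CMD", "curl", "-f", "http://localhost:8080/healthz"]))
```

A list such as ["curl", "-f", "http://localhost:8080/healthz"], with the CMD prefix forgotten, is the kind of input this would reject.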
Why Internal Health Checks Aren't the Whole Story
Here's the limitation that catches teams out: a Docker health check runs inside the container, probing localhost. It can pass while users still can't reach your service — because it never tests DNS, the host's port mapping, the load balancer, TLS, or the network path. A container can be healthy and your site can still be down from the outside.
This is the classic black-box vs white-box gap. The internal HEALTHCHECK is white-box: useful for orchestration decisions, blind to everything outside the container. You also need an outside-in check that hits the service the way a real user does.
How Webalert Helps
Webalert is the outside-in half your HEALTHCHECK can't provide:
- External uptime checks that hit your service through DNS, the host port mapping, and TLS, catching the failures a localhost probe inside the container will never see.
- Content validation, so a container reporting healthy while serving a broken page gets flagged anyway.
- Multi-region monitoring to distinguish "the container is fine but a region can't reach it" from a real outage.
- Alerting that complements orchestration: your orchestrator restarts the container; Webalert confirms users can actually reach it again.
Use Docker's health checks for orchestration decisions, and Webalert to confirm the result is true from where your customers sit.
Summary
An unhealthy Docker container means its HEALTHCHECK command is exiting non-zero repeatedly. The fastest path to a fix is almost always the same: read .State.Health.Log with docker inspect, then run the check command manually inside the container. From there, the usual culprits are a missing tool in the image, a start period that's too short, an overly aggressive timeout, an over-strict endpoint, or a genuinely broken app.
Get the health check config right, separate liveness from readiness, and remember that a healthy container only proves the inside is fine — pair it with outside-in monitoring to know your users can actually reach it.