Kubernetes CrashLoopBackOff: Causes and How to Fix It

You run kubectl get pods and there it is: a pod stuck in CrashLoopBackOff, restart count climbing — 5, 12, 30. The container starts, dies, starts again, dies again, and Kubernetes keeps trying with longer and longer pauses between attempts. It's one of the most common states you'll hit running workloads on Kubernetes, and also one of the most misunderstood: CrashLoopBackOff isn't an error in itself — it's Kubernetes telling you that your container keeps exiting and it's backing off before the next retry.

This guide explains exactly what the state means, the handful of causes behind almost every crash loop, and a repeatable way to diagnose and fix it.

What CrashLoopBackOff Actually Means

Break the name in two. CrashLoop: your container starts and then crashes (exits), repeatedly. BackOff: Kubernetes is deliberately waiting longer between each restart attempt so it doesn't hammer a broken container — an exponential backoff that grows 10s, 20s, 40s, up to a 5-minute cap.
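
To make the delay sequence concrete, here is a minimal Python sketch that reproduces the schedule described above. The function name and its parameters are illustrative and not part of any Kubernetes API; the 10-second base and 300-second cap are simply the figures quoted in this section.

```python
def backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Return the wait (in seconds) before each of the first `restarts` retries.

    Assumes the delay doubles after every crash and is capped at `cap`,
    matching the 10s, 20s, 40s ... 5-minute pattern described above.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

Under these assumptions the delay reaches the cap by the sixth retry, which is why a long-running crash loop settles into roughly one attempt every five minutes.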

So the status describes the kubelet's behavior, not a specific fault. The pod's restartPolicy defaults to Always, so the kubelet keeps bringing the container back; once the container has crashed repeatedly and the kubelet is waiting out the backoff delay before the next attempt, the pod shows CrashLoopBackOff. The crucial implication: the real cause is whatever made your container exit, and Kubernetes is only reporting the symptom. Your job is to find why the process inside died.

The Common Causes

Nearly every crash loop traces to one of these:

The application errors out on startup. An unhandled exception, a stack trace, a panic — the process starts, throws, and exits non-zero. By far the most common cause.
Missing or wrong configuration. A required environment variable, config file, or secret isn't there, so the app refuses to start. Database URLs and API keys are classic culprits.
A failed dependency. The app can't reach its database, cache, or an upstream API at boot and exits instead of waiting — closely related to connection errors like "connection refused."
OOMKilled. The container exceeds its memory limit, the kernel kills it with SIGKILL (exit code 137), and it loops. When kubectl describe pod shows exit 137 with the reason OOMKilled, look at memory limits and usage first rather than at startup logic.
Misconfigured liveness probe. If a liveness probe is too aggressive — too short a timeout, or pointing at an endpoint that isn't ready yet — Kubernetes kills a healthy container before it finishes booting, creating an artificial loop.
The command or entrypoint is wrong. A bad command/args, a missing binary, or a script that exits immediately (a container with nothing long-running to do will exit 0 and loop too).

Notice the pattern: the container almost always dies at or near startup. That's what makes crash loops both frustrating and, once you know where to look, fast to diagnose.
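
Since missing configuration is such a frequent trigger, one option is to have the application check its settings up front and say exactly what is absent, so the message is waiting in the logs when you investigate. A minimal Python sketch, assuming the hypothetical variable names DATABASE_URL and API_KEY (substitute whatever your app actually reads):

```python
import os
import sys
from typing import Mapping

# Hypothetical names for illustration; list the variables your app requires.
REQUIRED_VARS = ("DATABASE_URL", "API_KEY")


def missing_settings(env: Mapping[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return the required variables that are unset or empty, in declared order."""
    return [name for name in required if not env.get(name)]


def check_config_or_exit(env: Mapping[str, str] = os.environ) -> None:
    """Exit with a clear message if configuration is incomplete."""
    missing = missing_settings(env)
    if missing:
        # Written to stderr so it ends up in the container's logs.
        print(f"missing required settings: {', '.join(missing)}", file=sys.stderr)
        sys.exit(1)
```

Calling check_config_or_exit() as the first step of startup does not prevent the crash loop when a variable is missing, but the container exits with code 1 and the reason is spelled out in a single log line.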

How to Diagnose It Step by Step

Work the problem in this order — each step narrows it down:

1. Describe the pod. kubectl describe pod <name> is your first stop. Look at the State and Last State of the container, the exit code, and the Events at the bottom. The exit code alone often tells the story (more below).
2. Read the logs. kubectl logs <name> shows only the current attempt, which may have barely started. Use kubectl logs <name> --previous to see the output of the instance that just died. This is where the actual stack trace or error usually lives.
3. Interpret the exit code. It's a precise clue:
   - 0: exited "successfully"; usually means there's no long-running process (bad command, or a one-shot script). A loop on exit 0 is a design problem.
   - 1 / 2: a generic application error; check the --previous logs for the exception.
   - 137: killed by SIGKILL (128 + 9), almost always OOMKilled (memory limit); confirm the reason in describe.
   - 139: segfault (SIGSEGV, 128 + 11); a native/binary crash.
   - 143: SIGTERM (128 + 15); killed during shutdown, often probe- or eviction-related.
4. Check the events. The Events section flags probe failures, image issues, and OOM kills explicitly.
5. Inspect config and probes. If logs look clean, suspect a too-aggressive liveness probe or a missing env var / secret the app needs.
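
The describe-and-interpret steps can also be scripted. The sketch below is one possible approach in Python, not an official tool: it reads the lastState.terminated fields from the JSON that kubectl get pod <name> -o json returns and turns the exit code into a first hypothesis. The helper names and the sample pod are made up for illustration.

```python
import json
import subprocess

# Signal numbers behind the exit codes listed above (exit code = 128 + signal).
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def explain_exit_code(code: int) -> str:
    """Translate a container exit code into a first hypothesis."""
    if code == 0:
        return "clean exit: probably no long-running process"
    if code > 128:
        signal_number = code - 128
        name = SIGNAL_NAMES.get(signal_number, f"signal {signal_number}")
        return f"killed by {name}"
    return "non-zero exit: read the --previous logs"


def last_terminations(pod: dict) -> list[tuple[str, int, str]]:
    """Extract (container, exit code, reason) for containers that have died."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append(
                (status["name"], terminated["exitCode"], terminated.get("reason", ""))
            )
    return results


def fetch_pod(name: str, namespace: str = "default") -> dict:
    """Load a pod's JSON via kubectl (needs kubectl and cluster access)."""
    output = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(output)


# Made-up sample shaped like kubectl's output, so the sketch runs without a cluster.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "web",
                "restartCount": 12,
                "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
            }
        ]
    }
}

for container, code, reason in last_terminations(sample_pod):
    print(f"{container}: exit {code} ({reason}) -> {explain_exit_code(code)}")
# web: exit 137 (OOMKilled) -> killed by SIGKILL
```

Against a real cluster you would pass fetch_pod("<name>") instead of sample_pod. The reason field matters as much as the code: 137 with OOMKilled points at memory, while 137 with another reason means something else sent the SIGKILL.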

How to Fix the Common Cases

Once you've found the cause, the fix usually follows directly:

App crashes on startup → fix the code path or the bad input the --previous logs revealed; reproduce locally with the same image and config.
Missing config/secret → add the env var, ConfigMap, or Secret the app expects, and verify it's mounted/referenced correctly.
Failed dependency → make the app wait and retry for dependencies at startup rather than exiting (an init container or readiness gating helps), so a slow database doesn't trigger a loop.
OOMKilled (137) → raise the memory limit or fix the leak — see the OOMKilled guide.
Aggressive liveness probe → increase initialDelaySeconds/timeoutSeconds, or use a startupProbe so slow-booting apps aren't killed before they're ready. Point liveness at a genuine health endpoint, not a heavy route.
Wrong command → correct the command/args or Dockerfile entrypoint; ensure the main process runs in the foreground.
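
For the failed-dependency case, "wait and retry" can be as small as a loop with a growing delay. Here is a minimal Python sketch, assuming the dependency is reachable over TCP; the function names, attempt count, and delays are illustrative choices rather than recommended values.

```python
import socket
import time
from typing import Callable


def wait_for_dependency(
    check: Callable[[], None],
    attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() until it stops raising OSError; return the attempts used.

    Re-raises the last error once `attempts` is exhausted, so the process
    still exits non-zero if the dependency never comes up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            check()
            return attempt
        except OSError as exc:
            if attempt == attempts:
                raise
            # Double the wait each time, up to max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            print(f"dependency not ready ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
    raise AssertionError("unreachable")


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Callable[[], None]:
    """Build a check that succeeds once a TCP connection can be opened."""
    def check() -> None:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return check
```

At startup you might call wait_for_dependency(tcp_check("db", 5432)), where db and 5432 stand in for your own service name and port. A database that takes twenty seconds to come up then costs a few retries inside one container start instead of several restarts with growing backoff.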

A useful trick for stubborn cases: temporarily override the entrypoint with a sleep (command: ["sleep", "3600"]) so the pod stays up, then kubectl exec in and run the real command by hand to watch it fail interactively.

How Webalert Helps

Fixing a crash loop happens inside the cluster — but knowing your service is degraded, and confirming it's healthy again after the fix, is where outside-in monitoring earns its place:

Health-check and uptime monitoring that tells you when crashing pods have actually taken your service down for users — the impact that decides how urgent the page is.
External verification that your endpoints respond correctly once pods recover, independent of what the cluster's own dashboards claim.
Multi-region checks so you know whether a rollout-induced crash loop is affecting real traffic everywhere or just failing internally.
Sustained-failure alerting that distinguishes a brief restart from a service-down crash loop, without alert noise.

Kubernetes restarts the pod; Webalert tells you whether your users could reach it the whole time.

Summary

CrashLoopBackOff means your container keeps starting and exiting, and Kubernetes is backing off between restarts — it's the symptom, not the cause. The cause is almost always something at startup: an app that errors out, missing config or secrets, an unreachable dependency, an out-of-memory kill, or a liveness probe that's too aggressive.

Diagnose it methodically: kubectl describe pod for the exit code and events, kubectl logs --previous for the dying instance's output, and the exit code itself as a shortcut (137 = memory, 1/2 = app error, 0 = no long-running process). Fix the root cause, loosen overly strict probes, and make startup resilient to slow dependencies. Then confirm with outside-in monitoring that users can actually reach the recovered service.

Related Articles

Kubernetes ImagePullBackOff and ErrImagePull: How to Fix

Kubernetes OOMKilled (Exit Code 137): Causes and Fixes

Docker Container Unhealthy: How to Debug Health Checks
