Kubernetes OOMKilled (Exit Code 137): Causes and Fixes

A container that was running fine suddenly restarts, and kubectl describe pod shows the grim detail: Last State: Terminated, Reason: OOMKilled, Exit Code: 137. No stack trace, no application error — the process was killed from underneath the app, mid-work. This is Kubernetes (really, the Linux kernel) enforcing a memory limit: your container tried to use more memory than it was allowed, and it got terminated for it.

OOMKilled is one of the most common reasons pods restart, and unlike a code crash it leaves almost no trace in your application logs. This guide explains what's happening, why exit code 137 appears, and how to diagnose and fix out-of-memory kills for good.

What OOMKilled and Exit Code 137 Mean

OOM stands for Out Of Memory. When a container exceeds its memory limit, the Linux kernel's OOM killer steps in and sends the process a SIGKILL — an immediate, unblockable termination. Kubernetes reports this as OOMKilled.

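The number 137 is the giveaway: exit codes for signal-terminated processes are 128 + signal number, and SIGKILL is signal 9, so 128 + 9 = 137. Any time you see exit code 137, think killed by SIGKILL. In a container with a memory limit, that usually means an out-of-memory kill, but SIGKILL has other sources too (for example, a container that doesn't stop within its termination grace period), so confirm with Reason: OOMKilled rather than relying on the exit code alone.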
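The arithmetic can be checked with a short Python sketch. The helper name describe_exit_code is illustrative and not part of any Kubernetes tooling, and the sketch assumes a Linux host, where signal 9 is SIGKILL. An application can also exit with a status above 128 on its own, so treat the result as a hint.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a known signal number on this platform.
            return f"exited with status {code}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL (signal 9)
print(describe_exit_code(143))  # killed by SIGTERM (signal 15)
```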

Because it's a SIGKILL, the process gets no chance to clean up or log anything. That's why OOM kills are confusing: the app logs just stop, with no error, and the only evidence is in the pod's state and events — not in the application's own output.
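Since the evidence lives in the pod's state, it can also be read programmatically. The sketch below assumes the pod was exported with kubectl get pod <name> -o json and checks each container's lastState. The function name and the sample data are made up for illustration, and only the previous termination is inspected, not the current state.

```python
import json


def find_oom_kills(pod: dict) -> list[str]:
    """Return names of containers whose last termination reason was OOMKilled."""
    killed = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            killed.append(status["name"])
    return killed


# Trimmed, invented sample in the shape of `kubectl get pod <name> -o json`.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "api", "restartCount": 3,
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0, "lastState": {}}
    ]
  }
}
""")

print(find_oom_kills(sample))  # ['api']
```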

Requests vs Limits: The Root of It

To understand OOM kills you have to understand Kubernetes' two memory settings:

requests — the memory the pod is guaranteed. The scheduler uses this to decide which node the pod fits on.
limits — the hard ceiling. Exceed the memory limit and the container is OOMKilled. No grace, no swap (by default).

Two distinct OOM scenarios follow from this:

Container exceeds its own limit. The most common case: your container's limit is 512Mi, it tries to use 600Mi, and it's killed. This is a per-container limit kill.
Node runs out of memory. If pods collectively use more than the node has (often because requests were set too low and the node is oversubscribed), memory has to be reclaimed: the kubelet evicts pods, and if memory runs out faster than it can react, the kernel's OOM killer kills processes directly. Pods using more than their request are first in line. This is node memory pressure, and it can take down pods that were within their own limits.
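To make the per-container case concrete, here is a minimal sketch that converts Kubernetes-style memory quantities to bytes and compares usage against a limit. The function names are hypothetical, and the parser handles only a common subset of suffixes (Ki to Ti, k to T, or plain bytes), not the full quantity grammar.

```python
import re

# Binary (power-of-two) and decimal suffixes; a deliberately partial list.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '512Mi' or '1G' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if not match:
        raise ValueError(f"unrecognised quantity: {quantity!r}")
    number, suffix = match.groups()
    if suffix == "":
        factor = 1
    elif suffix in _BINARY:
        factor = _BINARY[suffix]
    elif suffix in _DECIMAL:
        factor = _DECIMAL[suffix]
    else:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return int(float(number) * factor)


def exceeds_limit(usage: str, limit: str) -> bool:
    """True when usage is strictly above the limit."""
    return parse_memory(usage) > parse_memory(limit)


print(parse_memory("512Mi"))             # 536870912
print(exceeds_limit("600Mi", "512Mi"))   # True
```

Note that 512Mi and 512M differ: the first is 512 x 2^20 bytes, the second 512 x 10^6 bytes.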

The relationship between requests and limits also sets the pod's Quality of Service class (Guaranteed, Burstable, BestEffort), which determines eviction order under pressure. Pods in which every container has requests == limits for both CPU and memory (Guaranteed) are the last to be killed.
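The QoS rules can be sketched in a few lines of Python. This is a simplified illustration, not the kubelet's implementation: it compares quantities as plain strings (so "1Gi" and "1024Mi" would count as different), ignores init containers, and the function name is hypothetical. It does model one real defaulting rule: a container with a limit but no request gets a request equal to the limit.

```python
def qos_class(containers: list[dict]) -> str:
    """Approximate a pod's QoS class from its containers' resources."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pod_a = [{"name": "app", "resources": {
    "requests": {"cpu": "500m", "memory": "512Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"}}}]
pod_b = [{"name": "app", "resources": {
    "requests": {"memory": "256Mi"},
    "limits": {"memory": "512Mi"}}}]
pod_c = [{"name": "app"}]

print(qos_class(pod_a))  # Guaranteed
print(qos_class(pod_b))  # Burstable
print(qos_class(pod_c))  # BestEffort
```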

How to Diagnose an OOM Kill

Confirm it's actually OOM. kubectl describe pod <name> — look for Reason: OOMKilled and Exit Code: 137 under Last State. The Events may also show memory-pressure evictions.
Distinguish limit-kill from node-pressure. If a single container is OOMKilled while the node looks healthy, it's hitting its own limit. If multiple pods are being killed or evicted at once, suspect node-level memory pressure.
Look at actual usage. kubectl top pod <name> (and kubectl top nodes) shows current memory usage, provided a metrics source such as metrics-server is installed in the cluster; compare it to the configured limit. A metrics dashboard showing usage over time is even better, because it reveals whether the app spikes or grows steadily.
Spike or leak? A steady climb toward the limit over hours or days usually points to a memory leak. A sudden jump under load is a spike (a big request, a large batch, an unbounded cache). The fix differs.
Check recent changes. A new deploy, a dependency upgrade, or a traffic increase often precedes a fresh wave of OOM kills.
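The spike-or-leak question can be roughed out in code once you have usage samples. The sketch below is a heuristic under stated assumptions: samples are taken at regular intervals, and the thresholds jump_ratio and noise_ratio are arbitrary illustrative values rather than recommendations. A real analysis would look at longer windows and at restarts.

```python
def classify_memory_trend(
    samples: list[float], jump_ratio: float = 0.5, noise_ratio: float = 0.1
) -> str:
    """Label a series of memory samples as 'spike', 'steady growth' or 'no growth'."""
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    low = min(samples)
    growth = max(samples) - low
    # Small wobble relative to the baseline is treated as noise.
    if growth <= noise_ratio * low:
        return "no growth"
    # One step that accounts for most of the rise looks like a spike.
    largest_step = max(b - a for a, b in zip(samples, samples[1:]))
    if largest_step >= jump_ratio * growth:
        return "spike"
    # Otherwise require a net rise from first to last sample.
    if samples[-1] - samples[0] > noise_ratio * low:
        return "steady growth"
    return "no growth"


print(classify_memory_trend([200, 230, 260, 290, 320, 350]))  # steady growth
print(classify_memory_trend([200, 205, 480, 210, 205]))       # spike
print(classify_memory_trend([300, 302, 299, 301, 300]))       # no growth
```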

How to Fix It

Match the fix to what you found — don't just blindly raise the limit:

Right-size the limit. If the app legitimately needs more memory than its limit allows, raise the limit (and usually the request with it). This is the correct fix when the limit was simply too low for the real workload.
Fix the memory leak. If usage grows without bound, raising the limit only delays the kill. Profile the app, find what's retained (unbounded caches, accumulating buffers, leaked connections), and fix it. Alerting on memory that trends steadily upward catches a leak before it ends in a kill.
Cap what the app thinks it has. Runtimes that auto-size to the node's memory rather than the container's limit will overshoot. Set heap/cache limits explicitly (e.g. JVM -XX:MaxRAMPercentage, Node --max-old-space-size) to fit inside the container limit.
Set requests sensibly. Avoid heavy node oversubscription by setting realistic requests, so the scheduler doesn't pack more onto a node than it can hold.
Bound spiky workloads. Page or stream large operations instead of loading everything into memory; cap batch sizes and cache growth.
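For the runtime-cap item above, the sizing arithmetic looks like this. The 75% split is an assumption chosen for illustration, not a recommendation; the right value depends on how much non-heap memory (thread stacks, native buffers, metadata) the process needs. The function name is hypothetical.

```python
def heap_budget_mib(limit_mib: int, heap_percent: float = 75.0) -> tuple[int, int]:
    """Split a container memory limit (MiB) into a heap cap and non-heap headroom."""
    if not 0 < heap_percent < 100:
        raise ValueError("heap_percent must be between 0 and 100")
    heap = int(limit_mib * heap_percent / 100)
    return heap, limit_mib - heap


heap, headroom = heap_budget_mib(512)
# JVM: the percentage is applied to the container's memory limit.
print(f"-XX:MaxRAMPercentage=75.0 gives roughly {heap} MiB heap, {headroom} MiB headroom")
# Node: the flag takes an absolute size in MiB.
print(f"--max-old-space-size={heap}")
```

With a 512Mi limit this yields a 384 MiB heap and 128 MiB of headroom for everything outside the heap.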

A practical rule: raise the limit to fix an undersized container; fix the code to stop a leak. Confusing the two is how teams end up with pods that need 8Gi to do a 512Mi job.
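As one way to bound a spiky workload, large inputs can be processed in fixed-size batches so that only one batch is held in memory at a time. This is a generic sketch; in_batches is a hypothetical helper, and the batch size would need tuning against the container's limit.

```python
from itertools import islice
from typing import Iterable, Iterator


def in_batches(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Only one batch is materialised at a time, however long the input is.
for batch in in_batches(range(10), 4):
    print(len(batch))  # 4, then 4, then 2
```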

How Webalert Helps

OOM kills are an internal cluster event — but their impact is users hitting errors and timeouts when a pod dies mid-request. That's what outside-in monitoring catches:

Uptime and response-time checks that reveal when OOM-driven restarts cause real user-facing failures — the degradation the cluster's "pod recovered" status hides.
External confirmation that endpoints respond correctly after you've resized limits or fixed a leak.
Multi-region monitoring to tell a genuine service outage from a single pod cycling.
Sustained-failure alerting so repeated OOM restarts that add up to real downtime page you, without noise from every brief blip.

Kubernetes restarts the OOMKilled pod automatically; Webalert tells you whether that automatic recovery was fast enough that users never noticed — or not.

Summary

OOMKilled with exit code 137 means the kernel sent SIGKILL to a container that exceeded its memory — either its own limit or the node's available memory under pressure. Because it's a hard kill, the app logs nothing; the evidence lives in kubectl describe pod (Reason: OOMKilled, exit 137) and in usage metrics.

Diagnose by confirming it's truly OOM, separating per-container limit kills from node memory pressure, and checking whether memory spikes or leaks. Then fix the right thing: raise an undersized limit, fix a genuine leak, cap the runtime's memory to the container limit, and set realistic requests to avoid oversubscription. Watch memory trends so you catch leaks before the kill — and use outside-in monitoring to confirm users stayed unaffected through the restart.

