Memory Leaks in Production: Causes, Detection, and Fixes

A memory leak is one of the slowest, quietest reliability failures in production. The process starts at 200 MB of RAM, runs fine for hours or days, and slowly, almost imperceptibly, climbs. 300 MB. 500 MB. 1 GB. Eventually it hits the container's memory limit and the kernel's OOM killer terminates it (the orchestrator reports this as OOMKilled), or the garbage collector starts thrashing and response times fall off a cliff, or the process crashes outright. Then the orchestrator restarts it, the leak resets, and the cycle begins again, sometimes so cleanly that nobody notices until the restart frequency gets embarrassing. Memory leaks don't break things now; they break things later, which makes them both easy to miss and expensive to ignore.

This guide explains what memory leaks are, why they happen in managed languages, how to detect them, and how to fix them.

What a Memory Leak Actually Is

A memory leak is memory that a program allocates and never releases, even though it's done with it. Over time, the unused but never-reclaimed memory accumulates, and the process's resident set size (RSS) grows without bound.

The common misconception is that leaks only happen in C/C++, where you manually malloc/free. In reality, leaks can happen in any garbage-collected language, including Go, Java, Node.js, Python, and Ruby, because the GC can only reclaim memory that's unreachable. If your code holds a reference to an object in a long-lived collection (a cache, a map, a module-level array), the GC considers it "in use" forever, even if you'll never look it up again. A leak in a managed language is just a reference you forgot to clear.

That distinction matters because it changes what you're looking for. You're not hunting for missing free() calls; you're hunting for things stuck in long-lived data structures.
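
To make the reachability point concrete, here is a minimal Python sketch. The Payload class, the registry list, and handle_request are hypothetical names invented for illustration; the weak reference is only there so we can observe whether the object was reclaimed. It assumes CPython, where dropping the last strong reference frees the object.

```python
import gc
import weakref


class Payload:
    """Stand-in for a per-request object (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


# Module-level registry: it lives as long as the process does.
registry = []


def handle_request() -> weakref.ref:
    payload = Payload(1024)
    registry.append(payload)       # the "forgotten" reference
    return weakref.ref(payload)    # a probe that does not keep the object alive


probe = handle_request()
gc.collect()
# Still reachable through `registry`, so the GC must keep it.
still_alive_while_referenced = probe() is not None

registry.clear()                   # drop the holding reference
gc.collect()
# Nothing references the payload any more, so it has been reclaimed.
reclaimed_after_clear = probe() is None
```

Nothing here calls anything like free(). The object goes away only when the long-lived structure lets go of it.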

Why Leaks Happen in Practice

The recurring patterns:

Unbounded caches. A Map or dict used as a cache with no eviction policy grows forever. The classic case: caching user sessions, request results, or computed values "for performance," and never cleaning up.
Event listeners / callbacks never removed. You attach a listener to an emitter on every request, but only remove it on the success path. The error path leaks a listener, and each listener holds references to its closure — request, response, user context — forever.
Closures capturing large scopes. A closure that captures a request object holds the entire request alive for as long as the closure lives, even if it only uses one field. Accumulate closures in a long-lived array and you've leaked the entire request history.
Globals and module-level state. Anything stored at module scope lives for the process lifetime. A "small registry" of seen IDs that never gets cleared is a leak.
Goroutines / threads that never exit. Each goroutine has its own stack and any memory it references stays alive as long as the goroutine does. A goroutine blocked forever on a channel that nobody writes to is a leak — of the goroutine and of everything it references.
Connection / resource pools that don't return. A connection borrowed from a pool but never returned isn't just a pool-exhaustion problem — the connection holds buffers and context that leak.
String / byte accumulation. Building an ever-growing log buffer, an in-memory request history, or a "recent events" list with no cap.

The shared shape: a long-lived data structure that accumulates references to short-lived work, with no upper bound and no eviction.
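
As one illustration of that shape, the sketch below reproduces the listener pattern from the list above in Python. The Emitter class, process, and both handlers are hypothetical and not taken from any real library. The leaky handler removes its listener only on the success path; the safe variant uses try/finally.

```python
class Emitter:
    """Tiny stand-in for an event emitter (hypothetical, not a real library)."""

    def __init__(self) -> None:
        self.listeners = []

    def on(self, fn) -> None:
        self.listeners.append(fn)

    def off(self, fn) -> None:
        self.listeners.remove(fn)


emitter = Emitter()  # long-lived: one per process


def process(request: dict) -> str:
    if request.get("bad"):
        raise ValueError("invalid request")
    return "ok"


def leaky_handler(request: dict) -> str:
    def on_event(event):           # the closure keeps `request` alive
        return request["id"], event

    emitter.on(on_event)
    result = process(request)      # raises on the error path...
    emitter.off(on_event)          # ...so this line is skipped
    return result


def safe_handler(request: dict) -> str:
    def on_event(event):
        return request["id"], event

    emitter.on(on_event)
    try:
        return process(request)
    finally:
        emitter.off(on_event)      # runs on success and on error


def run(handler, n: int) -> int:
    """Send n requests (every other one fails) and count leftover listeners."""
    for i in range(n):
        try:
            handler({"id": i, "bad": i % 2 == 0})
        except ValueError:
            pass
    return len(emitter.listeners)
```

With the leaky handler, every failed request leaves one listener (and the request it captured) attached to the emitter; with the safe handler the count stays at zero.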

Why Leaks Are Hard to Catch

Leaks are sneaky because they're slow and because the symptoms look like other problems:

The process runs fine for hours before anything is wrong. Tests don't catch leaks because a 5-minute test never sees the accumulation that matters.
The first symptom is usually a restart, not an error. The container gets OOMKilled (exit code 137 — see our OOMKilled guide) and restarts, and the leak resets. The alert fires on the restart, not on the cause.
GC pressure looks like CPU pressure. As the heap grows, the GC runs more often and longer — CPU usage climbs, latency climbs, and it looks like a performance problem rather than a memory problem.
The leak only happens in production. Production traffic patterns, payload shapes, and error paths often differ from test, so the leaky path is one you never exercise in CI.

The result is the classic support ticket: "the pod restarts every 6 hours for no reason." There's a reason — you just haven't been watching memory.

How to Detect Memory Leaks

What to monitor:

RSS (resident set size) over time, per process. The single most useful signal. A healthy process's RSS is roughly flat or grows slowly to a steady state. A leaky process's RSS keeps trending upward and never plateaus, even if it dips briefly after a GC cycle. Plot it as a time series, not just a current value.
Heap size and GC pause time. A growing heap with lengthening GC pauses is the signature of a leak. Most runtimes expose this: Go's runtime.ReadMemStats, the JVM's GC logs and JFR, Node's process.memoryUsage() and inspector heap snapshots (--inspect), Python's tracemalloc.
Restart frequency. A pod restarting on a regular cadence (every N hours) with OOMKilled is almost certainly leaking. The cadence is the leak's fingerprint.
Memory pressure relative to the limit. Watch RSS against the container's memory limit, not in absolute terms. A process sitting at 80% of its limit and still climbing has little headroom left before it is killed.
GC frequency. A sudden increase in GC frequency with no traffic change means the heap is under pressure.

Alert on a sustained upward RSS trend, not just a single high value — the trend is what distinguishes a leak from a one-time spike.
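
One way to express "sustained upward trend" in code is sketched below. It is an assumption-laden illustration, not a recommended alerting rule: the function names, the choice of comparing the median of the first and last thirds of the window, and the thresholds are all made up for this example. Samples are (timestamp in seconds, RSS in bytes) pairs, oldest first.

```python
from statistics import median


def rss_growth_rate(samples) -> float:
    """Bytes per second of growth between the first and last thirds of the window.

    Medians are used so a single spike does not register as a trend.
    """
    n = len(samples)
    if n < 6:
        return 0.0
    third = n // 3
    head, tail = samples[:third], samples[-third:]
    dt = median(t for t, _ in tail) - median(t for t, _ in head)
    if dt <= 0:
        return 0.0
    return (median(r for _, r in tail) - median(r for _, r in head)) / dt


def looks_like_leak(samples, min_rate_bytes_per_s: float, min_window_s: float) -> bool:
    """True when growth is both fast enough and observed over a long enough window."""
    if len(samples) < 2:
        return False
    window = samples[-1][0] - samples[0][0]
    return window >= min_window_s and rss_growth_rate(samples) >= min_rate_bytes_per_s
```

For example, an hour of one-minute samples that climb by 1,000 bytes per second trips the check with a 500 bytes/s threshold, while a flat series with one large spike does not.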

How to Fix Memory Leaks

Once you've confirmed a leak:

Take heap snapshots before and after. Two snapshots taken some minutes apart under load, then diffed, show which object types are growing, and the fastest-growing type is usually the leak. Most runtimes support this (v8.writeHeapSnapshot() or the heapdump package in Node, jmap/JFR in the JVM, pprof in Go, tracemalloc in Python).
Find the long-lived reference holding the leaked objects. The growing objects are usually held by something else — a cache, a registry, a listener array. The fix is to remove the holding reference, not to "free" the objects directly.
Add an eviction policy to caches. Use an LRU/TTL cache (an LRU package such as hashicorp/golang-lru in Go, lru-cache in Node, cachetools in Python) instead of a raw Map/dict. A bounded cache has a fixed ceiling, so it can't grow forever.
Remove listeners and cancel goroutines on every path. Use defer, finally, or context cancellation so cleanup runs on success and error paths.
Don't hold large objects in long-lived closures. If a closure only needs one field, extract it before capturing, so the rest of the object can be reclaimed.
Cap any "recent events" / history buffer. A ring buffer with a fixed size can't grow forever.
Set memory limits and restart thresholds. A memory limit turns an invisible leak into a visible restart — and a restart threshold (or graceful restart on high RSS) gives you a controlled recovery instead of an OOM kill.
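
The snapshot-diff step from the list above can be sketched with Python's built-in tracemalloc. The leaky_store list and simulate_traffic function are hypothetical stand-ins for a real leak and real load; in production the two snapshots would be taken minutes apart rather than back to back.

```python
import tracemalloc

leaky_store = []  # hypothetical long-lived structure standing in for the leak


def simulate_traffic(n: int) -> None:
    for _ in range(n):
        leaky_store.append(bytearray(10_000))   # retained forever


tracemalloc.start()
simulate_traffic(100)                      # warm-up
before = tracemalloc.take_snapshot()
simulate_traffic(1_000)                    # stands in for "minutes under load"
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line; entries come back sorted by growth.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)

biggest = top[0]                           # the line whose allocations grew most
```

The top entry points at the line that allocated the retained memory. The next step is still the one described above: find which long-lived structure is holding those objects.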

The right long-term fix is almost always "bound the data structure that's growing." If it can't grow forever, it can't leak forever.
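
A minimal sketch of "bounding the structure", using only the Python standard library. BoundedLRU is a hypothetical illustrative class, not the cachetools API; in real code a maintained library is probably the better choice. The sizes are arbitrary.

```python
from collections import OrderedDict, deque


class BoundedLRU:
    """Minimal LRU cache with a hard size cap (illustrative sketch)."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)   # evict the least recently used

    def __len__(self) -> int:
        return len(self._items)


cache = BoundedLRU(max_size=1000)
for i in range(100_000):
    cache.put(i, f"value-{i}")                # size never exceeds 1000

recent_events = deque(maxlen=500)             # ring buffer: old entries fall off
for i in range(100_000):
    recent_events.append(i)
```

After 100,000 writes the cache holds 1,000 entries and the event buffer holds 500, regardless of how long the process runs.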

How Webalert Helps

Leaks are an internal process problem, but their symptoms are user-visible — and outside-in monitoring catches them:

Outside-in latency monitoring that catches the slow responses caused by GC pressure, often the first user-visible symptom of a leak before any restart happens.
Error and 5xx alerting for the failures that arrive when the process finally runs out of memory or restarts under load.
Restart-aware monitoring — if a leaking process keeps restarting, Webalert catches the user-facing interruptions and slow responses between restarts.
Confirmation of recovery — once you've added a bounded cache or fixed the listener leak, monitoring verifies real requests succeed on time and the restart-induced blips stop.

Webalert won't take your heap snapshot, but it tells you the moment a leak has crossed from a metric into a user-facing problem — and confirms when your fix held.

Summary

A memory leak is when a program allocates memory it never frees — and in managed languages, that almost always means a reference you forgot to clear, held by a long-lived data structure. The recurring patterns are unbounded caches, unremoved event listeners, closures capturing large scopes, globals, leaked goroutines, unreturned pooled resources, and uncapped buffers. Leaks are hard to catch because they're slow, the first symptom is usually a restart rather than an error, GC pressure looks like CPU pressure, and the leaky path often only exists in production.

Detect them by watching RSS over time per process (a monotonic climb is the signature), heap size and GC pause time, restart frequency with OOMKilled, and memory pressure relative to the limit — and alert on the upward trend, not a single value. Fix them by taking before/after heap snapshots, removing the long-lived reference, adding eviction to caches, cleaning up on every path with defer/finally/context, capturing only what you need in closures, capping history buffers, and setting memory limits. Pair internal memory metrics with outside-in monitoring so a slow leak never silently degrades into an outage.

