
Your app is humming along, then traffic ticks up and suddenly every request hangs or fails with an error like "too many connections" or "connection pool timeout". The database itself looks healthy (CPU is fine, queries are fast), but your application can't get a connection to run them. This is connection pool exhaustion, one of the most common ways a perfectly healthy database becomes completely unusable, and it almost always strikes at the worst possible moment: under load.
This guide explains what a connection pool is, why it runs dry, and how to diagnose, size, and fix the problem before it takes your application down.
What a Connection Pool Is
Opening a database connection is expensive — a TCP handshake, authentication, and session setup every time. To avoid paying that cost on every query, applications use a connection pool: a fixed set of pre-opened connections that requests borrow, use, and return.
When a request needs the database, it checks out a connection from the pool, runs its queries, and returns it. If all connections are busy, the request waits for one to free up — up to a timeout, after which it fails. This works beautifully until demand for connections outstrips supply. Then the pool is exhausted: every connection is checked out, new requests queue, and once the wait timeout hits, they start failing.
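The checkout, wait, and timeout cycle described above can be sketched in a few lines. This is a minimal illustration, not how any particular driver implements pooling: TinyPool, PoolTimeout, and connect are hypothetical names, and an in-memory SQLite connection stands in for a real database server.

```python
import queue
import sqlite3


class PoolTimeout(Exception):
    """Raised when no connection frees up within the checkout timeout."""


class TinyPool:
    """Fixed-size pool: connections are opened once, then borrowed and returned."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost up front

    def checkout(self, timeout):
        try:
            # Blocks while every connection is busy, up to `timeout` seconds.
            return self._idle.get(timeout=timeout)
        except queue.Empty:
            raise PoolTimeout(f"no connection available after {timeout}s") from None

    def checkin(self, conn):
        self._idle.put(conn)


def connect():
    # In-memory SQLite stands in for a real database server here.
    return sqlite3.connect(":memory:", check_same_thread=False)


pool = TinyPool(size=2, connect=connect)
a = pool.checkout(timeout=0.1)
b = pool.checkout(timeout=0.1)      # the pool is now exhausted

try:
    pool.checkout(timeout=0.1)      # a third caller waits, then fails
except PoolTimeout as exc:
    print("exhausted:", exc)

pool.checkin(a)                     # returning one connection...
c = pool.checkout(timeout=0.1)      # ...lets the next caller through
```

The third checkout fails only because both connections are still held. Nothing is wrong with the database itself, which is exactly the failure mode this guide is about.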
There are really two limits in play, and either can be the bottleneck: the application pool size (how many connections your app will open) and the database's max connections (max_connections in Postgres/MySQL — the hard server-side ceiling). Exhaustion happens when you hit whichever is smaller.
What Causes Exhaustion
Pool exhaustion is rarely "we just need more connections." It's usually one of these:
- Connections held too long. The classic cause. A connection checked out for a slow query, an external API call made while holding the connection, or a long transaction keeps a connection busy far longer than needed — so the pool drains even at modest traffic. Holding a DB connection across a network call is a recipe for exhaustion.
- Connection leaks. Code that checks out a connection and never returns it: a missing close(), an exception that skips cleanup, a transaction never committed or rolled back. Leaked connections are gone forever; the pool shrinks until it's empty.
- Too many app instances. Each instance/pod has its own pool. Twenty pods × a pool of 20 = 400 connections demanded, which can blow past the database's max_connections even though each pod looks reasonable. This is the most-missed cause in containerized and autoscaled setups.
- Traffic spikes. A surge of concurrent requests all wanting connections at once, outpacing how fast queries return them.
- Slow queries under load. Slow queries hold connections longer, which under load cascades into exhaustion — a small slowdown becomes a full outage.
- A struggling database. If the DB itself slows (lock contention, a failover, saturated I/O), queries take longer, connections stay checked out, and the pool empties — exhaustion as a symptom of a deeper problem.
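Why hold time matters so much can be estimated with Little's law: the average number of busy connections is roughly the request rate multiplied by how long each request holds a connection. The sketch below uses made-up traffic numbers and a hypothetical pool of 20 purely for illustration.

```python
def connections_in_use(requests_per_second, hold_seconds):
    """Little's law: average busy connections = arrival rate x average hold time."""
    return requests_per_second * hold_seconds


POOL_SIZE = 20  # illustrative pool size, not a recommendation

# 200 req/s, each holding a connection for 20 ms of query time.
fast = connections_in_use(200, 0.020)

# Same traffic, but each request also makes a 300 ms external API call
# while still holding its connection.
slow = connections_in_use(200, 0.020 + 0.300)

for label, busy in (("queries only", fast), ("with API call", slow)):
    status = "ok" if busy <= POOL_SIZE else "exhausted"
    print(f"{label}: ~{busy:.0f} busy connections of {POOL_SIZE} -> {status}")
```

With these numbers the same traffic needs about 4 connections when only queries hold them, and about 64 once the external call is included, so the pool of 20 runs dry without any increase in load.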
How to Diagnose It
- Read the error. "too many connections" or "remaining connection slots are reserved" means you've hit the database's max_connections. A client-side "pool timeout" or "unable to acquire connection" means your application pool ran dry first. The two point at different fixes.
- Count active connections. Check the database directly with SELECT count(*) FROM pg_stat_activity; (Postgres) or SHOW STATUS LIKE 'Threads_connected'; (MySQL), and compare to max_connections. Watch how it trends under load.
- Look for connections stuck idle in transaction. In Postgres, connections in this state are holding a transaction open without doing work, a hallmark of leaks and of holding connections across slow external calls.
- Correlate with query latency. If exhaustion tracks rising query times, the root cause is slow queries, not pool size. Fix the queries, not the number.
- Account for total demand. Multiply pool size by the number of running instances and compare to max_connections. If that product exceeds the ceiling, you've found it.
A steadily climbing connection count that never drops back is a leak; a count that spikes with traffic and recovers is a sizing or slow-query issue. Watching the trend tells you which.
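The total-demand check from the list above is simple arithmetic, and it is worth scripting so it gets rerun whenever the instance count changes. In this sketch the function names and the reserved headroom of 10 are assumptions for illustration; the 20 pods with a pool of 20 come from the example earlier in this guide.

```python
def total_demand(instances, pool_size):
    """Worst-case connections the application tier can open."""
    return instances * pool_size


def max_pool_size_per_instance(max_connections, reserved, instances):
    """Largest per-instance pool that keeps total demand under the ceiling."""
    usable = max_connections - reserved
    return max(usable // instances, 0)


MAX_CONNECTIONS = 100   # assumed server setting (100 is the Postgres default)
RESERVED = 10           # assumed headroom for admin/monitoring sessions

demand = total_demand(instances=20, pool_size=20)
print(demand, "demanded vs", MAX_CONNECTIONS - RESERVED, "usable")
print("safe pool size per instance:",
      max_pool_size_per_instance(MAX_CONNECTIONS, RESERVED, instances=20))
```

Here the application tier can demand 400 connections against 90 usable ones, so the per-instance pool would have to drop to 4, or a pooler would have to sit in between.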
How to Fix It
Match the fix to the cause — blindly raising limits often makes things worse:
- Fix leaks first. Ensure every checkout is returned, even on error (use your language's try/finally, context managers, or scoped helpers). One leak will exhaust any pool size eventually.
- Hold connections for less time. Never make external API calls while holding a DB connection. Keep transactions short. Move slow work out of the request path. This is usually the highest-leverage fix.
- Size the pool deliberately. Bigger is not better: a pool larger than the database can serve just moves the queue from your app to the DB. A common rule of thumb for CPU-bound workloads is a relatively small pool (often around cores × 2, counting the database server's cores), because a few busy connections doing fast work beat hundreds fighting for resources.
- Respect max_connections across all instances. instances × pool_size must stay safely under the database ceiling, with headroom for admin/monitoring connections. Right-size pools per instance as you scale out.
- Add a connection pooler. For Postgres especially, a pooler like PgBouncer multiplexes many client connections onto few database ones, which makes it the standard fix for "too many instances" and for serverless workloads that open connections per invocation.
- Tune timeouts. A sane checkout timeout fails fast instead of hanging forever, and pairs well with retries and backoff so a brief spike degrades gracefully instead of cascading.
How Webalert Helps
Connection pool exhaustion is an internal failure with a very external symptom: requests hang and time out for real users while your database dashboards look fine. That gap is where outside-in monitoring proves its worth:
- Endpoint and response-time checks that catch the latency and timeouts exhaustion causes — the user-facing impact a DB CPU graph won't show.
- Early warning on creeping latency, so you see response times climbing as the pool drains before it fully empties and starts erroring.
- Multi-region uptime checks that confirm whether the slowdown is hitting real traffic everywhere or just one path.
- Sustained-failure alerting that distinguishes a genuine pool-exhaustion outage from a momentary blip, without alert noise.
Webalert tells you your users are getting timeouts — the signal that sends you to check the pool before a slow drain becomes a hard outage.
Summary
A connection pool reuses expensive database connections, and it's exhausted when every connection is checked out and new requests queue until they time out. The database is often healthy — the bottleneck is connection availability, capped by either your app's pool size or the database's max_connections, whichever is smaller.
The real causes are usually connections held too long (slow queries, external calls inside transactions), outright leaks, or too many app instances each running their own pool. Diagnose by reading the exact error, counting active connections against the limit, and checking whether the count leaks upward or spikes with traffic. Then fix the right thing: close leaks, shorten how long connections are held, size pools modestly, respect max_connections across all instances, and add a pooler like PgBouncer when you scale out. Catch the symptom early with outside-in monitoring, and pool exhaustion stops being the outage that blindsides you under load.