5xx Server Errors Explained: 500, 502, 503, 504 Fix Guide

Webalert Team
May 27, 2026
16 min read

A 5xx status code means the server knows something went wrong on its side and is owning it. Unlike a 4xx error - which says "you (the client) did something wrong" - a 5xx is your problem to fix. And the specific code carries an enormous amount of information about which layer broke.

This is the per-code diagnostic guide. For each of the major 5xx codes (500, 502, 503, 504, plus Cloudflare's 520-526 and a few rarer cases), it covers what the code actually means, the most common real-world causes, and how to confirm and fix it fast.

For the broader monitoring policy (alert thresholds, error budgets), see 5xx Error Rate Monitoring. For all status codes including 1xx, 2xx, 3xx, 4xx, see HTTP Status Codes Explained.


The 5xx Family At A Glance

| Code | Name | Layer that failed | Owner |
| --- | --- | --- | --- |
| 500 | Internal Server Error | Application | App team |
| 501 | Not Implemented | Application / framework | App team |
| 502 | Bad Gateway | Proxy ↔ upstream | Platform / infra |
| 503 | Service Unavailable | Application / load balancer | App / platform |
| 504 | Gateway Timeout | Proxy waiting on upstream | Platform / app |
| 505 | HTTP Version Not Supported | Server config | Platform |
| 507 | Insufficient Storage | Disk / object storage | Platform |
| 508 | Loop Detected | App routing | App team |
| 511 | Network Authentication Required | Captive portal | Network |
| 520-526 | Cloudflare-specific | Cloudflare ↔ origin | Origin / Cloudflare |

The most useful diagnostic step is always: which layer returned the 5xx? A 502 from the load balancer means something very different from a 502 from the WAF in front of it.
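
The table can also be used as a lookup when triaging alerts. A minimal Python sketch, assuming the layer and owner labels from the table above; the names `FIVE_XX`, `Triage`, and `triage` are illustrative and not part of any library:

```python
from typing import NamedTuple


class Triage(NamedTuple):
    name: str
    layer: str
    owner: str


# Rows copied from the table above; adjust the owners to match your own teams.
FIVE_XX = {
    500: Triage("Internal Server Error", "Application", "App team"),
    501: Triage("Not Implemented", "Application / framework", "App team"),
    502: Triage("Bad Gateway", "Proxy <-> upstream", "Platform / infra"),
    503: Triage("Service Unavailable", "Application / load balancer", "App / platform"),
    504: Triage("Gateway Timeout", "Proxy waiting on upstream", "Platform / app"),
    505: Triage("HTTP Version Not Supported", "Server config", "Platform"),
    507: Triage("Insufficient Storage", "Disk / object storage", "Platform"),
    508: Triage("Loop Detected", "App routing", "App team"),
    511: Triage("Network Authentication Required", "Captive portal", "Network"),
}


def triage(status: int) -> Triage:
    """Return the likely failing layer and owner for a 5xx status code."""
    if status in FIVE_XX:
        return FIVE_XX[status]
    if 520 <= status <= 526:
        return Triage("Cloudflare-specific", "Cloudflare <-> origin", "Origin / Cloudflare")
    if 500 <= status <= 599:
        return Triage("Unknown 5xx", "Unknown", "Owner of the layer that returned it")
    raise ValueError(f"{status} is not a 5xx status code")
```

The lookup only suggests a starting point; which layer actually returned the code still has to be confirmed.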


500 Internal Server Error

The server encountered an unexpected condition that prevented it from fulfilling the request.

A 500 means the application crashed or threw an uncaught exception while handling the request. The framework caught it and turned it into a generic 500. It is the most common 5xx in poorly observed systems because everything that is not specifically handled bubbles up as 500.

Common causes

  • Uncaught exception in a route handler (null deref, divide by zero, missing key).
  • Database query exception (connection lost, deadlock, broken constraint).
  • Misconfiguration loaded at request time (missing env var, bad credentials).
  • Out-of-memory kill (OOM) on the worker, restart in progress.
  • A deploy that included a startup bug only visible under traffic.

Diagnose

curl -sSI -L https://example.com/path-that-fails | head -10
curl -sSL https://example.com/path-that-fails | head -40

Then check:

  • Application logs around the request id / trace id.
  • Error tracker (Sentry, Rollbar, Honeybadger, etc).
  • Recent deploy timeline. See CI/CD Pipeline Monitoring.
  • Database error log for connection pool exhaustion or slow query timeouts. See Database Monitoring.

Fix

  • Roll back the recent deploy if the spike correlates.
  • Add the missing null check / error handler.
  • Increase connection pool size or DB instance size if it is saturated.
  • Add a real exception boundary so the error is logged with context, not just turned into 500.

A 500 is the application's problem. The proxy is just the messenger.
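
To illustrate the "real exception boundary" fix, here is a minimal Python sketch. It assumes a hypothetical handler signature (a request dict in, a `(status, body)` tuple out) rather than any particular framework; most frameworks offer an equivalent hook for this.

```python
import logging
import traceback
import uuid
from functools import wraps

log = logging.getLogger("app.errors")


def exception_boundary(handler):
    """Wrap a request handler so unhandled errors are logged with context."""

    @wraps(handler)
    def wrapper(request: dict):
        # Reuse the caller's request id if present so logs can be correlated.
        request_id = request.get("request_id") or uuid.uuid4().hex
        try:
            return handler(request)
        except Exception as exc:
            log.error(
                "unhandled %s on %s %s request_id=%s\n%s",
                type(exc).__name__,
                request.get("method"),
                request.get("path"),
                request_id,
                traceback.format_exc(),
            )
            # Still a 500 for the client, but one that can be traced in the logs.
            return 500, {"error": "internal_error", "request_id": request_id}

    return wrapper
```

Returning the request id in the body gives support and on-call a direct handle into the logs, instead of a bare 500 with no context.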


501 Not Implemented

The server does not support the functionality required to fulfill the request.

Rarer than 500. Two real-world causes:

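  • The server does not recognize the HTTP method at all (e.g. PATCH on a server that only knows GET/POST/PUT/DELETE). A method the server knows but does not allow on that resource should return 405 instead.
  • The framework returned this for an endpoint that is unimplemented or sits behind a feature flag that is off in production.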

If users hit a 501 on a request your client sent, your client and your server disagree about the API contract. Audit and align.


502 Bad Gateway

The server, while acting as a gateway or proxy, received an invalid response from the upstream server.

This is the proxy's verdict on the upstream. NGINX, an AWS ALB, a Kubernetes ingress, Cloudflare, or any reverse proxy returns 502 when the thing it tried to talk to gave it something it could not parse.

Common causes

  • Upstream service is down or restarting (no listening process on the port).
  • Upstream process crashed mid-response.
  • Upstream timed out (some proxies report this as 502, others as 504).
  • Upstream returned malformed HTTP (missing status line, bad chunked transfer).
  • TLS handshake to the upstream failed (cert mismatch, expired chain).
  • Connection pool exhausted between proxy and upstream.

Diagnose

# Confirm the proxy is reachable
curl -sSI https://example.com

# Compare with hitting the origin directly (if accessible)
curl -sSI https://origin.example.com

Then:

  • Check the upstream process is running and listening. See Port Monitoring.
  • Check upstream logs for crashes around the time of the 502.
  • Check container restarts and OOM kills. See Docker Container Monitoring and Kubernetes Monitoring.
  • Check NGINX error.log for upstream prematurely closed connection or recv() failed.

Fix

  • Restart or scale the upstream service.
  • Increase health-check sensitivity so the proxy stops sending traffic to a dying pod sooner.
  • Add a circuit breaker so the proxy gives a graceful 503 rather than a confusing 502.
  • Tighten the deploy strategy - 502s during deploy often mean traffic was sent to a not-yet-ready pod. See Health Check Endpoint Design.
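
The circuit-breaker fix can be sketched in a few lines of Python. This is a simplified illustration, not how NGINX, Envoy, or any specific proxy implements it; `CircuitBreaker`, its thresholds, and the `(status, body)` return shape are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `reset_after` seconds."""

    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, upstream):
        """Call `upstream()` and return (status, body); upstream errors never propagate."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast with an honest 503 instead of hitting the upstream.
                return 503, "upstream temporarily unavailable"
            # Reset window elapsed: fall through and allow one trial call (half-open).
        try:
            result = upstream()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return 502, "bad gateway"
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, clients get a fast 503 and the struggling upstream gets room to recover, rather than every request waiting on a connection that is going to fail.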

503 Service Unavailable

The server is currently unable to handle the request due to a temporary overload or scheduled maintenance.

503 is the "we are not serving this right now" code. Either the app or proxy deliberately refused the request, or the load balancer had nothing healthy to send it to.

Common causes

  • App is in maintenance mode.
  • Rate limiter or queue depth circuit breaker tripped.
  • Auto-scaler has no healthy pods.
  • Load balancer has zero healthy targets.
  • Application returned 503 because a critical dependency (DB, cache, queue) is down.

Diagnose

  • Check the Retry-After header - a well-behaved 503 includes it.
  • Check if a maintenance window is intentional. See Scheduled Maintenance Windows.
  • Check load balancer healthy-target count.
  • Check downstream dependencies (DB, cache, queue) for outages.
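
A client or monitoring script that honors Retry-After has to handle both forms the header can take: a number of seconds or an HTTP date. A minimal Python sketch using only the standard library; the function name `retry_after_seconds` is illustrative:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After header, or None."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # unparseable header: treat it as absent
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", not a negative delay.
    return max(0.0, (when - now).total_seconds())
```

A `None` result on a 503 is itself a useful signal: the response probably did not come from a deliberate maintenance or load-shedding path.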

Fix

  • If maintenance is intentional but unannounced, document it on the status page.
  • If unintentional: roll back, scale up, or restart the dependency that pushed the app into an unhealthy state.
  • Add load shedding rather than crashing - returning 503 for excess load is healthier than 502 from crashed pods.

503 is often a better outcome than 500 or 502 - it means you saw the load and chose to shed rather than crash. But it has to come with an explanation on the status page or in your runbook.
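
One simple form of load shedding is a cap on in-flight requests. The Python sketch below is an illustration under assumptions: `LoadShedder`, the `(status, headers, body)` tuple, and the default `retry_after` value are made up for the example, and real servers usually do this in middleware or at the proxy.

```python
import threading


class LoadShedder:
    """Reject work with a 503 once `max_in_flight` requests are already running."""

    def __init__(self, max_in_flight: int, retry_after: int = 5):
        self.max_in_flight = max_in_flight
        self.retry_after = retry_after
        self._in_flight = 0
        self._lock = threading.Lock()

    def handle(self, handler):
        """Run `handler()` if there is capacity; otherwise shed the request."""
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                # Tell well-behaved clients when to come back.
                headers = {"Retry-After": str(self.retry_after)}
                return 503, headers, "overloaded, retry shortly"
            self._in_flight += 1
        try:
            return handler()
        finally:
            with self._lock:
                self._in_flight -= 1
```

The cap should sit a little below the point where the worker starts to degrade, so the requests that are accepted still finish quickly.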


504 Gateway Timeout

The proxy did not receive a timely response from the upstream server.

504 is the timeout-flavoured cousin of 502. The proxy reached the upstream, but the upstream took too long to respond.

Common causes

  • Slow database query blocking the worker.
  • Synchronous external API call exceeding the proxy timeout.
  • Long-running migration or background job in the request path.
  • Network partition between proxy and upstream.
  • An auth provider or third-party identity service is slow. See Auth Provider Monitoring.
  • An LLM or AI API call exceeded the timeout. See AI/LLM API Monitoring.

Diagnose

  • Look at p95/p99 latency for the affected route.
  • Trace a slow request end-to-end with distributed tracing - see OpenTelemetry Monitoring.
  • Check database slow-query log.
  • Check synchronous third-party calls in the request path.

Fix

  • Move slow operations behind a job queue. See Job Queue Monitoring.
  • Add timeouts at every external call, shorter than the proxy timeout, so you fail fast and gracefully.
  • Add database query indexes; tune connection pool.
  • Increase proxy timeout only as a last resort - long timeouts hide problems and tie up worker capacity.
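
One way to keep every external call shorter than the proxy timeout is to give each request a time budget and derive per-call timeouts from what is left. A minimal Python sketch; `Deadline`, `safety_margin`, and the example numbers are assumptions for illustration:

```python
import time


class Deadline:
    """Track how much of the proxy's timeout budget is left for this request."""

    def __init__(self, proxy_timeout: float, safety_margin: float = 1.0, clock=time.monotonic):
        self._clock = clock
        # Aim to finish before the proxy gives up, leaving room to send an error response.
        self._expires_at = clock() + proxy_timeout - safety_margin

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def timeout_for(self, preferred: float) -> float:
        """Timeout for the next external call: the preferred value, capped by the budget."""
        left = self.remaining()
        if left <= 0:
            raise TimeoutError("request budget exhausted before calling the dependency")
        return min(preferred, left)
```

Create one `Deadline` at the start of the request and pass `deadline.timeout_for(5.0)` as the timeout argument of each database or HTTP client call. The application then fails with its own, explainable error before the proxy turns the request into a 504.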

505 HTTP Version Not Supported

The server does not support the HTTP protocol version used in the request.

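Almost always a misconfiguration or protocol mismatch: for example, a client forcing an HTTP version that the server or an intermediate proxy does not accept, or an old HTTP/0.9 request being rejected. Modern servers and clients negotiate the version cleanly (HTTP/2 over TLS is agreed via ALPN and falls back to HTTP/1.1), so this is rare. Verify with: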

curl --http1.1 -sSI https://example.com
curl --http2 -sSI https://example.com

If one works and the other does not, the issue is the unsupported version.


507 Insufficient Storage

The server is unable to store the representation needed to complete the request.

Originates from WebDAV but increasingly seen in modern APIs to indicate disk-full / object-store-full conditions. Common when:

  • A file-upload service ran out of disk on the worker.
  • An object storage bucket hit a quota or billing limit.
  • A database disk is full and writes are failing.

If you see this in production, treat it as a hard outage of the write path until storage is freed or expanded.


508 Loop Detected

The server detected an infinite loop while processing the request.

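The code comes from WebDAV, where it reports a circular reference found while walking a resource tree. On ordinary sites it almost always points at a misconfigured redirect chain or recursive include. Walk the redirect chain hop by hop, with a cap:

curl -sSIL --max-redirs 10 https://example.com

curl prints the headers of each hop and exits with an error if the chain exceeds the cap, which is the signature of a loop. See Redirect Chain Monitoring. The fix is to break the loop in the redirect rules.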


511 Network Authentication Required

The client needs to authenticate to gain network access.

This is the captive-portal code (hotel Wi-Fi, conference network). If a user sees this for your site, they have a network problem, not a site problem.


Cloudflare-Specific 5xx (520-526)

Cloudflare returns its own 5xx codes when it cannot reach or interpret your origin. These are extremely valuable diagnostically.

520 Web Server Returned an Unknown Error

The origin returned an empty, unknown, or unexpected response. Common causes:

  • Origin process crashed mid-response.
  • Origin returned non-HTTP data on port 80/443.
  • Connection was reset mid-response.

Look at origin logs around the time of the 520. Often correlates with OOM kills.

521 Web Server Is Down

Cloudflare could not establish a TCP connection to the origin. Causes:

  • Origin process is not running.
  • Firewall is blocking Cloudflare's IP ranges (very common after a security rule change).
  • Origin is overloaded and refusing connections.

Verify Cloudflare's IP ranges are allow-listed in your origin firewall and security group.

522 Connection Timed Out

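Cloudflare timed out while contacting the origin: either the TCP connection never completed, or the origin accepted the connection and then did not acknowledge the request in time. Usually:

  • Origin is overloaded.
  • Network path is saturated, or a firewall is silently dropping Cloudflare's packets.
  • Origin workers are stuck in slow request handlers, so new connections are not being accepted.

This is the connection-level timeout. When the origin has accepted the request and is merely slow to respond, Cloudflare reports 524 instead.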

523 Origin Is Unreachable

Routing problem. Cloudflare cannot find a network path to the origin IP. Often DNS or BGP-level:

  • Origin DNS record points at a non-routable IP.
  • Origin server was moved without updating DNS.

524 A Timeout Occurred

Cloudflare connected to the origin and sent the request, but the origin took longer than 100 seconds (the default limit) to respond. Either:

  • The endpoint is genuinely slow (move to a queue).
  • A streaming or long-poll endpoint is hitting the Cloudflare timeout - use a different endpoint or move behind a WebSocket. See WebSocket Monitoring.

525 SSL Handshake Failed

The TLS handshake between Cloudflare and the origin failed before any certificate validation result could matter. Causes:

  • Origin is not listening for TLS on port 443, or has no certificate installed.
  • Origin does not support SNI or a TLS version that Cloudflare offers.
  • Cipher suite mismatch.

See TLS Configuration Monitoring.

526 Invalid SSL Certificate

The origin presented a certificate that Cloudflare could not validate (wrong host, expired, self-signed, or untrusted CA) while "Full (strict)" mode is enforcing validation.

For broader Cloudflare-related incidents - origin outages, edge errors, propagation lag - see Cloudflare Monitoring.


Differentiating 502 vs 503 vs 504

These three look similar at a glance and are constantly confused. The crisp distinctions:

  • 502 = "I (the proxy) got something garbled or nothing from the upstream."
  • 503 = "I am refusing to serve right now, on purpose or because nothing healthy exists."
  • 504 = "I waited too long for the upstream to answer."

If you cannot tell which is firing in your stack, look at the proxy logs - NGINX, Envoy, ALB access logs all distinguish.
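
The three distinctions can be written down as a small decision function. This Python sketch is a simplified model of a generic reverse proxy, not the logic of NGINX, Envoy, or ALB; `proxy_status` and its parameters are hypothetical:

```python
from typing import Optional


def proxy_status(healthy_targets: int,
                 upstream_response: Optional[str],
                 waited_seconds: float,
                 timeout_seconds: float) -> int:
    """Pick the status a generic proxy would report for one upstream attempt."""
    if healthy_targets == 0:
        return 503  # nothing healthy to send the request to
    if upstream_response is None:
        # No bytes came back: a full wait is a timeout, anything else is a failed connection.
        return 504 if waited_seconds >= timeout_seconds else 502
    parts = upstream_response.split()
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        return 502  # bytes arrived, but they are not a valid HTTP status line
    return int(parts[1])  # valid response: pass the upstream's own status through
```

Here `upstream_response` stands for the first line the upstream sent back, or `None` if it sent nothing.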


How To Tell Where The 5xx Comes From

Production stacks layer many proxies and you need to know which layer returned the error:

Client → CDN/WAF → Load Balancer → Reverse Proxy → App

Add a unique server identifier header at each layer:

  • Cloudflare adds cf-ray automatically.
  • ALB adds x-amzn-trace-id.
  • NGINX should set x-served-by: nginx-<pod>.
  • Your app should return a custom x-app-version header.

Then:

curl -sSI https://example.com | grep -iE 'cf-ray|x-amzn-trace|x-served-by|x-app-version'

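If you see cf-ray but no app headers, the request never produced an application response: Cloudflare or another layer in front of the app generated the 5xx, and the presence or absence of x-amzn-trace-id and x-served-by narrows down which one. If you see app headers, your application generated the 5xx.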

This single trick saves hours of "is it Cloudflare or us?" arguments during incidents.
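
The same check can be automated. A minimal Python sketch, assuming the header names listed above and assuming each layer stamps its header on every response it emits, including its own error pages; `responding_layer` is an illustrative name:

```python
def responding_layer(headers: dict) -> str:
    """Guess the innermost layer that stamped a response, from its headers."""
    seen = {name.lower() for name in headers}  # header names are case-insensitive
    # Checked innermost-first: a layer's header implies the request got at least that far.
    if "x-app-version" in seen:
        return "app"
    if "x-served-by" in seen:
        return "reverse proxy"
    if "x-amzn-trace-id" in seen:
        return "load balancer"
    if "cf-ray" in seen:
        return "cdn"
    return "unknown"
```

Feed it the response headers from your HTTP client or from a monitoring probe and record the result alongside the status code.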


Per-Code Cheat Sheet

| Code | First check | Most common cause | Owner |
| --- | --- | --- | --- |
| 500 | App logs / error tracker | Uncaught exception, recent deploy | App |
| 501 | Method support | Wrong HTTP verb / unimplemented | App |
| 502 | Upstream health | Crashed worker, OOM, malformed response | Platform |
| 503 | LB target count, maintenance flag | No healthy targets / intentional | Platform / app |
| 504 | Upstream latency | Slow query, slow third-party call | App / platform |
| 507 | Disk / object store | Storage full | Platform |
| 508 | Redirect chain | Redirect loop | App |
| 520 | Origin logs | Origin returned garbage / crashed | Origin |
| 521 | Origin process + firewall | Origin down / blocking CF | Origin |
| 522 | Origin reachability / load | Origin overloaded or dropping connections | Origin |
| 523 | Origin DNS / route | DNS pointing at unreachable IP | Origin |
| 524 | Endpoint duration | Origin took > 100s | Origin |
| 525 | Origin TLS listener | No TLS on 443 / protocol or cipher mismatch | Origin |
| 526 | Origin cert validity | Cert expired / wrong host / chain broken | Origin |

Monitoring 5xx Errors The Right Way

A few signals are worth tracking continuously, not just looking at after the fact:

  • 5xx rate per endpoint - not site-wide. A 5xx storm on /checkout and a quiet homepage look identical in a site-wide chart.
  • 5xx rate per status code - 500 spikes mean app crashes; 502/504 spikes mean infra issues. Treat them differently.
  • 5xx rate per region - regional spikes mean ISP / CDN issues, not app bugs.
  • 5xx by layer - using the headers above, attribute each 5xx to the layer that returned it.
  • p95 latency on the same route - 504s are usually preceded by climbing latency.
  • External multi-region checks - so you know when 5xx hits real customers, not just internal monitoring.
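
The first signal, 5xx rate per endpoint, is straightforward to compute from access-log data. A minimal Python sketch, assuming the log has already been parsed into `(path, status)` pairs; the function name is illustrative:

```python
from collections import Counter
from typing import Iterable, Tuple


def five_xx_rate_by_endpoint(requests: Iterable[Tuple[str, int]]) -> dict:
    """Map each path to the fraction of its requests that returned a 5xx."""
    total = Counter()
    errors = Counter()
    for path, status in requests:
        total[path] += 1
        if 500 <= status <= 599:
            errors[path] += 1
    return {path: errors[path] / count for path, count in total.items()}
```

With 98 successful homepage requests and one failure out of two requests to /checkout, the site-wide 5xx rate is 1% while the /checkout rate is 50%. The per-endpoint view makes that visible; the site-wide chart does not.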

For the alerting policy (thresholds, deduplication, paging rules), see 5xx Error Rate Monitoring and Alert Fatigue.


When a 5xx Hides Behind a 200

The opposite trap also happens: the app catches every exception, logs it, and returns 200 with an empty body or a friendly error page. The 5xx rate looks healthy because there are no 5xx codes - but real users are seeing a broken site.

Defend against this with content assertions on every monitored URL. See Response Body Validation Monitoring.
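
A content assertion can be as small as a substring check on the response body. A minimal Python sketch; `is_soft_failure` and the expected text used in the usage note are placeholders for whatever string proves the page really rendered:

```python
def is_soft_failure(status: int, body: str, must_contain: str) -> bool:
    """True when a response claims success but the expected content is missing."""
    if not 200 <= status <= 299:
        return False  # a real error status is already visible to status-based alerting
    return must_contain not in body
```

For a checkout page, `is_soft_failure(200, body, "Continue to payment")` flags both an empty body and a friendly error page served with a 200.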


5xx Diagnosis Checklist

  • Captured the exact status code (500 ≠ 502 ≠ 504)
  • Identified the layer that returned it (Cloudflare, LB, proxy, app)
  • Correlated with recent deploys
  • Checked app logs / error tracker for stack traces
  • Checked upstream health (process running, ports listening)
  • Checked database / dependency status
  • Checked TLS certificate validity end-to-end
  • Checked CDN / WAF rules and IP allow-lists
  • Checked region distribution of failures
  • Documented the trigger and remediation in the runbook
  • Added or improved an alert so it does not surprise you next time

For the post-incident write-up, see Incident Post-Mortem Template.


How Webalert Helps

Webalert is built to catch 5xx errors from the outside, with enough detail to start the diagnosis:

  • External multi-region checks - 500/502/503/504 are recorded with the region that saw them, so you can tell global outages from regional ones.
  • Per-status-code alerting - Separate notification rules for 5xx as a class, and for specific codes like 502 or 504 when you want them.
  • Content validation - Catch the "200 with a broken page" version of 5xx, where the app pretended everything was fine. See Response Body Validation Monitoring.
  • Latency alerts - Climbing latency typically precedes 504s; alert before the timeout fires.
  • TLS expiry warnings - Catch the cause behind 525 / 526 weeks before it happens.
  • Public status page - Customers see the incident as your monitor sees it, in real time.
  • Multi-channel alerts - Slack, Discord, Microsoft Teams, SMS.

Example Webalert check tuned for 5xx detection:

  • URL: https://example.com/checkout
  • Method: GET
  • Regions: US, EU, APAC
  • Frequency: every 60 seconds
  • Pass condition: HTTP 200 + body contains Continue to payment + response time under 2000ms
  • Alert: page on first 5xx in any region; SMS if 5xx for 3 consecutive checks
  • Tag: business-critical
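
To make the pass condition and alert rule concrete, here is one way to read them as logic. This Python sketch is an interpretation of the example settings above, not Webalert's implementation; `CheckResult`, `passes`, and `alert_actions` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    status: int
    body: str
    elapsed_ms: float


def passes(result: CheckResult, must_contain: str = "Continue to payment",
           max_ms: float = 2000) -> bool:
    """Pass condition from the example: HTTP 200 + expected text + under 2000 ms."""
    return (result.status == 200
            and must_contain in result.body
            and result.elapsed_ms < max_ms)


def alert_actions(history: list) -> list:
    """Decide alerts from one region's results, ordered oldest to newest."""
    # Count how many of the most recent checks in a row returned a 5xx.
    streak = 0
    for result in reversed(history):
        if not 500 <= result.status <= 599:
            break
        streak += 1
    if streak == 1:
        return ["page"]  # first 5xx in this region
    if streak == 3:
        return ["sms"]   # third consecutive 5xx
    return []
```

Firing only on streaks of exactly 1 and 3 keeps a long outage from sending a new alert on every check.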

Summary

5xx codes are not interchangeable. 500 is the app crashing; 502 is the proxy failing to talk to the app; 503 is the app or proxy intentionally refusing; 504 is the proxy waiting too long. Cloudflare 520-526 narrow the problem to the edge ↔ origin link.

Knowing which code, from which layer, on which endpoint, in which region, is the difference between "the website is broken" and "the auth dependency on /checkout is timing out in EU-West." The faster you can say the second sentence, the shorter the incident.

