
Nginx handles the first connection for millions of websites. It serves static files, terminates SSL, balances load across backends, and proxies requests to application servers. When Nginx has a problem, everything behind it is unreachable.
A misconfigured upstream block, an expired SSL certificate, a full disk preventing log writes, or a worker process crash can take down your entire stack — even if the application server is perfectly healthy.
This guide covers how to monitor Nginx from the outside (what users experience) and what to watch for so you catch problems before they cascade.
Why Nginx Monitoring Matters
Nginx sits at the edge of your infrastructure. It is the first thing a user's browser connects to and the last thing between your application and the internet. This position makes it both critical and a single point of failure.
Common Nginx failure scenarios:
- Configuration error after reload — A mistake in nginx.conf that still passes the syntax check (a wrong proxy_pass target, a misplaced location block) makes Nginx serve the wrong content or return 502 errors after a reload
- Upstream server unreachable — Nginx cannot connect to the backend application, returning 502 Bad Gateway
- SSL certificate expired — Browsers refuse to connect, showing security warnings
- Worker process crash — Under high load or memory pressure, worker processes die and connections are dropped
- Disk full — Nginx cannot write logs or cache files, causing unpredictable behavior
- Rate limiting misconfigured — Legitimate users are blocked by overly aggressive rate limits
- DNS resolution failure — Nginx cannot resolve upstream hostnames at startup or during reload
Each of these can happen while the underlying application is running fine. Monitoring only the application misses the entire Nginx layer.
What to Monitor
1) HTTP Endpoint Availability
The most important check: can a user reach your site through Nginx?
- HTTPS check on your domain — Verify Nginx returns a 200 status on your main URL
- Content validation — Confirm the response contains expected content, not an Nginx error page
- Multiple endpoints — Check both static assets and proxied paths to verify both Nginx and the upstream
An HTTP check catches the majority of Nginx failures because Nginx is the component serving the response. If Nginx is down, misconfigured, or cannot reach the upstream, the check fails.
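The HTTP check described above can be sketched in Python with only the standard library. This is a minimal illustration, not how any particular monitoring service implements it: `check_endpoint`, `evaluate`, and `CheckResult` are hypothetical names, and the expected text is whatever marker you know appears on a healthy page.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]  # None when no HTTP response arrived
    elapsed_ms: float
    reason: str


def evaluate(status, body, expected_text):
    """Pass only if the status is 200 AND the expected marker is present."""
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        return False, "expected content missing"
    return True, "ok"


def check_endpoint(url, expected_text, timeout=10.0):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (e.g. Nginx's 502 page) still carry a status
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except OSError as err:
        # URLError subclasses OSError: refused, DNS failure, TLS error, timeout
        elapsed = (time.monotonic() - start) * 1000
        return CheckResult(False, None, elapsed, f"connection failed: {err}")
    elapsed = (time.monotonic() - start) * 1000
    ok, reason = evaluate(status, body, expected_text)
    return CheckResult(ok, status, elapsed, reason)
```

For example, `check_endpoint("https://example.com/", "Welcome")` would fail with "expected content missing" if Nginx served a maintenance page instead of the real homepage. Note that urllib follows HTTP redirects by default, so a port 80 to 443 redirect is followed rather than reported.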
2) SSL Certificate Health
Nginx typically handles SSL termination. Monitor:
- Certificate expiry — Alert at least 14 days before expiry so you have time to renew
- Certificate chain — Intermediate certificates must be correctly configured or some browsers will reject the connection
- Certificate mismatch — The certificate must match the domain being served
- Protocol and cipher support — Outdated TLS versions can be exploited or rejected by modern browsers
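SSL failures are the most common Nginx-related outage that monitoring catches early. An expiry alert 14 days out gives you time to renew, and if a certificate does lapse at 3 AM on a Saturday, a 1-minute check detects it within a minute instead of when users start reporting browser warnings.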
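A sketch of the expiry check, assuming Python's standard ssl module. `days_until_expiry`, `fetch_not_after`, and `needs_renewal_alert` are hypothetical names; the 14-day threshold comes from the recommendation above.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """not_after uses the getpeercert() format, e.g. 'Jun  1 12:00:00 2025 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    now = time.time() if now is None else now
    return (expires_at - now) / 86400


def fetch_not_after(host, port=443, timeout=10.0):
    # create_default_context() verifies the chain and hostname, so an expired,
    # mismatched, or incompletely chained certificate raises
    # ssl.SSLCertVerificationError here instead of returning a date.
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal_alert(not_after, threshold_days=14, now=None):
    return days_until_expiry(not_after, now) < threshold_days
```

Used together, `needs_renewal_alert(fetch_not_after("example.com"))` returns True inside the renewal window, and an exception from `fetch_not_after` is itself an alert. Protocol and cipher support need a dedicated TLS scanner and are not covered by this sketch.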
3) Response Time
Nginx should add minimal latency to requests. Track response times to detect:
- Upstream slowness — If the application server is slow, Nginx passes that latency through to users
- Proxy buffer issues — Misconfigured proxy buffers cause Nginx to spool to disk, adding latency
- Connection queue buildup — When worker_connections is exhausted, new connections wait
- Cache misses — If you use Nginx caching, a cache invalidation can cause a sudden spike in response times as the upstream is hit directly
Set response time alerts at a threshold that matches your normal baseline. If your site normally responds in 200ms and suddenly takes 2 seconds, something changed.
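One way to turn "alert when response time departs from the baseline" into a rule, sketched in Python: compare the median of the last few samples against a multiple of the normal response time. The 3x factor, the three-sample minimum, and the name `latency_regressed` are illustrative assumptions; tune them to your own baseline.

```python
import statistics


def latency_regressed(recent_ms, baseline_ms, factor=3.0, min_samples=3):
    """True when the median of recent samples exceeds factor x baseline."""
    if len(recent_ms) < min_samples:
        return False  # too little data; avoid alerting on a single blip
    return statistics.median(recent_ms) > baseline_ms * factor
```

Using the median means one slow request does not page anyone, while a sustained shift like the 200 ms to 2 s example above trips the alert.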
4) HTTP Status Codes
Monitor for specific error codes that indicate Nginx-level problems:
| Status Code | What It Means in Nginx Context |
|---|---|
| 502 Bad Gateway | Nginx cannot connect to the upstream server. App is down or unreachable. |
| 503 Service Unavailable | Nginx is rate limiting (limit_req and limit_conn return 503 by default), or the upstream itself returned 503. |
| 504 Gateway Timeout | The upstream server took too long to respond. Nginx gave up waiting. |
| 499 Client Closed Request | Nginx-specific code written to the access log (never sent to the client) when the client closed the connection before Nginx finished. Often indicates slow responses. |
| 413 Request Entity Too Large | The request body exceeds client_max_body_size (1 MB by default). |
| 444 | Nginx closed the connection without sending a response (return 444, used to drop malicious requests). Clients see a dropped connection, not a status code. |
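Content validation on your monitoring checks should confirm you are getting the expected response, not an error page served with status 200. That happens when error_page overrides the response code (for example error_page 502 =200 /maintenance.html) or when a fallback such as try_files serves a default page.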
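Nginx's built-in error pages share a recognizable footer, so a check can flag them even when the HTTP status has been rewritten. A hedged Python sketch: the regexes target the default page markup (`<hr><center>nginx</center>`, with a version suffix when server_tokens is on) and will not recognize custom error pages, which need their own expected-content marker. `nginx_error_in_body` is a hypothetical name.

```python
import re

# Footer of Nginx's built-in error pages; "/1.x.y" appears only with server_tokens on
_NGINX_FOOTER = re.compile(r"<hr><center>nginx(?:/[\d.]+)?</center>")
# Built-in pages put the status in the title, e.g. <title>502 Bad Gateway</title>
_TITLE_CODE = re.compile(r"<title>(\d{3})\s[^<]*</title>")


def nginx_error_in_body(body):
    """Return the error code shown on a built-in Nginx error page, else None."""
    if not _NGINX_FOOTER.search(body):
        return None
    match = _TITLE_CODE.search(body)
    return int(match.group(1)) if match else None
```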
5) Port Availability
Monitor that Nginx is listening on the expected ports:
- Port 80 (HTTP) — Should either serve content or redirect to HTTPS
- Port 443 (HTTPS) — Primary SSL-terminated port
- Custom ports — If you run Nginx on non-standard ports for internal services
A TCP port check detects Nginx process crashes, bind failures, and firewall changes faster than an HTTP check because it does not wait for a full response.
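A TCP port check only needs the handshake to complete. A minimal Python sketch using socket.create_connection; `port_open` is a hypothetical helper name.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or DNS failure
        return False
```

Run it against 80 and 443 on your public hostname, e.g. `port_open("example.com", 443)`. A True result proves only that something is listening, so pair it with the HTTP check.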
6) DNS Resolution
If your domain points to the server running Nginx, monitor DNS:
- A/AAAA records — Verify the domain resolves to the correct IP
- Multiple DNS providers — If you use DNS failover, verify both resolve correctly
- TTL changes — Unexpected TTL changes may indicate DNS hijacking
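Checking that A/AAAA records resolve to the expected addresses can be sketched with socket.getaddrinfo. This goes through the system resolver, so it sees cached answers and no TTL values; querying each DNS provider's nameservers directly and tracking TTLs would need a DNS library such as dnspython, which this sketch does not use. `resolve_ips`, `dns_matches`, and the expected-IP set are assumptions for illustration.

```python
import socket


def resolve_ips(hostname):
    """All IPv4/IPv6 addresses the system resolver returns (A and AAAA)."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def dns_matches(hostname, expected_ips):
    """Return (ok, resolved); ok only if every resolved address is expected."""
    try:
        resolved = resolve_ips(hostname)
    except socket.gaierror:  # NXDOMAIN, SERVFAIL, resolver unreachable
        return False, set()
    return bool(resolved) and resolved <= set(expected_ips), resolved
```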
Common Nginx Configurations and What to Monitor
Nginx as Reverse Proxy
The most common setup — Nginx in front of Node.js, Python, Ruby, PHP, or Go applications:
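```nginx
upstream backend {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 443 ssl;
    server_name example.com;
    # Placeholder paths: point these at your real certificate files
    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    location / {
        proxy_pass http://backend;
    }
}
```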
Monitor:
- HTTP check on the public URL (catches both Nginx and upstream failures)
- TCP port check on 443 (catches Nginx process failures)
- Response time (catches upstream slowness proxied through Nginx)
- SSL certificate (catches expiry and misconfiguration)
Nginx as Load Balancer
When Nginx distributes traffic across multiple backends:
```nginx
upstream app_servers {
    # Round-robin across the pool by default
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
    server 10.0.1.12:8080;
}
```
Monitor:
- HTTP check from multiple regions (verify load balancing works globally)
- Content validation (ensure all backends serve correct content — a misconfigured backend in the pool serves wrong data for some requests)
- Response time (a slow backend in the pool increases average latency)
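To catch a single misconfigured backend, one approach (an assumption here, not something Nginx does for you) is to fetch the same page from each upstream address directly and compare the bodies. The sketch assumes the monitor can reach the internal addresses from the upstream block and that the page is deterministic, with no timestamps or per-request tokens; `find_outlier_backends` is a hypothetical name.

```python
import hashlib
from collections import Counter


def find_outlier_backends(bodies):
    """bodies maps 'host:port' -> body fetched directly from that backend.

    Returns backends whose content differs from the most common version.
    On a tie (e.g. two backends, two versions) the first-seen version wins.
    """
    if not bodies:
        return []
    digests = {backend: hashlib.sha256(body.encode("utf-8")).hexdigest()
               for backend, body in bodies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(b for b, d in digests.items() if d != majority)
```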
Nginx Serving Static Files
When Nginx serves a static site directly:
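```nginx
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/ssl/example.com.crt;  # placeholder path
    ssl_certificate_key /etc/nginx/ssl/example.com.key;  # placeholder path
    root /var/www/html;
    index index.html;
}
```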
Monitor:
- HTTP check on key pages (homepage, important landing pages)
- Content validation (verify files are served correctly, not 403 or directory listing)
- Disk space indirectly — if the disk is full, Nginx cannot write temp files and may fail
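The wrong-content cases above, a 403 and an exposed directory listing, are easy to tell apart in code. A Python sketch; it relies on Nginx's autoindex pages using an "Index of /..." title, and `static_page_problem` and its messages are illustrative.

```python
def static_page_problem(status, body):
    """Return a short description of what went wrong, or None if it looks fine."""
    if status == 403:
        # Nginx returns 403 for a directory with no index file and autoindex off
        return "forbidden: check file permissions and the index file"
    if status == 404:
        return "not found: check root and the deployed files"
    if "<title>Index of /" in body:
        return "directory listing exposed: autoindex on, index file missing"
    if status != 200:
        return f"unexpected status {status}"
    return None
```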
Nginx with Caching
When Nginx caches upstream responses:
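```nginx
# http {} context: where cached responses are stored
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cache:10m;

server {
    location / {
        proxy_cache cache;
        proxy_cache_valid 200 10m;  # example TTL for successful responses
        proxy_pass http://backend;
    }
}
```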
Monitor:
- Content freshness — Verify cached content is not stale beyond acceptable limits
- Response time — Cache misses cause latency spikes
- Post-deploy content validation — After deploying new content, verify the cache updates
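Post-deploy validation is easier to act on if you can tell whether a stale page came from the cache or from the origin. The sketch assumes two things not in the configuration above: `add_header X-Cache-Status $upstream_cache_status;` in the cached location, and a hypothetical build marker your application renders into every page. `post_deploy_check` is a hypothetical name.

```python
# Assumed Nginx addition inside the cached location:
#     add_header X-Cache-Status $upstream_cache_status;
# $upstream_cache_status is one of: MISS, BYPASS, EXPIRED, STALE,
# UPDATING, REVALIDATED, HIT.
SERVED_FROM_CACHE = {"HIT", "STALE", "UPDATING", "REVALIDATED"}


def post_deploy_check(body, headers, expected_build):
    """expected_build is a hypothetical marker the app renders into pages."""
    if expected_build in body:
        return "fresh"
    status = headers.get("X-Cache-Status", "unknown")
    if status in SERVED_FROM_CACHE:
        return f"stale from Nginx cache ({status}): wait for expiry or clear the cache"
    return f"stale from origin ({status}): check the deploy itself"
```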
Nginx Failure Modes and Detection
| Failure Mode | User Impact | Detection Method |
|---|---|---|
| Nginx process not running | Site completely down | TCP port check + HTTP check |
| Configuration error after reload | Syntax error: the reload is rejected and the previous config stays active. Valid but wrong config: wrong content or 502 errors | HTTP check + content validation |
| SSL certificate expired | Browser security warning, site inaccessible over HTTPS | SSL monitoring |
| Upstream server down | 502 Bad Gateway errors | HTTP check + status code validation |
| Upstream server slow | Slow page loads, potential 504 timeouts | Response time monitoring |
| Worker connections exhausted | New connections rejected or queued | Response time monitoring + availability check |
| Disk full | Log write failures, cache failures, unpredictable behavior | HTTP check + content validation |
| DNS misconfiguration | Domain resolves to wrong IP | DNS monitoring |
| Rate limiting too aggressive | Legitimate users get 429 or 503 | Multi-region HTTP checks (detect if one region is being limited) |
| Proxy buffer misconfiguration | Large responses truncated or very slow | Content validation on pages with dynamic content |
| client_max_body_size too small | File uploads fail with 413 | API endpoint monitoring with payload validation |
Monitoring Nginx Across Environments
Production
- 1-minute HTTP checks on all public endpoints
- SSL certificate monitoring with 14-day expiry alerts
- TCP port checks on 80 and 443
- Response time alerts with tight thresholds
- DNS monitoring on all production domains
- Multi-region checks to verify global availability
Staging
- 5-minute HTTP checks on primary endpoints
- SSL monitoring (staging certs expire too)
- Content validation to catch configuration drift between staging and production
Development / Preview
- Basic HTTP check to verify the environment is accessible
- Useful for catching Nginx misconfigurations before they reach production
Troubleshooting with Monitoring Data
When monitoring detects an Nginx issue, the alert context points you to the right place:
HTTP check fails with connection refused:
→ Nginx process is not running. Check systemctl status nginx or your container orchestrator.
HTTP check returns 502:
→ Nginx is running but cannot reach the upstream. Check the application server, verify the upstream block in nginx.conf, and check network connectivity.
HTTP check returns 504:
→ Upstream is too slow. Check application performance, database queries, and proxy_read_timeout setting.
SSL check fails:
→ Certificate expired, chain incomplete, or wrong certificate served. Check ssl_certificate and ssl_certificate_key paths in the server block.
Response time suddenly doubled:
→ Possible upstream degradation, cache invalidation, increased traffic, or Nginx configuration change. Check recent deployments and upstream health.
Content validation fails but status is 200:
→ Nginx is serving a custom error page or fallback content. The upstream may be down but Nginx is returning a friendly error page. Check the upstream and any error_page directives.
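The decision list above can be encoded as a small triage function, for example to attach a first hint to each alert. A Python sketch; the error tags "refused" and "ssl" and the latency_ratio input (current response time divided by baseline) are hypothetical conventions of this example, and `diagnose` is not part of any real tool.

```python
from typing import Optional


def diagnose(status: Optional[int], error: Optional[str] = None,
             content_ok: bool = True, latency_ratio: float = 1.0) -> str:
    """Return a first place to look for one check result.

    status: HTTP status, or None if no response arrived.
    error: hypothetical tag for transport failures ("refused", "ssl").
    latency_ratio: current response time / normal baseline.
    """
    if error == "refused":
        return "Nginx not running: check systemctl status nginx"
    if error == "ssl":
        return "TLS problem: check ssl_certificate, ssl_certificate_key, chain"
    if status == 502:
        return "Upstream unreachable: check app server and upstream block"
    if status == 504:
        return "Upstream slow: check app, database queries, proxy_read_timeout"
    if status == 200 and not content_ok:
        return "Fallback page served: check upstream and error_page directives"
    if status == 200 and latency_ratio >= 2.0:
        return "Latency doubled: check recent deploys, cache, traffic, upstream"
    if status == 200:
        return "healthy"
    return f"unclassified: status={status} error={error}"
```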
How Webalert Helps
Webalert monitors your Nginx-served endpoints the way users experience them:
- 60-second HTTP checks from global regions — detect Nginx failures within 2 minutes
- SSL monitoring — alerts before certificates expire, catches chain and mismatch issues
- TCP port monitoring — detect Nginx process crashes independent of HTTP
- Response time tracking — catch upstream slowness and proxy configuration issues
- Content validation — verify Nginx serves correct content, not error pages
- DNS monitoring — detect resolution issues before they affect users
- Multi-region checks — verify Nginx serves correctly from every geography
- Multi-channel alerts — Email, SMS, Slack, Discord, Teams, webhooks
See features and pricing for details.
Summary
- Nginx is the front door to your infrastructure. When it fails, everything behind it is unreachable.
- Monitor HTTP endpoints through Nginx, not just the application behind it.
- SSL certificate monitoring prevents the most common scheduled Nginx outage.
- Response time tracking catches upstream problems that Nginx proxies to users.
- TCP port checks detect Nginx process crashes faster than HTTP checks.
- Content validation catches cases where Nginx returns 200 with wrong content.
- Monitor across environments — production, staging, and preview.
Nginx handles the connection. Monitoring proves it is handling it correctly.