
Your Django app deploys successfully. Gunicorn is running. The admin panel loads. Your homepage returns 200.
But the Celery worker died 30 minutes ago. User signup emails are queued and not sending. The periodic task that syncs data from your third-party API stopped running yesterday. Nobody noticed because the web layer is completely healthy.
Django applications have the same problem as every other framework-based app: the request-response cycle is only one part of what needs to work. Celery workers, beat schedulers, database connections, cache layers, and static file serving can all fail independently while your Django views return 200.
This guide covers everything to monitor in a production Django app so you catch failures across every layer.
What Makes Django Monitoring Different
A production Django app typically has:
- WSGI/ASGI server: Gunicorn, uWSGI, or Daphne serving Django views
- Celery workers: processing background tasks (emails, notifications, data jobs)
- Celery Beat: scheduling periodic tasks
- Database: PostgreSQL, MySQL, or SQLite
- Cache: Redis or Memcached for sessions and caching (Redis often doubles as the Celery broker)
- Static and media files: served via Nginx, WhiteNoise, or a CDN
- Django management commands: custom manage.py commands run on a schedule
A basic HTTP uptime check on your homepage only validates that Gunicorn is running and Django can render a view. Everything else can be broken.
What to Monitor
1) Web Endpoints and Health Check
The starting point — verify your app responds correctly:
- Homepage or primary landing page: content validation, not just status code
- Login page: verify authentication renders correctly
- API endpoints: test the routes that power your frontend or integrations
- Django admin (/admin/): should load the login form for unauthenticated requests
Create a dedicated health check view that tests internal dependencies:
    # views.py (route it in urls.py, e.g. path("health/", health_check))
    from django.core.cache import cache
    from django.db import connection
    from django.http import JsonResponse


    def health_check(request):
        try:
            # Test database: a trivial query proves a connection can be opened
            with connection.cursor() as cursor:
                cursor.execute("SELECT 1")

            # Test cache: write a key and confirm it reads back
            cache.set("health_check", "ok", timeout=10)
            if cache.get("health_check") != "ok":
                raise RuntimeError("cache round-trip failed")

            return JsonResponse({"status": "healthy"})
        except Exception as e:
            return JsonResponse(
                {"status": "unhealthy", "error": str(e)},
                status=503,
            )
Monitor /health/ with response body validation — check for "status": "healthy", not just a 200 status code.
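To make the difference between a status-code check and a body check concrete, here is a minimal sketch of what a monitor could do on its side. The function names is_healthy and probe are illustrative, not part of Django or any monitoring product, and the sketch assumes the endpoint returns the JSON shape shown above.

```python
import json
import urllib.request


def is_healthy(status_code: int, body: str) -> bool:
    """Return True only for a 200 response whose JSON body reports "healthy"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # An HTML error page or a proxy placeholder is not a healthy response
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def probe(url: str, timeout: float = 10.0) -> bool:
    """Fetch the health URL and validate both the status code and the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return is_healthy(response.status, body)
    except OSError:
        # Covers connection errors, timeouts, and non-2xx responses (HTTPError)
        return False
```

A call such as probe("https://yourapp.com/health/") would then fail on a 200 response that carries the wrong body, which a status-only check would miss.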
2) Celery Worker Monitoring
Celery workers fail silently. When they stop:
- Password reset emails are not sent
- User notifications queue up forever
- Background data processing halts
- Webhooks from third-party services are not handled
Monitor Celery workers with a heartbeat task:
    # tasks.py
    import requests
    from celery import shared_task


    @shared_task
    def celery_heartbeat():
        # Ping the monitoring service; raise on HTTP errors so failures show in the worker logs
        response = requests.get(
            "https://heartbeat.web-alert.io/your-celery-worker-id",
            timeout=10,
        )
        response.raise_for_status()


    # celery.py or settings/celery.py ("app" is your project's Celery instance)
    app.conf.beat_schedule = {
        "celery-heartbeat": {
            "task": "myapp.tasks.celery_heartbeat",
            "schedule": 300.0,  # Every 5 minutes
        },
    }
If the heartbeat does not arrive within the expected interval, the Celery worker is down or stuck, or Beat has stopped scheduling the task.
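The "expected interval" logic on the monitoring side can be sketched in a few lines. This is an illustration of the idea only, not how any particular heartbeat service implements it; the function name and the two-minute grace period are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(
    last_seen: datetime,
    now: datetime,
    interval: timedelta = timedelta(minutes=5),
    grace: timedelta = timedelta(minutes=2),
) -> bool:
    """Return True when no ping has arrived within interval + grace."""
    # The grace period absorbs queue delay and network jitter so that a
    # ping arriving a few seconds late does not raise a false alert.
    return now - last_seen > interval + grace


last_ping = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_overdue(last_ping, last_ping + timedelta(minutes=6)))  # within grace
print(heartbeat_overdue(last_ping, last_ping + timedelta(minutes=8)))  # past grace
```

A grace period shorter than the task's worst-case queue time tends to produce noisy alerts, so it is worth sizing it against how busy the workers usually are.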
3) Celery Beat Monitoring
Celery Beat is the scheduler that triggers periodic tasks. It runs as a separate process and can fail independently of the workers. If Beat stops:
- Scheduled tasks no longer fire
- Data sync jobs stop running
- Report generation halts
- Cleanup tasks (session expiry, cache warming) stop
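A Beat-scheduled task still runs on a worker, so a missing heartbeat can mean either process has failed; check process status (for example, with supervisorctl status) to tell which one. Give Beat its own heartbeat task, separate from the worker heartbeat: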
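    # tasks.py (same module and imports as the worker heartbeat)
    @shared_task
    def beat_heartbeat():
        requests.get(
            "https://heartbeat.web-alert.io/your-celery-beat-id",
            timeout=10,
        )


    # Schedule it with Beat: add this entry to app.conf.beat_schedule
    "beat-alive": {
        "task": "myapp.tasks.beat_heartbeat",
        "schedule": 300.0,  # Every 5 minutes
    },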
4) Database Connectivity
Django's database layer can fail from:
- Connection pool exhaustion under concurrent load
- Database server running out of disk space
- Slow queries degrading all page performance
- A migration leaving the schema in an inconsistent state
- Replica lag if using read replicas
The health check endpoint above covers basic connectivity. Additionally:
- Response time monitoring — Database slowness shows as increased HTTP response times
- Monitor data-driven endpoints — An endpoint that queries the database fails or slows when the database has issues
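One way to surface database slowness directly, rather than inferring it from page latency, is to time each dependency check. The sketch below is framework-agnostic: timed_check and CheckResult are hypothetical names, the 0.5-second threshold is an arbitrary example, and in a Django project the callable passed in would be something like a function that runs SELECT 1.

```python
import time
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    ok: bool        # the check ran without raising
    slow: bool      # the check took longer than the threshold
    seconds: float  # measured duration


def timed_check(
    check: Callable[[], object],
    slow_after: float = 0.5,
    clock: Callable[[], float] = time.perf_counter,
) -> CheckResult:
    """Run a dependency check and report both success and duration."""
    start = clock()
    try:
        check()
        ok = True
    except Exception:
        ok = False
    seconds = clock() - start
    return CheckResult(ok=ok, slow=seconds > slow_after, seconds=seconds)
```

Reporting "slow" separately from "ok" lets an alert fire on degradation before the database starts failing outright.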
5) Static and Media Files
Django's static file serving breaks in common ways:
- collectstatic not run after deploy: static files are missing or stale
- WhiteNoise not configured correctly: /static/ returns 404
- Nginx misconfigured: media files at /media/ are not served
- S3 or CDN permissions: uploaded files return 403
Monitor:
- HTTP check on a known static asset: e.g., https://yourapp.com/static/css/main.css
- Content validation on pages: verify pages load correctly (broken static files affect rendering)
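A static asset check is more useful when it looks past the status code, since a misconfigured server can answer 200 with an HTML fallback page or an empty file. The following is a rough sketch of such a validation; static_asset_ok is a hypothetical helper, and the expected content type is an assumption that depends on which asset you pick.

```python
def static_asset_ok(
    status_code: int,
    content_type: str,
    body: bytes,
    expected_type: str = "text/css",
) -> bool:
    """Validate a static asset response beyond the bare status code."""
    if status_code != 200:
        return False
    # Headers often carry a charset suffix, e.g. "text/css; charset=utf-8"
    if not content_type.lower().startswith(expected_type):
        return False
    # An empty file usually points to a broken build or collectstatic run
    return len(body.strip()) > 0
```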
6) SSL and Domain
- SSL certificate monitoring — Alert before expiry, critical for apps using Let's Encrypt
- DNS monitoring — Verify domain resolution
- HTTPS redirect — Confirm HTTP redirects to HTTPS
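For a sense of what "alert before expiry" involves, the sketch below reads a certificate's notAfter field with Python's standard ssl module and converts it to days remaining. The function names and the 14-day warning window are assumptions for illustration; a hosted SSL monitor does the equivalent for you.

```python
import socket
import ssl

SECONDS_PER_DAY = 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. "Jan  5 09:34:43 2018 GMT"."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and the certificate expiry."""
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def expires_soon(not_after: str, now: float, warn_days: float = 14.0) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now) <= warn_days
```

In use, something like expires_soon(fetch_not_after("yourapp.com"), time.time()) would run on a schedule; note that the handshake itself raises if the certificate has already expired, which is also worth treating as an alert.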
7) Post-Deployment Validation
Django deployments commonly break due to:
- Missing environment variable in production
- Migration not run after deploy
- collectstatic not run
- Gunicorn/uWSGI not restarted after code update
- Celery workers not restarted, still running old code
After every deployment:
    #!/bin/bash
    # deploy.sh
    set -euo pipefail  # Stop at the first failing step

    python manage.py migrate --noinput
    python manage.py collectstatic --noinput
    supervisorctl restart gunicorn celery celerybeat

    # Validate the app is healthy
    curl -sf https://yourapp.com/health/ || exit 1

    # Signal deploy completed
    curl -fsS https://heartbeat.web-alert.io/your-deploy-id
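Right after a restart, the app server may need a few seconds before it accepts requests, so a single health request can fail even when the deploy is fine. A small retry loop is one way to handle that. The sketch below is an assumption-laden illustration: wait_until_healthy is a hypothetical helper, and the attempt count and delay are example values to tune for your own startup time.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 10,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it passes or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Give the freshly restarted processes time to come up
            sleep(delay)
    return False
```

The check callable would wrap whatever request your deploy uses against /health/; the deploy-completed heartbeat should only be sent when this returns True.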
Common Django Failure Modes
| Failure | User Impact | Detection Method |
|---|---|---|
| Celery worker crashed | Emails, jobs, notifications stop | Heartbeat from worker task |
| Celery Beat stopped | Periodic tasks stop running | Heartbeat from Beat schedule |
| Database connection exhausted | 500 errors on data-driven pages | Health endpoint + HTTP monitoring |
| Cache (Redis) down | Slow pages, session loss | Health endpoint + response time |
| Gunicorn not restarted after deploy | Old code still running | Post-deploy content validation |
| Missing env variable after deploy | Partial functionality, 500 errors | HTTP check + content validation |
| collectstatic not run | Broken CSS/JS, missing images | Content validation |
| Migration not applied | 500 errors on changed schema | Post-deploy health check |
| SSL certificate expired | Browser blocks the site | SSL monitoring |
| Static storage permissions wrong | 403 on static/media files | HTTP check on static asset URL |
| Disk full | Log write failures, Gunicorn crashes | HTTP check fails with a 5xx error |
Monitoring by Deployment Setup
Gunicorn + Nginx (Most Common)
Internet → Nginx (port 80/443) → Gunicorn (port 8000) → Django
- HTTP check on public URL: tests the entire stack
- TCP port check on port 8000: tests Gunicorn directly (if accessible)
- HTTP check on /health/: tests database and cache
- HTTP check on a static asset: tests Nginx static file serving
- Heartbeat for Celery worker and Beat
Docker / Container Deployments
- HTTP check on public URL: tests the exposed container
- Health check endpoint: included in the Docker HEALTHCHECK directive
- Heartbeat for Celery containers
- Response time monitoring: container restarts cause brief latency spikes
Heroku / PaaS
- HTTP check on app URL: tests dyno health
- Heartbeat for worker dynos (Celery)
- Response time alerts: Heroku puts idle Eco dynos to sleep, causing cold starts
- SSL monitoring: Heroku uses shared certificates; custom domain certs need monitoring
Cloud (AWS, GCP, Azure) with Auto-Scaling
- Multi-region HTTP checks: verify the load balancer distributes correctly
- Health endpoint: load balancer health checks should use the /health/ endpoint
- Heartbeat for SQS/Pub-Sub-based Celery workers
- Response time: auto-scaling lag causes temporary performance degradation
Practical Setup
Minimum for every Django app
- HTTP check on homepage: 1-minute interval, content validation
- HTTP check on /health/: validates DB and cache connectivity
- Celery worker heartbeat: every 5 minutes
- Celery Beat heartbeat: every 5 minutes (separate from worker)
- SSL monitoring on all domains
Comprehensive setup
All of the above, plus:
- HTTP checks on critical API routes — With response body validation
- HTTP check on a static asset — Catch collectstatic failures
- Post-deploy validation heartbeat — Confirms deploy completed cleanly
- Response time alerts — Detect database and cache performance regressions
- Multi-region checks — Verify the app works globally
- DNS monitoring — Catch domain misconfiguration
How Webalert Helps
Webalert monitors your Django application across every layer:
- 60-second HTTP checks from global regions — catch Gunicorn failures fast
- Content validation — verify pages return correct content, not Django error pages
- Heartbeat monitoring — track Celery workers, Beat scheduler, and management commands
- SSL monitoring — catch certificate issues before they block users
- Response time tracking — detect database and cache performance regressions
- DNS monitoring — verify domain resolution
- Multi-channel alerts — Email, SMS, Slack, Discord, Teams, webhooks
See features and pricing for details.
Summary
- Django apps have multiple layers beyond the web: Celery workers, Beat scheduler, database, cache, and static files.
- HTTP uptime checks only cover Gunicorn and the view layer. Use heartbeat monitoring for Celery.
- A /health/ endpoint should test database and cache connectivity, not just that Django boots.
- Run collectstatic and migrations as part of every deployment, then validate with a post-deploy check.
- Monitor Celery worker and Beat scheduler separately: both can fail independently.
- Start with homepage + health endpoint + worker heartbeat + Beat heartbeat + SSL.
Your views handle requests. Monitoring proves the entire application is working.