
Every engineering team wants to ship faster and break things less. The problem is that most teams have no objective way to know whether they're actually getting better or just busier. That's the gap DORA metrics fill: four numbers, backed by years of research, that measure how well a team delivers and operates software, without descending into vanity metrics like lines of code or story points.
This guide explains the four DORA metrics from first principles — what each one measures, how they balance each other, what "good" looks like, and how to start tracking them without buying a new platform.
Where DORA Metrics Come From
DORA stands for DevOps Research and Assessment, a research program (later acquired by Google) that spent years studying what separates high-performing engineering organizations from low-performing ones. The headline finding was counterintuitive: speed and stability are not a trade-off. The best teams deploy more often and recover faster — they're better at both at once.
The research distilled performance down to four metrics, sometimes called the "four keys." Two measure throughput (how fast you deliver) and two measure stability (how reliably you operate):
- Deployment frequency — how often you ship to production.
- Lead time for changes — how long code takes to go from commit to production.
- Change failure rate — what percentage of deploys cause a problem.
- Time to restore service — how fast you recover when something breaks.
The point of tracking all four together is that they keep each other honest. Optimizing one in isolation is easy and usually counterproductive.
Metric 1: Deployment Frequency
Deployment frequency measures how often you successfully release to production. Elite teams deploy on demand — often multiple times per day — while lower performers deploy weekly, monthly, or less.
Why it matters: frequent, small deployments are safer, not riskier. A change set of ten lines is trivial to review, test, and roll back. A quarterly release bundling six months of work is a high-stakes event where any one of hundreds of changes can bring the system down, and bisecting the culprit is a nightmare.
How to measure it well:
- Count production deployments, not merges or builds. A deploy that never reaches users doesn't count.
- Track the trend, not a single number. Rising frequency with stable failure rate is the signal you want.
- Watch for batching — if frequency is low because changes pile up, that's a deployment risk waiting to happen.
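As a rough illustration of counting production deploys and watching the trend, the sketch below buckets deploy timestamps by ISO week. The `deploys` list and the `deploys_per_week` helper are hypothetical; in practice the timestamps would come from whatever your pipeline logs for successful production releases.

```python
from collections import Counter
from datetime import datetime


def deploys_per_week(deploy_times):
    """Count successful production deploys per ISO (year, week)."""
    counts = Counter()
    for ts in deploy_times:
        iso = ts.isocalendar()  # (ISO year, ISO week number, weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Hypothetical timestamps of successful production deploys.
deploys = [
    datetime(2024, 3, 4, 10, 15),
    datetime(2024, 3, 5, 16, 40),
    datetime(2024, 3, 7, 9, 5),
    datetime(2024, 3, 12, 14, 30),
]

weekly = deploys_per_week(deploys)
for (year, week), count in weekly.items():
    print(f"{year}-W{week:02d}: {count} deploys")
```

Plotting these weekly counts next to change failure rate is one simple way to see whether frequency is rising while stability holds.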
Metric 2: Lead Time for Changes
Lead time for changes measures how long it takes a commit to reach production. It's the clock from "code committed" to "code running for users" — the latency of your delivery pipeline.
Short lead times mean fast feedback. A bug fix committed in the morning that's live by lunch lets you respond to incidents and customer needs in hours instead of weeks. Long lead times usually point to manual approval gates, flaky test suites, slow CI/CD pipelines, or heavyweight release processes.
How to measure it well:
- Measure from commit to production deploy, not from ticket creation — you're measuring the pipeline, not planning.
- Use a median or percentile, not an average, so one stalled change doesn't distort the picture.
- Break it down by stage (review, test, deploy) to find the actual bottleneck before you try to fix it.
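To see why a median or percentile is preferred over an average, here is a minimal sketch that computes lead time from hypothetical (commit time, deploy time) pairs. The data and the `percentile` helper (a simple nearest-rank method) are assumptions for illustration, not a prescribed DORA calculation.

```python
import math
from datetime import datetime
from statistics import mean


def lead_times_hours(changes):
    """Hours from commit to production deploy for each change."""
    return [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in changes
    ]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical (committed_at, deployed_at) pairs; the last change stalled for five days.
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 11, 0)),
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 13, 0)),
    (datetime(2024, 3, 5, 9, 0), datetime(2024, 3, 5, 13, 0)),
    (datetime(2024, 3, 5, 12, 0), datetime(2024, 3, 5, 17, 0)),
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 6, 9, 0)),
]

hours = lead_times_hours(changes)
p50 = percentile(hours, 50)
p90 = percentile(hours, 90)
avg = mean(hours)
print(f"median={p50}h  p90={p90}h  mean={avg:.1f}h")
```

With this sample the median stays at a few hours while the single stalled change drags the mean past a full day, which is the distortion the median avoids. The 90th percentile still surfaces the stalled change so it does not go unnoticed.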
Metric 3: Change Failure Rate
Change failure rate is the percentage of deployments that result in a degraded service — a rollback, hotfix, incident, or outage. If you deploy 100 times and 5 of those cause a problem, your change failure rate is 5%.
This is the first of the two stability metrics, and it's the counterweight to deployment frequency. Anyone can deploy more often by skipping tests — change failure rate is what stops that from looking like progress. Elite teams keep this number low (commonly cited as 0–15%) while deploying frequently.
How to measure it well:
- Define "failure" clearly: a deploy that needs a rollback, a hotfix, or triggers an incident. Be consistent.
- Express it as a ratio of failed deploys to total deploys, not an absolute count.
- Don't confuse it with bug count — it's specifically about change-induced failures, which is what tells you whether your delivery process is safe.
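The ratio itself is simple arithmetic. The sketch below reproduces the 5-in-100 example, assuming a hypothetical `Deploy` record where each deploy has already been tagged according to your team's definition of failure.

```python
from dataclasses import dataclass


@dataclass
class Deploy:
    deploy_id: str
    # True if this deploy needed a rollback or hotfix, or triggered an incident
    # (hypothetical flag; use whatever definition your team has agreed on).
    caused_failure: bool


def change_failure_rate(deploys):
    """Failed deploys divided by total deploys, or None if there were no deploys."""
    if not deploys:
        return None
    failed = sum(1 for d in deploys if d.caused_failure)
    return failed / len(deploys)


# 100 hypothetical deploys, 5 of which caused a problem.
deploys = [Deploy(f"d{i}", caused_failure=(i < 5)) for i in range(100)]

rate = change_failure_rate(deploys)
print(f"change failure rate: {rate:.0%}")
```

Returning `None` for an empty period is a design choice in this sketch: it keeps "no deploys" distinct from "no failures", which would otherwise both read as 0%.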
Metric 4: Time to Restore Service
Time to restore service measures how quickly you recover from a failure in production. It's closely related to MTTR — the mean time to restore — and it's the metric that acknowledges a hard truth: failures are inevitable, so what matters is how fast you bounce back.
A team that recovers in minutes can tolerate a higher change failure rate than one that takes hours, because the impact of each failure is small. This is why time to restore and change failure rate are read together: low failure rate with slow recovery is fragile; higher failure rate with fast recovery can be perfectly healthy.
How to measure it well:
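- Start the clock at detection and stop it when service is restored. Time a failure spends undetected falls outside that window, so also record when impact began if you want detection speed reflected in your numbers.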
- Invest in the things that shrink it: good alerting, runbooks, clear escalation paths, and fast rollbacks.
- Track it per incident and look at the distribution, not just the mean — a single multi-hour outage tells you more than the average ever will.
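A small sketch of looking at the distribution rather than only the mean, using hypothetical (detected, restored) timestamp pairs per incident. The incident data is invented for illustration.

```python
from datetime import datetime
from statistics import mean, median


def restore_minutes(incidents):
    """Minutes from detection to service restored, one value per incident."""
    return [
        (restored - detected).total_seconds() / 60
        for detected, restored in incidents
    ]


# Hypothetical incidents: three quick recoveries and one overnight outage.
incidents = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 10)),
    (datetime(2024, 3, 6, 14, 0), datetime(2024, 3, 6, 14, 15)),
    (datetime(2024, 3, 11, 9, 30), datetime(2024, 3, 11, 9, 50)),
    (datetime(2024, 3, 15, 22, 0), datetime(2024, 3, 16, 3, 0)),
]

durations = restore_minutes(incidents)
summary = {
    "median": median(durations),
    "mean": mean(durations),
    "worst": max(durations),
}
print(summary)
```

Here the mean lands somewhere between the typical recovery and the worst one and describes neither well. Reporting the median alongside the worst case keeps the long outage visible instead of averaging it away.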
How the Four Metrics Work Together
The reason DORA uses four metrics, not one, is that each metric is paired with another that exposes attempts to game it:
| If you optimize... | ...without watching... | You get |
|---|---|---|
| Deployment frequency | Change failure rate | Fast, reckless shipping |
| Lead time | Change failure rate | Rushed, fragile releases |
| Change failure rate | Deployment frequency | Slow, fear-driven delivery |
| Throughput overall | Time to restore | Speed with no safety net |
The DORA research groups teams into performance tiers — commonly Elite, High, Medium, and Low — based on all four. Elite teams deploy on demand, have lead times under a day, keep change failure rate low, and restore service in under an hour. But the tier label matters far less than your own trend: are all four moving in the right direction over time?
How to Start Tracking DORA Metrics
You don't need a dedicated platform to begin:
- Deployment frequency and lead time come from your CI/CD system and version control — most pipelines already log deploy events and commit timestamps.
- Change failure rate comes from tagging which deploys caused incidents — a simple label in your incident tracker is enough to start.
- Time to restore comes from your incident timeline: detection timestamp to resolution timestamp.
Start with a spreadsheet if you have to. The discipline of consistently defining and recording the four events matters far more than the tooling. Once the definitions are stable, automating the collection is straightforward.
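One possible shape for such a spreadsheet is a single row per production deploy, with the failure columns filled in only when that deploy caused a problem. The column names and the `summarize` helper below are assumptions for illustration; any consistent layout that captures the same four events would work.

```python
import csv
import io
from datetime import datetime
from statistics import median

# Hypothetical spreadsheet export: one row per production deploy.
# detected_at / restored_at are filled in only when the deploy caused a failure.
SHEET = """\
deploy_id,committed_at,deployed_at,detected_at,restored_at
d1,2024-03-04T09:00,2024-03-04T11:00,,
d2,2024-03-05T10:00,2024-03-05T14:00,2024-03-05T14:10,2024-03-05T14:40
d3,2024-03-06T08:00,2024-03-06T09:00,,
d4,2024-03-07T13:00,2024-03-07T16:00,,
"""


def parse(ts):
    """Parse an ISO 8601 timestamp, or return None for an empty cell."""
    return datetime.fromisoformat(ts) if ts else None


def summarize(csv_text):
    """Derive the four metrics from the sheet (assumes at least one deploy row)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lead_hours = [
        (parse(r["deployed_at"]) - parse(r["committed_at"])).total_seconds() / 3600
        for r in rows
    ]
    failures = [r for r in rows if r["detected_at"]]
    restore_min = [
        (parse(r["restored_at"]) - parse(r["detected_at"])).total_seconds() / 60
        for r in failures
    ]
    return {
        "deploys": len(rows),
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": len(failures) / len(rows),
        "median_restore_minutes": median(restore_min) if restore_min else None,
    }


report = summarize(SHEET)
print(report)
```

Deployment frequency follows from the deploy count once you fix the reporting period, for example one sheet per week or month.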
How Webalert Helps
DORA's two stability metrics depend entirely on detecting and confirming failures quickly — and that's where outside-in monitoring earns its place:
- Faster detection shrinks time to restore — Webalert checks your service from multiple regions and catches outages the moment users would, often before internal dashboards notice.
- Deploy-aware monitoring — correlate failures with releases so you can attribute them to a specific change and measure change failure rate accurately.
- Content validation, not just status codes, so a broken deploy that still returns 200 OK doesn't slip through as a success.
- Status pages and alerting to coordinate the response and stop the recovery clock as fast as possible.
Webalert won't compute your DORA dashboard for you, but it sharpens the two metrics that hurt most when they slip: how fast you know something broke, and how fast you can prove it's fixed.
Summary
The four DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) are among the most research-backed ways to measure software delivery performance. Two measure throughput, two measure stability, and the power is in tracking all four together so neither speed nor safety gets gamed.
You don't need new tooling to start — just consistent definitions and a place to record four events. Watch your own trend over time rather than chasing a tier label, and remember the core DORA insight: the best teams aren't choosing between fast and stable. They're getting better at both.