
When you stand up monitoring for a new service, the hardest question isn't how to collect metrics — it's which metrics actually matter. Two well-known frameworks answer that question from opposite ends: the RED method looks at a service from the outside (what requests experience), and the USE method looks at a resource from the inside (how full it is). They're often pitted against each other, but the truth is they're complementary — and the best monitoring uses both.
This guide explains what each method measures, where each shines and fails, and how to combine them so you cover both "are my users happy?" and "is my infrastructure about to fall over?"
The RED Method
The RED method, popularized by Tom Wilkie, is request-centric. For every service, you track three things:
- Rate — requests per second the service is handling.
- Errors — the number (or rate) of those requests that fail.
- Duration — the distribution of time those requests take.
RED is designed for request-driven services: web apps, APIs, microservices — anything that handles a stream of requests and returns responses. Its great strength is uniformity: every service is measured the same three ways, so a microservices fleet of 200 services has one consistent dashboard shape. That consistency is what makes RED scale across a large architecture — anyone can read any service's dashboard instantly.
If RED looks familiar, it should: it's essentially the four golden signals minus saturation. Rate ≈ traffic, Errors ≈ errors, Duration ≈ latency. RED drops saturation deliberately to stay simple and service-focused.
Measuring RED well:
- Track Duration as percentiles (p50/p95/p99), never averages — see latency percentiles explained.
- Express Errors as a ratio of total requests, and validate content, not just status codes — see 5xx error monitoring.
- Separate the duration of successful requests from that of failed ones, so fast-failing requests don't make your latency look better than it is.
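The three measurements above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Request` record, the `red_summary` function, and the nearest-rank `percentile` helper are all hypothetical names chosen for this example, and a real service would normally get these numbers from its metrics library rather than from raw request lists.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    ok: bool  # False if the status code or content check failed


def percentile(values, p):
    """Nearest-rank percentile for integer p in 1..100; None if no data."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def red_summary(requests, window_seconds):
    """Summarize Rate, Errors, and Duration over one time window."""
    total = len(requests)
    failed = sum(1 for r in requests if not r.ok)
    # Keep successful and failed durations apart so fast errors
    # cannot pull the success latency down.
    ok_ms = [r.duration_ms for r in requests if r.ok]
    failed_ms = [r.duration_ms for r in requests if not r.ok]
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": failed / total if total else 0.0,
        "ok_p50_ms": percentile(ok_ms, 50),
        "ok_p95_ms": percentile(ok_ms, 95),
        "ok_p99_ms": percentile(ok_ms, 99),
        "failed_p99_ms": percentile(failed_ms, 99),
    }


# Illustrative data: 95 fast successes, 4 slow successes, 1 fast failure.
window = (
    [Request(100.0, True)] * 95
    + [Request(2000.0, True)] * 4
    + [Request(5.0, False)]
)
summary = red_summary(window, window_seconds=10)
```

With this made-up window, the p50 and p95 of successful requests are both 100 ms while the p99 is 2000 ms, which is the tail that an average would blur.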
The USE Method
The USE method, created by performance engineer Brendan Gregg, is resource-centric. For every resource (CPU, memory, disk, network interface, connection pool), you check three things:
- Utilization — the percentage of time the resource is busy.
- Saturation — the degree to which work is queued because the resource can't keep up.
- Errors — the count of error events for that resource.
USE is designed for infrastructure and finite resources. Where RED asks "what are requests experiencing?", USE asks "which resource is the bottleneck?" It's a systematic checklist: enumerate every resource, then fill in U, S, and E for each. That exhaustiveness is its strength — it's a methodical way to find the one saturated resource causing a slowdown, rather than guessing.
Measuring USE well:
- Utilization can mislead. A resource at 100% utilization isn't necessarily a problem if there's no saturation; a resource at 60% with a growing queue is, because an average over the sampling interval can hide short bursts in which work piles up. Saturation is usually the more important of the two.
- Saturation is the leading indicator — queue depth, run-queue length, swap activity. It tells you what's about to break.
- Enumerate resources you don't normally think about: connection pools, file descriptors, thread pools, and downstream rate limits are common hidden bottlenecks.
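As a rough sketch of the checklist above, the following Python fills in U, S, and E for each resource and flags the ones with queued work or errors. The `ResourceSample` fields and both function names are assumptions made for this example; how you actually obtain busy time, queue depth, and error counts depends on the resource and your tooling.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """Hypothetical reading of one resource over one sampling interval."""
    name: str
    busy_seconds: float      # time the resource spent doing work
    interval_seconds: float  # length of the sampling interval
    queue_depth: float       # average work items waiting (saturation)
    error_count: int         # error events seen in the interval


def use_report(sample):
    """Return Utilization, Saturation, and Errors for one resource."""
    return {
        "utilization": sample.busy_seconds / sample.interval_seconds,
        "saturation": sample.queue_depth,
        "errors": sample.error_count,
    }


def needs_attention(sample):
    """Flag queued work or errors; high utilization alone is not flagged."""
    report = use_report(sample)
    return report["saturation"] > 0 or report["errors"] > 0


# Illustrative samples: a fully busy CPU with no queue,
# and a disk that is only 60% busy but has work waiting.
cpu = ResourceSample("cpu", 60.0, 60.0, 0.0, 0)
disk = ResourceSample("disk", 36.0, 60.0, 7.5, 0)
flagged = [s.name for s in (cpu, disk) if needs_attention(s)]
```

In this example only the disk is flagged, even though the CPU shows the higher utilization.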
RED vs USE: Side by Side
| | RED method | USE method |
|---|---|---|
| Focus | Services / requests | Resources / infrastructure |
| Viewpoint | Outside-in (user experience) | Inside-out (system internals) |
| Metrics | Rate, Errors, Duration | Utilization, Saturation, Errors |
| Answers | "Are users being served well?" | "Which resource is the bottleneck?" |
| Best for | APIs, microservices, web apps | Hosts, databases, queues, networks |
| Alerting role | Symptom-based paging | Capacity and root-cause analysis |
| Created by | Tom Wilkie | Brendan Gregg |
The key distinction: RED measures symptoms; USE measures causes. When RED tells you the service is slow or erroring, USE helps you find why — which resource ran out of headroom.
When to Use Which
You don't choose one. You layer them:
- Use RED for your services. Every user-facing service and microservice gets a consistent Rate/Errors/Duration dashboard. This is your front line — it's what you page on, because it reflects user pain directly.
- Use USE for your resources. Every host, database, queue, and pool gets a Utilization/Saturation/Errors view. This is your diagnostic layer — where you go after a RED alert fires to localize the cause.
A typical incident flows across both: a RED dashboard shows checkout Duration p99 spiking and Errors climbing (symptom) → you jump to the USE view of the database and see the connection pool saturated with a deep queue (cause) → you scale the pool or shed load. RED told you that something is wrong; USE told you where and why.
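The incident flow above could be sketched as a two-step triage: first decide whether RED shows a symptom, then rank resources by saturation to find likely causes. Everything here is hypothetical, including the function names, the threshold defaults (500 ms and a 1% error ratio), and the resource readings, so treat it as an illustration of the layering rather than a recommended policy.

```python
def red_symptom(p99_ms, error_ratio, p99_limit_ms=500.0, error_limit=0.01):
    """True when users are likely feeling pain (thresholds are assumptions)."""
    return p99_ms > p99_limit_ms or error_ratio > error_limit


def suspects(resources):
    """Rank resources with queued work or errors, most saturated first.

    `resources` maps a resource name to a dict with
    'utilization', 'saturation', and 'errors' keys.
    """
    flagged = [
        (name, metrics)
        for name, metrics in resources.items()
        if metrics["saturation"] > 0 or metrics["errors"] > 0
    ]
    flagged.sort(key=lambda item: item[1]["saturation"], reverse=True)
    return [name for name, _ in flagged]


# Illustrative readings taken while checkout is slow and erroring.
resources = {
    "cpu": {"utilization": 0.90, "saturation": 0, "errors": 0},
    "db_pool": {"utilization": 1.00, "saturation": 42, "errors": 3},
    "disk": {"utilization": 0.30, "saturation": 2, "errors": 0},
}

if red_symptom(p99_ms=1800.0, error_ratio=0.04):
    where_to_look = suspects(resources)
else:
    where_to_look = []
```

Here the busy but unsaturated CPU is ignored, and the connection pool comes first in the list of places to look.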
For the broader symptom-oriented framework that spans both, see the four golden signals, which add saturation to RED and effectively bridge the two methods.
Common Pitfalls
- Only doing USE. Tracking CPU and memory everywhere but never measuring request latency or error rate means you'll watch a host look "healthy" while users get errors. Infrastructure metrics are not user experience.
- Only doing RED. Knowing the service is slow without resource visibility leaves you guessing at the cause during an incident, which inflates MTTR.
- Alerting on utilization. "CPU > 80%" is a classic noisy alert — high utilization is often fine. Alert on saturation and on RED symptoms instead, and avoid alert fatigue.
- Averaging duration. RED's Duration must be percentiles; an average will hide the tail your users actually feel.
- Forgetting hidden resources. The bottleneck is often a connection pool or downstream rate limit, not the obvious CPU/RAM.
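To make the averaging pitfall concrete, here is a small Python sketch with made-up numbers: 95 requests at 100 ms and 5 requests at 3000 ms. The data and variable names are purely illustrative, and the p99 uses a simple nearest-rank calculation rather than any particular monitoring system's method.

```python
import math
import statistics

# Illustrative durations: most requests are fast, a few are very slow.
durations_ms = [100.0] * 95 + [3000.0] * 5

# The average looks tolerable and says nothing about the slow requests.
mean_ms = statistics.mean(durations_ms)

# Nearest-rank p99: the value below which 99% of requests fall.
ordered = sorted(durations_ms)
p99_ms = ordered[math.ceil(99 * len(ordered) / 100) - 1]
```

With these numbers the mean is 245 ms while the p99 is 3000 ms, so a dashboard showing only the average would miss the 5% of users waiting three seconds.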
How Webalert Helps
Webalert covers the RED side from where it matters most — outside your infrastructure, from your users' vantage point:
- Rate & Duration — multi-region response-time checks with per-geography latency percentiles.
- Errors — status-code and content validation, so silent failures don't pass as success.
- Symptom-based alerting that pages on user-facing degradation, with status pages to communicate during incidents.
Pair Webalert's outside-in RED view with your internal USE-style resource metrics, and you've got both halves: confirmation that users can reach you and how fast, plus the resource visibility to explain why when they can't.
Summary
RED (Rate, Errors, Duration) measures services from the outside and tells you whether users are being served well. USE (Utilization, Saturation, Errors) measures resources from the inside and tells you which one is the bottleneck. They aren't rivals — RED is your symptom-and-paging layer, USE is your diagnostic-and-capacity layer, and the four golden signals tie them together by adding saturation to RED.
Instrument every service with RED, every resource with USE, alert on RED symptoms and saturation (never raw utilization), and you'll both notice problems fast and know where to look when they happen.