
"We're on AWS, so we get 99.99% uptime." This sentence, spoken confidently in countless planning meetings, is wrong in at least three ways. Cloud SLAs are per-service, not per-provider; they often require specific architectures (multi-AZ, multi-region) to even apply; and the "guarantee" pays out in service credits worth a fraction of what an outage actually costs you.
This guide cuts through the confusion: what AWS, Azure, and GCP actually promise, how their SLA tiers compare, how service credits really work, the fine print that voids your claim, and, most importantly, why the provider's SLA tells you almost nothing about your real availability.
For the underlying math of what each percentage means in downtime, see uptime SLA explained and the downtime calculator guide.
First: An SLA Is a Refund Policy, Not a Guarantee
The single most important reframe. A cloud SLA does not guarantee your service stays up. It defines:
- A target availability (e.g. 99.99% monthly).
- What the provider owes you if they miss it — almost always service credits, not cash, not consequential damages.
If AWS, Azure, or GCP blows your SLA during a major outage, you do not get compensated for your lost revenue, your reputation, or your engineers' weekend. You get a percentage discount on next month's bill for the affected service — and only if you file a claim, in time, with the right evidence.
So treat the SLA as the provider's statement of confidence and refund policy, never as your reliability plan.
The Availability Tiers (And What They Cost You in Downtime)
Cloud SLAs cluster around a handful of tiers. Here's what each allows:
| SLA | Downtime / month | Downtime / year | Typical cloud usage |
|---|---|---|---|
| 99.9% ("three nines") | ~43m 50s | ~8h 46m | Single-instance / single-AZ services |
| 99.95% | ~21m 55s | ~4h 23m | Many managed services baseline |
| 99.99% ("four nines") | ~4m 23s | ~52m 36s | Multi-AZ / zone-redundant configs |
| 99.999% ("five nines") | ~26s | ~5m 16s | Rare; specific premium/networking tiers |
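Note how much architecture changes the number: the same service often offers 99.9% on a single instance and 99.99% only when deployed across multiple availability zones. The higher tier is something you have to architect for, not something you're handed. (The monthly figures assume an average month of about 30.44 days; a 30-day month allows slightly less, e.g. 43m 12s at 99.9%.)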
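If you want to check a provider's percentage yourself, the conversion is just the unavailable fraction multiplied by the length of the period. The sketch below assumes an average Gregorian year (365.25 days) and a month of one twelfth of that, which reproduces the table's figures to within a second; `allowed_downtime` and `fmt` are illustrative helper names, not part of any library.

```python
from datetime import timedelta

# Average Gregorian year and month (assumed basis for the table above).
HOURS_PER_YEAR = 365.25 * 24           # 8766 h
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730.5 h, about 30.44 days


def allowed_downtime(sla_percent: float, period_hours: float) -> timedelta:
    """Maximum downtime an SLA permits over a period of `period_hours`."""
    unavailable_fraction = 1 - sla_percent / 100
    return timedelta(hours=period_hours * unavailable_fraction)


def fmt(td: timedelta) -> str:
    """Render like the table: 'Xh Ym' at an hour or more, else 'Ym Zs' or 'Zs'."""
    secs = td.total_seconds()
    if secs >= 3600:
        hours, minutes = divmod(round(secs / 60), 60)
        return f"{hours}h {minutes}m"
    minutes, seconds = divmod(round(secs), 60)
    return f"{minutes}m {seconds}s" if minutes else f"{seconds}s"


for sla in (99.9, 99.95, 99.99, 99.999):
    month = fmt(allowed_downtime(sla, HOURS_PER_MONTH))
    year = fmt(allowed_downtime(sla, HOURS_PER_YEAR))
    print(f"{sla}%: ~{month} / month, ~{year} / year")
```

Swap `HOURS_PER_MONTH` for `30 * 24` if a provider measures over calendar months and you want the stricter figure for a 30-day month.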
AWS, Azure & GCP: How They Compare
The headline pattern is similar across all three, but the details differ.
AWS
- Per-service SLAs. EC2, RDS, S3, Lambda, etc. each have their own SLA document. There is no single "AWS SLA."
- Architecture-tiered. EC2, for example, offers a higher SLA for instances deployed across multiple AZs in a region than for a single instance.
- Credits: tiered — the further below target, the larger the credit percentage on the affected service.
- Claims: you must submit a claim (typically via support) within a defined window, with logs/timestamps.
Microsoft Azure
- Per-service SLAs, published centrally, often with explicit redundancy tiers (e.g. VMs: a higher SLA with availability zones / multiple instances, a lower one for single-instance with premium storage).
- Composite SLAs matter most. Azure documentation explicitly teaches that when you chain services (App Service + SQL + Storage), your effective SLA is the product of the components' SLAs — which is always lower than any single one.
- Credits: percentage of the service fee, scaling with how badly the target was missed.
Google Cloud (GCP)
- Per-service SLAs, with Covered Service definitions and explicit exclusions.
- Monthly uptime percentage is the standard measure; credits scale in tiers.
- Like the others, single-zone vs multi-zone / regional vs multi-regional deployments carry different commitments (e.g. AlloyDB, Cloud SQL, GCS classes).
- Credits: percentage of the bill for the covered service, applied to future billing.
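All three providers use the same basic shape: the further measured monthly uptime falls below target, the larger the percentage credited against that service's fee. The sketch below models that shape with a made-up tier table (`EXAMPLE_TIERS` is purely illustrative; real thresholds and percentages differ per service and per provider, so read the specific SLA). The point it makes is the one from earlier: the credit is a share of the affected service's bill, not of your losses.

```python
# Illustrative tiers only: (minimum monthly uptime %, credit % of service fee).
# Real thresholds and credit percentages vary by service and provider.
EXAMPLE_TIERS = [
    (99.99, 0),   # target met: no credit
    (99.0, 10),   # below target but at least 99.0%
    (95.0, 30),   # below 99.0% but at least 95.0%
    (0.0, 100),   # below 95.0%
]


def credit_percent(monthly_uptime: float, tiers=EXAMPLE_TIERS) -> int:
    """Return the credit % of the first tier whose floor the uptime meets."""
    for floor, credit in tiers:  # tiers are ordered from highest floor down
        if monthly_uptime >= floor:
            return credit
    return tiers[-1][1]


def credit_amount(monthly_uptime: float, service_fee: float) -> float:
    """Credit owed, as a share of the affected service's monthly fee only."""
    return service_fee * credit_percent(monthly_uptime) / 100


# Hypothetical month: 99.5% uptime (about 3.65 hours down) on a $2,000 service bill.
print(credit_amount(99.5, 2000.0))
```

Compare the result with what 3.65 hours of downtime costs your business; for most revenue-generating services the credit is a small fraction of it.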
The honest summary
For equivalent services and architectures, AWS, Azure, and GCP offer broadly comparable SLAs — typically 99.9% to 99.99% depending on redundancy. None of them meaningfully "wins" on paper. The differences that matter are in the exclusions, the claim process, and how composite SLAs stack — not in the headline number.
The Fine Print That Voids Your Claim
Every provider's SLA excludes large categories of downtime from counting against them. Read these carefully, because they're where claims die:
- Your fault. Misconfiguration, exceeding quotas/limits, your own code — excluded.
- Things outside their control. Internet/backbone issues, your own ISP, DNS you manage, force majeure.
- Beta / preview services. Almost never covered by an SLA.
- Scheduled maintenance. Often excluded or handled under separate notice terms.
- Not using required redundancy. If the 99.99% tier requires multi-AZ and you ran single-AZ, you get the lower tier (or nothing).
- Suspended/unpaid accounts. No credits while in arrears.
- Claim windows. Miss the filing deadline (often 30–60 days) and you forfeit the credit even for a legitimate breach.
The recurring theme: the burden of proof is on you. You must detect the outage, document it with timestamps and evidence, and file in time. If your own monitoring didn't catch it, you can't prove it happened.
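Turning your own monitoring history into a monthly uptime figure is straightforward once you have timestamped outage intervals. The sketch below assumes your monitor exports outages as `(start, end)` datetime pairs (a hypothetical format; adapt it to whatever your tool produces). It merges overlapping records so the same minutes aren't counted twice and clips outages that straddle the month boundary.

```python
from datetime import datetime, timezone


def merge(intervals):
    """Merge overlapping (start, end) outage intervals so time isn't double-counted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def monthly_uptime_percent(outages, month_start, month_end):
    """Uptime % for [month_start, month_end), counting only outage time inside it."""
    period = (month_end - month_start).total_seconds()
    down = 0.0
    for start, end in merge(outages):
        s, e = max(start, month_start), min(end, month_end)  # clip to the month
        if e > s:
            down += (e - s).total_seconds()
    return 100 * (1 - down / period)


UTC = timezone.utc
march_start = datetime(2025, 3, 1, tzinfo=UTC)
march_end = datetime(2025, 4, 1, tzinfo=UTC)
# Hypothetical records: two overlapping alerts for one 45-minute incident.
outages = [
    (datetime(2025, 3, 10, 2, 0, tzinfo=UTC), datetime(2025, 3, 10, 2, 30, tzinfo=UTC)),
    (datetime(2025, 3, 10, 2, 20, tzinfo=UTC), datetime(2025, 3, 10, 2, 45, tzinfo=UTC)),
]
print(f"{monthly_uptime_percent(outages, march_start, march_end):.4f}%")
```

Treat the result as supporting evidence rather than the number the provider will use: each SLA defines "unavailable" in its own way, so attach the raw timestamps to the claim as well.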
Composite SLAs: Why Your Real Target Is Lower Than You Think
This is the calculation almost nobody does. When your app depends on multiple services in series, availability multiplies:
Effective SLA = SLA_compute × SLA_database × SLA_storage × ...
Worked example — three "99.99%" dependencies in the request path:
0.9999 × 0.9999 × 0.9999 = 0.9997 → 99.97%
Three four-nines services chained together yield 99.97%, not 99.99% — that's ~2.6 hours of allowed annual downtime instead of ~53 minutes. Add a CDN, a queue, an auth provider, and a payment gateway, and your real composite target erodes further.
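The worked example is easy to reproduce for your own dependency chain. The sketch below multiplies the availabilities of components that must all be up (series) and converts the result into allowed annual downtime, assuming a 365.25-day year; `serial_availability` is an illustrative name.

```python
from math import prod

HOURS_PER_YEAR = 365.25 * 24  # 8766 h


def serial_availability(*slas: float) -> float:
    """Every dependency in the request path must be up, so availabilities multiply."""
    return prod(slas)


composite = serial_availability(0.9999, 0.9999, 0.9999)
annual_downtime_h = (1 - composite) * HOURS_PER_YEAR
print(f"{composite:.4%} composite, ~{annual_downtime_h:.2f} h/year allowed downtime")
```

Add one argument per extra dependency (CDN, queue, auth, payments) using the SLA published for that specific service.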
Redundancy works the other way: for components in parallel, it is the failure probabilities that multiply, so availability goes up. That is exactly why multi-AZ and multi-region architectures exist. But the default assumption ("all my 99.99% services give me 99.99%") is simply false. Model your dependency chain. The SLO and error-budget guide shows how to turn this into an operational target.
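For the parallel case, the system is down only when every replica is down at once, so availability is one minus the product of the replicas' unavailabilities. The sketch below assumes replica failures are independent, which is the big caveat: a region-wide or control-plane outage takes out all replicas together, so treat the result as an upper bound. The numbers are hypothetical.

```python
def parallel_availability(replica_availability: float, replicas: int) -> float:
    """Down only if every replica is down; assumes independent failures."""
    return 1 - (1 - replica_availability) ** replicas


# Hypothetical: two 99.9% instances in different AZs, in series with a 99.99% database.
compute = parallel_availability(0.999, 2)
end_to_end = compute * 0.9999
print(f"compute tier: {compute:.6%}, end to end: {end_to_end:.4%}")
```

Notice that once the compute tier is redundant, the single 99.99% database becomes the dominant term in the chain.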
Why The Provider SLA Is Not Your Uptime
Even a perfectly-met cloud SLA does not equal what your users experience, because the provider only measures their slice:
- They measure the service, not your app. Your code, your config, your bad deploy, your expired cert, your DNS — all yours, none covered.
- They measure their definition of "available." A service can be "up" by the SLA's narrow definition while your users get errors.
- Regional blind spots. A provider region can be healthy while your users in a specific geography can't reach you due to networking or DNS.
- The last mile. CDN, TLS, DNS resolution, and the public internet sit between the cloud and your user — none of it in the provider's SLA.
This is the core reason you need independent, outside-in monitoring: the provider grades its own homework, on its own narrow terms. Your users grade the whole experience. For why location matters here, see multi-region monitoring.
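A small illustration of the regional blind spot: aggregated over all probe locations, a geography-specific failure can look like a minor dip, while a per-region breakdown shows a large share of users cut off. The sketch assumes probe results arrive as `(region, succeeded)` pairs; the region names and counts are hypothetical.

```python
from collections import defaultdict


def availability_by_region(samples):
    """samples: iterable of (region, succeeded) outside-in probe results."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for region, ok in samples:
        totals[region] += 1
        successes[region] += int(ok)
    return {region: successes[region] / totals[region] for region in totals}


# Hypothetical probe history: the provider region is healthy, but one
# geography fails 30 of 100 checks (e.g. a DNS or routing problem).
samples = (
    [("us-east", True)] * 100
    + [("eu-west", True)] * 100
    + [("ap-south", True)] * 70
    + [("ap-south", False)] * 30
)
per_region = availability_by_region(samples)
overall = sum(ok for _, ok in samples) / len(samples)
print(per_region, f"overall={overall:.1%}")
```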
How To Actually Use Cloud SLAs
- Read the per-service SLA for every service in your critical path — not "the AWS SLA."
- Compute your composite SLA across the dependency chain. That's your real ceiling.
- Check the redundancy requirements for the tier you're assuming, and architect to meet them (multi-AZ/region) if you need it.
- Set your own SLO below the composite ceiling, with an error budget — see SLOs explained.
- Monitor independently so you can both detect breaches and file evidence-backed credit claims.
- Quantify what downtime costs you so you size redundancy rationally — the downtime cost calculator guide helps.
- Don't over-buy nines. Each extra nine cuts your allowed downtime tenfold, and the redundancy and operational effort needed to hit it typically grows far faster than that; match the tier to the actual business cost of downtime, not to vanity.
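The "SLO below the ceiling, with an error budget" step can be sketched in a few lines. This assumes a rolling 30-day window and uses the 99.97% composite ceiling from the three-dependency example; the 99.9% SLO and the 20 minutes of downtime are hypothetical inputs, and the function names are illustrative.

```python
def error_budget_minutes(slo_percent: float, ceiling_percent: float,
                         window_days: int = 30) -> float:
    """Downtime your own SLO allows over the window."""
    if slo_percent >= ceiling_percent:
        raise ValueError("SLO should sit below the composite SLA ceiling")
    return window_days * 24 * 60 * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, ceiling_percent: float,
                     downtime_minutes: float, window_days: int = 30) -> float:
    """Minutes of error budget left after the downtime already observed."""
    budget = error_budget_minutes(slo_percent, ceiling_percent, window_days)
    return budget - downtime_minutes


# Hypothetical: 99.97% composite ceiling, 99.9% SLO, 20 minutes already burned.
print(error_budget_minutes(99.9, 99.97), budget_remaining(99.9, 99.97, 20))
```

The guard encodes the rule from the list: an SLO at or above the composite ceiling promises more than your dependencies can deliver.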
Quick Reference
- A cloud SLA is a service-credit refund policy, not a guarantee or insurance for your losses.
- SLAs are per-service and architecture-tiered — 99.99% usually requires multi-AZ/region.
- AWS, Azure, and GCP are broadly comparable; exclusions and claim rules matter more than the headline %.
- Composite SLAs multiply down — chained 99.99% services give less than 99.99%.
- The provider SLA is not your uptime; only independent monitoring tells you (and proves) what users actually got.
How Webalert Helps
A cloud SLA is only useful if you can detect breaches and prove them. Webalert is the independent witness:
- Outside-in uptime monitoring that measures what users experience — not the provider's self-graded metric.
- Multi-region checks to catch regional and last-mile failures the cloud SLA ignores — see multi-region monitoring.
- Timestamped incident history you can attach to a service-credit claim.
- TLS, DNS, and domain monitoring for the failure categories no cloud SLA covers.
- Status pages and SLA reporting so you can track your real availability against your own SLO, not just the provider's promise.
Pair it with the uptime SLA explainer to translate any provider's percentage into concrete allowed downtime.
Summary
Cloud SLAs from AWS, Azure, and GCP look reassuring and are broadly similar, but they're refund policies, not guarantees: per-service, architecture-tiered, full of exclusions, and payable only in service credits you must claim with evidence. Worse, chaining services multiplies your real availability target downward, and the provider's number reflects their narrow slice of the stack — not your users' end-to-end experience.
Use the SLAs to set a realistic ceiling, architect redundancy to the tier you actually need, and then monitor independently — because the only availability number that matters is the one your users (and your own monitoring) measure.