
Modern applications are mostly assembled from other people's APIs. A typical SaaS product calls Stripe for payments, Twilio or SendGrid for messaging, Auth0 or Clerk for identity, OpenAI or Anthropic for AI features, and a handful of data and infrastructure providers behind the scenes. Your own code might be solid, your uptime might be 99.99%, and your users can still be staring at a broken product because somebody else's API is down. When that happens, they don't blame the provider — they blame you. Your status page is the one they check, your support inbox is the one they email, and your SLA is the one you've now breached.
Third-party API monitoring is the practice of treating every external dependency as a first-class thing to monitor, with its own uptime, latency, and error-rate signals, so a provider outage reaches you in seconds instead of arriving as a customer complaint. This guide covers what to monitor, how to monitor it (including the authenticated and webhook cases), and how to alert without drowning in noise.
Why third-party dependencies are a reliability blind spot
Most teams under-monitor their dependencies for three reasons:
- The provider has a status page, so we assume we'll know. Provider status pages lag real outages by minutes, sometimes tens of minutes, and they almost never report partial degradation, elevated latency, or failures specific to your account or region. By the time an incident appears on their status page, your users have already noticed.
- The dependency is wrapped in a library that swallows errors. SDKs retry silently, return cached values, or raise generic exceptions that look like your own code's fault. A Stripe library retrying a transient 5xx for 30 seconds looks identical from the outside to your own service hanging.
- Your SLA is the one that counts, not theirs. A provider offering 99.9% uptime is permitted ~43 minutes of downtime a month. If you have ten such providers, your composite dependency uptime can be far worse than any individual SLA — and your users experience the composite. Read how to evaluate a vendor SLA for the math.
The fix isn't to abandon third-party APIs — you can't — it's to monitor them directly so you know, independently, when they're failing.
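To make the composite-uptime point concrete, here is a small arithmetic sketch. It assumes a 30-day month, that every dependency is required for a request to succeed, and that providers fail independently; real outages are often correlated, so treat the result as an illustration rather than a forecast. The function names are made up for this example.

```python
import math

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(uptime, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Downtime budget implied by an uptime fraction such as 0.999."""
    return (1.0 - uptime) * period_minutes


def composite_uptime(uptimes):
    """Uptime of a chain in which every dependency must be up.

    Assumes failures are independent, so the fractions simply multiply.
    """
    return math.prod(uptimes)


single_budget = allowed_downtime_minutes(0.999)       # about 43 minutes
combined = composite_uptime([0.999] * 10)             # about 0.990
combined_budget = allowed_downtime_minutes(combined)  # about 430 minutes

print(f"one provider:  {single_budget:.1f} min/month")
print(f"ten providers: {combined:.4f} uptime, {combined_budget:.0f} min/month")
```

Under these assumptions, ten providers at 99.9% each leave you with roughly 99.0% composite uptime, or about seven hours of permitted downtime a month.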
Map your dependency graph
Before you can monitor dependencies, you have to know what they are. Most teams are surprised by how many external calls their app makes once they sit down to list them. Group them by function:
- Payments & billing: Stripe, PayPal, Adyen, Braintree, Recurly, Paddle.
- Communications & email: SendGrid, Postmark, Resend, Twilio, SES, Intercom, Customer.io.
- Auth & identity: Auth0, Clerk, Okta, WorkOS, Firebase Auth, Cognito. See our identity provider monitoring guide.
- AI providers: OpenAI, Anthropic, Google, Cohere, and the vector DBs and embedding services behind them. See AI/LLM API monitoring.
- Data & infrastructure: Cloudflare, Algolia, Elastic, Upstash, Vercel, Supabase, Snowflake.
- Business systems: Salesforce, HubSpot, Stripe webhook receivers, the ERP integration, the shipping API.
For each, capture the endpoint(s) you call, the auth method, the expected latency, and the impact of failure. The impact column is what drives alert routing — a failing analytics pixel is a note in a channel; a failing payment API is a page.
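One way to keep this inventory is as a small data structure that alert routing can read. The sketch below is illustrative only: the field names, the two impact levels, and the example.com endpoints are assumptions, not a required schema.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOW = "low"    # e.g. an analytics pixel
    HIGH = "high"  # e.g. a payment API


@dataclass(frozen=True)
class Dependency:
    name: str
    endpoint: str             # what you call
    auth_method: str          # how you authenticate
    expected_latency_ms: int  # what "normal" looks like
    impact: Impact            # what breaks when it fails


def alert_route(dep):
    """The impact column drives routing: high impact pages, low impact notes."""
    return "page" if dep.impact is Impact.HIGH else "channel"


# Hypothetical entries; replace with your real dependencies.
INVENTORY = [
    Dependency("payments", "https://payments.example.com/v1/customers?limit=1",
               "bearer", 800, Impact.HIGH),
    Dependency("analytics-pixel", "https://analytics.example.com/collect",
               "none", 2000, Impact.LOW),
]

routes = {dep.name: alert_route(dep) for dep in INVENTORY}
print(routes)
```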
What to monitor per dependency
Every external dependency deserves the same four signals you'd want on your own service:
Uptime + status code
A synthetic check that hits the provider's API with an authenticated request (see monitoring authenticated APIs with bearer tokens and custom headers) and asserts the response. For providers that expose a status or ping endpoint (Stripe, Adyen), probe that. For providers that don't, use a read-only call (GET /v1/customers?limit=1) that exercises the real auth path without side effects. Alert on any non-2xx that isn't an expected 4xx.
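A minimal sketch of such a check, using only Python's standard library. The `probe` and `should_alert` names, the bearer-token header, and the idea of passing in a set of expected 4xx codes are assumptions made for illustration; a real check would also assert on the response body.

```python
import urllib.error
import urllib.request


def should_alert(status, expected_4xx=frozenset()):
    """Alert on any non-2xx unless it is a 4xx we explicitly expect."""
    if 200 <= status < 300:
        return False
    if 400 <= status < 500 and status in expected_4xx:
        return False
    return True


def probe(url, token, timeout_s=10.0):
    """Send an authenticated GET and return the HTTP status code.

    Connection-level failures (DNS, TLS, timeout) are not caught here and
    propagate to the caller, which should treat them as a failed check.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still available.
        return err.code
```

A caller would run something like `should_alert(probe(url, token))` on a schedule, pointing `url` at a read-only endpoint so the check has no side effects.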
Latency at p95
A provider returning 200 in 8 seconds is failing your users even if the status code is fine. Set response-time alerts at the p95 your product can tolerate — for a payment on the checkout path that's 2–3 seconds; for a background sync job it might be 30. Latency percentiles matter more than averages; see latency percentiles p50, p95, p99 explained.
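The sketch below shows why the percentile, not the average, is the number to alert on. It uses the nearest-rank definition of a percentile, which is one of several common definitions, and the sample latencies and 3-second budget are invented for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_breach(samples_s, budget_s, pct=95):
    """True when the chosen percentile exceeds the latency budget."""
    return percentile(samples_s, pct) > budget_s


# Invented data: 18 fast responses and 2 very slow ones, in seconds.
samples = [0.3] * 18 + [8.0] * 2
mean = sum(samples) / len(samples)

print(f"mean={mean:.2f}s p95={percentile(samples, 95)}s "
      f"breach={latency_breach(samples, budget_s=3.0)}")
```

Here the mean sits comfortably under a 3-second budget while the p95 is 8 seconds, so an average-based alert would stay silent while one request in ten is unusable.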
Error rate (5xx and 429)
Watch the 5xx rate from your own integration's calls to the provider, and watch 429 separately — a 429 means you're being rate-limited, which is a different failure with a different fix (back off, reduce volume, request a quota increase). See API rate limit monitoring and 429 alerting and consuming rate-limited APIs.
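As a rough sketch, the two rates can be computed separately from the status codes your integration records over a window. The function name and the sample window are assumptions for illustration.

```python
from collections import Counter


def error_rates(status_codes):
    """Return (rate_5xx, rate_429) as fractions of all calls in the window."""
    total = len(status_codes)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(
        "5xx" if 500 <= code < 600 else "429" if code == 429 else "other"
        for code in status_codes
    )
    return counts["5xx"] / total, counts["429"] / total


# Invented window: 90 successes, four 503s, six 429s.
window = [200] * 90 + [503] * 4 + [429] * 6
rate_5xx, rate_429 = error_rates(window)
print(f"5xx={rate_5xx:.0%} 429={rate_429:.0%}")
```

Keeping the two numbers apart lets each alert point at its own fix: a 5xx spike is the provider failing, while a 429 spike is your call volume exceeding the quota.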
Auth and token refresh
OAuth token refresh failures are a silent killer. Your integration works for an hour, then every call starts returning 401 because the refresh flow broke and nobody noticed. Monitor the token-refresh endpoint directly and alert on refresh failures before the access tokens expire.
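One way to alert before the access token expires is to track the outcome of the last refresh attempt alongside the expiry time. The sketch below is an assumption about how you might model that; `TokenState`, the alert levels, and the 10-minute warning margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenState:
    expires_at: float      # epoch seconds when the current access token expires
    last_refresh_ok: bool  # outcome of the most recent refresh attempt


def refresh_alert(state, now, warn_before_s=600.0):
    """Return 'ok', 'warn', or 'page' for the current token state."""
    remaining = state.expires_at - now
    if remaining <= 0:
        return "page"  # token already expired; calls will be returning 401
    if not state.last_refresh_ok:
        # A failed refresh is worth a warning right away, and a page once
        # the remaining token lifetime drops inside the warning margin.
        return "page" if remaining <= warn_before_s else "warn"
    return "ok"
```

For example, a failed refresh with 50 minutes of token lifetime left yields `"warn"`, and the same failure with under 10 minutes left escalates to `"page"`.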
Webhook delivery monitoring
Inbound webhooks are dependencies that call you, which inverts the monitoring model. You can't probe a webhook — you have to monitor that the webhook arrived, on time, with the right shape. Three things to watch:
- Arrival rate. If Stripe normally sends you 200 webhook events an hour and it suddenly drops to zero, either Stripe stopped sending or your receiver stopped accepting. A heartbeat monitor on "last webhook received N minutes ago" catches both.
- Signature verification and replays. Verify signatures on every event, reject replays, and make the handler idempotent so a provider's retry doesn't double-process. See incoming webhook endpoint monitoring and our broader webhook monitoring guide.
- Receiver health. The webhook endpoint itself is on the critical path: a 500 on /webhooks/stripe means missed payment events. Monitor it like any other endpoint and alert on 5xx.
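The three points above can be sketched in one small receiver. This is a generic illustration, not any provider's actual scheme: it assumes a plain hex-encoded HMAC-SHA256 over the raw payload and an in-memory set of seen event IDs, whereas real providers define their own signature formats and a production handler would persist state. The class and method names are hypothetical.

```python
import hashlib
import hmac


def signature_valid(secret, payload, signature_hex):
    """Compare the expected HMAC-SHA256 of the payload in constant time."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


class WebhookReceiver:
    def __init__(self, secret, max_silence_s):
        self.secret = secret
        self.max_silence_s = max_silence_s
        self.seen_event_ids = set()
        self.last_received_at = None
        self.processed = []

    def handle(self, event_id, payload, signature_hex, now):
        """Return the HTTP status the receiver would send back."""
        if not signature_valid(self.secret, payload, signature_hex):
            return 400  # reject unverifiable events
        self.last_received_at = now  # feeds the heartbeat
        if event_id in self.seen_event_ids:
            return 200  # retry or replay: acknowledge, do not reprocess
        self.seen_event_ids.add(event_id)
        self.processed.append(event_id)
        return 200

    def is_stale(self, now):
        """Heartbeat: True if no verified event arrived within max_silence_s."""
        if self.last_received_at is None:
            return True
        return now - self.last_received_at > self.max_silence_s
```

A scheduled job calling `is_stale` covers the arrival-rate case, while the 400 and 200 paths in `handle` cover signature checks and idempotent retries.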
Circuit breakers & fallbacks
Monitoring tells you a dependency is failing; resilience patterns decide what your app does about it. The two worth pairing with monitoring:
- Circuit breaker. After N consecutive failures, stop calling the dependency for a cooldown window, then probe to see if it's recovered. This prevents a slow dependency from dragging your whole service down with it. See circuit breaker pattern explained.
- Graceful degradation. When a dependency is unavailable, return a partial response or a cached value instead of failing the whole request. A search service down shouldn't 500 the entire product page. See graceful degradation.
The monitoring loop matters here: a circuit breaker tripping is itself an alert. If breakers trip frequently for the same provider, that's a signal to request a higher quota, switch providers, or rethink the dependency.
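A minimal circuit breaker along these lines might look like the sketch below. It is deliberately simplified: the threshold and cooldown values are arbitrary, time is passed in explicitly, and after the cooldown it lets every request through as a probe until a result is recorded, where a production breaker would usually admit only one.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.consecutive_failures = 0
        self.opened_at = None  # None means the breaker is closed
        self.trip_count = 0    # export this as a metric and alert on it

    def allow_request(self, now):
        """Closed: always allow. Open: allow only once the cooldown has passed."""
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.cooldown_s

    def record_success(self):
        """A successful call (including a probe) closes the breaker."""
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.consecutive_failures += 1
        if self.opened_at is not None:
            self.opened_at = now  # failed probe: restart the cooldown
        elif self.consecutive_failures >= self.failure_threshold:
            self.opened_at = now  # N consecutive failures: trip
            self.trip_count += 1
```

Callers check `allow_request` before each call and report the outcome with `record_success` or `record_failure`; when `allow_request` returns False, that is the moment to serve the fallback response.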
Alerting strategy
The biggest mistake teams make with dependency monitoring is alerting on every provider hiccup. Dependencies will fail briefly and recover; if you page on every 5xx, the on-call engineer stops trusting the alerts. The fix is correlation and thresholds:
- Correlate dependency outage with your own errors. A Stripe 5xx that doesn't produce a user-facing 5xx (because your circuit breaker or retry handled it) is a note, not a page. A Stripe 5xx that is producing user-facing 5xx is a page. Alert on the conjunction.
- Use sustained windows, not instantaneous readings. A 60-second elevated error rate on a dependency is common; a 5-minute one is an incident. Alert on the latter.
- Suppress noise during known provider incidents. If the provider's status page confirms an incident, suppress duplicate alerts and post a single update to your own status page.
- Watch for alert flapping. A dependency oscillating between healthy and failing will flood your channel. See alert flapping detection for taming that.
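The first two rules can be combined in a few lines. The sketch below assumes you already collect one error-rate sample per minute for both the dependency and your own user-facing traffic; the thresholds, the 5-minute window, and the function names are illustrative assumptions.

```python
def sustained(error_rate_by_minute, threshold, window_minutes=5):
    """True when each of the last window_minutes samples exceeds threshold."""
    recent = error_rate_by_minute[-window_minutes:]
    return len(recent) == window_minutes and all(r > threshold for r in recent)


def alert_level(dep_rates, own_rates, dep_threshold=0.05, own_threshold=0.01):
    """Page only on the conjunction: dependency failing AND users affected."""
    dep_bad = sustained(dep_rates, dep_threshold)
    own_bad = sustained(own_rates, own_threshold)
    if dep_bad and own_bad:
        return "page"
    if dep_bad:
        return "note"  # the dependency is failing but you are absorbing it
    # Your own errors without a failing dependency belong to your regular
    # service alerts, not to this dependency rule.
    return "none"
```

With these defaults, a single bad minute on the dependency returns `"none"`, five bad minutes that your retries absorb return `"note"`, and five bad minutes accompanied by user-facing errors return `"page"`.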
How Webalert Helps
Webalert is built to monitor the APIs you depend on, not just your own URLs:
- HTTP and API monitors with custom headers and bearer auth let you probe Stripe, Twilio, SendGrid, Auth0, OpenAI, and any other provider the same way your own code does — see monitoring authenticated APIs.
- Response assertions verify the response body, not just the status code, so a provider returning 200 with an error payload in the JSON gets caught.
- Response-time alerts at p95 catch the slow-but-200 failures that erode your product experience without showing up in uptime metrics.
- Heartbeat monitoring watches inbound webhook receivers for "last event received" staleness, so a silent webhook receiver gets caught in minutes.
- Incident management correlates a dependency outage with your own error spike so you can see the full picture in one place — and write a clean post-mortem afterward.
- Public status page lets you communicate dependency impact to your users without exposing your internal monitoring stack.
- Multi-region checks distinguish a provider's regional degradation from a global outage, which is exactly the distinction you need when triaging.
Webalert won't make Stripe more reliable, but it will tell you the moment Stripe is failing for you, before your users do.
Summary
Your application's reliability is the product of your own uptime and every dependency's uptime, and your users experience the composite. Provider status pages lag real outages, SDKs swallow errors, and your SLA is the one that counts. Map your dependency graph (payments, comms, auth, AI, data, business systems), then monitor each dependency for four signals: uptime with authenticated probes, latency at p95, error rate split into 5xx and 429, and auth/token refresh. For inbound webhooks, monitor arrival rate, signature verification, and receiver health. Pair monitoring with circuit breakers and graceful degradation so a failing dependency doesn't take your product with it, and alert on correlation and sustained windows rather than every provider hiccup.
Dependency monitoring checklist
- Every external API dependency listed with endpoint, auth method, and impact-of-failure
- Each dependency probed with an authenticated request (not a public status page)
- Response-time alerts set at p95 per dependency (tighter for checkout-path deps)
- 5xx and 429 from your integration's calls tracked and alerted on separately
- OAuth token-refresh endpoints monitored directly
- Inbound webhook receivers covered by heartbeat ("last event received" staleness)
- Webhook handlers verify signatures and are idempotent
- Circuit breaker or timeout in place for each high-impact dependency
- Graceful degradation path defined for each dependency (cached/fallback response)
- Alerts correlate dependency outage with your own error rate, not raw provider 5xx
- Public status page ready to communicate dependency impact to users
Frequently Asked Questions
What is third-party API monitoring?
Third-party API monitoring is the practice of monitoring external API dependencies (Stripe, Twilio, SendGrid, Auth0, OpenAI, and so on) directly, with their own uptime, latency, and error-rate signals, instead of relying on the provider's status page. The goal is to detect provider outages and degradations that affect your users before your users report them.
Why can't I just rely on my provider's status page?
Provider status pages lag real outages by minutes, rarely report partial degradation or elevated latency, and never reflect failures specific to your account, region, or integration. By the time a provider posts an incident, your users have already experienced it. Direct monitoring tells you when a dependency is failing for you, independently.
How do I monitor an API that requires authentication?
Send an authenticated request to a read-only or status endpoint using bearer tokens or custom headers, and assert on the response body as well as the status code. Don't probe a side-effecting endpoint (never create a real charge to test Stripe). Webalert supports custom headers and bearer auth out of the box — see monitoring authenticated APIs.
How do I monitor inbound webhooks?
You can't probe a webhook because it calls you. Instead, monitor the arrival rate with a heartbeat ("last webhook received N minutes ago"), verify signatures and reject replays on every event, make the handler idempotent, and monitor the webhook receiver endpoint itself for 5xx. A drop in arrival rate with no provider incident usually means your receiver is the one failing.