Auth0, Okta, Clerk: Identity Provider Monitoring

Webalert Team
May 15, 2026

Your application is up. Every health check is green. Your status page says 100% available. And no one can log in.

This is the IdP outage. Your product feels completely broken to users, even though your code is fine. The login page loads, the user types their password, the redirect to your identity provider happens — and then either the page hangs, returns a confusing error, or completes the round-trip with an unusable session.

For most B2B SaaS, identity is the single most consequential third-party dependency. Stripe down means new sign-ups can't pay. SendGrid down means transactional email is delayed. Auth0 / Okta / Clerk down means nobody can log in at all. The blast radius is closer to 100% than any other dependency in your stack.

And yet most teams monitor the identity provider exactly the way they monitor any other API: a periodic ping to a /health endpoint, maybe a status-page subscription. That misses the failure modes that actually matter — silent token correctness regressions, regional auth latency spikes, JWKS rotation problems, SSO connection drift — every one of which produces a "the product is broken" experience without a single 500 response in your logs.

This guide is the IdP-specific monitoring layer for Auth0, Okta, Clerk, AWS Cognito, Firebase Auth, Microsoft Entra ID, and the rest: what to actually watch, how to wire end-to-end synthetic login canaries, and how to alert without drowning in noise.


Why IdP Outages Are Uniquely Painful

Compared to almost any other dependency:

  • Total blast radius. A login outage affects 100% of authenticated traffic — every customer, every plan, every tier.
  • Indirect blame. Users see your login page hang or error. They blame your product, not Auth0 / Okta / Clerk. Trust damage is real.
  • No "graceful degradation." You can serve a degraded ecommerce page from cache. You cannot "kind of" log someone in.
  • Provider status pages lag reality. A regional Auth0 incident often shows green on status.auth0.com for 5–15 minutes after users start hitting it.
  • The failure mode is often correctness, not availability. Tokens are issued — but with stale claims, or wrong tenant ID, or missing roles. The API call returns 200; nothing catches it until users start filing tickets.

The job of IdP monitoring is to detect all four classes: regional latency degradation, partial-availability failures, status-page drift, and silent correctness regressions.


"Up" vs "Correct": The Distinction That Matters

The most-overlooked aspect of IdP monitoring is the difference between "the IdP responded" and "the IdP responded correctly."

Layer | What it means | Failure mode | Detection
--- | --- | --- | ---
Reachable | Endpoint accepts TCP, responds | DNS / TLS / outage | HTTP uptime check
Available | API returns 2xx | Backend degraded | Status-code check
Performant | Latency is in baseline range | Regional slowdown | Response-time check
Correct | Token contains the right claims, signed by the right key | Wrong tenant, stale roles, missing scopes, JWKS drift | Synthetic login + token-validation
End-to-end | A full user flow succeeds | Anywhere across the chain | Synthetic browser canary

The first three are what most teams monitor. The last two are where the silent breakage hides. Without synthetic login canaries you cannot tell the difference between "Auth0 is fine" and "Auth0 issued a token but the audience claim is wrong because someone changed an Auth0 Rule yesterday."


What to Monitor for Any IdP

1) Login Success Rate (Real Users)

The single most important metric. Track per-method:

  • Password login success rate
  • Magic-link login success rate
  • Social login success rate (Google, Apple, GitHub, Microsoft)
  • SSO / SAML success rate
  • Passkey / WebAuthn login success rate
  • Token refresh success rate

Alert on:

  • Success rate < 98% for 5 minutes (notification)
  • Success rate < 90% for 2 minutes (page)
  • Drop > 2pp vs rolling 24h average for any login method (notification)

Why per-method matters: a Google-SSO outage at Google's side will drop the social-login number to zero while password login looks fine. You need to know which path is broken to respond correctly.
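
The thresholds above can be written as a small rule check that runs once per login method. This is a minimal sketch rather than a prescribed implementation: the function name and rule labels are hypothetical, each rate is assumed to be the success fraction measured over its window, and comparing the 5-minute rate against the 24-hour average is one reasonable reading of the "drop > 2pp" rule.

```python
def triggered_rules(rate_2m: float, rate_5m: float, rate_24h: float) -> list[str]:
    """Return the alert rules that fire for one login method.

    Each argument is a success fraction in [0, 1] over the named window.
    Rules are evaluated independently, so several can fire at once.
    """
    rules = []
    if rate_2m < 0.90:
        rules.append("page_below_90")      # success rate < 90% over 2 minutes
    if rate_5m < 0.98:
        rules.append("notify_below_98")    # success rate < 98% over 5 minutes
    if rate_24h - rate_5m > 0.02:
        rules.append("notify_drop_2pp")    # more than 2pp below the 24h average
    return rules


# Evaluate each method separately so a social-login outage is not hidden
# by healthy password logins. The numbers below are made up.
rates = {
    "password": (0.995, 0.994, 0.996),
    "social": (0.40, 0.55, 0.99),
}
for method, (r2, r5, r24) in rates.items():
    print(method, triggered_rules(r2, r5, r24))
```

Keeping the rules independent means a single evaluation can both page and notify; deduplicating those is left to whatever alert router sits downstream.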

2) OIDC Discovery Endpoint

Every modern IdP exposes /.well-known/openid-configuration. It returns the issuer, JWKS URI, authorization endpoint, token endpoint, supported scopes, etc.

Monitor:

  • HTTP availability (1-minute interval)
  • Response-time p95
  • Content validation — issuer and jwks_uri must match expected values
  • Any change to the returned JSON (could be legit or could be a misconfiguration)

A subtle failure mode: the discovery endpoint loads but its jwks_uri now points somewhere unreachable. Token verification breaks everywhere downstream.
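
The content checks listed above can be sketched in a few lines. This is an illustration under stated assumptions: the function names, the example.com URLs, and the choice of a SHA-256 fingerprint for change detection are all hypothetical, and the expected issuer and jwks_uri values are whatever your own tenant is known to return.

```python
import hashlib
import json
import urllib.request


def fetch_discovery(issuer: str, timeout: float = 5.0) -> dict:
    """Fetch the OIDC discovery document for an issuer URL."""
    url = issuer.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_discovery(doc: dict, expected_issuer: str, expected_jwks_uri: str) -> list[str]:
    """Return a list of problems; an empty list means the document looks right."""
    problems = []
    if doc.get("issuer") != expected_issuer:
        problems.append(f"issuer is {doc.get('issuer')!r}")
    if doc.get("jwks_uri") != expected_jwks_uri:
        problems.append(f"jwks_uri is {doc.get('jwks_uri')!r}")
    for field in ("authorization_endpoint", "token_endpoint"):
        if not doc.get(field):
            problems.append(f"missing {field}")
    return problems


def fingerprint(doc: dict) -> str:
    """Stable hash of the document, independent of key order."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical issuer; replace with your tenant's issuer URL.
    issuer = "https://idp.example.com/"
    doc = fetch_discovery(issuer)
    print(check_discovery(doc, issuer, issuer + ".well-known/jwks.json"))
    print(fingerprint(doc))
```

Storing the fingerprint from the previous run and comparing it to the current one gives the "any change to the returned JSON" signal without alerting on key reordering.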

3) JWKS Endpoint and Key Rotation

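The JWKS (JSON Web Key Set), served at jwks_uri, is the set of public keys your APIs use to verify JWTs. When the IdP rotates keys and your services are still holding a stale cached copy, every request carrying a token signed by the new key fails verification.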

Monitor:

  • HTTP availability of jwks_uri
  • Number of keys returned (should be ≥ 1; usually 1–3 during a rotation window)
  • New kid (key ID) appearing — this is the leading indicator of a rotation. If your services cache JWKS for hours, you need them to refresh before the IdP cuts over to the new key.
  • Key removal — if a kid that was in use yesterday is gone today, sessions issued with that key now fail
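
Detecting a new or removed kid amounts to comparing the key IDs in two JWKS snapshots. The sketch below assumes you persist the previous JWKS response between runs; the function name and the shape of the returned dict are hypothetical.

```python
def diff_kids(previous: dict, current: dict) -> dict:
    """Compare two JWKS documents and report which key IDs changed."""
    prev = {k["kid"] for k in previous.get("keys", []) if "kid" in k}
    curr = {k["kid"] for k in current.get("keys", []) if "kid" in k}
    return {
        "added": sorted(curr - prev),      # leading indicator of a rotation
        "removed": sorted(prev - curr),    # tokens signed with these now fail
        "count": len(curr),                # should stay >= 1
    }


# Example with made-up key IDs: key "b" appears, key "a" is still served.
yesterday = {"keys": [{"kid": "a", "kty": "RSA"}]}
today = {"keys": [{"kid": "a", "kty": "RSA"}, {"kid": "b", "kty": "RSA"}]}
print(diff_kids(yesterday, today))
```

An "added" entry is informational, while a "removed" entry or a count of zero is the case that deserves a louder alert.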

4) Token Endpoint Latency and Errors

The OAuth /oauth/token endpoint (or equivalent) is the most-hit IdP endpoint in steady-state traffic. Watch:

  • p50, p95, p99 latency
  • Error rate by error code (invalid_grant, invalid_client, rate_limit_exceeded)
  • Token issuance success rate

Alert on p95 > 2× rolling 7-day baseline for 10 minutes — most token-endpoint slowdowns are regional and clear up, but a sustained 2× is usually the start of a real incident.
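
One way to evaluate that rule is to compute a p95 per minute and require every one of the last ten minutes to exceed twice the baseline. This is a sketch under assumptions: the nearest-rank percentile method, the per-minute bucketing, and both function names are illustrative choices, not something the rule itself dictates.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]


def sustained_slowdown(
    recent_p95s: list[float],
    baseline_p95: float,
    factor: float = 2.0,
    minutes: int = 10,
) -> bool:
    """True when each of the last `minutes` per-minute p95 values exceeds
    `factor` times the rolling baseline."""
    window = recent_p95s[-minutes:]
    return len(window) == minutes and all(v > factor * baseline_p95 for v in window)


# Made-up numbers: baseline p95 of 180 ms, ten minutes at roughly 400 ms.
print(p95([float(ms) for ms in range(1, 101)]))
print(sustained_slowdown([400.0] * 10, baseline_p95=180.0))
```

Requiring the whole window to breach the threshold is what filters out the short regional blips that clear up on their own.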

5) Session and Cookie Health

For session-based flows (Clerk, Cognito, NextAuth):

  • Session refresh success rate
  • Session cookie rotation cadence (alert if cookies stop rotating — could be a cookie domain bug)
  • Cross-site cookie behavior (SameSite, Secure, partitioned cookies) — third-party cookie restrictions break SSO flows in subtle ways
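
The cookie attributes mentioned above can be checked on the Set-Cookie header a canary receives. The sketch below uses Python's standard http.cookies parser; the function name is hypothetical, the __session cookie name is borrowed from Clerk purely as an example, and the specific rules (Secure required, SameSite required) are assumptions you would tune to your own setup.

```python
from http.cookies import SimpleCookie


def cookie_problems(set_cookie: str, name: str) -> list[str]:
    """Inspect one Set-Cookie header value and list attribute problems."""
    jar = SimpleCookie()
    jar.load(set_cookie)
    if name not in jar:
        return [f"cookie {name} not set"]
    morsel = jar[name]
    problems = []
    if not morsel["secure"]:
        problems.append("missing Secure")
    samesite = morsel["samesite"].lower()
    if samesite == "":
        problems.append("missing SameSite")
    elif samesite == "none" and not morsel["secure"]:
        # Browsers reject SameSite=None cookies that are not Secure.
        problems.append("SameSite=None without Secure")
    return problems


header = "__session=abc; Path=/; Secure; HttpOnly; SameSite=Lax"
print(cookie_problems(header, "__session"))
```

Running this against the response of each canary login turns a silent cookie-domain or SameSite regression into an explicit failure.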

6) MFA Challenge Success Rate

If your IdP supports MFA, monitor each challenge type:

  • TOTP success rate
  • SMS challenge delivery + success rate
  • Push notification (Auth0 Guardian, Okta Verify, Microsoft Authenticator) success rate
  • Passkey / WebAuthn success rate
  • Email-magic-link success rate

A drop here is almost always the third party providing the challenge (Twilio for SMS, FCM/APNS for push) — not the IdP itself. See Third-Party Dependency Monitoring.

7) Tenant / Region Health (Multi-Tenant SaaS)

Auth0 tenants, Okta orgs, and Clerk applications are isolated by region. A failure in us-west-2 doesn't affect eu-central-1 — but your monitoring needs to know that.

  • Run synthetic checks from each region your tenants are in
  • Tag every alert with tenant and region
  • For B2B SaaS with per-customer SAML connections, monitor each connection separately — one customer's IdP can break without affecting others

See Multi-Tenant SaaS Monitoring: Per-Customer Uptime for the broader pattern.

8) Provider Status Page (Lagging Indicator Only)

Subscribe to the IdP's status-page RSS, but treat it as the lagging indicator. Your synthetic canary will see real incidents 5–15 minutes before the status page does. If your monitoring is only the provider's status page, you'll find out about the outage from your customers first.


Per-Provider Notes

Auth0

  • Tenant regions: US, EU, AU, JP, CA. Each region is isolated; status-page incidents are usually scoped to one region.
  • Custom domains: auth.yourapp.com adds a layer to monitor — DNS, custom-domain SSL cert, Auth0 routing
  • Rules / Actions / Hooks: code that runs during the auth flow. A slow Action adds latency to every login. Monitor with the Actions Latency metric in the Auth0 dashboard or via synthetic canary
  • Anomaly detection rules: legitimate users sometimes get blocked. Monitor the "blocked" rate alongside the success rate
  • Failure mode to watch: rule changes deployed yesterday silently broke the email_verified claim — synthetic canary that validates claims catches this

Okta

  • Universal Directory + Authentication policies: split between identity store and access policies. A policy change can lock out a subset of users while everyone else logs in fine
  • Workflows: similar to Auth0 Actions — monitor execution time and failure rate
  • Multiple issuers: Okta apps can use the Okta Org Authorization Server or a Custom Authorization Server. Make sure your monitor validates the right issuer
  • API rate limits: Okta enforces hard per-org rate limits. A traffic spike or a misbehaving integration eats the budget and breaks logins for everyone. Monitor X-Rate-Limit-Remaining headers — see API Rate Limit Monitoring
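
Tracking the rate-limit budget can be as simple as reading three response headers on calls you already make. The sketch assumes Okta's X-Rate-Limit-Limit, X-Rate-Limit-Remaining, and X-Rate-Limit-Reset headers (the last one as UTC epoch seconds); the function name and the 10% warning threshold are hypothetical.

```python
def rate_limit_budget(headers: dict, now: float, warn_below: float = 0.10) -> dict:
    """Summarise the remaining rate-limit budget from response headers."""
    h = {k.lower(): v for k, v in headers.items()}   # header names are case-insensitive
    limit = int(h["x-rate-limit-limit"])
    remaining = int(h["x-rate-limit-remaining"])
    reset_at = int(h["x-rate-limit-reset"])
    fraction = remaining / limit if limit else 0.0
    return {
        "remaining_fraction": fraction,
        "seconds_to_reset": max(0.0, reset_at - now),
        "low_budget": fraction < warn_below,
    }


# Made-up header values for illustration.
sample = {
    "X-Rate-Limit-Limit": "600",
    "X-Rate-Limit-Remaining": "30",
    "X-Rate-Limit-Reset": "1060",
}
print(rate_limit_budget(sample, now=1000.0))
```

Alerting on a low remaining fraction gives warning before logins start receiving 429 responses, which is the point at which users notice.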

Clerk

  • __session cookie: short-lived session token rotated frequently. Failures here look like users mysteriously logging out
  • Webhooks: Clerk drives user-lifecycle events (user.created, session.created) via webhooks. If webhooks fail, your downstream user record falls out of sync — see Webhook Monitoring
  • Frontend SDK: the React/Next.js SDK has its own failure modes (JS bundle blocked, CSP header missing). Synthetic browser checks catch what API monitoring misses

AWS Cognito

  • User Pools vs Identity Pools: different APIs, different failure modes. Most apps care about User Pools.
  • Hosted UI: extra layer of HTML/CSS/JS Cognito serves. Branding customizations occasionally break with a Cognito release
  • Lambda triggers: pre-signup, post-confirmation, pre-token-generation Lambdas all run inline with auth. A slow trigger slows every login. Monitor Lambda metrics for these specifically
  • Regional: User Pools are regional resources, so each pool lives in exactly one AWS region. Monitor each region separately to catch regional outages

Firebase Auth

  • Google Cloud Identity Platform is the underlying service. Status updates land at status.cloud.google.com, not Firebase's own page
  • Quota errors: free-tier and paid quotas exist; running into the daily quota silently breaks logins for the rest of the day
  • Identity provider config drift: SAML / OIDC providers configured in the Firebase console can be edited by anyone with access — monitor the config or alert on changes

Microsoft Entra ID (formerly Azure AD)

  • Sign-in logs: Entra exposes a queryable sign-in log that works as a near-real-time stream of sign-in failures with reason codes
  • Conditional Access policies: similar to Okta policies — can unintentionally lock out a subset of users
  • B2C tenants: separate from B2B (workforce) tenants; consumer-facing flows have their own latency profile

Supabase Auth


SAML vs OIDC Monitoring

Most enterprise SSO is SAML. Most consumer / modern SaaS auth is OIDC (OpenID Connect). They fail differently.

OIDC monitoring

  • Discovery endpoint, JWKS, token endpoint, userinfo endpoint, authorization endpoint
  • Validate iss, aud, exp, and iat on every token, plus nonce on ID tokens from flows that send one
  • JWT signature verification using current JWKS keys

SAML monitoring

  • IdP metadata XML — fetch it on a schedule and alert on changes; alert on expiration of embedded certs
  • ACS (Assertion Consumer Service) endpoint at your side — must accept signed assertions
  • Clock skew tolerance — SAML assertions have NotBefore / NotOnOrAfter windows that fail with skew > 5 min
  • Per-customer SP-initiated and IdP-initiated flow tests
  • Signature validation against the customer's certificate (which they rotate without telling you)

For B2B SaaS, the per-customer SAML test is non-negotiable. One enterprise customer's IdP can break without affecting any other customer.
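
The clock-skew check is the easiest SAML failure to reproduce in a test. The sketch below assumes the NotBefore and NotOnOrAfter values have already been parsed into timezone-aware datetimes; the function name is hypothetical and the 5-minute default mirrors the tolerance mentioned above rather than any fixed standard.

```python
from datetime import datetime, timedelta, timezone


def assertion_time_valid(
    not_before: datetime,
    not_on_or_after: datetime,
    now: datetime,
    skew: timedelta = timedelta(minutes=5),
) -> bool:
    """True when `now` falls inside the assertion's validity window,
    widened on both sides by the allowed clock skew."""
    return (not_before - skew) <= now < (not_on_or_after + skew)


issued = datetime(2026, 5, 15, 10, 0, tzinfo=timezone.utc)
expires = issued + timedelta(minutes=5)

# A service whose clock runs 3 minutes behind still accepts the assertion.
print(assertion_time_valid(issued, expires, now=issued - timedelta(minutes=3)))
# A service whose clock runs 6 minutes behind rejects it.
print(assertion_time_valid(issued, expires, now=issued - timedelta(minutes=6)))
```

Running the same check with the canary host's own clock offset is a cheap way to tell a skew problem apart from a broken customer IdP.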


The Synthetic Login Canary

The single highest-value monitor: script a full credential-grant flow and run it on a schedule.

Minimum viable version (API-only)

POST /oauth/token
  grant_type=password
  client_id=…
  username=monitoring-canary@yourapp.com
  password=…
→ Validate response:
  - HTTP 200
  - access_token present
  - id_token present
  - JWT signature verifies against current JWKS
  - claims include expected sub, aud, email, custom claims
  - exp is in the future

Run every 1–5 minutes from at least one region. Alert on any failure or on response-time > baseline.
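
The claim checks in that list can be sketched with the standard library alone. Important caveat: this example only decodes the token payload and inspects claims; it does not verify the signature, which needs a JWT library and the current JWKS. The function names, the example.com issuer, and the choice of required claims are assumptions for illustration.

```python
import base64
import json


def decode_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    segment = parts[1]
    padded = segment + "=" * (-len(segment) % 4)   # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(
    claims: dict,
    expected_iss: str,
    expected_aud: str,
    now: float,
    required: tuple = ("sub", "email"),
) -> list[str]:
    """Return a list of claim problems; empty means the claims look right."""
    problems = []
    if claims.get("iss") != expected_iss:
        problems.append(f"iss is {claims.get('iss')!r}")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]   # aud may be a string or a list
    if expected_aud not in audiences:
        problems.append(f"aud is {aud!r}")
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("exp missing or not in the future")
    for name in required:
        if name not in claims:
            problems.append(f"missing claim {name}")
    return problems
```

In a real canary, `decode_payload` would be replaced by a verifying decode against the JWKS, and `check_claims` would run on its output with `now=time.time()`.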

Better version (real browser)

A scripted browser session that:

  1. Loads your login page
  2. Submits credentials
  3. Waits for the redirect dance to complete
  4. Hits an authenticated endpoint with the session
  5. Validates the response

This catches everything: DNS, TLS, CDN, custom domain, frontend JS, IdP, callback handler, and your own session middleware. It's also the most expensive to run and the noisiest, but for a critical login flow it's worth it.

Multi-region

Run the canary from at least 3 regions. A US-only canary will miss EU and APAC incidents. See Multi-Region Monitoring.


What Not to Monitor (the Common Mistake)

Most teams accidentally monitor IdP health via "401 rate from our backend API." This is a terrible IdP signal:

  • A 401 spike is usually a client bug, an expired token wave, a deploy that bumped audience values — not an IdP problem
  • A real IdP outage often manifests as users unable to authenticate in the first place, which never reaches your API at all
  • 401s from misbehaving clients (curl scripts, integration partners) drown the signal

Better signals:

  • Synthetic login canary outcome
  • Login success rate (measured at the start of the flow, before any token is issued)
  • IdP-side metrics where available

See Monitor Authenticated APIs With Bearer Tokens and Custom Headers for the patterns to monitor your authenticated APIs without falling into this trap.


Alerting Thresholds

Critical (page)

  • Synthetic login canary fails 2 consecutive runs
  • Login success rate < 90% for 2 minutes
  • JWKS endpoint returns < 1 key
  • Discovery endpoint returns non-200 for > 3 minutes
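
The "2 consecutive runs" rule is worth spelling out, because paging on a single failed run is a common source of noise. A minimal sketch, assuming canary outcomes are kept as a list of booleans in run order; the function name is hypothetical.

```python
def should_page(results: list[bool], threshold: int = 2) -> bool:
    """True when the last `threshold` canary runs all failed.

    `results` holds one boolean per run, oldest first; True means success.
    """
    tail = results[-threshold:]
    return len(tail) == threshold and not any(tail)


print(should_page([True, True, False]))          # one failure: no page yet
print(should_page([True, False, False]))         # two in a row: page
```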

High (notification)

  • Login success rate < 98% for 5 minutes
  • Token-endpoint p95 latency > 2× baseline for 10 minutes
  • New kid appears in JWKS (informational — verify rotation is intended)
  • Single tenant / region success rate drops while others are healthy
  • MFA challenge success rate < 95% for 10 minutes

Informational

  • OIDC discovery JSON content change
  • SAML metadata content change
  • Provider status page update

See Alert Fatigue: Notifications That Get Acted On for the broader noise-reduction principles. For the rest of the auth-flow monitoring picture see Login & Authentication Flow Monitoring. For the broader API-uptime piece see API Uptime Monitoring: Health Checks.


IdP Monitoring Checklist

  • OIDC discovery endpoint monitored with content validation
  • JWKS endpoint monitored for availability and kid changes
  • Token-endpoint latency p50/p95/p99 tracked
  • Per-method login success rate tracked (password, magic link, social, SSO, passkey, refresh, MFA)
  • Synthetic login canary every 1–5 minutes with full JWT validation
  • Synthetic browser flow at least every 15 minutes for critical login paths
  • Multi-region canary execution (≥ 3 regions)
  • Per-tenant / per-region alerting tags
  • Per-customer SAML SP-initiated and IdP-initiated tests (B2B)
  • SAML certificate expiration alerts (90 / 30 / 7 day)
  • Custom-domain SSL cert expiration alerts
  • IdP rate-limit headers tracked
  • Webhook delivery success rate (Clerk, Auth0, others)
  • Provider status page subscription (lagging-indicator only)
  • Internal /internal/auth-health endpoint surfacing last-canary-result for external monitoring
  • Runbook covering "auth is down" vs "auth issued wrong claims"
  • No reliance on 401-rate-from-our-API as the primary IdP signal
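
The /internal/auth-health item can be as small as a function that turns the last canary result into a status code and a JSON body. This is a sketch under assumptions: the function name, the field names, and the idea of treating a stale canary result as unhealthy (with a 10-minute default) are illustrative choices, and wiring it into your web framework is left out.

```python
import json


def auth_health_response(last_canary: dict, now: float, max_age_s: float = 600.0) -> tuple[int, str]:
    """Build (status_code, json_body) from the most recent canary result.

    `last_canary` is assumed to look like {"ok": bool, "finished_at": epoch_seconds}.
    A result older than `max_age_s` counts as unhealthy, because a canary
    that stopped running is itself a monitoring failure.
    """
    age = now - last_canary["finished_at"]
    stale = age > max_age_s
    healthy = bool(last_canary["ok"]) and not stale
    body = json.dumps({
        "healthy": healthy,
        "canary_ok": bool(last_canary["ok"]),
        "canary_age_s": round(age, 1),
        "stale": stale,
    })
    return (200 if healthy else 503), body


status, body = auth_health_response({"ok": True, "finished_at": 1000.0}, now=1120.0)
print(status, body)
```

An external HTTP monitor then only needs to check the status code, and can validate the `healthy` field in the body as a second guard.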

How Webalert Helps Monitor Identity Providers

Webalert handles the external-monitoring layer for IdP stacks:

  • HTTP monitoring with content validation — Hit /.well-known/openid-configuration and the JWKS endpoint; alert when the issuer or jwks_uri content drifts
  • Authenticated endpoint monitoring — Validate your internal /internal/auth-health endpoint that surfaces last-canary outcome and JWT-verification result
  • Multi-region checks — Run from every region your users live in; catch regional Auth0/Okta/Clerk degradation before users do
  • Response time monitoring — Catch token-endpoint slowdowns before they cascade
  • SSL certificate monitoring — Custom-domain cert expiry, SAML signing-cert expiry
  • Status page integration — Communicate auth degradation to your customers
  • Multi-channel alerts — Email, SMS, Slack, Discord, Microsoft Teams, webhooks
  • 1-minute check intervals — Detect IdP regressions within 60 seconds
  • 5-minute setup — Add endpoints, point monitors at them, set thresholds

See features and pricing.


Summary

  • Identity providers are usually your single highest-blast-radius dependency. An IdP outage feels like a complete product outage.
  • Provider status pages lag reality by 5–15 minutes. Synthetic canaries are the only way to know in real time.
  • The most-missed failure mode is correctness: tokens issued but with wrong claims. Plain HTTP uptime never catches this.
  • Monitor across all five layers: reachable, available, performant, correct, end-to-end.
  • The single highest-value monitor is a scripted login canary that validates JWT claims and runs every 1–5 minutes from multiple regions.
  • Each provider — Auth0, Okta, Clerk, Cognito, Firebase, Entra — has its own failure modes (tenant regions, action latency, JWKS rotation, Lambda triggers, conditional-access policies). Tune monitors for the specific provider.
  • For B2B SaaS, per-customer SAML connection monitoring is non-negotiable.
  • Don't lean on "401 rate from our API" as the IdP signal — it's the noisiest possible proxy.

The good news: IdP outages are detectable well before users feel them, if you build the right canaries. The bad news: nobody builds them by default — most teams only put them in place after the first painful incident. Build them now and skip that incident.

