
terraform apply completed successfully. Zero errors. The infrastructure change is live.
Twenty minutes later, users start seeing 502 errors. The load balancer target group is healthy according to Terraform state, but the new security group rules are blocking traffic on port 443.
Terraform tells you what it changed. Monitoring tells you whether what it changed actually works.
This guide covers how to monitor infrastructure-as-code deployments so you catch drift, validate changes, and detect the failures that Terraform's exit code cannot.
Why Terraform Needs Monitoring
Terraform manages infrastructure declaratively. You define the desired state, and Terraform makes it happen. But several things can go wrong that a successful apply does not rule out, or that Terraform reports without showing the impact on users:
- Apply succeeds, service breaks — A valid configuration change (new security group, DNS record, certificate) can be syntactically correct but operationally wrong.
- Drift between applies — Someone manually changes infrastructure through the console. Terraform state no longer matches reality.
- Partial applies — Terraform creates some resources but fails on others. The infrastructure is in an inconsistent state.
- Propagation delays — DNS changes, certificate provisioning, and load balancer registration take time. Terraform reports success before the change is fully live.
- Dependency timing — A database is created but not yet accepting connections when the application tries to connect.
Terraform ensures your infrastructure matches its configuration. Monitoring ensures it actually works.
What to Monitor After Terraform Applies
1) Endpoint Reachability
After any infrastructure change, verify that user-facing endpoints still work:
- HTTP/HTTPS checks on all public URLs
- API endpoint validation with auth headers
- Health endpoint checks on internal services
- DNS resolution for any changed or new domains
This is the single most important post-apply check. If users can reach the service and get correct responses, the apply was successful in the ways that matter.
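The checks above can be sketched as a small script that runs after each apply. This is a minimal illustration using only the Python standard library, not a replacement for a monitoring service; the function name `check_endpoint` and its return shape are assumptions made for this example.

```python
import urllib.error
import urllib.request


def check_endpoint(url, headers=None, timeout=10.0):
    """Return (ok, detail) for a single HTTP(S) GET.

    ok is True only for a 2xx final status (redirects are followed).
    detail is the status code, or a short error description when no
    HTTP response was received.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300, response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, exc.code
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, TLS error, or timeout.
        return False, str(exc)
```

For an authenticated API, pass the header explicitly, for example `check_endpoint("https://api.example.com/health", headers={"Authorization": "Bearer <token>"})`, where the URL and token are placeholders.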
2) SSL and DNS State
Terraform frequently manages certificates and DNS records. Monitor:
- SSL certificate validity and expiry on all domains
- DNS resolution returning expected IP addresses or CNAMEs
- Certificate chain completeness (intermediate certificates)
- HTTPS redirect behavior
DNS and certificate changes are among the most common causes of post-Terraform outages because propagation is not instant.
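As a rough sketch of these checks, the following uses Python's standard `ssl` and `socket` modules. The helper names are hypothetical, and the DNS helper only reports what the local resolver returns, which may differ from what users in other regions see during propagation.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". Negative means expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(hostname, port=443, timeout=10.0):
    """Connect with full verification and return the leaf cert's notAfter.

    The default context verifies the chain and the hostname, so a missing
    intermediate or a mismatched certificate raises ssl.SSLError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def resolve_ips(hostname):
    """Return the set of addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}
```

A post-apply step might compare `resolve_ips("www.example.com")` against the addresses Terraform was expected to configure, and alert when `days_until_expiry(fetch_not_after("www.example.com"))` drops below a threshold of your choosing.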
3) Network Connectivity
Security groups, firewalls, VPC configurations, and load balancer rules are common Terraform resources. After changes:
- TCP port checks on critical services (databases, caches, message queues)
- Verify load balancer health check targets are passing
- Confirm services can reach their dependencies
- Test cross-VPC or cross-region connectivity if changed
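A TCP port check is simple enough to sketch directly. The version below is a minimal example with hypothetical function names; it should run from a host that sits where the real client sits (for example, inside the application's subnet), otherwise it tests a different network path.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def failed_targets(targets, timeout=3.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [(host, port) for host, port in targets if not port_is_open(host, port, timeout)]
```

A blocking security group rule usually shows up as a timeout rather than a refused connection, so keep the timeout short enough that the check fails quickly.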
4) Service Health Post-Apply
Beyond reachability, validate that services are functioning correctly:
- Content validation on key pages (not just 200 status)
- Response time within expected thresholds
- Background job completion (heartbeat monitoring)
- Queue processing and worker health
A service can be reachable but broken if its configuration references a resource that Terraform changed.
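One way to express "reachable but broken" in code is to validate the body and the response time alongside the status. The sketch below is illustrative only; the function names, the expected-text approach, and the thresholds are assumptions, not a prescribed method.

```python
import time
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """GET url and return (status, body_text, elapsed_seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status, raw = response.status, response.read()
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status and a body.
        status, raw = exc.code, exc.read()
    elapsed = time.monotonic() - started
    return status, raw.decode("utf-8", errors="replace"), elapsed


def validate_response(status, body, elapsed_seconds, must_contain, max_seconds):
    """Return a list of problems; an empty list means the response is healthy."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if must_contain not in body:
        problems.append(f"missing expected text {must_contain!r}")
    if elapsed_seconds > max_seconds:
        problems.append(f"slow response: {elapsed_seconds:.2f}s > {max_seconds:.2f}s")
    return problems
```

Calling `validate_response(*fetch(url), must_contain="Sign in", max_seconds=1.0)` would flag a load balancer that answers 200 with a default page instead of the application.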
Infrastructure Drift Detection
Drift is when the actual state of infrastructure diverges from what Terraform state says it should be. Common causes:
- Manual changes through cloud provider console
- Automated processes modifying resources outside Terraform
- Another team's Terraform workspace changing shared resources
- Cloud provider auto-updates or maintenance
How monitoring catches drift
External monitoring detects the symptoms of drift before the next terraform plan runs:
| Drift Type | Terraform Detection | Monitoring Detection |
|---|---|---|
| Security group rule added manually | Next plan shows diff | Immediate: port check fails or succeeds unexpectedly |
| DNS record changed in console | Next plan shows diff | Immediate: resolution check returns wrong IP |
| Certificate replaced outside Terraform | Next plan shows diff | Immediate: SSL check detects new cert or expiry change |
| Auto-scaling changes instance count | Depends on lifecycle rules | Response time monitoring detects capacity changes |
| Database parameter changed manually | May not show in plan | Latency or error rate monitoring detects behavior change |
Monitoring catches the impact of drift in real time. terraform plan catches the drift itself, but only on the next run, which may be hours or days later.
Continuous monitoring as a drift signal
When your monitoring detects an unexpected change (new SSL cert, different DNS response, changed response content), it may indicate infrastructure drift worth investigating with terraform plan.
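When a monitor fires, the follow-up plan can be scripted. The sketch below relies on the `-detailed-exitcode` flag of `terraform plan`, which exits 0 when there is no diff, 2 when there is one, and 1 on error. The function names and the `workdir` argument are hypothetical, and it assumes `terraform` is on the PATH and the directory is already initialized.

```python
import subprocess


def interpret_plan_exit_code(code):
    """Map `terraform plan -detailed-exitcode` exit codes to a label."""
    if code == 0:
        return "in-sync"  # plan succeeded, no changes
    if code == 2:
        return "changes"  # plan succeeded, there is a diff
    return "error"        # 1 (or anything else): the plan itself failed


def detect_changes(workdir):
    """Run a plan in workdir and return (label, plan_output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode), result.stdout
```

A "changes" result means reality and configuration disagree; it does not say which side moved, so unapplied configuration changes show up the same way as manual drift.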
Terraform CI/CD Pipeline Monitoring
Most teams run Terraform through CI/CD pipelines. Monitor the pipeline itself:
Heartbeat monitoring for scheduled applies
If you run Terraform on a schedule (e.g., hourly drift detection plans), use heartbeat monitoring to verify the pipeline completes:
- Pipeline sends a heartbeat signal after a successful terraform plan or apply
- If the heartbeat does not arrive on schedule, alert
- Catches pipeline infrastructure failures, credential expirations, and scheduler issues
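The pattern in this list can be sketched as a thin wrapper around the Terraform command. This is an illustration under assumptions: the function names are made up for the example, the heartbeat URL is whatever your monitoring service issues, and only exit code 0 is treated as success.

```python
import subprocess
import urllib.request


def send_heartbeat(url, timeout=10.0):
    """Ping the heartbeat URL; raises on network errors or 4xx/5xx replies."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_with_heartbeat(command, heartbeat_url, send=send_heartbeat):
    """Run command and ping heartbeat_url only if it exits with status 0.

    Returns the command's exit code. A failed run sends nothing, so the
    missing heartbeat is what triggers the alert.
    """
    exit_code = subprocess.run(command).returncode
    if exit_code == 0:
        send(heartbeat_url)
    return exit_code
```

Because the alert comes from the absence of a ping, the same wrapper also covers runs that never start at all, such as a broken scheduler or expired runner credentials.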
Post-apply validation step
Add a validation step after terraform apply:
- name: Apply infrastructure
  run: terraform apply -auto-approve
- name: Validate endpoints
  run: |
    # Wait for propagation
    sleep 30
    # Check critical endpoints
    curl -sf https://api.example.com/health || exit 1
    curl -sf https://www.example.com/ || exit 1
- name: Signal deploy complete
  run: curl -fsS --retry 3 https://heartbeat.web-alert.io/your-id
This closes the loop between "Terraform says it worked" and "the infrastructure actually works."
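A fixed sleep is either too short for slow propagation or wasted time when the change lands quickly. One alternative, sketched here with a hypothetical `wait_until` helper, is to poll a check until it passes or a deadline expires.

```python
import time


def wait_until(check, timeout=300.0, interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout seconds have elapsed.

    Returns True as soon as a check passes, and False once there is no
    time left for another attempt before the deadline.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

The `check` argument is any zero-argument callable that returns a boolean, such as a function that requests a health endpoint. The `sleep` and `clock` parameters exist only so the helper can be tested without real waiting.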
Apply duration monitoring
Track how long terraform apply takes. A sudden increase may indicate:
- State file lock contention
- Provider API rate limiting
- Large-scale resource recreation
- Dependency resolution issues
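A simple way to notice a sudden increase is to compare each run against the median of recent runs. The sketch below is one possible heuristic, not a recommended threshold; the function name, the factor of 2, and the minimum of five samples are all assumptions.

```python
import statistics


def is_slow_apply(previous_durations, latest, factor=2.0, min_samples=5):
    """Flag latest if it exceeds factor times the median of previous runs.

    Durations are in seconds. Returns False when there is too little
    history to judge.
    """
    if len(previous_durations) < min_samples:
        return False
    return latest > factor * statistics.median(previous_durations)
```

With a history of roughly 60-second applies, a 70-second run passes and a 180-second run is flagged. The median is used instead of the mean so that one earlier slow run does not raise the baseline.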
Common Terraform Failure Modes
| Failure Mode | What Happens | Monitoring Detection |
|---|---|---|
| Security group blocks traffic | Service unreachable on specific ports | TCP port check + HTTP check failure |
| DNS record wrong or stale | Users hit wrong endpoint or get errors | DNS resolution check + content validation |
| Certificate not yet provisioned | HTTPS fails with SSL error | SSL check failure |
| Load balancer target unhealthy | 502/503 errors for some requests | HTTP check + error rate monitoring |
| Database connection string changed | App cannot connect to database | Health endpoint check + error rate |
| IAM permission removed | Service cannot access dependencies | Content validation + error monitoring |
| Auto-scaling policy misconfigured | Service cannot handle traffic | Response time monitoring |
| Partial apply leaves inconsistent state | Some resources created, others failed | Endpoint-level checks across all services |
| Provider API timeout during apply | Resources in unknown state | Post-apply endpoint validation |
Practical Setup for Terraform Teams
Minimum viable monitoring
For small teams running Terraform:
- HTTP check on every public endpoint after each apply
- SSL check on all domains managed by Terraform
- DNS check on all DNS records managed by Terraform
- Heartbeat for Terraform CI/CD pipeline completion
- Response time alert to catch performance regressions from infrastructure changes
Comprehensive monitoring
For teams with complex multi-environment Terraform:
- All of the above, plus:
- TCP port checks on databases, caches, and internal services
- Content validation on key endpoints (not just status codes)
- Per-environment monitoring (staging and production separately)
- Post-apply smoke test step in CI/CD pipeline
- Multi-region checks for globally distributed infrastructure
- Alerting per workspace/environment so the right team is notified
How Webalert Helps
Webalert validates that your Terraform-managed infrastructure works from the user's perspective:
- HTTP/HTTPS checks every minute from global regions — verify endpoints after every apply
- SSL monitoring — catch certificate provisioning failures and expiry drift
- DNS monitoring — detect record changes and propagation issues
- TCP port checks — validate database, cache, and service connectivity
- Content validation — confirm responses are correct, not just reachable
- Response time tracking — detect performance regressions from infrastructure changes
- Heartbeat monitoring — verify Terraform CI/CD pipelines complete on schedule
- Multi-channel alerts — Email, SMS, Slack, Discord, Teams, webhooks
- Status pages — communicate infrastructure incidents to users
See features and pricing for details.
Summary
- terraform apply success does not guarantee working infrastructure.
- Monitor endpoints, SSL, DNS, and connectivity after every Terraform apply.
- Use heartbeat monitoring to verify Terraform CI/CD pipelines complete.
- External monitoring catches infrastructure drift impact in real time.
- Add a post-apply validation step to close the loop between configuration and reality.
- Start with HTTP, SSL, and DNS checks on all Terraform-managed endpoints.
Terraform defines your infrastructure. Monitoring proves it works.