Drift Detection and Resolution Skill
Overview
This skill helps investigate and resolve infrastructure drift in HCP Terraform workspaces. Drift occurs when actual infrastructure state diverges from the Terraform configuration.
Prerequisites
- HCP Terraform Plus or Enterprise (health assessments feature)
- Health assessments enabled in workspace settings
- Authenticated with hcptf CLI (TFE_TOKEN or ~/.terraform.d/credentials.terraform.io)
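A quick pre-flight check can catch a missing token before any query runs. The function below is a minimal sketch: `have_hcptf_credentials` is a hypothetical helper name, and the default file path is simply the one quoted in the prerequisites, so pass a different path if your installation stores credentials elsewhere.

```shell
# Succeeds if a token is available from either source listed above.
# $1 (optional): credentials file path. The default is the path quoted
# in the prerequisites; it is an assumption, adjust it if yours differs.
have_hcptf_credentials() {
  creds_file=${1:-$HOME/.terraform.d/credentials.terraform.io}
  # An exported or shell-local TFE_TOKEN takes priority.
  if [ -n "${TFE_TOKEN:-}" ]; then
    return 0
  fi
  # Otherwise require a credentials file that exists and is not empty.
  [ -s "$creds_file" ]
}

# Usage:
#   have_hcptf_credentials || echo "No hcptf credentials found" >&2
```

The check only confirms that a token source exists. It does not validate the token against HCP Terraform.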
Core Concepts
Drift Types:
- delete: Resource was deleted outside Terraform - usually needs terraform apply to recreate
- update: Resource was modified outside Terraform - may need code update or apply
- create: Resource was created outside Terraform - may need import or removal
Resolution Strategies:
- Apply fixes infrastructure: Run terraform apply to make infrastructure match code
- Update code: Modify Terraform configuration to match actual infrastructure state
- Update variables: Change variable values if drift is due to outdated variable values
- Import resources: Import manually created resources into Terraform state
Workflow
1. Find Drifted Workspaces
Use Explorer API to find workspaces with drift:
# Find all drifted workspaces in organization
hcptf explorer query -org=<org-name> -type=workspaces \
-fields=workspace-name,drifted,resources-drifted,resources-undrifted \
| grep -v "false"
# Sort by most drifted resources
hcptf explorer query -org=<org-name> -type=workspaces \
-sort=-resources-drifted -page-size=20
2. View Drift Details
Get detailed drift information for a workspace:
# View drift and check results (URL-style)
hcptf <org> <workspace> assessments
# Or flag-based
hcptf assessmentresult list -org=<org-name> -name=<workspace-name>
This shows:
- Which resources drifted
- What actions are needed (delete, update, create)
- Which attributes changed (before/after values)
- Terraform check failures (continuous validation)
3. Understand the Configuration
Get workspace VCS information:
# View workspace details including VCS repo
hcptf explorer query -org=<org-name> -type=workspaces \
-fields=workspace-name,vcs-repo-identifier \
-filter="workspace-name:<workspace>"
Get commit information from the run:
# Get current run ID from workspace
hcptf <org> <workspace> # Shows CurrentRunID
# View configuration version with VCS commit info
hcptf <org> <workspace> runs <run-id> configversion
Or with JSON parsing:
RUN_ID=$(hcptf <org> <workspace> -output=json | jq -r '.CurrentRunID')
hcptf <org> <workspace> runs "$RUN_ID" configversion
For VCS-backed workspaces, this provides:
- Branch name
- Commit SHA
- Commit URL (direct link to GitHub/GitLab commit)
- Repo identifier (e.g., username/repo)
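One pitfall in the JSON-parsing variant: `jq -r` prints the literal string `null` when a field is absent, so a workspace without a current run would pass `null` to the next command. The guard below is a small sketch of how to catch that; `is_usable_run_id` is a hypothetical helper name, not part of hcptf.

```shell
# jq -r prints the literal string "null" when a field is missing, so
# treat both "null" and the empty string as "no current run".
is_usable_run_id() {
  case "$1" in
    ""|null) return 1 ;;
    *)       return 0 ;;
  esac
}

# Usage (placeholders as in the commands above):
#   RUN_ID=$(hcptf <org> <workspace> -output=json | jq -r '.CurrentRunID')
#   is_usable_run_id "$RUN_ID" || { echo "No current run" >&2; exit 1; }
```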
4. Analyze and Decide
Review the drift details to determine the appropriate action:
Decision Matrix:
| Action Type | Common Cause | Typical Resolution |
|---|---|---|
| delete | Manual deletion, autoscaling | Apply to recreate |
| update with infrastructure values | Manual changes, console edits | Apply to revert OR update code to match |
| update with computed values | Normal operation (IPs, timestamps) | Update code/variables to match |
| Tags added | Tagging automation | Update code to include tags |
| Check failures | Cert expiration, thresholds | Varies by check type |
Questions to ask:
- Is the current infrastructure state correct?
- Should the code be updated to match reality?
- Is this drift expected (computed values, auto-scaling)?
- Do variables need updating?
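The first column of the decision matrix can be sketched as a lookup that maps an action type to the resolution worth considering first. This is illustrative only: `suggest_resolution` and the labels it prints are hypothetical, and the questions above still decide the final choice.

```shell
# Map a drift action type to the first resolution to consider.
# The printed labels are illustrative, not hcptf output.
suggest_resolution() {
  case "$1" in
    delete) echo "apply" ;;                 # recreate the missing resource
    update) echo "apply-or-update-code" ;;  # depends on which side is correct
    create) echo "import-or-remove" ;;      # adopt or delete the stray resource
    *)      echo "investigate" ;;           # unknown type: do not guess
  esac
}

# Usage:
#   suggest_resolution update
```

Unknown action types fall through to "investigate" on purpose, so a new or misspelled type never maps to an apply.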
5. Take Action
Option A: Fix infrastructure (apply)
# Create a run to fix drift
hcptf run create -org=<org> -name=<workspace> \
-message="Fix drift: [describe what's being fixed]"
# Monitor the run
hcptf run show -id=<run-id>
Option B: Update code
- Clone the repository (use VCS info from step 3)
- Checkout the branch that's connected to the workspace
- Update Terraform files to match actual infrastructure
- Commit and push - this triggers a new run automatically
Option C: Update variables
# List current variables
hcptf variable list -org=<org> -workspace=<workspace>
# Update a variable
hcptf variable update -org=<org> -workspace=<workspace> \
-key=<variable-name> -value=<new-value>
Option D: Import resources (for manually created resources)
# This requires terraform CLI access to the workspace
terraform import <resource_address> <resource_id>
6. Verify Resolution
After taking action, verify drift is resolved:
# Check if drift is cleared (may take a few minutes for next health check)
hcptf explorer query -org=<org> -type=workspaces \
-filter="workspace-name:<workspace>" \
-fields=workspace-name,drifted,resources-drifted
# View latest assessment result
hcptf assessmentresult list -org=<org> -name=<workspace>
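Because the next health check may lag by a few minutes, a single query right after the fix can still report drift. A bounded retry loop is one way to handle the wait. The sketch below is generic: `retry_until` is a hypothetical helper, and the check command you pass to it (shown as `drift_cleared` in the usage comment) is something you would write yourself around the Explorer query above.

```shell
# Run a check command until it succeeds or attempts run out.
# $1: max attempts, $2: seconds between attempts, rest: command to run.
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage (drift_cleared is a hypothetical function you define):
#   retry_until 10 60 drift_cleared my-org my-workspace
```

The helper uses plain global variables to stay POSIX-compatible, so the check command should avoid the names `attempt`, `delay`, and `max_attempts`.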
Common Scenarios
Scenario 1: EC2 Instance Deleted (Autoscaling)
# 1. Identify drift
hcptf my-org my-workspace assessments
# Shows: aws_instance.app - Action: delete
# 2. Decision: Recreate the instance
# 3. Apply fix
hcptf my-org my-workspace runs create \
-message="Recreate deleted EC2 instance"
Scenario 2: Route53 Record IP Changed
# 1. View drift
hcptf my-org my-workspace assessments
# Shows: aws_route53_record.app - Action: update
# records: ["1.2.3.4"] -> ["5.6.7.8"]
# 2. Get VCS info to see the code
RUN_ID=$(hcptf my-org my-workspace -output=json | jq -r '.CurrentRunID')
hcptf my-org my-workspace runs "$RUN_ID" configversion
# Shows: CommitURL to review the configuration
# 3. Decision: Is new IP correct?
# - If YES: Update variable or code
# - If NO: Run apply to revert
# 4a. If updating variable:
hcptf variable update -org=my-org -workspace=my-workspace \
-key=app_ip -value="5.6.7.8"
# 4b. If reverting infrastructure:
hcptf my-org my-workspace runs create \
-message="Revert Route53 record to configured IP"
Scenario 3: Tags Added by Automation
# 1. View drift
hcptf my-org my-workspace assessments
# Shows: multiple resources - Action: update
# tags: {} -> {"Environment": "prod", "Owner": "team-a"}
# 2. Get code location
RUN_ID=$(hcptf my-org my-workspace -output=json | jq -r '.CurrentRunID')
hcptf my-org my-workspace runs "$RUN_ID" configversion
# Shows: RepoIdentifier, Branch, CommitURL
# 3. Decision: Tags are correct, update code to include them
# 4. Clone repo, checkout branch, update Terraform files with tags
# 5. Commit and push (triggers auto-run if VCS-backed)
Scenario 4: Certificate Expiring (Check Failure)
# 1. View checks
hcptf my-org my-workspace assessments
# Shows: TERRAFORM CHECK RESULTS
# tls_self_signed_cert.app - FAIL
# • Certificate will expire in less than 4 hours
# 2. Decision: Need to regenerate certificate
# 3. Update code to generate new cert or increase validity period
# 4. Apply changes
hcptf my-org my-workspace runs create \
-message="Regenerate expiring certificate"
Tips and Best Practices
- Regular monitoring: Check drift regularly using Explorer queries
- Investigate before fixing: Always understand WHY drift occurred
- Document decisions: Include clear messages when creating runs
- Use continuous validation: Enable Terraform checks to catch issues early
- Automate remediation: For known drift patterns, consider automated fixes
- Review permissions: Drift often indicates someone has console access they shouldn't have
- Update documentation: If updating code to match infrastructure, document why
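For the "automate remediation" tip, one cautious starting point is an explicit allowlist of resource addresses whose drift you have already investigated. The sketch below assumes nothing about hcptf; `is_known_drift_pattern` and both patterns are hypothetical and should be replaced with your own.

```shell
# Decide whether a drifted resource address matches a pattern you have
# already agreed is understood. The patterns here are hypothetical
# examples; replace them with addresses from your own workspaces.
is_known_drift_pattern() {
  case "$1" in
    aws_autoscaling_group.*) return 0 ;;
    module.tagging.*)        return 0 ;;
    *)                       return 1 ;;
  esac
}

# Usage:
#   is_known_drift_pattern "aws_instance.app" || echo "Needs manual review"
```

Anything that does not match falls back to manual review, which keeps the "investigate before fixing" rule as the default.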
Troubleshooting
No assessment results found:
- Health assessments may not be enabled in workspace settings
- Feature requires HCP Terraform Plus or Enterprise
- Enable via UI or hcptf workspace update
401 Unauthorized on assessment results:
- Check your authentication token is valid
- Ensure you have at least read access to the workspace
Drift shows but no resources listed:
- May be computed attribute changes (not shown in detail)
- Check the log output URL for full details
Apply doesn't fix drift:
- Drift may be due to external automation re-applying changes
- Consider updating code to match the pattern
- Investigate what's causing the external changes
Related Commands
- hcptf explorer query - Query workspaces and resources
- hcptf assessmentresult list - View drift and check details
- hcptf workspace read - Get workspace details
- hcptf run create - Create a run to fix drift
- hcptf run show - Monitor run progress
- hcptf variable list/update - Manage workspace variables
- hcptf configversion read - Get VCS commit information
Agent Considerations
When building agents that handle drift:
- Assess severity: Check if drift is critical or expected
- Get approval: Don't auto-fix drift without user review
- Provide context: Explain what changed and why
- Link to code: Use VCS URLs to show relevant code
- Suggest actions: Present options with pros/cons
- Verify resolution: Confirm drift is cleared after action
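The "get approval" point can be sketched as a small gate that shows the command and runs it only after an explicit yes. `confirm_then_run` is a hypothetical helper, not an hcptf feature. The command in the usage comment reuses the flags shown in step 5.

```shell
# Ask for explicit confirmation before running a remediation command.
# Anything other than y/Y/yes/YES (read from stdin) skips the command.
confirm_then_run() {
  printf 'About to run: %s\nProceed? [y/N] ' "$*" >&2
  # Treat EOF or a read error as "no".
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes|YES) "$@" ;;
    *) echo "Skipped." >&2; return 1 ;;
  esac
}

# Usage:
#   confirm_then_run hcptf run create -org=my-org -name=my-workspace \
#     -message="Fix drift: recreate deleted EC2 instance"
```

The prompt goes to stderr so that the wrapped command's stdout stays clean for logging or further parsing.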
