Drift Detection and Resolution Skill

Overview

This skill helps investigate and resolve infrastructure drift in HCP Terraform workspaces. Drift occurs when actual infrastructure state diverges from the Terraform configuration.

Prerequisites

HCP Terraform Plus or Enterprise (health assessments feature)
Health assessments enabled in workspace settings
Authenticated with hcptf CLI (TFE_TOKEN or ~/.terraform.d/credentials.terraform.io)
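
A quick pre-flight check for the authentication prerequisite could look like the sketch below. The function name has_hcptf_credentials and its optional path argument are illustrative, not part of hcptf; the default path is taken verbatim from the line above, and the check only confirms that a token source exists, not that the token is valid.

```shell
# Return 0 if one of the token sources named in the prerequisites is present.
# has_hcptf_credentials is a hypothetical helper, not an hcptf command.
has_hcptf_credentials() {
  local cred_file="${1:-$HOME/.terraform.d/credentials.terraform.io}"
  if [ -n "${TFE_TOKEN:-}" ]; then
    echo "token source: TFE_TOKEN"
    return 0
  fi
  # -s: the file exists and is not empty
  if [ -s "$cred_file" ]; then
    echo "token source: $cred_file"
    return 0
  fi
  echo "no credentials found" >&2
  return 1
}
```

Running has_hcptf_credentials before the workflow gives a clearer failure than a later 401 from the API.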

Core Concepts

Drift Types:

delete: Resource was deleted outside Terraform - usually needs terraform apply to recreate
update: Resource was modified outside Terraform - may need code update or apply
create: Resource was created outside Terraform - may need import or removal

Resolution Strategies:

Apply to fix infrastructure: Run terraform apply to make the infrastructure match the code
Update code: Modify Terraform configuration to match actual infrastructure state
Update variables: Change variable values if drift is due to outdated variable values
Import resources: Import manually created resources into Terraform state

Workflow

1. Find Drifted Workspaces

Use the Explorer API to find workspaces with drift:

# Find all drifted workspaces in organization
hcptf explorer query -org=<org-name> -type=workspaces \
  -fields=workspace-name,drifted,resources-drifted,resources-undrifted \
  | grep -v "false"

# Sort by most drifted resources
hcptf explorer query -org=<org-name> -type=workspaces \
  -sort=-resources-drifted -page-size=20
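
The grep -v "false" filter above drops every line that contains the string "false" anywhere, including a workspace whose name happens to contain it. A column-aware filter is one way around that. The sketch below assumes the query prints a header row followed by whitespace-separated columns, with the drifted field in column 2; that layout is an assumption about the output, so adjust the column number to what your version actually prints. filter_drifted is a hypothetical helper name.

```shell
# Keep the header row plus rows whose chosen column is exactly "true".
# Assumption: whitespace-separated columns, header on the first line.
# Usage: <query command> | filter_drifted [column-number]
filter_drifted() {
  awk -v col="${1:-2}" 'NR == 1 || $col == "true"'
}
```

Piping the first query through filter_drifted 2 instead of grep keeps a drifted workspace named, say, false-alarm in the results.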

2. View Drift Details

Get detailed drift information for a workspace:

# View drift and check results (URL-style)
hcptf <org> <workspace> assessments

# Or flag-based
hcptf assessmentresult list -org=<org-name> -name=<workspace-name>

This shows:

Which resources drifted
What actions are needed (delete, update, create)
Which attributes changed (before/after values)
Terraform check failures (continuous validation)

3. Understand the Configuration

Get workspace VCS information:

# View workspace details including VCS repo
hcptf explorer query -org=<org-name> -type=workspaces \
  -fields=workspace-name,vcs-repo-identifier \
  -filter="workspace-name:<workspace>"

Get commit information from the run:

# Get current run ID from workspace
hcptf <org> <workspace>  # Shows CurrentRunID

# View configuration version with VCS commit info
hcptf <org> <workspace> runs <run-id> configversion

Or with JSON parsing:

RUN_ID=$(hcptf <org> <workspace> -output=json | jq -r '.CurrentRunID')
hcptf <org> <workspace> runs $RUN_ID configversion
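
When the CurrentRunID field is absent, jq -r prints the literal string null, and the follow-up command would then be called with a run ID of "null". A small guard avoids that. The helper name require_run_id is illustrative only.

```shell
# Fail fast when no usable run ID came back from the JSON lookup.
# jq -r emits the string "null" for a missing or null field.
require_run_id() {
  case "${1:-}" in
    ""|null)
      echo "no current run ID found for this workspace" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$1"
}
```

A typical use would be RUN_ID=$(require_run_id "$RUN_ID") || exit 1 before calling the configversion command.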

For VCS-backed workspaces, this provides:

Branch name
Commit SHA
Commit URL (direct link to GitHub/GitLab commit)
Repo identifier (e.g., username/repo)

4. Analyze and Decide

Review the drift details to determine the appropriate action:

Decision Matrix:

| Action Type | Common Cause | Typical Resolution |
|---|---|---|
| `delete` | Manual deletion, autoscaling | Apply to recreate |
| `update` with infrastructure values | Manual changes, console edits | Apply to revert OR update code to match |
| `update` with computed values | Normal operation (IPs, timestamps) | Update code/variables to match |
| Tags added | Tagging automation | Update code to include tags |
| Check failures | Cert expiration, thresholds | Varies by check type |

Questions to ask:

Is the current infrastructure state correct?
Should the code be updated to match reality?
Is this drift expected (computed values, auto-scaling)?
Do variables need updating?
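
The matrix and questions above can be condensed into a small lookup, shown here as a rough sketch rather than a complete policy. suggest_resolution is a hypothetical helper; it takes the drift action type and a yes/no answer to "is the current infrastructure state correct?". The delete/yes branch is an extrapolation that the matrix does not cover, and check failures are left out because their resolution varies by check type.

```shell
# suggest_resolution ACTION STATE_IS_CORRECT
#   ACTION:           delete | update | create (drift types from Core Concepts)
#   STATE_IS_CORRECT: yes if the live infrastructure is what you want, else no
suggest_resolution() {
  local action="$1" correct="$2"
  case "$action:$correct" in
    delete:no)  echo "apply to recreate" ;;
    delete:yes) echo "update code to remove the resource" ;;  # extrapolated
    update:no)  echo "apply to revert" ;;
    update:yes) echo "update code or variables to match" ;;
    create:no)  echo "remove the unmanaged resource" ;;
    create:yes) echo "import into state" ;;
    *) echo "unknown input: $action $correct" >&2; return 1 ;;
  esac
}
```

The output is a suggestion to present for review, not something to act on automatically.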

5. Take Action

Option A: Fix infrastructure (apply)

# Create a run to fix drift
hcptf run create -org=<org> -name=<workspace> \
  -message="Fix drift: [describe what's being fixed]"

# Monitor the run
hcptf run show -id=<run-id>

Option B: Update code

Clone the repository (use VCS info from step 3)
Check out the branch that's connected to the workspace
Update Terraform files to match actual infrastructure
Commit and push - this triggers a new run automatically

Option C: Update variables

# List current variables
hcptf variable list -org=<org> -workspace=<workspace>

# Update a variable
hcptf variable update -org=<org> -workspace=<workspace> \
  -key=<variable-name> -value=<new-value>

Option D: Import resources (for manually created resources)

# This requires terraform CLI access to the workspace
terraform import <resource_address> <resource_id>

6. Verify Resolution

After taking action, verify drift is resolved:

# Check if drift is cleared (may take a few minutes for next health check)
hcptf explorer query -org=<org> -type=workspaces \
  -filter="workspace-name:<workspace>" \
  -fields=workspace-name,drifted,resources-drifted

# View latest assessment result
hcptf assessmentresult list -org=<org> -name=<workspace>
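
Because the next health check may take a few minutes, verification usually means re-checking rather than checking once. The sketch below is a generic retry loop; poll_until is a hypothetical helper, and the command that decides whether drift has cleared is left to the caller, since the exact output format of the queries above is not specified here.

```shell
# poll_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it exits 0 or the attempts run out.
# Returns 0 on the first success, 1 if every attempt failed.
poll_until() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, poll_until 10 60 my_drift_check would call a caller-defined my_drift_check function up to ten times, one minute apart.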

Common Scenarios

Scenario 1: EC2 Instance Deleted (Autoscaling)

# 1. Identify drift
hcptf my-org my-workspace assessments
# Shows: aws_instance.app - Action: delete

# 2. Decision: Recreate the instance
# 3. Apply fix
hcptf my-org my-workspace runs create \
  -message="Recreate deleted EC2 instance"

Scenario 2: Route53 Record IP Changed

# 1. View drift
hcptf my-org my-workspace assessments
# Shows: aws_route53_record.app - Action: update
#        records: ["1.2.3.4"] -> ["5.6.7.8"]

# 2. Get VCS info to see the code
RUN_ID=$(hcptf my-org my-workspace -output=json | jq -r '.CurrentRunID')
hcptf my-org my-workspace runs $RUN_ID configversion
# Shows: CommitURL to review the configuration

# 3. Decision: Is new IP correct?
#    - If YES: Update variable or code
#    - If NO: Run apply to revert

# 4a. If updating variable:
hcptf variable update -org=my-org -workspace=my-workspace \
  -key=app_ip -value="5.6.7.8"

# 4b. If reverting infrastructure:
hcptf my-org my-workspace runs create \
  -message="Revert Route53 record to configured IP"

Scenario 3: Tags Added by Automation

# 1. View drift
hcptf my-org my-workspace assessments
# Shows: multiple resources - Action: update
#        tags: {} -> {"Environment": "prod", "Owner": "team-a"}

# 2. Get code location
RUN_ID=$(hcptf my-org my-workspace -output=json | jq -r '.CurrentRunID')
hcptf my-org my-workspace runs $RUN_ID configversion
# Shows: RepoIdentifier, Branch, CommitURL

# 3. Decision: Tags are correct, update code to include them
# 4. Clone repo, check out branch, update Terraform files with tags
# 5. Commit and push (triggers auto-run if VCS-backed)

Scenario 4: Certificate Expiring (Check Failure)

# 1. View checks
hcptf my-org my-workspace assessments
# Shows: TERRAFORM CHECK RESULTS
#        tls_self_signed_cert.app - FAIL
#        • Certificate will expire in less than 4 hours

# 2. Decision: Need to regenerate certificate
# 3. Update code to generate new cert or increase validity period
# 4. Apply changes
hcptf my-org my-workspace runs create \
  -message="Regenerate expiring certificate"

Tips and Best Practices

Regular monitoring: Check drift regularly using Explorer queries
Investigate before fixing: Always understand WHY drift occurred
Document decisions: Include clear messages when creating runs
Use continuous validation: Enable Terraform checks to catch issues early
Automate remediation: For known drift patterns, consider automated fixes
Review permissions: Drift often indicates someone has console access they shouldn't have
Update documentation: If updating code to match infrastructure, document why

Troubleshooting

No assessment results found:

Health assessments may not be enabled in workspace settings
Feature requires HCP Terraform Plus or Enterprise
Enable via the UI or hcptf workspace update

401 Unauthorized on assessment results:

Check that your authentication token is valid
Ensure you have at least read access to the workspace

Drift shows but no resources listed:

May be computed attribute changes (not shown in detail)
Check the log output URL for full details

Apply doesn't fix drift:

Drift may be due to external automation re-applying changes
Consider updating code to match the pattern
Investigate what's causing the external changes

Related Commands

hcptf explorer query - Query workspaces and resources
hcptf assessmentresult list - View drift and check details
hcptf workspace read - Get workspace details
hcptf run create - Create a run to fix drift
hcptf run show - Monitor run progress
hcptf variable list/update - Manage workspace variables
hcptf configversion read - Get VCS commit information

Agent Considerations

When building agents that handle drift:

Assess severity: Check if drift is critical or expected
Get approval: Don't auto-fix drift without user review
Provide context: Explain what changed and why
Link to code: Use VCS URLs to show relevant code
Suggest actions: Present options with pros/cons
Verify resolution: Confirm drift is cleared after action
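
The "get approval" point can be enforced with an explicit confirmation step before any command that changes infrastructure. The following is a minimal sketch; confirm_action is a hypothetical helper that reads one line from standard input and treats anything other than an explicit yes as a refusal.

```shell
# confirm_action PROMPT
# Print PROMPT to stderr, read one line from stdin, and succeed only
# on an explicit "y" or "yes". Empty input or EOF counts as "no".
confirm_action() {
  local answer
  printf '%s [y/N] ' "$1" >&2
  IFS= read -r answer || answer=""
  case "$answer" in
    y|Y|yes|Yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

An agent could gate the run creation from step 5 behind it, for example by calling confirm_action "Apply to recreate aws_instance.app?" and only creating the run when it returns 0.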
