terragrunt

Homelab infrastructure management with Terragrunt, OpenTofu, and Terraform patterns. Use when: (1) Planning or applying infrastructure changes to dev/integration/live clusters, (2) Adding/modifying machines in inventory.hcl, (3) Creating or updating units and stacks, (4) Working with feature flags, (5) Running validation (fmt, validate, test, plan), (6) Understanding the stacks→units→modules architecture, (7) Working with HCL configuration files, (8) Bare-metal Kubernetes provisioning or Talos configuration. Triggers: "terragrunt", "terraform", "opentofu", "tofu", "infrastructure code", "IaC", "inventory.hcl", "networking.hcl", "HCL files", "add machine", "add node", "cluster provisioning", "bare metal", "talos config", "task tg:", "infrastructure plan", "infrastructure apply", "stacks", "units", "modules architecture". Always use task commands (task tg:*) instead of running terragrunt directly.

Updated 2/5/2026

SKILL.md

Terragrunt Infrastructure Skill

Manage bare-metal Kubernetes infrastructure from PXE boot to running clusters.

Architecture Overview

stacks/           → Cluster deployments (dev, integration, live)
  └── terragrunt.stack.hcl → Defines units and passes values

units/            → Reusable Terragrunt wrappers
  └── terragrunt.hcl → Declares dependencies, passes inputs to modules

modules/          → Pure Terraform/OpenTofu code
  └── *.tf → Resources, variables, outputs

Dependency chain: config → unifi / talos → bootstrap / aws-set-params

The config unit is the central hub: it reads all .hcl config files and outputs structured data that the other units consume.
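To make that hand-off concrete, here is a minimal sketch of how the config module could shape its outputs so that each downstream unit receives one ready-made object. The output names talos and unifi are borrowed from the dependency chain; local.cluster_machines and var.networking are assumptions for illustration, not the repository's actual code.

```hcl
# modules/config/outputs.tf (hypothetical sketch)
# One output object per downstream unit, so a unit can pass it straight
# through as its inputs.

output "talos" {
  description = "Inputs for the talos unit"
  value = {
    cluster_name = var.name
    machines     = local.cluster_machines # assumed: machines filtered to this cluster
    networking   = var.networking         # assumed: this cluster's networking.hcl entry
  }
}

output "unifi" {
  description = "Inputs for the unifi unit"
  value = {
    machines = local.cluster_machines # assumed local, as above
  }
}
```

A consuming unit can then forward the object unchanged with inputs = dependency.config.outputs.talos, the same pattern the new-unit walkthrough uses.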

Task Commands (Always Use These)

# Validation (run in order)
task tg:fmt                    # Format HCL files
task tg:test-<module>          # Test specific module (e.g., task tg:test-config)
task tg:validate-<stack>       # Validate stack (e.g., task tg:validate-integration)

# Operations
task tg:list                   # List available stacks
task tg:plan-<stack>           # Plan (e.g., task tg:plan-integration)
task tg:apply-<stack>          # Apply (REQUIRES HUMAN APPROVAL)
task tg:gen-<stack>            # Generate stack files
task tg:clean-<stack>          # Clean generated files

NEVER run terragrunt or tofu directly; always use task commands.

Stack Definition (terragrunt.stack.hcl)

locals {
  name     = basename(get_terragrunt_dir())  # e.g. "integration" for stacks/integration
  features = ["gateway-api", "longhorn", "prometheus", "spegel"]
}

unit "config" {
  source = "../../units/config"
  path   = "config"
  values = {
    name     = local.name
    features = local.features
  }
}

unit "talos" {
  source = "../../units/talos"
  path   = "talos"
}
  • source: Path to unit directory
  • path: Directory under .terragrunt-stack/ where the unit is generated
  • values: Data passed to unit's values.* references

Unit Definition (terragrunt.hcl)

locals {
  networking_vars = read_terragrunt_config(find_in_parent_folders("networking.hcl"))
  inventory_vars  = read_terragrunt_config(find_in_parent_folders("inventory.hcl"))
}

include "root" {
  path = find_in_parent_folders("root.hcl")
}

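terraform {
  # Resolved from the generated copy in stacks/<stack>/.terragrunt-stack/<path>/,
  # four levels below the repo root; "//" separates the repo root from the module subdirectory
  source = "../../../.././/modules/config"
}

# Units that consume config outputs declare this dependency; the config unit itself omits it
dependency "config" {
  config_path  = "../config"
  mock_outputs = { ... }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}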

inputs = {
  name       = values.name           # From stack's values block
  networking = local.networking_vars.locals.clusters[values.name]
}

Key patterns:

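  • read_terragrunt_config(find_in_parent_folders(...)) loads shared .hcl files from parent directories; their locals are available as .locals.*
  • values.* accesses data from the stack's values = { } block
  • dependency.* accesses outputs from prerequisite units
  • mock_outputs lets validate and plan run before dependencies have been applied, limited to the commands listed in mock_outputs_allowed_terraform_commands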

Configuration Files (Source of Truth)

File | Purpose | Example data
inventory.hcl | Hardware (nodes, MACs, IPs, disks) | node41 = { cluster = "live", type = "controlplane", ... }
networking.hcl | Network topology per cluster | live = { vip = "192.168.10.20", pod_subnet = "172.18.0.0/16" }
versions.hcl | Pinned software versions | talos = "v1.12.1", kubernetes = "1.34.0"
accounts.hcl | External service credentials | SSM parameter paths for secrets, not the secret values

NEVER hardcode values that exist in these files; read them with read_terragrunt_config().
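For the local.networking_vars.locals.clusters[values.name] lookup in the unit definition to resolve, networking.hcl needs a locals block containing a clusters map keyed by cluster name. A sketch using the live example values from the table; any other attributes a real entry carries are omitted here.

```hcl
# networking.hcl (sketch): read_terragrunt_config() exposes locals as
# .locals.*, so the map must live inside a locals block
locals {
  clusters = {
    live = {
      vip        = "192.168.10.20"
      pod_subnet = "172.18.0.0/16"
    }
    # dev and integration entries follow the same shape
  }
}
```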

Common Tasks

Add a Machine

  1. Edit inventory.hcl:
node50 = {
  cluster = "live"
  type    = "worker"
  install = {
    selector     = "disk.model == 'Samsung'"
    architecture = "amd64"
  }
  interfaces = [{
    id           = "eth0"
    hardwareAddr = "aa:bb:cc:dd:ee:ff"  # VERIFY against the physical NIC
    addresses    = [{ ip = "192.168.10.50" }]  # VERIFY the IP is unused
  }]
}
  2. Run task tg:plan-live
  3. Review the plan: the config module automatically includes machines where cluster == "live"
  4. Request human approval before apply
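The plan picks up node50 because the config module selects machines by their cluster attribute. A minimal sketch of that filter, assuming the module receives the whole inventory as var.machines and the stack name as var.name (the variable names the module tests use); the role split and the local names are illustrative.

```hcl
# modules/config/main.tf (sketch)
variable "name" {
  description = "Cluster name, e.g. \"live\""
  type        = string
}

variable "machines" {
  description = "Every machine from inventory.hcl, keyed by node name"
  type        = any
}

locals {
  # Keep only machines assigned to this stack's cluster
  cluster_machines = {
    for name, machine in var.machines : name => machine
    if machine.cluster == var.name
  }

  # Split by role for units that treat control planes and workers differently
  controlplanes = { for name, m in local.cluster_machines : name => m if m.type == "controlplane" }
  workers       = { for name, m in local.cluster_machines : name => m if m.type == "worker" }
}
```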

Add a Feature Flag

  1. Add version to versions.hcl if needed
  2. Add feature detection in modules/config/main.tf:
locals {
  new_feature_enabled = contains(var.features, "new-feature")
}
  3. Enable in stack's features list:
features = ["gateway-api", "longhorn", "new-feature"]
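Once new_feature_enabled exists, the usual OpenTofu way to act on it is to expose it as an output and gate resources with count. A sketch that builds on the local from step 2; the output name is hypothetical and terraform_data stands in for whatever the feature actually deploys.

```hcl
# modules/config/main.tf (sketch): relies on local.new_feature_enabled
# from step 2; var.name is assumed to be declared already.

# Expose the flag so module tests can assert on it, mirroring the
# output.prometheus_enabled check in Module Testing
output "new_feature_enabled" {
  value = local.new_feature_enabled
}

# Zero instances when the feature is off, one when it is on
resource "terraform_data" "new_feature" {
  count = local.new_feature_enabled ? 1 : 0
  input = var.name
}
```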

Create a New Unit

  1. Create units/new-unit/terragrunt.hcl:
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../../.././/modules/new-unit"
}

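dependency "config" {
  config_path  = "../config"
  mock_outputs = { new_unit = {} }
  # Use mocks only for validate/plan so apply never runs against placeholder values
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}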

inputs = dependency.config.outputs.new_unit
  2. Create the corresponding modules/new-unit/ with variables.tf, main.tf, outputs.tf, and versions.tf
  3. Add a new_unit output to the config module
  4. Add a unit block to each stack that needs it
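Step 3 is what makes inputs = dependency.config.outputs.new_unit work: the config module must emit a new_unit object whose keys match the new module's variables, because Terragrunt passes each key of inputs to OpenTofu as a variable. A sketch with a hypothetical cluster_name/machines pair; local.cluster_machines is an assumed local holding this cluster's machines.

```hcl
# modules/config/outputs.tf (sketch): one output named after the unit
output "new_unit" {
  value = {
    cluster_name = var.name
    machines     = local.cluster_machines # assumed local
  }
}

# modules/new-unit/variables.tf (sketch): one variable per key above
variable "cluster_name" {
  type = string
}

variable "machines" {
  type = any
}
```

Giving the unit's mock_outputs the same keys avoids plan failures caused by missing required variables when config has not been applied yet.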

Module Testing

Tests use OpenTofu native testing in modules/<name>/tests/*.tftest.hcl:

# Top-level variables set defaults for ALL run blocks
variables {
  name     = "test-cluster"
  features = ["gateway-api"]
  machines = {
    node1 = {
      cluster = "test-cluster"
      type    = "controlplane"
      # ... complete machine definition
    }
  }
}

run "feature_enabled" {
  command = plan
  variables {
    features = ["prometheus"]  # Only override what differs
  }
  assert {
    condition     = output.prometheus_enabled == true
    error_message = "Prometheus should be enabled"
  }
}

Run one module's tests with task tg:test-<module> (e.g., task tg:test-config), or every module's tests with task tg:test.
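Following the same pattern, a run block can pin down the cluster filter that Add a Machine relies on. A sketch assuming the config module exposes the filtered machines as an output named machines (hypothetical); contains() is used because comparing keys() against a literal list with == fails on the list/tuple type difference.

```hcl
run "filters_other_clusters" {
  command = plan

  variables {
    machines = {
      node1 = {
        cluster = "test-cluster"
        type    = "controlplane"
        # ... complete machine definition
      }
      node2 = {
        cluster = "other-cluster"
        type    = "worker"
        # ... complete machine definition
      }
    }
  }

  assert {
    condition     = contains(keys(output.machines), "node1") && !contains(keys(output.machines), "node2")
    error_message = "Only machines with cluster == var.name should be included"
  }
}
```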

Safety Rules

  • NEVER run apply without explicit human approval
  • NEVER use --auto-approve flags
  • NEVER guess MAC addresses or IPs; verify them against inventory.hcl
  • NEVER commit .terragrunt-cache/ or .terragrunt-stack/
  • NEVER manually edit Terraform state
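The MAC and IP rules can also be backed by a check rather than review alone. A sketch of validation blocks on the config module's machines variable, assuming every machine has the interfaces/addresses shape from Add a Machine; the regex only checks format, so it catches typos, not a wrong but well-formed address.

```hcl
# modules/config/variables.tf (sketch): machines variable with validation
variable "machines" {
  type = any

  validation {
    # Every hardwareAddr must look like aa:bb:cc:dd:ee:ff
    condition = alltrue(flatten([
      for name, m in var.machines : [
        for iface in m.interfaces : can(regex("^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$", iface.hardwareAddr))
      ]
    ]))
    error_message = "Each interface hardwareAddr must be a colon-separated MAC address."
  }

  validation {
    # No IP may be assigned twice across the inventory
    condition = length(flatten([
      for name, m in var.machines : [
        for iface in m.interfaces : [for a in iface.addresses : a.ip]
      ]
      ])) == length(distinct(flatten([
        for name, m in var.machines : [
          for iface in m.interfaces : [for a in iface.addresses : a.ip]
        ]
    ])))
    error_message = "Each IP address may appear only once in inventory.hcl."
  }
}
```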

State Operations

When removing state entries for indexed resources (e.g., this["rpi4"]), xargs treats the double quotes as its own quoting and strips them, so state rm receives an invalid address such as this[rpi4]. Use a while loop instead:

# WRONG - xargs mangles quotes in resource names
terragrunt state list | xargs -n 1 terragrunt state rm

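# CORRECT - while loop passes each address intact; </dev/null stops terragrunt from consuming the rest of the list if it reads stdin
terragrunt state list | while read -r resource; do terragrunt state rm "$resource" </dev/null; done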

This applies to any state operation on resources with map keys like data.talos_machine_configuration.this["rpi4"].

Validation Checklist

Before requesting apply approval:

  • task tg:fmt passes
  • task tg:test passes (if module tests exist)
  • task tg:validate passes for ALL stacks
  • task tg:plan-<stack> reviewed
  • No unexpected destroys in plan
  • Network changes won't break connectivity

References

  • stacks.md - Detailed Terragrunt stacks documentation
  • units.md - Detailed Terragrunt units documentation


AI Quality Score

95/100 (analyzed 2/10/2026)

An exceptionally well-documented skill for managing infrastructure using Terragrunt and OpenTofu. It provides clear architecture, specific task commands, safety guardrails, and detailed walkthroughs for common operations.


Metadata

License: unknown
Version: -
Publisher: majiayu000

Tags

api, testing