cert-manager-troubleshooting


Use when TLS certificates are not being issued or renewed, when Certificate or CertificateRequest resources show errors, when ACME challenges fail, or when Issuer or ClusterIssuer resources are not ready

Updated 2/9/2026


cert-manager Troubleshooting

Diagnose and resolve failures in cert-manager — the controller that automates TLS certificate issuance and renewal from sources like Let's Encrypt (ACME), HashiCorp Vault, Venafi, and self-signed CAs.

Keywords

cert-manager, certificate, tls, ssl, https, letsencrypt, acme, issuer, clusterissuer, certificaterequest, order, challenge, http01, dns01, self-signed, ca, renew, renewal, expired, not-ready, ingress, annotation

When to Use This Skill

  • Certificate resources show Ready: False or Issuing for too long
  • CertificateRequest is stuck or shows errors
  • ACME challenges (HTTP-01 or DNS-01) are failing
  • Issuer or ClusterIssuer shows Ready: False
  • TLS secrets are not being created or contain expired certificates
  • Ingress TLS annotations are not triggering certificate creation
  • Certificate renewal is not happening before expiry

When NOT to Use

Related Skills

Quick Reference

| Task | Command |
| --- | --- |
| Check cert-manager pods | kubectl get pods -n cert-manager |
| List Certificates | kubectl get certificate -A |
| List Issuers | kubectl get issuer -A && kubectl get clusterissuer |
| Check CertificateRequests | kubectl get certificaterequest -A |
| Check Orders (ACME) | kubectl get order -A |
| Check Challenges (ACME) | kubectl get challenge -A |
| View cert-manager logs | kubectl logs -n cert-manager deploy/cert-manager --tail=200 |
| Check cert expiry | kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' \| base64 -d \| openssl x509 -noout -dates |

Diagnostic Workflow

TLS certificate not working?
├─ Certificate resource exists?
│   ├─ No → Check Ingress annotations or create Certificate (Section 4)
│   └─ Yes → Check Certificate status
│       ├─ Ready: True → Certificate issued, check if correct (Section 6)
│       └─ Ready: False → Check CertificateRequest
│           ├─ No CertificateRequest → Issuer reference broken (Section 2)
│           ├─ CertificateRequest pending → Check Issuer health (Section 2)
│           └─ CertificateRequest denied → Check approval/RBAC
├─ Using ACME (Let's Encrypt)?
│   ├─ Order created? → Check Order status
│   │   ├─ Order pending → Check Challenge status (Section 3)
│   │   ├─ Order invalid → All challenges failed
│   │   └─ Order errored → ACME account issue (Section 2)
│   └─ Challenge status?
│       ├─ HTTP-01 pending → Solver pod/ingress issues (Section 3)
│       └─ DNS-01 pending → DNS provider issues (Section 3)
└─ Certificate expired?
    └─ Renewal not triggered → Check renewal config (Section 5)
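
The tree above walks one chain: Certificate, CertificateRequest, Order, Challenge. A minimal sketch of that walk for a single namespace follows. The function name walk_chain is a hypothetical helper, not something cert-manager ships, and it assumes kubectl is already pointed at the right cluster.

```shell
# Print every link of the issuance chain for one namespace, top to bottom.
# walk_chain is a hypothetical helper name, not part of cert-manager.
walk_chain() {
  local ns="$1"
  local kind
  for kind in certificate certificaterequest order challenge; do
    echo "== ${kind} =="
    # A kind that cannot be listed is reported but does not stop the walk.
    kubectl get "${kind}" -n "${ns}" -o wide || echo "(could not list ${kind})"
  done
}
```

Call it as walk_chain ${NS} and read the output from the bottom up: the deepest resource that exists and is not healthy is usually the one to describe next.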

Section 1: Controller Health

# Check all cert-manager components
kubectl get pods -n cert-manager -o wide

# Controller logs (main component)
kubectl logs -n cert-manager deploy/cert-manager --tail=200

# Webhook logs (validates/converts resources)
kubectl logs -n cert-manager deploy/cert-manager-webhook --tail=100

# CA injector logs (injects CA bundles)
kubectl logs -n cert-manager deploy/cert-manager-cainjector --tail=100

# Check CRDs
kubectl get crd | grep cert-manager

# Check webhook connectivity
kubectl get validatingwebhookconfigurations | grep cert-manager
kubectl get mutatingwebhookconfigurations | grep cert-manager

Controller Issues

| Symptom | Cause | Fix |
| --- | --- | --- |
| Certificate stuck at "Issuing" | Controller not processing | Check controller logs for errors, restart if needed |
| Any cert-manager resource creation fails | Webhook unreachable | Check webhook pod, Service, and network policies |
| "conversion webhook" errors | CRD version mismatch | Ensure cert-manager version matches CRD version |
| Leader election errors | Multiple controllers competing | Check for duplicate installations |
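
Before reading logs, it can help to confirm that all three Deployments have finished rolling out. The sketch below assumes the default Deployment names used in this section and the cert-manager namespace. The function name is hypothetical, and a customized install may use different names.

```shell
# Wait briefly for each cert-manager Deployment to report a completed rollout.
# check_cert_manager_rollouts is a hypothetical helper; the Deployment names
# are the ones used in this section and may differ in a customized install.
check_cert_manager_rollouts() {
  local ns="${1:-cert-manager}"
  local failed=0
  local d
  for d in cert-manager cert-manager-webhook cert-manager-cainjector; do
    if kubectl rollout status "deploy/${d}" -n "${ns}" --timeout=30s >/dev/null 2>&1; then
      echo "ready: ${d}"
    else
      echo "not ready: ${d}"
      failed=1
    fi
  done
  # Non-zero exit status when at least one component is not ready.
  return "${failed}"
}
```

A "not ready" line for cert-manager-webhook matches the "Webhook unreachable" row above and is worth fixing before anything else.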

Section 2: Issuer and ClusterIssuer

# Check Issuer status
kubectl get issuer -n ${NS} -o wide
kubectl describe issuer ${ISSUER_NAME} -n ${NS}

# Check ClusterIssuer status
kubectl get clusterissuer -o wide
kubectl describe clusterissuer ${ISSUER_NAME}

# Check the Certificate's issuer reference
kubectl get certificate ${CERT_NAME} -n ${NS} -o jsonpath='{.spec.issuerRef}'

ACME Issuer (Let's Encrypt)

# Check ACME account registration
kubectl describe issuer ${ISSUER_NAME} -n ${NS} | grep -A10 "Status:"

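# Check ACME account secret
# The real name is set in spec.acme.privateKeySecretRef.name; ${ISSUER_NAME}-account-key is only a common convention
kubectl get issuer ${ISSUER_NAME} -n ${NS} -o jsonpath='{.spec.acme.privateKeySecretRef.name}'
kubectl get secret ${ISSUER_NAME}-account-key -n ${NS} 2>/dev/null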

# Verify ACME server URL
kubectl get issuer ${ISSUER_NAME} -n ${NS} -o jsonpath='{.spec.acme.server}'

| ACME Server | URL | Use Case |
| --- | --- | --- |
| Let's Encrypt Production | https://acme-v02.api.letsencrypt.org/directory | Production certs |
| Let's Encrypt Staging | https://acme-staging-v02.api.letsencrypt.org/directory | Testing (fake certs, higher rate limits) |

| Error | Cause | Fix |
| --- | --- | --- |
| "account not registered" | ACME registration failed | Check email, delete account secret, let cert-manager re-register |
| "invalid account" | Account key mismatch | Delete the account key secret and restart |
| Issuer Ready: False | Can't reach ACME server | Check network/proxy, verify server URL |
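
A quick way to see which environment an Issuer points at is to compare its server URL with the two Let's Encrypt URLs listed above. This is a small sketch; classify_acme_server is a hypothetical helper name, and any other ACME provider simply falls into the "other" bucket.

```shell
# Map an ACME directory URL to a short label.
# classify_acme_server is a hypothetical helper, not a cert-manager command.
classify_acme_server() {
  case "$1" in
    https://acme-v02.api.letsencrypt.org/directory) echo "production" ;;
    https://acme-staging-v02.api.letsencrypt.org/directory) echo "staging" ;;
    "") echo "unset" ;;
    *) echo "other" ;;
  esac
}
```

Typical use: classify_acme_server "$(kubectl get issuer ${ISSUER_NAME} -n ${NS} -o jsonpath='{.spec.acme.server}')". A result of "unset" usually means the Issuer is not an ACME issuer at all.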

CA Issuer

# Check CA secret exists and has required keys
kubectl get secret ${CA_SECRET} -n ${NS}
kubectl get secret ${CA_SECRET} -n ${NS} -o jsonpath='{.data}' | jq 'keys'
# Must contain: tls.crt, tls.key
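
Where jq is not available, the same check can be done with jsonpath alone. The sketch below only tests whether each value is non-empty and never prints the data, which keeps the private key off the terminal. The function name check_ca_secret_keys is hypothetical.

```shell
# Report whether a Secret carries non-empty tls.crt and tls.key entries.
# Only emptiness is tested; the values themselves are never printed.
# check_ca_secret_keys is a hypothetical helper name.
check_ca_secret_keys() {
  local secret="$1" ns="$2"
  local missing=0
  local key value
  for key in crt key; do
    # The dot inside the key name must be escaped for jsonpath: tls\.crt
    value="$(kubectl get secret "${secret}" -n "${ns}" -o "jsonpath={.data.tls\.${key}}" 2>/dev/null)"
    if [ -n "${value}" ]; then
      echo "present: tls.${key}"
    else
      echo "MISSING: tls.${key}"
      missing=1
    fi
  done
  return "${missing}"
}
```

Run it as check_ca_secret_keys ${CA_SECRET} ${NS}; a non-zero exit status means at least one of the two keys is absent or empty.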

Vault Issuer

# Check Vault connection
kubectl describe issuer ${ISSUER_NAME} -n ${NS} | grep -A20 "Spec:"

# Logs for Vault errors
kubectl logs -n cert-manager deploy/cert-manager --tail=200 | grep -iE 'vault|pki|sign|auth'

Section 3: ACME Challenges (HTTP-01 and DNS-01)

# List all challenges
kubectl get challenge -A -o wide

# Describe failing challenge
kubectl describe challenge ${CHALLENGE_NAME} -n ${NS}

# Check challenge state
kubectl get challenge -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)\t\(.spec.type)\t\(.status.state)\t\(.status.reason // "no reason")"'

HTTP-01 Challenge Troubleshooting

# Check solver pod
kubectl get pods -n ${NS} -l acme.cert-manager.io/http01-solver=true

# Check solver Ingress/Service
kubectl get ingress -n ${NS} -l acme.cert-manager.io/http01-solver=true
kubectl get svc -n ${NS} -l acme.cert-manager.io/http01-solver=true

# Test challenge URL locally
# The URL is: http://<domain>/.well-known/acme-challenge/<token>
kubectl get challenge ${CHALLENGE_NAME} -n ${NS} -o jsonpath='{.spec.token}'

| HTTP-01 Problem | Cause | Fix |
| --- | --- | --- |
| Solver pod not created | RBAC or class annotation issue | Check cert-manager logs, verify ingress class |
| Solver pod pending | Resource constraints | Check node capacity and resource quotas |
| Challenge URL returns 404 | Ingress not routing to solver | Check ingress class, annotations, and controller |
| Challenge URL unreachable | Firewall blocks port 80 | Open port 80 from internet to ingress controller |
| "wrong status code: 503" | Load balancer health check failing | Solver pod needs time; check LB health check config |
| Challenge stays pending | Let's Encrypt can't reach cluster | Verify domain resolves to cluster ingress IP |
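
To reproduce what the ACME server sees, the challenge URL can be built from the Challenge resource and fetched with curl. This is a sketch under assumptions: it reads .spec.dnsName and .spec.token from the Challenge, both helper names are hypothetical, and the probe runs from wherever the shell is, which may not share the network path Let's Encrypt uses.

```shell
# Build the URL requested during an HTTP-01 challenge.
# acme_http01_url and probe_http01 are hypothetical helper names.
acme_http01_url() {
  local domain="$1" token="$2"
  printf 'http://%s/.well-known/acme-challenge/%s\n' "${domain}" "${token}"
}

# Fetch that URL from the current machine and print only the HTTP status code.
probe_http01() {
  local challenge="$1" ns="$2"
  local domain token
  domain="$(kubectl get challenge "${challenge}" -n "${ns}" -o jsonpath='{.spec.dnsName}')"
  token="$(kubectl get challenge "${challenge}" -n "${ns}" -o jsonpath='{.spec.token}')"
  curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 \
    "$(acme_http01_url "${domain}" "${token}")"
}
```

Run probe_http01 ${CHALLENGE_NAME} ${NS} from a machine outside the cluster if possible. A 404 or 503 maps to the corresponding rows above; a timeout points at DNS or the firewall.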

DNS-01 Challenge Troubleshooting

# Check which DNS provider is configured
kubectl get issuer ${ISSUER_NAME} -n ${NS} -o json | jq '.spec.acme.solvers[].dns01'

# Check for DNS provider credentials
kubectl get challenge ${CHALLENGE_NAME} -n ${NS} -o json | jq '.spec.solver.dns01'

# Verify TXT record was created
dig TXT _acme-challenge.${DOMAIN} @8.8.8.8

# Check cert-manager DNS provider logs
kubectl logs -n cert-manager deploy/cert-manager --tail=300 | grep -iE 'dns|challenge|present|cleanup|txt'

| DNS-01 Problem | Cause | Fix |
| --- | --- | --- |
| TXT record not created | Provider auth failed | Check DNS provider credentials in Issuer |
| TXT record exists but challenge fails | DNS propagation delay | Allow time for propagation; enable --dns01-recursive-nameservers-only and check which nameservers are queried |
| Wrong zone | DNS split-horizon or sub-domain issue | Set --dns01-recursive-nameservers=8.8.8.8:53 |
| "zone not found" | Provider can't find the DNS zone | Verify zone exists and credentials have access |
| Cleanup failed (TXT not removed) | Provider API error on delete | Manually remove stale _acme-challenge TXT records |
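
For wildcard names the TXT record lives under the base domain, so the name passed to dig needs the leading "*." removed. The sketch below derives the record name and queries it against more than one public resolver. The helper names are hypothetical, and 1.1.1.1 is an extra resolver added here for comparison, not something the checks above prescribe.

```shell
# Derive the TXT record name for a DNS-01 challenge.
# A leading "*." (wildcard) is stripped before the prefix is added.
# acme_txt_name and check_txt_everywhere are hypothetical helper names.
acme_txt_name() {
  local domain="${1#\*.}"
  printf '_acme-challenge.%s\n' "${domain}"
}

# Query the record against several resolvers to spot propagation gaps.
check_txt_everywhere() {
  local name resolver
  name="$(acme_txt_name "$1")"
  for resolver in 8.8.8.8 1.1.1.1; do
    echo "== ${resolver} =="
    dig +short TXT "${name}" "@${resolver}"
  done
}
```

If one resolver returns the record and another does not, the "DNS propagation delay" row is the likely match. If none do, start with the provider credentials.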

Common DNS-01 Providers

| Provider | Secret Type | Key Fields |
| --- | --- | --- |
| Route53 | AWS credentials or IRSA | accessKeyID, secretAccessKeySecretRef, or IRSA annotation |
| Azure DNS | Service principal or Workload Identity | clientID, clientSecretSecretRef, subscriptionID, resourceGroupName |
| Cloudflare | API token | apiTokenSecretRef |
| Google Cloud DNS | Service account key | serviceAccountSecretRef, project |

Section 4: Certificate and Ingress Configuration

# Check Certificate spec
kubectl get certificate ${CERT_NAME} -n ${NS} -o yaml

# Check CertificateRequest chain
kubectl get certificaterequest -n ${NS} -o wide
kubectl describe certificaterequest ${CR_NAME} -n ${NS}

# Check if Ingress annotations trigger cert creation
kubectl get ingress -n ${NS} -o json | jq '.items[] | {name: .metadata.name, annotations: .metadata.annotations, tls: .spec.tls}'

Ingress Annotation Issues

| Annotation | Purpose | Common Mistake |
| --- | --- | --- |
| cert-manager.io/issuer | Namespace Issuer | Using ClusterIssuer name with this annotation |
| cert-manager.io/cluster-issuer | ClusterIssuer | Typo in name or using Issuer name |
| cert-manager.io/issue-temporary-certificate | Temp cert during issuance | Forgetting this causes TLS errors during initial issuance |
| cert-manager.io/common-name | Override CN | Ignored if dnsNames are set |

Certificate Spec Problems

| Problem | Cause | Fix |
| --- | --- | --- |
| "issuer not found" | Wrong issuerRef.name or issuerRef.kind | Verify issuer exists and kind matches (Issuer vs ClusterIssuer) |
| "dnsNames required" | No domains specified | Add dnsNames to Certificate spec |
| "secret already exists" | Conflict with manually created Secret | Delete existing Secret or change secretName |
| CertificateRequest not created | Spec validation failed | Check cert-manager webhook logs |
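
Most of the spec problems above come down to issuerRef.kind, issuerRef.name, dnsNames, or secretName. A minimal Certificate can be rendered and sent through a server-side dry run, which exercises the webhook without creating anything. In this sketch the function name and the "<name>-tls" Secret naming are hypothetical choices, not cert-manager defaults.

```shell
# Print a minimal Certificate manifest on stdout.
# render_certificate is a hypothetical helper; the "<name>-tls" Secret name
# is an illustrative convention, not something cert-manager requires.
render_certificate() {
  local name="$1" ns="$2" issuer_kind="$3" issuer_name="$4" domain="$5"
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  secretName: ${name}-tls
  dnsNames:
    - "${domain}"
  issuerRef:
    group: cert-manager.io
    kind: ${issuer_kind}
    name: ${issuer_name}
EOF
}
```

For example, render_certificate web ${NS} ClusterIssuer ${ISSUER_NAME} app.example.com | kubectl apply --dry-run=server -f - reports validation errors from the API server and webhook while leaving the cluster unchanged. The domain is quoted in the manifest so that a wildcard such as *.example.com stays valid YAML.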

Section 5: Renewal and Expiry

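# Check certificate dates (Ready is looked up by condition type, not by list position)
kubectl get certificate -A -o json | jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name)\t\(.status.notAfter)\t\(.status.renewalTime)\tReady=\((.status.conditions // []) | map(select(.type == "Ready")) | .[0].status // "Unknown")"'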

# Check from the actual TLS secret
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates -subject

# Force renewal (kubectl plugin)
kubectl cert-manager renew ${CERT_NAME} -n ${NS}
# Or, where the standalone cmctl CLI is installed instead of the plugin
cmctl renew ${CERT_NAME} -n ${NS}

| Problem | Cause | Fix |
| --- | --- | --- |
| Not renewing before expiry | renewBefore too short or controller stopped | Check renewBefore (default: renewal starts 2/3 of the way through the cert lifetime, i.e. renewBefore is 1/3 of the lifetime) |
| Renewed but old cert in Secret | Secret not updated | Delete Secret; cert-manager will recreate |
| Let's Encrypt rate limited | Too many issuances for same domain | Wait (production: 50 certs/week per registered domain, 5/week for the same set of names), use staging for testing |
| Wildcard cert not renewing | DNS-01 solver broken | Check DNS-01 challenge flow specifically |
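
When renewBefore is not set, renewal is expected once two thirds of the certificate's lifetime has passed. The arithmetic can be sketched as below, assuming that default and taking notBefore and notAfter as Unix epoch seconds. The function name is hypothetical.

```shell
# Expected renewal moment when renewBefore is unset:
# notAfter minus one third of the lifetime, in epoch seconds.
# default_renewal_epoch is a hypothetical helper name.
default_renewal_epoch() {
  local not_before="$1" not_after="$2"
  echo $(( not_after - (not_after - not_before) / 3 ))
}
```

For a 90-day certificate this lands 60 days after issuance. cert-manager publishes its own computed value in .status.renewalTime, so a mismatch between the two usually means renewBefore is set explicitly on the Certificate.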

Let's Encrypt Rate Limits

| Limit | Value | Scope |
| --- | --- | --- |
| Certificates per domain | 50/week | Registered domain (e.g., example.com) |
| Duplicate certificates | 5/week | Same set of domain names |
| Failed validations | 5/hour | Per account, per hostname |
| New registrations | 10/3 hours | Per IP |

Section 6: Verifying Issued Certificates

# Check certificate details from the K8s Secret
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -text | head -30

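# Verify cert chain (tls.crt holds the leaf plus intermediates, so the same file is passed as the untrusted bundle)
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/tls.crt
openssl verify -show_chain -untrusted /tmp/tls.crt /tmp/tls.crt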

# Check cert matches key (RSA keys only; only a digest of the modulus is compared, the key itself is never printed)
CERT_MOD=$(kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -modulus | openssl md5)
KEY_MOD=$(kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.key}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
echo "Match: $([ "$CERT_MOD" = "$KEY_MOD" ] && echo YES || echo NO)"

# Check if staging cert (not trusted in browsers)
kubectl get secret ${SECRET} -n ${NS} -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -issuer
# Staging issuer contains "(STAGING)" or "Fake LE"
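
The staging check can be scripted by matching the issuer line against the two markers named in the comment above. This is a small sketch; is_staging_issuer is a hypothetical helper, and the markers may change if Let's Encrypt renames its staging hierarchy.

```shell
# Succeed (exit 0) when an issuer string carries a staging marker.
# is_staging_issuer is a hypothetical helper name.
is_staging_issuer() {
  case "$1" in
    *"(STAGING)"*|*"Fake LE"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Feed it the output of the openssl x509 -noout -issuer command above, for example: is_staging_issuer "$ISSUER_LINE" && echo "staging cert", where ISSUER_LINE is a variable holding that output.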

| Problem | Cause | Fix |
| --- | --- | --- |
| Browser shows "Not Trusted" | Using Let's Encrypt staging issuer | Switch to production issuer server URL |
| Wrong SANs on cert | dnsNames in Certificate don't match Ingress hosts | Update Certificate dnsNames to match |
| Cert issued but Ingress uses old one | Ingress controller cached old cert | Restart ingress controller or wait for cache TTL |
| "certificate signed by unknown authority" | CA cert not in client trust store | Add CA cert to trust store or use ca.crt from Secret |

MCP Tools Available

When the appropriate MCP servers are connected, prefer these over raw kubectl where available:

  • mcp__flux-operator-mcp__get_kubernetes_resources - Query Certificates, CertificateRequests, Orders, Challenges, Issuers
  • mcp__flux-operator-mcp__get_kubernetes_logs - Retrieve cert-manager controller, webhook, and cainjector logs
  • mcp__flux-operator-mcp__get_kubernetes_metrics - Check cert-manager resource consumption

Common Mistakes

| Mistake | Why It Fails | Instead |
| --- | --- | --- |
| Testing with Let's Encrypt production | Hits rate limits quickly during debugging | Use acme-staging-v02.api.letsencrypt.org until everything works, then switch |
| Wildcard cert with HTTP-01 | HTTP-01 doesn't support wildcards, only specific domains | Use DNS-01 solver for wildcard certificates (*.example.com) |
| Forgetting port 80 for HTTP-01 | Let's Encrypt must reach /.well-known/acme-challenge/ over HTTP | Ensure port 80 is open from the internet to the ingress controller |
| Using cert-manager.io/issuer for a ClusterIssuer | Ingress annotation looks for a namespace-scoped Issuer | Use cert-manager.io/cluster-issuer annotation instead |
| Deleting and recreating Certificate for renewal | Causes a new ACME order, consuming rate limits | Use kubectl cert-manager renew to trigger renewal of existing cert |
| Not checking Order and Challenge resources | Debugging the Certificate when the problem is in the ACME challenge | Follow the chain: Certificate → CertificateRequest → Order → Challenge |

Behavioural Guidelines

  1. Follow the resource chain — Certificate → CertificateRequest → Order → Challenge. Diagnose at the deepest failing layer.
  2. Never display raw private key contents — Examining certificate metadata (dates, SANs, issuer) is safe. Modulus comparison (openssl rsa -noout -modulus) for cert/key matching is acceptable, but never print full tls.key data.
  3. Check the Issuer first — If the Issuer is Ready: False, no Certificate can be issued.
  4. Distinguish staging from production — Staging certs are not browser-trusted but have much higher rate limits for testing.
  5. Watch for rate limits — Let's Encrypt production has strict limits. Check kubectl describe order for rate limit messages before retrying.
  6. Verify DNS resolution — For both HTTP-01 and DNS-01, the domain must resolve correctly. Use dig to verify.
  7. Check all cert-manager components — The controller, webhook, and cainjector all play different roles. A webhook failure blocks resource creation entirely.


Metadata

License: unknown
Version: -
Updated: 2/9/2026
Publisher: foxj77

Tags

api, ci-cd, github-actions, security, testing