askill
distributed-tracing


Implement distributed tracing across microservices with Jaeger, Zipkin, and W3C TraceContext. Use when: setting up trace propagation, configuring Jaeger or Zipkin, understanding span relationships, choosing sampling strategies, or correlating traces with logs and metrics.

Updated 3/14/2026


Distributed Tracing

Status: Production Ready | Last Updated: 2026-02-24


Description

Distributed tracing tracks requests as they flow through multiple services, providing end-to-end visibility into latency, errors, and dependencies. It is essential for debugging and performance optimization in microservice architectures.

When to Use

  • Setting up trace propagation (W3C TraceContext, B3)
  • Configuring Jaeger for trace collection and visualization
  • Configuring Zipkin for trace collection and visualization
  • Understanding span relationships (parent/child, follows-from)
  • Choosing between head-based and tail-based sampling
  • Planning trace storage and retention policies
  • Correlating traces with logs and metrics

Key Concepts

Trace Propagation Formats

| Format | Standard | Use Case |
|---|---|---|
| W3C TraceContext | W3C Recommendation | Default for OpenTelemetry, broadly supported |
| B3 | Zipkin | Legacy Zipkin ecosystems, broad language support |
| Jaeger | Jaeger-specific | Jaeger-native environments (being deprecated in favor of W3C) |
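
To make the difference between the three formats concrete, the sketch below renders one and the same span context as a W3C `traceparent` value, a B3 single-header value, and a Jaeger `uber-trace-id` value. The `TraceIds` interface and the function names are hypothetical, introduced only for this illustration; in practice the tracing SDK's propagators write these headers.

```typescript
// Hypothetical shape for this sketch; real SDKs expose a richer span context.
interface TraceIds {
  traceId: string;       // 32 lowercase hex characters
  spanId: string;        // 16 lowercase hex characters
  parentSpanId?: string; // 16 hex characters, absent on a root span
  sampled: boolean;
}

// W3C TraceContext: traceparent = version-traceId-parentId-traceFlags
function toTraceparent(ids: TraceIds): string {
  return `00-${ids.traceId}-${ids.spanId}-${ids.sampled ? "01" : "00"}`;
}

// B3 single header: b3 = traceId-spanId-samplingState[-parentSpanId]
function toB3Single(ids: TraceIds): string {
  const base = `${ids.traceId}-${ids.spanId}-${ids.sampled ? "1" : "0"}`;
  return ids.parentSpanId ? `${base}-${ids.parentSpanId}` : base;
}

// Jaeger: uber-trace-id = traceId:spanId:parentSpanId:flags ("0" parent on a root span)
function toUberTraceId(ids: TraceIds): string {
  const parentPart = ids.parentSpanId ? ids.parentSpanId : "0";
  return `${ids.traceId}:${ids.spanId}:${parentPart}:${ids.sampled ? "1" : "0"}`;
}
```

All three carry the same trace ID, span ID, and sampling decision; only the header name and field layout differ.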

Span Relationships

| Relationship | Meaning |
|---|---|
| ChildOf | The parent span depends on the child span's result |
| FollowsFrom | The parent span does not depend on the child (fire-and-forget) |
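
As a rough illustration of the two relationships, the sketch below groups the spans of one trace by parent and separates the children the parent waits on (ChildOf) from those it does not (FollowsFrom). The `SpanRecord` shape and the `groupChildren` helper are hypothetical and exist only for this example; they are not part of any tracing SDK.

```typescript
type RefType = "ChildOf" | "FollowsFrom";

// Hypothetical, simplified span record for this sketch.
interface SpanRecord {
  spanId: string;
  name: string;
  parentId?: string; // absent on the root span
  ref?: RefType;     // relationship to the parent; treated as ChildOf when absent
}

interface ChildGroups {
  blocking: string[]; // ChildOf: the parent depends on the result
  detached: string[]; // FollowsFrom: fire-and-forget work
}

// Groups child span names under their parent's span ID.
function groupChildren(spans: SpanRecord[]): { [parentId: string]: ChildGroups } {
  const groups: { [parentId: string]: ChildGroups } = {};
  for (const span of spans) {
    if (span.parentId === undefined) continue; // root span has no parent
    const group =
      groups[span.parentId] || (groups[span.parentId] = { blocking: [], detached: [] });
    if (span.ref === "FollowsFrom") {
      group.detached.push(span.name);
    } else {
      group.blocking.push(span.name);
    }
  }
  return groups;
}
```

For an order request, a payment call would typically land in `blocking`, while a confirmation email queued for later would land in `detached`.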

Sampling Strategies

| Strategy | Description | Trade-off |
|---|---|---|
| Head-based | Decision made at trace start | Simple, but may miss interesting traces |
| Tail-based | Decision made after trace completes | Captures errors/slow traces, but requires buffering |
| Adaptive | Adjusts rate based on traffic volume | Balances cost and coverage |
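
The head-based and tail-based decisions can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the algorithm of any particular SDK or collector: `headSample`, `tailSample`, `FinishedSpan`, and the thresholds are hypothetical names chosen for this example, and the trace ID is assumed to be a 32-character hex string.

```typescript
// Head-based: decide from the trace ID alone, before any span has finished.
// Deriving the decision from the ID means every service that sees the same
// trace ID reaches the same verdict without coordination.
function headSample(traceId: string, ratio: number): boolean {
  // Map the last 8 hex characters to a number in [0, 1).
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < ratio;
}

// Hypothetical summary of a finished span, as a tail sampler might buffer it.
interface FinishedSpan {
  durationMs: number;
  error: boolean;
}

// Tail-based: decide once the spans of the trace have been buffered.
// Keep every trace with an error or a slow span; sample the rest at a baseline rate.
function tailSample(
  traceId: string,
  spans: FinishedSpan[],
  slowThresholdMs: number,
  baselineRatio: number,
): boolean {
  for (const span of spans) {
    if (span.error || span.durationMs >= slowThresholdMs) return true;
  }
  return headSample(traceId, baselineRatio);
}
```

The trade-off from the table shows up directly in the signatures: `headSample` needs only the trace ID, while `tailSample` needs the finished spans, which is why tail-based sampling requires buffering.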

Best Practices

  1. Use W3C TraceContext: it is a W3C Recommendation and the default propagation format in OpenTelemetry
  2. Propagate context through all transport layers: HTTP headers, message queue metadata, gRPC metadata
  3. Set span names to describe the operation: POST /api/orders, not handler. Prefer route templates over raw URLs so names stay low-cardinality
  4. Record errors on spans: set span status to ERROR and call recordException
  5. Use tail-based sampling for production: capture all error and high-latency traces while sampling normal traffic
  6. Correlate traces with logs: inject trace_id and span_id into log records for cross-signal navigation
  7. Set retention policies: keep error traces longer than normal traces to support incident investigation
  8. Monitor trace pipeline health: alert on dropped spans, exporter failures, and collector queue depth
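
Propagation through message queue metadata (practice 2) can be sketched as follows. This is an illustration only: `QueueMessage`, `MessageHeaders`, `withTraceContext`, and `readTraceContext` are hypothetical names, and a real service would normally let the tracing SDK's propagator inject and extract the header rather than building it by hand.

```typescript
type MessageHeaders = { [key: string]: string };

// Hypothetical message envelope; most brokers offer some form of per-message metadata.
interface QueueMessage {
  body: string;
  headers: MessageHeaders;
}

// Producer side: attach the current span's identity before publishing.
function withTraceContext(
  msg: QueueMessage,
  traceId: string,
  spanId: string,
  sampled: boolean,
): QueueMessage {
  const traceparent = `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
  // Return a copy so the caller's message is not mutated.
  return { body: msg.body, headers: { ...msg.headers, traceparent } };
}

// Consumer side: read the header back (case-insensitively) to continue the trace.
function readTraceContext(msg: QueueMessage): string | undefined {
  for (const key in msg.headers) {
    if (key.toLowerCase() === "traceparent") return msg.headers[key];
  }
  return undefined;
}
```

The consumer would start its span with the extracted value as the parent, so the trace stays connected across the queue instead of producing an orphan span.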

Common Patterns

Jaeger Setup (Docker Compose)

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"   # Jaeger UI
      - "4317:4317"     # OTLP gRPC
      - "4318:4318"     # OTLP HTTP
    environment:
      COLLECTOR_OTLP_ENABLED: "true"

Zipkin Setup (Docker Compose)

services:
  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"     # Zipkin UI + API

W3C TraceContext Header

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01

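Format: version-traceId-parentId-traceFlags. The traceId is 32 hex characters, the parentId is the 16-hex-character span ID of the calling span, and a traceFlags value of 01 marks the trace as sampled.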
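
A minimal parser for this header might look like the sketch below. The `Traceparent` interface and `parseTraceparent` function are hypothetical names for this example. The parser is deliberately strict and only accepts the four-field layout shown above; it also rejects version ff and all-zero IDs, which the W3C specification defines as invalid.

```typescript
interface Traceparent {
  version: string;
  traceId: string;  // 32 hex characters
  parentId: string; // 16 hex characters: the calling span's ID
  sampled: boolean; // lowest bit of traceFlags
}

// Returns the parsed fields, or null when the header is malformed.
function parseTraceparent(header: string): Traceparent | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    header.trim(),
  );
  if (!match) return null;
  const version = match[1];
  const traceId = match[2];
  const parentId = match[3];
  const flags = match[4];
  if (version === "ff") return null; // reserved, invalid version
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null; // all-zero IDs are invalid
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

Returning null instead of throwing lets a service treat a malformed header as "no incoming context" and start a fresh trace.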

Correlating Traces with Logs

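import { trace } from "@opentelemetry/api";

// Returns the active span's identifiers, or an empty object when no span is active.
function getTraceContext() {
  const span = trace.getActiveSpan();
  if (!span) return {};
  const ctx = span.spanContext();
  return {
    trace_id: ctx.traceId,
    span_id: ctx.spanId,
    trace_flags: ctx.traceFlags,
  };
}

// Inject into structured logs. `logger` is assumed to be any structured logger
// that accepts an object, and `orderId` a value from the surrounding handler.
logger.info({ ...getTraceContext(), message: "Order processed", orderId });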

Anti-Patterns

| Anti-Pattern | Why It Hurts | Correct Approach |
|---|---|---|
| Not propagating context across async boundaries | Broken traces, orphan spans | Use context-aware async utilities |
| Sampling at 100% in production | Storage costs explode, performance impact | Use 5–20% head-based or tail-based sampling |
| Generic span names | Cannot filter or search effectively | Use descriptive, route-based span names |
| Ignoring span status on errors | Errors invisible in trace UI | Always set ERROR status and record exception |
| No trace-log correlation | Cannot jump from trace to logs | Inject trace_id into all structured logs |
| Unbounded trace retention | Storage grows without limit | Set retention policies (7–30 days typical) |

Last verified: 2026-02-24 | Skill version: 1.0.0

Install

Requires askill CLI v1.0+

AI Quality Score

72/100 (analyzed 3/27/2026)

High-quality technical reference for distributed tracing with comprehensive coverage of concepts, patterns, and anti-patterns. Well-structured with tables, code examples, and clear organization. Located in an internal agent skill library path which signals project-specific usage, lowering overall reusability score. Lacks deep language-specific SDK configuration details but provides solid foundational guidance.


Metadata

License: unknown
Version: 1.0.0
Updated: 3/14/2026
Publisher: RepairYourTech

Tags

api, ci-cd, observability