cluster-sizing

This skill should be used when the user asks about "Temporal sizing", "history shards", "cluster capacity", "Temporal resources", "scale Temporal", "Temporal performance", "how many shards", or needs guidance on capacity planning for Temporal clusters.

Updated 2/8/2026

Temporal Cluster Sizing

Guidance for sizing Temporal clusters based on workload requirements.

Key Sizing Factors

| Factor | Impact | Fixed After Creation |
|---|---|---|
| History Shards | Workflow parallelism | Yes (set at creation) |
| History Replicas | Throughput, availability | No |
| Matching Replicas | Task dispatch rate | No |
| Frontend Replicas | API request rate | No |
| Database Size | History storage | No |

History Shards

Critical: History shards cannot be changed after cluster creation.

Shards determine maximum workflow parallelism. Each workflow belongs to one shard.

Sizing Guidelines

| Concurrent Workflows | Recommended Shards |
|---|---|
| < 10,000 | 128 |
| 10,000 - 100,000 | 256 |
| 100,000 - 500,000 | 512 |
| 500,000 - 2,000,000 | 1024 |
| > 2,000,000 | 2048 or 4096 |

Calculation Formula

shards = ceil(max_concurrent_workflows / 1000) * safety_factor

# Round the result up to the next power of 2
# safety_factor = 2-4x for growth

Example: Expecting 50,000 concurrent workflows with 3x growth:

base = 50,000 / 1000 = 50
with_growth = 50 * 3 = 150
next_power_of_2 = 256 shards  (smallest power of 2 >= 150)
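
The calculation above can be sketched in Python. This is a minimal illustration of the formula as written in this guide; the function name `recommended_shards` and the default safety factor of 3 are assumptions chosen for the example, not part of Temporal.

```python
import math


def recommended_shards(max_concurrent_workflows: int, safety_factor: float = 3.0) -> int:
    """Apply ceil(workflows / 1000) * safety_factor, then round up to a power of 2."""
    base = math.ceil(max_concurrent_workflows / 1000)
    with_growth = math.ceil(base * safety_factor)
    # Smallest power of 2 that is >= with_growth (1 for very small inputs).
    return 1 << max(with_growth - 1, 0).bit_length()


# Worked example from the text: 50,000 workflows with 3x growth.
print(recommended_shards(50_000, 3))  # 256
```

Because the shard count is fixed at creation, it is worth running the calculation with the largest workload expected over the cluster's lifetime rather than today's load.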

Shard Distribution

Shards are distributed across history service replicas:

shards_per_replica = total_shards / history_replicas

# Example: 512 shards, 4 replicas = 128 shards/replica

More replicas mean fewer shards per replica, so load is spread more evenly and throughput rises.
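
When the shard count does not divide evenly by the replica count, some replicas own one more shard than others. The sketch below shows an idealized even split; `shards_per_replica` is a hypothetical helper, and real shard ownership in a running cluster may be less even than this.

```python
def shards_per_replica(total_shards: int, history_replicas: int) -> list[int]:
    """Return the number of shards each replica would own under an even split."""
    if history_replicas < 1:
        raise ValueError("history_replicas must be at least 1")
    base, extra = divmod(total_shards, history_replicas)
    # The first `extra` replicas each take one additional shard.
    return [base + 1 if i < extra else base for i in range(history_replicas)]


print(shards_per_replica(512, 4))  # [128, 128, 128, 128]
print(shards_per_replica(512, 6))  # [86, 86, 85, 85, 85, 85]
```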

Service Sizing

Frontend Service

Handles API requests, authentication, rate limiting.

| Load Level | Replicas | CPU | Memory |
|---|---|---|---|
| Low (<100 rps) | 1-2 | 500m | 1Gi |
| Medium (100-1000 rps) | 3 | 1 | 2Gi |
| High (1000-5000 rps) | 5 | 2 | 4Gi |
| Very High (>5000 rps) | 10+ | 4 | 8Gi |
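
The frontend table can be turned into a small lookup. This is an illustrative sketch: the names `TIER_BOUNDS_RPS`, `FRONTEND_TIERS`, and `frontend_tier` are made up for the example, and the handling of rates that fall exactly on a boundary is an assumption, since the table's ranges overlap there.

```python
from bisect import bisect_right

# Upper bounds (requests per second) separating the four load levels in the table.
TIER_BOUNDS_RPS = [100, 1000, 5000]

# (load level, replicas, CPU, memory) rows copied from the table.
FRONTEND_TIERS = [
    ("Low", "1-2", "500m", "1Gi"),
    ("Medium", "3", "1", "2Gi"),
    ("High", "5", "2", "4Gi"),
    ("Very High", "10+", "4", "8Gi"),
]


def frontend_tier(peak_rps: float) -> tuple[str, str, str, str]:
    """Return the table row whose load range contains peak_rps."""
    # Assumption: a rate exactly on a boundary (e.g. 1000 rps) maps to the higher tier.
    return FRONTEND_TIERS[bisect_right(TIER_BOUNDS_RPS, peak_rps)]


level, replicas, cpu, memory = frontend_tier(2500)
print(f"{level}: {replicas} replicas, {cpu} CPU, {memory} memory each")
```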

History Service

Manages workflow state and event history.

| Shards | Replicas | CPU/replica | Memory/replica |
|---|---|---|---|
| 128 | 2 | 1 | 2Gi |
| 256 | 3 | 2 | 4Gi |
| 512 | 4-6 | 2 | 4Gi |
| 1024 | 8-12 | 4 | 8Gi |
| 2048 | 16-24 | 4 | 8Gi |
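
Multiplying the per-replica figures by the replica range gives the total footprint of the history tier, which is useful when checking node capacity. The sketch below encodes the table rows directly; `HISTORY_SIZING` and `history_footprint` are hypothetical names used only for this example.

```python
# shards -> ((min replicas, max replicas), CPU cores per replica, memory GiB per replica)
HISTORY_SIZING = {
    128: ((2, 2), 1, 2),
    256: ((3, 3), 2, 4),
    512: ((4, 6), 2, 4),
    1024: ((8, 12), 4, 8),
    2048: ((16, 24), 4, 8),
}


def history_footprint(shards: int) -> dict[str, tuple[float, float]]:
    """Return (low, high) totals for the history tier at a given shard count."""
    (lo, hi), cpu, mem_gi = HISTORY_SIZING[shards]
    return {
        "cpu_cores": (lo * cpu, hi * cpu),
        "memory_gi": (lo * mem_gi, hi * mem_gi),
        # More replicas means fewer shards on each one.
        "shards_per_replica": (shards / hi, shards / lo),
    }


print(history_footprint(1024)["cpu_cores"])  # (32, 48)
print(history_footprint(1024)["memory_gi"])  # (64, 96)
```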

Matching Service

Dispatches tasks to workers.

| Task Rate | Replicas | CPU | Memory |
|---|---|---|---|
| Low (<1000/s) | 2 | 500m | 1Gi |
| Medium (1000-10000/s) | 3 | 1 | 2Gi |
| High (>10000/s) | 5+ | 2 | 4Gi |

Worker Service (Internal)

Handles internal system workflows. Scale with cluster size:

| Cluster Size | Replicas | CPU | Memory |
|---|---|---|---|
| Small | 1 | 200m | 256Mi |
| Medium | 1 | 500m | 512Mi |
| Large | 2 | 1 | 1Gi |

Database Sizing

PostgreSQL Recommendations

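| Workflow Volume | CPU | Memory | Storage | IOPS |
|---|---|---|---|---|
| < 100K workflows | 2 | 8GB | 100GB | 3000 |
| 100K-1M workflows | 4 | 16GB | 500GB | 6000 |
| 1M-10M workflows | 8 | 32GB | 1TB | 12000 |
| > 10M workflows | 16+ | 64GB+ | 2TB+ | 20000+ |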

Storage Calculation

storage_per_workflow = avg_history_events * event_size
                     = 100 events * 1KB = 100KB

total_storage = workflows * storage_per_workflow * retention_multiplier
              = 1,000,000 * 100KB * 1.5 = 150GB

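Retention: Closed workflow histories are kept until the retention period expires, so set the shortest retention period that meets your audit and debugging needs to keep storage in check.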
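
The storage calculation can be written as a small function so that different workload assumptions are easy to compare. The function name `estimate_history_storage_gb` is hypothetical, the defaults are the illustrative numbers from the calculation above rather than measured values, and decimal units (1 GB = 1,000,000 KB) are assumed to match the arithmetic in the text.

```python
def estimate_history_storage_gb(
    workflows: int,
    avg_history_events: int = 100,
    event_size_kb: float = 1.0,
    retention_multiplier: float = 1.5,
) -> float:
    """Estimate history storage in GB from workflow count and average history size."""
    storage_per_workflow_kb = avg_history_events * event_size_kb
    total_kb = workflows * storage_per_workflow_kb * retention_multiplier
    return total_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB


print(estimate_history_storage_gb(1_000_000))  # 150.0
```

Measuring the real average event count and event size from an existing workload will give a far better estimate than the defaults.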

Elasticsearch Sizing

For visibility queries (optional but recommended):

| Indexed Workflows | Nodes | CPU/node | Memory/node | Storage/node |
|---|---|---|---|---|
| < 1M | 3 | 1 | 2Gi | 50Gi |
| 1M-10M | 3 | 2 | 4Gi | 200Gi |
| > 10M | 5+ | 4 | 8Gi | 500Gi |

Configuration Templates

Small Cluster (Dev/Test)

server:
  config:
    numHistoryShards: 128
  replicaCount:
    frontend: 1
    history: 1
    matching: 1
    worker: 1
  resources:
    frontend:
      requests: {cpu: "250m", memory: "512Mi"}
    history:
      requests: {cpu: "500m", memory: "1Gi"}
    matching:
      requests: {cpu: "250m", memory: "512Mi"}

Medium Cluster (Production Start)

server:
  config:
    numHistoryShards: 256
  replicaCount:
    frontend: 3
    history: 3
    matching: 3
    worker: 1
  resources:
    frontend:
      requests: {cpu: "500m", memory: "1Gi"}
      limits: {cpu: "2", memory: "4Gi"}
    history:
      requests: {cpu: "1", memory: "2Gi"}
      limits: {cpu: "4", memory: "8Gi"}
    matching:
      requests: {cpu: "500m", memory: "1Gi"}
      limits: {cpu: "2", memory: "4Gi"}

Large Cluster (High Volume)

server:
  config:
    numHistoryShards: 1024
  replicaCount:
    frontend: 5
    history: 10
    matching: 5
    worker: 2
  resources:
    frontend:
      requests: {cpu: "2", memory: "4Gi"}
      limits: {cpu: "4", memory: "8Gi"}
    history:
      requests: {cpu: "4", memory: "8Gi"}
      limits: {cpu: "8", memory: "16Gi"}
    matching:
      requests: {cpu: "2", memory: "4Gi"}
      limits: {cpu: "4", memory: "8Gi"}
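
Before applying a template, it can help to total its resource requests and compare them with the capacity of the Kubernetes nodes. The sketch below does this for the Medium template's requests, transcribed by hand into a dictionary. The helper names are hypothetical, and only the `m`, `Mi`, and `Gi` quantity suffixes used in these templates are handled.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    return int(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def parse_memory_gi(quantity: str) -> float:
    """Convert a memory quantity such as "512Mi" or "2Gi" to GiB."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    raise ValueError(f"unsupported memory quantity: {quantity}")


# Requests from the Medium Cluster template (the worker has no requests listed).
MEDIUM_TEMPLATE = {
    "frontend": {"replicas": 3, "cpu": "500m", "memory": "1Gi"},
    "history": {"replicas": 3, "cpu": "1", "memory": "2Gi"},
    "matching": {"replicas": 3, "cpu": "500m", "memory": "1Gi"},
}


def total_requests(template: dict) -> tuple[float, float]:
    """Return (total CPU cores, total memory GiB) requested across all services."""
    cpu = sum(s["replicas"] * parse_cpu(s["cpu"]) for s in template.values())
    mem = sum(s["replicas"] * parse_memory_gi(s["memory"]) for s in template.values())
    return cpu, mem


print(total_requests(MEDIUM_TEMPLATE))  # (6.0, 12.0)
```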

Scaling Guidelines

Horizontal Scaling

Scale replicas when:

  • CPU utilization > 70% sustained
  • Memory utilization > 80%
  • Request latency p99 > SLA
  • Task backlog growing
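
The horizontal scaling triggers above can be expressed as a simple check. This is a sketch only: the `ServiceMetrics` fields and the `scale_out_reasons` function are hypothetical, and how "sustained" utilization or backlog growth is measured is left to the monitoring system.

```python
from dataclasses import dataclass


@dataclass
class ServiceMetrics:
    cpu_utilization: float         # sustained average, 0.0 to 1.0
    memory_utilization: float      # 0.0 to 1.0
    p99_latency_ms: float
    sla_latency_ms: float
    backlog_growth_per_min: float  # positive when the task backlog is growing


def scale_out_reasons(m: ServiceMetrics) -> list[str]:
    """Return which horizontal-scaling triggers from the list are currently met."""
    reasons = []
    if m.cpu_utilization > 0.70:
        reasons.append("cpu > 70%")
    if m.memory_utilization > 0.80:
        reasons.append("memory > 80%")
    if m.p99_latency_ms > m.sla_latency_ms:
        reasons.append("p99 latency above SLA")
    if m.backlog_growth_per_min > 0:
        reasons.append("task backlog growing")
    return reasons


sample = ServiceMetrics(0.75, 0.60, 180.0, 250.0, 12.0)
print(scale_out_reasons(sample))  # ['cpu > 70%', 'task backlog growing']
```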

Vertical Scaling

Increase resources when:

  • Replica count at practical limit
  • Database connection pooling maxed
  • GC pressure affecting latency

Monitoring for Sizing Decisions

Key metrics to watch:

# History service load
sum(rate(temporal_persistence_requests_total[5m])) by (operation)

# Task schedule-to-start latency, cluster-wide p99 (indicates matching capacity)
histogram_quantile(0.99, sum by (le) (rate(temporal_schedule_to_start_latency_bucket[5m])))

# Workflow throughput
sum(rate(temporal_workflow_completed_total[5m]))

# Shard distribution
temporal_history_shard_count

Common Sizing Mistakes

| Mistake | Impact | Solution |
|---|---|---|
| Too few shards | Cannot scale later | Start with more shards |
| Undersized history | Latency spikes | Increase memory, replicas |
| Single frontend | Single point of failure | Minimum 2 for HA |
| No Elasticsearch | Slow visibility queries | Enable for production |

Additional Resources

Reference Files

For detailed sizing calculations, consult:

  • references/sizing-calculator.md - Detailed sizing formulas
  • references/benchmark-results.md - Performance benchmark data

Metadata

License: unknown
Version: 1.0.0
Updated: 2/8/2026
Publisher: therealbill

Tags

api, database, github-actions, observability, testing