askill
resilience-testing

Resilience and fault tolerance testing. Use when implementing circuit breakers, retry logic, or testing failure scenarios. Covers failure simulation, graceful degradation, and resilience pattern validation.

Updated 1/30/2026

SKILL.md

Validate Data Flow Resilience via Fault-Tolerant Integration Tests

Description

This rule promotes resilience testing by simulating upstream and downstream failures in integration tests. Tests should assert that the system under test responds with fallback behavior (e.g., retries, an open circuit breaker, a logged failover) rather than a cascading failure or an unhandled exception.

Purpose

To ensure that mission-critical services remain operational even when dependencies fail. This improves availability, supports fault isolation, and aligns with modern resilience engineering practices.

Scope

  • Critical path services and workflows
  • Integration tests involving third-party APIs, microservice dependencies, databases, or queues
  • Applies to all developers and system testers
  • Enforced via CI environments using mock failure injection or service virtualization

SDLC Integration

  • Planning: Failure scenarios are captured as acceptance criteria
  • Analysis: Identifies SLAs and recovery thresholds
  • Design: Uses circuit breakers, retry policies, and timeouts
  • Development: Adds negative-path tests to simulate failure
  • Testing: Validates fallback logic in real environments
  • Deployment: Prevents ungraceful degradation in production
  • Maintenance: Regular review of resilience patterns across services

Standards

Failure Scenario Testing

  • Integration tests SHOULD simulate at least one failure scenario per critical service
  • Systems MUST retry, degrade gracefully, or return default values on dependency failure
  • Circuit breakers and timeout logic SHOULD be test-covered
  • Retry attempts MUST NOT exceed a configured maximum, so that retries do not amplify load on a dependency that is already failing
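
The retry and fallback standards above can be illustrated with a small sketch. This is not taken from any library: `BoundedRetry`, its parameters, and the `PAYMENT_PENDING` default are hypothetical names chosen for the example. It assumes the dependency signals failure with a `RuntimeException`, caps the number of attempts, backs off exponentially, and returns a default value once the budget is spent.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper, not part of any library: bounded retry with a fallback value. */
public class BoundedRetry {

    /**
     * Calls the supplier up to maxAttempts times, doubling the delay after each
     * failure, and returns the fallback once the attempt budget is spent.
     */
    public static <T> T execute(Supplier<T> call, int maxAttempts,
                                long baseDelayMillis, T fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // budget spent: stop calling the failing dependency
                }
                try {
                    Thread.sleep(baseDelayMillis << (attempt - 1)); // exponential backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return fallback; // graceful degradation instead of an unhandled exception
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated dependency that always fails, as a stubbed 503 would.
        String result = execute(() -> {
            calls.incrementAndGet();
            throw new IllegalStateException("billing unavailable");
        }, 3, 10L, "PAYMENT_PENDING");
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

A resilience test would assert on both outcomes: the returned default (graceful degradation) and the observed call count (the retry cap was respected).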

Actionable Metrics

| Metric | Target Value | Measurement Method | Enforcement Level |
|---|---|---|---|
| Resilience test presence | ≥ 1 per critical service | Test tag or folder scan | SHOULD |
| Response under failure | No crash, fallback logged | Log assertion in test framework | MUST |
| Retry logic tested | Yes | Observed call count or delay | SHOULD |
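
The "log assertion in test framework" measurement can be sketched with the JDK's own `java.util.logging` API. The `LogCapture` class and the logger name used below are hypothetical; the sketch assumes the application logs through `java.util.logging`, and a project using another logging framework would need that framework's equivalent appender.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical test helper: records log messages so a test can assert on them. */
public class LogCapture extends Handler {
    private final List<String> messages = new CopyOnWriteArrayList<>();

    @Override
    public void publish(LogRecord record) {
        messages.add(String.valueOf(record.getMessage()));
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    /** True if any captured message contains the given fragment. */
    public boolean contains(String fragment) {
        return messages.stream().anyMatch(m -> m.contains(fragment));
    }

    /** Attaches a new capture handler to the logger; remove it again after the test. */
    public static LogCapture attachTo(Logger logger) {
        LogCapture capture = new LogCapture();
        logger.addHandler(capture);
        return capture;
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("checkout"); // assumed logger name
        LogCapture capture = attachTo(logger);
        logger.warning("Billing retry triggered");
        System.out.println(capture.contains("Billing retry triggered"));
        logger.removeHandler(capture);
    }
}
```

A helper such as the `logsContain(...)` call in the Java example could be built on a handler like this one.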

Implementation

Configuration Requirements

  • Use test doubles or service virtualization to simulate HTTP 503, timeouts, or dropped responses
  • Assert on application logs and on retry/backoff behavior (for example, observed call counts and delays)
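
Where a mocking library is not available, a failing dependency can also be simulated with the HTTP server bundled in the JDK. The sketch below is one possible test double, not a replacement for WireMock: the class name `FailingBillingStub` and its methods are hypothetical, and it assumes Java 11 or later for `java.net.http.HttpClient`. It answers every request to `/billing` with 503 and counts the hits so a test can check how many attempts reached the dependency.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical test double: a local HTTP server that always answers 503 on /billing. */
public class FailingBillingStub implements AutoCloseable {
    private final HttpServer server;
    private final AtomicInteger hits = new AtomicInteger();

    public FailingBillingStub() {
        HttpServer created;
        try {
            // Port 0 asks the OS for any free port, which avoids clashes in CI.
            created = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server = created;
        server.createContext("/billing", exchange -> {
            hits.incrementAndGet();
            exchange.getRequestBody().readAllBytes(); // drain the request body
            exchange.sendResponseHeaders(503, -1);    // 503 with no response body
            exchange.close();
        });
        server.start();
    }

    public URI billingUri() {
        return URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/billing");
    }

    /** Number of requests that reached the stub. */
    public int hits() {
        return hits.get();
    }

    @Override
    public void close() {
        server.stop(0);
    }

    /** Sends one POST to the given URI and returns the HTTP status code. */
    public static int postOnce(URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri)
            .POST(HttpRequest.BodyPublishers.ofString("{}"))
            .build();
        try {
            return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling stub", e);
        }
    }
}
```

In a test, the system under test would be pointed at `billingUri()`, and `hits()` would be compared with the expected number of attempts.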

Example: Correct Implementation (Java)

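@Test
void retriesOnceOnBilling503() {
    // Simulate an unavailable billing dependency: every POST /billing returns 503.
    wireMockServer.stubFor(post(urlEqualTo("/billing"))
        .willReturn(aResponse().withStatus(503)));

    // Must return normally: an exception here means the failure was not handled.
    checkoutService.checkout(user);

    // One initial attempt plus one retry. Mockito's verify requires billingClient
    // to be a mock or spy; this assumes a spy wrapping the real HTTP client.
    verify(billingClient, times(2)).charge(any());
    assertTrue(logsContain("Billing retry triggered"));
}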

Example: Correct Implementation (C#/.NET with Polly)

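// Configure resilience policy with Polly: up to 3 retries on HttpRequestException
// or HTTP 503, waiting 2 s, 4 s, then 8 s (retryAttempt starts at 1)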
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.ServiceUnavailable)
    .WaitAndRetryAsync(3, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));

// Test resilience behavior
[Fact]
public async Task Checkout_BillingServiceUnavailable_RetriesAndSucceeds()
{
    // Arrange
    var callCount = 0;
    A.CallTo(() => _billingClient.ChargeAsync(A<ChargeRequest>._))
        .ReturnsLazily(() =>
        {
            callCount++;
            if (callCount < 3)
                throw new HttpRequestException("Service unavailable");
            return Task.FromResult(new ChargeResponse { Success = true });
        });

    // Act
    var result = await _checkoutService.CheckoutAsync(user);

    // Assert
    Assert.True(result.Success);
    Assert.Equal(3, callCount); // Retried twice before succeeding
}

[Fact]
public async Task Checkout_BillingServiceDown_CircuitBreakerOpens()
{
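    // Arrange - all calls fail. Assumes _checkoutService also applies a Polly
    // circuit breaker whose failure threshold is reached during the retries.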
    A.CallTo(() => _billingClient.ChargeAsync(A<ChargeRequest>._))
        .ThrowsAsync(new HttpRequestException("Service unavailable"));

    // Act & Assert
    await Assert.ThrowsAsync<BrokenCircuitException>(
        () => _checkoutService.CheckoutAsync(user));
}
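
The circuit-breaker test above relies on behavior that Polly provides. To make the expected state transitions concrete, here is a minimal Java sketch of a consecutive-failure breaker. It is an illustration only, not Polly's algorithm or API: `SimpleCircuitBreaker`, `CircuitOpenException`, and the threshold and break-duration parameters are hypothetical names for this example.

```java
import java.util.function.Supplier;

/** Hypothetical, minimal circuit breaker that opens after N consecutive failures. */
public class SimpleCircuitBreaker {

    /** Thrown instead of calling the dependency while the circuit is open. */
    public static class CircuitOpenException extends RuntimeException {
        public CircuitOpenException() {
            super("circuit is open");
        }
    }

    private final int failureThreshold;
    private final long breakNanos;
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAtNanos = 0L;

    public SimpleCircuitBreaker(int failureThreshold, long breakMillis) {
        this.failureThreshold = failureThreshold;
        this.breakNanos = breakMillis * 1_000_000L;
    }

    public synchronized <T> T call(Supplier<T> action) {
        if (open && System.nanoTime() - openedAtNanos < breakNanos) {
            throw new CircuitOpenException(); // fail fast without touching the dependency
        }
        // Closed, or the break has elapsed: let the call through (a trial call if half-open).
        try {
            T result = action.get();
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                open = true;
                openedAtNanos = System.nanoTime(); // start (or restart) the break
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 60_000L);
        for (int i = 0; i < 3; i++) {
            try {
                breaker.call(() -> {
                    throw new IllegalStateException("billing unavailable");
                });
            } catch (RuntimeException e) {
                System.out.println("call " + (i + 1) + ": " + e.getClass().getSimpleName());
            }
        }
    }
}
```

A test for a breaker like this checks two things: the breaker-specific exception is raised once the threshold is reached, and the dependency's call count stops growing while the circuit is open.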

Install

Requires askill CLI v1.0+

AI Quality Score

84/100 (analyzed 2/24/2026)

High-quality resilience testing skill with solid technical depth. Provides comprehensive coverage of fault tolerance testing including circuit breakers, retries, and graceful degradation. Features practical code examples in both Java (WireMock) and C# (Polly), clear standards with MUST/SHOULD requirements, and actionable metrics. Well-structured with SDLC integration guidance. Minor gaps include step-by-step procedures and explicit tool recommendations in main text. Highly reusable across projects, not internal-only.

Metadata

License: unknown
Version: 1.0.0
Updated: 1/30/2026
Publisher: spallempati

Tags

api, ci-cd, observability, testing