Tip #1

Stop Testing Success. Kill the Database.

Intro to Chaos Engineering for QA: Testing resilience, not just functionality.

The Problem: The "Green Build" Illusion

Local Environment

Status Healthy
Latency5 ms
"All builds passed!" ✅

Production

Status Crashing
LatencyTIMEOUT (504)
  • 3rd Party API Outage
  • White Screen of Death

We are obsessed with testing the "Happy Path." But in production, networks lag, pods crash, and 3rd-party APIs fail. A standard automation test says "Failed." Chaos Engineering asks: "Did the app crash, or did it handle the failure gracefully?"

The Solution: Failure Injection

Don't wait for random failures

Hoping a database crashes during your test run is not a strategy. It leads to flaky tests and "cannot reproduce" bugs.

RECOMMENDED

Do use Failure Injection

Kill the database intentionally inside your test. Verify the application's "Plan B" (Retry logic or Fallback UI).

The Code (Python + Docker SDK)

Here is a simple "Chaos Test" that uses the Docker library to strangle the database while the Playwright test is running.

tests/chaos_db.py
import docker
from playwright.sync_api import Page, expect

def test_database_failure_resilience(page: Page):
    # Connect to local Docker daemon
    client = docker.from_env()
    db_container = client.containers.get("postgres-prod")

    # 1. Happy Path: App loads normally
    page.goto("/dashboard")
    expect(page.locator(".user-balance")).to_be_visible()

    # 🧨 CHAOS: Hard stop the database
    print("Injecting Chaos: Stopping DB...")
    db_container.stop()

    # 2. Resilience Assertion
    # BAD: White screen or Stack Trace 500
    # GOOD: Friendly Toast "Connection lost, retrying..."
    page.reload()
    expect(page.locator(".error-toast")).to_contain_text("Connection lost")

    # 🩹 RECOVERY: Bring it back to life
    db_container.start()
    
    # 3. Self-Healing Assertion
    expect(page.locator(".user-balance")).to_be_visible()

Can't access Docker in CI?

Debuggo can simulate a 503 Service Unavailable error directly on the network layer, without killing containers.

Try Error Injection

No root access required

Why this matters: Traditional QA checks the quality of features. Chaos QA checks the quality of architecture. If your test passes this scenario, you aren't just a Tester; you are a Reliability Engineer.