QA / TESTING · September 28, 2025 · 21 min read

Quality Assurance in Continuous Delivery

The 'Works on My Machine' excuse is dead. A technical deep-dive into the Testing Pyramid, E2E Automation with Playwright, and Synthetic Monitoring in Production.

The Velocity Paradox

In software engineering, there is a pervasive myth: "You can have it Fast, or you can have it Good. Pick one." This is the Velocity Paradox. Traditional "Waterfall" methodology prioritized "Good." We spent 6 months coding, then 3 months testing. The result was high quality, but zero velocity. We shipped once a year. The Startup culture prioritized "Fast." "Move Fast and Break Things." The result was high velocity, but bugs, outages, and customer churn.

Continuous Delivery (CD) challenges this paradox. It claims: "If you want to go Fast, you MUST be Good." You cannot deploy to production 10 times a day if you don't trust your code. Speed requires Stability. Stability requires Automated Quality Assurance.

This whitepaper details the architecture of a modern QA pipeline that enables "Elite" DORA metrics (Deploying on demand with <15% change failure rate).


Part 1: The Death of Manual QA

In the old world, "QA" was a person. A developer wrote code. They threw it "over the wall" to the QA Analyst. The QA Analyst opened a spreadsheet (The Test Plan) and manually clicked through the app for 3 days.

  • "Click Login button."
  • "Wait 3 seconds."
  • "Verify dashboard loads."

This fails in 2025 because:

  1. Scale: Modern apps have too many permutations (Mobile, Desktop, Tablet, Chrome, Safari, Dark Mode, Light Mode). A human cannot check them all.
  2. Speed: If deployment takes 10 minutes, testing cannot take 3 days.
  3. Boredom: Humans are bad at repetitive tasks. We miss things. Robots do not get bored.

The Shift: QA is not a role; it is a Codebase. QA Engineers are now "Software Development Engineers in Test" (SDETs). They write code that tests code.


Part 2: The Testing Pyramid (Architectural Balance)

We organize our tests based on the Google Testing Pyramid.

1. Unit Tests (The Base - 70%)

  • Scope: Single function or class. No Database. No Network.
  • Example: expect(calculateTax(100, 0.2)).toBe(20);
  • Speed: Microseconds.
  • Cost: Cheap.
  • Rule: Every bug found in production should result in a new Unit Test to prevent regression.

2. Integration Tests (The Middle - 20%)

  • Scope: The interaction between two units (e.g., API + Database).
  • Example: "POST /api/users should create a row in Postgres."
  • Speed: Milliseconds/Seconds. Requires Docker containers.
  • Tooling: Testcontainers, Jest, Supertest.
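
A self-contained sketch of what this layer exercises: an HTTP handler plus an in-memory array standing in for the users table. The createUser helper and the route are illustrative; a real suite would boot Postgres via Testcontainers (see Part 8) and drive the server with Supertest.

```typescript
import { createServer } from "node:http";

// In-memory stand-in for the users table (a real integration test
// would use a disposable Postgres container instead).
type User = { id: number; email: string };
const users: User[] = [];

// Handler logic, separated from the transport so it is easy to call.
export function createUser(store: User[], email: string): User {
  const user = { id: store.length + 1, email };
  store.push(user);
  return user;
}

// Wire it to HTTP so "POST /api/users" exercises handler + store together.
export const server = createServer((req, res) => {
  if (req.method === "POST" && req.url === "/api/users") {
    let body = "";
    req.on("data", (chunk) => (body += chunk.toString()));
    req.on("end", () => {
      const { email } = JSON.parse(body);
      const user = createUser(users, email);
      res.writeHead(201, { "Content-Type": "application/json" });
      res.end(JSON.stringify(user));
    });
  } else {
    res.writeHead(404).end();
  }
});
```

The point of the middle layer is that the test crosses a boundary: the request travels through the transport and lands in the store, so wiring bugs surface that unit tests cannot see.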

3. End-to-End (E2E) Tests (The Tip - 10%)

  • Scope: The entire application, from UI to DB.
  • Example: "Open Browser -> Click 'Buy' -> Verify Email Sent."
  • Speed: Slow (Minutes).
  • Flakiness: High. (Sometimes the network is slow and the test fails).
  • Tooling: Playwright (The modern standard), Cypress.
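
A Playwright spec for a flow like this might look as follows. The URL, selectors, and confirmation text are illustrative; the sketch assumes @playwright/test is installed with browsers provisioned.

```typescript
import { test, expect } from "@playwright/test";

test("checkout flow confirms the order", async ({ page }) => {
  // Auto-wait: no sleep() calls; Playwright waits for each element.
  await page.goto("https://shop.example.com");
  await page.getByRole("button", { name: "Buy" }).click();
  // Assert on the user-visible outcome, not implementation details.
  await expect(page.getByText("Order confirmed")).toBeVisible();
});
```

Keep specs like this few and focused on revenue-critical paths; everything else belongs lower in the pyramid.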

The Anti-Pattern (The Ice Cream Cone): Most legacy companies have this inverted. They have 0 Unit tests and thousands of Manual/E2E tests. Result: The build takes 4 hours to run. Developers stop running tests. Bugs explode.


Part 3: The CI/CD Pipeline as Gatekeeper

The Pipeline is the robot that enforces the law. Every time a developer pushes code to Git (Pull Request), the Pipeline wakes up.

Stage 1: The Fast Feedback (The Linter)

  • Checks syntax (ESLint).
  • Checks formatting (Prettier).
  • Time: 30 seconds.

Stage 2: The Logic Check (Unit Tests)

  • Runs thousands of unit tests.
  • Time: 2 minutes.

Stage 3: The Build

  • Compiles the Docker Image.
  • Time: 3 minutes.

Stage 4: The Integration

  • Spins up a temporary Database. Runs API tests against it.
  • Time: 5 minutes.

The Gate: If ANY stage fails, the Red Light flashes. The Merge Button is disabled. You physically cannot merge bad code into the main branch.
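
The four stages map naturally onto CI configuration. Here is a sketch as a GitHub Actions workflow; the job names, npm scripts, and timings are illustrative, not a prescribed setup.

```yaml
# .github/workflows/ci.yml — one job per gate; any failure blocks the merge
name: ci
on: pull_request
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint              # Stage 1: ESLint + Prettier (~30s)
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test                  # Stage 2: unit tests (~2m)
  build:
    needs: [lint, unit]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t app:${{ github.sha }} .   # Stage 3: the image
  integration:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:integration  # Stage 4: Testcontainers suite (~5m)
```

Branch protection then turns the green check into the gate: the merge button stays disabled until every job passes.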


Part 4: Playwright and Modern E2E

Selenium is dead. It was slow, flaky, and relied on external drivers. Playwright (by Microsoft) is the modern standard.

  • Headless: Runs without a visible UI (fast).
  • Auto-Wait: It automatically waits for elements to appear (no more sleep(1000)).
  • Trace Viewer: When a test fails, it gives you a video + snapshot of the DOM at that exact second.

Visual Regression Testing: Playwright can take a screenshot of your rendered UI and compare it to the "Gold Master" pixel-by-pixel. If a CSS change accidentally moved a button by 2 pixels, the test fails. This catches "Visual Bugs" that code tests miss.
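
With Playwright's built-in snapshot assertion, a visual check is one line inside a spec. The URL, snapshot name, and pixel tolerance below are illustrative:

```typescript
import { test, expect } from "@playwright/test";

test("dashboard matches the gold master", async ({ page }) => {
  await page.goto("https://app.example.com/dashboard");
  // First run records dashboard.png as the baseline; later runs diff
  // pixel-by-pixel and fail if the UI drifts beyond the tolerance.
  await expect(page).toHaveScreenshot("dashboard.png", { maxDiffPixels: 100 });
});
```

A small maxDiffPixels tolerance absorbs anti-aliasing noise while still catching real layout breaks.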


Part 5: Shift Right (Testing in Production)

Testing doesn't stop after deployment. We use Synthetic Monitoring. We deploy "Robot Users" (Canaries) that run in production 24/7.

  • Scenario: Every 5 minutes, a robot tries to Login and Checkout.
  • Result: It records the latency and success/fail.
  • Alert: If Login takes > 5 seconds, page the On-Call Engineer.

This allows us to detect outages before real users complain on Twitter. "The site is up" (HTTP 200) often lies. "The user can buy" (Synthetic Success) is the truth.
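
The core of a synthetic check is small: time a probe of the critical flow and decide whether to page. A minimal sketch follows; runCanary and the latency budget are illustrative, and production setups typically use a hosted checker or a scheduled Playwright job instead.

```typescript
// Minimal synthetic-check harness (illustrative names).
export type CanaryResult = { ok: boolean; latencyMs: number; page: boolean };

export async function runCanary(
  probe: () => Promise<boolean>,   // e.g. "log in and reach the dashboard"
  latencyBudgetMs: number          // e.g. 5000 for the login flow
): Promise<CanaryResult> {
  const start = Date.now();
  let ok = false;
  try {
    ok = await probe();
  } catch {
    ok = false; // a thrown error counts as a failed flow
  }
  const latencyMs = Date.now() - start;
  // Page the on-call if the flow failed or blew the latency budget.
  return { ok, latencyMs, page: !ok || latencyMs > latencyBudgetMs };
}
```

A scheduler calls this every 5 minutes with a probe that performs the real login, and ships the { latencyMs, page } result to the alerting system.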


Part 6: Feature Flags and Progressive Delivery

How do you test a risky new feature? You don't just "Launch" it to 100% of users. You use Feature Flags (LaunchDarkly, Statsig).

  1. Code Deploy: Deploy the code, but wrap it in an if (false) block. (Dark Launch).
  2. Internal Test: Enable flag for @denizberke.com users. (Dogfooding).
  3. Canary: Enable for 1% of random users. Watch logs for errors.
  4. Ramp: 10% -> 50% -> 100%.

If an error spikes at 10%, you Kill Switch the feature instantly. The code stays, but the path is closed. This decouples Deployment (Technical act) from Release (Business act).
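
Under the hood, a percentage rollout is usually a deterministic hash of the user id, so the same user stays in the same bucket as the ramp moves from 1% to 100%. A minimal sketch; isEnabled is illustrative, and LaunchDarkly or Statsig SDKs provide this plus targeting rules and the kill switch.

```typescript
import { createHash } from "node:crypto";

// Deterministic percentage rollout: hash flag + user id into a 0-99
// bucket, so a given user's answer is stable across requests.
export function isEnabled(
  flag: string,
  userId: string,
  rolloutPct: number // 0 = dark launch, 100 = fully released
): boolean {
  const digest = createHash("sha256").update(`${flag}:${userId}`).digest();
  const bucket = digest.readUInt32BE(0) % 100;
  return bucket < rolloutPct;
}
```

Flipping rolloutPct back to 0 is the kill switch: the code path closes instantly, with no redeploy.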


Part 7: Managing Flaky Tests (The Quarantine Pattern)

The enemy of CI/CD is not Bugs; it is Flakiness. A "Flaky Test" passes 90% of the time and fails 10% of the time due to network jitter, race conditions, or cosmic rays. When a test flakes, developers lose trust. They start ignoring the Red Light. "Oh, that's just the login test, it always fails." This is fatal.

The Quarantine Protocol:

  1. Identification: If a test fails in Main, re-run it 3 times immediately. If it passes once, it is Flaky.
  2. Isolation: Move the test file to a quarantine/ folder or mark it @flaky.
  3. Exclusion: The CI test command excludes @flaky tests from the Block-Merge gate.
  4. Remediation: A ticket is auto-created. A developer must fix the flake and move it back to Main within 48 hours.
  5. Deletion: If it isn't fixed in 48 hours, Delete the test. No test is better than a flaky test.
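
Step 1 of the protocol fits in a few lines. The classify helper below is illustrative, not a feature of any particular runner:

```typescript
// Flake detection per the protocol: re-run a failing test up to 3 times.
// If any retry passes, the test is flaky rather than broken.
export type Verdict = "pass" | "flaky" | "broken";

export function classify(run: () => boolean, retries = 3): Verdict {
  if (run()) return "pass";
  for (let i = 0; i < retries; i++) {
    if (run()) return "flaky";
  }
  return "broken";
}
```

For step 3, most runners support tag filtering; Playwright, for instance, can keep quarantined tests out of the blocking gate with npx playwright test --grep-invert @flaky.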

Part 8: Dockerized Test Environments (Testcontainers)

How do you test database interactions?

  • Bad Way: Use an in-memory DB (H2/SQLite) for testing, but Postgres for Prod.
    • Risk: You miss bugs specific to Postgres JSONB syntax.
  • Bad Way: Connect to a shared "Dev" database.
    • Risk: Two tests run at the same time and overwrite each other's data (Race Condition).

The Modern Way: Testcontainers

We use Docker to spin up ephemeral infrastructure inside the test.

// Jest / Node.js (Testcontainers)
import { GenericContainer, StartedTestContainer } from "testcontainers";

let container: StartedTestContainer;

beforeAll(async () => {
  container = await new GenericContainer("postgres:15")
    .withExposedPorts(5432)
    .withEnvironment({ POSTGRES_PASSWORD: "test" })
    .start();

  const port = container.getMappedPort(5432);
  // Connect the ORM to localhost:<port> here
});

afterAll(async () => {
  await container.stop();
});

Every single test run gets a pristine, fresh Postgres instance. It takes 2 seconds to boot. It ensures Hermetic Testing.


Part 9: The Continuous Quality Checklist

Audit your pipeline against these standards.

  1. [ ] Pre-Commit Hooks: Do you run linting/formatting (Prettier/ESLint) before the dev even commits?
  2. [ ] The 10-Minute Rule: Does your test suite run in <10 minutes? If slower, devs will skip it. Parallelize.
  3. [ ] Flaky Test Hunter: Do you assume "It's just a flake" and re-run? Quarantine flaky tests immediately.
  4. [ ] Test Data Management: Are you testing against "Production-Like" data? Or empty tables?
  5. [ ] Visual Regression: Use Percy or Chromatic. A pixel-perfect CSS break is a bug too.
  6. [ ] Security Scan (SAST): Run SonarQube or Snyk in the pipeline. Catch vulnerabilities early.
  7. [ ] Artifact Immutability: Do you build the binary once and promote it? Or rebuild for staging? (Build once!).
  8. [ ] Feature Flags: Can you turn off the new feature in 1 second without a redeploy? (LaunchDarkly).
  9. [ ] Smoke Tests: After deploy, does a robot log in and check the basics?
  10. [ ] On-Call Empathy: Do the devs who wrote the code hold the pager? (They should).

Part 10: Frequently Asked Questions (FAQ)

Q: Who is responsible for QA? A: Everyone. "QA" is not a person; it is a shared responsibility. Modern teams don't have "QA Departments." Developers write tests. QA Engineers write Test Infrastructure.

Q: My E2E tests are flaky. What do I do? A: Delete them. Seriously. A flaky test is worse than no test because it destroys trust. Delete it, or rewrite it as a Unit Test.

Q: Should we aim for 100% Code Coverage? A: No. That is vanity. Aim for 80% coverage of business logic. Don't write tests for getters/setters or standard library calls. Test the "Critical Path."


Part 11: The AI Tester (Future)

In 2026, we won't write test scripts. We will give the AI the Figma file and the URL.

  • Prompt: "Explore the app. Find bugs. Try to break the payment flow."
  • Autonomous Agents will crawl your app 24/7, trying every edge case (Negative Testing) that a human would never think of. QA will shift from "Writing Tests" to "Auditing AI Bug Reports."

Conclusion: The Culture of Quality

Quality Assurance is not a department. It is a mindset. It is the developer saying: "I will write a test to prove this works." It is the Product Manager saying: "We will not ship until the pipeline is green."

At DENIZBERKE, we automate the boring parts of QA so humans can focus on the creative parts (Exploratory Testing). We sleep soundly at night because we know the Robots are watching the walls.

#QA #Testing #CI/CD #DevOps #Automation #Playwright