How to Fix Flaky Playwright Tests: A Complete Guide

Playwright's auto-waiting solves most timing issues. The flakiness that remains comes from patterns it can't see. Here's how to find and fix them.

Dhaval Shreyas

Co-founder & CEO at Pie

What you’ll learn

Why Playwright tests become flaky despite auto-waiting
Seven proven fixes for common Playwright flakiness patterns
Configuration settings that maximize test stability
How to debug flaky tests using Playwright’s trace viewer

Playwright was supposed to solve the flaky test problem. Auto-waiting. Reliable selectors. Modern architecture built for the modern web.

And yet here you are, watching the same test pass and fail on consecutive runs.

You’re not alone. A 2019 developer survey found that 58% of developers deal with flaky tests at least monthly, with 79% rating it a moderate or serious problem. The framework handles many timing issues automatically, but it can’t fix everything.

Most Playwright flakiness falls into predictable categories with known solutions. This guide covers seven fixes that address the majority of cases, plus configuration strategies and debugging techniques for the rest.

Why Playwright Tests Become Flaky

Playwright’s auto-waiting removes the need for explicit sleep statements, one of the most common causes of E2E test flakiness. But modern web applications have complexity that even intelligent waiting can’t fully address.

Network timing variability — Auto-wait handles DOM elements appearing, but it doesn’t automatically wait for the data those elements need. A list component might be visible before its API response arrives, causing assertions on empty content.
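Animation and transition states — Playwright’s actionability checks include stability (the element’s bounding box unchanged across two consecutive animation frames), but not every transition moves an element. An element fading in is already “visible” to Playwright, and web-first assertions don’t wait for stability at all, so a test can act on or assert against a component that is still mid-transition.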
Race conditions in application code — If your React component updates state twice in quick succession, Playwright might assert between updates. This isn’t Playwright’s fault, but it shows up as a flaky test.
Selector brittleness — Dynamic class names, changing DOM structures, and framework-generated attributes make selectors unreliable. A selector that works today might not match tomorrow’s deployment.
CI environment differences — Tests that pass locally can fail in CI due to resource constraints, slower network, or different browser rendering.

Understanding which category your flakiness falls into determines which fix will actually work. For the broader picture of why tests become flaky, see our comprehensive guide.

Playwright’s Built-in Flakiness Prevention

Before adding fixes, ensure you’re using what Playwright already provides.

Auto-waiting — Enabled by default for every action. When you call click(), Playwright waits for the element to be visible, stable, receiving events, and enabled. Don’t bypass these checks with { force: true } unless you have a specific reason.
Web-first assertions — Wait for conditions to be met rather than checking once and failing. Use expect(locator).toBeVisible() instead of checking visibility manually. Here’s the difference:

// Good: Web-first assertion waits automatically
await expect(page.getByText('Success')).toBeVisible();

// Bad: Manual check doesn't wait
const isVisible = await page.getByText('Success').isVisible();
expect(isVisible).toBe(true);

Test isolation — Browser contexts ensure each test gets a fresh state. Every test in Playwright gets its own BrowserContext by default, equivalent to an incognito window.
Retry on failure — Playwright can re-run failing tests automatically, and it reports a test that fails and then passes on retry as “flaky”. Use retries as a diagnostic tool, not a crutch: if a test consistently needs retries to pass, it has a problem worth fixing.

// playwright.config.js
export default {
  retries: process.env.CI ? 2 : 0, // Retry in CI only
};
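Playwright’s reporters already flag a test that passes only after a retry. If you want to track those tests over time, for example from results you export after each CI run, the bookkeeping is small. The sketch below uses hypothetical helpers (not a Playwright API) and assumes each test’s attempts are recorded in order, with 'passed' for a passing attempt and any other status counting as a failure:

```javascript
// Hypothetical helper: label one test from its ordered attempt results.
// A final pass after any earlier non-passing attempt counts as "flaky".
function classifyAttempts(attempts) {
  if (attempts.length === 0) throw new Error('no attempts recorded');
  const finalAttempt = attempts[attempts.length - 1];
  if (finalAttempt !== 'passed') return 'failed';
  return attempts.some((status) => status !== 'passed') ? 'flaky' : 'passed';
}

// List every test that only passed after a retry
function findFlakyTests(resultsByTitle) {
  return Object.entries(resultsByTitle)
    .filter(([, attempts]) => classifyAttempts(attempts) === 'flaky')
    .map(([title]) => title);
}

const flaky = findFlakyTests({
  'checkout completes': ['failed', 'passed'],
  'login works': ['passed'],
  'search filters': ['failed', 'failed', 'failed'],
});
console.log(flaky); // [ 'checkout completes' ]
```

A list like this that keeps growing is the signal that retries have become a crutch.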

These features handle many flakiness sources automatically. The fixes below address what they can’t.

7 Fixes for Flaky Playwright Tests

1. Wait for Network, Not Just DOM

Auto-waiting handles element visibility, but your data might not be loaded yet.

// Flaky: Element visible but data not loaded
await page.goto('/dashboard');
await expect(page.getByText('Total Revenue')).toBeVisible();

// Stable: Start waiting for the API response before navigating,
// so a response that arrives while goto() is running isn't missed
const metricsResponse = page.waitForResponse(resp =>
  resp.url().includes('/api/metrics') && resp.status() === 200
);
await page.goto('/dashboard');
await metricsResponse;
await expect(page.getByText('Total Revenue')).toBeVisible();

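When a single action triggers multiple API calls, start waiting for every response before triggering the action, then resolve them together with Promise.all. The array elements are evaluated in order, so both waitForResponse calls are already listening when the click fires: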

await Promise.all([
  page.waitForResponse('**/api/users'),
  page.waitForResponse('**/api/permissions'),
  page.click('button[data-testid="load-user"]'),
]);
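Why the order matters: a response listener only sees responses that arrive after it is registered. The sketch below reproduces the race in plain Node.js, with an EventEmitter standing in for the page. The waitForResponse and goto functions here are simplified, hypothetical stand-ins, not Playwright’s implementations:

```javascript
const { EventEmitter } = require('node:events');

// Simplified stand-in for page.waitForResponse(): resolves with the first
// matching 'response' event emitted AFTER this call, or rejects on timeout
function waitForResponse(page, predicate, timeoutMs = 100) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      page.off('response', onResponse);
      reject(new Error(`no matching response within ${timeoutMs}ms`));
    }, timeoutMs);
    function onResponse(resp) {
      if (!predicate(resp)) return;
      clearTimeout(timer);
      page.off('response', onResponse);
      resolve(resp);
    }
    page.on('response', onResponse);
  });
}

// Hypothetical navigation whose API response arrives before it returns
async function goto(page) {
  page.emit('response', { url: '/api/metrics', status: 200 });
}

const isMetrics = (r) => r.url.includes('/api/metrics') && r.status === 200;

// Racy: starts listening after navigation, so the response is missed
async function listenAfterGoto() {
  const page = new EventEmitter();
  await goto(page);
  return waitForResponse(page, isMetrics).then(() => 'ok', () => 'missed');
}

// Safe: starts listening first, awaits after navigation
async function listenBeforeGoto() {
  const page = new EventEmitter();
  const pending = waitForResponse(page, isMetrics);
  await goto(page);
  return pending.then(() => 'ok', () => 'missed');
}

listenAfterGoto().then((r) => console.log('listen after goto:', r)); // missed
listenBeforeGoto().then((r) => console.log('listen before goto:', r)); // ok
```

The same rule carries over to real Playwright tests: create the waitForResponse promise, trigger the navigation or click, then await the promise.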

2. Use Role-Based and Text-Based Selectors

CSS selectors break when classes change. XPath breaks when DOM structure changes. Role and text selectors are more resilient.

// Fragile: Depends on class names
await page.click('.btn-primary.submit-form');

// Fragile: Depends on DOM structure
await page.click('div.form-container > div:nth-child(3) > button');

// Stable: Uses accessible role
await page.getByRole('button', { name: 'Submit' }).click();

// Stable: Uses visible text
await page.getByText('Submit Form').click();

When semantic selectors don’t work (perhaps multiple buttons have the same label, or the element lacks accessible attributes), use data-testid attributes. These are explicit contracts between your code and your tests:

// In your component: <button data-testid="checkout-button">
await page.getByTestId('checkout-button').click();
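If your codebase already uses a different attribute for test hooks, you can point getByTestId() at it with Playwright’s testIdAttribute option instead of renaming attributes. The attribute name data-test below is only an example:

```javascript
// playwright.config.js
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // getByTestId('checkout-button') now matches <button data-test="checkout-button">
    testIdAttribute: 'data-test',
  },
});
```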

3. Handle Animations and Transitions

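Elements in motion can still cause trouble: an element that never stops animating can make a click time out while Playwright waits for it to become stable, and an assertion can run while content is mid-transition.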

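// Disable animations and transitions (call in beforeEach or a fixture)
await page.addInitScript(() => {
  const disableAnimations = () => {
    const style = document.createElement('style');
    style.textContent = `
      *, *::before, *::after {
        animation-duration: 0s !important;
        animation-delay: 0s !important;
        transition-duration: 0s !important;
        transition-delay: 0s !important;
      }
    `;
    document.head.appendChild(style);
  };
  // Init scripts can run before <head> exists, so wait for the DOM if needed
  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', disableAnimations);
  } else {
    disableAnimations();
  }
});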

For specific animations you can’t disable, a brief pause after the element appears is a last resort:

// Wait for the dialog to appear, then allow its entry animation to finish
await page.getByRole('dialog').waitFor({ state: 'visible' });
await page.waitForTimeout(100); // Brief pause for animation (not a guarantee)
await page.getByRole('button', { name: 'Confirm' }).click();

Note: waitForTimeout is generally discouraged, but small pauses for animations are acceptable when other approaches fail.
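If a fixed pause proves unreliable, polling is a middle ground: sample the element’s position until two consecutive readings match, similar in spirit to the stability check Playwright runs before clicks. waitForStable is a hypothetical helper; in a Playwright test, read could be () => page.getByRole('dialog').boundingBox():

```javascript
// Hypothetical helper: poll read() until two consecutive values match,
// instead of sleeping for a fixed time. Values are compared as JSON.
async function waitForStable(read, { intervalMs = 50, timeoutMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let previous = await read();
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const current = await read();
    // Structural comparison, e.g. for { x, y, width, height } boxes
    if (JSON.stringify(current) === JSON.stringify(previous)) return current;
    previous = current;
  }
  throw new Error(`value still changing after ${timeoutMs}ms`);
}

// Usage with a simulated element sliding into place
let frame = 0;
const positions = [0, 40, 80, 100];
waitForStable(() => ({ x: positions[Math.min(frame++, 3)], y: 0 }), { intervalMs: 10 })
  .then((box) => console.log('settled at', box)); // settled at { x: 100, y: 0 }
```

Unlike waitForTimeout, this returns as soon as the element settles and fails loudly if it never does.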

4. Mock External Dependencies

Third-party APIs introduce variability you can’t control.

// Mock Stripe API
await page.route('**/api.stripe.com/**', route => {
  route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify({
      id: 'pi_test',
      status: 'succeeded'
    }),
  });
});

// Mock slow or unreliable analytics
await page.route('**/analytics.example.com/**', route => route.abort());

Validate mocks periodically against real responses to ensure they stay realistic.
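One lightweight way to do that is to record a real response occasionally, for example from a sandbox account, and check that your mock still has the same structure. shapeMismatches is a hypothetical helper that compares keys and JSON types rather than values, ignores extra keys in the mock, and only inspects the first element of arrays; the response values below are illustrative:

```javascript
// Hypothetical helper: list structural differences between a mock body and
// a recorded real response. An empty list means the shapes agree.
function shapeMismatches(mock, real, path = '$') {
  const typeOf = (v) => (v === null ? 'null' : Array.isArray(v) ? 'array' : typeof v);
  if (typeOf(mock) !== typeOf(real)) {
    return [`${path}: mock is ${typeOf(mock)}, real is ${typeOf(real)}`];
  }
  if (typeOf(real) === 'array') {
    // Treat the first element of each array as representative
    return real.length && mock.length ? shapeMismatches(mock[0], real[0], `${path}[0]`) : [];
  }
  if (typeOf(real) !== 'object') return [];
  const problems = [];
  for (const key of Object.keys(real)) {
    if (!(key in mock)) problems.push(`${path}.${key}: missing from mock`);
    else problems.push(...shapeMismatches(mock[key], real[key], `${path}.${key}`));
  }
  return problems;
}

const mockBody = { id: 'pi_test', status: 'succeeded' };
const recordedBody = { id: 'pi_123', status: 'succeeded', amount: 2000 };
console.log(shapeMismatches(mockBody, recordedBody)); // [ '$.amount: missing from mock' ]
```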

5. Stabilize Time-Dependent Tests

Tests involving dates, timers, or scheduled events behave differently depending on when they run. A test that passes Monday morning might fail Friday evening because a “this week” filter returns different results. A countdown timer might complete at different points in test execution.

The fix is to freeze time during test execution:

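// Partial fix: freezes Date.now() only; new Date() still returns real time
await page.addInitScript(() => {
  const fixedDate = new Date('2026-03-15T10:00:00Z');
  Date.now = () => fixedDate.getTime();
});

// Better: Playwright's clock API (Playwright 1.45+)
// Freeze Date.now() and new Date() while timers keep running
await page.clock.setFixedTime(new Date('2026-03-15T10:00:00Z'));

// Or fake timers as well, then move time forward explicitly
await page.clock.install({ time: new Date('2026-03-15T09:00:00Z') });
await page.goto('/dashboard');
await page.clock.pauseAt(new Date('2026-03-15T10:00:00Z'));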

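The clock API is the more complete option: it also fakes new Date() and timer functions such as setTimeout and setInterval. Use install() with pauseAt(), runFor(), or fastForward() when you need to test timer-based functionality like session timeouts or polling intervals.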
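The same principle applies to application code you test at the unit level: a filter that reads the clock internally can’t be tested deterministically, while one that accepts the current time as a parameter can. A minimal sketch, assuming weeks start on Monday in UTC; startOfWeekUTC and isThisWeek are hypothetical names:

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Monday 00:00 UTC of the week containing `now`
function startOfWeekUTC(now) {
  const midnight = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const daysSinceMonday = (now.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  return new Date(midnight - daysSinceMonday * DAY_MS);
}

// "This week" filter with the current time passed in, not read from the clock
function isThisWeek(date, now) {
  const start = startOfWeekUTC(now).getTime();
  return date.getTime() >= start && date.getTime() < start + 7 * DAY_MS;
}

// Same fixed instant as the Playwright examples (a Sunday)
const now = new Date('2026-03-15T10:00:00Z');
console.log(startOfWeekUTC(now).toISOString()); // 2026-03-09T00:00:00.000Z
console.log(isThisWeek(new Date('2026-03-16T00:00:00Z'), now)); // false
```

Whatever day the suite runs, these results never change.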

6. Isolate Test Data

Tests that share data fail randomly based on execution order. Test A creates a user, Test B assumes it exists, Test C deletes it. Run them in a different order and everything breaks. This is why proper test isolation matters.

Each test should create its own data and clean up after itself:

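import { test } from '@playwright/test';

// Create unique test data for each test
let testUser;

test.beforeEach(async ({ request }) => {
  const testId = `test-${Date.now()}-${Math.random().toString(36).slice(2)}`;
  // Assumes an /api/users endpoint that returns the created user with its id
  // (relative URLs resolve against baseURL in playwright.config.js)
  const response = await request.post('/api/users', {
    data: {
      email: `user-${testId}@test.example.com`,
      name: `Test User ${testId}`,
    },
  });
  testUser = await response.json();
});

// Clean up in afterEach
test.afterEach(async ({ request }) => {
  if (testUser?.id) await request.delete(`/api/users/${testUser.id}`);
});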

The timestamp plus random suffix makes collisions very unlikely, even when tests run in parallel. Cleanup in afterEach runs regardless of whether the test passed or failed, preventing data accumulation across runs.
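A timestamp plus Math.random() is usually enough. If you want a stronger guarantee, Node’s built-in crypto.randomUUID() is a ready-made source of unique IDs. The helper name below is hypothetical; the email format follows the example above:

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical helper: build per-test user data from a random UUID.
// Version 4 UUIDs carry 122 random bits, so collisions are negligible
// even across parallel workers and repeated runs.
function makeTestUser(prefix = 'test') {
  const testId = `${prefix}-${randomUUID()}`;
  return {
    email: `user-${testId}@test.example.com`,
    name: `Test User ${testId}`,
  };
}

console.log(makeTestUser().email);
// e.g. user-test-<random uuid>@test.example.com (differs on every call)
```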

7. Configure CI-Specific Settings

CI environments have less CPU, memory, and network bandwidth than your local machine. Elements render slower. API responses take longer. Race conditions that never happen on your laptop become common in CI.

Adjust timeouts and concurrency accordingly:

// playwright.config.js
export default {
  timeout: process.env.CI ? 60000 : 30000,
  expect: {
    timeout: process.env.CI ? 10000 : 5000,
  },
  use: {
    trace: 'on-first-retry',
    video: process.env.CI ? 'on-first-retry' : 'off',
  },
  workers: process.env.CI ? 1 : undefined, // Start with 1, increase once stable
};

Start with workers: 1 in CI to eliminate parallel execution as a variable. Once tests are stable, increase workers gradually. If flakiness returns at higher concurrency, you have a shared state problem to fix.

Configuring Playwright for Stability

A well-configured playwright.config.js prevents many flakiness issues before they start. The settings below balance stability with diagnostic capability:

import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Fail the CI run if a test.only() call was committed by accident
  forbidOnly: !!process.env.CI,

  // Retries for diagnosis, not as a crutch
  retries: process.env.CI ? 2 : 0,

  // Reasonable timeouts
  timeout: 30000,
  expect: {
    timeout: 5000,
  },

  // Keep all browser options in ONE `use` block: a second `use` key
  // in the same object silently replaces the first
  use: {
    // Capture artifacts only when needed
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'on-first-retry',

    // Consistent viewport, locale, and timezone
    viewport: { width: 1280, height: 720 },
    locale: 'en-US',
    timezoneId: 'America/New_York',
  },
});

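Key settings explained: forbidOnly fails the CI run if a test.only() call was committed, since test.only() silently skips every other test. trace: 'on-first-retry' captures debugging artifacts only when a test is retried, keeping CI fast. A fixed viewport, locale, and timezone remove environmental differences that could make tests behave differently across machines. Keep all of these options in a single use block; if the config object has two use keys, the second silently replaces the first.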

Debugging Flaky Playwright Tests

When a test flakes, you need data. Three techniques help isolate the cause:

1. Use the Trace Viewer

Playwright’s trace viewer captures DOM snapshots, network requests, and console logs at each step of test execution. Enable it with trace: 'on-first-retry' and open traces with:

npx playwright show-trace trace.zip

The trace shows exactly what the page looked like when an assertion failed, what network requests were pending, and what console errors occurred. This is often enough to identify timing issues or missing data.

2. Reproduce Locally with Repetition

Flaky tests don’t fail every time. Running a test once proves nothing. Run it repeatedly to expose intermittent failures:

npx playwright test --repeat-each=10 tests/checkout.spec.js

Ten runs is a starting point, not proof: a test that fails rarely can pass ten runs in a row, so raise the count if you suspect a low failure rate. If every local run passes but the test fails in CI, the issue is likely environmental: resource constraints, network latency, or slower rendering. If it fails even once locally, you’ve isolated a reproducible problem you can debug.
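How many runs are enough depends on how often the test fails. Assuming each run fails independently with probability p, all n runs pass with probability (1 - p)^n, so catching at least one failure with confidence c takes n ≥ ln(1 - c) / ln(1 - p) runs. A small sketch of that arithmetic (the function names are illustrative):

```javascript
// Chance that a test failing with probability p passes all n runs,
// assuming runs fail independently
function probAllPass(p, n) {
  return (1 - p) ** n;
}

// Fewest runs needed to see at least one failure with the given
// confidence: solve 1 - (1 - p)^n >= confidence for n
function runsNeeded(p, confidence = 0.95) {
  if (!(p > 0 && p < 1)) throw new RangeError('p must be between 0 and 1');
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

console.log(probAllPass(0.05, 10).toFixed(2)); // 0.60
console.log(runsNeeded(0.05)); // 59
console.log(runsNeeded(0.01)); // 299
```

So a test that fails about 5% of the time passes ten straight runs roughly 60% of the time; catching it with 95% confidence takes something like --repeat-each=59.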

3. Stress Test with Parallelism

Run tests concurrently to expose shared state and race conditions:

npx playwright test --workers=10 --repeat-each=5

Tests that pass when run alone but fail under parallel execution are sharing state they shouldn’t. This reveals database pollution, global variables, or improper test isolation.
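By default, Playwright runs test files in parallel but runs the tests inside a single file in order, in the same worker, so state shared between tests in one file can hide from this check. One option is a separate config for stress runs that enables fullyParallel. The file name below is hypothetical and assumes your main config lives in playwright.config.js:

```javascript
// playwright.stress.config.js (hypothetical file name)
import { defineConfig } from '@playwright/test';
import baseConfig from './playwright.config.js';

export default defineConfig({
  ...baseConfig,
  // Spread tests from the same file across workers too
  fullyParallel: true,
  // Report every failure instead of letting retries mask it
  retries: 0,
});
```

Run it with npx playwright test --config=playwright.stress.config.js --workers=10 --repeat-each=5.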

When Playwright Isn’t Enough

Sometimes the problem isn’t Playwright configuration. It’s the testing approach.

Signs you’ve hit this point:

You’re spending more time fixing tests than writing features
Every UI change breaks multiple tests regardless of selectors used
Flakiness remains despite implementing all the fixes above
Your team has lost trust in the test suite

When E2E test maintenance consumes significant engineering time, consider whether traditional scripted testing is the right approach. Autonomous testing platforms interact with your application the way humans do, identifying elements by appearance rather than brittle selectors. Tests adapt to UI changes automatically, removing much of the selector maintenance behind Playwright flakiness.

The goal is reliable test feedback, not mastery of a specific framework. If Playwright is delivering that for your team, these fixes will help you get more from it. If it isn’t, other approaches exist.

How Autonomous QA Eliminates Playwright Flakiness

Every fix in this guide addresses a symptom. Autonomous QA addresses the cause: scripted tests are inherently brittle because they encode assumptions about implementation details.

Vision-based testing takes a fundamentally different approach. Instead of scripting interactions with selectors and waits, these platforms interact with applications the way humans do, by understanding what’s on screen.

Vision-Based Element Recognition

Playwright flakiness often stems from selector brittleness. A designer changes a class name. A developer restructures a component. The test breaks despite the feature working perfectly.

Vision-based testing identifies elements by their visual appearance and semantic meaning. A “Submit” button is recognized as a submit button regardless of whether it’s implemented as <button class="btn-primary">, <input type="submit">, or a custom React component. UI refactors that break Playwright selectors don’t affect tests that understand what they’re looking at.

Self-Healing Adapts to Change

When applications change, self-healing tests adapt automatically. If a button moves from the header to a sidebar, visual recognition finds it in the new location. If a form field gets renamed internally, the test still interacts with it because it recognizes the label.

This eliminates the maintenance cycle that consumes so much engineering time: run tests, see failures, investigate whether they’re real bugs or broken selectors, update selectors, repeat. Tests stay green when features work, regardless of implementation changes underneath.

True Isolation by Design

Playwright provides browser context isolation, but shared databases, APIs, and backend state still cause flakiness. Autonomous QA platforms run each test in a completely isolated environment with fresh state. No test pollution. No order dependencies. No debugging why test A passes alone but fails after test B.

The seven fixes in this guide will make your Playwright suite more stable. But if you’re spending more time maintaining tests than building features, the problem might not be configuration. It might be the fundamental approach of encoding application behavior in brittle scripts.

Fixes Help. But They Won’t Last.

The seven strategies above will make your Playwright tests more stable. Then a designer changes a class name. A developer restructures a component. Your tests break again.

That’s the trade-off with scripted testing. Every selector, every waitForResponse, every data-testid creates a dependency on implementation details. Configuration can only take you so far.

Vision-based testing removes that coupling. Instead of checking class names, it verifies what users see. The implementation changes. The test keeps working.

Our autonomous QA platform doesn’t use selectors at all. If test maintenance still dominates your engineering time after implementing these fixes, the problem isn’t configuration. It’s the scripted model itself.

Tired of Fixing Flaky Playwright Tests?

See how autonomous QA eliminates selector brittleness and timing issues by design.

Frequently Asked Questions

Why do Playwright tests pass locally but fail in CI?

CI environments typically have less CPU, memory, and network bandwidth than local machines. Elements load slower, animations take longer, and race conditions that never occur locally become common. Increase timeouts, use explicit waits for network requests, and consider running with workers: 1 initially to isolate the issue.

Should I use retries to fix flaky tests?

Retries are a diagnostic tool, not a solution. Configure retries with trace: 'on-first-retry' to capture what went wrong, then fix the underlying issue. A test that needs retries to pass is a test with a real problem.

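How can I tell whether the test or the application is flaky?

Run the test repeatedly against a fixed version of your app (same commit, same data). If the flakiness disappears, it came from something that changed between runs, such as deployments or shared data. If it still flakes, the cause is either a timing assumption in the test or a race condition in the application; the trace viewer shows what the page was doing when the test failed, which usually tells you which.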

Is Playwright more prone to flaky tests than other frameworks?

No. Playwright's auto-waiting actually reduces flakiness compared to older frameworks. But its speed can expose timing issues that slower frameworks accidentally hide. The flakiness was always there; Playwright just reveals it faster.

How do I disable animations in Playwright tests?

Inject CSS that sets animation-duration and transition-duration to 0s using page.addInitScript(). This prevents elements from being in motion during assertions or clicks, which is a common source of flakiness.

Which selectors are most resilient in Playwright?

Use getByRole() and getByText() for semantic, resilient selectors. When those don't work, add data-testid attributes specifically for testing. Avoid CSS class selectors and deeply nested XPath expressions. These break when designers change styles or developers restructure components.

How many retries should I configure?

Configure 2 retries in CI and 0 locally. Enable trace: 'on-first-retry' to capture debugging artifacts. Treat retries as a diagnostic tool that reveals flaky tests, not as a permanent solution. A test that consistently needs retries has a problem worth fixing.

When is it acceptable to use waitForTimeout?

For rare cases where explicit waiting is genuinely needed, like brief pauses after CSS animations that can't be disabled or detected. Use it sparingly with small values (100-200ms). If you're using waitForTimeout frequently, you're likely missing a better approach.

Dhaval Shreyas

Co-founder & CEO at Pie

13 years building mobile infrastructure at Square, Facebook, and Instacart. Payment systems, video platforms, the works. Now building the QA platform he wished existed the whole time.