
How Autonomous Test Discovery Actually Works (And Why Most Tools Get It Wrong)

Test automation solved execution, not discovery. Learn how vision-based AI agents explore your app and generate test cases without scripts or selectors.

Adithya Aggarwal

CTO & Co-founder at Pie

10 min read

In this guide, you’ll learn:

What autonomous test discovery actually means vs. marketing claims
Why DOM-based tools still require human-driven discovery
How vision-based AI agents explore and generate test cases
Real results: How Fi cut release validation from days to hours

We spent months evaluating test automation tools that claimed “AI-powered discovery.” The pitch was always the same: let AI find what to test. Most still needed humans to map out the flows first, define the test boundaries, and write the initial scripts. Then the “AI” would run those scripts faster or retry them when they failed. The discovery part was still manual.

This bothered us because discovery is actually the hard part. I’ve watched teams spend weeks documenting test cases for new features. Smart engineers, clicking through applications, writing down what they found, translating observations into scripts. The scripts ran fast once written. But writing them? That was the bottleneck nobody talked about.

This post breaks down what autonomous test discovery actually means, why most tools claiming “autonomous” still require human-driven discovery, and how vision-based agents are changing that equation.

What Is Autonomous Test Discovery?

Autonomous test discovery means AI agents explore your application on their own, identify what can be tested, and generate test cases automatically. You don’t write scripts. You don’t record sessions. You don’t guide them down each path. Agents handle it through three activities: exploration, observation, and generation.

Exploration

Agents navigate your application the way a user would. They click buttons, fill out forms, follow links, and handle popups. Unlike scripted automation, they’re not following a predefined path. They make decisions about what to try next based on what they see on screen. Each agent explores different routes through your app, building a map of possible user journeys.
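To make the exploration loop concrete, here is a minimal Python sketch. It assumes a hypothetical toy app (`TOY_APP`, `ToyApp`) whose screens expose their clickable labels directly, and it uses plain breadth-first search as a stand-in decision policy; Pie’s actual agents read rendered screens, and their policy isn’t described here. The point is the shape of the loop: reach a screen, try what’s visible, record where each action leads, and build up a map of user journeys.

```python
from collections import deque

# Hypothetical toy app: screen -> {visible action label: resulting screen}.
# A real agent would read these labels off the rendered screen; here they are given.
TOY_APP = {
    "Home": {"Search": "Results", "Open cart": "Cart"},
    "Results": {"Open product": "Product", "Back": "Home"},
    "Product": {"Add to Cart": "Product", "Open cart": "Cart"},
    "Cart": {"Checkout": "Checkout", "Back": "Home"},
    "Checkout": {"Place order": "Confirmation"},
    "Confirmation": {},
}


class ToyApp:
    """Stand-in for a running app: the agent only sees the current screen."""

    def __init__(self, model, start="Home"):
        self.model, self.start, self.screen = model, start, start

    def reset(self):
        self.screen = self.start

    def visible_actions(self):
        return sorted(self.model[self.screen])

    def do(self, action):
        self.screen = self.model[self.screen][action]
        return self.screen


def navigate(app, path):
    """Start a fresh session and replay a known action path."""
    app.reset()
    for action in path:
        app.do(action)


def explore(app):
    """Breadth-first exploration: reach each new screen, try every visible
    action there, and record the resulting transitions as a journey map."""
    transitions = {}            # (screen, action) -> next screen
    paths = {app.start: []}     # shortest known action path to each screen
    queue = deque([app.start])
    while queue:
        screen = queue.popleft()
        navigate(app, paths[screen])
        for action in app.visible_actions():
            navigate(app, paths[screen])   # each attempt starts from a clean state
            nxt = app.do(action)
            transitions[(screen, action)] = nxt
            if nxt not in paths:
                paths[nxt] = paths[screen] + [action]
                queue.append(nxt)
    return transitions, paths
```

Run against the toy model, `explore(ToyApp(TOY_APP))` records all nine transitions, and `paths["Confirmation"]` comes back as `["Open cart", "Checkout", "Place order"]`, the shortest journey to an order confirmation. Replaying the path from a fresh start before every attempt mirrors how an agent on a real app has to navigate back to a screen rather than jump straight to it.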

Observation

As agents explore, they observe everything. What elements appear on each screen? How does the UI respond to interactions? What state changes occur after an action? These observations become the raw material for test cases. For example, an agent notices that clicking “Add to Cart” increases the cart count by one: a testable assertion, documented automatically.
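Observation can be sketched as a before/after comparison. In this minimal Python example, the snapshots are hypothetical `{label: value}` dictionaries standing in for what an agent reads off the screen before and after a single action; `diff_screens`, `to_assertion`, and `Change` are illustrative names, not Pie’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    label: str
    before: object
    after: object


def diff_screens(before, after):
    """List every on-screen value that appeared, disappeared, or changed.

    `before` and `after` are hypothetical snapshots: {label: value} pairs
    read off the screen before and after one action."""
    keys = sorted(set(before) | set(after))
    return [Change(k, before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)]


def to_assertion(change):
    """Phrase a change as a checkable, human-readable assertion."""
    b, a = change.before, change.after
    if isinstance(b, int) and isinstance(a, int):
        direction = "increases" if a > b else "decreases"
        return f"{change.label} {direction} by {abs(a - b)}"
    if b is None:
        return f"{change.label} appears: {a!r}"
    if a is None:
        return f"{change.label} disappears"
    return f"{change.label} changes from {b!r} to {a!r}"


before = {"cart count": 0}
after = {"cart count": 1, "toast": "Added to cart"}
for change in diff_screens(before, after):
    print(to_assertion(change))
# cart count increases by 1
# toast appears: 'Added to cart'
```

Unchanged values produce no assertion, so each action yields only the effects it actually caused.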

Generation

From exploration and observation, structured test scenarios emerge automatically. The output isn’t “click at coordinates (340, 220).” It’s “Add product to cart and verify cart count increases.” Test cases are semantic, readable, and tied to actual user behavior rather than brittle implementation details.
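The generated output can be modeled as plain data that renders to readable steps. This sketch assumes a hypothetical `Scenario`/`Step` structure; the exact format Pie emits isn’t specified here. Note that each step names what is on screen, never coordinates or selectors.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    action: str                   # what the user does, named by what is on screen
    verify: Optional[str] = None  # assertion observed after the action, if any


@dataclass
class Scenario:
    title: str
    steps: List[Step] = field(default_factory=list)

    def render(self) -> str:
        """Render the scenario as numbered, plain-English steps."""
        lines = [f"Scenario: {self.title}"]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"{number}. {step.action}")
            if step.verify:
                lines.append(f"   Verify: {step.verify}")
        return "\n".join(lines)


case = Scenario(
    "Add product to cart and verify cart count increases",
    [
        Step('Tap "Open product"'),
        Step('Tap "Add to Cart"', verify="cart count increases by 1"),
    ],
)
print(case.render())
```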

The Evolution of Test Discovery

The Manual Era

Before automation, testers explored applications by hand. They clicked through flows, tried unexpected inputs, looked for things that broke. Slow, but thorough. Human curiosity found edge cases that checklists missed. The problem: it didn’t scale. You can’t manually explore a complex app before every release.

The Script Era

Automation frameworks changed the game, partially. Testers could encode their discoveries as scripts. Explore once, execute forever. Except “forever” lasted until the next UI change. A button moves, a selector breaks, and suddenly your 500-test suite has 50 failures that aren’t real bugs. They’re maintenance debt. Gartner research shows implementation struggles, skill gaps, and high upfront costs still block adoption for over a third of teams.

The Record-Replay Era

Tools tried to bridge the gap. Record your exploration, replay it as a test. Playwright Codegen, Selenium IDE, and Cypress Studio all work this way. In practice, these recordings were brittle. They captured the selectors and page structure that existed at recording time, so any UI shift broke them. Still human discovery, just captured differently.

The Crawler Era

Some platforms took a different approach: crawl the application automatically. Map every URL, generate a test for each page. Better coverage on paper. But crawling URLs isn’t the same as understanding user journeys. A sitemap doesn’t know that users go from product page to cart to checkout to confirmation. It just sees four separate pages. Crawlers find pages. They don’t find flows.

Each era improved execution. None solved discovery.

Why Most AI Testing Tools Aren’t Truly Autonomous

Most “autonomous” tools still read the DOM. HTML elements, CSS selectors, JavaScript state. They find a button by its class name. Rename a class? Test breaks. Restructure components? Tests break. Switch frameworks? Start over. This reliance on code creates predictable failure modes.

Selector Fragility

When discovery depends on selectors, you’re encoding fragility into every test. The agent finds a button with id="submit-btn". Your developer changes it to id="checkout-submit". Discovery becomes rediscovery becomes maintenance. A React refactor that changes nothing functional can break hundreds of tests overnight.
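The failure mode is easy to reproduce in miniature. This Python sketch models two releases of the same screen as plain element records (a simplified stand-in for the DOM) using the two ids from the paragraph above; only the id changes between releases. A lookup keyed on the id breaks, while a lookup keyed on what the user sees (role plus visible text) survives. Both helper functions are hypothetical.

```python
from typing import Optional

# Two releases of the same checkout screen, modeled as plain element records.
# Only the id changed; what the user sees is identical.
RELEASE_1 = [{"id": "submit-btn", "role": "button", "text": "Place order"}]
RELEASE_2 = [{"id": "checkout-submit", "role": "button", "text": "Place order"}]


def find_by_id(elements, element_id) -> Optional[dict]:
    """Selector-style lookup: tied to an implementation detail."""
    return next((e for e in elements if e["id"] == element_id), None)


def find_by_appearance(elements, role, text) -> Optional[dict]:
    """User-style lookup: tied to what is shown on screen."""
    return next((e for e in elements
                 if e["role"] == role and e["text"] == text), None)


for release in (RELEASE_1, RELEASE_2):
    by_id = find_by_id(release, "submit-btn")
    by_look = find_by_appearance(release, "button", "Place order")
    print(by_id is not None, by_look is not None)
# True True
# False True
```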

Sitemap Limits

Crawling URLs generates impressive coverage reports. “We discovered 847 pages!” But pages aren’t user journeys. A checkout flow spans five pages. Login-protected features require authentication state. Multi-step forms need sequential completion. Crawlers see structure. They miss behavior.
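A small simulation shows the gap between pages and journeys. This Python sketch assumes a hypothetical four-page shop where the checkout and confirmation pages depend on session state. The crawler-style check visits each URL in a fresh session; the journey-style check carries one session through ordered steps.

```python
def new_session():
    return {"cart": [], "order_placed": False}


def visit(url, session):
    """Return the page actually rendered for `url`, given session state."""
    if url == "/checkout" and not session["cart"]:
        return "/cart"              # empty cart: checkout bounces back
    if url == "/confirmation" and not session["order_placed"]:
        return "/cart"              # no order yet: nothing to confirm
    return url


def act(action, session):
    """Apply a user action that changes session state."""
    if action == "add_to_cart":
        session["cart"].append("sku-1")
    elif action == "place_order":
        session["order_placed"] = True


SITEMAP = ["/product", "/cart", "/checkout", "/confirmation"]
JOURNEY = [("/product", "add_to_cart"), ("/cart", None),
           ("/checkout", "place_order"), ("/confirmation", None)]


def crawl(sitemap):
    """Crawler-style check: every URL visited in isolation, fresh session each time."""
    return {url: visit(url, new_session()) for url in sitemap}


def run_journey(steps):
    """Journey-style check: one session carried through ordered steps."""
    session, rendered = new_session(), []
    for url, action in steps:
        rendered.append(visit(url, session))
        if action:
            act(action, session)
    return rendered


print(crawl(SITEMAP))
print(run_journey(JOURNEY))
```

The crawl lists all four URLs, yet `/checkout` and `/confirmation` both render the cart page, so a per-page test never sees them. The journey reaches every page in order because it carries state from one step to the next.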

Replay Constraints

Recording user sessions captures real behavior. It’s valuable for understanding what users do. But replaying those sessions assumes the UI stayed frozen. Session recordings are snapshots. Your app is a moving target. You’re always testing yesterday’s UI.

State Blindness

Authenticated flows, conditional UI, and multi-step processes all require memory: what happened before affects what happens next. Most tools see pages in isolation, with no understanding of session state or user context.

Discovery Through Vision: How Pie Does It Differently

We built Pie differently. Our agents see applications the way users do. Computer vision, not DOM inspection. They look at rendered screens, identify elements visually, understand context from what’s displayed. This is how autonomous test discovery finds test cases your team never documented.

A button is a button because it looks like one. Not because of its class name.
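Here is a minimal sketch of targeting by appearance. It assumes a vision model has already turned a screenshot into a list of detections (element kind, visible text, bounding box); that model is hypothetical here and is the hard part this sketch skips. Given detections, finding “the button that says Add to Cart” becomes a lookup by appearance, and the click point is recomputed from wherever the element is drawn.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


@dataclass(frozen=True)
class Detection:
    """One element a vision model (hypothetical here) found in a screenshot."""
    kind: str   # what it looks like: "button", "text_field", ...
    text: str   # text read from the pixels
    box: Box


def locate(detections: List[Detection], kind: str, text: str) -> Optional[Tuple[int, int]]:
    """Return the center of the element that looks like `kind` and reads `text`."""
    for d in detections:
        if d.kind == kind and d.text.strip().casefold() == text.casefold():
            left, top, right, bottom = d.box
            return ((left + right) // 2, (top + bottom) // 2)
    return None


before_redesign = [Detection("button", "Add to Cart", (300, 200, 380, 240))]
after_redesign = [
    Detection("text", "Free shipping over $50", (0, 0, 400, 30)),
    Detection("button", "ADD TO CART", (40, 600, 200, 650)),
]
print(locate(before_redesign, "button", "Add to Cart"))  # (340, 220)
print(locate(after_redesign, "button", "Add to Cart"))   # (120, 625)
```

After the redesign moves and restyles the button, the same lookup still finds it; only the computed click point changes.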

Vision-based discovery changes what’s possible:

Framework-Agnostic — React, Vue, Angular, Rails with jQuery, that legacy app nobody wants to touch. If it renders, we can explore it. No selectors. No integration.
Handles the Unexpected — Cookie banners, chat widgets, promo popups. These break selector-based tools constantly. Our agents do what humans do: see popup, dismiss it, keep exploring.
Parallel Exploration — Humans explore one path at a time. Our agents run hundreds in parallel, each taking different routes through your app. Days become minutes. We typically deploy 1,000+ agents on first discovery.
Self-Healing — UI changes? Button moved? Fields reordered? Visual understanding persists. Code changed completely? Doesn’t matter. Self-healing tests adapt automatically.
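Parallel exploration can be sketched with Python’s standard `concurrent.futures`. Each hypothetical agent is a seeded random walk over a toy app model, so different seeds take different routes, and the run merges what every agent covered. Random walks are a simplification for illustration, not a description of how Pie’s agents choose actions.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy app: screen -> {visible action: resulting screen}.
MODEL = {
    "Home": {"Search": "Results", "Open cart": "Cart"},
    "Results": {"Open product": "Product", "Back": "Home"},
    "Product": {"Add to Cart": "Product", "Open cart": "Cart"},
    "Cart": {"Checkout": "Checkout", "Back": "Home"},
    "Checkout": {"Place order": "Confirmation"},
    "Confirmation": {},
}
START = "Home"


def run_agent(seed, steps=25):
    """One agent: a seeded random walk, so each seed takes its own route."""
    rng = random.Random(seed)
    screen, seen = START, set()
    for _ in range(steps):
        actions = sorted(MODEL[screen])
        if not actions:          # dead end (order confirmed): begin a new session
            screen = START
            continue
        action = rng.choice(actions)
        nxt = MODEL[screen][action]
        seen.add((screen, action, nxt))
        screen = nxt
    return seen


def explore_in_parallel(n_agents, steps=25, workers=8):
    """Run many agents concurrently and merge the transitions they covered."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_agent = list(pool.map(run_agent, range(n_agents), [steps] * n_agents))
    return set().union(*per_agent), per_agent


ALL_TRANSITIONS = {(s, a, n) for s, acts in MODEL.items() for a, n in acts.items()}
covered, per_agent = explore_in_parallel(n_agents=50)
print(f"{len(covered)}/{len(ALL_TRANSITIONS)} transitions covered by 50 agents")
```

Threads are a reasonable fit because real agents spend most of their time waiting on browsers or devices. For CPU-bound pure-Python work like this simulation, CPython’s GIL limits the speedup, and a process pool would be the usual alternative.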

How Discovery Transformed Fi’s Release Cycle

Fi makes smart dog collars. GPS tracking, activity monitoring, escape alerts. Millions of dogs depend on their platform working. Reliability isn’t optional.

When Fi deployed autonomous discovery, our agents explored the app across iOS, Android, and web. They found flows nobody had documented, such as a collar disconnecting mid-walk or language settings changing between sessions: edge cases that existed but weren’t in any test plan.

📊 Customer Result

“Release validation went from two to three days to just a few hours. The way Pie set up allowed Fi to work alongside development without changing processes.” — Philip Hubert, Director of Mobile Engineering, Fi

That’s the difference between discovering what to test manually and letting agents do it. Philip’s team didn’t write new scripts. They didn’t map flows by hand. Agents explored the app, found what mattered, and Fi shipped faster without adding headcount. 10x faster testing. 75% less manual effort. Edge cases found before users hit them.

Want Results Like Fi?

See autonomous discovery on your actual app. We'll show you what our agents find.


No credit card required

Discovery Approaches Compared

The table below shows why most teams stay stuck. Manual testing and script-based automation both put discovery burden on humans. Only autonomous discovery removes that bottleneck entirely.

Aspect	Manual Testing	Script-Based Automation	Autonomous Discovery
Discovery Method	Human exploration	Human writes scripts	AI agents explore independently
Coverage	Limited by time	Limited to scripted paths	Expands automatically
UI Change Impact	Re-explore manually	Fix broken selectors	Self-corrects via vision
Maintenance	N/A	30-40% of sprint time	Zero
Edge Cases	Depends on tester curiosity	Only if scripted	Discovered automatically
Time to 80% Coverage	Days to weeks	Weeks to months	30 minutes

Look at the maintenance row. Script-based automation eats 30-40% of sprint time just keeping tests working. That time goes not to testing but to housekeeping: QA teams stay stuck fixing broken locators instead of expanding coverage. Because vision-based agents adapt when the UI changes, autonomous discovery eliminates that overhead and gives the time back.

Discovery First, Execution Second

Twenty years of test automation got the order backwards: the industry made script execution faster while humans stayed stuck writing the scripts.

Autonomous discovery flips that. Pie is an autonomous testing platform where machines explore at machine speed. Vision-based agents see your app like users do. No selectors to break. No scripts to maintain.

The question isn’t whether to automate QA. It’s whether to start with discovery.

Try Autonomous Discovery

Point Pie at your staging URL and watch agents map your application automatically.


SOC 2 Type II certified • No source code access

Frequently Asked Questions

How long does autonomous discovery take?

First discovery run: about 30 minutes for 80% coverage. Subsequent runs are faster because agents learn your application’s patterns.

Do I need to write test scripts?

No. Upload your APK, IPA, or staging URL. Agents explore autonomously and generate test cases in plain English. You review and approve.

How do agents handle login and authenticated flows?

Agents handle authentication. Provide credentials once, and they navigate authenticated flows just like a real user would.

What about mobile apps?

Same approach. Vision-based agents work on iOS and Android. No Appium scripts, no device farm headaches. Flutter, React Native, native. All of it.

How do you handle false positives?

Our QA experts review every flagged issue before it reaches you. Real bugs only. Zero false positive noise.

Can I control what agents explore?

Yes. Point agents at specific flows, exclude areas, set priorities. Or let them explore everything and curate afterward.

How is Pie different from Selenium or Playwright?

Selenium and Playwright are execution frameworks. You write scripts, they run them. Discovery is still manual. Pie discovers what to test autonomously, then executes. No scripts to write or maintain.

Adithya Aggarwal

CTO & Co-founder at Pie

Eight years building search and delivery systems at Amazon. The kind of scale where flaky tests block billion-dollar releases. Now CTO at Pie, building AI agents that adapt when your UI changes.