Case Study

Fintech at Scale: QA Automation Across Five Apps, Three Countries

Tilt's lending flows span KYC, OTP, Plaid, ledger verification, and multi-currency disbursement. Pie tests all of it: up to 149 steps in a single test, twice a week, across five apps.

49
Bugs Caught
670 hrs
QA Saved (60 Days)
5 apps
3 Countries
149
Steps in Longest Test
Tilt

Industry

Consumer Fintech

Products

Tilt, Cashalo, Tilt-MX

Geographies

US, Philippines, Mexico

Use Case

Mobile App Regression QA

Engagement

Sept 2025 to present

Meet Tilt

Tilt is a consumer fintech platform serving users across the US, Philippines, and Mexico. In two years, they scaled from one app to five. One engineering organization. Three time zones. Eighteen hours of active code commits every day.

They ship twice a week per app. In fintech, that's radical. Compliance requires flawless accuracy across three jurisdictions. Every release is a live financial transaction. Every bug is a potential user loss. And they do it across five apps, with a QA team that didn't grow.

The Challenge

Mobile fintech is unforgiving. A bug in a loan calculation isn't a UX frustration. It's a user who borrowed 13,000 pesos and received 1,000. A legal disclosure rendering with an unpopulated template tag isn't a cosmetic glitch. It's a compliance risk. An onboarding screen that silently fails after Plaid authentication doesn't generate a support ticket. It generates churn.

Anil Kumar Lekkalapudi, Director of QA & Automation at Tilt (10+ years automating mission-critical systems at NVIDIA and across fintech), faced a scaling problem. Three apps became five. Twice-weekly ship cadence remained non-negotiable. The fix wasn't more people. It was automation that could carry the same depth Anil's team already proved manually, and run it on every build, across every app, in every market.

"I have a small, sharp QA team. We need to ship twice a week. But doing this manually doesn't work anymore. I want an automation platform that completely removes manual regression testing from what we do."

Anil Kumar Lekkalapudi

Director of QA & Automation, Tilt

The Pre-Pie Reality

Manual regression eats 4–6 hours, twice a week, every week

Across three apps before Pie, that's up to 36 hours a week of QA labor (6 hours × 2 releases × 3 apps) just to clear regression. The cost compounds with every new app.

The deepest flows are the hardest to reach

End-to-end flows through KYC, OTP, and Plaid require complex state setup before a single test step runs. At twice-weekly release cadence, reaching these paths consistently is impractical without automation.

Five apps and counting. Manual coverage doesn't scale.

Three apps in 2025. Five by May 2026. Mexico is live, India is next. The surface area to cover keeps expanding faster than a QA team can hire.

One compliance miss across three jurisdictions is one too many

Legal disclosures, promissory note figures, and financial calculations need to be correct on every release. A gap in coverage is exposure that doesn't wait for a support ticket.

What Running Pie Looks Like at Tilt

Five apps. Twice a week. End-to-end flows up to 149 steps.

Today, Pie owns the regression gate for every Tilt and Cashalo build that goes out the door. iOS and Android. End-to-end flows that traverse KYC, OTP, Plaid, loan disbursement, repayment, and ledger verification, on every meaningful release candidate. Anil's team reviews the deviations. Everything else, Pie has already cleared.

Current Run Cadence

Cashalo Android

Daily

Cashalo iOS

~5–6 times per week

Tilt Android

~5 times per week

Tilt iOS

~4 times per week

Tilt-MX

Newly onboarded

Pie has executed 507 test runs across the portfolio since September 2025. About 310 of those ran in the last 60 days, replacing roughly 670 hours of manual regression labor. That is the equivalent of two full-time QA engineers working on nothing but regression for two months straight. That capacity is back with Anil's team, freed up for exploratory testing, root-cause work, and the QA judgment calls that automation isn't meant to make.
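The arithmetic behind that comparison can be checked in a few lines. This is a rough sketch: the run count and hours are the approximate figures quoted above, while the 40-hour work week is an assumption made here for illustration, not a number from Tilt or Pie.

```python
# Approximate figures quoted in the case study.
RUNS_LAST_60_DAYS = 310
HOURS_SAVED = 670

# Assumptions for illustration only (not from Tilt or Pie).
HOURS_PER_WEEK = 40
WEEKS_IN_60_DAYS = 60 / 7

# Manual hours each automated run stands in for, on average.
hours_per_run = HOURS_SAVED / RUNS_LAST_60_DAYS

# Full-time engineers needed to supply the same hours in 60 days.
fte_equivalent = HOURS_SAVED / (HOURS_PER_WEEK * WEEKS_IN_60_DAYS)

print(f"{hours_per_run:.1f} manual hours replaced per run")    # 2.2
print(f"{fte_equivalent:.1f} full-time engineers over 60 days")  # 2.0
```

Under those assumptions, each run replaces a little over two hours of manual work, which sits below the 4–6 hour manual regression pass described earlier.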

How Pie Reaches the Deepest Flows: Backend Calls Inside Any Test

Most automation platforms are scripts. Every tap, every selector, every assertion is hand-coded. When the UI shifts, the script breaks.

Pie inverts that. Vision handles the front end. Pie's agent reads each screen, decides the next move, and adapts when the layout changes. Backend hooks handle only what vision can't see: production state that needs to be primed before the flow can run. Spinning up a test user with a specific credit history. Validating a ledger update mid-test. Resetting account state between runs.

That separation is what makes Tilt's 149-step journey possible. Pie navigates the UI on its own, through KYC, OTP, Plaid, disbursement, and repayment, and reaches into the backend only when state needs setting. No selector maintenance. No script rebuilds when a screen shifts. No ceiling on how deep a test can go.
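One way to picture that split is a test run that interleaves UI goals with named backend hooks. The sketch below is illustrative only: `FlowRun`, `seed_user`, `verify_ledger`, and `reset_account` are hypothetical names invented for this example, not Pie's API, and the `ui` method is a stub standing in for the vision agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Hook = Callable[[dict], None]


@dataclass
class FlowRun:
    """Hypothetical test run: UI goals for the agent, hooks for backend state."""
    hooks: Dict[str, Hook]
    state: dict = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)

    def backend(self, name: str) -> None:
        # Run a named backend hook against the shared test state.
        self.hooks[name](self.state)
        self.trace.append(f"backend:{name}")

    def ui(self, goal: str, effect: Optional[Hook] = None) -> None:
        # Stub for the vision agent. `effect` simulates what the app
        # would change on the backend when the goal is reached.
        if effect is not None:
            effect(self.state)
        self.trace.append(f"ui:{goal}")


def seed_user(state: dict) -> None:
    state.update(user="test-user", credit_history="thin-file", ledger=[])


def verify_ledger(state: dict) -> None:
    assert state["ledger"], "expected a ledger entry after repayment"


def reset_account(state: dict) -> None:
    state.clear()


run = FlowRun(hooks={
    "seed_user": seed_user,
    "verify_ledger": verify_ledger,
    "reset_account": reset_account,
})
run.backend("seed_user")                      # prime state before the flow
run.ui("complete KYC and OTP")
run.ui("link bank via Plaid and accept loan")
run.ui("repay loan", effect=lambda s: s["ledger"].append("repayment"))
run.backend("verify_ledger")                  # check what the UI cannot show
run.backend("reset_account")                  # clean up between runs
print(run.trace)
```

The point of the shape is that hooks appear only at the three places the text names: priming state, validating the ledger mid-test, and resetting between runs. Everything else is a goal handed to the agent.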

Coverage Depth That Most QA Platforms Can't Reach

Longest end-to-end test executions Pie runs at Tilt today (measured in agent steps: each step is a Pie decision + UI interaction + screenshot capture):

# | App | Sequence | Steps
1 | Cashalo Android | New user → GCash loan → full repayment with ledger verification | 149
2 | Cashalo iOS | New user → Maya loan → disbursement → repayment → payment-options matrix | 139
3 | Cashalo iOS | New user → Bank Account loan + Work Info validation | 137
4 | Cashalo Android | New user → Bank Account loan → repayment status | 124
5 | Cashalo Android | New user → Maya loan (Decline Marketing path) | 123

What a "step" means: Each agent step is a full reasoning cycle: Pie's agent reads the current screen, decides what to do next, executes a UI interaction (tap, type, scroll, gesture), captures a screenshot, and validates the result before moving on. The 149-step Cashalo flow includes app restarts mid-test, OTP entry, backend tool calls to update credit state, ledger verification after repayment, and back-navigation between five disbursement methods (Maya, GCash, Bank, 7-Eleven, ECPay).
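The reasoning cycle described above can be sketched as a loop. This is a simplified illustration, not Pie's implementation: `run_steps`, `StepRecord`, and the five callables are hypothetical, and the "app" is a three-screen toy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepRecord:
    index: int
    action: str
    screenshot: str
    ok: bool


def run_steps(
    read_screen: Callable[[], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    capture: Callable[[], str],
    validate: Callable[[str, str], bool],
    max_steps: int,
) -> List[StepRecord]:
    """One agent step = read, decide, act, capture, validate."""
    records: List[StepRecord] = []
    for index in range(1, max_steps + 1):
        screen = read_screen()
        action = decide(screen)
        if action is None:          # nothing left to do: the flow is done
            break
        act(action)
        screenshot = capture()
        ok = validate(action, read_screen())
        records.append(StepRecord(index, action, screenshot, ok))
        if not ok:                  # stop and report the deviation
            break
    return records


# Toy app: three screens, advanced by any action.
screens = ["welcome", "otp", "dashboard"]
pos = {"i": 0}
plan = {"welcome": "tap Get Started", "otp": "type OTP"}

records = run_steps(
    read_screen=lambda: screens[pos["i"]],
    decide=lambda screen: plan.get(screen),
    act=lambda action: pos.update(i=pos["i"] + 1),
    capture=lambda: f"shot-{pos['i']}.png",
    validate=lambda action, screen: screen != "welcome",
    max_steps=149,
)
for record in records:
    print(record)
```

In this toy run the loop records two steps and stops when the dashboard is reached, because the decision function has nothing further to do.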

Cashalo's 149-step flow is what the split between vision and backend hooks makes possible. One run covers a new user through disbursement across five payment methods (Maya, GCash, Bank, 7-Eleven, ECPay), ledger verification, app restarts, OTP, and back-navigation between paths. The deeper a flow runs, the more modals, state changes, and unexpected screens it encounters. Hard-coded selectors don't survive that pressure. Vision absorbs it. No script-based platform runs this end-to-end across twice-weekly releases. Pie does, every release.

Why this matters

Pie is the only test platform Tilt operates that can author, maintain, and reliably re-run 120–150-step end-to-end flows against a moving codebase across iOS and Android, twice a week.

The Bugs Pie Found

Pie has surfaced 49 approved bugs across the Tilt portfolio. Validated, confirmed defects, with pending and rejected finds filtered out. They cluster in the places where customer trust breaks.

Financial Accuracy

The service fee on the loan breakdown screen didn't match the fee on the promissory note. The total amount due had a rounding error. Higher interest was calculated for a smaller loan amount than selected. These defects require long, stateful, end-to-end journeys to surface. You need to be inside a live loan flow to see the numbers disagree.
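A check for this class of defect compares the figures captured from two screens of the same loan. The sketch below is hedged throughout: the field names, the half-up rounding rule, and the assumption that the total is principal plus fee plus interest are all illustrative, not Cashalo's actual loan terms, and the amounts are invented.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List

CENT = Decimal("0.01")


def money(text: str) -> Decimal:
    """Parse a displayed amount such as '₱13,000.00' into an exact Decimal."""
    cleaned = text.replace(",", "").replace("₱", "").replace("$", "").strip()
    return Decimal(cleaned).quantize(CENT, rounding=ROUND_HALF_UP)


def check_loan_figures(breakdown: Dict[str, str], note: Dict[str, str]) -> List[str]:
    """Return a list of mismatches between the breakdown screen and the note."""
    problems: List[str] = []
    if money(breakdown["service_fee"]) != money(note["service_fee"]):
        problems.append("service fee differs between breakdown and promissory note")
    # Assumed formula for illustration: total = principal + fee + interest.
    expected_total = (
        money(breakdown["principal"])
        + money(breakdown["service_fee"])
        + money(breakdown["interest"])
    )
    if money(breakdown["total_due"]) != expected_total:
        problems.append(
            f"total due {breakdown['total_due']} does not equal {expected_total}"
        )
    return problems


# Invented example values, captured as strings the way a screen would show them.
breakdown = {
    "principal": "₱13,000.00",
    "service_fee": "₱650.00",
    "interest": "₱195.00",
    "total_due": "₱13,845.01",   # off by one centavo
}
note = {"service_fee": "₱600.00"}  # disagrees with the breakdown

problems = check_loan_figures(breakdown, note)
for problem in problems:
    print(problem)
```

Using `Decimal` rather than floats keeps a one-centavo discrepancy visible instead of hiding it in floating-point noise.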

Legal Disclosures

Promissory notes rendering raw template tags as literal text. Legal agreements showing the wrong loan amount after the slider adjusts. A legal section disappearing from the More tab entirely. In a regulated lending market, findings like these aren't bug reports. They're compliance items.
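Unrendered template tags are one of the few legal-text defects that can be caught mechanically. The following is a minimal sketch that scans rendered text for common placeholder syntaxes. The patterns are assumptions, since the source does not say which template engine produces the promissory notes.

```python
import re
from typing import List

# Assumed placeholder styles: {{ name }}, {% tag %}, ${name}, <% code %>.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\$\{.*?\}|<%.*?%>")


def find_unrendered_tags(text: str) -> List[str]:
    """Return any raw template tags left in text shown to the user."""
    return PLACEHOLDER.findall(text)


rendered = "I promise to pay {{loan_amount}} on or before ${due_date}."
print(find_unrendered_tags(rendered))
# ['{{loan_amount}}', '${due_date}']
```

A check like this only catches the raw-tag case. A wrong loan amount after the slider adjusts still needs a comparison against the value the user selected.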

Onboarding Failures

A user stuck in a loop after Plaid authentication. OTP verification triggering a "Query not allowed" error. An app that failed to launch after multiple attempts. Users who never get past onboarding never become borrowers.

Deprecated Product Still Live

Cashalo had sunset their "60 days / 2 installments" loan product on the backend. Pie found it was still navigable in the live app and flagged it in Slack before it became a support ticket.
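The underlying check is a comparison between what the backend considers active and what the app still lets a user reach. A minimal sketch, assuming both lists are available as sets of product names: only the "60 days / 2 installments" product comes from the case study, and the other names are invented for the example.

```python
from typing import Set


def stale_products(backend_active: Set[str], visible_in_app: Set[str]) -> Set[str]:
    """Products a user can still navigate to that the backend has retired."""
    return visible_in_app - backend_active


# Hypothetical data: what the backend reports vs. what the agent saw on screen.
backend_active = {"30 days / 1 installment", "90 days / 3 installments"}
visible_in_app = {
    "30 days / 1 installment",
    "60 days / 2 installments",   # sunset on the backend, still in the UI
    "90 days / 3 installments",
}

print(stale_products(backend_active, visible_in_app))
# {'60 days / 2 installments'}
```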

Featured moment

Day 1 in Mexico: Six Bugs Before Lunch

Tilt-MX had been live in production for over a year. Pie's coverage went live on May 5, 2026, with 40 test cases covering CURP identity verification, CLABE bank validation, BBVA fund transfers, and Spanish-language onboarding.

On the very first regression run, Pie surfaced six defects across four areas:

  • Three mixed-language strings: English copy bleeding into Spanish registration screens

  • A misspelling in a production legal title (marital status screen)

  • A registration-to-login race condition: accounts created during registration weren't found on subsequent login attempts

  • Returning users routed back into onboarding instead of their dashboard

Six bugs in production, caught on day one of coverage, before they could reach another user.

Tilt × Pie by the Numbers

Engagement Since September 2025

Apps covered: 5 (Tilt iOS/Android, Cashalo iOS/Android, Tilt-MX Android)
Longest test: 149 agent steps (Cashalo Android full loan + repayment)
iOS builds covered: 66 (7.7.0 → 7.73.0)

What Tilt and Pie Are Building Next

Nine months in. Five apps live. Three things on deck for the next quarter.

MCP integration

Replacing months of hand-built state-setup scripts with a single MCP call into Tilt's internal test infrastructure.

PR-level testing

Extending coverage from release-gate to per-PR. Catching regressions at the commit, not the release candidate.

Tilt-MX scale-up

Scaling Mexico to daily runs by Q3 2026, matching Cashalo Android's cadence.

"The ability to call scripts at different stages of the prompt. This is how we were envisioning a QA tool should be."

Anil Kumar Lekkalapudi

Director of QA & Automation, Tilt

See What Pie Can Find in Your App

Tilt's 149-step regression suite started with one app and a kickoff call. Yours can too.

SOC 2 certified • Trusted by Tilt, Cashalo, and others shipping mobile twice a week