Tilt's lending flows span KYC, OTP, Plaid, ledger verification, and multi-currency disbursement. Pie tests all of it: flows of up to 149 steps, twice a week, across five apps.
| Detail | Value |
|---|---|
| Industry | Consumer Fintech |
| Products | Tilt, Cashalo, Tilt-MX |
| Geographies | US, Philippines, Mexico |
| Use Case | Mobile App Regression QA |
| Engagement | September 2025 to present |
Tilt is a consumer fintech platform serving users across the US, Philippines, and Mexico. In two years, they scaled from one app to five. One engineering organization. Three time zones. Eighteen hours of active code commits every day.
They ship twice a week per app. In fintech, that's radical. Compliance requires flawless accuracy across three jurisdictions. Every release touches live financial transactions. Every bug is a potential lost user. And they do it across five apps, with a QA team that didn't grow.
Mobile fintech is unforgiving. A bug in a loan calculation isn't a UX frustration. It's a user who borrowed 13,000 pesos and received 1,000. A legal disclosure rendering with an unpopulated template tag isn't a cosmetic glitch. It's a compliance risk. An onboarding screen that silently fails after Plaid authentication doesn't generate a support ticket. It generates churn.
Anil Kumar Lekkalapudi, Director of QA & Automation at Tilt (10+ years automating mission-critical systems at NVIDIA and across fintech), faced a scaling problem. Three apps became five. The twice-weekly ship cadence remained non-negotiable. The fix wasn't more people. It was automation that could match the testing depth Anil's team had already proven manually, and run it on every build, across every app, in every market.
"I have a small, sharp QA team. We need to ship twice a week. But doing this manually doesn't work anymore. I want an automation platform that completely removes manual regression testing from what we do."
Manual regression eats 4–6 hours per app, twice a week, every week
Across three apps before Pie, that's up to 36 hours a week of QA labor just to clear regression. The cost compounds with every new app.
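The 36-hour figure is the top of a simple range. As a back-of-the-envelope check, assuming each of the three pre-Pie apps ran one full 4–6 hour manual regression at each of its two weekly releases:

```python
# Back-of-the-envelope weekly regression load across three apps before Pie.
# Assumption: each app runs one full manual regression per release.
HOURS_PER_REGRESSION = (4, 6)  # low and high estimates, per app per release
RELEASES_PER_WEEK = 2
APPS = 3


def weekly_regression_hours(hours_per_run: int, apps: int = APPS) -> int:
    """Manual QA hours per week spent only on clearing regression."""
    return hours_per_run * RELEASES_PER_WEEK * apps


low, high = (weekly_regression_hours(h) for h in HOURS_PER_REGRESSION)
print(f"{low}-{high} hours per week")  # prints "24-36 hours per week"
```

Running the same formula with `apps=5` gives 40–60 hours a week, which is the compounding the paragraph describes.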
The deepest flows are the hardest to reach
End-to-end flows through KYC, OTP, and Plaid require complex state setup before a single test step runs. At twice-weekly release cadence, reaching these paths consistently is impractical without automation.
Five apps and counting. Manual coverage doesn't scale.
Three apps in 2025. Five by May 2026. Mexico is live; India is next. The surface area to cover keeps expanding faster than a QA team can hire.
One compliance miss across three jurisdictions is one too many
Legal disclosures, promissory note figures, and financial calculations need to be correct on every release. A gap in coverage is exposure that doesn't wait for a support ticket.
Five apps. Twice a week. End-to-end flows up to 149 steps.
Today, Pie owns the regression gate for every Tilt and Cashalo build that goes out the door. iOS and Android. End-to-end flows that traverse KYC, OTP, Plaid, loan disbursement, repayment, and ledger verification, on every meaningful release candidate. Anil's team reviews the deviations. Everything else, Pie has already cleared.
| App | Run cadence |
|---|---|
| Cashalo Android | Daily |
| Cashalo iOS | ~5–6 times per week |
| Tilt Android | ~5 times per week |
| Tilt iOS | ~4 times per week |
| Tilt-MX | Newly onboarded |
Pie has executed 507 test runs across the portfolio since September 2025. About 310 of them came in the last 60 days, replacing roughly 670 hours of manual regression labor: the equivalent of two full-time QA engineers doing nothing but regression for two months straight. That capacity is back with Anil's team, freed up for exploratory testing, root-cause work, and the QA judgment calls that automation isn't meant to make.
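The two-engineer comparison holds up as rough arithmetic. A quick check, assuming a 40-hour work week and counting the 60-day window as 60/7 weeks:

```python
# Rough check of the "two full-time QA engineers for two months" comparison.
# Assumptions: a 40-hour work week, and 60 calendar days counted as 60/7 weeks.
HOURS_SAVED = 670        # manual regression hours replaced in the last 60 days
WORK_WEEK_HOURS = 40
WEEKS_IN_WINDOW = 60 / 7

hours_per_engineer = WORK_WEEK_HOURS * WEEKS_IN_WINDOW  # about 343 hours
fte_equivalent = HOURS_SAVED / hours_per_engineer

print(f"{hours_per_engineer:.0f} hours per engineer, {fte_equivalent:.2f} FTE")
# prints "343 hours per engineer, 1.95 FTE"
```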
Most automation platforms are scripts. Every tap, every selector, every assertion is hand-coded. When the UI shifts, the script breaks.
Pie inverts that. Vision handles the front end. Pie's agent reads each screen, decides the next move, and adapts when the layout changes. Backend hooks handle only what vision can't see: production state that needs to be primed before the flow can run. Spinning up a test user with a specific credit history. Validating a ledger update mid-test. Resetting account state between runs.
That separation is what makes Tilt's 149-step journey possible. Pie navigates the UI on its own, through KYC, OTP, Plaid, disbursement, and repayment, and reaches into the backend only when state needs setting. No selector maintenance. No script rebuilds when a screen shifts. No ceiling on how deep a test can go.
The longest end-to-end tests Pie runs at Tilt today, measured in agent steps (each step is one Pie decision, UI interaction, and screenshot capture):
| # | App | Sequence | Steps |
|---|---|---|---|
| 1 | Cashalo Android | New user → GCash loan → full repayment with ledger verification | 149 |
| 2 | Cashalo iOS | New user → Maya loan → disbursement → repayment → payment-options matrix | 139 |
| 3 | Cashalo iOS | New user → Bank Account loan + Work Info validation | 137 |
| 4 | Cashalo Android | New user → Bank Account loan → repayment status | 124 |
| 5 | Cashalo Android | New user → Maya loan (Decline Marketing path) | 123 |
What a "step" means: Each agent step is a full reasoning cycle: Pie's agent reads the current screen, decides what to do next, executes a UI interaction (tap, type, scroll, gesture), captures a screenshot, and validates the result before moving on. The 149-step Cashalo flow includes app restarts mid-test, OTP entry, backend tool calls to update credit state, ledger verification after repayment, and back-navigation between five disbursement methods (Maya, GCash, Bank, 7-Eleven, ECPay).
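The step cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Pie's implementation: `Action`, `FakeApp`, `decide`, `run_flow`, and the `set_credit_state` hook are all hypothetical names, screen labels stand in for screenshots, a scripted lookup replaces the agent's vision-based reasoning, and validation is reduced to checking that each step changed the screen.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str         # "tap", "type", "backend", or "done"
    target: str = ""  # UI element label, or hypothetical backend hook name
    text: str = ""    # text to type, if any


@dataclass
class FakeApp:
    """Toy device session: a fixed list of screen labels stands in for screenshots."""
    screens: list[str]
    index: int = 0
    backend_state: dict = field(default_factory=dict)

    def screenshot(self) -> str:
        return self.screens[self.index]

    def perform(self, action: Action) -> None:
        # The credit-check screen only moves on once backend state has been primed;
        # every other interaction advances to the next screen. Action details are
        # ignored in this toy model.
        blocked = (self.screenshot() == "credit_check"
                   and self.backend_state.get("credit") != "approved")
        if not blocked:
            self.index = min(self.index + 1, len(self.screens) - 1)


# Hypothetical backend hooks: prime state the UI cannot reach.
BACKEND_HOOKS: dict[str, Callable[[FakeApp], None]] = {
    "set_credit_state": lambda app: app.backend_state.update(credit="approved"),
}


def decide(screen: str) -> Action:
    """Scripted stand-in for the agent's vision-based decision."""
    policy = {
        "welcome": Action("tap", "Get started"),
        "otp": Action("type", "OTP field", "123456"),
        "credit_check": Action("backend", "set_credit_state"),
        "loan_offer": Action("tap", "Accept"),
        "done": Action("done"),
    }
    return policy[screen]


def run_flow(app: FakeApp, max_steps: int = 200) -> list[str]:
    log = []
    for step in range(1, max_steps + 1):
        before = app.screenshot()               # 1. read the current screen
        action = decide(before)                 # 2. decide the next move
        if action.kind == "done":
            return log
        if action.kind == "backend":
            BACKEND_HOOKS[action.target](app)   # prime state, not a UI tap
        app.perform(action)                     # 3. execute (or let the app react to new state)
        after = app.screenshot()                # 4. capture the result
        if after == before:                     # 5. validate that the step made progress
            raise AssertionError(f"step {step}: stuck on {before!r}")
        log.append(f"{step}: {action.kind} {action.target} -> {after}")
    raise TimeoutError(f"flow did not finish within {max_steps} steps")


if __name__ == "__main__":
    app = FakeApp(["welcome", "otp", "credit_check", "loan_offer", "done"])
    for line in run_flow(app):
        print(line)
```

Keeping backend hooks in their own table, separate from the UI policy, mirrors the split described earlier: navigation is decided screen by screen, and the backend is touched only to prime state the UI can't reach.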
The deeper a flow runs, the more modals, state changes, and unexpected screens it encounters. Hard-coded selectors don't survive that pressure. Vision absorbs it. No script-based platform runs a flow like Cashalo's end-to-end across weekly releases. Pie does, every release.
Why this matters
Pie is the only test platform Tilt operates that can author, maintain, and reliably re-run 120–150-step end-to-end flows against a moving codebase across iOS and Android, twice a week.
Pie has surfaced 49 approved bugs across the Tilt portfolio. Validated, confirmed defects, with pending and rejected finds filtered out. They cluster in the places where customer trust breaks.
The service fee on the loan breakdown screen didn't match the fee on the promissory note. The total amount due had a rounding error. Higher interest was calculated for a smaller loan amount than selected. These defects require long, stateful, end-to-end journeys to surface. You need to be inside a live loan flow to see the numbers disagree.
Promissory notes rendering raw template tags as literal text. Legal agreements showing the wrong loan amount after the slider adjusts. A legal section disappearing from the More tab entirely. In a regulated lending market, findings like these aren't bug reports. They're compliance items.
A user stuck in a loop after Plaid authentication. OTP verification triggering a "Query not allowed" error. An app that failed to launch after multiple attempts. Users who never get past onboarding never become borrowers.
Cashalo had sunset their "60 days / 2 installments" loan product on the backend. Pie found it was still navigable in the live app and flagged it in Slack before it became a support ticket.
Tilt-MX had been live in production for over a year. Pie's coverage went live on May 5, 2026, with 40 test cases covering CURP identity verification, CLABE bank validation, BBVA fund transfers, and Spanish-language onboarding.
On the very first regression run, Pie surfaced six defects across four areas:
- Three mixed-language strings: English copy bleeding into Spanish registration screens
- A misspelling in a production legal title (marital status screen)
- A registration-to-login race condition: accounts created during registration weren't found on subsequent login attempts
- Returning users routed back into onboarding instead of their dashboard
Six bugs in production, on day one, before a single user would run into them.
QA Hours Saved
~670 in the last 60 days
Bugs Surfaced
49 approved, validated defects
Steps in Longest Test
149, Cashalo full loan + repayment
Test Cases
Across 5 apps, 3 countries
| Detail | Value |
|---|---|
| Engagement | Since September 2025 |
| Apps covered | 5 (Tilt iOS/Android, Cashalo iOS/Android, Tilt-MX Android) |
| Longest test | 149 agent steps (Cashalo Android full loan + repayment) |
| iOS builds covered | 66 (7.7.0 → 7.73.0) |
Nine months in. Five apps live. Three things on deck for the next quarter.
MCP integration
Replacing months of hand-built state-setup scripts with a single MCP call into Tilt's internal test infrastructure.
PR-level testing
Extending coverage from release-gate to per-PR. Catching regressions at the commit, not the release candidate.
Tilt-MX scale-up
Scaling Mexico to daily runs by Q3 2026, matching Cashalo Android's cadence.
"The ability to call scripts at different stages of the prompt. This is how we were envisioning a QA tool should be."
Anil Kumar Lekkalapudi
Director of QA & Automation, Tilt
Tilt's 149-step regression suite started with one app and a kickoff call. Yours can too.
SOC 2 certified • Trusted by Tilt, Cashalo, and others shipping mobile twice a week