Maestro + Flutter: E2E Tests in YAML Without the Pain

Mobile E2E testing is a pain point. Appium drags along Selenium, Java, and hours of configuration. Detox only works with React Native. Flutter’s built-in integration_test can’t tap a system dialog. Patrol is solid but requires access to source code.

Maestro works differently: YAML files, black-box testing, no Dart code in tests. These are patterns and pitfalls collected while testing a Flutter app with dozens of screens and AI features. What works, what needs workarounds, where Maestro saves time, and where it adds headaches.

Why Maestro When Everything Else Exists

There’s no shortage of mobile E2E tools. The question is which one fits a Flutter project where one or two developers handle everything, with no dedicated QA team.

Framework	GitHub Stars	Flutter Support	Test Language	Key Trait
Appium	21.2k	via plugin	any (WebDriver)	heavy, enterprise
Detox	11.8k	none	JS/TS	React Native only
Patrol	1.2k	native (Dart)	Dart	requires source code
integration_test	Flutter SDK	native	Dart	no OS-level access
Maestro	10.8k	first-class	YAML	black-box

Appium is the standard for large teams with dedicated automation engineers. You can make it work with Flutter through appium-flutter-driver, but that’s another abstraction layer on top of an already complex stack.

Detox is out immediately — React Native only.

Patrol is a strong option for Flutter. Written in Dart, extends integration_test, handles system dialogs. Since v4.0 it supports Web too. But tests are Dart code running inside the app process (gray-box). Some see this as an advantage, others as tight coupling.

integration_test from the Flutter SDK — minimal dependencies, Dart, gray-box. But anything outside the app boundary — permission dialogs, push notifications, system toggles — is unreachable.

Maestro works through the accessibility tree. It doesn’t know about Flutter, Dart, or widgets. It sees what the user sees. Tests are YAML. Getting started is easy: install the CLI, write a file, run it.

curl -fsSL "https://get.maestro.mobile.dev" | bash
# or macOS:
brew tap mobile-dev-inc/tap && brew install mobile-dev-inc/tap/maestro

First Test: Zero to Green in 5 Minutes

A Maestro flow is a YAML file with a list of commands. Each command is a user action or an assertion.

appId: com.example.myapp
---
- launchApp
- assertVisible: "Welcome"
- tapOn: "Sign Up"
- assertVisible: "Create an account"

Run it:

maestro test smoke_test.yaml

Maestro launches the app on a connected device or simulator, executes steps in order, fails on the first mismatch. No Gradle, no Xcode, no test runner build step.

A slightly more realistic example — a smoke test with tags and a login dependency:

appId: com.example.myapp
tags:
  - smoke
---
- runFlow: ../auth/login_flow.yaml

- assertVisible:
    text: "New journey|Новое путешествие"

- tapOn:
    text: "New journey|Новое путешествие"

- waitForAnimationToEnd

- extendedWaitUntil:
    visible:
      text: "Choose destination|Выберите направление"
    timeout: 10000

- assertVisible:
    text: "Create with AI|Создать с помощью AI"

runFlow pulls in login as a dependency — more on that below. text: "EN|RU" is a regex matcher for multilingual apps.

Flutter Specifics: How Maestro Sees Widgets

Maestro works through the accessibility bridge, the layer Flutter exposes for screen readers and automation tools. This means Widget Keys (Key('my_button')) don’t work. At all. Issue #128, opened in 2022, is formally closed, but native Key support still doesn’t exist. The fix came from a different direction: Semantics.identifier.

Semantics.identifier — The Right Way

The Maestro team contributed Semantics.identifier directly to the Flutter SDK. It shipped in Flutter 3.19 (February 2024). The identifier lands in the accessibility tree and is addressable in Maestro as id.

Semantics(
  identifier: 'create_button',
  child: FloatingActionButton(
    onPressed: _onCreate,
    child: Icon(Icons.add),
  ),
)

- tapOn:
    id: "create_button"

The identifier is language-independent, doesn’t conflict with text content, and stays stable when you rename a button. For icons without text, use semanticLabel:

Icon(Icons.search, semanticLabel: 'Search')

BottomNavigationBar: Coordinate Fallback

Flutter renders BottomNavigationBar as a single accessibility element. Maestro can’t tap individual tabs by text — all tabs live in one container.

The workaround is coordinates:

- tapOn:
    point: "10%,93%"  # first tab, bottom edge

Percentage coordinates work across different screen sizes. Not ideal, but stable enough for a navigation bar with a fixed position.
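The 10% above matches the horizontal center of the first tab when the bar holds five equally wide tabs. A minimal sketch of that arithmetic in plain JavaScript (runnable in Node); tabPoint is a hypothetical helper, not a Maestro API, and it assumes equal-width tabs and a bar at a fixed height:

```javascript
// Hypothetical helper: center of the tab at `index` (0-based) among
// `count` equally wide tabs, formatted as an "x%,y%" point.
// Assumes the bar sits at a fixed vertical position (93% by default).
function tabPoint(index, count, yPercent = 93) {
  if (!Number.isInteger(index) || !Number.isInteger(count) || count < 1) {
    throw new RangeError("index and count must be integers, count >= 1");
  }
  if (index < 0 || index >= count) {
    throw new RangeError("index out of range");
  }
  const xPercent = Math.round(((index + 0.5) / count) * 100);
  return `${xPercent}%,${yPercent}%`;
}

console.log(tabPoint(0, 5)); // "10%,93%" (first of five tabs)
console.log(tabPoint(2, 5)); // "50%,93%" (middle tab)
```

With a different tab count the percentages shift, so the point values in the flow need to be recomputed whenever a tab is added or removed.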

iOS: Secure Text Fields Drop Characters

On iOS, Maestro types text through IME (Input Method Editor). Fields with obscureText: true — passwords — drop characters during fast input. Flutter rebuilds the widget on every character, and IME can’t keep up.

The fix: toggle password visibility before typing.

- tapOn:
    text: "Password|Пароль"

# Tap the eye icon -- switch to a regular text field
- tapOn:
    text: "Show password"

- waitForAnimationToEnd

# Now type into the visible (non-secure) field
- eraseText: 50
- inputText: "${TEST_PASSWORD}"

Without this step, the test enters “TestP” instead of “TestPassword123!” — and login silently fails.

Bilingual Tests: One Flow, Two Languages

If the app is multilingual and the language is determined by the device’s system locale, writing two test suites means double the work and double the maintenance.

Maestro uses regex for text matching. Pipe | means “or”:

- tapOn:
    text: "Skip|Пропустить"

- assertVisible:
    text: "Create an account|Создать аккаунт"

- extendedWaitUntil:
    visible:
      text: ".*afternoon.*|.*morning.*|.*evening.*|.*день.*|.*утро.*|.*вечер.*"
    timeout: 30000

Regex matches against the full text of the element. If a button reads “Do you have an account? Login”, you need .*Login (with the .* prefix) — otherwise it won’t match.
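Because the matcher is a regex, literal text that contains characters such as ? or . needs escaping before it goes into a pattern. A small sketch in plain JavaScript; escapeRegex and anyOf are hypothetical helpers for generating matcher strings, not part of Maestro:

```javascript
// Escape characters that have a special meaning in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one matcher string that accepts any of the
// given literal translations.
function anyOf(...texts) {
  return texts.map(escapeRegex).join("|");
}

console.log(anyOf("Skip", "Пропустить"));            // Skip|Пропустить
console.log(anyOf("Do you have an account? Login")); // Do you have an account\? Login
```

When pasting an escaped pattern into a flow, keep in mind that a double-quoted YAML string treats the backslash as its own escape character; single quotes avoid that.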

A regex trap: .*OK.* matches the word “Tokyo”. The pattern has no word boundaries and matching ignores case, so the “ok” inside “Tokyo” is enough. The test taps the wrong button and navigates to a different screen. The fix: use specific text like "Got it|Понятно" instead of .*OK.*. The more precise the matcher, the more stable the test.
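Both rules can be reproduced outside Maestro. The sketch below is a rough model in plain JavaScript, not Maestro’s implementation: matchesElement is a hypothetical function that assumes the pattern must cover the whole element text and that matching is case-insensitive.

```javascript
// Rough model of the matching rules described above: the pattern has to
// match the element's full text. Case-insensitivity is an assumption.
function matchesElement(pattern, text) {
  const re = new RegExp(`^(?:${pattern})$`, "is");
  return re.test(text);
}

console.log(matchesElement("Login", "Do you have an account? Login"));   // false
console.log(matchesElement(".*Login", "Do you have an account? Login")); // true
console.log(matchesElement(".*OK.*", "Tokyo"));                          // true (the trap)
console.log(matchesElement("Got it|Понятно", "Tokyo"));                  // false
console.log(matchesElement("Got it|Понятно", "Понятно"));                // true
```

A quick check like this against the texts that actually appear on a screen shows whether a loose pattern can hit an unintended element.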

Reliability: Sleep, Timeouts, and Graceful Degradation

Pausing Without Sleep

Maestro has no sleep command. A deliberate choice — sleep makes tests brittle. But sometimes a pause is necessary: async initialization after launch, backend delays, animations that waitForAnimationToEnd doesn’t catch.

The workaround is extendedWaitUntil with a nonexistent element:

- extendedWaitUntil:
    visible:
      id: "__never_matches__"
    timeout: 5000
    optional: true

Maestro waits 5 seconds, doesn’t find the element, and optional: true prevents the test from failing. It looks like a hack — and it is. But it works predictably, and the intent is clear from the code.

optional: true — Tests That Don’t Break on Trivia

optional: true turns a hard assertion into a soft one. Element not found — the step is skipped, the test continues.

# Tooltip may or may not appear
- tapOn:
    text: ".*Got it.*|.*Понятно.*"
    optional: true

Without this flag, random tooltips and modals will tank the run every other time. With it, the test passes consistently, checking only the critical path.

The rule: hard asserts (optional: false, the default) for key checks. Soft asserts (optional: true) for variable elements — tooltips, onboarding hints, promo banners.

Performance Gates: Timeouts as Specification

When your app talks to an AI backend or runs heavy queries, timeouts become a performance contract:

# First AI response within 45 seconds
- extendedWaitUntil:
    visible:
      text: ".*Searching.*|.*Processing.*"
    timeout: 45000

# Full results within 120 seconds
- extendedWaitUntil:
    visible:
      text: ".*results.*|.*Done.*"
    timeout: 120000

Backend degrades and responds in 130 seconds — the test fails. The timeout here isn’t a magic number. It’s a documented speed requirement.

Test Suite Architecture

runFlow: DRY for Shared Dependencies

- runFlow: ../auth/login_flow.yaml

One login file, dozens of tests use it. Auth flow changed? Fix it in one place. runFlow works for other repeated blocks too: navigating to a specific screen, dismissing onboarding, preparing test data. Think of it as setUp() from xUnit, but in YAML.

Tags and Selective Execution

Each flow file can be tagged, then you run subsets:

tags:
  - smoke      # quick checks of critical paths
  - critical   # must never break
  - slow       # tests with AI or heavy queries

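# .maestro/ stands for the folder that holds your flows
maestro test --include-tags=smoke .maestro/     # smoke before commit, ~2 minutes
maestro test --include-tags=critical .maestro/  # critical before merge, ~5 minutes
maestro test .maestro/                          # full suite before release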

Persistence Check: Data Survives a Restart

How to verify that data persists across app restarts:

# Step 1: change data (toggle a task, add to favorites)
- tapOn:
    text: ".*task name.*"
    optional: true

# Step 2: restart the app WITHOUT clearState
- launchApp
  # clearState NOT set -- filesystem is preserved

# Step 3: verify data is still there
- assertVisible:
    text: ".*task name.*"

launchApp without clearState kills the process but preserves the app’s filesystem: tokens, cache, local database. After the restart the app restores its session and reloads data from the server. If something disappears, the change was never persisted (most likely a backend sync problem), and the test catches it.

Isolation: Unique Test Users

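Maestro can run JavaScript files with the runScript command. Values assigned to output are available to later steps, for example as ${output.email}. A unique-user generator in three lines: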

const ts = Date.now();
output.email = `e2e${ts}@test.example.com`;
output.password = 'TestPass123!';

Every run gets a clean user. No conflicts with parallel runs, no leftover state from previous tests.
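A timestamp alone can repeat when two parallel runs start in the same millisecond. One possible hardening, sketched in plain JavaScript for Node: makeTestUser is a hypothetical helper, and Maestro’s embedded JavaScript engine may support a narrower syntax than what is used here.

```javascript
// Hypothetical helper: builds credentials for a throwaway test user.
// A random suffix is added to the timestamp so that two runs starting
// in the same millisecond are unlikely to share an email.
// `now` and `rand` are injectable to keep the function testable.
function makeTestUser(now = Date.now(), rand = Math.random) {
  const suffix = Math.floor(rand() * 1e6).toString(36);
  return {
    email: `e2e${now}_${suffix}@test.example.com`,
    password: "TestPass123!",
  };
}

const user = makeTestUser();
console.log(user.email);
```

Inside a Maestro script the returned fields would be copied onto output, as in the snippet above.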

AI in Maestro

Starting with version 1.39, Maestro supports three AI commands:

assertWithAI — describe your expectation in natural language. Maestro takes a screenshot and sends it to an LLM for verification:

- assertWithAI:
    assertion: "The screen shows a login form with email and password fields"

assertNoDefectsWithAI — automatic visual defect detection (clipped text, overlapping elements, broken layout):

- assertNoDefectsWithAI

extractTextWithAI — extract text from a screenshot via LLM. Useful for dynamic content that’s hard to regex-match.

All three commands are experimental. They work through OpenAI-compatible APIs — you can plug in your own model or use the default Maestro Cloud backend.

Maestro MCP: AI Agent Writes and Debugs Tests

Maestro has a built-in MCP server, launched via maestro mcp with no extra packages. It exposes 13 tools, including take_screenshot, inspect_view_hierarchy, run_flow, tap_on, input_text, check_flow_syntax, and query_docs.

An AI agent (Claude Code, Cursor, Windsurf) gets access to a live emulator: sees the screen, taps, types, runs flow files. The agent reads the accessibility tree and writes tests while verifying them on a real device. The write-run-fix cycle takes minutes instead of hours of manual YAML wrestling.

Where Maestro Stumbles

Maestro does a lot, but it has real problems.

Widget Keys Don’t Work

Key('my_widget') in Flutter code doesn’t reach the accessibility tree. Maestro can’t use it for element selection. The issue dates back to 2022, and native Key support never arrived. The solution is Semantics(identifier:), but this means adding accessibility markup to production code. Some see this as a bonus (accessibility by default), others as noise.

No Sleep

Philosophically sound. Practically inconvenient. The extendedWaitUntil + __never_matches__ hack works, but every new team member asks “what is this magic.”

Breaking Changes Between Versions

Commands disappear between releases. wait: <number> was removed. regex: true was removed. Every major update means reviewing all flow files. For a large test suite, write a migration script ahead of time.
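A pre-upgrade check can be as simple as scanning flow files for syntax that is known to be gone. The sketch below is plain JavaScript for Node and covers only the two removals named above; findRemovedSyntax is a hypothetical helper, and the exact shape of the old syntax in the sample is an assumption.

```javascript
// Patterns for the two removals named above. Extend the list from the
// release notes of the version you are upgrading to.
const REMOVED_SYNTAX = [
  { name: "wait: <number>", pattern: /^\s*-?\s*wait:\s*\d+\s*(#.*)?$/ },
  { name: "regex: true", pattern: /^\s*-?\s*regex:\s*true\s*(#.*)?$/ },
];

// Returns one finding per offending line: { line, name, text }.
function findRemovedSyntax(flowSource) {
  const findings = [];
  flowSource.split(/\r?\n/).forEach((text, i) => {
    for (const { name, pattern } of REMOVED_SYNTAX) {
      if (pattern.test(text)) {
        findings.push({ line: i + 1, name, text: text.trim() });
      }
    }
  });
  return findings;
}

// Assumed example of an outdated flow.
const sample = [
  "appId: com.example.myapp",
  "---",
  "- launchApp",
  "- wait: 3000",
  "- tapOn:",
  '    text: "Skip"',
  "    regex: true",
].join("\n");

console.log(findRemovedSyntax(sample));
```

For the sample this reports line 4 and line 7. Reading real files from disk and deciding what to replace each finding with is left out on purpose, since the replacement depends on the target version.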

CI Isn’t Free

Two options:

Maestro Cloud — upload APK/IPA and flow files, tests run on their cloud devices. GitHub Action mobile-dev-inc/action-maestro-cloud@v2.0.2. Convenient, but paid.

Self-hosted: install Maestro CLI on a CI runner. For Android, any runner with ADB access. For iOS, a macOS runner. Parallelization via --shard-split N (splits the flows across N devices) or --shard-all N (runs every flow on each of N devices). Free, but requires infrastructure.
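The two sharding flags are easy to mix up. The sketch below models the difference in plain JavaScript, assuming --shard-split divides the flows among the devices while --shard-all gives every device the full list. The round-robin assignment and the flow file names are illustrative assumptions, not Maestro’s actual scheduling.

```javascript
// Illustrative only: round-robin is an assumption, not Maestro's scheduler.
function shardSplit(flows, deviceCount) {
  if (!Number.isInteger(deviceCount) || deviceCount < 1) {
    throw new RangeError("deviceCount must be a positive integer");
  }
  const shards = Array.from({ length: deviceCount }, () => []);
  flows.forEach((flow, i) => shards[i % deviceCount].push(flow));
  return shards;
}

// Every device receives its own copy of the full list.
function shardAll(flows, deviceCount) {
  if (!Number.isInteger(deviceCount) || deviceCount < 1) {
    throw new RangeError("deviceCount must be a positive integer");
  }
  return Array.from({ length: deviceCount }, () => [...flows]);
}

// Hypothetical flow file names.
const flows = ["login.yaml", "smoke.yaml", "journey.yaml", "profile.yaml", "ai_search.yaml"];
console.log(shardSplit(flows, 2)); // each flow runs once, on one of the devices
console.log(shardAll(flows, 2));   // each flow runs on both devices
```

Splitting shortens wall-clock time for one suite; running everything on every device is the mode for checking the same suite across different devices or OS versions.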

For a small team without CI, you can get by on local runs and MCP for a long time. But as the team grows, CI becomes a necessity.

WebView on iOS

Maestro can’t see elements inside WebView on iOS (issue #2293). Android has a workaround via Chrome DevTools Protocol (androidWebViewHierarchy: devtools), but iOS doesn’t. If your app uses WebView heavily, this is a blocker.

No Flutter Desktop Support

Android, iOS, and Web (since v2.0) only. macOS, Windows, and Linux desktop targets are not supported.

What’s Next

If your Flutter project has no E2E tests, you can start with Maestro in an evening. Install the CLI, write a login flow, run it. No Xcode runners, no Gradle tasks, no test dependencies in pubspec.yaml.

Three steps to get started:

1. Add Semantics(identifier:) to key elements. Action buttons, input fields, navigation. 10–15 identifiers will cover the main flows.
2. Write a login flow and smoke tests. Login as a reusable runFlow. One smoke test per screen: did it open, are key elements visible.
3. Connect Maestro MCP to your IDE. maestro mcp gives an AI agent access to a live emulator. The agent sees the screen, writes the flow, verifies immediately.

Beyond that — tags for different test suites, performance gates, persistence checks, CI. None of it required on day one.

FAQ

How does Maestro compare to Patrol in terms of test reliability for Flutter apps with heavy animations?

Patrol runs inside the app process (gray-box), so it can call tester.pumpAndSettle() directly and wait for the widget tree to stabilize — this makes it inherently more reliable with complex animations. Maestro works through the accessibility tree from the outside, which means waitForAnimationToEnd and extendedWaitUntil with generous timeouts are your only tools. In practice, production Flutter apps with transitions over 600ms regularly require 2,000–3,000ms timeout buffers in Maestro to avoid flakiness that Patrol would handle deterministically.

What happens to test state when Maestro runs flows in parallel with `--shard-split`?

Each shard gets its own device and runs its assigned flow files independently. There is no shared state between shards, but app data on a device survives from one flow to the next unless a flow resets it with launchApp and clearState: true. The recommended pattern for parallel runs is unique test users per flow (timestamp-based email generation) to avoid server-side conflicts. Shard count is bounded by the number of available devices or emulators, not CPU cores.

Can `assertWithAI` be used in CI without paying for Maestro Cloud?

Yes — the AI commands use OpenAI-compatible APIs that you configure yourself via the MAESTRO_AI_* environment variables. Point them at any OpenAI-compatible endpoint: your own deployment, Groq, or a local model via Ollama. The screenshots are sent to whichever endpoint you configure, with no mandatory routing through Maestro Cloud. Cost per assertWithAI call is typically 2–5 cents with GPT-4o depending on screenshot resolution.
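To turn the per-call figure into a budget, multiply it out. A minimal sketch in plain JavaScript that uses the 2–5 cent range quoted above as its only input; estimateAiCost is a hypothetical helper, and real prices depend on the model and provider.

```javascript
// Per-call price range in cents, taken from the estimate above (GPT-4o).
const CENTS_PER_CALL = { low: 2, high: 5 };

// Estimated spend in dollars for `calls` assertWithAI calls per run
// across `runs` runs.
function estimateAiCost(calls, runs = 1) {
  const total = calls * runs;
  return {
    lowUsd: (total * CENTS_PER_CALL.low) / 100,
    highUsd: (total * CENTS_PER_CALL.high) / 100,
  };
}

console.log(estimateAiCost(20, 30)); // 600 calls: { lowUsd: 12, highUsd: 30 }
```

Twenty AI assertions per run at thirty runs a month comes to roughly 12 to 30 dollars under these assumptions.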