
feat: Maestro E2E functional test suite for Omi app #6283

Open
hniane1 wants to merge 4 commits into BasedHardware:main from hniane1:feat/functional-tests

Conversation

@hniane1 hniane1 commented Apr 2, 2026

What

Adds a complete Maestro E2E functional test suite for the Omi Flutter mobile app. Closes #3857.

Why

The Omi app needs automated functional tests to detect regressions in core user flows — sign-in, conversations, recording, chat, etc. This suite lets a maintainer run tests on their machine with a single command and get a clear pass/fail report.

Test Coverage

| # | Flow | Tags | What It Tests |
|----|------|------|---------------|
| 01 | Login | core, auth | Google sign-in through consent sheet |
| 02 | Onboarding | core, onboarding | Name entry, permissions, device skip |
| 03 | Conversations | core, conversations | List, detail, Summary/Transcript/Action Items tabs, folder filters, three-dot menu |
| 04 | Memories | core, memories | View, create, edit, delete |
| 05 | Chat | core, chat | Send message, verify AI response |
| 06 | Apps/Plugins | core, apps | Browse marketplace, view app detail |
| 07 | Settings | core, settings | Navigate settings, verify key sections |
| 08 | Device Connection | device_required | BLE scan and pair Omi hardware |
| 09 | Recording | device_required | Audio capture and transcription verification |
| 10 | Logout | core, auth | Sign out and verify return to auth screen |

How to Use

```bash
# Install Maestro
brew install maestro

# Run core tests (no Omi device needed)
bash app/.maestro/scripts/run_all.sh --tags core

# Run ALL tests (Omi device nearby + phone connected)
bash app/.maestro/scripts/run_all.sh --tags all

# Or via unified test.sh
bash app/scripts/test.sh --e2e --tags core
```

After the run, check app/.maestro/reports/report.md for the full Markdown report with:

  • Summary table (passed/failed/skipped)
  • Per-flow timing
  • Console output for debugging failures

Design Decisions

  • Tag system: core flows run on simulator/emulator (no physical Omi device needed). device_required flows need Omi hardware nearby.
  • DRY structure: Each flow is self-contained with no code duplication. Shared config in config/global.yaml.
  • Runner script: Tag-based filtering, JUnit XML output, Markdown report generation, configurable device target and output directory.
  • Integrated entry point: app/scripts/test.sh --e2e for Maestro, default (no flags) for unit/widget tests.
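As a sketch of the flow-file shape this tag scheme assumes (the appId and selector text below are illustrative only — the real identifiers live in the PR's flow files and config/global.yaml):

```yaml
# Hypothetical flow skeleton; appId and selectors are placeholders
appId: com.example.omi
tags:
  - core
  - conversations
---
- launchApp
- assertVisible: "Conversations"
```

The runner's tag filter matches against the `tags:` block, so a flow tagged only `device_required` is skipped when running with `--tags core`.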

Files Changed

  • app/.maestro/ — 10 Maestro flow files, runner script, config, README
  • app/scripts/test.sh — Unified test runner
  • app/.gitignore — Exclude generated reports

…e#3857)

- 10 Maestro flow files covering all core app functionality:
  - Login & Sign-in (Google OAuth)
  - Onboarding (name, permissions, device skip)
  - Conversations (list, detail, tabs, CRUD, folder filters)
  - Memories (view, create, edit, delete)
  - Chat (send message, AI response)
  - Apps/Plugins (marketplace browse, detail)
  - Settings (navigation, key sections)
  - Device Connection (BLE scan & pair) [device_required]
  - Recording & Transcription [device_required]
  - Logout (sign out, verify auth screen)

- Tag system: 'core' (simulator-safe) vs 'device_required' (needs Omi HW)
- Runner script with tag filtering, parallel-safe output, JUnit XML + Markdown reports
- Unified test.sh entry point: --e2e flag for Maestro, default for unit/widget
- Clean, DRY flow structure — no code duplication across flows
- .gitignore updated to exclude generated reports

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR introduces a Maestro E2E functional test suite covering 10 core user flows (login, onboarding, conversations, memories, chat, apps, settings, device connection, recording, and logout) along with a Bash runner script and a unified app/scripts/test.sh entry point. The structure is well-organised and the tag-based filtering for core vs device_required flows is a sensible design.

There are several correctness issues that should be addressed before the suite is relied upon in CI:

  • global.yaml is never applied. run_all.sh calls maestro test <flow> without --config, so the onFlowStart: clearState directive is silently ignored. Flows are implicitly chained by shared app state rather than starting from a known baseline.
  • Chat response assertion is a no-op. extendedWaitUntil with visible: text: ".*" matches any text already on screen, so the wait resolves immediately and does not verify that an AI response actually arrived.
  • Screenshot artifact mixing. The post-flow cp "$HOME/.maestro/tests"/*.png glob copies every screenshot ever produced by Maestro into the current flow's output directory, contaminating per-flow reports.
  • Tag extraction is capped at 5 lines (grep -A5), which will silently miss tags if a flow ever has more than ~4 of them.
  • pressKey: "Enter" with a sibling optional: true uses unsupported syntax for a scalar Maestro command and may not behave as intended.

Confidence Score: 2/5

Not ready to merge — the test suite has multiple logic bugs that cause it to not correctly verify app behaviour or maintain clean test state.

Three P1 bugs (global config never applied, meaningless chat assertion, screenshot cross-contamination) mean the suite would give false confidence when run. The flows themselves are well-structured and the runner script is a solid foundation, but these issues need to be fixed before the suite can be trusted.

Key files to revisit: app/.maestro/scripts/run_all.sh (runner bugs) and app/.maestro/flows/05_chat.yaml (assertion and syntax issues).

Important Files Changed

| Filename | Overview |
|----------|----------|
| app/.maestro/scripts/run_all.sh | Core runner script with three bugs: global.yaml config is never passed to maestro test (clearState never fires), screenshot copy uses a global glob that mixes artifacts across flows, and tag extraction uses grep -A5 which silently truncates tags beyond 5 lines. |
| app/.maestro/flows/05_chat.yaml | AI response verification uses text: ".*" which matches anything visible, making the wait a no-op; also contains pressKey with an unsupported optional: true sibling-key syntax. |
| app/.maestro/config/global.yaml | Defines onFlowStart: clearState but the runner never passes this config file to maestro test, so the directive is silently ignored. |
| app/.maestro/flows/01_login.yaml | Login flow is well-structured with optional Google account picker handling; final wait for landing screen is non-optional, which is correct for a login test. |
| app/.maestro/flows/09_recording.yaml | Recording flow uses a hardcoded 30-second wait (wait: 30000) which adds latency but is unavoidable for audio capture; correctly tagged as device_required. |
| app/scripts/test.sh | Clean unified test runner; correctly delegates to run_all.sh for E2E and flutter test for unit tests. |
| app/.gitignore | Correctly excludes the generated .maestro/reports/ directory from version control. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["bash test.sh --e2e --tags core"] --> B["run_all.sh"]
    B --> C["collect_flows(tags)"]
    C -->|"grep -A5 tags:"| D["Flow file list"]
    D --> E{For each flow}
    E --> F["maestro test flow.yaml\n--format junit\n--output report.xml"]
    F -->|"exit 0"| G["PASS ✅"]
    F -->|"exit ≠ 0"| H["FAIL ❌"]
    G --> I["cp ~/.maestro/tests/*.png\nflow_output/\n⚠️ copies ALL screenshots"]
    H --> I
    I --> E
    E -->|done| J["Generate report.md"]
    J --> K[Exit 0 / 1]

    subgraph "Never Used"
        L["config/global.yaml\nonFlowStart: clearState\n❌ not passed via --config"]
    end
```

Comments Outside Diff (2)

  1. app/.maestro/flows/05_chat.yaml, line 454-458 (link)

    P1 AI response wait is a meaningless assertion

    text: ".*" matches any visible text on screen, so extendedWaitUntil here passes immediately regardless of whether an AI response has appeared. This makes the chat response verification effectively a no-op.

    To actually verify that a response was received, wait for something that could only appear after the AI responds:

```yaml
- extendedWaitUntil:
    notVisible:
      text: "Thinking|Loading|…"
    timeout: 30000
- extendedWaitUntil:
    visible:
      text: "recently|Here's|Based on"
    timeout: 5000
```
  2. app/.maestro/flows/05_chat.yaml, line 449-451 (link)

    P1 optional: true on scalar pressKey command is unsupported syntax

    Maestro supports optional: true as a nested key only when the command uses block-mapping syntax. For scalar commands written as pressKey: "Enter", adding optional: true as a sibling key creates {pressKey: "Enter", optional: true}, which Maestro does not recognise.

    Change to block form:
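The block form the comment refers to would look roughly like this (reconstructed from the description above; whether pressKey accepts a nested key: field should be verified against Maestro's command reference):

```yaml
- pressKey:
    key: "Enter"
    optional: true
```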


Comment on lines +162 to +166

```bash
maestro test "$flow_file" \
    $DEVICE_FLAG \
    --format junit \
    --output "$flow_output/report.xml" \
    > "$flow_output/stdout.log" 2>&1
```
P1 global.yaml config is never applied

The runner calls maestro test "$flow_file" directly without passing --config or referencing config/global.yaml. This means the onFlowStart: clearState directive defined in global.yaml never fires — each flow inherits the full app state from the previous one rather than starting from a known-clean state.

This has two consequences:

  1. The suite only works if run in strict sequence starting from flow 01 (login). If any earlier flow is skipped or fails mid-way, subsequent flows may start in an unexpected state.
  2. The config/global.yaml file is misleading — it documents an onFlowStart hook that is silently ignored at runtime.

To apply the global config, add --config "$MAESTRO_DIR/config/global.yaml" to the maestro test invocation:

```bash
maestro test "$flow_file" \
    $DEVICE_FLAG \
    --config "$MAESTRO_DIR/config/global.yaml" \
    --format junit \
    --output "$flow_output/report.xml" \
    > "$flow_output/stdout.log" 2>&1
```
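For reference, a minimal config/global.yaml that this flag would then pick up might look like the sketch below (field names should be checked against Maestro's workspace-config schema; only onFlowStart: clearState is confirmed by the PR):

```yaml
# Sketch only — verify against Maestro's config.yaml schema
flows:
  - "../flows/*.yaml"
onFlowStart:
  - clearState
```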

Comment on lines +183 to +185

```bash
if [[ -d "$HOME/.maestro/tests" ]]; then
    cp "$HOME/.maestro/tests"/*.png "$flow_output/" 2>/dev/null || true
fi
```

P1 Screenshot copy mixes artifacts across all flows

$HOME/.maestro/tests/*.png is the global Maestro screenshot directory shared by all test runs. Copying every *.png found there into each per-flow output directory means screenshots from earlier flows bleed into later flows' directories, making per-flow debugging confusing.

Restrict the copy to only screenshots generated after the flow started:

```bash
if [[ -d "$HOME/.maestro/tests" ]]; then
    find "$HOME/.maestro/tests" -name "*.png" \
        -newer "$flow_output/report.xml" \
        -exec cp {} "$flow_output/" \; 2>/dev/null || true
fi
```
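A quick self-contained demonstration of the -newer filter (paths are throwaway temp files, not the runner's real layout):

```shell
#!/usr/bin/env bash
# Demo: only files modified after the marker file are copied.
demo=$(mktemp -d)
mkdir -p "$demo/tests" "$demo/flow_output"

touch "$demo/tests/old.png"      # pre-existing screenshot
sleep 1
touch "$demo/marker"             # stands in for this flow's report.xml
sleep 1
touch "$demo/tests/new.png"      # screenshot from the current flow

# Copy only screenshots newer than the marker
find "$demo/tests" -name "*.png" -newer "$demo/marker" \
    -exec cp {} "$demo/flow_output/" \;

ls "$demo/flow_output"           # only new.png
```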

```bash
[[ -f "$flow_file" ]] || continue

local flow_tags
flow_tags=$(grep -A5 "^tags:" "$flow_file" | grep "^ *- " | sed 's/^ *- //' || true)
```

P2 grep -A5 tag extraction is brittle

grep -A5 "^tags:" only captures the 5 lines immediately following the tags: key. If a future flow has more than ~4 tags, the trailing ones are silently dropped and tag-based filtering will produce incorrect results.

A safer extraction reads until the YAML block ends:

```bash
flow_tags=$(awk '/^tags:/{p=1; next} p && /^ *- /{gsub(/^ *- /,""); print} p && !/^ *- /{p=0}' "$flow_file" || true)
```

…robust tag parsing

- P1: Pass --config global.yaml to maestro test so onFlowStart hooks fire
- P1: Use find -newer to only copy screenshots from current flow run
- P2: Replace brittle grep -A5 tag extraction with awk block parser
…syntax

- P1: Replace text: '.*' (matches anything) with actual response keywords
- P1: Fix pressKey optional: true to use block-mapping syntax
- Wait for loading indicators to disappear before asserting response


Development

Successfully merging this pull request may close these issues.

omi mobile app functional tests ($300)
