
feat: Maestro E2E functional test suite for Omi app #6283

Open
hniane1 wants to merge 4 commits into BasedHardware:main from hniane1:feat/functional-tests

Conversation

@hniane1 hniane1 commented Apr 2, 2026

What

Adds a complete Maestro E2E functional test suite for the Omi Flutter mobile app. Closes #3857.

Why

The Omi app needs automated functional tests to detect regressions in core user flows — sign-in, conversations, recording, chat, etc. This suite lets a maintainer run tests on their machine with a single command and get a clear pass/fail report.

Test Coverage

| # | Flow | Tags | What It Tests |
|----|------|------|---------------|
| 01 | Login | core, auth | Google sign-in through consent sheet |
| 02 | Onboarding | core, onboarding | Name entry, permissions, device skip |
| 03 | Conversations | core, conversations | List, detail, Summary/Transcript/Action Items tabs, folder filters, three-dot menu |
| 04 | Memories | core, memories | View, create, edit, delete |
| 05 | Chat | core, chat | Send message, verify AI response |
| 06 | Apps/Plugins | core, apps | Browse marketplace, view app detail |
| 07 | Settings | core, settings | Navigate settings, verify key sections |
| 08 | Device Connection | device_required | BLE scan and pair Omi hardware |
| 09 | Recording | device_required | Audio capture and transcription verification |
| 10 | Logout | core, auth | Sign out and verify return to auth screen |

How to Use

```bash
# Install Maestro
brew install maestro

# Run core tests (no Omi device needed)
bash app/.maestro/scripts/run_all.sh --tags core

# Run ALL tests (Omi device nearby + phone connected)
bash app/.maestro/scripts/run_all.sh --tags all

# Or via unified test.sh
bash app/scripts/test.sh --e2e --tags core
```

After the run, check app/.maestro/reports/report.md for the full Markdown report with:

  • Summary table (passed/failed/skipped)
  • Per-flow timing
  • Console output for debugging failures

Design Decisions

  • Tag system: core flows run on simulator/emulator (no physical Omi device needed). device_required flows need Omi hardware nearby.
  • DRY structure: Each flow is self-contained with no code duplication. Shared config in config/global.yaml.
  • Runner script: Tag-based filtering, JUnit XML output, Markdown report generation, configurable device target and output directory.
  • Integrated entry point: app/scripts/test.sh --e2e for Maestro, default (no flags) for unit/widget tests.
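As a sketch of the flow-file shape this tag scheme assumes (the appId and selector text below are illustrative only — the real identifiers live in the PR's flow files and config/global.yaml):

```yaml
# Hypothetical flow skeleton; appId and selectors are placeholders
appId: com.example.omi
tags:
  - core
  - conversations
---
- launchApp
- assertVisible: "Conversations"
```

The runner's tag filter matches against the `tags:` block, so a flow tagged only `device_required` is skipped when running with `--tags core`.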

Files Changed

  • app/.maestro/ — 10 Maestro flow files, runner script, config, README
  • app/scripts/test.sh — Unified test runner
  • app/.gitignore — Exclude generated reports

…e#3857)

- 10 Maestro flow files covering all core app functionality:
  - Login & Sign-in (Google OAuth)
  - Onboarding (name, permissions, device skip)
  - Conversations (list, detail, tabs, CRUD, folder filters)
  - Memories (view, create, edit, delete)
  - Chat (send message, AI response)
  - Apps/Plugins (marketplace browse, detail)
  - Settings (navigation, key sections)
  - Device Connection (BLE scan & pair) [device_required]
  - Recording & Transcription [device_required]
  - Logout (sign out, verify auth screen)

- Tag system: 'core' (simulator-safe) vs 'device_required' (needs Omi HW)
- Runner script with tag filtering, parallel-safe output, JUnit XML + Markdown reports
- Unified test.sh entry point: --e2e flag for Maestro, default for unit/widget
- Clean, DRY flow structure — no code duplication across flows
- .gitignore updated to exclude generated reports

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR introduces a Maestro E2E functional test suite covering 10 core user flows (login, onboarding, conversations, memories, chat, apps, settings, device connection, recording, and logout) along with a Bash runner script and a unified app/scripts/test.sh entry point. The structure is well-organised and the tag-based filtering for core vs device_required flows is a sensible design.

There are several correctness issues that should be addressed before the suite is relied upon in CI:

  • global.yaml is never applied. run_all.sh calls maestro test <flow> without --config, so the onFlowStart: clearState directive is silently ignored. Flows are implicitly chained by shared app state rather than starting from a known baseline.
  • Chat response assertion is a no-op. extendedWaitUntil with visible: text: ".*" matches any text already on screen, so the wait resolves immediately and does not verify that an AI response actually arrived.
  • Screenshot artifact mixing. The post-flow cp "$HOME/.maestro/tests"/*.png glob copies every screenshot ever produced by Maestro into the current flow's output directory, contaminating per-flow reports.
  • Tag extraction is capped at 5 lines (grep -A5), which will silently miss tags if a flow ever has more than ~4 of them.
  • pressKey: "Enter" with a sibling optional: true uses unsupported syntax for a scalar Maestro command and may not behave as intended.

Confidence Score: 2/5

Not ready to merge — the test suite has multiple logic bugs that cause it to not correctly verify app behaviour or maintain clean test state.

Three P1 bugs (global config never applied, meaningless chat assertion, screenshot cross-contamination) mean the suite would give false confidence when run. The flows themselves are well-structured and the runner script is a solid foundation, but these issues need to be fixed before the suite can be trusted.

Key files to revisit: app/.maestro/scripts/run_all.sh (runner bugs) and app/.maestro/flows/05_chat.yaml (assertion and syntax issues).

Important Files Changed

| Filename | Overview |
|----------|----------|
| app/.maestro/scripts/run_all.sh | Core runner script with three bugs: global.yaml config is never passed to maestro test (clearState never fires), screenshot copy uses a global glob that mixes artifacts across flows, and tag extraction uses grep -A5 which silently truncates tags beyond 5 lines. |
| app/.maestro/flows/05_chat.yaml | AI response verification uses text: ".*" which matches anything visible, making the wait a no-op; also contains pressKey with an unsupported optional: true sibling-key syntax. |
| app/.maestro/config/global.yaml | Defines onFlowStart: clearState but the runner never passes this config file to maestro test, so the directive is silently ignored. |
| app/.maestro/flows/01_login.yaml | Login flow is well-structured with optional Google account picker handling; final wait for landing screen is non-optional, which is correct for a login test. |
| app/.maestro/flows/09_recording.yaml | Recording flow uses a hardcoded 30-second wait (wait: 30000) which adds latency but is unavoidable for audio capture; correctly tagged as device_required. |
| app/scripts/test.sh | Clean unified test runner; correctly delegates to run_all.sh for E2E and flutter test for unit tests. |
| app/.gitignore | Correctly excludes the generated .maestro/reports/ directory from version control. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["bash test.sh --e2e --tags core"] --> B["run_all.sh"]
    B --> C["collect_flows(tags)"]
    C -->|"grep -A5 tags:"| D["Flow file list"]
    D --> E{For each flow}
    E --> F["maestro test flow.yaml\n--format junit\n--output report.xml"]
    F -->|"exit 0"| G["PASS ✅"]
    F -->|"exit ≠ 0"| H["FAIL ❌"]
    G --> I["cp ~/.maestro/tests/*.png\nflow_output/\n⚠️ copies ALL screenshots"]
    H --> I
    I --> E
    E -->|done| J["Generate report.md"]
    J --> K[Exit 0 / 1]

    subgraph "Never Used"
        L["config/global.yaml\nonFlowStart: clearState\n❌ not passed via --config"]
    end
```

Comments Outside Diff (2)

  1. app/.maestro/flows/05_chat.yaml, line 454-458 (link)

    P1 AI response wait is a meaningless assertion

    text: ".*" matches any visible text on screen, so extendedWaitUntil here passes immediately regardless of whether an AI response has appeared. This makes the chat response verification effectively a no-op.

    To actually verify that a response was received, wait for something that could only appear after the AI responds:

```yaml
- extendedWaitUntil:
    notVisible:
      text: "Thinking|Loading|…"
    timeout: 30000
- extendedWaitUntil:
    visible:
      text: "recently|Here's|Based on"
    timeout: 5000
```
  2. app/.maestro/flows/05_chat.yaml, line 449-451 (link)

    P1 optional: true on scalar pressKey command is unsupported syntax

    Maestro supports optional: true as a nested key only when the command uses block-mapping syntax. For scalar commands written as pressKey: "Enter", adding optional: true as a sibling key creates {pressKey: "Enter", optional: true}, which Maestro does not recognise.

    Change to block form:
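The block form the comment refers to would look roughly like this (reconstructed from the description above; whether pressKey accepts a nested key: field should be verified against Maestro's command reference):

```yaml
- pressKey:
    key: "Enter"
    optional: true
```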


Comment on lines +162 to +166

```bash
maestro test "$flow_file" \
    $DEVICE_FLAG \
    --format junit \
    --output "$flow_output/report.xml" \
    > "$flow_output/stdout.log" 2>&1
```
P1 global.yaml config is never applied

The runner calls maestro test "$flow_file" directly without passing --config or referencing config/global.yaml. This means the onFlowStart: clearState directive defined in global.yaml never fires — each flow inherits the full app state from the previous one rather than starting from a known-clean state.

This has two consequences:

  1. The suite only works if run in strict sequence starting from flow 01 (login). If any earlier flow is skipped or fails mid-way, subsequent flows may start in an unexpected state.
  2. The config/global.yaml file is misleading — it documents an onFlowStart hook that is silently ignored at runtime.

To apply the global config, add --config "$MAESTRO_DIR/config/global.yaml" to the maestro test invocation:

```bash
maestro test "$flow_file" \
    $DEVICE_FLAG \
    --config "$MAESTRO_DIR/config/global.yaml" \
    --format junit \
    --output "$flow_output/report.xml" \
    > "$flow_output/stdout.log" 2>&1
```
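For reference, a minimal config/global.yaml that this flag would then pick up might look like the sketch below (field names should be checked against Maestro's workspace-config schema; only onFlowStart: clearState is confirmed by the PR):

```yaml
# Sketch only — verify against Maestro's config.yaml schema
flows:
  - "../flows/*.yaml"
onFlowStart:
  - clearState
```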

Comment on lines +183 to +185

```bash
if [[ -d "$HOME/.maestro/tests" ]]; then
    cp "$HOME/.maestro/tests"/*.png "$flow_output/" 2>/dev/null || true
fi
```

P1 Screenshot copy mixes artifacts across all flows

$HOME/.maestro/tests/*.png is the global Maestro screenshot directory shared by all test runs. Copying every *.png found there into each per-flow output directory means screenshots from earlier flows bleed into later flows' directories, making per-flow debugging confusing.

Restrict the copy to only screenshots generated after the flow started:

```bash
if [[ -d "$HOME/.maestro/tests" ]]; then
    find "$HOME/.maestro/tests" -name "*.png" \
        -newer "$flow_output/report.xml" \
        -exec cp {} "$flow_output/" \; 2>/dev/null || true
fi
```
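A quick self-contained demonstration of the -newer filter (paths are throwaway temp files, not the runner's real layout):

```shell
#!/usr/bin/env bash
# Demo: only files modified after the marker file are copied.
demo=$(mktemp -d)
mkdir -p "$demo/tests" "$demo/flow_output"

touch "$demo/tests/old.png"      # pre-existing screenshot
sleep 1
touch "$demo/marker"             # stands in for this flow's report.xml
sleep 1
touch "$demo/tests/new.png"      # screenshot from the current flow

# Copy only screenshots newer than the marker
find "$demo/tests" -name "*.png" -newer "$demo/marker" \
    -exec cp {} "$demo/flow_output/" \;

ls "$demo/flow_output"           # only new.png
```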

```bash
[[ -f "$flow_file" ]] || continue

local flow_tags
flow_tags=$(grep -A5 "^tags:" "$flow_file" | grep "^ *- " | sed 's/^ *- //' || true)
```

P2 grep -A5 tag extraction is brittle

grep -A5 "^tags:" only captures the 5 lines immediately following the tags: key. If a future flow has more than ~4 tags, the trailing ones are silently dropped and tag-based filtering will produce incorrect results.

A safer extraction reads until the YAML block ends:

```bash
flow_tags=$(awk '/^tags:/{p=1; next} p && /^ *- /{gsub(/^ *- /,""); print} p && !/^ *- /{p=0}' "$flow_file" || true)
```

…robust tag parsing

- P1: Pass --config global.yaml to maestro test so onFlowStart hooks fire
- P1: Use find -newer to only copy screenshots from current flow run
- P2: Replace brittle grep -A5 tag extraction with awk block parser
…syntax

- P1: Replace text: '.*' (matches anything) with actual response keywords
- P1: Fix pressKey optional: true to use block-mapping syntax
- Wait for loading indicators to disappear before asserting response


Development

Successfully merging this pull request may close these issues.

omi mobile app functional tests ($300)
