test plugin: review/clarification single-agent rewrite + Feishu report checkbox feedback #61
…t checkbox feedback
Replaces multi-perspective subagent design with single-agent + structured reasoning
across requirement-review and requirement-clarification skills. Aligns with 2025
industry best practice: single agent + strong schema + grounded reasoning beats
parallel subagents for requirement-style analysis where context fits one window.
Skills
- requirement-review: drop multi-perspective branch (4.0/4.1a), promote 4.1b serial
review to canonical path with mandatory 假设/反例搜索/结论 (hypothesis /
counterexample search / conclusion) + original-text quoting;
add 4.1.7 enum normalization (anti-collapse) and 4.1.8 multi-variant consistency
(class-inheritance defense); add P0/P1 priority + checkbox-list per-role issues;
add Closing Checklist Next Steps with handoffs to test-case-generation; add
rr-summary confidence field (0-100) with reproducible formula
- requirement-clarification: drop 3.2.1 multi-perspective branch; redesign single
agent confidence formula; align severity to Chinese 阻断/关注 (blocking/concern)
- test-case-review: §6 修复 (fix) to-do uses MeterSphere (MS) edit links +
已修复/无需修复 (fixed / no fix needed) checkbox;
rewrite todo lines to use action verbs (修步骤/补步骤/改预期, i.e. fix steps /
add steps / change expected result, etc.) instead of
internal dimension tags; convert §2 coverage matrix to tables
- requirement-traceability: §4 缺陷清单 (defect list) adds 有效 bug/无效 bug
(valid bug / invalid bug) checkbox per defect
with `---` dividers; convert §3.1 coverage matrix and §3.2 code traceability to
tables
- change-analysis: convert §2.5 model fields, §4.1/§4.2 impact domains, §3 coverage
to tables; add §6 cross-validation summary template with §6.4 综合结论 (overall
conclusion) checkbox
Conventions
- CONVENTIONS.md: new "飞书文档渲染规范" (Feishu document rendering conventions)
section as the single source of truth on
table-vs-bullet selection, banned elements, and severity terminology; replaces
the obsolete "no markdown table" rule that lived in each skill (Feishu import
has supported tables for some time, verified end-to-end)
- _shared/REQUIREMENT_DIMENSIONS.md: terminology mapping table aligns severity
(阻断/关注, i.e. blocking/concern, with P0/P1 alias for report.md), status
(per-FP vs per-dimension),
and verdict↔confidence mapping; deprecates blocking/concern English and the
4-tier 阻断/高/中/低 (blocking/high/medium/low)
- contracts/rr-summary.schema.json: confidence required (0-100); blocking_issues
required (must be explicit empty array if none)
Schemas / tests
- contracts/rr-summary.schema.json fixture in tests/check-schemas.sh updated to
include confidence and out-of-range/missing rejection cases
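The two schema constraints above (confidence required and bounded, blocking_issues required even when empty) can be sketched in plain Python. This is an illustration only: the function name `validate_rr_summary` is hypothetical, and treating confidence as any number rather than strictly an integer is an assumption, since the actual contracts/rr-summary.schema.json is not shown here.

```python
def validate_rr_summary(summary: dict) -> list:
    """Return a list of violations; an empty list means the summary passes.

    Hypothetical helper mirroring the two rules described above.
    It is not the project's real validator.
    """
    errors = []

    confidence = summary.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        errors.append("confidence is required and must be a number")
    elif not 0 <= confidence <= 100:
        errors.append("confidence must be within 0-100")

    # A missing key is rejected; "no blockers" must be spelled as [].
    if "blocking_issues" not in summary:
        errors.append("blocking_issues is required (use [] when there are none)")
    elif not isinstance(summary["blocking_issues"], list):
        errors.append("blocking_issues must be an array")

    return errors
```

Calling `validate_rr_summary({"confidence": 80, "blocking_issues": []})` returns an empty list, while dropping either key or passing a confidence of 101 yields exactly one violation. These correspond to the out-of-range and missing rejection cases the fixture is said to cover.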
Removed
- agents/requirement-understanding/{user,functional,exception}-perspective.md
(the three perspective subagent definitions are no longer referenced)
Versions
- plugins/test 0.0.9 → 0.0.16 (cumulative across multiple iterations)
- marketplace.json metadata 0.1.41 → 0.1.48
Automated Code Review summary: removed 3 perspective subagents, checklist
Legend: ✅ pass | 🚫 blocking issue
…ing + cumulative changelog
- requirement-review/PHASES.md verdict hard rule: replace OR-joined three-tier with priority chain (hit-and-stop) so verdict is unique under any (confidence, blocking) combination; previously confidence=50 blocking=6 would land in both not_ready and ready_with_conditions
- _shared/REQUIREMENT_DIMENSIONS.md: sync correspondence table to priority order
- requirement-review/PHASES.md mode detection: drop misleading "align with clarification 4 modes" wording (review only has 3, no exploratory mode since Story context is required)
- README: backfill cumulative v0.0.10–v0.0.16 changelog + add v0.0.17 entry
- bump plugins/test 0.0.16 → 0.0.17 across plugin.json (×2) and marketplace entry; bump marketplace metadata 0.1.48 → 0.1.49
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
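The hit-and-stop priority chain described in this commit can be illustrated with a short sketch. The thresholds (60, 80, 5) and the tier name `ready` are assumptions made up for the example; only `not_ready` and `ready_with_conditions` appear in the commit message, and the real rule lives in requirement-review/PHASES.md.

```python
def decide_verdict(confidence: int, blocking: int) -> str:
    """Walk the rules top-down and return at the first hit (hit-and-stop).

    Because each rule returns immediately, every (confidence, blocking)
    pair maps to exactly one verdict, unlike independent OR-joined tiers
    that can both match the same input.
    """
    # Hypothetical thresholds, for illustration only.
    if confidence < 60 or blocking > 5:
        return "not_ready"
    if confidence < 80 or blocking > 0:
        return "ready_with_conditions"
    return "ready"  # assumed name of the third tier
```

Under these assumed thresholds, `decide_verdict(50, 6)` matches the first rule and stops there, so the input that previously landed in two tiers now resolves to `not_ready` alone.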
Reset versions from in-development 0.1.49 / 0.0.17 to production+1 (0.1.42 / 0.0.10) so the seven internal patch bumps (0.0.10-0.0.16 plus the 0.0.17 review fixup) collapse into a single release entry.
- Bump plugins/test 0.0.17 -> 0.0.10 across plugin.json (Claude + Codex) and marketplace test entry
- Bump marketplace metadata 0.1.49 -> 0.1.42
- CHANGELOG.md: insert 0.1.42 section consolidating single-agent rewrite, Feishu checkbox feedback, conventions/shared/contracts changes
- plugins/test/README.md: replace the two interim 0.0.17 + cumulative entries with a single v0.0.10 entry
- README.md / README.en.md: update test row 0.0.9 -> 0.0.10
tests/validate.sh passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ence completeness + ui-fidelity/api-contract consolidation
GameJam missed-report (漏报) root-cause fix (input_quality + supplementary cases routing) plus output-spec evidence completeness and architectural consolidation.
Smoke-test honest verdict (cumulative across v0.0.12 → 0.0.14):
- New PHASES §1.3.d input_quality check (full/medium/low) as single source of truth for all degradation behavior
- §3.1 priority 1.5 tier consumes change_supplementary_cases.json (semantic match on title+steps+preconditions, NOT module reverse-lookup)
- §4.6 fallback synthesis only when input_quality=low; non-low + empty fv → STOP (no more silent fallback masking)
- §5S.2 verdict expanded from binary to 5 tiers
- §5S.1 supplementary defects inherit change-analysis (ca) priority directly
Output-spec evidence completeness (§3.2.0a):
- A (data-flow closure) + B (cross-boundary recording) + D (path-driven failure modes) mechanically validated by validate-fv
- C (per-expected reconciliation) is honest model self-check, NOT in validator (mechanical keyword match would create false safety)
- New cross_component_break defect source (5S.1 source 7)
- forward_verification.schema: requirement_id pattern supports FP-N | FP-UNMAPPED-N | R-N; case_source enum added
- smoke-test-report.schema: verdict 5-tier + input_quality + verification_channel as required fields
- defect-list.schema: P3 added for supplementary inheritance
ui-fidelity-check skill consolidated:
- Standalone skill removed; check exclusively triggered inside requirement-traceability §3.4 via shared ui-fidelity-checker agent (Browser MCP dropped, structural-only)
api-contract-validation kept + agent extracted:
- New shared agents/api-contract-validator.md as stateless calc unit
- requirement-traceability §3.2.5 launches same agent (no more lite-inline + upstream-first split-brain)
Documentation cleanup:
- README ↔ AI_CODING_BEST_PRACTICES dedup (README=catalog, BP=SOP)
- metersphere-sync env var docs corrected (no fake "zero-config" claim)
- All "rt"/"ca" abbreviations expanded to full skill names
- "反查" (reverse lookup) misleading term replaced with "语义匹配" (semantic match) / "推断" (inference)
Versioning: test plugin 0.0.10 → 0.0.14, marketplace 0.1.42 → 0.1.46
Verification:
- bash tests/validate.sh all-green (incl. new Check N+2 A/B/D regression)
- Manual replay against session 311+312 artifacts confirmed TC-11 P0 supplementary case now reaches defect_list (missing in 312 baseline)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
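The §4.6 routing rule in this commit (fallback synthesis only for low-quality input, a hard stop otherwise when forward verification is empty) might look roughly like the sketch below. The function name and the returned labels `continue`, `fallback_synthesis`, and `stop` are hypothetical, and the assumption that a non-empty forward-verification result always proceeds normally is not stated in the commit message.

```python
VALID_INPUT_QUALITY = ("full", "medium", "low")


def route_after_forward_verification(input_quality: str, fv_results: list) -> str:
    """Pick the next step from input_quality and the forward-verification results.

    Hypothetical sketch of the rule described above, not the skill's real logic.
    """
    if input_quality not in VALID_INPUT_QUALITY:
        raise ValueError(f"unknown input_quality: {input_quality!r}")

    # Assumption: any forward-verification result means the normal path continues.
    if fv_results:
        return "continue"

    # Empty results: fallback synthesis is allowed only for low-quality input.
    if input_quality == "low":
        return "fallback_synthesis"

    # full/medium input with nothing verified is treated as a failure, so a
    # silent fallback cannot mask the missing verification.
    return "stop"
```

Keeping the decision in one function reflects the commit's stated goal of making input_quality the single source of truth for degradation behavior.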
CI was failing because metersphere_helper.py requires MS_ACCESS_KEY / MS_SECRET_KEY at module level (for skills calling the MS API), but the new Check N+2 only invokes validate-fv, a pure local schema check that never touches MS. Without these env vars set in CI, the helper exits 1 at module load → validate-fv never runs → grep finds no completeness/A|B|D markers → fail.
Quick fix: export ci_dummy values around the Check N+2 block. The deeper fix (move the env precondition from module level to the actual MS-bound subcommands) is left as a follow-up since it touches every caller of metersphere_helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The module-level MS_ACCESS_KEY / MS_SECRET_KEY check at import time caused validate-fv (a pure local schema-validation subcommand) to fail in CI, where MS credentials are intentionally absent. The ci_dummy workaround in validate.sh (commit 8540307) treated the symptom; this commit fixes the root cause.
- _check_ms_credentials(cmd) runs at main() dispatch time, before any subcommand handler. validate-fv (and any future local-only subcommand added to _LOCAL_ONLY_COMMANDS) skips the check; everything else fails early with the same precondition_failed payload as before.
- Whitelist is "local only" rather than "MS only" so future MS-bound subcommands inherit the check by default, which is safer than the inverse.
- Roll back the validate.sh dummy export now that the helper does the right thing without it.
- Docstring updated to note the new conditional behavior.
Verified locally: with .env hidden + MS_* unset, ping fails as expected ("missing required environment variables") and validate.sh Check N+2 still passes end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reset versions to production+1 per /prepare-release SOP. Collapses the dev-iteration changelog entries (0.1.46/0.1.45/0.1.44/0.1.43 and test plugin v0.0.14/0.0.13/0.0.12/0.0.11) into a single 0.1.42 / v0.0.10 release entry covering all 6 themes shipped in this PR:
1. requirement-review/clarification single-agent strong-reasoning rewrite
2. Feishu report checkbox feedback structures across 4 TEMPLATES
3. AI_CODING_BEST_PRACTICES.md engineer-perspective rewrite + team-internal content moved to Feishu + README↔BP dedup
4. MeterSphere config "zero-config" claim fixed; metersphere_helper lazy MS env credential check (validate-fv works in CI without MS creds)
5. Smoke-test honest verdict overhaul (input_quality routing, GameJam class fix) + A/B/D evidence completeness validator + Check N+2
6. ui-fidelity-check skill consolidated into traceability §3.4; api-contract-validator agent extracted as shared calc unit between the standalone skill and traceability §3.2.5
Verified: bash tests/validate.sh all-green (incl. Check N+2 A/B/D regression) under env -u MS_ACCESS_KEY -u MS_SECRET_KEY (CI sim).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces multi-perspective subagent design with single-agent + structured reasoning across requirement-review and requirement-clarification skills. Aligns with 2025 industry best practice: single agent + strong schema + grounded reasoning beats parallel subagents for requirement-style analysis where context fits one window.
Adds Feishu-friendly report formatting throughout the QA reporting skills: tables for structured data, P0/P1 priority, MeterSphere edit links, and checkbox-based reviewer feedback (有效/无效, i.e. valid/invalid; 已修复/无需修复, i.e. fixed / no fix needed) where decisions need to be tracked in the doc itself.
Skills
- … rr_summary.confidence (0-100) with reproducible formula
- … `---` dividers; convert §3.1 coverage matrix and §3.2 code traceability to tables
Conventions
Schemas / tests
- … tests/check-schemas.sh updated to include confidence and out-of-range/missing rejection cases
- tests/validate.sh pass
Removed
- agents/requirement-understanding/{user,functional,exception}-perspective.md (the three perspective subagent definitions are no longer referenced; consumers were requirement-review and requirement-clarification, both refactored to single-agent)
Versions
- plugins/test 0.0.9 → 0.0.16 (cumulative across multiple iterations)
- marketplace.json metadata 0.1.41 → 0.1.48
Test plan
- bash tests/validate.sh passes
- … `---` dividers all render correctly in import)