test plugin: review/clarification single-agent rewrite + Feishu report checkbox feedback by Jun-Hex · Pull Request #61 · taptap/agents-plugins

Jun-Hex · 2026-05-09T02:02:05Z

Summary

Replaces multi-perspective subagent design with single-agent + structured reasoning across requirement-review and requirement-clarification skills. Aligns with 2025 industry best practice: single agent + strong schema + grounded reasoning beats parallel subagents for requirement-style analysis where context fits one window.

Adds Feishu-friendly report formatting throughout the QA reporting skills: tables for structured data, P0/P1 priority, MeterSphere edit links, and checkbox-based reviewer feedback (有效/无效, 已修复/无需修复) where decisions need to be tracked in the doc itself.

Skills

requirement-review: drop multi-perspective branch (4.0/4.1a), promote 4.1b serial review to canonical path with mandatory 假设/反例搜索/结论 + original-text quoting; add 4.1.7 enum normalization (anti-collapse) and 4.1.8 multi-variant consistency (class-inheritance defense); change the §3 per-role issue list to a list + P0/P1 + either-or checkbox feedback structure; add Closing Checklist Next Steps with handoffs to test-case-generation; add rr_summary.confidence (0-100) with reproducible formula
requirement-clarification: drop 3.2.1 multi-perspective branch; redesign single-agent confidence formula; align severity to Chinese 阻断/关注
test-case-review: §6 修复 to-do uses MS edit links + 已修复/无需修复 checkbox; rewrite todo lines to use action verbs (修步骤/补步骤/改预期 etc.) instead of internal dimension tags; convert §2 coverage matrix to tables
requirement-traceability: §4 缺陷清单 adds 有效 bug/无效 bug checkbox per defect with --- dividers; convert §3.1 coverage matrix and §3.2 code traceability to tables
change-analysis: convert §2.5 model fields, §4.1/§4.2 impact domains, §3 coverage to tables; add §6 cross-validation summary template with §6.4 综合结论 checkbox

Conventions

CONVENTIONS.md: new "飞书文档渲染规范" section as the single source of truth on table-vs-bullet selection, banned elements, and severity terminology; replaces the obsolete "no markdown table" rule that lived in each skill (Feishu import has supported tables for some time, verified end-to-end)
_shared/REQUIREMENT_DIMENSIONS.md: terminology mapping table aligns severity (阻断/关注 with P0/P1 alias for report.md), status (per-FP vs per-dimension), and verdict↔confidence mapping; deprecates blocking/concern English and the 4-tier 阻断/高/中/低
contracts/rr-summary.schema.json: confidence required (0-100); blocking_issues required (must be explicit empty array if none)

Schemas / tests

contracts/rr-summary.schema.json fixture in tests/check-schemas.sh updated to include confidence and out-of-range/missing rejection cases
All 30+ checks in tests/validate.sh pass
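
As a rough illustration of what the tightened contract is meant to reject, here is a minimal hand-rolled check in Python. It is only a sketch: the authoritative rules live in contracts/rr-summary.schema.json, and the function name `check_rr_summary` and the assumption that confidence is numeric are hypothetical.

```python
def check_rr_summary(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    # confidence is required and must fall in the 0-100 range.
    if "confidence" not in doc:
        problems.append("confidence: missing")
    else:
        value = doc["confidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append("confidence: not a number")
        elif not 0 <= value <= 100:
            problems.append("confidence: out of range")
    # blocking_issues is required; "none" must be an explicit empty list.
    if "blocking_issues" not in doc:
        problems.append("blocking_issues: missing")
    elif not isinstance(doc["blocking_issues"], list):
        problems.append("blocking_issues: not an array")
    return problems
```

The out-of-range and missing cases mirror the rejection cases the fixture is described as covering; the real fixture may test more than this.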

Removed

agents/requirement-understanding/{user,functional,exception}-perspective.md (the three perspective subagent definitions are no longer referenced; consumers were requirement-review and requirement-clarification, both refactored to single-agent)

Versions

plugins/test 0.0.9 → 0.0.16 (cumulative across multiple iterations)
marketplace.json metadata 0.1.41 → 0.1.48

Test plan

bash tests/validate.sh passes
Feishu rendering verified end-to-end (tables, nested checkboxes, MS links, --- dividers all render correctly in import)
Reviewer to spot-check rendered samples linked in commit comments


Jun-Hex · 2026-05-09T02:20:43Z

Automated Code Review

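Summary: deletes the 3 perspective subagents and moves requirement-review/requirement-clarification to single-agent strong reasoning (4.1.7 anti-collapse + 4.1.8 multi-variant consistency); rr_summary gains a confidence 0-100 formula plus hard rules; the test-case-review/requirement-traceability/change-analysis templates are converted to tables with checkbox feedback; adds the "飞书文档渲染规范" section to CONVENTIONS.md; versions 0.0.9→0.0.16, marketplace 0.1.41→0.1.48.

Checklist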

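| Category | Status | Notes |
| --- | --- | --- |
| Security | ✅ | No changes involving sensitive information |
| Logical correctness | ⚠️ | Verdict OR rule is ambiguous; CONVENTIONS and PHASES disagree on single-question vs multi-question |
| Resource management | ➖ | Not involved |
| API and compatibility | ❌ | dead reference + schema breaking change (confirmed by both Claude and Codex) |
| Code quality | ⚠️ | Documentation accuracy drift |
| 📋 Project rules | ✅ | plugin version synced to 0.0.16 in all three places; marketplace metadata 0.1.48; codex-plugin interface complete; every SKILL.md frontmatter `name` matches its directory name |

Legend: ✅ Pass | ⚠️ Warning | ❌ Fail | ➖ Not applicable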

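🚫 Blocking issues

plugins/test/skills/test-case-generation/PHASES.md:185-216 + plugins/test/skills/_shared/AGENT_PROTOCOL.md:16-19,47: dead reference. The entire agents/requirement-understanding/ directory was deleted, but test-case-generation still references it [confirmed by both Claude and Codex]
- PHASES §2.4 "多视角并行分析" (multi-perspective parallel analysis) still contains, in full, the Read paths and Task invocation instructions for the 3 subagents such as agents/requirement-understanding/functional-perspective.md
- SKILL.md:29,81: the features list and the model table still describe "multi-perspective parallel analysis"
- The directory-structure diagram in _shared/AGENT_PROTOCOL.md was not updated to match
- Runtime impact: the Task calls in test-case-generation phase 2.4 will try to read files that no longer exist, so the phase cannot run; the PR directly introduces a regression
- Recommendation: pick one of the following and land it: (a) also convert test-case-generation §2.4 to single-agent strong reasoning; (b) restore the perspective agents directory and only stop referencing it from review/clarification. Update AGENT_PROTOCOL.md in either case
plugins/test/contracts/rr-summary.schema.json:6: making confidence + blocking_issues required is a schema breaking change with no migration path [confirmed by both Claude and Codex]
- additionalProperties: true does not protect against missing required keys; old rr_summary.json artifacts (generated before this PR) fail outright when parsed
- The PR description does not declare this as a schema breaking change; when the ai-case backend replays historical sessions in the post_complete stage, or the frontend parses historical cards, a ValidationError will be triggered
- Recommendation: (a) state explicitly in the schema description that the field is enforced as required from v1.1+; (b) call out the breaking change in the PR description/CHANGELOG and have the ai-case backend add compatibility handling (fill defaults for missing fields or treat the artifact as legacy); (c) evaluate whether to provide a migration script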
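
One way the compatibility handling in recommendation (b) could look on the consumer side, sketched in Python. Everything here is hypothetical: the helper name `upgrade_legacy_rr_summary`, the `legacy` marker key, and the choice of `None` as the placeholder confidence are illustrative assumptions, not the ai-case backend's actual behavior.

```python
import copy

def upgrade_legacy_rr_summary(doc: dict) -> dict:
    """Return a copy of doc in which the newly required keys are present."""
    upgraded = copy.deepcopy(doc)  # never mutate the stored artifact
    missing = [k for k in ("confidence", "blocking_issues") if k not in upgraded]
    if missing:
        # Hypothetical marker so downstream code can tell defaults from real data.
        upgraded["legacy"] = True
        # None means "unknown"; inventing a numeric score would be misleading.
        upgraded.setdefault("confidence", None)
        upgraded.setdefault("blocking_issues", [])
    return upgraded
```

A caller using this approach would still need to skip strict schema validation for artifacts marked `legacy`, since a `None` confidence does not satisfy the 0-100 constraint.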

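⚠️ Warnings

plugins/test/skills/requirement-review/contract.yaml:30: the rr_summary.json description omits the newly required field confidence
- The description still reads "verdict / issue_count / blocking_issues"; contract.yaml is the source the ai-case platform reads to generate the schema doc and the contract bridge
- Recommendation: change the description to "verdict / issue_count / confidence / blocking_issues" and bump version from 1.1 to 1.2 (adding a required field is semantically breaking)
plugins/test/skills/requirement-review/PHASES.md:417-420: the verdict hard rule uses OR logic, so several verdicts can match at once
- Counterexample: with confidence=50 and 6 blocking issues, both ready_with_conditions (confidence ∈ [50,79]) and not_ready (blocking > 5) match; the AI has no rule to break the tie, so the output is unstable
- Recommendation: define a priority order (check not_ready first → ready → fall back to ready_with_conditions), or rewrite as three mutually exclusive tiers: ready: confidence ≥80 and blocking = 0 / ready_with_conditions: confidence <80 and blocking ∈ [1,5] / not_ready: blocking >5 or confidence <50. Update _shared/REQUIREMENT_DIMENSIONS.md:64-70 to match
plugins/test/CONVENTIONS.md:220,233,238 vs requirement-review/PHASES.md:323 + requirement-clarification/PHASES.md:374: CONVENTIONS and PHASES disagree on the single-question vs multi-question rule
- CONVENTIONS still specifies 1-4 questions; review/clarification now enforce exactly 1 per turn (CRITICAL, to avoid serialization jitter)
- New skills implemented later against CONVENTIONS will reproduce the "serialization jitter" problem
- Recommendation: add an exception clause to CONVENTIONS stating that requirement-side skills (review/clarification) ask only 1 question at a time, or evaluate promoting this to a global constraint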

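💡 Suggestions

plugins/test/skills/requirement-review/PHASES.md:28: the text says "对齐 clarification 4 模式命名" (align with clarification's 4 mode names), but review actually lists only 3 modes (document / design mockup / description); change the wording from 4 to 3, or add a combined "document + design mockup" mode
plugins/test/skills/requirement-review/SKILL.md:163: the output checklist lists 4 fields while the contract.yaml description lists 3; fix together with blocking issue #2 above
plugins/test/skills/_shared/REQUIREMENT_DIMENSIONS.md:64-70: the OR wording in the verdict table has the same root cause as the PHASES issue; fix both together
plugins/test/README.md:370: the README version history has no v0.0.16 entry; suggest adding one that describes the single-agent rewrite + Feishu checkbox feedback
The PR description explicitly lists "Reviewer to spot-check rendered samples" as not yet done; please attach Feishu rendering screenshots/links in a PR comment (especially how the §3 nested checkboxes and the §5 risk table actually look after Feishu import)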

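Status: ⚠️ Pending (2 blocking issues / 3 warnings)

Powered by Claude Code (Opus 4.7) + Codex CLI · dual-perspective cross-validation
Reply "忽略: [issue keyword]" (ignore) to mark an issue as ignored; it is recognized automatically on re-review. Fixed issues are detected automatically.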

…ing + cumulative changelog

- requirement-review/PHASES.md verdict hard rule: replace OR-joined three-tier with priority chain (hit-and-stop) so verdict is unique under any (confidence, blocking) combination; previously confidence=50 blocking=6 would land in both not_ready and ready_with_conditions
- _shared/REQUIREMENT_DIMENSIONS.md: sync correspondence table to priority order
- requirement-review/PHASES.md mode detection: drop misleading "align with clarification 4 modes" wording (review only has 3, no exploratory mode since Story context is required)
- README: backfill cumulative v0.0.10–v0.0.16 changelog + add v0.0.17 entry
- bump plugins/test 0.0.16 → 0.0.17 across plugin.json (×2) and marketplace entry; bump marketplace metadata 0.1.48 → 0.1.49

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
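
A hit-and-stop priority chain of this kind can be sketched in Python as follows. The thresholds are taken from the automated review's suggestion (not_ready when blocking > 5 or confidence < 50; ready when confidence ≥ 80 with no blocking issues); the rules actually written into PHASES.md may differ, and the function name `decide_verdict` is hypothetical.

```python
def decide_verdict(confidence: int, blocking: int) -> str:
    """First matching rule wins, so every input maps to exactly one verdict."""
    # 1. not_ready is checked first and short-circuits everything else.
    if blocking > 5 or confidence < 50:
        return "not_ready"
    # 2. ready needs high confidence and zero blocking issues.
    if confidence >= 80 and blocking == 0:
        return "ready"
    # 3. Anything that reaches this point falls through to the middle tier.
    return "ready_with_conditions"
```

Under these assumed thresholds, the previously ambiguous input (confidence=50, blocking=6) resolves to not_ready because the first rule matches and evaluation stops there.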

Reset versions from in-development 0.1.49 / 0.0.17 to production+1 (0.1.42 / 0.0.10) so the seven internal patch bumps (0.0.10-0.0.16 plus the 0.0.17 review fixup) collapse into a single release entry.

- Bump plugins/test 0.0.17 -> 0.0.10 across plugin.json (Claude + Codex) and marketplace test entry
- Bump marketplace metadata 0.1.49 -> 0.1.42
- CHANGELOG.md: insert 0.1.42 section consolidating single-agent rewrite, Feishu checkbox feedback, conventions/shared/contracts changes
- plugins/test/README.md: replace the two interim 0.0.17 + cumulative entries with a single v0.0.10 entry
- README.md / README.en.md: update test row 0.0.9 -> 0.0.10

tests/validate.sh passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ence completeness + ui-fidelity/api-contract consolidation

GameJam missed-report (漏报) root-cause fix (input_quality + supplementary cases routing) plus output-spec evidence completeness and architectural consolidation.

Smoke-test honest verdict (cumulative across v0.0.12 → 0.0.14):
- New PHASES §1.3.d input_quality check (full/medium/low) as single source of truth for all degradation behavior
- §3.1 priority 1.5 tier consumes change_supplementary_cases.json (semantic match on title+steps+preconditions, NOT module reverse-lookup)
- §4.6 fallback synthesis only when input_quality=low; non-low + empty fv → STOP (no more silent fallback masking)
- §5S.2 verdict expanded from binary to 5 tiers
- §5S.1 supplementary defects inherit ca priority directly

Output-spec evidence completeness (§3.2.0a):
- A (data-flow closure) + B (cross-boundary recording) + D (path-driven failure modes) mechanically validated by validate-fv
- C (per-expected reconciliation) is honest model self-check, NOT in validator (mechanical keyword match would create false safety)
- New cross_component_break defect source (5S.1 source 7)
- forward_verification.schema: requirement_id pattern supports FP-N | FP-UNMAPPED-N | R-N; case_source enum added
- smoke-test-report.schema: verdict 5-tier + input_quality + verification_channel as required fields
- defect-list.schema: P3 added for supplementary inheritance

ui-fidelity-check skill consolidated:
- Standalone skill removed; check exclusively triggered inside requirement-traceability §3.4 via shared ui-fidelity-checker agent (Browser MCP dropped, structural-only)

api-contract-validation kept + agent extracted:
- New shared agents/api-contract-validator.md as stateless calc unit
- requirement-traceability §3.2.5 launches same agent (no more lite-inline + upstream-first split-brain)

Documentation cleanup:
- README ↔ AI_CODING_BEST_PRACTICES dedup (README=catalog, BP=SOP)
- metersphere-sync env var docs corrected (no fake "zero-config" claim)
- All "rt"/"ca" abbreviations expanded to full skill names
- Misleading term "反查" (reverse lookup) replaced with "语义匹配" (semantic match) / "推断" (inference)

Versioning: test plugin 0.0.10 → 0.0.14, marketplace 0.1.42 → 0.1.46

Verification:
- bash tests/validate.sh all-green (incl. new Check N+2 A/B/D regression)
- Manual replay against session 311+312 artifacts confirmed TC-11 P0 supplementary case now reaches defect_list (missing in 312 baseline)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
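
The input_quality routing described for §4.6 (fallback synthesis only when input_quality=low; otherwise an empty forward-verification result stops the run) might be sketched like this in Python. The function name `route_forward_verification`, the returned labels, and the behavior when results are non-empty are all assumptions made for illustration; the commit message only specifies the empty-result cases.

```python
VALID_QUALITIES = {"full", "medium", "low"}

def route_forward_verification(input_quality: str, fv_entries: list) -> str:
    """Decide the next step once forward verification has produced its entries."""
    if input_quality not in VALID_QUALITIES:
        raise ValueError(f"unknown input_quality: {input_quality!r}")
    if fv_entries:
        # Assumption: non-empty results proceed normally at any quality level.
        return "proceed"
    if input_quality == "low":
        # Only low-quality input may use synthesized fallback results.
        return "fallback_synthesis"
    # Empty results on full/medium input signal a real problem, so stop
    # instead of silently masking it with a fallback.
    return "stop"
```

Keeping the decision in one function reflects the stated goal of making input_quality the single source of truth for degradation behavior.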

CI was failing because metersphere_helper.py module-level requires MS_ACCESS_KEY / MS_SECRET_KEY (for skills calling MS API), but the new Check N+2 only invokes validate-fv which is a pure local schema check that never touches MS. Without these env vars set in CI, the helper exits 1 at module load → validate-fv never runs → grep finds no completeness/A|B|D markers → fail.

Quick fix: export ci_dummy values around the Check N+2 block. The deeper fix (move the env precondition from module-level to the actual MS-bound subcommands) is left as a follow-up since it touches every caller of metersphere_helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The module-level MS_ACCESS_KEY / MS_SECRET_KEY check at import time caused validate-fv (a pure local schema-validation subcommand) to fail in CI, where MS credentials are intentionally absent. The ci_dummy workaround in validate.sh (commit 8540307) treated the symptom; this commit fixes the root cause.

- _check_ms_credentials(cmd) runs at main() dispatch time, before any subcommand handler. validate-fv (and any future local-only subcommand added to _LOCAL_ONLY_COMMANDS) skips the check; everything else fails early with the same precondition_failed payload as before.
- Whitelist is "local only" rather than "MS only" so future MS-bound subcommands inherit the check by default, which is safer than the inverse.
- Roll back the validate.sh dummy export now that the helper does the right thing without it.
- Docstring updated to note the new conditional behavior.

Verified locally: with .env hidden + MS_* unset, ping fails as expected ("missing required environment variables") and validate.sh Check N+2 still passes end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reset versions to production+1 per /prepare-release SOP. Collapses the dev-iteration changelog entries (0.1.46/0.1.45/0.1.44/0.1.43 and test plugin v0.0.14/0.0.13/0.0.12/0.0.11) into a single 0.1.42 / v0.0.10 release entry covering all 6 themes shipped in this PR:

1. requirement-review/clarification single-agent strong-reasoning rewrite
2. Feishu report checkbox feedback structures across 4 TEMPLATES
3. AI_CODING_BEST_PRACTICES.md engineer-perspective rewrite + team-internal content moved to Feishu + README↔BP dedup
4. MeterSphere config "zero-config" claim fixed; metersphere_helper lazy MS env credential check (validate-fv works in CI without MS creds)
5. Smoke-test honest verdict overhaul (input_quality routing, GameJam class fix) + A/B/D evidence completeness validator + Check N+2
6. ui-fidelity-check skill consolidated into traceability §3.4; api-contract-validator agent extracted as shared calc unit between the standalone skill and traceability §3.2.5

Verified: bash tests/validate.sh all-green (incl. Check N+2 A/B/D regression) under env -u MS_ACCESS_KEY -u MS_SECRET_KEY (CI sim).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Jun-Hex requested a review from em0t as a code owner May 9, 2026 02:02

github-actions Bot added scope:tests Changes tests or validation coverage. scope:plugins Changes one or more plugins. size:xl Very large PR, expect slower review. labels May 9, 2026

Jun-Hex and others added 2 commits May 9, 2026 10:50

github-actions Bot added scope:docs Changes documentation or contributor guidance. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026

github-actions Bot added scope:docs Changes documentation or contributor guidance. maintainer-review High-risk change that should be reviewed by a repository maintainer. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026

Jun-Hex wants to merge 7 commits into main from feat/feishu-report-checkbox-feedback
