Skip to content

test plugin: review/clarification single-agent rewrite + Feishu report checkbox feedback#61

Open
Jun-Hex wants to merge 7 commits intomainfrom
feat/feishu-report-checkbox-feedback
Open

test plugin: review/clarification single-agent rewrite + Feishu report checkbox feedback#61
Jun-Hex wants to merge 7 commits intomainfrom
feat/feishu-report-checkbox-feedback

Conversation

@Jun-Hex
Copy link
Copy Markdown
Collaborator

@Jun-Hex Jun-Hex commented May 9, 2026

Summary

Replaces multi-perspective subagent design with single-agent + structured reasoning across requirement-review and requirement-clarification skills. Aligns with 2025 industry best practice: single agent + strong schema + grounded reasoning beats parallel subagents for requirement-style analysis where context fits one window.

Adds Feishu-friendly report formatting throughout the QA reporting skills: tables for structured data, P0/P1 priority, MeterSphere edit links, and checkbox-based reviewer feedback (有效/无效, 已修复/无需修复) where decisions need to be tracked in the doc itself.

Skills

  • requirement-review: drop multi-perspective branch (4.0/4.1a), promote 4.1b serial review to canonical path with mandatory 假设/反例搜索/结论 + original-text quoting; add 4.1.7 enum normalization (anti-collapse) and 4.1.8 multi-variant consistency (class-inheritance defense); §3 各职能问题列表改为 list+P0/P1+二选一 checkbox 反馈结构; add Closing Checklist Next Steps with handoffs to test-case-generation; add rr_summary.confidence (0-100) with reproducible formula
  • requirement-clarification: drop 3.2.1 multi-perspective branch; redesign single-agent confidence formula; align severity to Chinese 阻断/关注
  • test-case-review: §6 修复 to-do uses MS edit links + 已修复/无需修复 checkbox; rewrite todo lines to use action verbs (修步骤/补步骤/改预期 etc.) instead of internal dimension tags; convert §2 coverage matrix to tables
  • requirement-traceability: §4 缺陷清单 adds 有效 bug/无效 bug checkbox per defect with --- dividers; convert §3.1 coverage matrix and §3.2 code traceability to tables
  • change-analysis: convert §2.5 model fields, §4.1/§4.2 impact domains, §3 coverage to tables; add §6 cross-validation summary template with §6.4 综合结论 checkbox

Conventions

  • CONVENTIONS.md: new "飞书文档渲染规范" section as the single source of truth on table-vs-bullet selection, banned elements, and severity terminology; replaces the obsolete "no markdown table" rule that lived in each skill (Feishu import has supported tables for some time, verified end-to-end)
  • _shared/REQUIREMENT_DIMENSIONS.md: terminology mapping table aligns severity (阻断/关注 with P0/P1 alias for report.md), status (per-FP vs per-dimension), and verdict↔confidence mapping; deprecates blocking/concern English and the 4-tier 阻断/高/中/低
  • contracts/rr-summary.schema.json: confidence required (0-100); blocking_issues required (must be explicit empty array if none)

Schemas / tests

  • contracts/rr-summary.schema.json fixture in tests/check-schemas.sh updated to include confidence and out-of-range/missing rejection cases
  • All 30+ checks in tests/validate.sh pass

Removed

  • agents/requirement-understanding/{user,functional,exception}-perspective.md (the three perspective subagent definitions are no longer referenced; consumers were requirement-review and requirement-clarification, both refactored to single-agent)

Versions

  • plugins/test 0.0.9 → 0.0.16 (cumulative across multiple iterations)
  • marketplace.json metadata 0.1.41 → 0.1.48

Test plan

  • bash tests/validate.sh passes
  • Feishu rendering verified end-to-end (tables, nested checkboxes, MS links, --- dividers all render correctly in import)
  • Reviewer to spot-check rendered samples linked in commit comments

…t checkbox feedback

Replaces multi-perspective subagent design with single-agent + structured reasoning
across requirement-review and requirement-clarification skills. Aligns with 2025
industry best practice: single agent + strong schema + grounded reasoning beats
parallel subagents for requirement-style analysis where context fits one window.

Skills

- requirement-review: drop multi-perspective branch (4.0/4.1a), promote 4.1b serial
  review to canonical path with mandatory 假设/反例搜索/结论 + original-text quoting;
  add 4.1.7 enum normalization (anti-collapse) and 4.1.8 multi-variant consistency
  (class-inheritance defense); add P0/P1 priority + checkbox-list per-role issues;
  add Closing Checklist Next Steps with handoffs to test-case-generation; add
  rr-summary confidence field (0-100) with reproducible formula
- requirement-clarification: drop 3.2.1 multi-perspective branch; redesign single
  agent confidence formula; align severity to Chinese 阻断/关注
- test-case-review: §6 修复 to-do uses MS edit links + 已修复/无需修复 checkbox;
  rewrite todo lines to use action verbs (修步骤/补步骤/改预期 etc.) instead of
  internal dimension tags; convert §2 coverage matrix to tables
- requirement-traceability: §4 缺陷清单 adds 有效 bug/无效 bug checkbox per defect
  with `---` dividers; convert §3.1 coverage matrix and §3.2 code traceability to
  tables
- change-analysis: convert §2.5 model fields, §4.1/§4.2 impact domains, §3 coverage
  to tables; add §6 cross-validation summary template with §6.4 综合结论 checkbox

Conventions

- CONVENTIONS.md: new "飞书文档渲染规范" section as the single source of truth on
  table-vs-bullet selection, banned elements, and severity terminology; replaces
  the obsolete "no markdown table" rule that lived in each skill (Feishu import
  has supported tables for some time, verified end-to-end)
- _shared/REQUIREMENT_DIMENSIONS.md: terminology mapping table aligns severity
  (阻断/关注 with P0/P1 alias for report.md), status (per-FP vs per-dimension),
  and verdict↔confidence mapping; deprecates blocking/concern English and the
  4-tier 阻断/高/中/低
- contracts/rr-summary.schema.json: confidence required (0-100); blocking_issues
  required (must be explicit empty array if none)

Schemas / tests

- contracts/rr-summary.schema.json fixture in tests/check-schemas.sh updated to
  include confidence and out-of-range/missing rejection cases

Removed

- agents/requirement-understanding/{user,functional,exception}-perspective.md
  (the three perspective subagent definitions are no longer referenced)

Versions

- plugins/test 0.0.9 → 0.0.16 (cumulative across multiple iterations)
- marketplace.json metadata 0.1.41 → 0.1.48
@Jun-Hex Jun-Hex requested a review from em0t as a code owner May 9, 2026 02:02
@github-actions github-actions Bot added scope:tests Changes tests or validation coverage. scope:plugins Changes one or more plugins. size:xl Very large PR, expect slower review. labels May 9, 2026
@Jun-Hex
Copy link
Copy Markdown
Collaborator Author

Jun-Hex commented May 9, 2026

Automated Code Review

概要:删除 3 个 perspective 子代理,requirement-review/requirement-clarification 转单 Agent 强推理(4.1.7 反 collapse + 4.1.8 多变体一致性);rr_summary 新增 confidence 0-100 公式 + 硬规则;test-case-review/requirement-traceability/change-analysis 模板表格化 + checkbox 反馈;新增 CONVENTIONS.md「飞书文档渲染规范」;版本 0.0.9→0.0.16,marketplace 0.1.41→0.1.48。

检查清单

类别 状态 备注
安全性 无敏感信息变更
逻辑正确性 ⚠️ verdict OR 规则歧义;CONVENTIONS vs PHASES 单问/多问不一致
资源管理 不涉及
API 与兼容性 dead reference + schema breaking change(Claude+Codex 双确认)
代码质量 ⚠️ 文档准确性偏差
📋 项目规则 三处 plugin version 同步至 0.0.16;marketplace metadata 0.1.48;codex-plugin interface 完整;SKILL.md frontmatter name 全部匹配目录名

说明:✅ 通过 | ⚠️ 警告 | ❌ 未通过 | ➖ 不适用

🚫 阻塞问题

  • plugins/test/skills/test-case-generation/PHASES.md:185-216 + plugins/test/skills/_shared/AGENT_PROTOCOL.md:16-19,47dead reference:删除了 agents/requirement-understanding/ 整个目录但 test-case-generation 仍引用 [Claude+Codex 双确认]

    • PHASES §2.4「多视角并行分析」完整保留 agents/requirement-understanding/functional-perspective.md 等 3 个子代理的 Read 路径与 Task 调用指令
    • SKILL.md:29,81 features 列表与 model 表仍描述"多视角并行分析"
    • _shared/AGENT_PROTOCOL.md 目录结构示意未同步更新
    • 运行时影响:test-case-generation 阶段 2.4 的 Task 调用会读取不存在的文件,phase 跑不动 — PR 直接引入回归
    • 建议:必须二选一并落地:(a) 同步把 test-case-generation §2.4 改为单 Agent 强推理;(b) 把 perspective agents 目录回退保留,仅 review/clarification 改为不再引用。同时同步 AGENT_PROTOCOL.md
  • plugins/test/contracts/rr-summary.schema.json:6confidence + blocking_issues 改 required 是 schema breaking change,无迁移路径 [Claude+Codex 双确认]

    • additionalProperties: true 不保护缺失必填键;旧 rr_summary.json 产物(PR 之前生成的)解析直接报错
    • PR 描述未声明这是 schema breaking change;ai-case 后端在 post_complete 阶段对历史 session 重放或前端从历史卡片解析会触发 ValidationError
    • 建议:(a) schema description 显式注明该字段从 v1.1+ required 起强制;(b) PR 描述/CHANGELOG 明确 breaking change,需 ai-case 后端做兼容(缺字段填默认值或视为 legacy);(c) 评估是否提供 migration script

⚠️ 警告

  • plugins/test/skills/requirement-review/contract.yaml:30rr_summary.json description 漏写新增 required 字段 confidence

    • description 仍写「verdict / issue_count / blocking_issues」;contract.yaml 是 ai-case 平台读取并据此生成 schema doc 与 contract bridge 的源
    • 建议:description 改为「verdict / issue_count / confidence / blocking_issues」并把 version 从 1.1 bump 至 1.2(新增 required field 属 breaking 语义)
  • plugins/test/skills/requirement-review/PHASES.md:417-420verdict 硬规则使用 OR 逻辑,存在多 verdict 同时命中的歧义

    • 反例:confidence=50, 阻断项=6 时,ready_with_conditions(confidence ∈ [50,79])和 not_ready(阻断>5)都命中,AI 无规则裁决导致输出不稳定
    • 建议:定义优先级(先判 not_readyready → 兜底 ready_with_conditions),或重写为互斥三段:ready: ≥80 且=0 / ready_with_conditions: <80 且 阻断∈[1,5] / not_ready: 阻断>5 或 confidence<50。同步更新 _shared/REQUIREMENT_DIMENSIONS.md:64-70
  • plugins/test/CONVENTIONS.md:220,233,238 vs requirement-review/PHASES.md:323 + requirement-clarification/PHASES.md:374CONVENTIONS 与 PHASES 单问/多问规则不一致

    • CONVENTIONS 仍规定 1-4 个问题;review/clarification 已强制每次 1 个(CRITICAL,避免序列化抖动)
    • 后续基于 CONVENTIONS 实现的新 skill 会复现"序列化抖动"问题
    • 建议:CONVENTIONS 增补「需求侧 skill(review/clarification)每次仅发 1 个问题」例外条款,或评估升级为全局约束

💡 建议

  • plugins/test/skills/requirement-review/PHASES.md:28 — "对齐 clarification 4 模式命名"但 review 实际只列 3 个模式(文档/设计稿/描述);改文案 43 或补「文档+设计稿联合模式」
  • plugins/test/skills/requirement-review/SKILL.md:163 — output checklist 4 字段 vs contract.yaml description 3 字段,与上方阻塞项 docs: 完善 README 使用说明文档 #2 一并修
  • plugins/test/skills/_shared/REQUIREMENT_DIMENSIONS.md:64-70 — verdict 表格 OR 表述与 PHASES 同根源问题,两处一并修
  • plugins/test/README.md:370 — README 版本历史段未见 v0.0.16 条目,建议补一条说明 single-agent 改造 + Feishu checkbox 反馈
  • PR 描述显式标注 "Reviewer to spot-check rendered samples" 未完成 — 请附飞书文档渲染截图/链接到 PR comment(特别是 §3 嵌套 checkbox、§5 风险表格在飞书 import 后的实际效果)

状态⚠️ 待处理(2 个阻塞问题 / 3 个警告)


Powered by Claude Code (Opus 4.7) + Codex CLI · 双视角交叉验证
回复 忽略: [问题关键词] 可标记为已忽略,re-review 时自动识别。已修复的问题会自动检测。

Jun-Hex and others added 2 commits May 9, 2026 10:50
…ing + cumulative changelog

- requirement-review/PHASES.md verdict hard rule: replace OR-joined three-tier with priority chain (hit-and-stop) so verdict is unique under any (confidence, blocking) combination; previously confidence=50 blocking=6 would land in both not_ready and ready_with_conditions
- _shared/REQUIREMENT_DIMENSIONS.md: sync correspondence table to priority order
- requirement-review/PHASES.md mode detection: drop misleading "align with clarification 4 modes" wording (review only has 3, no exploratory mode since Story context is required)
- README: backfill cumulative v0.0.10–v0.0.16 changelog + add v0.0.17 entry
- bump plugins/test 0.0.16 → 0.0.17 across plugin.json (×2) and marketplace entry; bump marketplace metadata 0.1.48 → 0.1.49

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reset versions from in-development 0.1.49 / 0.0.17 to production+1 (0.1.42 / 0.0.10) so the seven internal patch bumps (0.0.10-0.0.16 plus the 0.0.17 review fixup) collapse into a single release entry.

- Bump plugins/test 0.0.17 -> 0.0.10 across plugin.json (Claude + Codex) and marketplace test entry
- Bump marketplace metadata 0.1.49 -> 0.1.42
- CHANGELOG.md: insert 0.1.42 section consolidating single-agent rewrite, Feishu checkbox feedback, conventions/shared/contracts changes
- plugins/test/README.md: replace the two interim 0.0.17 + cumulative entries with a single v0.0.10 entry
- README.md / README.en.md: update test row 0.0.9 -> 0.0.10

tests/validate.sh passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added scope:docs Changes documentation or contributor guidance. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026
…ence completeness + ui-fidelity/api-contract consolidation

GameJam漏报 root-cause fix (input_quality + supplementary cases routing)
plus output-spec evidence completeness and architectural consolidation.

Smoke-test honest verdict (cumulative across v0.0.12 → 0.0.14):
- New PHASES §1.3.d input_quality check (full/medium/low) as single
  source of truth for all degradation behavior
- §3.1 priority 1.5 tier consumes change_supplementary_cases.json
  (semantic match on title+steps+preconditions, NOT module reverse-lookup)
- §4.6 fallback synthesis only when input_quality=low; non-low + empty fv
  → STOP (no more silent fallback masking)
- §5S.2 verdict expanded from binary to 5 tiers
- §5S.1 supplementary defects inherit ca priority directly

Output-spec evidence completeness (§3.2.0a):
- A (data-flow closure) + B (cross-boundary recording) + D (path-driven
  failure modes) mechanically validated by validate-fv
- C (per-expected reconciliation) is honest model self-check, NOT in
  validator (mechanical keyword match would create false safety)
- New cross_component_break defect source (5S.1 source 7)
- forward_verification.schema: requirement_id pattern supports
  FP-N | FP-UNMAPPED-N | R-N; case_source enum added
- smoke-test-report.schema: verdict 5-tier + input_quality +
  verification_channel as required fields
- defect-list.schema: P3 added for supplementary inheritance

ui-fidelity-check skill consolidated:
- Standalone skill removed; check exclusively triggered inside
  requirement-traceability §3.4 via shared ui-fidelity-checker agent
  (Browser MCP dropped, structural-only)

api-contract-validation kept + agent extracted:
- New shared agents/api-contract-validator.md as stateless calc unit
- requirement-traceability §3.2.5 launches same agent (no more
  lite-inline + upstream-first split-brain)

Documentation cleanup:
- README ↔ AI_CODING_BEST_PRACTICES dedup (README=catalog, BP=SOP)
- metersphere-sync env var docs corrected (no fake "zero-config" claim)
- All "rt"/"ca" abbreviations expanded to full skill names
- "反查" misleading term replaced with "语义匹配" / "推断"

Versioning: test plugin 0.0.10 → 0.0.14, marketplace 0.1.42 → 0.1.46

Verification:
- bash tests/validate.sh all-green (incl. new Check N+2 A/B/D regression)
- Manual replay against session 311+312 artifacts confirmed TC-11 P0
  supplementary case now reaches defect_list (missing in 312 baseline)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added scope:docs Changes documentation or contributor guidance. maintainer-review High-risk change that should be reviewed by a repository maintainer. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026
CI was failing because metersphere_helper.py module-level requires
MS_ACCESS_KEY / MS_SECRET_KEY (for skills calling MS API), but the
new Check N+2 only invokes validate-fv which is a pure local schema
check that never touches MS. Without these env vars set in CI, the
helper exits 1 at module load → validate-fv never runs → grep finds
no completeness/A|B|D markers → fail.

Quick fix: export ci_dummy values around the Check N+2 block. The
deeper fix (move the env precondition from module-level to the actual
MS-bound subcommands) is left as a follow-up since it touches every
caller of metersphere_helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added scope:docs Changes documentation or contributor guidance. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026
The module-level MS_ACCESS_KEY / MS_SECRET_KEY check at import time
caused validate-fv (a pure local schema-validation subcommand) to fail
in CI, where MS credentials are intentionally absent. The ci_dummy
workaround in validate.sh (commit 8540307) treated the symptom; this
commit fixes the root cause.

- _check_ms_credentials(cmd) runs at main() dispatch time, before any
  subcommand handler. validate-fv (and any future local-only subcommand
  added to _LOCAL_ONLY_COMMANDS) skips the check; everything else fails
  early with the same precondition_failed payload as before.
- Whitelist is "local only" rather than "MS only" so future MS-bound
  subcommands inherit the check by default — safer than the inverse.
- Roll back the validate.sh dummy export now that the helper does the
  right thing without it.
- Docstring updated to note the new conditional behavior.

Verified locally: with .env hidden + MS_* unset, ping fails as expected
("missing required environment variables") and validate.sh Check N+2
still passes end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added scope:docs Changes documentation or contributor guidance. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026
Reset versions to production+1 per /prepare-release SOP. Collapses
the dev-iteration changelog entries (0.1.46/0.1.45/0.1.44/0.1.43 and
test plugin v0.0.14/0.0.13/0.0.12/0.0.11) into a single 0.1.42 / v0.0.10
release entry covering all 6 themes shipped in this PR:

1. requirement-review/clarification single-agent strong-reasoning rewrite
2. Feishu report checkbox feedback structures across 4 TEMPLATES
3. AI_CODING_BEST_PRACTICES.md engineer-perspective rewrite + team-internal
   content moved to Feishu + README↔BP dedup
4. MeterSphere config "zero-config" claim fixed; metersphere_helper lazy
   MS env credential check (validate-fv works in CI without MS creds)
5. Smoke-test honest verdict overhaul (input_quality routing, GameJam
   class fix) + A/B/D evidence completeness validator + Check N+2
6. ui-fidelity-check skill consolidated into traceability §3.4;
   api-contract-validator agent extracted as shared calc unit between
   the standalone skill and traceability §3.2.5

Verified: bash tests/validate.sh all-green (incl. Check N+2 A/B/D
regression) under env -u MS_ACCESS_KEY -u MS_SECRET_KEY (CI sim).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added scope:docs Changes documentation or contributor guidance. and removed scope:docs Changes documentation or contributor guidance. labels May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintainer-review High-risk change that should be reviewed by a repository maintainer. scope:plugins Changes one or more plugins. scope:tests Changes tests or validation coverage. size:xl Very large PR, expect slower review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant