chore(test): Feishu report readability + skill audit + CI hardening #59

Merged
Jun-Hex merged 15 commits into main from feat/rr-report-readability
May 8, 2026

Jun-Hex commented Apr 29, 2026

This PR consolidates 3 blocks of work:

  • Block 1: Feishu report readability (commits 3562807 → 32eb7f1, 6 commits)
  • Block 2: Test plugin audit + CI baseline hardening (commits 4679a7f, fad14aa, 117cbcf, 3 commits)
  • Block 3: 8-dim cross-skill review phase 1-3 follow-ups (commits 836cb61 → bf41bce, 5 commits)

Single version bump to test 0.0.9 / marketplace 0.1.41 (consolidated via /prepare-release). All checks green (79 PASS / 0 FAIL, validate.sh).


TL;DR

Block 1 inspects 4 real Feishu report samples produced by ai-case workflows and constrains output format: 3 new TEMPLATES.md created (test-case-review / change-analysis / requirement-traceability had none) + 1 rebuilt (requirement-review) + 1 schema violation fixed (review_mode blacklist). All changes verified via real Feishu docx import (zero emoji / ASCII / table residue).

Block 2 systematically audits the test plugin (16 skills / 10 agents / 5 contract schemas) and hardens CI from 5 → 9 check categories. Resolves 8 contract output collisions, removes 3 zombie agents, makes dependencies honest.

Block 3 runs an 8-dimension cross-skill review: structural compliance, scope/overlap, trigger description, contract closure, orchestration, executability, dead code, cross-cutting docs consistency. Surfaces and resolves contract drift (schema bindings, demand-driven vs change-driven gating, MS writeback dual-entry clarity) and syncs Step.expected from ai-case Pydantic into testcase.schema.json (now required, allows empty string).


Block 1 — Feishu Report Readability

Background

ai-case auto-uploads markdown reports to Feishu Docs after each AI workflow. Real-world inspection of 4 samples surfaced format issues hurting readability, and one triggered a schema violation:

| Sample | Main issues |
| --- | --- |
| 实体商品一期-需求评审-2026032500 | Duplicate H1 / fragmented requirement-points list / [OK]/[!]/[X] ASCII markers / QA AI 助手 self-praise signature |
| 测试用例评审-分包上传(2026-03-04) | 5 tables fragmented into rows / no overall readiness verdict / no per-case fix to-do grouping |
| 进社区默认落tab实验-变更分析-2026042310 | 8 tables fragmented / inconsistent ★/★★/★★★ markers across workflows / no multi-doc cross-references / vague "约 N 个" coverage stats |
| 插件升级提示-冒烟测试-2026040710 | 5 tables fragmented / decorative emoji section prefixes 📋🐛📊💡 auto-converted to ASCII / no overall confidence summary |
| ai-case session 248 | rr_summary.json wrote illegal review_mode: "single_agent_serial", triggering contract schema validation failure (logged in output_data._schema_violations) |

Feishu docx import compatibility traps (discovered via real upload)

The first iteration (commit 3562807) introduced emoji and HTML <details> on the assumption that they would look better; all of them failed in Feishu docx import testing. These 4 traps are the empirical basis for every constraint in this PR:

| Markdown input | Feishu actual rendering | Conclusion |
| --- | --- | --- |
| <details><summary>...</summary>...</details> | Summary becomes plain text + content auto-expanded | HTML collapse not supported |
| ✅ 已获取 | text: [OK] 已获取 | ✅ auto-converted to ASCII |
| ⚠️ 待定 | text: [!] 待定 | ⚠️ auto-converted to ASCII |
| ❌ 未获取 | bitable(17) embedded table block, entire row content lost | ❌ triggers Feishu base table parsing |
| 📋 基本信息 (decorative section prefix) | text: [Doc] 基本信息 | 📋/🐛/📊/💡 all auto-converted to ASCII |
| markdown table \| ... \| ... \| | each cell becomes individual text/bullet block | Tables fragmented into prose |

Fix direction: unify Chinese square-bracket markers [通过]/[已覆盖]/[实证] to replace all emoji + ASCII + star ratings; tables → H3 + bullets; collapse → full-content H4 grouping.
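
As a rough illustration of the "tables → H3 + bullets" direction, the sketch below rewrites a simple pipe table so that each row becomes an H3 heading followed by bullets. The function name `table_to_bullets` and the choice of the first column as the heading are assumptions made for this example and do not come from the templates in this PR; escaped pipes inside cells are not handled.

```python
def table_to_bullets(table_md: str, title_col: int = 0) -> str:
    """Rewrite a simple pipe table as '### <title>' headings plus '- header: value' bullets."""
    rows = []
    for line in table_md.strip().splitlines():
        line = line.strip()
        if line.startswith("|"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 2:
        return table_md  # not a table we understand; leave it untouched

    header, body = rows[0], rows[1:]
    # Drop the |---|---| separator row: every cell consists only of '-', ':' or spaces.
    body = [r for r in body if not all(c and set(c) <= set("-: ") for c in r)]

    out = []
    for row in body:
        out.append(f"### {row[title_col]}")
        for i, (name, value) in enumerate(zip(header, row)):
            if i != title_col:
                out.append(f"- {name}: {value}")
        out.append("")  # blank line between groups
    return "\n".join(out).rstrip() + "\n"


sample = """
| Case | Verdict | Note |
| --- | --- | --- |
| TC-01 | [通过] | covers login |
| TC-02 | [不通过] | missing boundary case |
"""
print(table_to_bullets(sample))
```

Each row keeps its column names next to its values, so the content still reads correctly after Feishu splits it into separate blocks.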

Changes (6 commits / 4 skills / 11 files)

| Commit | Skill | Change |
| --- | --- | --- |
| 3562807 | requirement-review | Rebuild TEMPLATES.md for report.md (10 readability optimizations) |
| 26d3b03 | requirement-review | Post-import fixes: 5 compatibility + 4 structural issues (TOC, section numbering, intent table, chat output) |
| 91ace7e | test-case-review | NEW TEMPLATES.md (no template existed before): 8-section strict template |
| fb47249 | change-analysis | NEW TEMPLATES.md (multi-doc constraints for code_change_analysis.md + test_coverage_report.md): bidirectional cross-references + per-line [实证]/[推测] confidence tagging |
| e459975 | requirement-traceability | APPEND smoke-test mode report.md template (PHASES 5S.2 only constrained JSON output before, no markdown template) |
| 32eb7f1 | requirement-review | FIX rr_summary.json writing illegal review_mode values like single_agent_serial (was PR #58, merged in here) |

Per-skill report structure:

| Skill | Report | Sections | Key innovation |
| --- | --- | --- | --- |
| requirement-review | report.md | 6 | New §4 timebox checklist (pre-review / in-review / pre-dev / pre-test to-dos) |
| test-case-review | review_summary.md | 8 | New §0 case-readiness verdict + §6 fix to-dos grouped by case ID (bold ID + · prefix to preserve hierarchy after Feishu flattens nested bullets) |
| change-analysis | code_change_analysis.md + test_coverage_report.md | 6 + 5 | Multi-doc bidirectional refs + per-line [实证]/[推测] tagging + §5 confidence index |
| requirement-traceability smoke-test | report.md | 6 | §0 overall confidence summary + §2 numerator/denominator titles 共 N/总数 + §3.2 explicit untraced changes |

Unified conventions (11 rules across 4 workflows)

This PR consolidates the per-skill report conventions into unified rules:

  1. No H1 title — Feishu doc name already includes report type, avoid duplication
  2. No markdown tables — Feishu fragments tables into discrete text blocks that read like prose
  3. No emoji ✅/⚠️/❌/⭕ — Feishu auto-converts to ASCII or destroys content
  4. No ASCII [OK]/[!]/[X] — Semantic drift across workflows (rr means "conditional", ca_cov means "low risk but hard to reproduce")
  5. No star ratings ★/★★/★★★ — Inconsistent across workflows
  6. No decorative emoji section prefixes 📋/🐛/📊/💡 — Feishu auto-converts to ASCII
  7. No HTML <details><summary> collapse — Feishu doesn't recognize, summary becomes misleading text
  8. Unified Chinese square-bracket markers (by scenario):
    • Readiness/Pass: [通过]/[有条件]/[不通过]
    • Coverage: [已覆盖]/[部分覆盖]/[未覆盖]/[范围外]
    • Risk: [高]/[中]/[低]
    • Confidence: [实证]/[推测]
    • Priority: [P0]/[P1]/[P2]
  9. Trailing metadata must include "分析方式:AI 辅助 + ... 决策" — Replaces the old QA AI 助手自动生成 self-signature (which weakened enterprise trust)
  10. Code blocks must have prefix quote > 以下代码块仅 Dev 排查时阅读,PM/QA 可跳过 — Balances PM/QA vs Dev viewpoints
  11. Multi-doc must use bidirectional cross-references — Cross-section issues use numbered references, not restatement
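
The "No ..." rules above (1 through 7) are mechanical enough to check automatically. The following is a minimal sketch of such a linter, not something shipped in this PR: the rule ids, the `lint_report` name, and the regular expressions are all assumptions chosen for illustration. Fenced code blocks are stripped first so that a `#` comment inside code is not mistaken for an H1.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself live inside a fenced block
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.S)

# (rule id, pattern) pairs; the ids are hypothetical names for rules 1-7.
FORBIDDEN = [
    ("h1-title", re.compile(r"^# ", re.M)),
    ("markdown-table", re.compile(r"^\s*\|.*\|\s*$", re.M)),
    ("status-emoji", re.compile("[\u2705\u26a0\u274c\u2b55]")),  # check mark, warning, cross, circle
    ("ascii-marker", re.compile(r"\[(?:OK|!|X)\]")),
    ("star-rating", re.compile("\u2605")),  # black star
    ("decorative-emoji", re.compile("[\U0001F4CB\U0001F41B\U0001F4CA\U0001F4A1]")),
    ("html-details", re.compile(r"</?(?:details|summary)\b", re.I)),
]


def lint_report(markdown: str) -> list[str]:
    """Return the sorted ids of every forbidden construct found outside code fences."""
    prose = CODE_BLOCK.sub("", markdown)
    return sorted(name for name, pattern in FORBIDDEN if pattern.search(prose))


print(lint_report("## 1 评审结论\n- 判定:[通过]\n"))
print(lint_report("# Title\n| a | b |\n[OK] done\n"))
```

A report that follows the conventions yields an empty list; anything else names the rules it breaks, which could be wired into a pre-upload step.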

Verification (real Feishu docx upload)

Used ai-case create_feishu_doc.py to upload 4 mock reports to Feishu docx, then verified rendering via docx_v1_documents/{id}/blocks API:

| Report type | Block count | Emoji residue | ASCII residue | Table triggered | Bitable triggered |
| --- | --- | --- | --- | --- | --- |
| requirement-review (E2E) | 58 | 0 | 0 | 0 | 0 |
| test-case-review | 88 | 0 | 0 | 0 | 0 |
| test-case-review §6 (case grouping) | 11 | 0 | 0 | 0 | 0 (verified bold ID + · prefix preserves hierarchy) |
| change-analysis | 35 | 0 | 0 | 0 | 0 |
| smoke-test | 53 | 0 | 0 | 0 | 0 |

All Feishu doc output elements (H2/H3/H4 hierarchy, bullets, code blocks, quote blocks, bold markers, Chinese square brackets) preserved verbatim.

Impact

  • Templates and prompt docs only — no skill behavior or contract changes:
    • rr_summary.json schema unchanged (commit 32eb7f1 only constrains AI output, schema enum was always strict)
    • defect_list.json / smoke_test_report.json schemas unchanged
  • Next workflow run picks up new templates automatically — no ai-case platform changes needed
  • Existing Feishu reports unaffected
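
To make the review_mode constraint from commit 32eb7f1 concrete, here is a small sketch of an enum check. The two allowed values are the ones quoted in that commit's message; the function name and the message format are hypothetical, and this is not the validator ai-case actually runs.

```python
# Enum values as quoted in the message of commit 32eb7f1.
ALLOWED_REVIEW_MODES = ("multi_perspective", "single_agent")


def review_mode_violations(summary: dict) -> list[str]:
    """Return violation messages for the review_mode field (empty list if acceptable)."""
    if "review_mode" not in summary:
        return []  # whether the field is required is not stated here, so absence is not flagged
    mode = summary["review_mode"]
    if mode in ALLOWED_REVIEW_MODES:
        return []
    return [f"review_mode={mode!r} is not one of {list(ALLOWED_REVIEW_MODES)}"]


print(review_mode_violations({"review_mode": "single_agent"}))
print(review_mode_violations({"review_mode": "single_agent_serial"}))
```

The second call reproduces the session 248 case: the value looks plausible to a reader but is outside the enum, so it is reported rather than silently accepted.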

Block 2 — Test Plugin Audit + CI Baseline Hardening

TL;DR

Systematic audit of test plugin (16 skills / 10 agents / 5 contract schemas / CONVENTIONS / 3 root docs). CI baseline expanded 5 → 9 check categories. Caught 8 real contract collisions (resolved via 4 renames + 4 whitelist entries), removed 3 zombie agents, made dependencies honest.

CI baseline hardening (validate.sh 5 → 9 check categories)

| Check | Scope | Content |
| --- | --- | --- |
| 10 | test plugin | SKILL.md frontmatter name ↔ directory name match |
| 11 | test plugin | handoffs [].skill targets exist |
| 12 | test plugin | subagent_type=... referenced agent files exist |
| 13 | test plugin | references/*.md paths resolve (incl. Chinese filenames) |
| 14 | test plugin | validate_contracts.py cross-skill consistency + whitelist (contracts/known-collisions.yaml) |
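
Check 10 is the simplest of these to picture. The sketch below shows one way such a check could work, assuming a `<skills_root>/<skill-name>/SKILL.md` layout and a flat `name:` key in the frontmatter; both the layout and the function names are assumptions for illustration, and the real validate.sh may do this differently.

```python
from pathlib import Path


def frontmatter_name(skill_md: Path):
    """Return the 'name' value from a leading '---' frontmatter block, or None."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "name":
            return value.strip().strip("\"'")
    return None


def mismatched_skills(skills_root: Path):
    """List (directory name, frontmatter name) pairs that disagree."""
    bad = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        name = frontmatter_name(skill_md)
        if name != skill_md.parent.name:
            bad.append((skill_md.parent.name, name))
    return bad
```

An empty result means every skill directory agrees with its own frontmatter; a CI wrapper would fail the build on any non-empty list.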

Plus: extended tests/check-schemas.sh from 1 → 5 schemas (testcase / ca-summary / defect-list / rr-summary / smoke-test-report) with ~30 negative cases. Fixed mktemp failure silently passing under sandbox mode.

Contract output collision resolution (8 real collisions → 4 renames + 4 whitelist)

| Original filename | Resolution |
| --- | --- |
| risk_assessment.json | change-analysis renamed to bug_risk_assessment.json (semantic distinction from traceability) |
| test_execution_report.json | Split to unit_test_execution_report.json / integration_test_execution_report.json |
| supplementary_cases.json (3-way collision) | change-analysis → change_supplementary_cases.json; test-case-review → review_supplementary_cases.json; test-case-generation keeps canonical (internal scratch) |
| ui_fidelity_report.json | Whitelisted (primary+fallback: traceability PHASES 3.4 only self-produces if work dir lacks ui-fidelity-check output) |
| forward_verification.enriched.json + 3 writeback files | Whitelisted (same metersphere_helper.py writeback-from-fv helper, runtime-mutually-exclusive) |
| requirement_points.md | Whitelisted (TCG/TCR understand-phase fallback, schema-identical; also fills TCG contract's previously-missing output declaration) |

Added plugins/test/contracts/known-collisions.yaml as whitelist source of truth, with self-binding "review protocol" rule (rejects semantically different same-name entries).
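
The collision check itself can be sketched in a few lines. This is an illustration of the idea only, not the code in validate_contracts.py: the input shape (a mapping from skill name to declared output filenames), the function name, and the sample data are all assumptions, and the whitelist is passed as a plain set rather than loaded from known-collisions.yaml.

```python
from collections import defaultdict


def find_collisions(outputs_by_skill: dict, known_collisions=()) -> dict:
    """Map each output filename declared by more than one skill to its producers,
    skipping filenames that appear in the whitelist."""
    producers = defaultdict(list)
    for skill, outputs in outputs_by_skill.items():
        for filename in set(outputs):  # ignore duplicates within one skill
            producers[filename].append(skill)
    allowed = set(known_collisions)
    return {
        name: sorted(skills)
        for name, skills in producers.items()
        if len(skills) > 1 and name not in allowed
    }


# Illustrative data only; the real contract.yaml files declare more outputs.
declared = {
    "change-analysis": ["risk_assessment.json", "code_change_analysis.md"],
    "requirement-traceability": ["risk_assessment.json", "ui_fidelity_report.json"],
    "ui-fidelity-check": ["ui_fidelity_report.json"],
}
print(find_collisions(declared, known_collisions={"ui_fidelity_report.json"}))
```

With this data only risk_assessment.json is reported, which mirrors the split above: one collision resolved by renaming, the other accepted through the whitelist.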

Zombie agent cleanup (3 deletes + 13 dead references)

| Deleted file | Justification |
| --- | --- |
| agents/requirement-traceability/forward-tracer.md | Self-marked DEPRECATED; prompt inlined into PHASES 3.2.4 |
| agents/requirement-traceability/reverse-tracer.md | PHASES 3.3 explicitly says "main agent inlines sequentially, no Task call"; zero real invocations |
| agents/failure-classifier.md | test-failure-analyzer/{SKILL,PHASES} zero references; pure "reserved" doc |

Cleaned 13 dead references across 7 files (SKILL/PHASES/TEMPLATES/TRACEABILITY_PROTOCOL/AGENT_PROTOCOL/README).

Dependencies honesty

| File | Change |
| --- | --- |
| shared-tools/requirements.txt | Was "No third-party packages required"; corrected to list 5 real deps (pycryptodome/pyyaml required + jsonschema/python-dotenv/aiohttp optional) |
| feedback/requirements.txt | NEW, declares aiohttp (required by feishu_api.py) |
| feedback/contract.yaml | env_vars added FEISHU_PLUGIN_ID/SECRET/USER_KEY (was [] despite scripts needing them) |

Helper script UX consistency

5 shared scripts now support standard -h/--help (previously --help was sent as a network query in search_mrs.py):

  • search_mrs.py / search_prs.py: added USAGE constant + --help interception
  • gitlab_helper.py / github_helper.py / metersphere_helper.py: the "unknown subcommand" path now checks whether argv[1] is "-h" or "--help" first, and if so prints usage and exits 0
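
The --help interception can be pictured with the sketch below. It is a simplified stand-in rather than the actual search_mrs.py: the usage text, `run_query`, and the exit code 2 for a missing argument are assumptions made for the example.

```python
import sys

USAGE = "usage: search_mrs.py [-h] <query>"  # hypothetical usage text
QUERIES_SENT = []  # records what would have been sent over the network


def run_query(query: str) -> int:
    """Stand-in for the real network search; it only records the query."""
    QUERIES_SENT.append(query)
    return 0


def main(argv: list[str]) -> int:
    args = argv[1:]
    # Intercept help flags before anything is treated as a search term.
    if args and args[0] in ("-h", "--help"):
        print(USAGE)
        return 0
    if not args:
        print(USAGE, file=sys.stderr)
        return 2
    return run_query(args[0])
```

Checking the flag before dispatch is what prevents "--help" from being forwarded as a query string.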

Documentation polish

  • AI_CODING_BEST_PRACTICES.md / PIPELINES.md / CONTRACT_SPEC.md get audience notes ("developer reference, not skill runtime dependency")
  • README directory structure adds role notes for each root doc + contracts/ directory
  • CONVENTIONS L466 ID-prefix table fact correction: RP- isn't only TCG review; both TCG/TCR use it as understand-phase fallback
  • Fixed 2 broken anchor links: CONVENTIONS.md#审计输出格式 → _shared/ASSERTION_AUDIT.md#审计输出格式
  • AGENT_TEMPLATE: added sub-agent section-name exception clause (allows case-tracer to use ## 执行流程/输出/置信度评分 instead of standard names)
  • WORKFLOW_DEFS qa-lite writeback behavior corrected: from "will fail" → "graceful skip" (matches PHASES 6.1.b actual behavior)
  • integration-test-design: 5-phase content split from SKILL into standalone PHASES.md (matches unit-test-design style; SKILL trimmed 387 → 252 lines)
  • feedback/SKILL.md adds Quick Start + Closing Checklist
  • metersphere-sync/SKILL.md adds list-stages command docs + stdout shape
  • change-analysis/SKILL.md model-tier table adds codex-change-analyzer
  • Removed metersphere-sync deprecated confidence_threshold field (helper hadn't honored it; only contract.yaml placeholder)

Block 3 — 8-Dimension Cross-Skill Review Phase 1-3 Follow-ups

TL;DR

Multi-dimensional review of all 15 skills, 7 agents (under agents/), and 5 contracts. 23 decision points across 8 dimensions resolved. 7 actionable fixes implemented; 3 "keep current state" decisions noted as known limitations.

Coverage (8 dimensions)

| # | Dimension | Verdict | Real findings |
| --- | --- | --- | --- |
| 1 | Structure compliance (plugin manifests / SKILL frontmatter) | 🟢 | None |
| 2 | Scope boundaries / overlap (skill ↔ skill) | 🟡 | 4 overlap pairs; 1 fixed (tcg/ca driver gating), 3 deferred to LLM routing |
| 3 | Trigger description quality | 🟡 | trace description called out but kept (per D2-Q4 tied) |
| 4 | Contract closure (schema ↔ skill IO ↔ consumers) | 🟡 → 🟢 | 6 sub-issues resolved (see changes below) |
| 5 | Orchestration correctness (qa-workflow / PIPELINES) | 🟡 | MS writeback dual-entry clarified via cross-ref to known-collisions |
| 6 | Executability (scripts / external deps / agent refs) | 🟡 | Narrative-only agents accepted; fallback docs deferred |
| 7 | Redundancy / dead code | 🟢 | 3 minor findings, none requiring deletion |
| 8 | Cross-cutting docs consistency (README / CONVENTIONS / PIPELINES / CONTRACT_SPEC / BEST_PRACTICES) | 🟢 | 3 minor drift fixes applied |

Changes implemented

  • Schema bindings: 3 contract.yaml files (tcg/tcr/ca) explicitly bind testcase.schema.json on 5 *_cases.json outputs (was implicit via CONVENTIONS.md inheritance; now machine-checkable)
  • Demand-driven vs change-driven gating: tcg SKILL description marked 需求驱动 for supplementary cases; ca SKILL description marked 变更驱动 + reverse SKIP cross-references between ca and trace
  • review_result.json non-strict JSON: test-case-review/SKILL.md notes it's a verdict object (not TestCaseList), deliberately not schema-bound
  • Step.expected now required: synced from ai-case Pydantic source (companion ai-case MR !40); allows empty string but field must exist. MCP tool input_schema auto-reflects via TypeAdapter at LLM tool-call time
  • test_method relaxed to optional: CONVENTIONS.md updated to match Pydantic (which was already Optional; doc was lagging) — supplementary case producers may omit
  • MS writeback dual-entry clarified: requirement-traceability PHASES.md Phase 6 + PIPELINES.md cross-reference known-collisions.yaml explaining trace Phase 6 and metersphere-sync mode=execute share the metersphere_helper.py writeback-from-fv helper as auto vs manual entry points (not duplicate implementations)
  • qa-workflow re-entry support: accepts re_entry_phase + requirement_change_summary optional inputs, passthrough to test-case-generation for "rerun after requirement change" scenarios
  • README selection guide: surfaces demand-driven (tcg) vs change-driven (ca) split; drops stale "v0.0.10+" version gates
  • Sample annotation: requirement-clarification SKILL Quick Start clarifies output/*.json are pre-shipped format samples, not runtime artifacts
  • Agent loader docs: README + _shared/AGENT_PROTOCOL.md add inline note explaining plugins/test/agents/ markdown files lack YAML frontmatter and are loaded explicitly via Task tool calls inside skills, not auto-registered as Claude/Codex subagents
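
The "required, but may be empty" rule for Step.expected is easy to misread, so here is a small sketch of what it means in practice. The `required` list matches the verified value `['action', 'expected']` from the test plan; the property types, the fragment as a whole, and the `step_errors` helper are assumptions for illustration and are not copied from testcase.schema.json.

```python
# Assumed fragment; only the "required" list is taken from this PR.
STEP_SCHEMA_FRAGMENT = {
    "type": "object",
    "required": ["action", "expected"],
    "properties": {
        "action": {"type": "string"},
        "expected": {"type": "string"},
    },
}


def step_errors(step: dict, schema: dict = STEP_SCHEMA_FRAGMENT) -> list[str]:
    """Check presence of required keys and string typing; an empty string is accepted."""
    errors = [f"missing required field: {key}" for key in schema["required"] if key not in step]
    for key, spec in schema["properties"].items():
        if key in step and spec.get("type") == "string" and not isinstance(step[key], str):
            errors.append(f"{key} must be a string")
    return errors


print(step_errors({"action": "tap Upload", "expected": ""}))  # key present, value empty
print(step_errors({"action": "tap Upload"}))                  # key absent
```

The distinction is between an absent key, which fails, and a present key holding an empty string, which passes. Producers that have nothing to assert should still emit the field.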

Known limitations (deferred decisions)

  • 5 narrative agents not declared in dependencies.agents (kept as-is, per D6-A)
  • feedback / tcg / tcr / qa-workflow lack explicit fallback documentation when external deps unavailable (per D6-B; relies on fail-loud + error-message guidance)
  • requirement-clarification ↔ requirement-review overlap not gated; LLM routes (per D2-Q1)
  • tcg internal redundancy-pair review vs standalone test-case-review boundary not enforced (per D2-Q2)

Companion change in ai-case

ai-case MR !40 (fix/skill-loading-doc) commits case_schema.py Step.expected required. Merge order recommendation: this PR first → ai-case MR !40 second → submodule pointer update via small follow-up MR.


Commit map

| Block | Commits | Versions touched |
| --- | --- | --- |
| 1 (Feishu readability) | 3562807 26d3b03 91ace7e fb47249 e459975 32eb7f1 | (in-flight) |
| 2 (audit + CI) | 4679a7f fad14aa 117cbcf | (in-flight) |
| 1+2 prep-release | 3a57da6 | test 0.0.9 / mp 0.1.41 |
| 3 (8-dim follow-ups) | 836cb61 3564b01 bb3381b b3c4f1a | (in-flight bumps to 0.0.10–0.0.14) |
| 3 prep-release | bf41bce | reset to test 0.0.9 / mp 0.1.41 |

Test plan

  • bash tests/validate.sh PASS (79 PASS / 0 FAIL)
  • All 5 contract schemas validate + ~30 negative cases rejected
  • Real Feishu docx upload of 4 mock reports — 0 emoji / 0 ASCII / 0 table residue
  • testcase.schema.json synced from ai-case Pydantic via docker exec; verified Step.required = ['action', 'expected']
  • CI: Analyze (python) / label x2 / validate / CodeQL — all SUCCESS
  • Post-merge: companion ai-case MR !40 merged + submodule pointer follow-up MR

🤖 Generated with Claude Code

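An item-by-item inspection of a real ai-case sample (实体商品一期-需求评审-2026032500)
found 10 issues in the report.md template that hurt readability. This commit
rebuilds the report.md template in TEMPLATES.md and the section 5.1 description
in PHASES.md for the Feishu Docs scenario.

The 10 fixes:
1. Remove the duplicate H1 title: the Feishu doc name already contains
   「需求评审」, so an AI-written `# {需求名称} - 需求评审报告` produces two
   nearly identical title lines
2. Broken requirement-points list format: the old template mixed a half-list
   `F1 | 模块 | 描述` with grouping, so the group heading merged with the first
   requirement point into one line when rendered; changed to `### {模块}` H3
   groups + `- F1: 描述` list items
3. (Skipped) Add a header to the review statistics: the review_checklist.md
   template already has a header; this commit only states on the report side
   that bare number fields are no longer embedded
4. Replace the ASCII status markers [OK]/[!]/[X] with emoji ✅/⚠️/❌ throughout;
   add a "key constraints" section that explicitly forbids ASCII fallback
5. Delete §7 "Review completeness statement" entirely: 90% of its content
   duplicates the §1 metadata and the counts at the end of §3, and AI
   self-assessments such as "blocker ratio X% is within a reasonable range"
   have no value to readers
6. The requirement-points list is too long at 46 items: require wrapping it in
   `<details><summary>` so it is collapsed by default; Feishu Docs will render
   it as an expandable block after import
7. Internal dimension codes such as FB2/LC2 + UX3: all removed, keeping only
   the human-readable dimension names 「功能边界」 and 「逻辑一致性」
8. New §4 "timebox checklist": reorganizes the to-dos from §3 + §5 by "before
   the review meeting / during the meeting / before development / before test
   handoff", citing issue numbers plus a one-line action without repeating the
   issue text
9. Metadata adds `分析方式:AI 辅助 + 评审会决策`, making explicit that
   conclusions are decided by people in the review meeting; no AI signature
   such as 「评审人:QA AI 助手」 (which makes enterprise users doubt the
   credibility of the conclusions)
10. Renumber sections: 1 Review verdict / 2 Requirement understanding / 3 Issues
    by role / 4 Timebox checklist / 5 Risks / 6 Review scope statement (the old
    version skipped a number: 1/2/3/4/5/7)

The next requirement-review session will pick up the new template
automatically. Expected:
- Report length drops from 8.2KB to about 3-4KB (collapsed requirement-points
  list + §7 removed)
- Key to-dos are visible at a glance in §4, instead of being pieced together
  from the PM/Dev/QA sections of §3
- The Feishu doc title and the body no longer duplicate each other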
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@Jun-Hex Jun-Hex requested a review from em0t as a code owner April 29, 2026 09:32
github-actions bot added the scope:plugins, kind:feature, and size:m labels Apr 29, 2026
Jun-Hex and others added 2 commits April 29, 2026 17:52
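The first version of PR #59 introduced emoji ✅/⚠️/❌ and HTML
<details><summary> collapse blocks on the assumption that they "look better".
Uploading a mock report to Feishu docx today gave these results (verified via
the doc_blocks API):

1. ❌ triggers Feishu's bitable block parsing, and the entire row's content is
   swallowed
2. ✅ is auto-converted to [OK], and ⚠️ to [!]
3. <details>/<summary> HTML tags are ignored entirely; the summary text instead
   becomes misleading "click to expand" plain text (which users cannot
   actually click)

Immediate fixes (5 items):

1. Status markers unified to Chinese square-bracket markers (all preserved
   verbatim by Feishu import):
   - Readiness: [通过] / [有条件] / [不通过] (replacing ✅/⚠️/❌)
   - Information boundary: [已获取] / [未获取] / [不完整] (replacing ✅/❌/⚠️)
   - The intermediate artifact review_checklist.md still uses the 4 classes
     [已确认]/[需关注]/[待确认]/[不适用]; a new "about marker conventions"
     paragraph at the top of TEMPLATES states that the sets must not be mixed
2. Remove the <details> collapse requirement; the requirement-points list is
   written out in full, grouped by module with H4 `#### {模块名}` (H4 has less
   visual weight than H3 and does not disturb the outline)
3. Key constraints add a ban on emoji and details, with the test evidence

Also fixes 4 other structural issues found in this review (A1/A2/A4/F2):

4. The TEMPLATES.md table of contents said "7-section structure definition"
   but there are actually 6 sections; corrected
5. In the review_checklist.md template the section number `## 三、` appeared
   twice (standard QA Checklist + document vs design cross-check); renumbered
   to 三/四/五
6. The SKILL.md intent-recognition table adds routing hints for smoke-test /
   test-case-generation, so a user saying 「做冒烟测试」 ("run a smoke test") is
   not mis-routed
7. The SKILL.md output-phase chat wording is aligned with PHASES.md 5.2: chat
   gives a metrics summary + the Feishu link; the detailed template lives in
   PHASES.md
8. In TEMPLATES.md, the header of 三、标准 QA Checklist explicitly references
   the concrete check items in CHECKLIST.md, so the AI does not need to
   reinvent them

Verification:
- Generated a mock report with the latest template and uploaded it to Feishu;
  all 58 blocks rendered correctly
- 0 emoji residue, 0 ASCII [OK] residue, 0 HTML tag residue
- The H2/H3/H4 heading hierarchy is intact; bullet lists and the §4 timebox
  structure are complete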
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
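The test-case-review skill previously had no TEMPLATES.md, so the
review_summary.md produced in the summary phase was left entirely to the AI's
discretion. The real sample (分包上传 DC 开放入口-用例评审-2026-03-04) showed
these problems:

1. §1 review overview, §2.1 coverage mapping, §3.2 missing scenarios, and
   §4.1 / §7 all used markdown tables, which Feishu import splits into discrete
   text/bullet blocks that read like prose rather than tables (5 places, about
   31 scattered lines in the sample)
2. Heavy use of emoji ✅/⚠️, which Feishu import converts to [OK]/[!]; ❌
   additionally triggers bitable block parsing (see the D1 verification result
   in PR #59)
3. No "case readiness verdict": after reading, readers cannot tell whether the
   case set passed review
4. §4 / §5 / §6 overlap heavily: the 15 high/medium/low improvement suggestions
   in §6 restate §4 (6 items) + §5 (9 items) almost 1:1
5. The same case (e.g. b3769bb8) is called out in §3 / §4 / §5, so whoever
   fixes it has to flip back and forth to work out what that case actually
   needs
6. The "QA AI 助手自动生成" signature weakens enterprise users' trust in the
   conclusions

The new TEMPLATES.md enforces an 8-section structure (0-7):
- §0 Review verdict: case readiness [通过]/[有条件]/[不通过] + fixes grouped
  by severity
- §1 Review overview: core metrics + coverage target line ≥ 90%
- §2 Requirement coverage: covered / partially covered / not covered, listed
  as three separate groups (no tables)
- §3 Scenario completeness: end-to-end flows that do not close + missing
  boundary scenarios grouped by requirement point
- §4 Case correctness / §5 Case conformance
- §6 Fix to-dos (grouped by case ID): the core actionable section; cross-section
  issues are gathered here, with case ID lines in bold and to-do lines prefixed
  with `· ` to preserve hierarchy
- §7 Suggested additional cases

Key constraints ("关键约束" subsection):
- No markdown tables (Feishu import scatters them into lines)
- No emoji ✅/⚠️/❌ (Feishu import destroys content)
- §6 must not restate §3/§4/§5 details (cite section numbers instead)
- No AI assistant signature (use 「分析方式:AI 辅助 + 评审人决策」)
- No H1 (the Feishu doc name already contains
  「{需求名称}-用例评审-{YYYYMMDDHH}」)
- §6 to-dos use "bold case ID + `· ` prefix" to keep hierarchy (Feishu flattens
  nested bullets, but bold text and the `·` character stay visually distinct)

Also updated:
- PHASES.md 5.4 references the new TEMPLATES and lists the 8-section structure
- The SKILL.md intermediate-file list marks review_summary.md as the final
  Feishu artifact

Verification (mock report, 88 blocks):
- 0 emoji residue / 0 ASCII residue / 0 tables triggered / 0 bitable triggered
- §6 case ID lines carry the [BOLD] mark; to-do lines have no bold and keep the
  `·` prefix
- All Chinese square-bracket markers ([有条件]/[已覆盖]/[P0]/[完整性], etc.)
  are preserved verbatim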
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jun-Hex changed the title from "feat(test/requirement-review): fix 10 Feishu report readability issues" to "feat(test): improve Feishu report readability (requirement-review + test-case-review)" Apr 29, 2026
github-actions bot added the size:l label and removed the size:m label Apr 29, 2026
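The change-analysis skill previously had no TEMPLATES.md, so
code_change_analysis.md and test_coverage_report.md were left entirely to the
AI's discretion. The real sample (进社区默认落 tab 实验-变更分析-2026042310)
showed these problems:

1. All 8 markdown tables were scattered into lines by Feishu import (change
   groups, API model, forum module, 4.1 direct impact, 4.2 indirect impact,
   confidence, PR Test plan, coverage summary); this is the highest table
   density of the 4 reports
2. Heavy use of ASCII [OK]/[X]/[!] (the result of Feishu import's automatic
   conversion)
3. The ★/★★/★★★ risk markers are inconsistent with the cross-workflow
   [阻断]/[高]/[中]/[低] levels
4. No global "analysis readiness" verdict: after reading, readers cannot tell
   whether the change is low/medium/high risk
5. Section 5 "confidence notes" comes last: readers get through 9KB before
   learning that some conclusions are speculative
6. No cross-references between the docs: the code_change_analysis §4.4
   regression scope, the test_coverage_report §2 gaps, and the ca_sup TC cases
   are three isolated pieces of information
7. Vague coverage statistics: "总关键变更点约 13 个 → 约 54%" ("about 13 key
   change points in total → about 54%") does not state its basis
8. Very long code diff blocks (Swift source) sit inside a natural-language
   report; they are redundant from the PM/QA viewpoint, but Dev needs them

The new TEMPLATES.md enforces the structure of 2 reports:

[code_change_analysis.md] 6 sections
- §0 Change assessment: change risk [低]/[中]/[高] + ratio of evidenced vs
  speculative conclusions
- §1 Change group overview (H3 + bullets, no tables)
- §2 Core change deep dive (each conclusion tagged [实证]/[推测] at line end)
  * Code diff blocks must be preceded by a quote note 「以下仅 Dev 可跳过」
- §3 Call chain analysis (code blocks keep the ASCII tree)
- §4 Impact assessment (4.1/4.2 use bullets; the 4.4 regression scope must
  carry the cross-doc reference `→ 详见 [测试覆盖评估] 缺口 N`)
- §5 Confidence index (evidenced vs speculative ratio + section list; it is an
  index and does not repeat conclusions)

[test_coverage_report.md] 5 sections
- §0 Coverage verdict: test coverage [已达标]/[有缺口]/[严重不足] + exact
  total number of change points ("约 N 个" is forbidden) + coverage target
  line ≥ 80%
- §1 PR Test Plan coverage (bullet list)
- §2 Coverage gap analysis (each gap carries the cross-reference
  `→ 详见 [代码变更分析] §X` plus the back-reference `→ ca_sup TC-XX`)
- §3 Coverage summary (covered / not covered / to be assessed as three lists,
  ending with the exact formula)
- §4 High-priority additional test suggestions (bidirectional references to
  gaps + TC-XX)

Key constraints ("关键约束" subsection):
- No markdown tables (Feishu import scatters them into lines)
- No emoji ✅/⚠️/❌ (Feishu import destroys content)
- No ASCII [OK]/[!]/[X] (semantic drift across workflows)
- No ★/★★/★★★ (inconsistent with the cross-workflow level markers)
- Code diff blocks must be preceded by a quote note
- Multi-doc reports must cross-reference each other in both directions
- Confidence is tagged per item with [实证]/[推测]
- Coverage statistics must be exact; "约 N 个" is forbidden

Also updated:
- PHASES.md Phase 4 / Phase 5 reference the new TEMPLATES and list the section
  structure
- The 7-phase table of the Story scenario in SKILL.md adds a TEMPLATES link +
  a note on Feishu multi-doc

Verification (mock report, 35 blocks):
- 0 emoji residue / 0 ASCII residue / 0 tables triggered / 0 star residue
- Code blocks + quote blocks render correctly
- Bold marks preserved (§0 metadata, change-point titles, §5 ratios)
- Chinese square-bracket markers [高]/[中]/[实证]/[推测] all preserved verbatim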
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jun-Hex changed the title from "feat(test): improve Feishu report readability (requirement-review + test-case-review)" to "feat(test): improve Feishu report readability (requirement-review + test-case-review + change-analysis)" Apr 29, 2026
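The smoke-test workflow reuses the smoke-test mode of the
requirement-traceability skill, but PHASES.md previously constrained only the
structured smoke_test_report.json output, not the human-readable report.md that
is finally uploaded to Feishu Docs.

Problems exposed by the real sample (插件升级提示-冒烟测试报告-2026040710):

1. All 5 markdown tables were scattered into lines by Feishu import (basic
   info + metrics + P0 case assessment + requirement coverage matrix + code
   change traceability); tied with ca for the most
2. The decorative emoji section prefixes 📋/🐛/📊/💡 are auto-converted by
   Feishu to the ASCII forms [Doc]/[Bug]/[Chart]/[Tip]
3. The ⭕ "out of scope" emoji is also corrupted
4. Heavy use of ASCII [OK]/[!]
5. The smoke verdict is not on the first line (readers must get through the
   basic-info table first)
6. Defect code blocks are too long (Kotlin source) and carry no "Dev only"
   skip note
7. Confidence values are scattered across requirement points (87%/75%/85%)
   with no overall average
8. Weak linkage between defects and cases (the case linked to DEF-01 is not
   among the 6 P0 cases assessed)
9. "未追溯变更:0 个(无范围蔓延)" ("untraced changes: 0 (no scope creep)") is
   vague; PMs do not understand "scope creep"
10. The "本报告由 QA AI 助手...自动生成" signature weakens credibility

New TEMPLATES.md section 「report.md(smoke-test 模式专用)」, enforcing 6
sections:
- §0 Smoke test verdict (first line `判定:[通过]/[不通过]` + overall average
  confidence)
- §1 Core metrics (bullet list, no tables)
- §2 P0 case assessment (the title must state numerator and denominator as
  `共 N/总数`)
  * Bold case ID + bullet `· ` prefix (following the test-case-review §6
    pattern)
- §3 Dual-channel traceability results (§3.1 requirement coverage matrix +
  §3.2 code change traceability)
  * Untraced changes at the end of §3.2 must be stated explicitly (the vague
    "无范围蔓延" wording is forbidden)
- §4 Defect list (each DEF's code block must be preceded by the quote note
  「以下代码块仅 Dev 排查时阅读」)
- §5 Other observations (categorized with short bold labels)

Key constraints ("关键约束" subsection):
- No markdown tables (Feishu scatters them into lines)
- No emoji ⭕/✅/⚠️/❌ (⭕ turns into garbage, ❌ triggers bitable)
- No ASCII [OK]/[!]/[X] (semantic drift across workflows)
- No decorative emoji section prefixes 📋/🐛/📊/💡 (Feishu auto-converts them
  to ASCII)
- No 「QA AI 助手」 signature
- Code blocks must be preceded by a quote note
- The §2 title must state numerator and denominator (共 N/总数)
- §3.2 untraced changes must be stated explicitly
- Confidence is tagged per item and summarized in §0

PHASES.md 5S.2 updated to match: a new step 3 "write report.md" is added
between the defect_list.json and smoke_test_report.json artifacts, referencing
the 6-section structure defined in TEMPLATES.

Verification (mock report, 53 blocks):
- 0 emoji residue / 0 ASCII residue / 0 tables triggered / 0 decorative
  section prefixes
- Bold marks preserved (§0 metadata / defect field names / §5 categories)
- Chinese square-bracket markers [通过]/[待定]/[已覆盖]/[范围外]/[P0]/[推测]
  all preserved verbatim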
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jun-Hex changed the title from "feat(test): improve Feishu report readability (requirement-review + test-case-review + change-analysis)" to "feat(test): improve Feishu report readability across the 4 main business workflows (rr/tcr/ca/smoke)" Apr 29, 2026
github-actions bot added the size:xl label and removed the size:l label Apr 29, 2026
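The PHASES.md 5.2 chat output wording '单 Agent 串行' ("single agent, serial")
did not match the schema enum ['multi_perspective', 'single_agent']. When
writing rr_summary.json, the AI literally translated '串行' to 'serial' and
produced single_agent_serial, which violated contracts/rr-summary.schema.json
validation (non-blocking, but it pollutes output_data._schema_violations).

Fix:
1. Change the 5.2 wording from '单 Agent 串行' to '单 Agent' (aligned with the
   enum)
2. Add a review_mode enum note to the 5.1.5 constraint list, plus a blacklist
   of the single_agent_serial/serial variants, written as strictly as the
   verdict rule

Observed case: ai-case session 248 hit this violation on completion; the
verdict and the 49 issues were fine, and only the review_mode field was
polluted.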
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…isions resolved

A systematic review covering all 16 skills, 10 agents, 5 contract schemas, and shared
infrastructure (CONVENTIONS.md, CONTRACT_SPEC.md, root docs). Lifts version 0.0.8 → 0.0.17.

Highlights:

CI baseline (tests/validate.sh): 5 → 9 check categories.
- Add Check 10/11/12/13: SKILL frontmatter name vs dir, handoffs targets, subagent
  references, references/* paths (test plugin scope only)
- Add Check 14: cross-skill contract.yaml consistency via validate_contracts.py +
  known-collisions.yaml allowlist (primary+fallback / shared-helper patterns)
- Extend tests/check-schemas.sh: 1 → 5 schemas (testcase / ca-summary / defect-list
  / rr-summary / smoke-test-report) with ~30 negative samples
- Fix sandbox false-green: new_tmpdir hard-fails on mktemp denial

Zombie agent cleanup: 3 deleted, ~13 dead refs cleaned.
- agents/requirement-traceability/forward-tracer.md (DEPRECATED, prompt inlined to
  PHASES 3.2.4)
- agents/requirement-traceability/reverse-tracer.md (already 'main agent inline' per
  PHASES 3.3, zero Task calls)
- agents/failure-classifier.md (zero refs in test-failure-analyzer/{SKILL,PHASES})

Contract collision resolution (8 detected → 4 renamed + 4 allowlisted):
- risk_assessment.json: change-analysis renamed to bug_risk_assessment.json
- test_execution_report.json: split into unit_test_execution_report.json /
  integration_test_execution_report.json
- supplementary_cases.json (3-way): change-analysis → change_supplementary_cases.json,
  test-case-review → review_supplementary_cases.json, test-case-generation keeps
  canonical (internal scratch)
- ui_fidelity_report.json + 4 writeback files + requirement_points.md: allowlisted as
  primary+fallback or shared-helper outputs

Dependency honesty:
- shared-tools/requirements.txt: corrected from 'stdlib only' to list pycryptodome /
  pyyaml (required) + jsonschema / python-dotenv (optional)
- feedback/requirements.txt: new, declares aiohttp (required by feishu_api.py)
- feedback/contract.yaml: env_vars now includes FEISHU_PLUGIN_ID/SECRET/USER_KEY

Helper UX consistency: 5 scripts now support -h/--help cleanly (search_mrs/prs,
gitlab/github/metersphere helper).

Documentation:
- 3 root-level docs (AI_CODING_BEST_PRACTICES / PIPELINES / CONTRACT_SPEC) tagged
  with audience notes (developer references, not runtime deps)
- README directory structure: each root doc annotated with role
- CONVENTIONS L466 numbering prefix table fixed: RP- belongs to TCG/TCR understand
  (not 'TCG review')
- 2 broken anchors fixed: CONVENTIONS.md#审计输出格式 → _shared/ASSERTION_AUDIT.md
- AGENT_TEMPLATE: section-name exception extended for sub-agents (case-tracer)
- WORKFLOW_DEFS qa-lite: writeback behavior corrected from 'will fail' → 'gracefully
  skip' (matches PHASES 6.1.b actual behavior)
- integration-test-design: 5 phases extracted to PHASES.md (matches unit-test-design
  pattern)
- feedback SKILL: added Quick Start + Closing Checklist
- metersphere-sync: documented list-stages command + stdout shape
- change-analysis SKILL: added codex-change-analyzer to model layering table
- metersphere-sync deprecated confidence_threshold field removed (the helper has
  ignored it since v0.0.16; safe to drop)

CI: 79 PASS / 0 FAIL, EXIT=0.
github-actions bot added the scope:tests and maintainer-review labels Apr 30, 2026
Jun-Hex changed the title from "feat(test): improve Feishu report readability across the 4 main business workflows (rr/tcr/ca/smoke)" to "chore(test): Feishu report readability + full skill review + CI baseline hardening (v0.0.8 → 0.0.17)" Apr 30, 2026
github-actions bot added the kind:chore label and removed the kind:feature label Apr 30, 2026
Jun-Hex and others added 3 commits April 30, 2026 18:40
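…reakdown positive/negative examples (v0.0.18)

ca-summary.schema.json:
- Remove the scenario field (no backend and no downstream skill consumes it)
- risk_count description adds an anti-confusion note (it is an integer total; use risk_breakdown for buckets)
- risk_breakdown description adds an example

rr-summary.schema.json:
- Remove the role_breakdown field (not shown by the frontend and not read downstream)
- Remove the review_mode field (no consumer)

change-analysis/PHASES.md:
- ca_summary example reorders fields: risk_breakdown first, risk_count after, so the AI does not put the dict into risk_count while reading and writing (confusion observed in 304)
- Add a positive/negative example block making clear that the bucket dict belongs in risk_breakdown, not risk_count
- Remove the scenario-related constraint lines

requirement-review/PHASES.md:
- rr_summary example drops role_breakdown / review_mode
- Constraint section updated to match

contract.yaml:
- ca/rr descriptions drop the removed fields

tests/check-schemas.sh:
- Remove the valid samples and negative samples for scenario / role_breakdown / review_mode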
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to v0.0.10 contract renames (risk_assessment.json,
supplementary_cases.json, test_execution_report.json) — propagate the new
names through PIPELINES.md, README.md, and two TEMPLATES.md files that
still pointed at the old canonical names. Also fix a broken anchor in
requirement-traceability/PHASES.md and a self-contradicting description
in change-analysis/contract.yaml.

- requirement-traceability/PHASES.md: anchor #coverage_reportjson → #traceability_coverage_reportjson
- change-analysis/contract.yaml: description aligned with outputs (change_supplementary_cases.json)
- PIPELINES.md: 8 stale refs updated; work_dir layout split risk/supplementary into per-skill entries
- README.md: 7 stale refs updated across scenario tables and link diagrams
- test-case-review/TEMPLATES.md: §7 references review_supplementary_cases.json
- change-analysis/TEMPLATES.md: §0 + §5 cross-doc note references change_supplementary_cases.json
Reset versions from rolling patch bumps (0.0.18 → 0.0.19 over the
branch lifetime) down to production+1: marketplace 0.1.40 → 0.1.41,
test 0.0.8 → 0.0.9. Updated CHANGELOG.md, plugins/test/README.md
version history, and root README.md / README.en.md plugin tables.
github-actions bot added and removed the scope:docs label May 6, 2026
@Jun-Hex requested a review from wangbax on May 6, 2026, 02:57
…(v0.0.11)

- Bind testcase.schema.json on *_cases.json outputs in tcg/tcr/ca contract.yaml
- Add demand-driven vs change-driven distinction in tcg/ca SKILL descriptions + cross-reference SKIPs
- Mark review_result.json as non-strict JSON (verdict object, not TestCaseList)
- Sync testcase.schema.json from ai-case Pydantic: Step.expected now required (allows empty string)
- Relax CONVENTIONS.md test_method to optional (Pydantic was already Optional; doc was lagging)
- Move AGENT_TEMPLATE.md from agents/ to docs/ (avoid loader picking it up as agent)
- Bump test 0.0.9 -> 0.0.11, marketplace metadata 0.1.41 -> 0.1.43
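
The "required, allows empty string" rule for Step.expected can be sketched in plain Python. This is an illustration under stated assumptions, not the ai-case Pydantic model or the real schema: the helper `validate_step` and the `action` field are hypothetical.

```python
def validate_step(step: dict) -> None:
    """Raise ValueError unless 'expected' is present and a string (hypothetical helper)."""
    if "expected" not in step:
        raise ValueError("step is missing required field 'expected'")
    if not isinstance(step["expected"], str):
        raise ValueError("'expected' must be a string (empty string is allowed)")


# Present but empty: accepted.
validate_step({"action": "open login page", "expected": ""})

# Missing entirely: rejected.
try:
    validate_step({"action": "open login page"})
except ValueError as err:
    print(err)
```

The distinction is between a key that is absent and a key whose value is empty: only the former is rejected.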

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…0.0.12)

- 5-A: Cross-reference known-collisions.yaml from PHASES.md & PIPELINES.md
       to clarify trace Phase 6 writeback vs ms-sync mode=execute share the
       same helper but are mutually exclusive entry points
- 5-C: qa-workflow accepts re_entry_phase + requirement_change_summary;
       passes through to test-case-generation for "rerun after requirement
       change" scenarios
- 8-A: README selection guide adds explicit demand-driven (tcg) vs
       change-driven (ca) rows to match D2-Q3 boundary decision
- 8-B: README architecture features section drops stale v0.0.10+ version
       gates (current is 0.0.12)
- 8-C: README directory tree moves AGENT_TEMPLATE.md from agents/ to docs/
       (matches the file move done in 0.0.10)
- Bump test 0.0.11 -> 0.0.12, marketplace metadata 0.1.43 -> 0.1.44

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…clarify sample artifacts (v0.0.13)

- 7-C: Fix _shared/AGENT_PROTOCOL.md broken link to AGENT_TEMPLATE.md
       (was agents/AGENT_TEMPLATE.md, now ../../docs/AGENT_TEMPLATE.md);
       remove AGENT_TEMPLATE from agents/ tree diagram; add note that
       template lives outside agents/ to avoid loader pickup
- 7-B: Add quickstart line in requirement-clarification SKILL.md noting
       output/*.json files are pre-shipped format samples, not runtime
       artifacts (avoid confusion for first-time users)
- 7-A: Keep AI_CODING_BEST_PRACTICES.md (developer onboarding value); no change
- Bump test 0.0.12 -> 0.0.13, marketplace metadata 0.1.44 -> 0.1.45

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… was wrong (v0.0.14)

Reverts the move from v0.0.10 (D1-Q2 decision). The original justification
was that AGENT_TEMPLATE.md would be picked up by Claude/Codex agent loader
and pollute the agent picker. Verified with claude-code-guide that this is
incorrect: Claude Code does NOT auto-scan plugins/<name>/agents/ for
subagents — they must be explicitly declared. Additionally, all 9 .md
files under plugins/test/agents/ have NO YAML frontmatter, so they would
be skipped by the loader even if scanning happened. They are prompt
templates loaded via Task tool calls inside skills, not subagents.
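
The "no YAML frontmatter" observation can be spot-checked with a short script. This is a minimal sketch for verifying the claim, not the loader's own logic; the helper name `has_yaml_frontmatter` is hypothetical, and it treats a file as having frontmatter only if its first line is `---` and a later line closes the block with `---`.

```python
def has_yaml_frontmatter(markdown_text: str) -> bool:
    """Return True if the text opens with a '---' ... '---' block (hypothetical helper)."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    # A closing delimiter must appear somewhere after the first line.
    return any(line.strip() == "---" for line in lines[1:])


with_frontmatter = "---\nname: example\n---\n# Body\n"
prompt_template = "# Agent prompt template\n\nPlain markdown, no metadata block.\n"

print(has_yaml_frontmatter(with_frontmatter))
print(has_yaml_frontmatter(prompt_template))
```

Running such a check over each .md file under plugins/test/agents/ would be one way to confirm the count stated above.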

- mv plugins/test/docs/AGENT_TEMPLATE.md back to plugins/test/agents/
- Remove empty plugins/test/docs/ directory
- AGENT_PROTOCOL.md: tree restored, link reverted, add inline note
  explaining why loader is not a concern
- README.md: directory tree updated to reflect agents/ location and
  add inline note about loader behavior
- Bump test 0.0.13 -> 0.0.14, marketplace metadata 0.1.45 -> 0.1.46

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reset versions back to production+1 baseline (test 0.0.9 / marketplace
0.1.41) after consolidating 4 in-flight commits (836cb61/3564b01/bb3381b/
b3c4f1a) into a single release-prep entry. File contents from those
commits are preserved; only version markers + CHANGELOG + plugin README
are updated.

CHANGELOG 0.1.41 entry extended with "Skill audit phase 1-3 follow-ups"
section covering: schema bindings on *_cases.json, demand/change-driven
gating for tcg/ca, review_result.json non-strict JSON note, ai-case
Pydantic Step.expected required sync, CONVENTIONS.md test_method
optional, known-collisions cross-references, qa-workflow re_entry_phase
support, README selection guide updates, output sample annotations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Jun-Hex removed the request for review from wangbax on May 8, 2026, 02:29
@Jun-Hex changed the title from "chore(test): Feishu report readability optimization + full skill review + CI baseline hardening (v0.0.8 → 0.0.17)" to "chore(test): Feishu report readability + skill audit + CI hardening" on May 8, 2026
@Jun-Hex merged commit 44d5635 into main on May 8, 2026
6 checks passed
@Jun-Hex deleted the feat/rr-report-readability branch on May 8, 2026, 02:48
Labels

kind:chore Maintenance or configuration change. maintainer-review High-risk change that should be reviewed by a repository maintainer. scope:plugins Changes one or more plugins. scope:tests Changes tests or validation coverage. size:xl Very large PR, expect slower review.
