chore(test): Feishu report readability + skill audit + CI hardening #59
Merged
Conversation
An item-by-item inspection based on a real ai-case sample (实体商品一期-需求评审-2026032500) found
10 issues in the report.md template that hurt readability. This change rebuilds the
TEMPLATES.md report.md template and the PHASES.md 5.1 description around the Feishu cloud-doc scenario.
The 10 fixes:
1. Remove the duplicated H1 title — the Feishu doc name already contains the words 「需求评审」, so having
   the AI also write `# {需求名称} - 需求评审报告` produces two nearly identical title lines
2. Broken feature-list format — the old template mixed a half-list `F1 | 模块 | 描述` with grouping, and
   on render the group header merged into one line with the first feature; changed to `### {模块}` H3 groups
   + `- F1: 描述` list items
3. (abridged) Add a header to the review statistics — the review_checklist.md template already has one;
   on the report side we only make explicit that bare numeric fields are no longer embedded
4. Replace the ASCII status symbols [OK]/[!]/[X] everywhere with emoji ✅/⚠️/❌; a new "key constraints"
   note explicitly forbids falling back to ASCII
5. Delete the whole §7 "review completeness statement" — 90% of it duplicates the §1 metadata and the
   counts at the end of §3, and AI self-assessments like "X% blocking items is within a reasonable range"
   are worthless to readers
6. The 46-item feature list is too long — force it into a `<details><summary>` wrapper collapsed by
   default; after import, Feishu cloud docs render it as an expandable block
7. Internal dimension codes such as FB2/LC2 + UX3 — all removed; keep only the human-readable dimension
   names 「功能边界」「逻辑一致性」
8. New §4 "time-box checklist" — reorganizes the to-dos from §3 + §5 by "before the review meeting / in
   the meeting / before dev / before test handoff", using reference numbers + one-line actions instead of
   repeating the original issue text
9. Metadata adds `分析方式:AI 辅助 + 评审会决策`, stressing that conclusions are finalized by humans in
   the review meeting; no AI signatures like 「评审人:QA AI 助手」 (avoids enterprise users questioning
   the credibility of the conclusions)
10. Section renumbering: 1 review verdict / 2 requirement understanding / 3 per-role issues / 4 time-box
    checklist / 5 risks / 6 review scope statement (the old version jumped 1/2/3/4/5/7)
The next requirement-review session will automatically generate from the new template. Expected:
- Report size drops from 8.2KB to roughly 3-4KB (collapsed feature list + removal of §7)
- Key to-dos are visible at a glance in §4, instead of being pieced together from the PM/Dev/QA sections of §3
- The Feishu doc title and body no longer repeat each other
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first version of PR #59 introduced emoji ✅/⚠️/❌ and the HTML <details><summary> collapse block on the assumption that they "look better". Uploading a mock report to Feishu docx today and verifying via the doc_blocks API showed:
1. ❌ triggers Feishu's bitable block parsing logic; the whole line gets swallowed
2. ✅ is auto-converted to [OK], ⚠️ is auto-converted to [!]
3. The <details>/<summary> HTML tags are ignored entirely; the summary content instead becomes a misleading plain-text "click to expand" line (users cannot actually click it)
Five immediate fixes:
1. Unify status symbols as Chinese square-bracket markers (preserved verbatim by Feishu import):
   - readiness: [通过] / [有条件] / [不通过] (replacing ✅/⚠️/❌)
   - information boundary: [已获取] / [未获取] / [不完整] (replacing ✅/❌/⚠️)
   - the intermediate artifact review_checklist.md keeps its 4 markers [已确认]/[需关注]/[待确认]/[不适用]; a new "symbol conventions" paragraph at the top of TEMPLATES explicitly forbids mixing the sets
2. Remove the <details> collapse constraint; list the feature points in full, grouped per module with H4 `#### {模块名}` (H4 has lower visual weight than H3 and does not disturb the outline)
3. Key constraints gain explicit bans on emoji and details, with the tested reasons attached
Also fixed 4 structural issues found during this review (A1/A2/A4/F2):
4. The TEMPLATES.md table of contents said "7-section structure" but there are actually 6; corrected
5. The review_checklist.md template had the section number ## 三、 appear twice (standard QA checklist + doc-vs-design cross-comparison); renumbered to 三/四/五
6. The SKILL.md intent-recognition table adds routing for smoke-test / test-case-generation, so a user saying "run a smoke test" is no longer misrouted
7. SKILL.md output-phase chat wording aligned with PHASES.md 5.2: chat gives a metrics summary + the Feishu link; the detailed template lives in PHASES.md
8. The TEMPLATES.md section 三、standard QA checklist header now explicitly references the concrete check items in CHECKLIST.md, so the AI does not have to reinvent them
Verified:
- A mock report generated from the latest template uploads to Feishu with all 58 blocks rendering correctly
- 0 emoji residue, 0 ASCII [OK] residue, 0 HTML tag residue
- H2/H3/H4 heading hierarchy is correct; bullet lists and the §4 time-box structure are intact
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
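A pre-upload guard in the spirit of these bans could scan a rendered report for the forbidden tokens before handing it to the uploader. A minimal sketch — the function name and token lists are illustrative, derived from the constraints above, not actual ai-case code:

```python
# Tokens the Feishu docx import corrupts or swallows (per the findings above).
FORBIDDEN_TOKENS = [
    "✅", "⚠️", "❌",            # emoji: auto-converted, ❌ triggers bitable parsing
    "[OK]", "[!]", "[X]",        # ASCII fallbacks: banned for cross-workflow drift
    "<details>", "<summary>",    # HTML collapse: ignored, summary becomes plain text
]

def lint_report(markdown: str) -> list[str]:
    """Return the forbidden tokens present in a report; empty list means clean."""
    return [tok for tok in FORBIDDEN_TOKENS if tok in markdown]

# Chinese square-bracket markers survive Feishu import verbatim:
assert lint_report("就绪度:[通过]\n- F1: 描述") == []
assert lint_report("就绪度: ✅ 通过") == ["✅"]
```

A check like this would run before every upload, so a template regression fails loudly instead of silently producing a corrupted Feishu doc.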
The test-case-review skill previously had no TEMPLATES.md; the review_summary.md produced in the summary phase was left entirely to the AI's improvisation. A real sample (分包上传 DC 开放入口-用例评审-2026-03-04) showed these problems:
1. §1 review overview, §2.1 coverage mapping, §3.2 missing scenarios, and §4.1 / §7 all used markdown tables; after Feishu import they split into discrete text/bullet blocks and read as prose, not tables (5 spots ≈ 31 scattered lines in the sample)
2. Heavy use of emoji ✅/⚠️; Feishu import auto-converts them to [OK]/[!], and ❌ additionally triggers bitable block parsing (see the PR #59 D1 verification)
3. No "test-case readiness verdict" — after reading, you still don't know whether the suite passed review
4. §4 / §5 / §6 overlap heavily: the 15 high/medium/low improvement suggestions in §6 restate §4 (6 items) + §5 (9 items) almost 1:1
5. The same case (e.g. b3769bb8) is called out in §3 / §4 / §5; a fixer has to flip back and forth to piece together what that case actually needs changed
6. The "QA AI 助手自动生成" signature weakens enterprise users' trust in the conclusions
New TEMPLATES.md enforces an 8-section structure (0-7):
- §0 review verdict: readiness [通过]/[有条件]/[不通过] + graded fix list
- §1 review overview: core metrics + coverage target line ≥ 90%
- §2 requirement coverage: covered / partially covered / uncovered listed as three groups (no tables)
- §3 scenario completeness: non-closed end-to-end flows + missing edge scenarios grouped by requirement point
- §4 case correctness / §5 case style conformance
- §6 fix to-dos (grouped by case ID): the core actionable section; cross-section issues gathered in one place, case-ID lines in bold + to-do lines with a `· ` prefix to keep the hierarchy
- §7 suggested additional cases
Key constraints ("关键约束" subsection):
- No markdown tables (Feishu import scatters them into loose lines)
- No emoji ✅/⚠️/❌ (Feishu import corrupts the content)
- §6 must not restate §3/§4/§5 details (refer by section number instead)
- No AI-assistant signature (use 「分析方式:AI 辅助 + 评审人决策」)
- No H1 (the Feishu doc name already contains 「{需求名称}-用例评审-{YYYYMMDDHH}」)
- §6 to-dos keep hierarchy via "bold case ID + `· ` prefix" (Feishu flattens nested bullets, but bold + the `·` character still separate the levels visually)
Also updated:
- PHASES.md 5.4 references the new TEMPLATES and lists the 8-section structure
- The SKILL.md intermediate-file list marks review_summary.md as the final Feishu deliverable
Verified (mock report, 88 blocks):
- 0 emoji residue / 0 ASCII residue / 0 table triggers / 0 bitable triggers
- §6 case-ID lines carry the [BOLD] marker; to-do lines have no bold + a `·` prefix
- All Chinese square-bracket markers ([有条件]/[已覆盖]/[P0]/[完整性] etc.) preserved verbatim
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The change-analysis skill previously had no TEMPLATES.md; code_change_analysis.md and test_coverage_report.md were left entirely to the AI's improvisation. A real sample (进社区默认落 tab 实验-变更分析-2026042310) showed these problems:
1. All 8 markdown tables triggered Feishu-import line scattering (change groups, API model, forum module, 4.1 direct impact, 4.2 indirect impact, confidence, PR Test plan, coverage summary) — the highest table density of the 4 reports
2. Heavy use of ASCII [OK]/[X]/[!] (the auto-conversion output of Feishu import)
3. ★/★★/★★★ risk marks inconsistent with the cross-workflow [阻断]/[高]/[中]/[低] grading
4. No global "analysis readiness" verdict — you finish reading without knowing whether the change is low/medium/high risk
5. Section 5 "confidence notes" sits at the very end — only after 9KB do you learn some conclusions are speculation
6. No cross-references between the multi-doc outputs — code_change_analysis §4.4 regression scope, test_coverage_report §2 gaps, and the ca_sup TC cases are three isolated islands
7. Vague coverage stats — "total key change points ≈ 13 → ≈ 54%" with no stated basis
8. Extremely long code diff blocks (Swift source) embedded in the natural-language report: redundant from the PM/QA viewpoint, yet needed by Dev
New TEMPLATES.md enforces two report structures:
【code_change_analysis.md】6 sections
- §0 change assessment: change risk [低]/[中]/[高] + evidence-vs-speculation ratio
- §1 change group overview (H3 + bullets, no tables)
- §2 core change deep dive (every conclusion line ends with [实证]/[推测])
  * each code diff block must be preceded by the quote hint 「以下仅 Dev 可跳过」
- §3 call-chain analysis (code blocks keep the ASCII tree)
- §4 impact assessment (4.1/4.2 as bullets; 4.4 regression scope must cross-reference → 详见 [测试覆盖评估] 缺口 N)
- §5 confidence index (evidence-vs-speculation ratio + section list; an index, not a restatement of conclusions)
【test_coverage_report.md】5 sections
- §0 coverage verdict: test coverage [已达标]/[有缺口]/[严重不足] + an exact total change-point count (no "about N") + coverage target line ≥ 80%
- §1 PR Test Plan coverage (bullet list)
- §2 coverage gap analysis (each gap → 详见 [代码变更分析] §X cross-reference + → ca_sup TC-XX back-reference)
- §3 coverage summary (covered / uncovered / to-assess as three lists, exact formula at the end)
- §4 high-priority supplementary test suggestions (bidirectional references between gaps and TC-XX)
Key constraints ("关键约束" subsection):
- No markdown tables (Feishu import scatters them)
- No emoji ✅/⚠️/❌ (Feishu import corrupts content)
- No ASCII [OK]/[!]/[X] (semantic drift across workflows)
- No ★/★★/★★★ (inconsistent with cross-workflow grading symbols)
- Code diff blocks must be preceded by a quote hint
- Multi-doc outputs must reference each other in both directions
- Confidence tagged per line as [实证]/[推测]
- Coverage stats must be exact; "about N" is forbidden
Also updated:
- PHASES.md phase 4 / phase 5 reference the new TEMPLATES and list the section structures
- The SKILL.md Story-scenario 7-phase table adds TEMPLATES links + a Feishu multi-doc note
Verified (mock report, 35 blocks):
- 0 emoji residue / 0 ASCII residue / 0 table triggers / 0 star residue
- code blocks + quote blocks render correctly
- bold markers preserved (§0 metadata, change-point titles, §5 ratio)
- Chinese square brackets [高]/[中]/[实证]/[推测] all preserved verbatim
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The smoke-test workflow reuses the requirement-traceability skill's smoke-test mode, but PHASES.md previously only constrained the structured smoke_test_report.json output, not the human-readable report.md uploaded to Feishu cloud docs. Problems exposed by a real sample (插件升级提示-冒烟测试报告-2026040710):
1. All 5 markdown tables triggered Feishu-import line scattering (basic info + metrics + P0 case assessment + requirement coverage matrix + code change traceability) — tied with ca for the most
2. Decorative emoji section prefixes 📋/🐛/📊/💡 auto-converted by Feishu to the ASCII forms [Doc]/[Bug]/[Chart]/[Tip]
3. The out-of-scope emoji ⭕ also corrupted
4. Heavy use of ASCII [OK]/[!]
5. The smoke verdict is not on the first line (you must read through the basic-info table first)
6. Over-long defect code blocks (Kotlin source) with no Dev-skip hint
7. Confidence scattered per requirement point (87%/75%/85%) with no overall mean
8. Weak linkage between defects and cases (the case linked to DEF-01 is not among the 6 P0-assessed cases)
9. "Untraced changes: 0 (no scope creep)" is vague; PMs don't know what "scope creep" means
10. The "本报告由 QA AI 助手...自动生成" signature weakens trust
New TEMPLATES.md section 「report.md (smoke-test mode only)」 with a 6-section structure:
- §0 smoke-test verdict (first line `判定:[通过]/[不通过]` + overall confidence mean)
- §1 core metrics (bullet list, no tables)
- §2 P0 case assessment (heading must state `共 N/总数`, numerator and denominator explicit)
  * bold case ID + bullet `· ` prefix (following the test-case-review §6 pattern)
- §3 dual-channel traceability verdict (§3.1 requirement coverage matrix + §3.2 code change traceability)
  * untraced changes at the end of §3.2 must be stated explicitly (no vague "no scope creep" wording)
- §4 defect list (each DEF's code block preceded by the quote hint 「以下代码块仅 Dev 排查时阅读」)
- §5 other observations (bold short-word categories)
Key constraints ("关键约束" subsection):
- No markdown tables (Feishu scatters them)
- No emoji ⭕/✅/⚠️/❌ (⭕ turns into mojibake, ❌ triggers bitable)
- No ASCII [OK]/[!]/[X] (semantic drift across workflows)
- No decorative emoji section prefixes 📋/🐛/📊/💡 (Feishu auto-converts them to ASCII)
- No 「QA AI 助手」 signature
- Code blocks must be preceded by a quote hint
- The §2 heading must state numerator and denominator (共 N/总数)
- §3.2 untraced changes must be stated explicitly
- Confidence tagged per item and aggregated into §0
Synced PHASES.md 5S.2: between the defect_list.json and smoke_test_report.json artifacts, add a third step "write report.md" referencing the 6-section structure in TEMPLATES.
Verified (mock report, 53 blocks):
- 0 emoji residue / 0 ASCII residue / 0 table triggers / 0 decorative section prefixes
- bold markers preserved (§0 metadata / defect field names / §5 categories)
- Chinese square brackets [通过]/[待定]/[已覆盖]/[范围外]/[P0]/[推测] all preserved verbatim
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PHASES.md 5.2 chat output wording '单 Agent 串行' (single-agent serial) is inconsistent with the schema enum ['multi_perspective', 'single_agent']. When writing rr_summary.json the AI literally translated '串行' → 'serial' and produced single_agent_serial, violating contracts/rr-summary.schema.json validation (non-blocking, but it pollutes output_data._schema_violations).
Fix:
1. The 5.2 wording changes from '单 Agent 串行' to '单 Agent', aligning with the enum
2. The 5.1.5 constraint list gains a review_mode enum note + a blacklist of the single_agent_serial/serial variants, using the same mandatory phrasing as verdict
Observed case: ai-case session 248 triggered this violation on completion; verdict + 49 issues were all fine, only the review_mode field was polluted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
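The violation is mechanical: the enum admits exactly two values, and anything else is collected rather than raised (which is why the run completed with a polluted field). A stdlib-only sketch of that non-blocking check — the field subset is invented for illustration; the real check runs against contracts/rr-summary.schema.json:

```python
# Enum from the schema cited above; any other value is a violation.
REVIEW_MODE_ENUM = ("multi_perspective", "single_agent")

def schema_violations(summary: dict) -> list[str]:
    """Non-blocking validation: collect violations instead of raising."""
    problems = []
    mode = summary.get("review_mode")
    if mode is not None and mode not in REVIEW_MODE_ENUM:
        problems.append(f"review_mode: {mode!r} not in {REVIEW_MODE_ENUM}")
    return problems

assert schema_violations({"review_mode": "single_agent"}) == []
# The literal translation '串行' -> 'serial' produced this polluted value:
assert schema_violations({"review_mode": "single_agent_serial"}) != []
```

Because violations are logged into output_data._schema_violations rather than aborting the session, a doc/enum mismatch like this one survives until someone reads the log — hence the blacklist added to the prompt constraints.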
…isions resolved
A systematic review covering all 16 skills, 10 agents, 5 contract schemas, and shared
infrastructure (CONVENTIONS.md, CONTRACT_SPEC.md, root docs). Lifts version 0.0.8 → 0.0.17.
Highlights:
CI baseline (tests/validate.sh): 5 → 9 check categories.
- Add Check 10/11/12/13: SKILL frontmatter name vs dir, handoffs targets, subagent
references, references/* paths (test plugin scope only)
- Add Check 14: cross-skill contract.yaml consistency via validate_contracts.py +
known-collisions.yaml allowlist (primary+fallback / shared-helper patterns)
- Extend tests/check-schemas.sh: 1 → 5 schemas (testcase / ca-summary / defect-list
/ rr-summary / smoke-test-report) with ~30 negative samples
- Fix sandbox false-green: new_tmpdir hard-fails on mktemp denial
Zombie agent cleanup: 3 deleted, ~13 dead refs cleaned.
- agents/requirement-traceability/forward-tracer.md (DEPRECATED, prompt inlined to
PHASES 3.2.4)
- agents/requirement-traceability/reverse-tracer.md (already 'main agent inline' per
PHASES 3.3, zero Task calls)
- agents/failure-classifier.md (zero refs in test-failure-analyzer/{SKILL,PHASES})
Contract collision resolution (8 detected → 4 renamed + 4 allowlisted):
- risk_assessment.json: change-analysis renamed to bug_risk_assessment.json
- test_execution_report.json: split into unit_test_execution_report.json /
integration_test_execution_report.json
- supplementary_cases.json (3-way): change-analysis → change_supplementary_cases.json,
test-case-review → review_supplementary_cases.json, test-case-generation keeps
canonical (internal scratch)
- ui_fidelity_report.json + 4 writeback files + requirement_points.md: allowlisted as
primary+fallback or shared-helper outputs
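The collision check underlying these renames can be approximated as: collect each contract's declared output filenames, then flag any filename produced by more than one skill unless the allowlist covers it. A simplified sketch — in-memory dicts stand in for parsing the contract.yaml files and known-collisions.yaml, and the second `ui_fidelity_report.json` producer is invented for the demo:

```python
# Hypothetical stand-ins for parsed contract.yaml outputs and the allowlist.
outputs_by_skill = {
    "change-analysis": ["change_supplementary_cases.json", "bug_risk_assessment.json"],
    "test-case-review": ["review_supplementary_cases.json"],
    "requirement-traceability": ["ui_fidelity_report.json"],
    "ui-review": ["ui_fidelity_report.json"],  # invented second producer for the demo
}
allowlist = {"ui_fidelity_report.json"}  # primary+fallback pattern: same semantics

def collisions(outputs: dict[str, list[str]], allow: set[str]) -> dict[str, list[str]]:
    """Map each non-allowlisted output filename to the skills producing it."""
    producers: dict[str, list[str]] = {}
    for skill, files in outputs.items():
        for f in files:
            producers.setdefault(f, []).append(skill)
    return {f: s for f, s in producers.items() if len(s) > 1 and f not in allow}

assert collisions(outputs_by_skill, allowlist) == {}          # allowlisted: clean
assert "ui_fidelity_report.json" in collisions(outputs_by_skill, set())
```

The renames above resolve a collision by making the filenames distinct; the allowlist resolves it by declaring the duplication intentional (same semantics, mutually exclusive producers).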
Dependency honesty:
- shared-tools/requirements.txt: corrected from 'stdlib only' to list pycryptodome /
pyyaml (required) + jsonschema / python-dotenv (optional)
- feedback/requirements.txt: new, declares aiohttp (required by feishu_api.py)
- feedback/contract.yaml: env_vars now includes FEISHU_PLUGIN_ID/SECRET/USER_KEY
Helper UX consistency: 5 scripts now support -h/--help cleanly (search_mrs/prs,
gitlab/github/metersphere helper).
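The help-flag fix boils down to a guard at the top of each helper's entry point, so `-h`/`--help` is answered locally instead of being forwarded as a query. A sketch of the pattern — the usage text and function shape here are illustrative, not the actual script code:

```python
# Illustrative usage text; the real scripts define their own.
USAGE = "usage: search_mrs.py [-h] QUERY"

def main(argv: list[str]) -> int:
    # Intercept -h/--help before any argument reaches the network path,
    # mirroring the fix: previously '--help' was sent as a search query.
    if len(argv) > 1 and argv[1] in ("-h", "--help"):
        print(USAGE)
        return 0
    query = argv[1] if len(argv) > 1 else ""
    print(f"searching for: {query}")  # the real helper would call the API here
    return 0

assert main(["search_mrs.py", "--help"]) == 0  # prints usage, no network call
assert main(["search_mrs.py", "login bug"]) == 0
```

The same guard doubles as the "unknown subcommand" handler in the multi-command helpers: print usage, exit 0.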
Documentation:
- 3 root-level docs (AI_CODING_BEST_PRACTICES / PIPELINES / CONTRACT_SPEC) tagged
with audience notes (developer references, not runtime deps)
- README directory-structure section: each root doc annotated with its role
- CONVENTIONS L466 numbering prefix table fixed: RP- belongs to TCG/TCR understand
(not 'TCG review')
- 2 broken anchors fixed: CONVENTIONS.md#审计输出格式 → _shared/ASSERTION_AUDIT.md
- AGENT_TEMPLATE: section-name exception extended for sub-agents (case-tracer)
- WORKFLOW_DEFS qa-lite: writeback behavior corrected from 'will fail' → 'gracefully
skip' (matches PHASES 6.1.b actual behavior)
- integration-test-design: 5 phases extracted to PHASES.md (matches unit-test-design
pattern)
- feedback SKILL: added Quick Start + Closing Checklist
- metersphere-sync: documented list-stages command + stdout shape
- change-analysis SKILL: added codex-change-analyzer to model layering table
- metersphere-sync: deprecated confidence_threshold field removed (helper has ignored it
  since v0.0.16; safe to drop)
CI: 79 PASS / 0 FAIL, EXIT=0.
…reakdown positive/negative examples (v0.0.18)
ca-summary.schema.json:
- Remove the scenario field (no backend, no downstream skill consumes it)
- risk_count description gains an anti-confusion note (it is the integer total; bucketed counts go in risk_breakdown)
- risk_breakdown description gains an example
rr-summary.schema.json:
- Remove the role_breakdown field (the frontend does not display it and downstream never reads it)
- Remove the review_mode field (no consumer)
change-analysis/PHASES.md:
- ca_summary example reorders fields, risk_breakdown before risk_count, so the AI does not write the dict into risk_count while reading and writing in one pass (confusion observed in session 304)
- Add a positive/negative example pair making explicit that the bucketed dict belongs in risk_breakdown, not risk_count
- Remove the scenario-related constraint line
requirement-review/PHASES.md:
- rr_summary example drops role_breakdown / review_mode
- Constraint section updated accordingly
contract.yaml:
- ca/rr descriptions drop the removed fields
tests/check-schemas.sh:
- Remove the valid and negative samples for scenario / role_breakdown / review_mode
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
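The session-304 confusion (the bucketed dict landing in risk_count) comes down to a pair like the following; a hedged sketch of the intended shapes, with invented values and a made-up check function mirroring the anti-confusion note:

```python
# Correct: risk_count is the integer total; risk_breakdown holds the buckets.
good = {
    "risk_count": 7,
    "risk_breakdown": {"高": 2, "中": 3, "低": 2},
}

# Wrong (the observed confusion): the bucket dict written into risk_count.
bad = {
    "risk_count": {"高": 2, "中": 3, "低": 2},
}

def check(summary: dict) -> bool:
    """risk_count must be an int; risk_breakdown, when present, a dict."""
    return isinstance(summary.get("risk_count"), int) and isinstance(
        summary.get("risk_breakdown", {}), dict
    )

assert check(good) is True
assert check(bad) is False
```

Reordering the example (risk_breakdown before risk_count) plus this explicit positive/negative pair is what keeps a model reading the schema top-to-bottom from conflating the two fields.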
Follow-up to the v0.0.10 contract renames (risk_assessment.json, supplementary_cases.json, test_execution_report.json) — propagate the new names through PIPELINES.md, README.md, and two TEMPLATES.md files that still pointed at the old canonical names. Also fix a broken anchor in requirement-traceability/PHASES.md and a self-contradicting description in change-analysis/contract.yaml.
- requirement-traceability/PHASES.md: anchor #coverage_reportjson → #traceability_coverage_reportjson
- change-analysis/contract.yaml: description aligned with outputs (change_supplementary_cases.json)
- PIPELINES.md: 8 stale refs updated; work_dir layout split risk/supplementary into per-skill entries
- README.md: 7 stale refs updated across scenario tables and link diagrams
- test-case-review/TEMPLATES.md: §7 references review_supplementary_cases.json
- change-analysis/TEMPLATES.md: §0 + §5 cross-doc note references change_supplementary_cases.json
Reset versions from rolling patch bumps (0.0.18 → 0.0.19 over the branch lifetime) down to production+1: marketplace 0.1.40 → 0.1.41, test 0.0.8 → 0.0.9. Updated CHANGELOG.md, plugins/test/README.md version history, and root README.md / README.en.md plugin tables.
…(v0.0.11)
- Bind testcase.schema.json on *_cases.json outputs in tcg/tcr/ca contract.yaml
- Add demand-driven vs change-driven distinction in tcg/ca SKILL descriptions + cross-reference SKIPs
- Mark review_result.json as non-strict JSON (verdict object, not TestCaseList)
- Sync testcase.schema.json from ai-case Pydantic: Step.expected now required (allows empty string)
- Relax CONVENTIONS.md test_method to optional (Pydantic was already Optional; doc was lagging)
- Move AGENT_TEMPLATE.md from agents/ to docs/ (avoid loader picking it up as agent)
- Bump test 0.0.9 -> 0.0.11, marketplace metadata 0.1.41 -> 0.1.43
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
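The "required but allows empty string" semantics of Step.expected can be expressed in Pydantic v2 roughly as follows; a sketch of the idea, not the actual ai-case case_schema.py:

```python
from pydantic import BaseModel, ValidationError

class Step(BaseModel):
    action: str
    expected: str  # required field, but "" is a valid value

Step(action="tap login", expected="")  # ok: empty string allowed
try:
    Step(action="tap login")           # missing field -> rejected
    raise AssertionError("expected a ValidationError")
except ValidationError:
    pass

# Required-ness is visible in the generated JSON schema, which is what the
# MCP tool input_schema reflects via TypeAdapter at tool-call time.
assert Step.model_json_schema()["required"] == ["action", "expected"]
```

This is the distinction the sync preserves: an annotated `str` with no default is required yet accepts "", whereas `Optional[str] = None` would have made the field omittable.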
…0.0.12)
- 5-A: Cross-reference known-collisions.yaml from PHASES.md & PIPELINES.md
to clarify trace Phase 6 writeback vs ms-sync mode=execute share the
same helper but are mutually exclusive entry points
- 5-C: qa-workflow accepts re_entry_phase + requirement_change_summary;
passes through to test-case-generation for "rerun after requirement
change" scenarios
- 8-A: README selection guide adds explicit demand-driven (tcg) vs
change-driven (ca) rows to match D2-Q3 boundary decision
- 8-B: README architecture features section drops stale v0.0.10+ version
gates (current is 0.0.12)
- 8-C: README directory tree moves AGENT_TEMPLATE.md from agents/ to docs/
(matches the file move done in 0.0.10)
- Bump test 0.0.11 -> 0.0.12, marketplace metadata 0.1.43 -> 0.1.44
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…clarify sample artifacts (v0.0.13)
- 7-C: Fix _shared/AGENT_PROTOCOL.md broken link to AGENT_TEMPLATE.md
(was agents/AGENT_TEMPLATE.md, now ../../docs/AGENT_TEMPLATE.md);
remove AGENT_TEMPLATE from agents/ tree diagram; add note that
template lives outside agents/ to avoid loader pickup
- 7-B: Add quickstart line in requirement-clarification SKILL.md noting
output/*.json files are pre-shipped format samples, not runtime
artifacts (avoid confusion for first-time users)
- 7-A: Keep AI_CODING_BEST_PRACTICES.md (developer onboarding value); no change
- Bump test 0.0.12 -> 0.0.13, marketplace metadata 0.1.44 -> 0.1.45
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… was wrong (v0.0.14)
Reverts the move from v0.0.10 (D1-Q2 decision). The original justification was that AGENT_TEMPLATE.md would be picked up by the Claude/Codex agent loader and pollute the agent picker. Verified with claude-code-guide that this is incorrect: Claude Code does NOT auto-scan plugins/<name>/agents/ for subagents — they must be explicitly declared. Additionally, all 9 .md files under plugins/test/agents/ have NO YAML frontmatter, so they would be skipped by the loader even if scanning happened. They are prompt templates loaded via Task tool calls inside skills, not subagents.
- mv plugins/test/docs/AGENT_TEMPLATE.md back to plugins/test/agents/
- Remove empty plugins/test/docs/ directory
- AGENT_PROTOCOL.md: tree restored, link reverted, add inline note explaining why the loader is not a concern
- README.md: directory tree updated to reflect the agents/ location and add an inline note about loader behavior
- Bump test 0.0.13 -> 0.0.14, marketplace metadata 0.1.45 -> 0.1.46
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reset versions back to production+1 baseline (test 0.0.9 / marketplace 0.1.41) after consolidating 4 in-flight commits (836cb61/3564b01/bb3381b/b3c4f1a) into a single release-prep entry. File contents from those commits are preserved; only version markers + CHANGELOG + plugin README are updated. CHANGELOG 0.1.41 entry extended with a "Skill audit phase 1-3 follow-ups" section covering: schema bindings on *_cases.json, demand/change-driven gating for tcg/ca, the review_result.json non-strict JSON note, the ai-case Pydantic Step.expected required sync, CONVENTIONS.md test_method optional, known-collisions cross-references, qa-workflow re_entry_phase support, README selection guide updates, output sample annotations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
em0t
approved these changes
May 8, 2026
TL;DR
Block 1 inspects 4 real Feishu report samples produced by ai-case workflows and constrains output format: 3 new TEMPLATES.md created (test-case-review / change-analysis / requirement-traceability had none) + 1 rebuilt (requirement-review) + 1 schema violation fixed (
review_mode blacklist). All changes verified via real Feishu docx import (zero emoji / ASCII / table residue).
Block 2 systematically audits the test plugin (16 skills / 10 agents / 5 contract schemas) and hardens CI from 5 → 9 check categories. Resolves 8 contract output collisions, removes 3 zombie agents, makes dependencies honest.
Block 3 runs an 8-dimension cross-skill review: structural compliance, scope/overlap, trigger description, contract closure, orchestration, executability, dead code, cross-cutting docs consistency. Surfaces and resolves contract drift (schema bindings, demand-driven vs change-driven gating, MS writeback dual-entry clarity) and syncs
Step.expected from ai-case Pydantic into testcase.schema.json (now required, allows empty string).
Block 1 — Feishu Report Readability
Background
ai-case auto-uploads markdown reports to Feishu Docs after each AI workflow. Real-world inspection of 4 samples surfaced format issues hurting readability, and one triggered a schema violation:
- [OK]/[!]/[X] ASCII markers / QA AI 助手 self-praise signature / ★/★★/★★★ markers inconsistent across workflows / no multi-doc cross-references / vague "约 N 个" coverage stats
- rr_summary.json wrote an illegal review_mode: "single_agent_serial", triggering a contract schema validation failure (logged in output_data._schema_violations)
Feishu docx import compatibility traps (discovered via real upload)
The first iteration (commit
3562807) introduced emoji + HTML <details> based on "looks better" assumptions — all failed in Feishu docx import testing. These 4 traps are the empirical basis for every constraint in this PR:
- ✅ 已获取 → imported as [OK] 已获取; ⚠️ 待定 → imported as [!] 待定
- ❌ 未获取 → line swallowed (triggers bitable block parsing)
- 📋 基本信息 (decorative section prefix) → imported as [Doc] 基本信息
- <details><summary>...</summary>...</details> → tags ignored, summary becomes misleading plain text
- | ... | ... | markdown tables → split into discrete blocks
Fix direction: unify Chinese square-bracket markers
[通过]/[已覆盖]/[实证] to replace all emoji + ASCII + star ratings; tables → H3 + bullets; collapse → full-content H4 grouping.
Changes (6 commits / 4 skills / 11 files)
- 3562807 — TEMPLATES.md for report.md (10 readability optimizations)
- 26d3b03, 91ace7e — TEMPLATES.md (no template existed before) — 8-section strict template
- fb47249 — TEMPLATES.md (multi-doc constraints for code_change_analysis.md + test_coverage_report.md) — bidirectional cross-references + per-line [实证]/[推测] confidence tagging
- e459975 — report.md template (PHASES 5S.2 only constrained JSON output before, no markdown template)
- 32eb7f1 — rr_summary.json writing illegal review_mode values like single_agent_serial (was PR #58, merged in here)
- requirement-review: report.md
- test-case-review: review_summary.md (· prefix to preserve hierarchy after Feishu flattens nested bullets)
- change-analysis: code_change_analysis.md + test_coverage_report.md ([实证]/[推测] tagging + §5 confidence index)
- requirement-traceability smoke-test: report.md (共 N/总数 + §3.2 explicit untraced changes)
Unified conventions (11 rules across 4 workflows)
PR consolidates per-skill report conventions into unified rules:
Banned:
- [OK]/[!]/[X] — semantic drift across workflows (rr means "conditional", ca_cov means "low risk but hard to reproduce")
- <details><summary> collapse — Feishu doesn't recognize it; the summary becomes misleading text
Unified marker sets:
- readiness [通过]/[有条件]/[不通过] · coverage [已覆盖]/[部分覆盖]/[未覆盖]/[范围外] · severity [高]/[中]/[低] · confidence [实证]/[推测] · priority [P0]/[P1]/[P2]
- no QA AI 助手自动生成 self-signature (which weakened enterprise trust)
- code blocks prefixed with the quote > 以下代码块仅 Dev 排查时阅读,PM/QA 可跳过 — balances PM/QA vs Dev viewpoints
Verification (real Feishu docx upload)
Used ai-case
create_feishu_doc.py to upload 4 mock reports to Feishu docx, then verified rendering via the docx_v1_documents/{id}/blocks API (· prefix preserves hierarchy).
Impact
- rr_summary.json schema unchanged (commit 32eb7f1 only constrains AI output; the schema enum was always strict)
- defect_list.json / smoke_test_report.json schemas unchanged
Block 2 — Test Plugin Audit + CI Baseline Hardening
TL;DR
Systematic audit of test plugin (16 skills / 10 agents / 5 contract schemas / CONVENTIONS / 3 root docs). CI baseline expanded 5 → 9 check categories. Caught 8 real contract collisions (resolved via 4 renames + 4 whitelist entries), removed 3 zombie agents, made dependencies honest.
CI baseline hardening (validate.sh 5 → 9 check categories)
- Check 10: SKILL frontmatter name ↔ directory name match
- Check 11: handoffs[].skill targets exist
- Check 12: subagent_type=... referenced agent files exist
- Check 13: references/*.md paths resolve (incl. Chinese filenames)
- Check 14: validate_contracts.py cross-skill consistency + whitelist (contracts/known-collisions.yaml)
tests/check-schemas.sh from 1 → 5 schemas (testcase / ca-summary / defect-list / rr-summary / smoke-test-report) with ~30 negative cases. Fixed mktemp failure silently passing under sandbox mode.
- risk_assessment.json → change-analysis renamed to bug_risk_assessment.json (semantic distinction from traceability)
- test_execution_report.json → split into unit_test_execution_report.json / integration_test_execution_report.json
- supplementary_cases.json (3-way collision) → change-analysis gets change_supplementary_cases.json; test-case-review → review_supplementary_cases.json; test-case-generation keeps canonical (internal scratch)
- whitelisted: ui_fidelity_report.json, forward_verification.enriched.json + 3 writeback files (shared metersphere_helper.py writeback-from-fv helper, runtime-mutually-exclusive), requirement_points.md
plugins/test/contracts/known-collisions.yamlas whitelist source of truth, with self-binding "review protocol" rule (rejects semantically different same-name entries).Zombie agent cleanup (3 deletes + 13 dead references)
- agents/requirement-traceability/forward-tracer.md
- agents/requirement-traceability/reverse-tracer.md
- agents/failure-classifier.md
Cleaned 13 dead references across 7 files (SKILL/PHASES/TEMPLATES/TRACEABILITY_PROTOCOL/AGENT_PROTOCOL/README).
Dependency honesty
- shared-tools/requirements.txt: corrected from "stdlib only" to list pycryptodome / pyyaml (required) + jsonschema / python-dotenv (optional)
- feedback/requirements.txt: new, declares aiohttp (required by feishu_api.py)
- feedback/contract.yaml: env_vars now includes FEISHU_PLUGIN_ID/SECRET/USER_KEY (was [] despite scripts needing them)
5 shared scripts now support standard
-h/--help (previously --help was sent as a network query in search_mrs.py):
- search_mrs.py / search_prs.py: added a USAGE constant + --help interception
- gitlab_helper.py / github_helper.py / metersphere_helper.py: the "unknown subcommand" path now does if argv[1] in ("-h", "--help"): print(usage); exit 0
- AI_CODING_BEST_PRACTICES.md / PIPELINES.md / CONTRACT_SPEC.md get audience notes ("developer reference, not skill runtime dependency")
- contracts/ directory annotated
- Broken anchor fixed: CONVENTIONS.md#审计输出格式 → _shared/ASSERTION_AUDIT.md#审计输出格式
- Section-name exception extended for sub-agents (they use ## 执行流程/输出/置信度评分 instead of standard names)
- integration-test-design: 5-phase content split from SKILL into a standalone PHASES.md (matches unit-test-design style; SKILL trimmed 387 → 252 lines)
- feedback/SKILL.md adds Quick Start + Closing Checklist
- metersphere-sync/SKILL.md adds list-stages command docs + stdout shape
- change-analysis/SKILL.md model-tier table adds codex-change-analyzer
- metersphere-sync: deprecated confidence_threshold field removed (the helper hadn't honored it; only a contract.yaml placeholder)
Block 3 — 8-Dimension Cross-Skill Review Phase 1-3 Follow-ups
TL;DR
Multi-dimensional review of all 15 skills, 7 agents (under
agents/), and 5 contracts. 23 decision points across 8 dimensions resolved. 7 actionable fixes implemented; 3 "keep current state" decisions noted as known limitations.
Coverage (8 dimensions)
Changes implemented
- Bind testcase.schema.json on 5 *_cases.json outputs (was implicit via CONVENTIONS.md inheritance; now machine-checkable)
- review_result.json non-strict JSON: test-case-review/SKILL.md notes it's a verdict object (not a TestCaseList), deliberately not schema-bound
- Step.expected now required: synced from the ai-case Pydantic source (companion ai-case MR !40); allows an empty string but the field must exist. MCP tool input_schema auto-reflects via TypeAdapter at LLM tool-call time
- test_method relaxed to optional: CONVENTIONS.md updated to match Pydantic (which was already Optional; the doc was lagging) — supplementary case producers may omit it
- Cross-reference from known-collisions.yaml explaining that trace Phase 6 and metersphere-sync mode=execute share the metersphere_helper.py writeback-from-fv helper as auto vs manual entry points (not duplicate implementations)
- qa-workflow: re_entry_phase + requirement_change_summary optional inputs, passed through to test-case-generation for "rerun after requirement change" scenarios
- requirement-clarification: note that output/*.json files are pre-shipped format samples, not runtime artifacts
- _shared/AGENT_PROTOCOL.md: add an inline note explaining that plugins/test/agents/ markdown files lack YAML frontmatter and are loaded explicitly via Task tool calls inside skills, not auto-registered as Claude/Codex subagents
Known limitations (deferred decisions)
- dependencies.agents kept as-is (per D6-A)
- feedback / tcg / tcr / qa-workflow lack explicit fallback documentation when external deps are unavailable (per D6-B; relies on fail-loud + error-message guidance)
- requirement-clarification ↔ requirement-review overlap not gated; the LLM routes (per D2-Q1)
- tcg internal redundancy-pair review vs standalone test-case-review boundary not enforced (per D2-Q2)
Companion change in ai-case
ai-case MR !40 (
fix/skill-loading-doc) commits the case_schema.py change making Step.expected required. Merge order recommendation: this PR first → ai-case MR !40 second → submodule pointer update via a small follow-up MR.
Commit map
3562807, 26d3b03, 91ace7e, fb47249, e459975, 32eb7f1, 4679a7f, fad14aa, 117cbcf, 3a57da6, 836cb61, 3564b01, bb3381b, b3c4f1a, bf41bce
Test plan
- bash tests/validate.sh PASS (79 PASS / 0 FAIL)
- Verified Step.required = ['action', 'expected']
🤖 Generated with Claude Code