Add CI workflow to check missing redirects on PRs#3541
Conversation
|
Preview deployment for your docs. Learn more about Mintlify Previews.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds CI checks for broken links and missing redirects; externalizes docs navigation/redirects to Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant PR as "Pull Request"
participant GH as "GitHub Actions"
participant Runner as "Runner (ubuntu-latest)"
participant Script as "Checker Script"
participant Files as "docs.json / config/*.json"
participant Fathom as "Fathom API"
PR->>GH: trigger workflow (pull_request / workflow_dispatch)
GH->>Runner: start job
Runner->>Runner: checkout repo\nsetup Node 20\nnpm install
Runner->>Script: run checker (missing-redirects or broken-links)
Script->>Files: read `docs.json`\nresolve `$ref` -> load `config/navigation.json` & `config/redirects.json`
Script->>Fathom: (missing-redirects) query pageviews using secrets
Fathom-->>Script: return pageview data
Script->>Files: scan site pages / openapi\napply filters, compute suggestions (if missing)
Script-->>Runner: write report (`broken-links.txt` or redirect report)\nexit (0 if none, 1 if any)
Runner-->>GH: job completes (pass/fail)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/check-missing-redirects.yml (2)
15-15: Add a job timeout to prevent stuck external-call runs.This job depends on external API calls; add a bounded timeout for CI reliability.
Proposed fix
check-missing-redirects: name: Check missing redirects runs-on: ubuntu-latest + timeout-minutes: 10🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/check-missing-redirects.yml at line 15, This workflow job currently only specifies "runs-on: ubuntu-latest" and lacks a job timeout; add a "timeout-minutes" key (e.g., timeout-minutes: 10) to the job definition so the CI run is forcibly canceled if external API calls hang—insert the timeout-minutes alongside the existing "runs-on" entry in the job block to bound execution time.
24-25: Usenpm ciinstead ofnpm installin CI.
npm ciis faster and lockfile-deterministic for workflow reproducibility.Proposed fix
- name: Install dependencies - run: npm install + run: npm ci🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/check-missing-redirects.yml around lines 24 - 25, Replace the workflow step named "Install dependencies" so the run command uses npm ci instead of npm install to ensure fast, lockfile-deterministic installs in CI; update the step's run value from "npm install" to "npm ci" (optionally add caching separately if desired).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/check-missing-redirects.yml:
- Around line 13-31: The Check missing redirects job will fail on forked
pull_request runs because secrets FATHOM_API_KEY and FATHOM_SITE_ID are
unavailable; update the job named check-missing-redirects to skip when the PR
comes from a fork by adding a guard like if: github.event_name != 'pull_request'
|| github.event.pull_request.head.repo.full_name == github.repository so the job
runs for pushes and intra-repo PRs but not fork PRs, ensuring the env secrets
are only referenced when available.
---
Nitpick comments:
In @.github/workflows/check-missing-redirects.yml:
- Line 15: This workflow job currently only specifies "runs-on: ubuntu-latest"
and lacks a job timeout; add a "timeout-minutes" key (e.g., timeout-minutes: 10)
to the job definition so the CI run is forcibly canceled if external API calls
hang—insert the timeout-minutes alongside the existing "runs-on" entry in the
job block to bound execution time.
- Around line 24-25: Replace the workflow step named "Install dependencies" so
the run command uses npm ci instead of npm install to ensure fast,
lockfile-deterministic installs in CI; update the step's run value from "npm
install" to "npm ci" (optionally add caching separately if desired).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7e6b601c-9202-4b6e-a111-7d6ca0dde774
📒 Files selected for processing (1)
.github/workflows/check-missing-redirects.yml
There was a problem hiding this comment.
🧹 Nitpick comments (4)
scripts/check-missing-redirects.mjs (4)
174-180: Add error handling for$refresolution.If the referenced file doesn't exist or contains invalid JSON, this will throw a cryptic error. Consider wrapping with a try-catch for clearer diagnostics.
🛠️ Proposed fix with error handling
function resolveRef(value) { if (value && typeof value === "object" && "$ref" in value) { const refPath = path.resolve(REPO_ROOT, value["$ref"]); - return JSON.parse(fs.readFileSync(refPath, "utf-8")); + try { + return JSON.parse(fs.readFileSync(refPath, "utf-8")); + } catch (err) { + throw new Error(`Failed to resolve $ref "${value["$ref"]}": ${err.message}`); + } } return value; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-missing-redirects.mjs` around lines 174 - 180, The resolveRef function currently reads and parses a referenced JSON file directly (using path.resolve(REPO_ROOT, value["$ref"]), fs.readFileSync and JSON.parse) which can throw opaque errors; wrap the file read/parse logic in a try-catch inside resolveRef, and on error throw or log a new Error that includes the resolved refPath and the original error message so callers get a clear diagnostic (or return value unchanged/null if that matches existing flow), ensuring you still check value && typeof value === "object" && "$ref" in value before attempting resolution.
321-326: Suggestions shown in console but not in report file.The console output includes suggestions for interactive triage, while the file report (
missing-redirects.txt) omits them. This seems intentional but worth noting—if you want suggestions preserved for async review, consider adding them tooutputLinesas well.Also applies to: 335-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-missing-redirects.mjs` around lines 321 - 326, The report file currently omits the interactive suggestions printed to console; update the code that builds the report (where outputLines is populated for writing missing-redirects.txt) to include the same suggestion text produced by suggestDestination(p, docsPages). Specifically, when iterating sortedMissing (the loop that logs `${String(views).padStart(6)} | ${p}${hint}` using suggestDestination), append the identical string (including the `hint` variable) to outputLines so the written report contains the suggested destination entries as well.
20-20: Lowering threshold to 1 view may increase noise.While this catches more missing redirects, single-view URLs could be bot probes or typos. The expanded junk filters (lines 269-281) help mitigate this, but consider if
MIN_VIEWS = 2or3might provide a better signal-to-noise ratio while still being comprehensive.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-missing-redirects.mjs` at line 20, The MIN_VIEWS constant set to 1 may produce excessive noise; update the constant MIN_VIEWS to a higher value (suggest 2 or 3) in scripts/check-missing-redirects.mjs and adjust any related logic that uses MIN_VIEWS (e.g., filtering or reporting in the functions that iterate hits) so behavior reflects the new threshold; also confirm the expanded junk filters (the rules referenced around the block that currently spans lines ~269-281) are still applied after changing the threshold to avoid reintroducing low-signal entries.
363-365: Consider writing report to a temporary or artifacts directory in CI.Writing
missing-redirects.txtto the repo root creates a file in the working directory during CI runs. While not committed, this could trigger "dirty tree" warnings in some workflows. Consider:
- Writing to a temp directory, or
- Adding
missing-redirects.txtto.gitignore, or- Using GitHub Actions' artifact upload to preserve the report
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-missing-redirects.mjs` around lines 363 - 365, The script currently writes the report file to REPO_ROOT as missing-redirects.txt using fs.writeFileSync(outputPath, ...), which can leave a dirty working tree in CI; change the output location to a non-repo path (e.g., use os.tmpdir() or a CI artifacts env var like process.env.CI_ARTIFACTS_DIR/ARTIFACTS_DIR) when present, falling back to REPO_ROOT only as a last resort, and update the outputPath construction and the console message to reference the new path (identify the change around outputPath, REPO_ROOT, missing-redirects.txt, fs.writeFileSync, and outputLines.join).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/check-missing-redirects.mjs`:
- Around line 174-180: The resolveRef function currently reads and parses a
referenced JSON file directly (using path.resolve(REPO_ROOT, value["$ref"]),
fs.readFileSync and JSON.parse) which can throw opaque errors; wrap the file
read/parse logic in a try-catch inside resolveRef, and on error throw or log a
new Error that includes the resolved refPath and the original error message so
callers get a clear diagnostic (or return value unchanged/null if that matches
existing flow), ensuring you still check value && typeof value === "object" &&
"$ref" in value before attempting resolution.
- Around line 321-326: The report file currently omits the interactive
suggestions printed to console; update the code that builds the report (where
outputLines is populated for writing missing-redirects.txt) to include the same
suggestion text produced by suggestDestination(p, docsPages). Specifically, when
iterating sortedMissing (the loop that logs `${String(views).padStart(6)} |
${p}${hint}` using suggestDestination), append the identical string (including
the `hint` variable) to outputLines so the written report contains the suggested
destination entries as well.
- Line 20: The MIN_VIEWS constant set to 1 may produce excessive noise; update
the constant MIN_VIEWS to a higher value (suggest 2 or 3) in
scripts/check-missing-redirects.mjs and adjust any related logic that uses
MIN_VIEWS (e.g., filtering or reporting in the functions that iterate hits) so
behavior reflects the new threshold; also confirm the expanded junk filters (the
rules referenced around the block that currently spans lines ~269-281) are still
applied after changing the threshold to avoid reintroducing low-signal entries.
- Around line 363-365: The script currently writes the report file to REPO_ROOT
as missing-redirects.txt using fs.writeFileSync(outputPath, ...), which can
leave a dirty working tree in CI; change the output location to a non-repo path
(e.g., use os.tmpdir() or a CI artifacts env var like
process.env.CI_ARTIFACTS_DIR/ARTIFACTS_DIR) when present, falling back to
REPO_ROOT only as a last resort, and update the outputPath construction and the
console message to reference the new path (identify the change around
outputPath, REPO_ROOT, missing-redirects.txt, fs.writeFileSync, and
outputLines.join).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e27d779a-ebf8-42c6-9794-c9203642250e
📒 Files selected for processing (4)
config/navigation.jsonconfig/redirects.jsondocs.jsonscripts/check-missing-redirects.mjs
✅ Files skipped from review due to trivial changes (1)
- config/navigation.json
There was a problem hiding this comment.
Hey @qdequele sounds like a good addition
But I don't understand how it fixes #3540?
My issue was probably not clear at all and I updated it to make it clearer: we need a basic checker of links that are display IN the the content of our documentation. If the link redirects to a 404 page or an empty page, the link is considered broken, and we need to be aware of it. It has nothing to do with redirections.
We have to check internal links (redirecting to any other content of our docs) and external links (redirecting to any other content)
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/check-broken-links.mjs (1)
106-122: Dead code:extractInternalLinksandINTERNAL_LINK_REare unused.The
extractInternalLinksfunction andINTERNAL_LINK_REconstant are defined but never called. ThescanMdxFilesfunction (line 145) defines its own inline regex for link extraction instead.Either remove this dead code or refactor
scanMdxFilesto useextractInternalLinks.♻️ Option 1: Remove dead code
-const INTERNAL_LINK_RE = /(?:\[(?:[^\]]*)\]\(|href=")(\/?[^)"#?\s][^)"#?\s]*)/g; - -function extractInternalLinks(content) { - const links = []; - let match; - while ((match = INTERNAL_LINK_RE.exec(content)) !== null) { - const href = match[1]; - // Only absolute internal paths (start with /) - if (!href.startsWith("/")) continue; - // Skip external URLs - if (href.startsWith("//") || href.startsWith("/http")) continue; - // Skip asset paths — those are static files, not doc pages - if (href.startsWith("/assets/")) continue; - links.push(href); - } - return links; -}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-broken-links.mjs` around lines 106 - 122, The file defines INTERNAL_LINK_RE and extractInternalLinks but they are unused; either delete both to remove dead code or update scanMdxFiles to reuse them: in scanMdxFiles replace the inline link-extraction regex/logic with a call to extractInternalLinks(content) and ensure the same filtering behavior (skip non-leading "/", "//", "/http", and "/assets/"); keep the function and constant names exact (INTERNAL_LINK_RE, extractInternalLinks, scanMdxFiles) so the change is easy to locate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check-broken-links.mjs`:
- Around line 128-131: The IGNORED_DIRS set in buildValidPaths currently
includes an unused "api_v0" entry; either remove "api_v0" from the IGNORED_DIRS
definition inside the buildValidPaths function to avoid dead configuration, or
if you intend to keep it for future/compatibility reasons, add a short comment
next to the IGNORED_DIRS declaration explaining why "api_v0" is retained; update
the IGNORED_DIRS usage in buildValidPaths (and/or the similar constant in
scanMdxFiles if duplicated) so the code and intent are consistent.
---
Nitpick comments:
In `@scripts/check-broken-links.mjs`:
- Around line 106-122: The file defines INTERNAL_LINK_RE and
extractInternalLinks but they are unused; either delete both to remove dead code
or update scanMdxFiles to reuse them: in scanMdxFiles replace the inline
link-extraction regex/logic with a call to extractInternalLinks(content) and
ensure the same filtering behavior (skip non-leading "/", "//", "/http", and
"/assets/"); keep the function and constant names exact (INTERNAL_LINK_RE,
extractInternalLinks, scanMdxFiles) so the change is easy to locate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 324bf7cc-2a78-4b61-8f80-a6a36cf34724
📒 Files selected for processing (8)
.github/workflows/check-broken-links.yml.gitignorecapabilities/full_text_search/advanced/debug_search_performance.mdxcapabilities/indexing/how_to/document_relations.mdxpackage.jsonresources/help/experimental_features_overview.mdxresources/self_hosting/enterprise_edition.mdxscripts/check-broken-links.mjs
✅ Files skipped from review due to trivial changes (6)
- .gitignore
- resources/self_hosting/enterprise_edition.mdx
- resources/help/experimental_features_overview.mdx
- capabilities/full_text_search/advanced/debug_search_performance.mdx
- package.json
- capabilities/indexing/how_to/document_relations.mdx
| function scanMdxFiles(validPaths) { | ||
| const broken = []; // { file, link, line } | ||
| const IGNORED_DIRS = new Set([".git", "node_modules", "assets", "snippets", ".mintlify"]); | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if api_v0 directory exists and contains MDX files
echo "=== Checking api_v0 directory ==="
if [ -d "api_v0" ]; then
echo "api_v0 exists. Counting MDX files:"
fd -e mdx . api_v0 | wc -l
echo "Sample files:"
fd -e mdx . api_v0 | head -5
else
echo "api_v0 directory does not exist"
fiRepository: meilisearch/documentation
Length of output: 134
Remove the unused api_v0 entry from IGNORED_DIRS in buildValidPaths (line 58), or clarify why it's needed.
The api_v0/ directory does not exist in the repository, so the ignore entry in buildValidPaths has no effect. Either remove it for clarity, or add a comment explaining why it's retained for future reference.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/check-broken-links.mjs` around lines 128 - 131, The IGNORED_DIRS set
in buildValidPaths currently includes an unused "api_v0" entry; either remove
"api_v0" from the IGNORED_DIRS definition inside the buildValidPaths function to
avoid dead configuration, or if you intend to keep it for future/compatibility
reasons, add a short comment next to the IGNORED_DIRS declaration explaining why
"api_v0" is retained; update the IGNORED_DIRS usage in buildValidPaths (and/or
the similar constant in scanMdxFiles if duplicated) so the code and intent are
consistent.
Hey @curquiza I updated the script to check the broken links in two steps:
Question: I chose to go with one script and one CI, but we might have done it with two scripts and two CIs. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check-missing-redirects.mjs`:
- Line 20: The CI threshold MIN_VIEWS is currently set to 1 and causes the job
to fail on any single live hit; change this by raising the fail threshold (e.g.,
set MIN_VIEWS to a higher number like 50) or better: introduce two constants
(REPORT_VIEWS = 1 for visibility and FAIL_VIEWS = 50 for CI blocking) and update
the logic that evaluates hits (the checks referenced around the current failure
logic) to use FAIL_VIEWS for failing the job while using REPORT_VIEWS for
non-blocking reports.
- Around line 381-383: The error message that references docs.json is outdated:
update the console.error call that uses sortedMissing to direct contributors to
the new redirects file (replace "docs.json" with the new redirects file name,
e.g., "redirects.json") while keeping the same error text and the
process.exit(1) behavior; locate the console.error in
scripts/check-missing-redirects.mjs that mentions sortedMissing and change only
the filename in the hint.
- Around line 224-249: The boost for reference pages currently runs even when
there are zero token matches and can make a /docs/reference/ page win
incorrectly; modify the loop in which you compute score for each page (using
tokenize, queryTokens, pageTokens, score, bestScore, bestPage, isRefPath,
docsPages) so that you only add the +0.5 boost when score > 0 (i.e., there was
at least one real token match) before comparing to bestScore; keep returning
null when bestScore === 0 and still strip the /docs prefix from bestPage as
before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 647d4940-3a8d-40eb-baf8-096caa3b7962
📒 Files selected for processing (2)
config/redirects.jsonscripts/check-missing-redirects.mjs
- Add 711 missing redirects (271 → 982 total) based on real Fathom Analytics traffic - Split docs.json into config/redirects.json and config/navigation.json using Mintlify $ref to keep the main config file manageable - Improve check-missing-redirects script: auto-suggest destinations via word matching, exit with code 1 on missing redirects, filter junk/bot URLs, resolve $ref when reading redirects
- Add scripts/check-broken-links.mjs: static checker that builds valid paths from local MDX files + OpenAPI-generated URLs, then scans all MDX files for internal links pointing to non-existent pages - Add .github/workflows/check-broken-links.yml: runs on every PR targeting main, no secrets required - Fix the 5 broken links found by the new checker: - /resources/self_hosting/sharding → sharding/overview (3 files) - /reference/api/settings/get-foreign-keys → get-foreignkeys - /capabilities/personalization/getting_started → getting_started/personalized_search
Change the minimum pageview threshold from 1 to 2 to reduce noise from bot traffic and one-off mistyped URLs. Add redirects for the 7 URLs that still have >= 2 views. Add junk URL filters for embedded protocols, non-ASCII characters, and uppercase file extensions.
87da7a7 to
bd0458f
Compare
Fix 2 broken links in conversational search setup page: - experimental features API endpoint - chat settings API endpoint Add 13 missing redirects for removed cloud platform pages and other URLs with >= 2 views in the last 90 days.
Summary
The `check-missing-redirects` workflow needs two secrets to call the Fathom Analytics API:
The `check-broken-links` workflow requires no secrets.
Summary by CodeRabbit