Add CI workflow to check missing redirects on PRs #3541

Merged
qdequele merged 11 commits into main from ci-check-missing-redirects on Apr 10, 2026

Conversation

@qdequele
Member

@qdequele qdequele commented Apr 2, 2026

Summary

  • Adds `.github/workflows/check-missing-redirects.yml` to run the existing `check-missing-redirects` script as a CI check on every PR targeting `main`
  • Fixes all 711 missing redirects found by the script (271 → 982 total redirects)
  • Due to the large number of redirects added, splits `docs.json` into separate files using Mintlify's `$ref` support: `config/redirects.json` and `config/navigation.json`
  • Improves the script: adds auto-suggestion of destination pages, exits with code 1 on missing redirects, and filters out junk/bot URLs
  • Adds `.github/workflows/check-broken-links.yml`: a static checker that scans all MDX files for internal links pointing to non-existent pages — no secrets required, runs on every PR
  • Adds `scripts/check-broken-links.mjs`: builds valid paths from local MDX files + OpenAPI-generated URLs (slugified from operation summaries), then reports broken links with file and line number
  • Fixes the 5 broken internal links found by the new checker
  • Closes Implement a broen-link checker #3540
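
The OpenAPI-generated URLs mentioned in the summary depend on turning each operation summary into a URL slug. The following is a minimal sketch of that step, assuming a simple lowercase-and-hyphenate rule. `slugifySummary` and the sample summaries are hypothetical; the actual script and Mintlify itself may handle edge cases differently.

```javascript
// Hypothetical helper: turn an OpenAPI operation summary into a URL slug.
// Assumption: lowercase everything, collapse each run of characters outside
// a-z and 0-9 into a single hyphen, then trim hyphens at both ends.
function slugifySummary(summary) {
  return summary
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(slugifySummary("Get foreignKeys")); // get-foreignkeys
console.log(slugifySummary("List all indexes")); // list-all-indexes
```

Under this rule a camelCase word such as `foreignKeys` stays a single token, so a hand-written link that hyphenates it (`get-foreign-keys`) would not match the generated path.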

⚠️ Required secrets

The `check-missing-redirects` workflow needs two secrets to call the Fathom Analytics API:

  • `FATHOM_API_KEY`
  • `FATHOM_SITE_ID`

The `check-broken-links` workflow requires no secrets.
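
Since the redirect check cannot call Fathom without these values, a script can verify them up front and report which ones are absent. This is a minimal sketch under that assumption; `missingSecrets` is a hypothetical helper and is not taken from the PR.

```javascript
// Secret names come from the PR description; the helper itself is hypothetical.
const REQUIRED_SECRETS = ["FATHOM_API_KEY", "FATHOM_SITE_ID"];

// Return the names of required variables that are unset or empty in `env`.
function missingSecrets(env) {
  return REQUIRED_SECRETS.filter((name) => !env[name]);
}

const missing = missingSecrets(process.env);
if (missing.length > 0) {
  // A real script would decide here whether to skip the check or fail the job.
  console.warn(`Missing secrets: ${missing.join(", ")}`);
}
```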

Summary by CodeRabbit

  • New Features
    • Added automated checks for missing documentation redirects (with suggested destinations) and for broken internal docs links; both run on PRs and via manual trigger.
  • Chores
    • Moved documentation navigation and redirects into external config files.
    • Added npm script to run the broken-links check and ignored its output file.
  • Documentation
    • Updated several internal documentation links to new target routes.

@mintlify

mintlify bot commented Apr 2, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| meilisearch-documentation | 🟢 Ready | View Preview | Apr 2, 2026, 5:10 PM |

@coderabbitai

coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds CI checks for broken links and missing redirects; externalizes docs navigation/redirects to config/ JSON; enhances missing-redirects script (dereferencing refs, lower threshold, junk filters, suggestions, fail-on-findings); adds a broken-links scanner and workflows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Workflows**<br>`.github/workflows/check-missing-redirects.yml`, `.github/workflows/check-broken-links.yml` | New GitHub Actions workflows (manual + PR on main) that checkout, setup Node 20, cache npm, install deps, and run the redirect/link checker scripts. |
| **Link / Redirect check scripts**<br>`scripts/check-missing-redirects.mjs`, `scripts/check-broken-links.mjs` | Added `check-broken-links.mjs` to validate internal MDX/OpenAPI/redirect paths, produce `broken-links.txt`, and exit non-zero on findings. Updated `check-missing-redirects.mjs` to resolve `$ref` redirects, lower `MIN_VIEWS` to 1, add junk-path filters, suggest destination matches, and exit with non-zero on findings. |
| **Docs manifest / config**<br>`config/navigation.json`, `config/redirects.json`, `docs.json` | Externalized embedded navigation and redirects: `docs.json` now references `./config/navigation.json` and `./config/redirects.json` via `$ref`; `config/navigation.json` (full nav manifest) and redirects JSON referenced. |
| **Package & Git ignore**<br>`package.json`, `.gitignore` | Added npm script `check-broken-links` and added `broken-links.txt` to `.gitignore`. |
| **Documentation edits**<br>`capabilities/full_text_search/advanced/debug_search_performance.mdx`, `capabilities/indexing/how_to/document_relations.mdx`, `resources/help/experimental_features_overview.mdx`, `resources/self_hosting/enterprise_edition.mdx` | Small hyperlink updates adjusting target URLs (e.g., adding `/overview` or correcting endpoint path). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant PR as "Pull Request"
  participant GH as "GitHub Actions"
  participant Runner as "Runner (ubuntu-latest)"
  participant Script as "Checker Script"
  participant Files as "docs.json / config/*.json"
  participant Fathom as "Fathom API"

  PR->>GH: trigger workflow (pull_request / workflow_dispatch)
  GH->>Runner: start job
  Runner->>Runner: checkout repo\nsetup Node 20\nnpm install
  Runner->>Script: run checker (missing-redirects or broken-links)
  Script->>Files: read `docs.json`\nresolve `$ref` -> load `config/navigation.json` & `config/redirects.json`
  Script->>Fathom: (missing-redirects) query pageviews using secrets
  Fathom-->>Script: return pageview data
  Script->>Files: scan site pages / openapi\napply filters, compute suggestions (if missing)
  Script-->>Runner: write report (`broken-links.txt` or redirect report)\nexit (0 if none, 1 if any)
  Runner-->>GH: job completes (pass/fail)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through JSON lanes and traced each track,

nudged stray links and left a gentle snack.
I scored the paths and pointed a door,
then twitched my nose—so fixes come soon more.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | The PR includes additional changes beyond the linked issue: splitting docs.json into separate config files, improving the check-missing-redirects script with suggestions and filtering, and fixing five broken internal links discovered during implementation. | Clarify whether splitting docs.json into config/navigation.json and config/redirects.json and improving check-missing-redirects are intentional dependencies or scope creep that should be separated into different PRs. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary change: adding a CI workflow to check missing redirects on PRs. It is specific, concise, and reflects the main objective. |
| Linked Issues check | ✅ Passed | The PR substantially addresses issue #3540 by implementing a broken-link checker in CI that detects both internal links (from MDX files against docs.json) and external links (via Fathom Analytics traffic), reporting broken links on every PR targeting main. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/check-missing-redirects.yml (2)

15-15: Add a job timeout to prevent stuck external-call runs.

This job depends on external API calls; add a bounded timeout for CI reliability.

Proposed fix
   check-missing-redirects:
     name: Check missing redirects
     runs-on: ubuntu-latest
+    timeout-minutes: 10
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/check-missing-redirects.yml at line 15, This workflow job
currently only specifies "runs-on: ubuntu-latest" and lacks a job timeout; add a
"timeout-minutes" key (e.g., timeout-minutes: 10) to the job definition so the
CI run is forcibly canceled if external API calls hang—insert the
timeout-minutes alongside the existing "runs-on" entry in the job block to bound
execution time.

24-25: Use npm ci instead of npm install in CI.

npm ci is faster and lockfile-deterministic for workflow reproducibility.

Proposed fix
       - name: Install dependencies
-        run: npm install
+        run: npm ci
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/check-missing-redirects.yml around lines 24 - 25, Replace
the workflow step named "Install dependencies" so the run command uses npm ci
instead of npm install to ensure fast, lockfile-deterministic installs in CI;
update the step's run value from "npm install" to "npm ci" (optionally add
caching separately if desired).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/check-missing-redirects.yml:
- Around line 13-31: The Check missing redirects job will fail on forked
pull_request runs because secrets FATHOM_API_KEY and FATHOM_SITE_ID are
unavailable; update the job named check-missing-redirects to skip when the PR
comes from a fork by adding a guard like if: github.event_name != 'pull_request'
|| github.event.pull_request.head.repo.full_name == github.repository so the job
runs for pushes and intra-repo PRs but not fork PRs, ensuring the env secrets
are only referenced when available.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e6b601c-9202-4b6e-a111-7d6ca0dde774

📥 Commits

Reviewing files that changed from the base of the PR and between e67d2b7 and 26e1366.

📒 Files selected for processing (1)
  • .github/workflows/check-missing-redirects.yml

Comment thread .github/workflows/check-missing-redirects.yml

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (4)
scripts/check-missing-redirects.mjs (4)

174-180: Add error handling for $ref resolution.

If the referenced file doesn't exist or contains invalid JSON, this will throw a cryptic error. Consider wrapping with a try-catch for clearer diagnostics.

🛠️ Proposed fix with error handling
 function resolveRef(value) {
   if (value && typeof value === "object" && "$ref" in value) {
     const refPath = path.resolve(REPO_ROOT, value["$ref"]);
-    return JSON.parse(fs.readFileSync(refPath, "utf-8"));
+    try {
+      return JSON.parse(fs.readFileSync(refPath, "utf-8"));
+    } catch (err) {
+      throw new Error(`Failed to resolve $ref "${value["$ref"]}": ${err.message}`);
+    }
   }
   return value;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-missing-redirects.mjs` around lines 174 - 180, The resolveRef
function currently reads and parses a referenced JSON file directly (using
path.resolve(REPO_ROOT, value["$ref"]), fs.readFileSync and JSON.parse) which
can throw opaque errors; wrap the file read/parse logic in a try-catch inside
resolveRef, and on error throw or log a new Error that includes the resolved
refPath and the original error message so callers get a clear diagnostic (or
return value unchanged/null if that matches existing flow), ensuring you still
check value && typeof value === "object" && "$ref" in value before attempting
resolution.
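
To make the effect of the proposed error handling concrete, here is a minimal runnable sketch. It assumes the file reader is injected as a `readFile` parameter (a hypothetical signature, path in and string out) so the example runs without touching the disk; the script itself calls `fs.readFileSync` directly. The redirect entry shape in the sample data is also an assumption.

```javascript
// Sketch of `$ref` resolution with a descriptive error message.
// `readFile` is a hypothetical injected reader: (path) => file contents.
function resolveRef(value, readFile) {
  if (value && typeof value === "object" && "$ref" in value) {
    try {
      return JSON.parse(readFile(value["$ref"]));
    } catch (err) {
      throw new Error(`Failed to resolve $ref "${value["$ref"]}": ${err.message}`);
    }
  }
  return value; // not a reference: pass through unchanged
}

// In-memory stand-in for the repository files (sample data, not from the PR).
const files = {
  "./config/redirects.json": '[{"source": "/old", "destination": "/new"}]',
};
const readFile = (p) => {
  if (!(p in files)) throw new Error(`no such file: ${p}`);
  return files[p];
};

console.log(resolveRef({ $ref: "./config/redirects.json" }, readFile).length); // 1
try {
  resolveRef({ $ref: "./config/missing.json" }, readFile);
} catch (err) {
  console.log(err.message); // names the failing $ref instead of a bare read error
}
```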

321-326: Suggestions shown in console but not in report file.

The console output includes suggestions for interactive triage, while the file report (missing-redirects.txt) omits them. This seems intentional but worth noting—if you want suggestions preserved for async review, consider adding them to outputLines as well.

Also applies to: 335-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-missing-redirects.mjs` around lines 321 - 326, The report file
currently omits the interactive suggestions printed to console; update the code
that builds the report (where outputLines is populated for writing
missing-redirects.txt) to include the same suggestion text produced by
suggestDestination(p, docsPages). Specifically, when iterating sortedMissing
(the loop that logs `${String(views).padStart(6)} | ${p}${hint}` using
suggestDestination), append the identical string (including the `hint` variable)
to outputLines so the written report contains the suggested destination entries
as well.
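
One way to keep the console and the report file in sync is to build each line once and reuse it for both. The sketch below assumes a hypothetical `formatReportLine` helper and a ` → ` hint format; neither is confirmed by the script, and the sample entries are invented.

```javascript
// Hypothetical formatter shared by the console output and the report file.
// `suggestion` is the suggested destination, or null when there is none.
function formatReportLine(views, path, suggestion) {
  const hint = suggestion ? ` → ${suggestion}` : "";
  return `${String(views).padStart(6)} | ${path}${hint}`;
}

// Sample data for illustration only.
const sortedMissing = [
  { path: "/docs/old/page", views: 42, suggestion: "/new/page" },
  { path: "/docs/gone", views: 3, suggestion: null },
];

// The same strings go to the console and into the lines written to the report.
const outputLines = sortedMissing.map((m) => formatReportLine(m.views, m.path, m.suggestion));
outputLines.forEach((line) => console.log(line));
```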

20-20: Lowering threshold to 1 view may increase noise.

While this catches more missing redirects, single-view URLs could be bot probes or typos. The expanded junk filters (lines 269-281) help mitigate this, but consider if MIN_VIEWS = 2 or 3 might provide a better signal-to-noise ratio while still being comprehensive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-missing-redirects.mjs` at line 20, The MIN_VIEWS constant set
to 1 may produce excessive noise; update the constant MIN_VIEWS to a higher
value (suggest 2 or 3) in scripts/check-missing-redirects.mjs and adjust any
related logic that uses MIN_VIEWS (e.g., filtering or reporting in the functions
that iterate hits) so behavior reflects the new threshold; also confirm the
expanded junk filters (the rules referenced around the block that currently
spans lines ~269-281) are still applied after changing the threshold to avoid
reintroducing low-signal entries.

363-365: Consider writing report to a temporary or artifacts directory in CI.

Writing missing-redirects.txt to the repo root creates a file in the working directory during CI runs. While not committed, this could trigger "dirty tree" warnings in some workflows. Consider:

  • Writing to a temp directory, or
  • Adding missing-redirects.txt to .gitignore, or
  • Using GitHub Actions' artifact upload to preserve the report
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-missing-redirects.mjs` around lines 363 - 365, The script
currently writes the report file to REPO_ROOT as missing-redirects.txt using
fs.writeFileSync(outputPath, ...), which can leave a dirty working tree in CI;
change the output location to a non-repo path (e.g., use os.tmpdir() or a CI
artifacts env var like process.env.CI_ARTIFACTS_DIR/ARTIFACTS_DIR) when present,
falling back to REPO_ROOT only as a last resort, and update the outputPath
construction and the console message to reference the new path (identify the
change around outputPath, REPO_ROOT, missing-redirects.txt, fs.writeFileSync,
and outputLines.join).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e27d779a-ebf8-42c6-9794-c9203642250e

📥 Commits

Reviewing files that changed from the base of the PR and between 26e1366 and 4dfd168.

📒 Files selected for processing (4)
  • config/navigation.json
  • config/redirects.json
  • docs.json
  • scripts/check-missing-redirects.mjs
✅ Files skipped from review due to trivial changes (1)
  • config/navigation.json

@qdequele qdequele requested review from CaroFG and curquiza April 3, 2026 10:13
@qdequele qdequele self-assigned this Apr 3, 2026
Member

@curquiza curquiza left a comment

Hey @qdequele sounds like a good addition

But I don't understand how it fixes #3540?
My issue was probably not clear at all and I updated it to make it clearer: we need a basic checker of links that are displayed IN the content of our documentation. If the link redirects to a 404 page or an empty page, the link is considered broken, and we need to be aware of it. It has nothing to do with redirections.
We have to check internal links (redirecting to any other content of our docs) and external links (redirecting to any other content)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/check-broken-links.mjs (1)

106-122: Dead code: extractInternalLinks and INTERNAL_LINK_RE are unused.

The extractInternalLinks function and INTERNAL_LINK_RE constant are defined but never called. The scanMdxFiles function (line 145) defines its own inline regex for link extraction instead.

Either remove this dead code or refactor scanMdxFiles to use extractInternalLinks.

♻️ Option 1: Remove dead code
-const INTERNAL_LINK_RE = /(?:\[(?:[^\]]*)\]\(|href=")(\/?[^)"#?\s][^)"#?\s]*)/g;
-
-function extractInternalLinks(content) {
-  const links = [];
-  let match;
-  while ((match = INTERNAL_LINK_RE.exec(content)) !== null) {
-    const href = match[1];
-    // Only absolute internal paths (start with /)
-    if (!href.startsWith("/")) continue;
-    // Skip external URLs
-    if (href.startsWith("//") || href.startsWith("/http")) continue;
-    // Skip asset paths — those are static files, not doc pages
-    if (href.startsWith("/assets/")) continue;
-    links.push(href);
-  }
-  return links;
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-broken-links.mjs` around lines 106 - 122, The file defines
INTERNAL_LINK_RE and extractInternalLinks but they are unused; either delete
both to remove dead code or update scanMdxFiles to reuse them: in scanMdxFiles
replace the inline link-extraction regex/logic with a call to
extractInternalLinks(content) and ensure the same filtering behavior (skip
non-leading "/", "//", "/http", and "/assets/"); keep the function and constant
names exact (INTERNAL_LINK_RE, extractInternalLinks, scanMdxFiles) so the change
is easy to locate.
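
If the second option is taken, the shared helper also needs to report line numbers, since the checker lists broken links with file and line. The following is a minimal sketch of such a helper, reusing the regex and filters from the diff above. Returning `{ link, line }` objects is an assumption about what `scanMdxFiles` needs, and links that span several lines or sit inside code fences are not handled.

```javascript
// Regex copied from the review comment: captures the target of a Markdown
// link `[text](target)` or an `href="target"` attribute, stopping before
// any anchor (#) or query string (?).
const INTERNAL_LINK_RE = /(?:\[(?:[^\]]*)\]\(|href=")(\/?[^)"#?\s][^)"#?\s]*)/g;

// Return [{ link, line }] for every absolute internal link in `content`.
function extractInternalLinks(content) {
  const links = [];
  content.split("\n").forEach((text, index) => {
    // matchAll iterates over a copy of the regex, so lastIndex never leaks
    // from one line to the next.
    for (const match of text.matchAll(INTERNAL_LINK_RE)) {
      const href = match[1];
      if (!href.startsWith("/")) continue; // relative path or external URL
      if (href.startsWith("//") || href.startsWith("/http")) continue;
      if (href.startsWith("/assets/")) continue; // static files, not doc pages
      links.push({ link: href, line: index + 1 });
    }
  });
  return links;
}

// Sample MDX content for illustration only.
const sample = [
  "# Title",
  "See [filters](/capabilities/filtering#syntax).",
  '<a href="/reference/api/search">API</a> and [site](https://example.com)',
  "![logo](/assets/logo.png)",
].join("\n");

console.log(extractInternalLinks(sample));
```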

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 324bf7cc-2a78-4b61-8f80-a6a36cf34724

📥 Commits

Reviewing files that changed from the base of the PR and between 489240b and 85795d1.

📒 Files selected for processing (8)
  • .github/workflows/check-broken-links.yml
  • .gitignore
  • capabilities/full_text_search/advanced/debug_search_performance.mdx
  • capabilities/indexing/how_to/document_relations.mdx
  • package.json
  • resources/help/experimental_features_overview.mdx
  • resources/self_hosting/enterprise_edition.mdx
  • scripts/check-broken-links.mjs
✅ Files skipped from review due to trivial changes (6)
  • .gitignore
  • resources/self_hosting/enterprise_edition.mdx
  • resources/help/experimental_features_overview.mdx
  • capabilities/full_text_search/advanced/debug_search_performance.mdx
  • package.json
  • capabilities/indexing/how_to/document_relations.mdx

Comment on lines +128 to +131
function scanMdxFiles(validPaths) {
  const broken = []; // { file, link, line }
  const IGNORED_DIRS = new Set([".git", "node_modules", "assets", "snippets", ".mintlify"]);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if api_v0 directory exists and contains MDX files
echo "=== Checking api_v0 directory ==="
if [ -d "api_v0" ]; then
  echo "api_v0 exists. Counting MDX files:"
  fd -e mdx . api_v0 | wc -l
  echo "Sample files:"
  fd -e mdx . api_v0 | head -5
else
  echo "api_v0 directory does not exist"
fi

Repository: meilisearch/documentation

Length of output: 134


Remove the unused api_v0 entry from IGNORED_DIRS in buildValidPaths (line 58), or clarify why it's needed.

The api_v0/ directory does not exist in the repository, so the ignore entry in buildValidPaths has no effect. Either remove it for clarity, or add a comment explaining why it's retained for future reference.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check-broken-links.mjs` around lines 128 - 131, The IGNORED_DIRS set
in buildValidPaths currently includes an unused "api_v0" entry; either remove
"api_v0" from the IGNORED_DIRS definition inside the buildValidPaths function to
avoid dead configuration, or if you intend to keep it for future/compatibility
reasons, add a short comment next to the IGNORED_DIRS declaration explaining why
"api_v0" is retained; update the IGNORED_DIRS usage in buildValidPaths (and/or
the similar constant in scanMdxFiles if duplicated) so the code and intent are
consistent.

@qdequele
Member Author

qdequele commented Apr 7, 2026

> Hey @qdequele sounds like a good addition
>
> But I don't understand how it fixes #3540? My issue was probably not clear at all and I updated it to make it clearer: we need a basic checker of links that are displayed IN the content of our documentation. If the link redirects to a 404 page or an empty page, the link is considered broken, and we need to be aware of it. It has nothing to do with redirections. We have to check internal links (redirecting to any other content of our docs) and external links (redirecting to any other content)

Hey @curquiza, I updated the script to check for broken links in two steps:

  • first, checking all the internal links by extracting the URLs from the Markdown and comparing them against the existing pages in docs.json
  • second, checking all the external links by getting the last 90 days of traffic and verifying that every visited URL has a proper redirect.
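
The first step above amounts to a set-membership test once the links have been extracted. Here is a minimal sketch under stated assumptions: `normalizePath` and `findBrokenLinks` are hypothetical names, the trailing-slash handling is a guess at what a real checker would want, and the file names and line numbers in the sample are invented.

```javascript
// Hypothetical normalizer: drop a trailing slash so "/a/b/" and "/a/b" compare equal.
function normalizePath(p) {
  return p.length > 1 && p.endsWith("/") ? p.slice(0, -1) : p;
}

// `links` is [{ file, link, line }]; `validPaths` is a Set of known page paths.
function findBrokenLinks(links, validPaths) {
  return links.filter(({ link }) => !validPaths.has(normalizePath(link)));
}

// Sample data: the page path comes from a fix listed in this PR; the rest is invented.
const validPaths = new Set(["/resources/self_hosting/sharding/overview"]);
const links = [
  { file: "a.mdx", link: "/resources/self_hosting/sharding", line: 12 },
  { file: "b.mdx", link: "/resources/self_hosting/sharding/overview/", line: 3 },
];

for (const { file, line, link } of findBrokenLinks(links, validPaths)) {
  console.log(`${file}:${line} ${link}`); // a.mdx:12 /resources/self_hosting/sharding
}
```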

Question: I chose to go with one script and one CI workflow, but we could also have done it with two scripts and two CI workflows.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/check-missing-redirects.mjs`:
- Line 20: The CI threshold MIN_VIEWS is currently set to 1 and causes the job
to fail on any single live hit; change this by raising the fail threshold (e.g.,
set MIN_VIEWS to a higher number like 50) or better: introduce two constants
(REPORT_VIEWS = 1 for visibility and FAIL_VIEWS = 50 for CI blocking) and update
the logic that evaluates hits (the checks referenced around the current failure
logic) to use FAIL_VIEWS for failing the job while using REPORT_VIEWS for
non-blocking reports.
- Around line 381-383: The error message that references docs.json is outdated:
update the console.error call that uses sortedMissing to direct contributors to
the new redirects file (replace "docs.json" with the new redirects file name,
e.g., "redirects.json") while keeping the same error text and the
process.exit(1) behavior; locate the console.error in
scripts/check-missing-redirects.mjs that mentions sortedMissing and change only
the filename in the hint.
- Around line 224-249: The boost for reference pages currently runs even when
there are zero token matches and can make a /docs/reference/ page win
incorrectly; modify the loop in which you compute score for each page (using
tokenize, queryTokens, pageTokens, score, bestScore, bestPage, isRefPath,
docsPages) so that you only add the +0.5 boost when score > 0 (i.e., there was
at least one real token match) before comparing to bestScore; keep returning
null when bestScore === 0 and still strip the /docs prefix from bestPage as
before.
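
The third comment above can be illustrated with a small scoring sketch. It assumes a hypothetical tokenizer that splits paths on non-alphanumeric characters and ignores the shared `docs` prefix, and it treats any page under `/docs/reference/` as a reference page. The script's real `tokenize` and `isRefPath` may differ.

```javascript
// Hypothetical tokenizer: lowercase path words, ignoring the shared "docs"
// prefix so it never counts as a match.
function tokenize(p) {
  return p.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t && t !== "docs");
}

// Assumption: a "reference" page is any page under /docs/reference/.
const isRefPath = (page) => page.startsWith("/docs/reference/");

function suggestDestination(missingPath, docsPages) {
  const queryTokens = new Set(tokenize(missingPath));
  let bestScore = 0;
  let bestPage = null;
  for (const page of docsPages) {
    const pageTokens = new Set(tokenize(page));
    let score = 0;
    for (const token of pageTokens) {
      if (queryTokens.has(token)) score += 1;
    }
    // Boost reference pages only after at least one real token match.
    if (score > 0 && isRefPath(page)) score += 0.5;
    if (score > bestScore) {
      bestScore = score;
      bestPage = page;
    }
  }
  // No match at all: return null rather than an arbitrary page.
  return bestScore === 0 ? null : bestPage.replace(/^\/docs/, "");
}

const docsPages = ["/docs/reference/api/search", "/docs/capabilities/filtering/overview"];
console.log(suggestDestination("/docs/learn/filtering", docsPages)); // /capabilities/filtering/overview
console.log(suggestDestination("/docs/unknown", docsPages)); // null
```

Without the `score > 0` guard, the second call would return `/reference/api/search`, because the boost alone would lift that page above zero.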

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 647d4940-3a8d-40eb-baf8-096caa3b7962

📥 Commits

Reviewing files that changed from the base of the PR and between 85795d1 and ea42795.

📒 Files selected for processing (2)
  • config/redirects.json
  • scripts/check-missing-redirects.mjs

Comment thread scripts/check-missing-redirects.mjs Outdated
Comment thread scripts/check-missing-redirects.mjs
Comment thread scripts/check-missing-redirects.mjs
- Add 711 missing redirects (271 → 982 total) based on real Fathom Analytics traffic
- Split docs.json into config/redirects.json and config/navigation.json using Mintlify $ref to keep the main config file manageable
- Improve check-missing-redirects script: auto-suggest destinations via word matching, exit with code 1 on missing redirects, filter junk/bot URLs, resolve $ref when reading redirects

- Add scripts/check-broken-links.mjs: static checker that builds valid
  paths from local MDX files + OpenAPI-generated URLs, then scans all
  MDX files for internal links pointing to non-existent pages
- Add .github/workflows/check-broken-links.yml: runs on every PR
  targeting main, no secrets required
- Fix the 5 broken links found by the new checker:
  - /resources/self_hosting/sharding → sharding/overview (3 files)
  - /reference/api/settings/get-foreign-keys → get-foreignkeys
  - /capabilities/personalization/getting_started → getting_started/personalized_search

Change the minimum pageview threshold from 1 to 2 to reduce noise from
bot traffic and one-off mistyped URLs. Add redirects for the 7 URLs that
still have >= 2 views. Add junk URL filters for embedded protocols,
non-ASCII characters, and uppercase file extensions.
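
The threshold and the three junk filters named in this commit message can be sketched as follows. The regular expressions, the `isJunkPath` and `selectCandidates` names, and the `{ path, views }` entry shape are all assumptions made for illustration; the script's actual filters are not shown in this conversation.

```javascript
const MIN_VIEWS = 2; // threshold stated in the commit message

// Hypothetical junk detector covering the three filters named above.
function isJunkPath(path) {
  // Embedded protocol, e.g. /docs/https://example.com/x
  if (/[a-z][a-z0-9+.-]*:\/\//i.test(path)) return true;
  // Any character outside the ASCII range
  if (/[^\x00-\x7F]/.test(path)) return true;
  // Trailing file extension containing uppercase letters, e.g. .PHP
  const ext = /\.([A-Za-z0-9]+)$/.exec(path);
  if (ext && ext[1] !== ext[1].toLowerCase()) return true;
  return false;
}

// `entries` is [{ path, views }], an assumed shape for the analytics rows.
function selectCandidates(entries) {
  return entries.filter(({ path, views }) => views >= MIN_VIEWS && !isJunkPath(path));
}

// Sample data for illustration only.
const entries = [
  { path: "/docs/learn/filtering", views: 5 },
  { path: "/docs/learn/typo", views: 1 },
  { path: "/docs/https://example.com/x", views: 9 },
  { path: "/docs/caf\u00e9", views: 4 },
  { path: "/docs/index.PHP", views: 3 },
];

// Only /docs/learn/filtering survives both the threshold and the junk filters.
console.log(selectCandidates(entries).map((e) => e.path));
```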
@qdequele qdequele force-pushed the ci-check-missing-redirects branch from 87da7a7 to bd0458f Compare April 10, 2026 22:07
Fix 2 broken links in conversational search setup page:
- experimental features API endpoint
- chat settings API endpoint

Add 13 missing redirects for removed cloud platform pages and
other URLs with >= 2 views in the last 90 days.
@qdequele qdequele disabled auto-merge April 10, 2026 22:22
@qdequele qdequele merged commit 99dada8 into main Apr 10, 2026
3 of 4 checks passed
@qdequele qdequele deleted the ci-check-missing-redirects branch April 10, 2026 22:49