
fork: re-align with upstream rtk-ai/rtk v0.38.0#14

Open
princess-myra wants to merge 309 commits into main from fork/upstream-realign-v0.38.0

@princess-myra
Collaborator

What

Re-aligns the algolia fork with upstream rtk-ai/rtk v0.38.0 (300+
commits absorbed, last sync was v0.34.2 in March 2026).

This PR follows the re-fork strategy documented in CLAUDE.md
(restored in this same PR — see commit 7): instead of merging upstream
into main, the branch is created off upstream/master and our fork
patches are re-applied as small focused commits on top.

How to review

The PR diff against main is large (300+ commits absorbed = lots of
file moves, dep bumps, doc reorgs from upstream). That's not the diff
to look at.
Look at the diff against upstream/master:

git diff upstream/master..HEAD

This is the actual algolia-specific delta: 53 files, +401 / -1980
(net negative — telemetry deletion dominates).

The 8 commits (each independently reviewable)

| # | Commit | Why |
|---|--------|-----|
| 1 | fork: strip telemetry | No phone-home in algolia builds. Removes src/core/telemetry.rs, telemetry_cmd.rs, the Telemetry CLI variant, the maybe_ping() call, the ureq dep, the RTK_TELEMETRY_* env vars in release.yml, and all telemetry docs. |
| 2 | fork: skip rewrite for curl/wget pipes | Re-applied patch. Upstream's is_pipe_incompatible list still excludes curl/wget, so piping their output to jq/python would be broken by rtk curl's JSON compression. |
| 3 | fork: skip rewrite for shell function definitions | Re-applied patch. Upstream still has no guard against rewriting commands inside name() { … } bodies, which produces invalid rtk curl references at function-invocation time. Bug report preserved at bug-reports/2026-03-31-…md. |
| 4 | fork: bump version to 0.38.0-algolia.1 | Distinguishes this build from upstream's v0.38.0. |
| 5 | fork: tighten semgrep rules — network access banned | Removed the ureq carve-out (it had no caller anymore). Now any network primitive is a hard error. |
| 6 | fork: rebrand and purge upstream references | Sed-style rewrite of rtk-ai/rtk → algolia/rtk in install instructions, rtk-ai.app → algolia repo, removed Homebrew section / Star History / Core team blocks, deleted upstream's docs-pipeline-contract.md. CHANGELOG historical references intentionally untouched. |
| 7 | docs: restore Fork Policy + Upstream Catchup Procedure in CLAUDE.md | These sections were originally added in 653a29e (March 2026) but lost during a previous catchup. Restoring them prevents the next catchup from accidentally going merge-based. |
| 8 | fork: post-rebuild cleanup | Cargo.lock auto-pruned 387 lines of stale ureq transitive deps; #[allow(dead_code)] on 16 telemetry-orphan helpers in tracking.rs; cargo fmt fix in config.rs. |

Patches re-validated against upstream v0.38.0

Per the policy, every catchup re-tests fork patches against the new
base. Status of our 2 bug-fix patches:

  • curl/wget pipe rewrite skip: Upstream's tokenizer-based rewriter
    (src/discover/registry.rs) marks find and fd as
    pipe-incompatible. Curl and wget are not on that list. Reproducer
    from bug-reports/2026-03-31-curl-python3-not-found-in-shell-functions.md
    still fails on a clean upstream build → patch still required.
  • Shell function-definition skip: Upstream's rewrite_command() has
    no equivalent guard. Same reproducer also covers this case →
    patch still required.

Neither patch was made obsolete by upstream changes.

Verification

Built and tested locally on aarch64-apple-darwin with rustc 1.95.0
(matches CI's dtolnay/rust-toolchain@stable):

  • cargo build — passes; auto-pruned 387 lines of stale ureq
    transitive deps from Cargo.lock (now committed)
  • cargo test --all: 1679 passed, 0 failed, 6 ignored (the 6
    ignored are upstream's pre-existing #[ignore] tests)
  • All 8 fork-specific tests pass (4 curl/wget pipe + 4 shell-function
    detection)
  • cargo clippy --all-targets -- -D unsafe_code — passes (matches CI
    exactly)
  • cargo fmt --all --check — clean
  • Fork hygiene grep — clean (only historical CHANGELOG entry remains)

40 dead-code warnings remain. Verified that all 40 are present on a
clean upstream/master checkout — none introduced by this PR.

What I deliberately did NOT carry forward

  • src/parser/error.rs — was added in upstream commit edab4e1 and
    later dropped by upstream; was dead code (#[allow(dead_code)], no
    callers) in our main.
  • ROADMAP.md — algolia roadmap, but content was stale (talks about
    "establishing the fork as the standard if upstream remains inactive";
    upstream is very active). Dropped in line with the "thin fork"
    principle. Easy to restore in a follow-up if desired.
  • docs/TROUBLESHOOTING.md — algolia-specific install URLs, but
    upstream now has docs/guide/resources/troubleshooting.md at the
    new canonical location (rebranded with algolia URLs in commit 6).
    One troubleshooting doc instead of two.
  • docs/filter-workflow.md, docs/images/gain-dashboard.jpg,
    .claude/skills/performance.md (replaced by performance/SKILL.md
    upstream).

If any of these turn out to matter, they're trivial to add back in a
follow-up PR.

Why this approach (vs merging upstream into main)

A previous draft of this PR used a 3-way merge from upstream/master
into main, producing 193 files / +22,374 / −6,964. Throwing it
away and re-forking from upstream/master (this PR) produced
53 files / +401 / −1,980 — same end state, ~30× smaller diff
against upstream, every patch in its own commit.

More importantly, the re-fork forced explicit patch re-validation
(each fork patch had to be re-applied with a justification), instead
of letting them be carried forward by inertia. That matches the
documented policy and the admin's note that "bugfixes are often
superseded by origin fixes".

The Fork Policy + Upstream Catchup Procedure restored in commit 7
documents this so the next catchup doesn't have to rediscover the
strategy.

CI gate

  • cargo build
  • cargo clippy --all-targets
  • cargo test --all — including the 8 new fork tests in
    src/discover/registry.rs
  • cargo fmt --check
  • Fork hygiene grep returns zero (CHANGELOG historical refs
    excluded):
    rg -i 'brew install rtk[^-]|rtk-ai\.app|contact@rtk|"rtk 0\.\d+\.\d+"' --glob '*.md' --glob '*.rb' --glob '!CHANGELOG.md'

aeppling and others added 30 commits April 3, 2026 09:03
fix(git): inherit stdin for commit and push to preserve SSH signing (rtk-ai#733)
Also fixes flaky test_rewrite_rtk_disabled_warns_on_stderr: the subprocess part relied on target/debug/rtk being current, but cargo test doesn't rebuild the standalone binary.

Added mtime guard -> skips when binary is older than test executable.
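
The mtime guard described above can be sketched roughly as follows. This is a minimal illustration, not upstream's actual test code: the helper names `is_stale` and `binary_is_stale` are hypothetical, and the sketch assumes the guard simply compares the modification time of the standalone binary with that of the running test executable.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Pure comparison (hypothetical helper): the standalone binary counts as
/// stale when it was last modified before the test executable was built.
fn is_stale(binary_mtime: SystemTime, test_exe_mtime: SystemTime) -> bool {
    binary_mtime < test_exe_mtime
}

/// Hypothetical guard: reads both modification times from the filesystem.
/// A test would skip its subprocess part when this returns `Ok(true)`
/// or an error (for example, when the binary does not exist at all).
fn binary_is_stale(binary: &Path) -> io::Result<bool> {
    let binary_mtime = fs::metadata(binary)?.modified()?;
    let test_exe_mtime = fs::metadata(std::env::current_exe()?)?.modified()?;
    Ok(is_stale(binary_mtime, test_exe_mtime))
}
```

Keeping the comparison separate from the filesystem access makes the rule itself testable without touching real files.
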
fix(grep): close subprocess stdin to prevent memory leak (rtk-ai#897)
Signed-off-by: em0t <10153971+em0t@users.noreply.github.com>
Telemetry details are already documented in the README privacy section.
No need to print them during init.

Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
…rics

Add 6 new fields to the anonymous daily telemetry ping to help identify
which commands need filters and which filters need improvement:

- passthrough_top: top 5 commands with 0% savings (missing filters)
- parse_failures_24h: count of parse failures (filter fragility)
- low_savings_commands: commands averaging <30% savings (weak filters)
- avg_savings_per_command: unweighted average savings
- hook_type: which AI agent hook is installed (claude/gemini/codex/etc)
- custom_toml_filters: count of user-defined TOML filter files

New tracking.rs queries: top_passthrough(), parse_failures_since(),
low_savings_commands(), avg_savings_per_command().

Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
Extend the daily anonymous ping with product-piloting metrics:

Retention: first_seen_days, active_days_30d, commands_total
Ecosystem: ecosystem_mix (category distribution percentages)
Economics: tokens_saved_30d, estimated_savings_usd_30d
Config: has_config_toml, exclude_commands_count, projects_count
Features: meta_usage (gain, discover, proxy, verify, learn counts)

Update README.md privacy section with full field-by-field table
explaining what is collected and why it helps improve RTK.

Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
Comprehensive telemetry documentation covering:
- Why we collect (roadmap prioritization, filter quality, value measurement)
- How it works (daily ping, background thread, fire-and-forget)
- Every field with example values and purpose
- What is NOT collected (explicit exclusion list)
- Opt-out instructions
- Data handling and privacy guarantees
- Contributor guide for adding new fields

Link added from README.md privacy section.

Signed-off-by: Patrick szymkowiak <patrick.szymkowiak@innovtech.eu>
fix(go): preserve failing test location context
fix(git): preserve full diff hunk headers
fix(init): remove opt-out instruction from telemetry message
Signed-off-by: em0t <10153971+em0t@users.noreply.github.com>
fix(gh): pass through gh pr merge instead of canned response (rtk-ai#938)
Signed-off-by: em0t <10153971+em0t@users.noreply.github.com>
fix(go): avoid false build errors from download logs
- reset_all() now clears both `commands` and `parse_failures` atomically
  via execute_batch (fixes incomplete reset reported by reviewer)
- Add [y/N] confirmation prompt before --reset; mirrors init.rs pattern
- Add --yes flag to skip prompt in scripts/CI
- Add unit test: test_reset_all_clears_both_tables verifies both tables
  are empty after reset

https://claude.ai/code/session_015WXbDdxEU3pXPBtheDkXcd
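
As an illustration of the confirmation flow in the commit message above, here is a minimal sketch. It is not the code from gain.rs or init.rs: the function names, the prompt wording, and the `assume_yes` parameter (standing in for the --yes flag) are assumptions made for the example. The default answer is "no", matching the [y/N] convention.

```rust
use std::io::{self, BufRead, Write};

/// Only an explicit "y" or "yes" (any case) confirms; an empty answer
/// falls back to the default, which is "no" for a [y/N] prompt.
fn is_affirmative(answer: &str) -> bool {
    matches!(answer.trim().to_ascii_lowercase().as_str(), "y" | "yes")
}

/// Hypothetical helper: `assume_yes` models the --yes flag and skips the
/// prompt entirely, which is what scripts and CI need.
fn confirm_reset<R: BufRead, W: Write>(
    assume_yes: bool,
    input: &mut R,
    output: &mut W,
) -> io::Result<bool> {
    if assume_yes {
        return Ok(true);
    }
    // Prompt text is illustrative only.
    write!(output, "Reset all tracking data? [y/N] ")?;
    output.flush()?;
    let mut answer = String::new();
    input.read_line(&mut answer)?;
    Ok(is_affirmative(&answer))
}
```

Taking the reader and writer as parameters keeps the prompt testable with in-memory buffers instead of a real terminal.
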
Avoids raw ANSI codes when stdout is piped, consistent with all
other output in gain.rs.

https://claude.ai/code/session_015WXbDdxEU3pXPBtheDkXcd
Tracker::new() uses a shared on-disk SQLite DB. Running reset_all()
in a parallel test suite wipes records inserted by other concurrent
tests, causing intermittent failures.

Add Tracker::new_in_memory() (cfg(test) only) backed by an isolated
Connection::open_in_memory(), and switch test_reset_all_clears_both_tables
to use it. Extracted init_schema() helper to avoid duplicating DDL.

https://claude.ai/code/session_015WXbDdxEU3pXPBtheDkXcd
Signed-off-by: Nicolas Le Cam <niko.lecam@gmail.com>
niklasmarderx and others added 27 commits April 26, 2026 13:21
fix(stream): add semgrep flag for sh tests
hotfix(cicd): add git app token for release please
hotfix(cicd): add git app for release please token
hotfix(cicd): github app token for write actions
…master--components--rtk

chore(master): release 0.38.0
Algolia fork policy: no phone-home, no remote calls.

Removed:
- src/core/telemetry.rs (the daily ping)
- src/core/telemetry_cmd.rs (rtk telemetry CLI subcommands)
- TelemetryConfig struct + tests in src/core/config.rs
- Telemetry CLI variant + match arm in src/main.rs
- maybe_ping() call at the top of run_cli()
- prompt_telemetry_consent() / save_telemetry_consent() in hooks/init.rs
  and the call from rtk init -g
- ureq dep from Cargo.toml
- RTK_TELEMETRY_URL / RTK_TELEMETRY_TOKEN env vars from release.yml
- docs/TELEMETRY.md, docs/guide/resources/telemetry.md
- Privacy & Telemetry sections in README.md, DISCLAIMER.md,
  docs/usage/FEATURES.md, docs/usage/TRACKING.md,
  docs/guide/getting-started/configuration.md, docs/guide/index.md
- telemetry references in src/core/README.md, docs/contributing/*,
  SECURITY.md, .claude/rules/search-strategy.md
When curl or wget output is piped to jq/python/grep, downstream consumers
expect raw JSON/HTML/bytes. rtk's curl/wget filter compresses output and
breaks those pipes.

Extend the pipe-incompatible list (which already protected find/fd) to
include curl and wget. Without a pipe, both still get rewritten normally.

Verified against upstream/master v0.38.0: upstream's pipe-incompatible
list still excludes curl/wget, so this patch is still required.
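
The rule in the commit message above can be illustrated with a small sketch. It is not the tokenizer-based code in src/discover/registry.rs: the constant and the function `should_skip_rewrite` are hypothetical, and the naive split on `|` ignores quoting, which a real rewriter would have to handle.

```rust
/// Programs whose output must reach the next pipeline stage unmodified.
/// `find` and `fd` are the entries the commit message says were already
/// protected; `curl` and `wget` are the fork's additions.
const PIPE_INCOMPATIBLE: &[&str] = &["find", "fd", "curl", "wget"];

/// Hypothetical helper: true when the command pipes the output of a
/// pipe-incompatible program into another command.
fn should_skip_rewrite(command: &str) -> bool {
    // No pipe at all: the command is rewritten as usual.
    let Some((producer, rest)) = command.split_once('|') else {
        return false;
    };
    // `||` is a logical OR, not a pipe.
    if rest.starts_with('|') {
        return false;
    }
    // The first word of the producing stage is the program name.
    let program = producer.split_whitespace().next().unwrap_or("");
    PIPE_INCOMPATIBLE.contains(&program)
}
```

Under these assumptions, `curl -s https://example.com/api | jq .name` is left alone, while a bare `curl -s https://example.com/api` is still eligible for rewriting.
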
When the AI tool's bash command defines a shell function that uses curl,
python3, or other rewritable commands, rtk's rewriter rewrites them
inside the function body. The rewritten reference (e.g. `rtk curl`)
is then invalid at function-invocation time because rtk-as-a-rewriter
only intercepts top-level commands.

Detect three forms of function definition and bail early:
  - name() {       (POSIX form, single-line)
  - name() \n {    (POSIX form, multi-line)
  - function name  (bash/zsh keyword form)

Original bug report preserved at:
  bug-reports/2026-03-31-curl-python3-not-found-in-shell-functions.md

Verified against upstream/master v0.38.0: upstream has no equivalent
guard, so this patch is still required.
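
A rough sketch of how the three forms listed above could be detected. This is an illustration, not the fork's actual patch: `defines_shell_function` and `is_function_name` are hypothetical names, and the sketch only inspects the start of the command, so a definition that follows another statement (for example after `;`) would not be caught.

```rust
/// Conservative check for a plausible shell function name.
fn is_function_name(word: &str) -> bool {
    !word.is_empty()
        && word
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Hypothetical guard: true when the command starts with a function
/// definition in any of the three forms from the commit message.
fn defines_shell_function(command: &str) -> bool {
    let trimmed = command.trim_start();

    // Keyword form: `function name ...` (bash/zsh).
    if let Some(rest) = trimmed.strip_prefix("function") {
        if rest.starts_with(char::is_whitespace) {
            let token = rest.split_whitespace().next().unwrap_or("");
            // Drop a trailing `()` or `{` glued to the name.
            let name = token
                .split(|c: char| c == '(' || c == '{')
                .next()
                .unwrap_or("");
            return is_function_name(name);
        }
    }

    // POSIX form: `name()` followed by `{`, on the same line or the next.
    if let Some((name, rest)) = trimmed.split_once("()") {
        return is_function_name(name.trim_end()) && rest.trim_start().starts_with('{');
    }

    false
}
```

Because whitespace (including newlines) between `()` and `{` is skipped, the single-line and multi-line POSIX forms share one code path.
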
Distinguishes the algolia fork build from upstream rtk-ai/rtk v0.38.0.
The .release-please-manifest.json is updated to match.

Note: the lockfile still has stale ureq + transitive entries; cargo will
prune them on the first build (the rtk package no longer depends on
ureq after the telemetry strip).
Upstream's semgrep rules carved out an exception for ureq inside the
telemetry module. With telemetry stripped from the fork, that exception
is dead and confusing. Tighten the rules:

- raw-socket-usage / reqwest-forbidden: rephrase rationale to "network
  access is banned in this fork".
- ureq-outside-telemetry → ureq-forbidden: ureq has no legitimate
  caller in the fork, so any usage is now a hard error (not just
  "outside telemetry").
Apply the fork-hygiene table (CLAUDE.md):
- rtk-ai/rtk → algolia/rtk in install instructions, repo URLs, badges,
  CI links, hook scripts, install.sh, openclaw plugin metadata, hook
  fallback URLs, .rtk/filters.toml docs link
- rtk-ai.app website link → algolia/rtk repo URL or local doc reference
- contact@rtk-ai.app → #proj-internal-skills on Slack
- security@rtk-ai.app → security@rtk-ai.dev (existing fork convention)
- Hardcoded "rtk 0.28.2" → reference Cargo.toml version
- master branch references → main (algolia default branch)

Removals:
- README.md: Homebrew install section + badge (no algolia tap),
  Star History + StarMapper widgets (upstream community), upstream
  Core team listing, upstream Privacy & Telemetry section
- localized READMEs: rtk-ai.app website link in navbar
- .github/docs-pipeline-contract.md: upstream's docs website pipeline,
  doesn't apply to the fork

CHANGELOG.md is intentionally untouched — historical commit references
to rtk-ai/rtk are accurate attribution and don't violate the policy
(the rule bans upstream URLs in install instructions, not in history).

Source code references in src/* (test fixtures using "rtk-ai/rtk" as
test data, comments referencing upstream issue numbers) are also kept
as-is — they're not user-facing install paths.

Hygiene grep clean: only the historical CHANGELOG entry remains.
These sections were originally added in 653a29e (March 2026) but were
lost during a previous catchup. Restoring them prevents the next
catchup from accidentally reverting to merge-based sync (which
accumulates conflict-resolution noise and silently re-applies patches
upstream has fixed).

Sections added:
- Fork Policy (what we change / what we don't)
- Upstream Catchup Procedure (the re-fork strategy: branch off
  upstream/master, strip telemetry, rebrand, re-test bug-reports,
  bump version, then PR/force-push)

Sections preserved/restored:
- Fork Hygiene (Mandatory) — the pre-commit grep + banned patterns
  table that already existed in origin/main but is not in upstream
- Plan Execution Protocol — kept as-is from upstream
Three follow-ups discovered during cargo build/test verification:

- Cargo.lock: auto-pruned 387 lines of ureq + transitive deps
  (base64, displaydoc, form_urlencoded, icu_*, etc.) on first
  cargo build — the rtk package no longer depends on ureq after
  the telemetry strip in commit 1.

- src/core/tracking.rs: #[allow(dead_code)] on 16 telemetry-orphan
  helpers (count_commands_since, top_commands, overall_savings_pct,
  total_tokens_saved, tokens_saved_24h, top_passthrough,
  parse_failures_since, low_savings_commands, avg_savings_per_command,
  count_meta_command, first_seen_days, active_days_30d,
  commands_total, ecosystem_mix, tokens_saved_30d, projects_count)
  plus the free fn categorize_command. Matches the previous strip
  commit's policy ("Add #[allow(dead_code)] to tracking methods
  only called by telemetry").

- src/core/config.rs: cargo fmt fix — one stray blank line at the
  end of the tests module, left over from deleting the telemetry
  tests in commit 1.

Verification on this branch: 1679 tests pass, 0 failures, 0
unsafe_code clippy errors, cargo fmt clean.
@princess-myra princess-myra marked this pull request as draft May 7, 2026 16:28
@princess-myra
Collaborator Author

Once reviewed, we'll need to overwrite main with this branch:

RTK_DISABLED=1 git fetch origin
RTK_DISABLED=1 git checkout main
RTK_DISABLED=1 git reset --hard origin/fork/upstream-realign-v0.38.0
RTK_DISABLED=1 git push --force-with-lease origin main

@princess-myra princess-myra marked this pull request as ready for review May 7, 2026 16:32