Replace dead wikitext s3 link in quicktour with HF dataset mirror #2064
Open
adityasingh2400 wants to merge 1 commit into
The wikitext-103-raw-v1.zip URL in docs/source-doc-builder/quicktour.mdx has 301-redirected to a dead host since the research.metamind.io takedown; PR huggingface#1846 fixed the related blog link but not the actual download flow. Switch the example to pull the dataset from Salesforce/wikitext on the Hub so the quicktour runs end-to-end again.
Fixes huggingface#1625
Fixes huggingface#1683
Summary
The wget URL in docs/source-doc-builder/quicktour.mdx points at
s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip,
which has been dead since the research.metamind.io takedown. It now
301-redirects to a host that no longer responds (or returns 403 directly
on the S3 path, as reported in #1683). Anyone following the quicktour
hits the dead link on the very first command.
PR #1846 fixed the related Salesforce blog link in the surrounding prose
but not the actual download command, so both #1625 and #1683 are still
open.
This swaps the URL to the canonical wikitext-103-raw-v1.zip mirrored
on the Hub at huggingface.co/datasets/mattdangerw/wikitext-103-raw.
The archive contents and layout are identical, so the existing
unzip wikitext-103-raw-v1.zip step and all downstream paths
(data/wikitext-103-raw/wiki.{train,test,valid}.raw in test_quicktour.py,
test_pipeline.py, and documentation.rs) keep working without further
changes.
Verification
The Hub redirects to its CDN and serves the full 192 MB zip with the
expected filename.
Fixes #1625
Fixes #1683