feat(video): translate a shared video's captions to the learner's language (v1.5)#647
Open
mircealungu wants to merge 2 commits into
Open
feat(video): translate a shared video's captions to the learner's language (v1.5)#647mircealungu wants to merge 2 commits into
mircealungu wants to merge 2 commits into
Conversation
Tables to hold per-(video, target_language, target_cefr) translated subtitles for a shared video. Per-segment translation preserves the original Caption.time_start/time_end so the reader's timing/sync logic is unchanged — only the rendered text is in the learner's language. - caption_translation_set: the bundle, with status (pending/translating/ready/error) for the async job, error_message, and a UNIQUE(video_id, target_language_id, cefr_level) so a second request for the same target deduplicates instead of re-translating. - caption_translation: one row per original Caption inside a set, pointing at a NewText row for the translated content. UNIQUE(set_id, caption_id) so retried jobs resume cleanly. Mirrors the DailyAudioLesson ↔ DailyAudioLessonSegment shape already in the codebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…guage
Per the v1.5 plan: when a learner shares a YouTube video whose captions are in a different
language, offer to translate the captions to the learner's language at their CEFR level,
preserving the original per-segment timing so the existing interactive reader (tap-to-translate,
bookmarks, time-synced highlight) keeps working unchanged. Audio is unaffected; only the reading
surface changes.
- New service core/llm_services/caption_translation_service.translate_set(set_id):
batches ~30 captions per Haiku call with structured JSON output (numeric markers), falls
back to per-caption translation when a batch's parsing or alignment fails so partial LLM
failures degrade gracefully instead of zeroing the set. Reuses the existing haiku_client.
- New endpoints in api/endpoints/caption_translation.py:
- POST /video/<id>/translate_captions — find_or_create the set, kick off the background
job via run_in_background, return 202 + set dict. Idempotent.
- GET /video/<id>/translate_captions/status?set_id= — for the reader's polling loop.
- Extended /user_video to accept optional caption_set_id; when the set is ready and belongs
to the requested video, Video.video_info substitutes translated text + retokenises in the
target language. context_identifier still references the original caption id so bookmark
anchoring is stable across track switches. If the set isn't ready, we silently serve the
original captions — the reader's separate status poll drives the eventual refetch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



v1.5 of share-to-video — when a shared YouTube video's captions are in a different language than what the learner is studying, offer to translate them in place at the learner's CEFR level. Per-segment translation preserves the original
time_start/time_end, so the existing interactive reader (tap-to-translate, bookmarks, time-synced highlight) keeps working unchanged. Audio is unaffected; only the reading surface changes. This is the video analogue of the article share-flow's translate-and-adapt option.Pairs with zeeguu/web feat/translated-subtitles which adds the banner / progress / Original-Translated switcher to
VideoPlayer.js. Independently revertible from the upstream v1 PR #635.Changes
Commit 1 — data model + migration
caption_translation_set(UNIQUE(video_id, target_language_id, cefr_level)) holds the async job's status;caption_translation(one row per originalCaption) holds the translatedNewText. Timings stay on the parentCaptionso we don't duplicate them.DailyAudioLesson↔DailyAudioLessonSegmentpattern already in the codebase.Commit 2 — service + endpoints +
/user_videoextensioncore/llm_services/caption_translation_service.translate_set(set_id): batches ~30 captions per Haiku call with structured JSON output (numeric markers), falls back per-caption when a batch's parsing or alignment fails so partial LLM failures degrade gracefully (untranslated lines fall back to the original text in the reader, instead of zeroing the whole set). Reuses the existinghaiku_client.POST /video/<id>/translate_captions— idempotent find_or_create +run_in_background(translate_set, ...), returns202+ set dict.GET /video/<id>/translate_captions/status?set_id=— for the reader's polling loop.GET /user_videoto accept optionalcaption_set_id. When the set is ready and belongs to the requested video,Video.video_infosubstitutes translated text and retokenises in the target language.context_identifierstill references the original caption id so bookmark anchoring is stable across track switches. If the set isn't ready, we silently serve the original captions — the reader's status poll drives the eventual refetch (no 4xx during a known-async wait).Migration
tools/migrations/26-05-31-a--add_caption_translation.sql— creates both tables with the right FKs and unique keys.Out of scope (captured for later)
A more speculative idea — generating TTS audio in the learner's language over a muted YouTube embed — was analysed and deferred: see
docs/future-work/dubbed-audio-from-shared-video.mdfor the full feasibility + copyright write-up. Translated subtitles alone avoid the derivative-work question entirely and capture most of the UX win.Testing
from zeeguu.core.model import CaptionTranslationSet, CaptionTranslation.🤖 Generated with Claude Code