feat(video): translate a shared video's captions to the learner's language (v1.5) by mircealungu · Pull Request #647 · zeeguu/api

mircealungu · 2026-05-31T19:09:47Z

v1.5 of share-to-video — when a shared YouTube video's captions are in a different language than what the learner is studying, offer to translate them in place at the learner's CEFR level. Per-segment translation preserves the original time_start/time_end, so the existing interactive reader (tap-to-translate, bookmarks, time-synced highlight) keeps working unchanged. Audio is unaffected; only the reading surface changes. This is the video analogue of the article share-flow's translate-and-adapt option.

Pairs with the zeeguu/web branch feat/translated-subtitles, which adds the banner, progress indicator, and Original/Translated switcher to VideoPlayer.js. Independently revertible from the upstream v1 PR #635.

Changes

Commit 1 — data model + migration

caption_translation_set (UNIQUE(video_id, target_language_id, cefr_level)) holds the async job's status; caption_translation (one row per original Caption) holds the translated NewText. Timings stay on the parent Caption so we don't duplicate them.
Mirrors the DailyAudioLesson ↔ DailyAudioLessonSegment pattern already in the codebase.
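The deduplication-by-unique-key idea can be sketched with Python's built-in sqlite3, assuming a simplified schema. The column types, the plain text column standing in for the NewText foreign key, and the helpers find_or_create_set and untranslated_caption_ids are illustrative, not the actual migration or model code. SQLite's INSERT OR IGNORE plays the role of an idempotent find_or_create. A LEFT JOIN against caption_translation then shows how UNIQUE(set_id, caption_id) lets a retried job pick up only the captions it has not translated yet.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; the real migration
# targets the production database and may differ in types and extra columns.
SCHEMA = """
CREATE TABLE caption (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL,
    time_start REAL NOT NULL,
    time_end REAL NOT NULL,
    text TEXT NOT NULL
);
CREATE TABLE caption_translation_set (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL,
    target_language_id INTEGER NOT NULL,
    cefr_level TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending/translating/ready/error
    error_message TEXT,
    UNIQUE (video_id, target_language_id, cefr_level)
);
CREATE TABLE caption_translation (
    id INTEGER PRIMARY KEY,
    set_id INTEGER NOT NULL REFERENCES caption_translation_set(id),
    caption_id INTEGER NOT NULL REFERENCES caption(id),
    text TEXT NOT NULL,  -- stand-in for the FK to a NewText row
    UNIQUE (set_id, caption_id)
);
"""


def find_or_create_set(conn, video_id, language_id, cefr_level):
    """Return the id of the set for this target, creating it only if absent.

    The UNIQUE key makes a second request for the same
    (video, language, level) a no-op instead of a second translation job.
    """
    conn.execute(
        "INSERT OR IGNORE INTO caption_translation_set "
        "(video_id, target_language_id, cefr_level) VALUES (?, ?, ?)",
        (video_id, language_id, cefr_level),
    )
    row = conn.execute(
        "SELECT id FROM caption_translation_set "
        "WHERE video_id = ? AND target_language_id = ? AND cefr_level = ?",
        (video_id, language_id, cefr_level),
    ).fetchone()
    return row[0]


def untranslated_caption_ids(conn, set_id, video_id):
    """Captions of the video that have no translation row in this set yet."""
    rows = conn.execute(
        "SELECT c.id FROM caption c "
        "LEFT JOIN caption_translation t "
        "  ON t.caption_id = c.id AND t.set_id = ? "
        "WHERE c.video_id = ? AND t.id IS NULL ORDER BY c.id",
        (set_id, video_id),
    ).fetchall()
    return [r[0] for r in rows]
```

Keeping timings only on the parent caption row means the translation table never needs to be kept in sync with time_start/time_end.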

Commit 2 — service + endpoints + /user_video extension

core/llm_services/caption_translation_service.translate_set(set_id): batches ~30 captions per Haiku call with structured JSON output (numeric markers) and falls back to per-caption translation when a batch's parsing or alignment fails, so partial LLM failures degrade gracefully: untranslated lines fall back to the original text in the reader instead of zeroing the whole set. Reuses the existing haiku_client.
POST /video/<id>/translate_captions — idempotent find_or_create + run_in_background(translate_set, ...), returns 202 + set dict.
GET /video/<id>/translate_captions/status?set_id= — for the reader's polling loop.
Extended GET /user_video to accept optional caption_set_id. When the set is ready and belongs to the requested video, Video.video_info substitutes translated text and retokenises in the target language. context_identifier still references the original caption id so bookmark anchoring is stable across track switches. If the set isn't ready, we silently serve the original captions — the reader's status poll drives the eventual refetch (no 4xx during a known-async wait).
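A minimal sketch of the batch-then-fallback strategy, assuming a generic call_llm(prompt) -> str callable in place of the real haiku_client and a hypothetical prompt and reply format (a JSON object keyed by numeric markers). The helper names are illustrative. A batch reply is accepted only if it parses and its keys align one-to-one with the batch; otherwise each caption in that batch gets its own call, and a caption that still fails comes back as None so the reader can show the original line.

```python
import json
from typing import Callable, Optional

BATCH_SIZE = 30  # "~30 captions per Haiku call" per the PR description


def build_batch_prompt(texts: list[str]) -> str:
    # Numeric markers ("0", "1", ...) let us align answers back to captions.
    numbered = {str(i): t for i, t in enumerate(texts)}
    return ("Translate each value; reply with a JSON object using the same keys.\n"
            + json.dumps(numbered, ensure_ascii=False))


def parse_batch_reply(reply: str, expected: int) -> Optional[list[str]]:
    """Return translations in marker order, or None if bad JSON or misaligned."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != {str(i) for i in range(expected)}:
        return None  # missing, extra, or renamed markers: alignment failed
    values = [data[str(i)] for i in range(expected)]
    if not all(isinstance(v, str) and v.strip() for v in values):
        return None
    return values


def translate_captions(texts: list[str],
                       call_llm: Callable[[str], str]) -> list[Optional[str]]:
    """Translate in batches; on a failed batch, retry its captions one by one.

    The result is aligned with `texts`; None marks a caption whose
    translation failed even individually.
    """
    results: list[Optional[str]] = []
    for start in range(0, len(texts), BATCH_SIZE):
        batch = texts[start:start + BATCH_SIZE]
        try:
            parsed = parse_batch_reply(call_llm(build_batch_prompt(batch)), len(batch))
        except Exception:
            parsed = None
        if parsed is not None:
            results.extend(parsed)
            continue
        for text in batch:  # per-caption fallback for this batch only
            try:
                one = parse_batch_reply(call_llm(build_batch_prompt([text])), 1)
            except Exception:
                one = None
            results.append(one[0] if one else None)
    return results
```

Keeping failures local to a single batch means one malformed reply costs at most BATCH_SIZE extra calls rather than the whole set.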
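The /user_video substitution rule can be sketched as a pure function, assuming captions and the translation set arrive as plain dicts. CaptionView, captions_for_reader, and the dict keys are hypothetical stand-ins for Video.video_info internals, and the language field stands in for retokenising in the target language. Translated text is used only when the set is ready and belongs to the requested video; context_identifier is always the original caption id, which is what keeps bookmarks anchored across track switches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionView:
    context_identifier: int  # always the ORIGINAL caption id
    time_start: float        # timings are never taken from the translation
    time_end: float
    text: str
    language: str            # language the text should be tokenised in


def captions_for_reader(captions: list[dict], source_lang: str, video_id: int,
                        caption_set: Optional[dict] = None) -> list[CaptionView]:
    """Overlay translated text on the original captions when allowed.

    caption_set (hypothetical shape): {"video_id", "status", "language",
    "translations": {caption_id: text}}. A missing, unready, or foreign set
    silently yields the original captions (no error during the async wait).
    """
    use_set = (caption_set is not None
               and caption_set["status"] == "ready"
               and caption_set["video_id"] == video_id)
    views = []
    for c in captions:
        translated = caption_set["translations"].get(c["id"]) if use_set else None
        if translated:
            views.append(CaptionView(c["id"], c["time_start"], c["time_end"],
                                     translated, caption_set["language"]))
        else:  # untranslated line: fall back to the original text
            views.append(CaptionView(c["id"], c["time_start"], c["time_end"],
                                     c["text"], source_lang))
    return views
```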

Migration

tools/migrations/26-05-31-a--add_caption_translation.sql — creates both tables with the right FKs and unique keys.

Out of scope (captured for later)

A more speculative idea — generating TTS audio in the learner's language over a muted YouTube embed — was analysed and deferred: see docs/future-work/dubbed-audio-from-shared-video.md for the full feasibility + copyright write-up. Translated subtitles alone avoid the derivative-work question entirely and capture most of the UX win.

Testing

Compiles cleanly; models register via from zeeguu.core.model import CaptionTranslationSet, CaptionTranslation.
Not yet exercised end-to-end on a real video (no client wired in this PR). The companion web PR + a local run will close the loop.

🤖 Generated with Claude Code

Tables to hold per-(video, target_language, target_cefr) translated subtitles for a shared video. Per-segment translation preserves the original Caption.time_start/time_end so the reader's timing/sync logic is unchanged; only the rendered text is in the learner's language.

- caption_translation_set: the bundle, with status (pending/translating/ready/error) for the async job, error_message, and a UNIQUE(video_id, target_language_id, cefr_level) so a second request for the same target deduplicates instead of re-translating.
- caption_translation: one row per original Caption inside a set, pointing at a NewText row for the translated content. UNIQUE(set_id, caption_id) so retried jobs resume cleanly.

Mirrors the DailyAudioLesson ↔ DailyAudioLessonSegment shape already in the codebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…guage

Per the v1.5 plan: when a learner shares a YouTube video whose captions are in a different language, offer to translate the captions to the learner's language at their CEFR level, preserving the original per-segment timing so the existing interactive reader (tap-to-translate, bookmarks, time-synced highlight) keeps working unchanged. Audio is unaffected; only the reading surface changes.

- New service core/llm_services/caption_translation_service.translate_set(set_id): batches ~30 captions per Haiku call with structured JSON output (numeric markers), falls back to per-caption translation when a batch's parsing or alignment fails so partial LLM failures degrade gracefully instead of zeroing the set. Reuses the existing haiku_client.
- New endpoints in api/endpoints/caption_translation.py:
  - POST /video/<id>/translate_captions: find_or_create the set, kick off the background job via run_in_background, return 202 + set dict. Idempotent.
  - GET /video/<id>/translate_captions/status?set_id=: for the reader's polling loop.
- Extended /user_video to accept optional caption_set_id; when the set is ready and belongs to the requested video, Video.video_info substitutes translated text + retokenises in the target language. context_identifier still references the original caption id so bookmark anchoring is stable across track switches. If the set isn't ready, we silently serve the original captions; the reader's separate status poll drives the eventual refetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T19:10:37Z

ArchLens detected architectural changes in the following views:

mircealungu and others added 2 commits May 31, 2026 21:08

mircealungu mentioned this pull request May 31, 2026

feat(video): translated subtitles control on shared-video reader (v1.5) zeeguu/web#1155

Open

mircealungu wants to merge 2 commits into master from feat/translated-subtitles
