Last Updated: 2025-11-15
Current Phase: Phase 4 Complete - Core Application Functional
SERVERLESS ARCHITECTURE: This is a fully serverless application with NO backend server. All resources, progress tracking, vocabulary data, and user settings are stored in the browser (LocalStorage/IndexedDB) and persist across browser sessions.
Smart Nihongo Learner is a fully functional Japanese vocabulary learning app that teaches words through collocations (word pairs) using spaced repetition. The app is deployed as a static web application with all data stored in the browser.
- 1,342 unique JLPT words (drawn from 703 N5 + 663 N4 entries) with frequency data
- Pre-computed collocation database with 1,000+ word pairs
- 4 game modes (Verb→Noun, Adjective→Noun, Noun→Verb, Noun→Adjective)
- Intelligent SRS algorithm with 6-tier priority system
- Context hints for every collocation
- Progress tracking with maturity levels
- Study list filtering by JLPT level (N5 only, or N5 + N4 combined as N54)
Status: 100% Complete
Completed Items:
- ✅ Vocabulary collection: 703 N5 + 663 N4 entries, 1,342 unique words once combined
- ✅ Manual grammatical type classification for all words
- ✅ Frequency data integration using wordfreq library (100% coverage)
- ✅ CSV files created: N5.csv, N4.csv, N54.csv
- ✅ Collocation database generation (1,000+ pairs)
- ✅ Context hints created for forward mode (verb/adj → noun)
- ✅ Context hints created for reverse mode (noun → verb/adj)
- ✅ Data validation and quality assurance
Files Created:
- data-preparation/input/N5.csv - 703 words with frequency
- data-preparation/input/N4.csv - 663 words with frequency
- data-preparation/input/N54.csv - 1,342 combined unique words
- public/data/vocabulary.json - Full vocabulary database
- public/data/collocations_complete.json - Collocation pairs
- public/data/collocation_hints.json - Forward mode hints
- public/data/reverse_hints.json - Reverse mode hints
Python Scripts:
- scrape_kanshudo_frequency.py - Routledge 5000 scraper
- add_wordfreq_data.py - Frequency integration
- frequency_summary.py - Statistical analysis
- fix_hint_quotes.py - JSON escaping fixes (fixed 101+ malformed entries)
Status: 100% Complete
Completed Items:
- ✅ React 18.3.1 + Vite project initialized
- ✅ Material UI 6.3.0 with dark theme configured
- ✅ Storage service implemented (IndexedDB via Dexie + LocalStorage)
- ✅ Data models created (Vocabulary, Collocation, Progress)
- ✅ Data loader service with cache-busting for hints
- ✅ Furigana display component
- ✅ Japanese text input component (system IME)
- ✅ API key management (encrypted storage)
Key Files:
- src/App.jsx - Main application component with routing
- src/services/storage.js - Browser-based persistence
- src/services/dataLoader.js - Load vocabulary and hints
- src/components/ui/FuriganaText.jsx - Furigana rendering
- src/components/ui/AnswerInput.jsx - Japanese input handling
- src/theme.js - Material UI dark theme
Technical Achievements:
- All data persists across browser sessions (including after the browser is closed)
- Encrypted API key storage using browser fingerprint
- Cache-busting for updated hints files
- IndexedDB for large datasets, LocalStorage for settings
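The cache-busting item above can be illustrated with a minimal sketch. It is not the app's actual dataLoader.js: the helper names `withCacheBust` and `loadHints` and the `v` query parameter are assumptions, and the path assumes Vite serves files under public/ from the site root.

```javascript
// Hypothetical helper: append a version token so the browser treats the
// URL as new and refetches an updated hints file instead of using its cache.
function withCacheBust(url, version = Date.now()) {
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}v=${encodeURIComponent(version)}`;
}

// Hypothetical loader built on the standard browser fetch() API.
async function loadHints(path = "/data/collocation_hints.json") {
  const response = await fetch(withCacheBust(path));
  if (!response.ok) {
    throw new Error(`Failed to load hints: ${response.status}`);
  }
  return response.json();
}
```

A timestamp token forces a refetch on every load; a build hash or data version string would let unchanged files stay cached.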
Status: 100% Complete
Completed Items:
- ✅ Anki-style SM-2 spaced repetition algorithm implemented
- ✅ Word-level progress tracking (reviewCount, correctCount, interval)
- ✅ Collocation-level progress tracking (correct/incorrect pairs)
- ✅ 6-tier priority system for word selection:
  - Failed (interval=0 or correctCount=0) - Highest priority
  - Learning (correctCount < 3 or interval < 3)
  - Due (past nextReview date)
  - Young (level < 5)
  - Mature (level ≥ 5)
  - New (never practiced) - Lowest priority
- ✅ Maturity levels based on consecutive correct answers:
  - Learning: <3 consecutive correct
  - Young: 3-5 consecutive correct
  - Mature: 6-10 consecutive correct
  - Mastered: >10 consecutive correct
- ✅ Statistics tracking (total reviews, correct answers, mastery breakdown)
- ✅ SRS statistics count words in ANY context (main word OR collocation answer)
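The six priority tiers listed above could be expressed roughly as follows. This is a sketch under assumptions, not the code in collocation.js: the record shape (`reviewCount`, `interval`, `correctCount`, `level`, `nextReview` as a millisecond timestamp) and the function names are hypothetical, and a missing record is treated as "new".

```javascript
// Tier names in descending priority, mirroring the list in this document.
const TIERS = ["failed", "learning", "due", "young", "mature", "new"];

// Classify one progress record. `progress` is undefined for unseen words.
function priorityTier(progress, now = Date.now()) {
  if (!progress || progress.reviewCount === 0) return "new";
  if (progress.interval === 0 || progress.correctCount === 0) return "failed";
  if (progress.correctCount < 3 || progress.interval < 3) return "learning";
  if (progress.nextReview <= now) return "due";
  return progress.level < 5 ? "young" : "mature";
}

// Order candidate words so higher-priority tiers come first.
// Array.prototype.sort is stable, so ties keep their input order.
function sortByPriority(words, progressById, now = Date.now()) {
  const rank = (word) => TIERS.indexOf(priorityTier(progressById[word.id], now));
  return [...words].sort((a, b) => rank(a) - rank(b));
}
```

Checking the tiers in this fixed order means a word that is both "learning" and "due" is reported as "learning", which matches the ordering of the list.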
Key Files:
- src/services/srs.js - SM-2 algorithm implementation
- src/services/collocation.js - Word selection logic with SRS priority
  - getRecommendedPracticeWords() - 6-tier priority selection
  - getSRSStatisticsForLevel() - Statistics with any-context counting
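For readers unfamiliar with SM-2, here is a sketch of the textbook update step. It is not a copy of srs.js: the field names `repetitions` and `easeFactor` and the 0-5 `quality` grade are assumptions taken from the standard algorithm, and the app's variant likely differs (for example, the tier list treats interval=0 as failed, whereas textbook SM-2 resets the interval to 1 day).

```javascript
// Textbook SM-2 update. `quality` is a 0-5 grade; below 3 counts as a lapse.
function sm2Update(card, quality) {
  const q = Math.max(0, Math.min(5, quality));
  let { repetitions = 0, interval = 0, easeFactor = 2.5 } = card;

  if (q < 3) {
    // Lapse: restart the repetition sequence.
    repetitions = 0;
    interval = 1;
  } else {
    repetitions += 1;
    if (repetitions === 1) interval = 1;
    else if (repetitions === 2) interval = 6;
    else interval = Math.round(interval * easeFactor);
  }

  // Ease factor moves with the grade and never drops below 1.3.
  const delta = 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02);
  easeFactor = Math.max(1.3, easeFactor + delta);

  return { repetitions, interval, easeFactor };
}
```

The function returns a new object rather than mutating `card`, which keeps it easy to store the result in a progress table.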
Technical Achievements:
- Words practiced as main word OR in collocations both count toward mastery
- Priority system ensures struggling words appear more frequently
- Consecutive correct count (not total) determines maturity
- Main word reviewCount takes precedence over collocation counts
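The distinction between consecutive and total correct answers can be made concrete with a small sketch. The function names are hypothetical; the thresholds are the ones listed under the maturity levels in this section.

```javascript
// Map a streak of consecutive correct answers to a maturity label.
function maturityLevel(consecutiveCorrect) {
  if (consecutiveCorrect > 10) return "Mastered";
  if (consecutiveCorrect >= 6) return "Mature";
  if (consecutiveCorrect >= 3) return "Young";
  return "Learning";
}

// A wrong answer resets the streak; a correct one extends it.
function nextStreak(streak, wasCorrect) {
  return wasCorrect ? streak + 1 : 0;
}

// Example: four correct answers overall, but the streak was broken once.
const answers = [true, true, true, false, true];
const streak = answers.reduce(nextStreak, 0);
console.log(streak, maturityLevel(streak));
```

With a total-based rule the word in the example would already count as Young; with the streak-based rule it falls back to Learning after the miss.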
Status: 100% Complete
Completed Items:
- ✅ Game component fully implemented with all modes
- ✅ 4 game modes supported:
  - Forward: Verb→Noun, Adjective→Noun
  - Reverse: Noun→Verb, Noun→Adjective
- ✅ Answer validation with reading/kanji matching
- ✅ Furigana display for all Japanese text
- ✅ Context hints system (forward and reverse)
- ✅ Skip functionality for difficult words
- ✅ Results screen with detailed breakdown:
  - Correct answers (green chips)
  - Skipped words (yellow chips with hints)
  - Bonus words discovered (blue chips)
- ✅ Progress tracking integration (word + collocation level)
- ✅ Study list filtering by JLPT level (N5 only, or N5 + N4 combined as N54)
- ✅ Game completion logic (found + skipped = total)
- ✅ Smart duplicate reading handling (prioritizes unfound words)
- ✅ Reset SRS functionality (clear all progress)
Key Files:
- src/components/games/WhatCouldMatch.jsx - Main game component
- src/services/collocation.js - Collocation matching logic
  - getLimitedNounMatchesWithProgress() - Get matches with SRS filtering
  - getLimitedVerbOrAdjectiveMatches() - Reverse mode matching
Game Flow:
- SRS selects word based on 6-tier priority
- Game loads collocation matches filtered by study list
- User enters answers using system Japanese IME
- Answer matching checks exact kanji, then reading (prioritizing unfound)
- Hints displayed for current word
- User can skip words they don't know
- Results show correct, skipped, and bonus words
- Progress updated for word and all collocation pairs
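The answer-matching step in the flow above (exact kanji first, then reading, preferring words not yet found) might look like the sketch below. The candidate shape `{ kanji, reading }` and the name `matchAnswer` are assumptions, and any kana normalization the app performs with Wanakana is omitted.

```javascript
// Return the candidate the typed answer should be credited to, or null.
// `foundKanji` is a Set of kanji forms already marked as found this round.
function matchAnswer(input, candidates, foundKanji) {
  const answer = input.trim();

  // 1. An exact kanji match always wins.
  const exact = candidates.find((c) => c.kanji === answer);
  if (exact) return exact;

  // 2. Otherwise match by reading, preferring a word not yet found so
  //    homophones are credited one at a time instead of all at once.
  const sameReading = candidates.filter((c) => c.reading === answer);
  return sameReading.find((c) => !foundKanji.has(c.kanji)) ?? sameReading[0] ?? null;
}

// Example with two homophones read "はし".
const candidates = [
  { kanji: "橋", reading: "はし" },
  { kanji: "箸", reading: "はし" },
];
const found = new Set();
found.add(matchAnswer("はし", candidates, found).kanji); // credits 橋
found.add(matchAnswer("はし", candidates, found).kanji); // credits 箸
```

Falling back to `sameReading[0]` when every homophone is already found lets the caller detect a repeated answer rather than treating it as wrong.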
Technical Achievements:
- Removed wanakana auto-conversion (users use system IME)
- Cache-busting for hints ensures latest data loaded
- Study list filtering applied to ALL collocation matches
- Smart reading matching prevents duplicate marks
- Game completion checks found + skipped (not just found)
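Two of the points above, study list filtering and the completion check, are simple enough to sketch together. The `level` field on each match, the `STUDY_LISTS` table, and both function names are assumptions made for illustration.

```javascript
// Which JLPT levels each study list includes (N54 = N5 and N4 combined).
const STUDY_LISTS = { N5: ["N5"], N54: ["N5", "N4"] };

// Keep only the collocation matches whose level is in the active study list.
function filterByStudyList(matches, studyList) {
  const allowed = STUDY_LISTS[studyList] ?? [];
  return matches.filter((m) => allowed.includes(m.level));
}

// A round ends once every target is either found or skipped.
function isRoundComplete(targets, found, skipped) {
  return found.size + skipped.size >= targets.length;
}
```

Counting skipped words alongside found ones is what lets the results screen appear even when the player cannot recall every match.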
Status: Not Implemented
Planned Features:
- Sentence generation using OpenAI API
- Blank insertion for target word
- Answer validation against collocation database
- Template sentences as fallback
- SRS integration
- ✅ Fixed: Count words in any context (main OR collocation)
- ✅ Fixed: Maturity based on consecutive correct (not total practice)
- ✅ Fixed: Main word reviewCount takes precedence over collocation counts
- ✅ Implemented: 6-tier SRS priority system
- ✅ Fixed: Game completion logic (found + skipped = total)
- ✅ Fixed: Duplicate readings marking multiple words
- ✅ Fixed: N5 filtering applied to collocation matches
- ✅ Fixed: Bonus words appearing twice on results page
- ✅ Removed: Wanakana auto-conversion (use system IME)
- ✅ Removed: Confusing collocation line from skipped words
- ✅ Fixed: 101+ JSON escaping issues in hints files
- ✅ Fixed: Hint truncation from malformed quotes
- ✅ Added: Cache-busting for hints files
- ✅ Fixed: Compilation errors from empty else blocks
- ✅ Updated: Maturity level labels reflect correct-count system
- ✅ Added: Extensive debug logging for game flow
- ✅ Improved: Continue button functionality
- ✅ Fixed: Results page not appearing at game end
- React 18.3.1 with hooks (useState, useEffect)
- Vite 6.0.3 for dev server and building
- Material UI 6.3.0 with dark theme
- Wanakana 5.3.1 for answer matching only (not input conversion)
LocalStorage:
├── openai_key (encrypted)
└── No other localStorage usage
IndexedDB (via Dexie):
├── vocabulary (id, japanese, type, frequency)
│ └── 1,342 words loaded on first run
├── collocations (++id, word, type, matches)
│ └── 1,000+ pairs loaded on first run
├── wordProgress (wordId, level, nextReview, correctCount)
│ └── Tracks SRS for main words
├── collocationProgress (pairId, correct, incorrect)
│ └── Tracks performance for word pairs
└── settings (key, value)
    └── Statistics and user preferences
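The IndexedDB layout above maps naturally onto Dexie's schema strings. The sketch below assumes the fields in parentheses are the primary key followed by indexed fields; the real storage.js may index fewer fields, and the database name in the comment is a guess.

```javascript
// Schema in Dexie's stores() syntax: the first entry of each string is the
// primary key, "++id" means auto-incremented, the rest are indexed fields.
const schema = {
  vocabulary: "id, japanese, type, frequency",
  collocations: "++id, word, type, matches",
  wordProgress: "wordId, level, nextReview, correctCount",
  collocationProgress: "pairId, correct, incorrect",
  settings: "key, value",
};

// With Dexie installed, the schema would be registered roughly like this:
//   const db = new Dexie("SmartNihongoLearner"); // name is an assumption
//   db.version(1).stores(schema);
```

Only fields used in queries need to be listed in a Dexie schema; other properties are still stored with each record.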
App.jsx
├── Loads vocabulary from IndexedDB (or initializes from JSON)
├── Loads collocations from IndexedDB (or initializes from JSON)
├── Loads hints from JSON (cached in memory)
├── Calculates SRS statistics
└── Renders WhatCouldMatch game component
WhatCouldMatch.jsx
├── Receives word from SRS selection
├── Loads collocation matches filtered by study list
├── Displays word with furigana
├── Accepts user answers (system IME)
├── Validates answers (exact kanji → reading → unfound priority)
├── Shows hints for current word
├── Tracks found/skipped/bonus words
├── Shows results when found + skipped = total
└── Updates progress in IndexedDB
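The "load from IndexedDB, or initialize from JSON" step in App.jsx can be sketched as one helper. `loadOrInitialize` and `fetchSeed` are hypothetical names; `table` is anything exposing Dexie's `toArray()` and `bulkPut()` table methods.

```javascript
// Return the table's rows, seeding it from JSON on the first run.
async function loadOrInitialize(table, fetchSeed) {
  const existing = await table.toArray();
  if (existing.length > 0) return existing; // subsequent loads: no network

  const seed = await fetchSeed(); // first run: e.g. fetch vocabulary.json
  await table.bulkPut(seed);
  return seed;
}
```

This would explain the timing gap noted in this document: the first run pays for the JSON download and bulk insert, and later runs read straight from IndexedDB.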
- Initial page load: ~1-2 seconds
- Data initialization (first run): ~3-4 seconds (loading 1,342 words + 1,000+ collocations)
- Subsequent loads: <1 second (from IndexedDB)
- Game word selection: <100ms
- Answer validation: <50ms
- vocabulary.json: ~200KB
- collocations_complete.json: ~150KB
- collocation_hints.json: ~100KB
- reverse_hints.json: ~80KB
- IndexedDB storage: ~2-3MB (includes progress)
- No Fill the Blanks game yet - Phase 5 not implemented
- OpenAI integration optional - API key not required for core gameplay
- No audio pronunciation - Text-only learning
- No export/import progress - Manual backup not available yet
- Desktop-focused - Mobile UX could be improved
- Single study list - Can't create custom word lists yet
- ✅ Documentation updates (COMPLETE)
- Testing and bug fixes based on user feedback
- UI/UX polish and refinements
- Implement sentence generation (OpenAI or templates)
- Create blank insertion logic
- Build answer validation
- Integrate with collocation database
- Add SRS tracking
- Export/import progress functionality
- Audio pronunciation integration
- Custom word list creation
- Mobile app development (React Native)
- Community features (shared lists)
- Additional JLPT levels (N3, N2, N1)
- Node.js 16.x or higher
- npm 7.x or higher
- Modern browser (Chrome, Firefox, Safari, Edge)
git clone https://github.com/yourusername/SmartNihongoLearner.git
cd SmartNihongoLearner
npm install
npm run dev # Starts on http://localhost:5173
npm run dev -- --port 3000 # Custom port
npm run build # Creates dist/ folder
npm run preview # Preview production build

- wordfreq Library - Multi-corpus frequency data
- JLPT Official Word Lists - N5 and N4 vocabulary
- Kanshudo Routledge Collection - Japanese frequency database
- SM-2 Algorithm - Spaced repetition
- Material UI - React component library
- Dexie.js - IndexedDB wrapper
- Wanakana - Japanese text utilities
- Virtual environment: /home/alessandro/.virtualenvs/SmartNihongoLearner
- Dependencies: requests, beautifulsoup4, wordfreq[cjk]
a73f1e8 - Fix homophone collision in answer matching
422ec44 - Add reverse game modes, reset SRS button, and comprehensive hint system
1f2555f - Fix critical bugs and improve error handling throughout the app
b819c8e - Implement What Could Match game with full gameplay flow
e218793 - Implement core services and data models for serverless architecture
- Alessandro - Project creator and main developer
- Claude Code - AI assistant for implementation
Project Status: ✅ Core Application Functional
The app is fully functional, with vocabulary learning through collocations, SRS-based word selection, 4 game modes, progress tracking, and a comprehensive hint system. It is ready for testing and user feedback.