Skip to content

Latest commit

 

History

History
389 lines (307 loc) · 13.7 KB

File metadata and controls

389 lines (307 loc) · 13.7 KB

Project Status - Smart Nihongo Learner

Last Updated: 2025-11-15 Current Phase: Phase 4 Complete - Core Application Functional

SERVERLESS ARCHITECTURE This is a fully serverless application with NO backend server. All resources, progress tracking, vocabulary data, and user settings are stored in the browser (LocalStorage/IndexedDB) and persist across browser sessions.


Project Overview

Smart Nihongo Learner is a fully functional Japanese vocabulary learning app that teaches words through collocations (word pairs) using spaced repetition. The app is deployed as a static web application with all data stored in the browser.

Current Capabilities

  • 1,342 JLPT words (703 N5 + 663 N4) with frequency data
  • Pre-computed collocation database with 1,000+ word pairs
  • 4 game modes (Verb→Noun, Adjective→Noun, Noun→Verb, Noun→Adjective)
  • Intelligent SRS algorithm with 6-tier priority system
  • Context hints for every collocation
  • Progress tracking with maturity levels
  • Study list filtering by JLPT level (N5 or N54)

Implementation Status

✅ Phase 1: Data Preparation (COMPLETE)

Status: 100% Complete

Completed Items:

  • ✅ Vocabulary collection: 703 N5 + 663 N4 = 1,342 unique words
  • ✅ Manual grammatical type classification for all words
  • ✅ Frequency data integration using wordfreq library (100% coverage)
  • ✅ CSV files created: N5.csv, N4.csv, N54.csv
  • ✅ Collocation database generation (1,000+ pairs)
  • ✅ Context hints created for forward mode (verb/adj → noun)
  • ✅ Context hints created for reverse mode (noun → verb/adj)
  • ✅ Data validation and quality assurance

Files Created:

  • data-preparation/input/N5.csv - 703 words with frequency
  • data-preparation/input/N4.csv - 663 words with frequency
  • data-preparation/input/N54.csv - 1,342 combined unique words
  • public/data/vocabulary.json - Full vocabulary database
  • public/data/collocations_complete.json - Collocation pairs
  • public/data/collocation_hints.json - Forward mode hints
  • public/data/reverse_hints.json - Reverse mode hints

Python Scripts:

  • scrape_kanshudo_frequency.py - Routledge 5000 scraper
  • add_wordfreq_data.py - Frequency integration
  • frequency_summary.py - Statistical analysis
  • fix_hint_quotes.py - JSON escaping fixes (fixed 101+ malformed entries)

✅ Phase 2: Core Infrastructure (COMPLETE)

Status: 100% Complete

Completed Items:

  • ✅ React 18.3.1 + Vite project initialized
  • ✅ Material UI 6.3.0 with dark theme configured
  • ✅ Storage service implemented (IndexedDB via Dexie + LocalStorage)
  • ✅ Data models created (Vocabulary, Collocation, Progress)
  • ✅ Data loader service with cache-busting for hints
  • ✅ Furigana display component
  • ✅ Japanese text input component (system IME)
  • ✅ API key management (encrypted storage)

Key Files:

  • src/App.jsx - Main application component with routing
  • src/services/storage.js - Browser-based persistence
  • src/services/dataLoader.js - Load vocabulary and hints
  • src/components/ui/FuriganaText.jsx - Furigana rendering
  • src/components/ui/AnswerInput.jsx - Japanese input handling
  • src/theme.js - Material UI dark theme

Technical Achievements:

  • All data persists across browser sessions (even when shut down)
  • Encrypted API key storage using browser fingerprint
  • Cache-busting for updated hints files
  • IndexedDB for large datasets, LocalStorage for settings

✅ Phase 3: SRS System (COMPLETE)

Status: 100% Complete

Completed Items:

  • ✅ Anki-style SM-2 spaced repetition algorithm implemented
  • ✅ Word-level progress tracking (reviewCount, correctCount, interval)
  • ✅ Collocation-level progress tracking (correct/incorrect pairs)
  • 6-tier priority system for word selection:
    • Failed (interval=0 or correctCount=0) - Highest priority
    • Learning (correctCount < 3 or interval < 3)
    • Due (past nextReview date)
    • Young (level < 5)
    • Mature (level ≥ 5)
    • New (never practiced) - Lowest priority
  • Maturity levels based on consecutive correct answers:
    • Learning: <3 consecutive correct
    • Young: 3-5 consecutive correct
    • Mature: 6-10 consecutive correct
    • Mastered: >10 consecutive correct
  • ✅ Statistics tracking (total reviews, correct answers, mastery breakdown)
  • ✅ SRS statistics count words in ANY context (main word OR collocation answer)

Key Files:

  • src/services/srs.js - SM-2 algorithm implementation
  • src/services/collocation.js - Word selection logic with SRS priority
    • getRecommendedPracticeWords() - 6-tier priority selection
    • getSRSStatisticsForLevel() - Statistics with any-context counting

Technical Achievements:

  • Words practiced as main word OR in collocations both count toward mastery
  • Priority system ensures struggling words appear more frequently
  • Consecutive correct count (not total) determines maturity
  • Main word reviewCount takes precedence over collocation counts

✅ Phase 4: Game 1 - "What Could Match" (COMPLETE)

Status: 100% Complete

Completed Items:

  • ✅ Game component fully implemented with all modes
  • 4 game modes supported:
    • Forward: Verb→Noun, Adjective→Noun
    • Reverse: Noun→Verb, Noun→Adjective
  • ✅ Answer validation with reading/kanji matching
  • ✅ Furigana display for all Japanese text
  • ✅ Context hints system (forward and reverse)
  • ✅ Skip functionality for difficult words
  • ✅ Results screen with detailed breakdown:
    • Correct answers (green chips)
    • Skipped words (yellow chips with hints)
    • Bonus words discovered (blue chips)
  • ✅ Progress tracking integration (word + collocation level)
  • Study list filtering by JLPT level (N5 or N54)
  • ✅ Game completion logic (found + skipped = total)
  • ✅ Smart duplicate reading handling (prioritizes unfound words)
  • ✅ Reset SRS functionality (clear all progress)

Key Files:

  • src/components/games/WhatCouldMatch.jsx - Main game component
  • src/services/collocation.js - Collocation matching logic
    • getLimitedNounMatchesWithProgress() - Get matches with SRS filtering
    • getLimitedVerbOrAdjectiveMatches() - Reverse mode matching

Game Flow:

  1. SRS selects word based on 6-tier priority
  2. Game loads collocation matches filtered by study list
  3. User enters answers using system Japanese IME
  4. Answer matching checks exact kanji, then reading (prioritizing unfound)
  5. Hints displayed for current word
  6. User can skip words they don't know
  7. Results show correct, skipped, and bonus words
  8. Progress updated for word and all collocation pairs

Technical Achievements:

  • Removed wanakana auto-conversion (users use system IME)
  • Cache-busting for hints ensures latest data loaded
  • Study list filtering applied to ALL collocation matches
  • Smart reading matching prevents duplicate marks
  • Game completion checks found + skipped (not just found)

❌ Phase 5: Game 2 - "Fill the Blanks" (NOT STARTED)

Status: Not Implemented

Planned Features:

  • Sentence generation using OpenAI API
  • Blank insertion for target word
  • Answer validation against collocation database
  • Template sentences as fallback
  • SRS integration

Recent Bug Fixes & Improvements

Statistics & Progress Tracking

  • ✅ Fixed: Count words in any context (main OR collocation)
  • ✅ Fixed: Maturity based on consecutive correct (not total practice)
  • ✅ Fixed: Main word reviewCount takes precedence over collocation counts
  • ✅ Implemented: 6-tier SRS priority system

Game Mechanics

  • ✅ Fixed: Game completion logic (found + skipped = total)
  • ✅ Fixed: Duplicate readings marking multiple words
  • ✅ Fixed: N5 filtering applied to collocation matches
  • ✅ Fixed: Bonus words appearing twice on results page
  • ✅ Removed: Wanakana auto-conversion (use system IME)
  • ✅ Removed: Confusing collocation line from skipped words

Data Quality

  • ✅ Fixed: 101+ JSON escaping issues in hints files
  • ✅ Fixed: Hint truncation from malformed quotes
  • ✅ Added: Cache-busting for hints files
  • ✅ Fixed: Compilation errors from empty else blocks

UI/UX

  • ✅ Updated: Maturity level labels reflect correct-count system
  • ✅ Added: Extensive debug logging for game flow
  • ✅ Improved: Continue button functionality
  • ✅ Fixed: Results page not appearing at game end

Current Architecture

Frontend Stack

  • React 18.3.1 with hooks (useState, useEffect)
  • Vite 6.0.3 for dev server and building
  • Material UI 6.3.0 with dark theme
  • Wanakana 5.3.1 for answer matching only (not input conversion)

Storage Architecture

LocalStorage:
├── openai_key (encrypted)
└── No other localStorage usage

IndexedDB (via Dexie):
├── vocabulary (id, japanese, type, frequency)
│   └── 1,342 words loaded on first run
├── collocations (++id, word, type, matches)
│   └── 1,000+ pairs loaded on first run
├── wordProgress (wordId, level, nextReview, correctCount)
│   └── Tracks SRS for main words
├── collocationProgress (pairId, correct, incorrect)
│   └── Tracks performance for word pairs
└── settings (key, value)
    └── Statistics and user preferences

Data Flow

App.jsx
├── Loads vocabulary from IndexedDB (or initializes from JSON)
├── Loads collocations from IndexedDB (or initializes from JSON)
├── Loads hints from JSON (cached in memory)
├── Calculates SRS statistics
└── Renders WhatCouldMatch game component

WhatCouldMatch.jsx
├── Receives word from SRS selection
├── Loads collocation matches filtered by study list
├── Displays word with furigana
├── Accepts user answers (system IME)
├── Validates answers (exact kanji → reading → unfound priority)
├── Shows hints for current word
├── Tracks found/skipped/bonus words
├── Shows results when found + skipped = total
└── Updates progress in IndexedDB

Performance Metrics

Load Times

  • Initial page load: ~1-2 seconds
  • Data initialization (first run): ~3-4 seconds (loading 1,342 words + 1,000+ collocations)
  • Subsequent loads: <1 second (from IndexedDB)
  • Game word selection: <100ms
  • Answer validation: <50ms

Data Sizes

  • vocabulary.json: ~200KB
  • collocations_complete.json: ~150KB
  • collocation_hints.json: ~100KB
  • reverse_hints.json: ~80KB
  • IndexedDB storage: ~2-3MB (includes progress)

Known Limitations

  1. No Fill the Blanks game yet - Phase 5 not implemented
  2. OpenAI integration optional - API key not required for core gameplay
  3. No audio pronunciation - Text-only learning
  4. No export/import progress - Manual backup not available yet
  5. Desktop-focused - Mobile UX could be improved
  6. Single study list - Can't create custom word lists yet

Next Steps

Immediate Priorities

  1. ✅ Documentation updates (COMPLETE)
  2. Testing and bug fixes based on user feedback
  3. UI/UX polish and refinements

Phase 5: Fill the Blanks Game

  1. Implement sentence generation (OpenAI or templates)
  2. Create blank insertion logic
  3. Build answer validation
  4. Integrate with collocation database
  5. Add SRS tracking

Future Enhancements

  • Export/import progress functionality
  • Audio pronunciation integration
  • Custom word list creation
  • Mobile app development (React Native)
  • Community features (shared lists)
  • Additional JLPT levels (N3, N2, N1)

Development Environment

Prerequisites

  • Node.js 16.x or higher
  • npm 7.x or higher
  • Modern browser (Chrome, Firefox, Safari, Edge)

Installation

git clone https://github.com/yourusername/SmartNihongoLearner.git
cd SmartNihongoLearner
npm install

Running Locally

npm run dev              # Starts on http://localhost:5173
npm run dev -- --port 3000  # Custom port

Building

npm run build            # Creates dist/ folder
npm run preview          # Preview production build

Resources & References

Data Sources

Technical Documentation

Python Environment

  • Virtual environment: /home/alessandro/.virtualenvs/SmartNihongoLearner
  • Dependencies: requests, beautifulsoup4, wordfreq[cjk]

Recent Commits

a73f1e8 - Fix homophone collision in answer matching
422ec44 - Add reverse game modes, reset SRS button, and comprehensive hint system
1f2555f - Fix critical bugs and improve error handling throughout the app
b819c8e - Implement What Could Match game with full gameplay flow
e218793 - Implement core services and data models for serverless architecture

Contributors

  • Alessandro - Project creator and main developer
  • Claude Code - AI assistant for implementation

Project Status: ✅ Core Application Functional

The app is fully functional with vocabulary learning through collocations, SRS-based word selection, 4 game modes, progress tracking, and comprehensive hints system. Ready for testing and user feedback.