Add 10 preloaded datasets on first app start #670

Open
Irozuku wants to merge 6 commits into develop from feat/seed-datasets

Irozuku (Collaborator) commented Jun 3, 2026

Summary

Adds automatic dataset seeding on first app start. On the first run, DashAI copies 10 pre-processed datasets (Arrow format) from a bundled zip into the local datasets directory and registers them in the database (no user action required). Subsequent starts skip seeding via a .seeded sentinel file.


Type of Change

Check all that apply like this [x]:

  • Backend change
  • Frontend change
  • CI / Workflow change
  • Build / Packaging change
  • Bug fix
  • Documentation

Changes (by file)

  • DashAI/back/seeds/__init__.py: seeding logic - checks sentinel, extracts zip, copies dataset folders, inserts DB rows with FINISHED status
  • DashAI/back/seeds/manifest.json: maps dataset names to row/column counts used for DB registration
  • DashAI/back/seeds/seed_datasets.zip: 10 pre-processed datasets in Arrow format (ai-vs-human, languages, urdu-depression-data, students, spanish-mnist, food-menu, energy, possum, auditory-skills, cifar10-subset)
  • DashAI/back/app.py: calls seed_datasets_if_first_run() after DB migrations on startup
  • dashai.spec: includes DashAI/back/seeds in PyInstaller datas so the zip ships with the executable
  • MANIFEST.in: includes DashAI/back/seeds so the zip ships with the pip package
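The file-level part of the changes listed above (extract the bundled zip, copy each dataset folder, look up row/column counts in the manifest) could look roughly like the sketch below. This is not the PR's code. The helper name `extract_seed_datasets`, the manifest layout `{"<name>": {"rows": ..., "columns": ...}}`, and the idea that each top-level folder in the zip is one dataset are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_seed_datasets(zip_path: Path, manifest_path: Path, datasets_dir: Path) -> list:
    """Copy each dataset folder from the zip into datasets_dir.

    Hypothetical helper. Assumes the manifest maps a dataset name to
    {"rows": int, "columns": int} and that every top-level folder in
    the archive is one dataset.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    datasets_dir.mkdir(parents=True, exist_ok=True)
    records = []
    # Extract to a scratch directory first so a failed extraction never
    # leaves half-written files in the real datasets directory.
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        for folder in sorted(Path(tmp).iterdir()):
            # Ignore anything that is not a dataset listed in the manifest.
            if not folder.is_dir() or folder.name not in manifest:
                continue
            target = datasets_dir / folder.name
            shutil.copytree(folder, target, dirs_exist_ok=True)
            counts = manifest[folder.name]
            records.append(
                {
                    "name": folder.name,
                    "path": str(target),
                    "rows": counts["rows"],
                    "columns": counts["columns"],
                }
            )
    return records
```

The returned records carry what a DB registration step would need (name, path, counts); how DashAI actually inserts the rows is not shown in this description.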

Testing

  • Install via pip install on Windows, macOS and Linux, verify 10 datasets appear on first start
  • Run executable (PyInstaller build) on Windows, verify 10 datasets appear on first start
  • Run executable (PyInstaller build) on macOS, verify 10 datasets appear on first start
  • Restart app after first run, verify seeding is skipped (no duplicates)
  • Upload a dataset with the same name as a seed dataset before first run, verify no conflict

Notes

  • Seeding only runs once - sentinel file ~/.DashAI/.seeded is written after completion
  • If zip is missing (e.g. dev environment without assets), startup continues normally with a warning
  • Datasets with the same name already in the DB are skipped without touching the filesystem
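The three guards in these notes can be sketched as one function. This is only an illustration under stated assumptions: `seed_once`, the `dataset` table, and the use of `sqlite3` are hypothetical stand-ins for DashAI's real database layer, and the choice to leave the sentinel unwritten when the zip is missing is a guess, since the description does not say which way that goes.

```python
import logging
import sqlite3
from pathlib import Path

logger = logging.getLogger(__name__)


def seed_once(zip_path: Path, sentinel: Path, conn: sqlite3.Connection, seed_names: list) -> list:
    """Register seed datasets once; return the names added on this call.

    Hypothetical sketch: the `dataset` table and its columns are assumed.
    """
    if sentinel.exists():
        return []  # seeding already completed on an earlier start
    if not zip_path.exists():
        # Assumption: warn and continue startup without writing the sentinel.
        logger.warning("Seed archive %s not found; skipping seeding", zip_path)
        return []
    existing = {row[0] for row in conn.execute("SELECT name FROM dataset")}
    added = []
    for name in seed_names:
        if name in existing:
            continue  # name already in the DB: skip it, touch nothing on disk
        # Copying the dataset files would happen here in a fuller version.
        conn.execute(
            "INSERT INTO dataset (name, status) VALUES (?, ?)",
            (name, "FINISHED"),
        )
        added.append(name)
    conn.commit()
    # The sentinel is written only after every dataset has been handled.
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.touch()
    return added
```

Writing the sentinel last means an interrupted first run is retried on the next start, and the name check keeps that retry from creating duplicates.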

Irozuku added the enhancement (New feature or request) and back (Backend work) labels Jun 3, 2026