Replace MaterialReactTable with lean native tables for large dataset performance #669

Merged
cristian-tamblay merged 11 commits into develop from feat/lean-dataset
Jun 5, 2026

Conversation

@Irozuku (Collaborator) commented Jun 1, 2026

Summary

Replaces MaterialReactTable (MRT) in all dataset-facing tables with lightweight native HTML implementations. MRT mounts hundreds of React fiber nodes and Emotion CSS-in-JS rules per cell - on datasets with many columns this caused multi-second freezes when switching datasets, scrolling, and adding manual prediction rows. The new tables use plain <table>/<td> elements with a single static CSS stylesheet, eliminating per-cell React/Emotion overhead while keeping all features: server-side pagination, sorting, filtering with operator selectors, column visibility, column rename, encoder selector, search highlighting, and CSV export.
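
To illustrate the search highlighting mentioned above, here is a minimal sketch of how a cell's text could be split into matched and unmatched segments before the matches are wrapped in <mark>. The function names (`splitByMatch`, `toMarkedHtml`) and the case-insensitive, literal matching are assumptions made for illustration; the actual LeanCell.jsx renders React elements rather than an HTML string and may differ.

```javascript
// Split `text` into segments so that matches can be wrapped in <mark>.
// Assumption: matching is literal (no regex syntax) and case-insensitive.
// Caveat: toLowerCase() can change string length for a few Unicode
// characters; this sketch ignores that edge case.
function splitByMatch(text, query) {
  const source = String(text);
  if (!query) return [{ text: source, match: false }];

  const haystack = source.toLowerCase();
  const needle = query.toLowerCase();
  const segments = [];
  let cursor = 0;

  while (cursor < source.length) {
    const hit = haystack.indexOf(needle, cursor);
    if (hit === -1) break;
    if (hit > cursor) {
      segments.push({ text: source.slice(cursor, hit), match: false });
    }
    segments.push({ text: source.slice(hit, hit + needle.length), match: true });
    cursor = hit + needle.length;
  }
  if (cursor < source.length) {
    segments.push({ text: source.slice(cursor), match: false });
  }
  return segments;
}

// Escape the characters that are unsafe inside HTML text content.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render the segments as an HTML string, wrapping matches in <mark>.
function toMarkedHtml(text, query) {
  return splitByMatch(text, query)
    .map((seg) =>
      seg.match ? `<mark>${escapeHtml(seg.text)}</mark>` : escapeHtml(seg.text)
    )
    .join("");
}
```

Because the split is a pure function of the cell text and the query, it can be memoized per cell and only re-run when the debounced search term changes.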


Type of Change

  • Backend change
  • Frontend change
  • CI / Workflow change
  • Build / Packaging change
  • Bug fix
  • Documentation

Changes (by file)

New shared component - shared/leanDatasetTable/

  • LeanDatasetTable.jsx: orchestrator - pagination, sort, filter, search, debounced highlights, column visibility, rename, export state.
  • LeanHeaderCell.jsx: sticky header cell with sort arrows, double-click rename input, type label, encoder chip.
  • LeanCell.jsx: plain <td> body cell with search match highlighting via <mark>.
  • LeanFilterCell.jsx: per-column filter input with operator selector (equals, between, contains, starts/ends with, empty/notEmpty) - one <Menu> per column, only mounted on click.
  • LeanEncoderChip.jsx: encoder selector wired to PATCH /dataset/{id}/columns/{name}/encoder.
  • EncoderChipBase.jsx: presentational chip + menu - self-contained anchor state so clicking it does not re-render sibling headers. Shared by LeanEncoderChip and PreviewDatasetTable.
  • LeanToolbar.jsx: export button (left), column visibility icon, filter toggle, search bar (right). Memoized with stable callbacks.
  • LeanColumnsMenu.jsx: show/hide columns dropdown.
  • leanDatasetTable.css: single static stylesheet - no per-cell Emotion styles.
  • operators.js: filter operator constants + backend operator mapping.
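
The operator constants and backend mapping described for operators.js could look roughly like the sketch below. The backend operator names (`eq`, `starts_with`, `is_null`, ...), the `arity` field, and the shape of the filter object are all hypothetical; only the UI-side operator names come from this PR.

```javascript
// Hypothetical operator table: UI operator ids mapped to backend names.
// `arity` is the number of input values the operator needs.
const OPERATORS = Object.freeze({
  equals: { backend: "eq", arity: 1 },
  contains: { backend: "contains", arity: 1 },
  startsWith: { backend: "starts_with", arity: 1 },
  endsWith: { backend: "ends_with", arity: 1 },
  between: { backend: "between", arity: 2 },
  empty: { backend: "is_null", arity: 0 },
  notEmpty: { backend: "not_null", arity: 0 },
});

// Build the filter object sent to the server for one column.
// Returns null while the user has not yet typed enough values.
function toBackendFilter(column, operatorId, values = []) {
  if (!Object.prototype.hasOwnProperty.call(OPERATORS, operatorId)) {
    throw new Error(`Unknown filter operator: ${operatorId}`);
  }
  const op = OPERATORS[operatorId];
  const filter = { column, op: op.backend };
  if (op.arity === 0) return filter;

  // Ignore blank inputs so a half-filled "between" is not sent.
  const given = values.filter((v) => v !== "" && v !== null && v !== undefined);
  if (given.length < op.arity) return null;

  if (op.arity === 1) {
    filter.value = given[0];
  } else {
    filter.values = given.slice(0, op.arity);
  }
  return filter;
}
```

Keeping the mapping in one frozen table means the filter cell, the i18n labels, and the request builder all read from the same list of operator ids.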

Refactored callers

  • DatasetTable.jsx: reduced from ~620 lines to ~55. Now a thin passthrough to LeanDatasetTable that maps existing props. All callers unchanged.
  • PreviewDatasetTable.jsx (notebook creation): replaced MRT with plain <table>. Type selector is a memoized plain <select>. Column rename uses plain <input>. Encoder chip uses EncoderChipBase. All callbacks stabilized with useCallback.
  • DatasetPreviewTable.jsx (legacy upload modal): replaced MRT with plain <table>. Row rebuild is now triggered only by previewData changes, not by every columnsSpec update. handleChange stabilized with useCallback + refs.
  • ManualInputForm.jsx: replaced MUI Table/TableCell with plain <table>/<td> - eliminates per-cell Emotion cost on wide datasets.
  • InputField.jsx: replaced MUI TextField/Select with plain <input>/<select>. Added custom styled dropdown arrow. Wrapped in React.memo. handleChange stabilized in parent via useCallback.
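
The reason stabilized callbacks matter for React.memo can be shown without React itself. The sketch below is an assumption-laden illustration, not the code in this PR: `updateCell` is a hypothetical pure updater of the kind passed to a functional setState, and `shallowEqual` mirrors the shallow prop comparison React.memo performs by default.

```javascript
// Pure updater suitable for setRows((prev) => updateCell(prev, ...)).
// It reuses every untouched row object, so memoized cells in other
// rows receive identical props and can skip re-rendering.
function updateCell(rows, rowIndex, column, value) {
  const current = rows[rowIndex];
  if (!current || current[column] === value) return rows; // nothing changed
  const next = rows.slice();
  next[rowIndex] = { ...current, [column]: value };
  return next;
}

// Shallow comparison of two props objects using Object.is per key.
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(b, key) && Object.is(a[key], b[key])
  );
}

// Inside a component this would typically be wired up as:
//   const handleChange = useCallback(
//     (rowIndex, column, value) =>
//       setRows((prev) => updateCell(prev, rowIndex, column, value)),
//     []
//   );
// The empty dependency list is possible because the updater reads the
// previous rows from its argument instead of closing over state.
```

If `handleChange` were recreated on every render, the `onChange` prop would differ on each pass and the shallow comparison would fail for every cell, which defeats the memoization.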

i18n

  • locales/{en,es,de,pt}/datasets.json: added datasets:table.* keys covering all user-visible strings in the new table UI (toolbar, filter operators, sort tooltips, rename messages, error snackbars).
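
When the same keys are added to four locale files, a small parity check can catch a key that was missed in one language. The sketch below is one possible way to do that; the helper names and the sample keys are hypothetical and are not taken from the actual datasets.json files.

```javascript
// Flatten a nested translation object into dotted key paths.
function flattenKeys(obj, prefix = "") {
  return Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    const isNested =
      value !== null && typeof value === "object" && !Array.isArray(value);
    return isNested ? flattenKeys(value, path) : [path];
  });
}

// List the keys present in `reference` but absent from `candidate`.
function missingKeys(reference, candidate) {
  const present = new Set(flattenKeys(candidate));
  return flattenKeys(reference).filter((key) => !present.has(key));
}

// Hypothetical sample data, standing in for two locale files.
const en = {
  table: {
    toolbar: { export: "Export", search: "Search" },
    operators: { equals: "Equals" },
  },
};
const es = {
  table: {
    toolbar: { export: "Exportar" },
    operators: { equals: "Igual a" },
  },
};

console.log(missingKeys(en, es)); // [ 'table.toolbar.search' ]
```

Running the check in both directions also reveals stale keys that exist only in a translated file.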

Testing

  • Open a dataset with many columns (>50). Switch to another - should respond in under 100ms instead of freezing.
  • Scroll a wide dataset preview - smooth, no jank.
  • Verify sort (click column header arrows), filters (funnel icon, per-column operator picker), search highlight, column rename (double-click), encoder selector (Categorical columns), export button.
  • Open manual prediction with many input columns and add a row - it should be instant.

Notes

  • DatasetTable prop surface is unchanged - editableColumns, baseBackgroundColor, showBorder are accepted but silently ignored (the lean table doesn't need them). They can be removed in a follow-up cleanup.
  • EncoderChipBase is shared between the lean table and the preview tables. Any styling change to the chip should be made there.
  • The lean-th-type > span:first-child CSS selector is intentional - it dims the type label text but not the encoder chip's Tooltip wrapper span, which was previously getting the same opacity by accident.
  • Tables that don't come from a real backend dataset (e.g. PreviewDatasetTable, DatasetPreviewTable, ManualInputForm) keep their own implementations. They are controlled inputs with a bounded, small column count so MRT overhead was never an issue there.
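
The first note above, about props that are accepted but ignored, can be pictured with a small sketch. The helper name `toLeanTableProps` and its return shape are hypothetical; only the three ignored prop names come from this PR.

```javascript
// Props that DatasetTable still accepts but the lean table does not use.
const IGNORED_PROPS = ["editableColumns", "baseBackgroundColor", "showBorder"];

// Separate the props to forward from the ones that are dropped.
// Returning the ignored names makes it easy to log them during the
// follow-up cleanup and find the call sites that still pass them.
function toLeanTableProps(props) {
  const forwarded = {};
  const ignored = [];
  for (const [name, value] of Object.entries(props)) {
    if (IGNORED_PROPS.includes(name)) {
      ignored.push(name);
    } else {
      forwarded[name] = value;
    }
  }
  return { forwarded, ignored };
}
```

A passthrough component could spread `forwarded` onto the lean table and, in development builds only, warn about anything listed in `ignored`.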

Irozuku added 9 commits June 1, 2026 14:53
Adds datasets:table.* namespace covering all user-visible strings in the
new LeanDatasetTable components: toolbar labels (show/hide columns,
filters, export, search), filter operator names (equals, between,
greater/less than, contains, starts/ends with, empty/notEmpty), sort
tooltips, column rename messages, and error snackbars.
Covers English, Spanish, German and Portuguese.

Replaces MaterialReactTable with a native <table> implementation that
avoids per-cell React fiber + Emotion CSS-in-JS overhead. On wide datasets
(100+ columns) MaterialReactTable froze the main thread for several seconds
on dataset switch because it mounted thousands of MUI component trees
synchronously; native <td> cells are painted by the browser without JS.

Component breakdown (shared/leanDatasetTable/):
- LeanDatasetTable  orchestrator: pagination, sort, filter, search state
- LeanHeaderCell    sticky header cell with sort arrows and encoder chip
- LeanCell          body cell with text highlight on search
- LeanFilterCell    per-column filter input with operator selector menu
- LeanEncoderChip   Categorical encoder picker (one-hot / label)
- LeanColumnNameEditor  inline rename input with conflict validation
- LeanToolbar       export, column-visibility, filter toggle, search
- LeanColumnsMenu   show/hide columns dropdown
- operators.js      filter operator definitions + backend mapping
- leanDatasetTable.css  single static stylesheet (no runtime CSS-in-JS)

Features: server-side pagination/sort/filter, column visibility,
per-column filter operators (equals/between/contains/etc.), text search
with in-cell highlighting, column rename, encoder selector, CSV export
with active-filter detection, i18n via datasets:table.* keys.

DatasetTable was ~620 lines of MaterialReactTable wiring: column
definitions, client-side filter state, session-storage persistence,
server-side pagination, export logic, EditableColumnHeader per column.
All of that is now provided by LeanDatasetTable, which implements the
same feature set without per-cell React/Emotion overhead.

DatasetTable is now a 55-line passthrough that maps its existing prop
surface (fetchPage, deps, datasetId, datasetPath, columnTypes, etc.)
to LeanDatasetTable. All call-sites remain unchanged.

Removed: EditableColumnHeader, MRT config, client-side filter/sort
state machine, inline export handler, unused imports (useTheme,
useTranslation, useMaterialReactTable, EditableColumnHeader, etc.).

MUI TableCell mounts an Emotion-styled React component per cell. With
many input columns every row change triggered a full re-render of all
those styled cells. Replacing Table/TableHead/TableRow/TableCell with
plain <table>/<thead>/<tr>/<td> using inline styles eliminates the
per-cell Emotion overhead while keeping InputField untouched. Visual
output is identical.

InputField called useTheme() on every cell, subscribing each to the theme
context. With 100 columns that meant 100 theme subscriptions and
commonStyles object re-creation on every render.

Changes:
- Removed useTheme from InputField; sx props resolve theme tokens natively.
- Wrapped InputField in React.memo so unchanged cells skip re-render.
- useCallback + functional setState on handleChange so the callback
  reference is stable across row updates, letting React.memo bail out.

MUI TableCell, TextField and Select each mount several nested components.
With many columns, adding one row caused hundreds of synchronous component
mounts, freezing the main thread.

Changes:
- ManualInputForm: replaced Table/TableRow/TableCell with plain <table>/<tr>/<td>
  using inline styles computed from the theme. Eliminates per-cell Emotion cost.
- InputField: replaced TextField/Select with plain <input>/<select>. Added
  custom styled dropdown arrow for the categorical selector. Wrapped in
  React.memo to skip re-renders for unchanged cells.
- Stabilized handleChange with useCallback + functional setState so React.memo
  actually bails out when only one cell value changes.

Extracts a shared EncoderChipBase component (self-contained anchor state)
so toggling the encoder menu no longer re-renders sibling column headers.

- EncoderChipBase (shared/leanDatasetTable/): presentational chip + menu,
  accepts encoder, onSelect, encoderLabel. Used by both LeanEncoderChip
  (wraps with API call) and PreviewDatasetTable (wraps with local callback).
- LeanEncoderChip: refactored to use EncoderChipBase; removes duplicated
  chip/menu JSX.
- PreviewDatasetTable (notebook creation): replaced MaterialReactTable with
  plain <table>/<td>. Type selector is a memoized plain <select>. Rename
  input is a plain <input>. All callbacks wrapped in useCallback. Encoder
  chip uses EncoderChipBase directly.
- DatasetPreviewTable (legacy upload): replaced MaterialReactTable with
  plain <table>/<td>. Row rebuild no longer triggered by columnsSpec
  changes (only previewData). handleChange stabilized with useCallback +
  refs to avoid stale closure and keep TypeSelect memo effective.

@Irozuku added the bug and front labels Jun 1, 2026
@Irozuku Irozuku marked this pull request as draft June 2, 2026 14:35
@Irozuku Irozuku marked this pull request as ready for review June 2, 2026 16:49
Base automatically changed from feat/compute-metadata-toggle to develop June 5, 2026 16:38
@cristian-tamblay cristian-tamblay merged commit 17861a0 into develop Jun 5, 2026
19 checks passed
@cristian-tamblay cristian-tamblay deleted the feat/lean-dataset branch June 5, 2026 16:46