Develop #221

Merged: YWHyuk merged 203 commits into master from develop on Apr 25, 2026

@YWHyuk (Collaborator) commented Apr 7, 2026

Changelog: develop → master

TOGSim (simulator)

  • Memory backend: updated to Ramulator 2.1.
  • Config format: Configuration files have migrated from JSON to YAML format.
  • Stats & robustness: Clearer DRAM bandwidth reporting, safer idle-stat handling, fixes for local/remote memory stats.
  • Scheduling: Internal graph API cleanup (non-breaking, no user-facing API changes). Trace files support comments; improved CLI help.
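
For the JSON-to-YAML config migration listed above, existing config files need a one-time conversion. The following is a minimal, standard-library-only sketch of such a converter, not a tool shipped with this PR. It assumes the YAML configs keep the same keys and nesting as the old JSON files, which this changelog does not confirm; `to_yaml` and `convert_file` are hypothetical helper names. It relies on the fact that JSON scalars are also valid YAML scalars, so keys and values are emitted with `json.dumps`.

```python
import json


def to_yaml(value, indent=0):
    """Render JSON-compatible data as block-style YAML text."""
    pad = "  " * indent
    if isinstance(value, dict) and value:
        lines = []
        for key, item in value.items():
            name = json.dumps(str(key))  # a JSON string is a valid YAML scalar
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{name}:")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{name}: {json.dumps(item)}")
        return "\n".join(lines)
    if isinstance(value, list) and value:
        lines = []
        for item in value:
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}-")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
        return "\n".join(lines)
    # Scalars and empty containers: JSON syntax is already valid YAML.
    return pad + json.dumps(value)


def convert_file(json_path, yaml_path):
    """Read a JSON config and write the same data as YAML (paths are caller-supplied)."""
    with open(json_path, encoding="utf-8") as src:
        data = json.load(src)
    with open(yaml_path, "w", encoding="utf-8") as dst:
        dst.write(to_yaml(data) + "\n")
```

Keys come out double-quoted, which is valid YAML but less tidy than hand-written files. If the repository's regenerated YAML configs differ from the old JSON in key names or structure, those regenerated files should be treated as the reference instead.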

Compiler & runtime (PyTorchSim / MLIR)

  • PyTorch version: 2.1 → 2.8 (PyTorch version update #196)
  • Operators: SDPA can now be routed to a dedicated NPU kernel via the torch.nn.attention.sdpa_kernel([SDPBackend.FLASH_ATTENTION]) context manager; TopK, bitonic sort, and Cat were added. ([BUG] Support for repeat_interleave operation to enable Grouped Query Attention (GQA) #198)
  • CNNs: Added MobileNet CI; 1×1 spatial convolutions are handled as linear; added a baseline group convolution decomposition with tests. ([BUG] Cannot schedule MobileNet-SSDLite model #205)
  • Dtypes / codegen: Fixed float16 codegen in MLIR templates; worked around a gem5 lmul8 widening issue by avoiding the problematic vector width in codegen.
  • TOGSim session: Run kernels inside a with TOGSimulator(config_path=...): block, so that config and simulator lifecycle are scoped to the block.
  • Multi-tenant launch: Call torch.npu.launch_model(opt_fn, *args, stream_index=..., timestamp=..., **kwargs) inside that block.
  • Cleanup: Removed legacy scheduler code; standardized on the TOGSimulator-oriented API.
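
The session and multi-tenant launch bullets above combine into one pattern: open a simulator session, then issue one launch call per tenant inside it. A rough sketch of that pattern follows. Because this changelog does not state where `TOGSimulator` is imported from, the sketch takes the simulator class and the launch function as parameters. `launch_tenants` and the request tuple layout are hypothetical, and the only call shapes used are the ones quoted in the bullets, `TOGSimulator(config_path=...)` and `launch_model(opt_fn, *args, stream_index=..., timestamp=...)`.

```python
def launch_tenants(simulator_cls, launch_model, config_path, requests):
    """Launch every request inside a single simulator session.

    simulator_cls: a context-manager class accepting config_path=...
                   (TOGSimulator in this PR's API).
    launch_model:  a callable with the launch_model call shape from the
                   changelog (torch.npu.launch_model in this PR's API).
    requests:      iterable of (opt_fn, args, stream_index, timestamp)
                   tuples; this tuple layout is an assumption of the sketch.
    """
    # Simulator lifecycle is scoped to this block.
    with simulator_cls(config_path=config_path):
        for opt_fn, args, stream_index, timestamp in requests:
            launch_model(
                opt_fn,
                *args,
                stream_index=stream_index,
                timestamp=timestamp,
            )
```

With the real objects this would presumably be called as `launch_tenants(TOGSimulator, torch.npu.launch_model, config_path, requests)`. The meaning and units of `timestamp`, and whatever `launch_model` returns, are not described here. test_scheduler.py, which this PR points to, remains the reference for actual usage.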

Device (OpenReg / NPU)

  • Device API: Use torch.device("npu") (and torch.device("npu:0"), etc.) like any built-in device type — no extra package import beyond import torch; the NPU backend registers with PyTorch's device system.
  • Eager mode: CPU fallback is applied automatically when graph compilation is not available.

⚠️ Breaking Changes

  • Config format migration: Configuration files must be converted from JSON to YAML format. Existing .json config files are no longer supported.
  • Multi-tenant API redesign: The scheduler-based multi-tenant launch pattern has been replaced. The old API required manual Scheduler instantiation, Request object construction, and a while not scheduler.is_finished(): loop. The new API uses a with TOGSimulator(config_path=...): context and torch.npu.launch_model(..., stream_index=..., timestamp=...) calls directly. See test_scheduler.py for the updated usage pattern.

CI, tests, experiments

  • Added or tightened tests for DeepSeek, YOLOv5, MobileNet; CI image updated for PyTorch 2.8.

Other

  • Misc. codegen, indexing, and matmul-related bugfixes and small refactors.

YWHyuk added 30 commits December 5, 2025 13:05
[Frontend] Use ops instead of raw assembly code
YunseonShin and others added 25 commits April 20, 2026 01:29
- Remove cell execution timestamps from metadata
- Simplify path setup: remove base_dir/sys.path.append, use absolute paths
- Replace extension_config.CONFIG_TOGSIM_CONFIG with direct config paths
- Update log file paths to latest run timestamps
- Adjust tensor sizes and minor wording fixes
- gen_configs: use JSONEncoder to emit more compact JSON (regenerated yaml files)
- Simulator: read Ramulator2 config with ifstream and log text instead of YAML::Dump
@YWHyuk YWHyuk merged commit f595cef into master Apr 25, 2026
202 of 204 checks passed

6 participants