feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406
justinchuby wants to merge 13 commits into main
Conversation
Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from the Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:

- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model with correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Pull request overview
Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.
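The single-vs-composite handler decision described above can be sketched roughly as follows. This is an illustrative sketch only: the function name and the string return values are stand-ins, not Olive's actual internals, which construct real ONNXModelHandler/CompositeModelHandler instances.

```python
def select_handler(component_dirs: list) -> tuple:
    """Pick the Olive handler type based on how many components were exported.

    Hypothetical sketch: a real pass would build ONNXModelHandler /
    CompositeModelHandler objects instead of returning their names.
    """
    if len(component_dirs) == 1:
        # Single-component export (e.g. an LLM) -> one ONNXModelHandler
        return ("ONNXModelHandler", component_dirs[0])
    # Multi-component export (VLM, encoder-decoder) -> one handler per component
    return ("CompositeModelHandler", component_dirs)


print(select_handler(["model"]))
print(select_handler(["vision_encoder", "text_decoder"]))
```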
Changes:
- Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
- Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
- Adds unit tests for single-component, multi-component, EP selection, and error paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing dependency behavior. |
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from the CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to the top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (a transitive dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to the TYPE_CHECKING block (RUFF TC001); safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to suppress a lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into a single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
(keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
the pass extra_dependencies and the top-level extra_dependencies map;
was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
gemma4_int4_cuda.json so both example configs follow the same
{precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
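A minimal sketch of the tightened EP_MAP annotation described above, using a stand-in enum since the real ExecutionProvider lives inside Olive (member names and values here are assumptions for illustration):

```python
from enum import Enum
from typing import ClassVar, Dict


class ExecutionProvider(str, Enum):
    # Stand-in for Olive's ExecutionProvider enum; values are assumptions.
    CPU = "CPUExecutionProvider"
    CUDA = "CUDAExecutionProvider"
    DML = "DmlExecutionProvider"
    WEBGPU = "WebGpuExecutionProvider"


class MobiusModelBuilderSketch:
    # ClassVar annotation with enum keys (not plain strings), as the
    # commit message describes.
    EP_MAP: ClassVar[Dict[ExecutionProvider, str]] = {
        ExecutionProvider.CPU: "cpu",
        ExecutionProvider.CUDA: "cuda",
        ExecutionProvider.DML: "dml",
        ExecutionProvider.WEBGPU: "webgpu",
    }


print(MobiusModelBuilderSketch.EP_MAP[ExecutionProvider.CUDA])
```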
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
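The post-save existence check could look roughly like this. The function name, signature, and directory layout are assumptions for illustration, not the actual pass code:

```python
import tempfile
from pathlib import Path


def verify_outputs(output_dir, components):
    """Raise RuntimeError if any expected model.onnx is missing (sketch)."""
    for name in components:
        model_path = Path(output_dir) / name / "model.onnx"
        if not model_path.exists():
            raise RuntimeError(f"Expected ONNX output missing: {model_path}")


# Demo: one component present, one missing.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "text").mkdir()
    (Path(d) / "text" / "model.onnx").touch()
    verify_outputs(d, ["text"])        # passes silently
    try:
        verify_outputs(d, ["vision"])  # missing component
    except RuntimeError as e:
        print("caught:", e)
```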
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- Add module-scoped _stub_mobius_module fixture that injects a fake
'mobius' stub into sys.modules when the package is not installed,
ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
(PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
_run_for_config — import is intentionally deferred to surface a clear
ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
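The stub-injection idea from the commit above can be sketched like this. Names and structure are assumptions, not Olive's actual fixture: the point is that giving sys.modules a fake 'mobius' module lets mock.patch('mobius.build') resolve its target even when the real package is not installed.

```python
import sys
import types
from unittest import mock

# Inject a minimal stub module if the real package is absent.
if "mobius" not in sys.modules:
    stub = types.ModuleType("mobius")
    stub.build = lambda *args, **kwargs: None  # placeholder attribute to patch
    sys.modules["mobius"] = stub

# patch() now finds 'mobius' in sys.modules and replaces its build attribute.
with mock.patch("mobius.build", return_value="fake-package"):
    import mobius

    result = mobius.build("some/model-id")

print(result)
```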
Change all references from 'mobius-genai' to 'mobius-ai':

- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
we don't keep examples in this repo anymore. can you create an accompanying PR in microsoft/olive-recipes?
I am just using this PR to iterate the files. Could you comment on whether there are errors or changes needed? I will move the files over once stable.
some comments:
- I have never tried adding an unused field like "comment" in the config; I don't remember whether the config accepts unknown fields.
- engine-level nesting is optional; most examples we have now don't use it, for cleaner configs
- cpu ep is the default, so the system doesn't need to be added explicitly
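Following those comments, a flattened run config (no engine nesting, no explicit cpu system) might look like the sketch below; field names outside the pass entry are assumptions, not verified against Olive's RunConfig schema:

```json
{
    "input_model": { "type": "HfModel", "model_path": "google/gemma-4-E2B-it" },
    "passes": {
        "mobius_build": { "type": "MobiusModelBuilder", "precision": "fp32" }
    },
    "output_dir": "models/gemma4"
}
```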
```python
            ),
        ),
        "execution_provider": PassConfigParam(
            type_=str,
```
we could create an enum of the supported eps for automatic validation, like in Olive/olive/passes/pytorch/autoawq.py (line 27 in 8b1957e), unless you think the options might keep growing and it would be hard to keep it in sync across versions
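The suggestion above (type the config field with an enum so unsupported EP strings fail fast) can be sketched like this; the enum name and members are hypothetical:

```python
from enum import Enum


class SupportedEp(str, Enum):
    # Hypothetical enum of supported EPs; a PassConfigParam typed with such
    # an enum would reject any other value automatically.
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"


print(SupportedEp("cuda").name)  # valid value round-trips to a member
try:
    SupportedEp("tpu")           # unsupported EP string
except ValueError as e:
    print("rejected:", e)
```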
```json
"passes": {
    "mobius_build": { "type": "MobiusModelBuilder", "precision": "fp16" },
    "int4_quantize": { "type": "GptqQuantizer", "bits": 4, "group_size": 128, "sym": true }
}
```
the quantization pass works on a pytorch model and should be run before mobius
Is there a pass I can use to quantize after the model is exported? I would like to use that as an example for now
you can use the rtn pass

```json
{
    "type": "rtn",
    "bits": 4,
    "sym": false,
    "group_size": 32,
    "embeds": true,
    "lm_head": true
}
```
oh, i misread the comment. you can use the blockwise quantizer pass:

```json
{
    "type": "OnnxBlockWiseRtnQuantization",
    "block_size": 128,
    "is_symmetric": true,
    "accuracy_level": 4,
    "save_as_external_data": true
}
```
Thanks. And it will process all components together?
if the model is composite it should run the quantizer on each component and return a new composite model.
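That per-component behavior can be sketched as follows; the function and the string-based "components" are hypothetical stand-ins, since the real quantizer pass works on model handlers and file paths:

```python
def quantize_composite(components, quantize_one):
    # Run the quantizer on each component and return a new composite
    # (a plain list standing in for CompositeModelHandler here).
    return [quantize_one(c) for c in components]


quantized = quantize_composite(
    ["vision_encoder.onnx", "text_decoder.onnx"],
    lambda c: c.replace(".onnx", ".int4.onnx"),  # stand-in for RTN quantization
)
print(quantized)
```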
…files to model_attributes Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
…ify test docstring Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
```python
# Licensed under the MIT License.
# --------------------------------------------------------------------------


# ruff: noqa: T201
```
nit: maybe we could do this removal in a different PR? these make the PR seem bigger than it actually is.
sorry I ran lintrunner in the wrong repo 😅 Looks like Olive's ruff version can be updated
Summary
Adds a new MobiusModelBuilder Olive pass that wraps the mobius package's build() function to produce ONNX models directly from HuggingFace model IDs.

What this does

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler with one ONNXModelHandler per component
- EP auto-detected from the Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16

Files

- olive/passes/onnx/mobius_model_builder.py: new pass
- olive/olive_config.json: registers MobiusModelBuilder
- examples/gemma4/gemma4_int4_pipeline.json: example pipeline config
- examples/gemma4/gemma4_fp32_cpu.json: example pipeline config
- test/passes/onnx/test_mobius_model_builder.py: unit tests

E2E Test
Verified with HuggingFaceTB/SmolLM2-135M-Instruct (135M llama-type, locally cached). Result:

- model.onnx (289KB) + model.onnx.data (1.2GB) produced
- onnxruntime.InferenceSession loaded successfully
- I/O: input_ids/attention_mask/position_ids + 30×past_kv → logits + 30×present_kv

Test coverage
Tests cover: single-component, multi-component, EP auto-detection, EP override, precision mapping, non-HF model rejection, missing mobius error, all 4 EP map entries.
Notes