microsoft/olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
Olive optimizes ML models for deployment by applying conversions, quantization, and other transformations.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
A 10-component ML training and optimization system. 519 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
Users specify a model and target hardware through CLI commands, which generate a RunConfig that drives the optimization pipeline. The engine loads the model through format-specific handlers, applies a sequence of transformation passes (quantization, conversion, pruning), evaluates each result against accuracy and performance metrics, and outputs the best optimized model variants. Each pass transforms the model while the evaluator provides feedback to guide optimization decisions.
- Parse CLI parameters into RunConfig — CLI commands like 'olive quantize' parse user parameters (model path, target device, precision) and generate a comprehensive RunConfig with default passes and evaluators for the specified optimization type
- Load and validate input model — The WorkflowRunner loads the model using format-specific handlers (PyTorchModelHandler, ONNXModelHandler) that wrap the model with metadata and provide unified interfaces for transformation [RunConfig → OliveModelHandler]
- Execute optimization passes — The engine applies each configured pass (OnnxQuantization, OnnxConversion, etc.) to transform the model - each pass produces a PassRunResult containing the transformed model and metrics about the optimization [OliveModelHandler → PassRunResult]
- Evaluate model quality — Evaluators measure the optimized model against accuracy metrics (using validation datasets) and performance metrics (latency, throughput) on the target system to ensure optimization constraints are met [OliveModelHandler → MetricResult]
- Cache and rank results — The engine caches PassRunResults to avoid recomputation and ranks optimization candidates by their metric scores, selecting the best models that meet accuracy thresholds and performance targets [PassRunResult]
- Package and save optimized models — The best model candidates are packaged (merging adapters if present) and saved to the output directory with their metrics and configuration metadata [OliveModelHandler]
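To make the pipeline concrete, here is a minimal sketch of driving it from Python, assuming the documented olive_run entry point. The exact config schema and pass type strings vary across Olive versions, so treat the values below as illustrative rather than a pinned API.

```python
# Minimal sketch of the six-stage pipeline driven from Python. "HfModel",
# "OnnxConversion", and "OnnxQuantization" are illustrative type strings;
# check your Olive version's documentation for the exact schema.
from olive.workflows import run as olive_run

run_config = {
    "input_model": {
        "type": "HfModel",                 # selects the format-specific handler
        "model_path": "microsoft/phi-2",   # hypothetical example model
    },
    "passes": {
        "conversion": {"type": "OnnxConversion"},      # PyTorch -> ONNX
        "quantization": {"type": "OnnxQuantization"},  # e.g. int8 weights
    },
    "output_dir": "optimized_model",
}

# The engine parses this dict into a RunConfig, executes each pass in
# order, evaluates the results, and writes the best candidates to output_dir.
olive_run(run_config)
```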
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- OliveModelHandler (olive/model/handler/base.py) — Abstract base class with model_path: Union[str, Path], model_file_format: ModelFileFormat, model_attributes: dict, adapter_path: Optional[str]
  Created when a model is loaded, passed between optimization passes as the container for model state and metadata, and saved as the final optimized output
- RunConfig (olive/workflows/run/config.py) — Pydantic model with input_model: Union[InputModel, str], systems: List[OliveSystem], evaluators: Dict[str, OliveEvaluatorConfig], passes: Dict[str, PassConfig], engine: EngineConfig
  Generated from CLI parameters or loaded from a JSON config file; validates all optimization settings and drives the entire workflow execution
- PassRunResult (olive/passes/olive_pass.py) — dict with output_model: OliveModelHandler, metrics: Dict[str, MetricResult], input_model_id: str, output_model_id: str
  Created after each pass execution with the transformed model and performance metrics; cached to avoid recomputation and used for pass-chaining decisions
- ModelConfig (olive/model/config/model_config.py) — Pydantic model with type: ModelType, config: Dict, load_kwargs: Dict, io_config: Optional[Dict], dummy_inputs_func: Optional[str]
  Specifies how to load and configure a model from various sources (HuggingFace, local files, custom loaders)
- MetricResult (olive/evaluator/metric.py) — dict with value: Union[float, int, str, dict], priority: int, higher_is_better: bool, data_config: str
  Computed by evaluators to measure model quality (accuracy, latency, throughput); used to rank optimization candidates and make pass selection decisions
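To make these contracts concrete, here is a hedged sketch of two of them as Pydantic v1-style models. Field names follow the table above; the real definitions in the files listed may differ in detail.

```python
# Hedged sketch of two of the contracts above. Field names follow the
# table; olive/evaluator/metric.py and olive/passes/olive_pass.py may
# define them differently.
from typing import Dict, Union

from pydantic import BaseModel


class MetricResult(BaseModel):
    value: Union[float, int, str, dict]  # measured accuracy, latency, ...
    priority: int                        # ranking weight among metrics
    higher_is_better: bool               # direction used when ranking candidates
    data_config: str                     # which dataset produced the value


class PassRunResult(BaseModel):
    input_model_id: str                  # part of the cache key
    output_model_id: str
    metrics: Dict[str, MetricResult]     # quality of this transformation
    # output_model: OliveModelHandler    # omitted: handlers are not Pydantic types
```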
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Python executable path returned by sys.executable is stable and available for subprocess creation throughout job execution lifecycle
If this fails: when the current Python interpreter moves or becomes unavailable after a job starts, venv creation fails silently and jobs crash with obscure errors about missing Python
mcp/src/olive_mcp/jobs.py:_get_or_create_venv
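This assumption is cheap to check up front. The guard below is illustrative, not the actual _get_or_create_venv code:

```python
# Illustrative guard for the sys.executable assumption above; not the
# actual _get_or_create_venv implementation.
import os
import sys


def current_python() -> str:
    python_exe = sys.executable  # captured once at job start
    if not python_exe or not os.access(python_exe, os.X_OK):
        # fail loudly now instead of an obscure subprocess error later
        raise RuntimeError(f"Python interpreter unavailable: {python_exe!r}")
    return python_exe
```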
System can handle 3 concurrent model optimization jobs without memory exhaustion or thrashing
If this fails: Large models (>1GB) running 3 concurrent optimizations can exhaust system memory, causing OOM kills or swap thrashing that makes all jobs fail
mcp/src/olive_mcp/jobs.py:_MAX_CONCURRENT_JOBS
All optimization jobs complete within 1 hour, or users no longer need their results after that window
If this fails: Long-running optimizations (large model quantization can take 2-4 hours) get their results purged before users can retrieve them, losing hours of computation
mcp/src/olive_mcp/jobs.py:_JOB_TTL_SECONDS
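The concurrency cap and result TTL can be sketched together. Only the constant names (_MAX_CONCURRENT_JOBS, _JOB_TTL_SECONDS) come from the locations above; the surrounding logic is an assumption, not the actual jobs.py implementation:

```python
# Hedged sketch of the two job-management limits described above.
import asyncio
import time

_MAX_CONCURRENT_JOBS = 3    # assumption: 3 concurrent optimizations fit in memory
_JOB_TTL_SECONDS = 3600     # assumption: users fetch results within 1 hour

_job_semaphore = asyncio.Semaphore(_MAX_CONCURRENT_JOBS)
_results: dict[str, tuple[float, object]] = {}  # job_id -> (finished_at, result)


async def run_job(job_id: str, job_coro) -> None:
    async with _job_semaphore:  # a 4th job queues here instead of thrashing memory
        result = await job_coro
    _results[job_id] = (time.monotonic(), result)


def purge_expired_results() -> None:
    # a 2-4 hour quantization whose result is fetched late hits exactly
    # the failure mode described above
    cutoff = time.monotonic() - _JOB_TTL_SECONDS
    for job_id, (finished_at, _) in list(_results.items()):
        if finished_at < cutoff:
            del _results[job_id]
```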
The mapping from implementation names to pass types stays synchronized with olive/cli/quantize.py selection logic
If this fails: when olive/cli/quantize.py adds new quantization methods or changes its selection logic, MCP routes requests to the wrong passes, causing optimization failures or suboptimal results
mcp/src/olive_mcp/packages.py:_IMPL_TO_PASS_TYPES
User home directory (~/.olive-mcp/venvs) is writable and has sufficient disk space for multiple virtual environments
If this fails: In containerized or restricted environments where home directory is read-only or has quota limits, venv creation fails and no optimization jobs can run
olive_mcp/constants.py:VENV_BASE
olive_config.json structure matches expected schema with pass types correctly categorized
If this fails: when the olive_config.json format changes or pass types get reorganized, package resolution fails silently and optimization jobs fail with missing-dependency errors
mcp/src/olive_mcp/packages.py:_load_olive_config
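A structural check at load time turns that silent failure into a loud one. Only the function name _load_olive_config comes from the location above; the body is an illustration:

```python
# Illustrative, defensive version of the config loading described above.
import json
from pathlib import Path


def _load_olive_config(path: Path) -> dict:
    config = json.loads(path.read_text())
    # validate the structure this code otherwise silently relies on
    if not isinstance(config.get("passes"), dict):
        raise ValueError(f"unexpected olive_config.json layout in {path}")
    return config
```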
Virtual environments not used for 14 days are safe to delete and users won't need them again
If this fails: Users returning to projects after vacation or long development cycles find their optimized environments purged, forcing expensive package reinstallation
mcp/src/olive_mcp/constants.py:_VENV_MAX_AGE_DAYS
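A hedged sketch of the 14-day cleanup implied here: VENV_BASE and _VENV_MAX_AGE_DAYS are named in constants.py, while the cleanup logic below is illustrative.

```python
# Illustrative age-based venv cleanup; constant names from constants.py.
import shutil
import time
from pathlib import Path

VENV_BASE = Path.home() / ".olive-mcp" / "venvs"
_VENV_MAX_AGE_DAYS = 14


def cleanup_stale_venvs() -> None:
    if not VENV_BASE.exists():
        return
    cutoff = time.time() - _VENV_MAX_AGE_DAYS * 86400
    for venv_dir in VENV_BASE.iterdir():
        # mtime as a proxy for "last used": the proxy that breaks for
        # users returning after a long gap, as noted above
        if venv_dir.is_dir() and venv_dir.stat().st_mtime < cutoff:
            shutil.rmtree(venv_dir, ignore_errors=True)
```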
RunConfig JSON schema can be serialized to string and fits in memory without size limits
If this fails: As Olive adds more passes and configuration options, the schema grows large enough to cause memory issues or hit string size limits during documentation generation
docs/source/dump_schema.py:RunConfig.schema_json
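For scale, the schema dump is built as a single in-memory string. A minimal sketch, assuming RunConfig is a Pydantic v1 model as the Data Models section suggests (v2 renamed schema_json):

```python
# Assumes Pydantic v1; the import path follows olive/workflows/run/config.py.
from olive.workflows.run.config import RunConfig

schema_str = RunConfig.schema_json(indent=2)  # whole schema as one string
print(f"{len(schema_str):,} characters")      # grows with every new pass option
```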
Azure DevOps organization 'aiinfra' remains accessible and API endpoints don't change
If this fails: if the organization moves, gets renamed, or API versions change, the commit bisection script breaks and debugging CI issues becomes manual
.azure_pipelines/scripts/find_failed_commit.py:_organization_url
Package config pass names map correctly to module names and the import_pass_module method exists
If this fails: Documentation generation fails when pass modules can't be imported, leaving configuration docs incomplete or with broken references
docs/source/exts/auto_config_doc/__init__.py:import_class
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Pass result cache — Stores PassRunResult objects keyed by input model hash and pass configuration to avoid recomputing expensive transformations (key construction sketched after this list)
- Model registry — Maps model types to their corresponding handler classes for loading different formats (PyTorch, ONNX, HuggingFace)
- Metrics store — Persists MetricResult objects with model performance measurements across different systems and configurations
- Output artifacts — Stores final optimized models, their metrics, and configuration metadata as deployment artifacts
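Here is a hedged sketch of how a cache key can be built from an input model hash plus the pass configuration, as the pass result cache description says. Olive's real cache layout differs; this shows only the idea.

```python
# Illustrative cache keyed by model id + pass configuration.
import hashlib
import json


def cache_key(input_model_id: str, pass_type: str, pass_config: dict) -> str:
    # canonical JSON so equivalent configs hash identically
    payload = json.dumps(
        {"model": input_model_id, "pass": pass_type, "config": pass_config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


# on a hit, the engine can return the stored PassRunResult and skip the
# expensive transformation entirely
_pass_result_cache: dict[str, object] = {}
```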
Feedback Loops
- Pass search optimization (convergence, reinforcing) — Trigger: Multiple pass configurations available. Action: Engine evaluates different pass parameter combinations and selects the best performing variants. Exit: All pass combinations evaluated or search budget exhausted.
- Accuracy threshold validation (self-correction, balancing) — Trigger: Model accuracy drops below configured threshold. Action: Engine rejects the optimization result and tries alternative pass configurations. Exit: Model meets accuracy requirements or no more alternatives.
- Model conversion retry (retry, balancing) — Trigger: ONNX conversion fails due to unsupported operators. Action: Pass retries with fallback options like operator version changes or export parameters. Exit: Conversion succeeds or all retry attempts exhausted.
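The first two loops compose naturally: converge on the best pass configuration while rejecting candidates below the accuracy floor. Olive's engine implements richer search strategies; every name in this sketch is illustrative.

```python
# Hedged sketch of the search + accuracy-validation loops above.
def search_pass_configs(candidates, evaluate, budget=20, accuracy_floor=0.95):
    best_config, best_score = None, float("-inf")
    for tried, config in enumerate(candidates):
        if tried >= budget:                    # exit: search budget exhausted
            break
        metrics = evaluate(config)             # runs the pass and the evaluator
        if metrics["accuracy"] < accuracy_floor:
            continue                           # balancing loop: reject, try alternatives
        if metrics["score"] > best_score:      # reinforcing loop: keep the best variant
            best_config, best_score = config, metrics["score"]
    return best_config
```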
Delays
- Model evaluation inference (async-processing, duration varies with model size and dataset) — Each optimization candidate must be evaluated before ranking, creating bottlenecks in pass selection
- Pass result caching (cache-ttl, persistent until the cache is cleared) — Subsequent runs with identical configurations skip expensive transformations
- Model loading and conversion (compilation, duration varies with model complexity) — Initial transformations like PyTorch-to-ONNX conversion can take minutes for large models
Control Points
- Target execution provider (architecture-switch) — Controls: Which inference backend (ONNX Runtime, OpenVINO, DirectML) is used for model execution and evaluation. Default: Configurable per system
- Quantization precision (precision-mode) — Controls: Model weight and activation precision (int4, int8, fp16) affecting model size and accuracy trade-offs. Default: Pass-specific configuration
- Accuracy tolerance threshold (threshold) — Controls: Minimum acceptable accuracy loss during optimization - models below this threshold are rejected. Default: User-configurable per evaluator
- Engine search strategy (hyperparameter) — Controls: How the engine explores pass configurations - exhaustive search vs early stopping based on performance. Default: Configurable in RunConfig
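The four control points map onto run-config fragments like the ones below. Key names and values here are assumptions based on Olive's documented config style; check the RunConfig schema for the exact spelling in your version.

```python
# Illustrative settings for the four control points above.
control_points = {
    "systems": {                               # target execution provider
        "local": {
            "type": "LocalSystem",
            "accelerators": [
                {"device": "gpu", "execution_providers": ["CUDAExecutionProvider"]}
            ],
        }
    },
    "passes": {                                # quantization precision
        "quantization": {"type": "OnnxQuantization", "precision": "int8"}
    },
    "evaluators": {                            # accuracy tolerance threshold
        "common": {
            "metrics": [
                {"name": "accuracy",
                 "goal": {"type": "max-degradation", "value": 0.01}}
            ]
        }
    },
    "engine": {                                # search strategy
        "search_strategy": {"execution_order": "joint", "sampler": "tpe"}
    },
}
```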
Technology Stack
- PyTorch — Source model format and training framework for fine-tuning workflows
- ONNX Runtime — Primary inference engine for optimized models and performance evaluation
- Transformers — HuggingFace integration for loading pre-trained models and tokenizers
- Pydantic — Configuration validation and schema generation for RunConfig and model specifications
- FastMCP — MCP server framework for the conversational model optimization interface
- OpenVINO — Intel inference optimization and execution provider for CPU/NPU targets
- PEFT — Efficient fine-tuning techniques for large language models with reduced memory usage
Key Components
- WorkflowRunner (orchestrator) — olive/workflows/run/run.py — Coordinates the execution of optimization passes according to the RunConfig, manages model state progression, handles caching and checkpointing
- Pass (transformer) — olive/passes/olive_pass.py — Base class for all model transformations; each pass takes a model and produces an optimized variant with metrics about the transformation quality
- OliveEvaluator (validator) — olive/evaluator/olive_evaluator.py — Measures model quality through accuracy and performance metrics, providing feedback to guide optimization decisions and validate constraints
- OliveSystem (adapter) — olive/systems/olive_system.py — Abstracts target hardware (CPU, GPU, NPU) and execution providers (ONNX Runtime, OpenVINO), providing a unified interface for model execution and evaluation
- LocalSystem (executor) — olive/systems/local.py — Executes model inference on the local machine, handles device placement (CPU/GPU) and provider selection (ONNX Runtime, OpenVINO, DirectML)
- OliveEngine (scheduler) — olive/engine/engine.py — Manages pass execution strategy, handles parallel execution, caching policies, and optimization search across different pass configurations
- ModelPackaging (serializer) — olive/model/handler/mixin/packaging.py — Handles model serialization, adapter merging, and packaging for deployment; converts optimized models to deployable artifacts
- AccuracyEvaluator (validator) — olive/evaluator/accuracy.py — Measures model accuracy using various metrics (classification accuracy, BLEU score, perplexity), ensuring optimization doesn't degrade model quality below thresholds
- LatencyEvaluator (monitor) — olive/evaluator/latency.py — Measures model inference latency and throughput across different batch sizes and sequence lengths, optimizing for deployment performance targets
- OnnxConversion (transformer) — olive/passes/onnx/conversion.py — Converts models from PyTorch/TensorFlow to ONNX format using torch.onnx.export, handling dynamic shapes and operator compatibility
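For orientation, a custom transformation pass built on the Pass base class looks roughly like the sketch below. The hook names (_default_config, _run_for_config) match older Olive releases; verify against olive/passes/olive_pass.py before relying on them.

```python
# Hedged sketch of a custom pass; hook names may differ by Olive version.
from olive.passes import Pass
from olive.passes.pass_config import PassConfigParam


class IdentityPass(Pass):  # hypothetical pass, for illustration only
    @classmethod
    def _default_config(cls, accelerator_spec):
        return {
            "enabled": PassConfigParam(
                type_=bool, default_value=True,
                description="No-op toggle for this sketch.",
            ),
        }

    def _run_for_config(self, model, config, output_model_path):
        # a real pass would transform `model` (an OliveModelHandler) and
        # return a new handler pointing at output_model_path
        return model
```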
Frequently Asked Questions
What is Olive used for?
Olive optimizes ML models for deployment by applying conversions, quantization, and other transformations. microsoft/olive is a 10-component ML training and optimization system written in Python. Data flows through 6 distinct pipeline stages, and the codebase contains 519 files.
How is Olive architected?
Olive is organized into 5 architecture layers: CLI Layer, Engine Layer, Pass Layer, Model Layer, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through Olive?
Data moves through 6 stages: Parse CLI parameters into RunConfig → Load and validate input model → Execute optimization passes → Evaluate model quality → Cache and rank results → Package and save optimized models. Users specify a model and target hardware through CLI commands, the engine loads the model through format-specific handlers and applies transformation passes, and evaluators provide the feedback that decides which optimized variants are kept. This pipeline design reflects a complex multi-stage processing system.
What technologies does Olive use?
The core stack includes PyTorch (Source model format and training framework for fine-tuning workflows), ONNX Runtime (Primary inference engine for optimized models and performance evaluation), Transformers (HuggingFace integration for loading pre-trained models and tokenizers), Pydantic (Configuration validation and schema generation for RunConfig and model specifications), FastMCP (MCP server framework for conversational model optimization interface), OpenVINO (Intel inference optimization and execution provider for CPU/NPU targets), and 1 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does Olive have?
Olive exhibits 4 data pools (Pass result cache, Model registry, and others), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle convergence and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does Olive use?
4 design patterns detected: Pass-based Pipeline, Handler Pattern, Evaluation Feedback Loop, System Abstraction.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.