microsoft/olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
Olive optimizes ML models for deployment by applying conversions, quantization, and other transformations.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
A 10-component ML training and optimization system. 519 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
Users specify a model and target hardware through CLI commands, which generate a RunConfig that drives the optimization pipeline. The engine loads the model through format-specific handlers, applies a sequence of transformation passes (quantization, conversion, pruning), evaluates each result against accuracy and performance metrics, and outputs the best optimized model variants. Each pass transforms the model while the evaluator provides feedback to guide optimization decisions.
- Parse CLI parameters into RunConfig — CLI commands like 'olive quantize' parse user parameters (model path, target device, precision) and generate a comprehensive RunConfig with default passes and evaluators for the specified optimization type
- Load and validate input model — The WorkflowRunner loads the model using format-specific handlers (PyTorchModelHandler, ONNXModelHandler) that wrap the model with metadata and provide unified interfaces for transformation [RunConfig → OliveModelHandler]
- Execute optimization passes — The engine applies each configured pass (OnnxQuantization, OnnxConversion, etc.) to transform the model - each pass produces a PassRunResult containing the transformed model and metrics about the optimization [OliveModelHandler → PassRunResult]
- Evaluate model quality — Evaluators measure the optimized model against accuracy metrics (using validation datasets) and performance metrics (latency, throughput) on the target system to ensure optimization constraints are met [OliveModelHandler → MetricResult]
- Cache and rank results — The engine caches PassRunResults to avoid recomputation and ranks optimization candidates by their metric scores, selecting the best models that meet accuracy thresholds and performance targets [PassRunResult]
- Package and save optimized models — The best model candidates are packaged (merging adapters if present) and saved to the output directory with their metrics and configuration metadata [OliveModelHandler]
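To make the pipeline concrete, here is a minimal sketch of driving it from Python, assuming the documented olive_run entry point. The exact config schema and pass type strings vary across Olive versions, so treat the values below as illustrative rather than a pinned API.

```python
# Minimal sketch of the six-stage pipeline driven from Python. "HfModel",
# "OnnxConversion", and "OnnxQuantization" are illustrative type strings;
# check your Olive version's documentation for the exact schema.
from olive.workflows import run as olive_run

run_config = {
    "input_model": {
        "type": "HfModel",                 # selects the format-specific handler
        "model_path": "microsoft/phi-2",   # hypothetical example model
    },
    "passes": {
        "conversion": {"type": "OnnxConversion"},      # PyTorch -> ONNX
        "quantization": {"type": "OnnxQuantization"},  # e.g. int8 weights
    },
    "output_dir": "optimized_model",
}

# The engine parses this dict into a RunConfig, executes each pass in
# order, evaluates the results, and writes the best candidates to output_dir.
olive_run(run_config)
```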
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- OliveModelHandler (olive/model/handler/base.py) — Abstract base class with model_path: Union[str, Path], model_file_format: ModelFileFormat, model_attributes: dict, adapter_path: Optional[str]
  Created when a model is loaded, passed between optimization passes as the container for model state and metadata, and saved as the final optimized output
- RunConfig (olive/workflows/run/config.py) — Pydantic model with input_model: Union[InputModel, str], systems: List[OliveSystem], evaluators: Dict[str, OliveEvaluatorConfig], passes: Dict[str, PassConfig], engine: EngineConfig
  Generated from CLI parameters or loaded from a JSON config file; validates all optimization settings and drives the entire workflow execution
- PassRunResult (olive/passes/olive_pass.py) — dict with output_model: OliveModelHandler, metrics: Dict[str, MetricResult], input_model_id: str, output_model_id: str
  Created after each pass execution with the transformed model and performance metrics; cached to avoid recomputation and used for pass-chaining decisions
- ModelConfig (olive/model/config/model_config.py) — Pydantic model with type: ModelType, config: Dict, load_kwargs: Dict, io_config: Optional[Dict], dummy_inputs_func: Optional[str]
  Specifies how to load and configure a model from various sources (HuggingFace, local files, custom loaders)
- MetricResult (olive/evaluator/metric.py) — dict with value: Union[float, int, str, dict], priority: int, higher_is_better: bool, data_config: str
  Computed by evaluators to measure model quality (accuracy, latency, throughput); used to rank optimization candidates and make pass selection decisions
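To make these contracts concrete, here is a hedged sketch of two of them as Pydantic v1-style models. Field names follow the table above; the real definitions in the files listed may differ in detail.

```python
# Hedged sketch of two of the contracts above. Field names follow the
# table; olive/evaluator/metric.py and olive/passes/olive_pass.py may
# define them differently.
from typing import Dict, Union

from pydantic import BaseModel


class MetricResult(BaseModel):
    value: Union[float, int, str, dict]  # measured accuracy, latency, ...
    priority: int                        # ranking weight among metrics
    higher_is_better: bool               # direction used when ranking candidates
    data_config: str                     # which dataset produced the value


class PassRunResult(BaseModel):
    input_model_id: str                  # part of the cache key
    output_model_id: str
    metrics: Dict[str, MetricResult]     # quality of this transformation
    # output_model: OliveModelHandler    # omitted: handlers are not Pydantic types
```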
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Python executable path returned by sys.executable is stable and available for subprocess creation throughout job execution lifecycle
If this fails: when the current Python interpreter moves or becomes unavailable after a job starts, venv creation fails silently and jobs crash with obscure errors about missing Python
mcp/src/olive_mcp/jobs.py:_get_or_create_venv
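This assumption is cheap to check up front. The guard below is illustrative, not the actual _get_or_create_venv code:

```python
# Illustrative guard for the sys.executable assumption above; not the
# actual _get_or_create_venv implementation.
import os
import sys


def current_python() -> str:
    python_exe = sys.executable  # captured once at job start
    if not python_exe or not os.access(python_exe, os.X_OK):
        # fail loudly now instead of an obscure subprocess error later
        raise RuntimeError(f"Python interpreter unavailable: {python_exe!r}")
    return python_exe
```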
System can handle 3 concurrent model optimization jobs without memory exhaustion or thrashing
If this fails: Large models (>1GB) running 3 concurrent optimizations can exhaust system memory, causing OOM kills or swap thrashing that makes all jobs fail
mcp/src/olive_mcp/jobs.py:_MAX_CONCURRENT_JOBS
All optimization jobs complete within 1 hour, or users no longer need their results after that window
If this fails: Long-running optimizations (large model quantization can take 2-4 hours) get their results purged before users can retrieve them, losing hours of computation
mcp/src/olive_mcp/jobs.py:_JOB_TTL_SECONDS
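The concurrency cap and result TTL can be sketched together. Only the constant names (_MAX_CONCURRENT_JOBS, _JOB_TTL_SECONDS) come from the locations above; the surrounding logic is an assumption, not the actual jobs.py implementation:

```python
# Hedged sketch of the two job-management limits described above.
import asyncio
import time

_MAX_CONCURRENT_JOBS = 3    # assumption: 3 concurrent optimizations fit in memory
_JOB_TTL_SECONDS = 3600     # assumption: users fetch results within 1 hour

_job_semaphore = asyncio.Semaphore(_MAX_CONCURRENT_JOBS)
_results: dict[str, tuple[float, object]] = {}  # job_id -> (finished_at, result)


async def run_job(job_id: str, job_coro) -> None:
    async with _job_semaphore:  # a 4th job queues here instead of thrashing memory
        result = await job_coro
    _results[job_id] = (time.monotonic(), result)


def purge_expired_results() -> None:
    # a 2-4 hour quantization whose result is fetched late hits exactly
    # the failure mode described above
    cutoff = time.monotonic() - _JOB_TTL_SECONDS
    for job_id, (finished_at, _) in list(_results.items()):
        if finished_at < cutoff:
            del _results[job_id]
```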
The mapping from implementation names to pass types stays synchronized with olive/cli/quantize.py selection logic
If this fails: when olive/cli/quantize.py adds new quantization methods or changes its selection logic, MCP routes requests to the wrong passes, causing optimization failures or suboptimal results
mcp/src/olive_mcp/packages.py:_IMPL_TO_PASS_TYPES
User home directory (~/.olive-mcp/venvs) is writable and has sufficient disk space for multiple virtual environments
If this fails: In containerized or restricted environments where home directory is read-only or has quota limits, venv creation fails and no optimization jobs can run
olive_mcp/constants.py:VENV_BASE
olive_config.json structure matches expected schema with pass types correctly categorized
If this fails: when the olive_config.json format changes or pass types get reorganized, package resolution fails silently and optimization jobs fail with missing-dependency errors
mcp/src/olive_mcp/packages.py:_load_olive_config
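A structural check at load time turns that silent failure into a loud one. Only the function name _load_olive_config comes from the location above; the body is an illustration:

```python
# Illustrative, defensive version of the config loading described above.
import json
from pathlib import Path


def _load_olive_config(path: Path) -> dict:
    config = json.loads(path.read_text())
    # validate the structure this code otherwise silently relies on
    if not isinstance(config.get("passes"), dict):
        raise ValueError(f"unexpected olive_config.json layout in {path}")
    return config
```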
Virtual environments not used for 14 days are safe to delete and users won't need them again
If this fails: Users returning to projects after vacation or long development cycles find their optimized environments purged, forcing expensive package reinstallation
mcp/src/olive_mcp/constants.py:_VENV_MAX_AGE_DAYS
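A hedged sketch of the 14-day cleanup implied here: VENV_BASE and _VENV_MAX_AGE_DAYS are named in constants.py, while the cleanup logic below is illustrative.

```python
# Illustrative age-based venv cleanup; constant names from constants.py.
import shutil
import time
from pathlib import Path

VENV_BASE = Path.home() / ".olive-mcp" / "venvs"
_VENV_MAX_AGE_DAYS = 14


def cleanup_stale_venvs() -> None:
    if not VENV_BASE.exists():
        return
    cutoff = time.time() - _VENV_MAX_AGE_DAYS * 86400
    for venv_dir in VENV_BASE.iterdir():
        # mtime as a proxy for "last used": the proxy that breaks for
        # users returning after a long gap, as noted above
        if venv_dir.is_dir() and venv_dir.stat().st_mtime < cutoff:
            shutil.rmtree(venv_dir, ignore_errors=True)
```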
RunConfig JSON schema can be serialized to string and fits in memory without size limits
If this fails: As Olive adds more passes and configuration options, the schema grows large enough to cause memory issues or hit string size limits during documentation generation
docs/source/dump_schema.py:RunConfig.schema_json
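For scale, the schema dump is built as a single in-memory string. A minimal sketch, assuming RunConfig is a Pydantic v1 model as the Data Models section suggests (v2 renamed schema_json):

```python
# Assumes Pydantic v1; the import path follows olive/workflows/run/config.py.
from olive.workflows.run.config import RunConfig

schema_str = RunConfig.schema_json(indent=2)  # whole schema as one string
print(f"{len(schema_str):,} characters")      # grows with every new pass option
```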
Azure DevOps organization 'aiinfra' remains accessible and API endpoints don't change
If this fails: if the organization moves, gets renamed, or API versions change, the commit bisection script breaks and debugging CI issues becomes manual
.azure_pipelines/scripts/find_failed_commit.py:_organization_url
Package config pass names map correctly to module names and the import_pass_module method exists
If this fails: Documentation generation fails when pass modules can't be imported, leaving configuration docs incomplete or with broken references
docs/source/exts/auto_config_doc/__init__.py:import_class
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Pass result cache — Stores PassRunResult objects keyed by input model hash and pass configuration to avoid recomputing expensive transformations (key construction sketched after this list)
- Model registry — Maps model types to their corresponding handler classes for loading different formats (PyTorch, ONNX, HuggingFace)
- Metrics store — Persists MetricResult objects with model performance measurements across different systems and configurations
- Output artifacts — Stores final optimized models, their metrics, and configuration metadata as deployment artifacts
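Here is a hedged sketch of how a cache key can be built from an input model hash plus the pass configuration, as the pass result cache description says. Olive's real cache layout differs; this shows only the idea.

```python
# Illustrative cache keyed by model id + pass configuration.
import hashlib
import json


def cache_key(input_model_id: str, pass_type: str, pass_config: dict) -> str:
    # canonical JSON so equivalent configs hash identically
    payload = json.dumps(
        {"model": input_model_id, "pass": pass_type, "config": pass_config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


# on a hit, the engine can return the stored PassRunResult and skip the
# expensive transformation entirely
_pass_result_cache: dict[str, object] = {}
```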
Feedback Loops
- Pass search optimization (convergence, reinforcing) — Trigger: Multiple pass configurations available. Action: Engine evaluates different pass parameter combinations and selects the best performing variants. Exit: All pass combinations evaluated or search budget exhausted.
- Accuracy threshold validation (self-correction, balancing) — Trigger: Model accuracy drops below configured threshold. Action: Engine rejects the optimization result and tries alternative pass configurations. Exit: Model meets accuracy requirements or no more alternatives.
- Model conversion retry (retry, balancing) — Trigger: ONNX conversion fails due to unsupported operators. Action: Pass retries with fallback options like operator version changes or export parameters. Exit: Conversion succeeds or all retry attempts exhausted.
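The first two loops compose naturally: converge on the best pass configuration while rejecting candidates below the accuracy floor. Olive's engine implements richer search strategies; every name in this sketch is illustrative.

```python
# Hedged sketch of the search + accuracy-validation loops above.
def search_pass_configs(candidates, evaluate, budget=20, accuracy_floor=0.95):
    best_config, best_score = None, float("-inf")
    for tried, config in enumerate(candidates):
        if tried >= budget:                    # exit: search budget exhausted
            break
        metrics = evaluate(config)             # runs the pass and the evaluator
        if metrics["accuracy"] < accuracy_floor:
            continue                           # balancing loop: reject, try alternatives
        if metrics["score"] > best_score:      # reinforcing loop: keep the best variant
            best_config, best_score = config, metrics["score"]
    return best_config
```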
Delays
- Model evaluation inference (async-processing, duration varies with model size and dataset) — Each optimization candidate must be evaluated before ranking, creating bottlenecks in pass selection
- Pass result caching (cache-ttl, persistent until the cache is cleared) — Subsequent runs with identical configurations skip expensive transformations
- Model loading and conversion (compilation, duration varies with model complexity) — Initial transformations like PyTorch-to-ONNX conversion can take minutes for large models
Control Points
- Target execution provider (architecture-switch) — Controls: Which inference backend (ONNX Runtime, OpenVINO, DirectML) is used for model execution and evaluation. Default: Configurable per system
- Quantization precision (precision-mode) — Controls: Model weight and activation precision (int4, int8, fp16) affecting model size and accuracy trade-offs. Default: Pass-specific configuration
- Accuracy tolerance threshold (threshold) — Controls: Minimum acceptable accuracy loss during optimization - models below this threshold are rejected. Default: User-configurable per evaluator
- Engine search strategy (hyperparameter) — Controls: How the engine explores pass configurations - exhaustive search vs early stopping based on performance. Default: Configurable in RunConfig
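The four control points map onto run-config fragments like the ones below. Key names and values here are assumptions based on Olive's documented config style; check the RunConfig schema for the exact spelling in your version.

```python
# Illustrative settings for the four control points above.
control_points = {
    "systems": {                               # target execution provider
        "local": {
            "type": "LocalSystem",
            "accelerators": [
                {"device": "gpu", "execution_providers": ["CUDAExecutionProvider"]}
            ],
        }
    },
    "passes": {                                # quantization precision
        "quantization": {"type": "OnnxQuantization", "precision": "int8"}
    },
    "evaluators": {                            # accuracy tolerance threshold
        "common": {
            "metrics": [
                {"name": "accuracy",
                 "goal": {"type": "max-degradation", "value": 0.01}}
            ]
        }
    },
    "engine": {                                # search strategy
        "search_strategy": {"execution_order": "joint", "sampler": "tpe"}
    },
}
```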
Technology Stack
- PyTorch — Source model format and training framework for fine-tuning workflows
- ONNX Runtime — Primary inference engine for optimized models and performance evaluation
- Transformers — HuggingFace integration for loading pre-trained models and tokenizers
- Pydantic — Configuration validation and schema generation for RunConfig and model specifications
- FastMCP — MCP server framework for the conversational model optimization interface
- OpenVINO — Intel inference optimization and execution provider for CPU/NPU targets
- PEFT — Efficient fine-tuning techniques for large language models with reduced memory usage
Key Components
- WorkflowRunner (orchestrator) — olive/workflows/run/run.py — Coordinates the execution of optimization passes according to the RunConfig, manages model state progression, handles caching and checkpointing
- Pass (transformer) — olive/passes/olive_pass.py — Base class for all model transformations; each pass takes a model and produces an optimized variant with metrics about the transformation quality
- OliveEvaluator (validator) — olive/evaluator/olive_evaluator.py — Measures model quality through accuracy and performance metrics, providing feedback to guide optimization decisions and validate constraints
- OliveSystem (adapter) — olive/systems/olive_system.py — Abstracts target hardware (CPU, GPU, NPU) and execution providers (ONNX Runtime, OpenVINO), providing a unified interface for model execution and evaluation
- LocalSystem (executor) — olive/systems/local.py — Executes model inference on the local machine, handles device placement (CPU/GPU) and provider selection (ONNX Runtime, OpenVINO, DirectML)
- OliveEngine (scheduler) — olive/engine/engine.py — Manages pass execution strategy, handles parallel execution, caching policies, and optimization search across different pass configurations
- ModelPackaging (serializer) — olive/model/handler/mixin/packaging.py — Handles model serialization, adapter merging, and packaging for deployment; converts optimized models to deployable artifacts
- AccuracyEvaluator (validator) — olive/evaluator/accuracy.py — Measures model accuracy using various metrics (classification accuracy, BLEU score, perplexity), ensuring optimization doesn't degrade model quality below thresholds
- LatencyEvaluator (monitor) — olive/evaluator/latency.py — Measures model inference latency and throughput across different batch sizes and sequence lengths, optimizing for deployment performance targets
- OnnxConversion (transformer) — olive/passes/onnx/conversion.py — Converts models from PyTorch/TensorFlow to ONNX format using torch.onnx.export, handling dynamic shapes and operator compatibility
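For orientation, a custom transformation pass built on the Pass base class looks roughly like the sketch below. The hook names (_default_config, _run_for_config) match older Olive releases; verify against olive/passes/olive_pass.py before relying on them.

```python
# Hedged sketch of a custom pass; hook names may differ by Olive version.
from olive.passes import Pass
from olive.passes.pass_config import PassConfigParam


class IdentityPass(Pass):  # hypothetical pass, for illustration only
    @classmethod
    def _default_config(cls, accelerator_spec):
        return {
            "enabled": PassConfigParam(
                type_=bool, default_value=True,
                description="No-op toggle for this sketch.",
            ),
        }

    def _run_for_config(self, model, config, output_model_path):
        # a real pass would transform `model` (an OliveModelHandler) and
        # return a new handler pointing at output_model_path
        return model
```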
Frequently Asked Questions
What is Olive used for?
Olive optimizes ML models for deployment by applying conversions, quantization, and other transformations. microsoft/olive is a 10-component ML training and optimization system written in Python. Data flows through 6 distinct pipeline stages, and the codebase contains 519 files.
How is Olive architected?
Olive is organized into 5 architecture layers: CLI Layer, Engine Layer, Pass Layer, Model Layer, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through Olive?
Data moves through 6 stages: Parse CLI parameters into RunConfig → Load and validate input model → Execute optimization passes → Evaluate model quality → Cache and rank results → Package and save optimized models. Users specify a model and target hardware through CLI commands, the engine loads the model through format-specific handlers and applies transformation passes, and evaluators provide the feedback that decides which optimized variants are kept. This pipeline design reflects a complex multi-stage processing system.
What technologies does Olive use?
The core stack includes PyTorch (Source model format and training framework for fine-tuning workflows), ONNX Runtime (Primary inference engine for optimized models and performance evaluation), Transformers (HuggingFace integration for loading pre-trained models and tokenizers), Pydantic (Configuration validation and schema generation for RunConfig and model specifications), FastMCP (MCP server framework for conversational model optimization interface), OpenVINO (Intel inference optimization and execution provider for CPU/NPU targets), and 1 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does Olive have?
Olive exhibits 4 data pools (Pass result cache, Model registry, and others), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle convergence and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does Olive use?
4 design patterns detected: Pass-based Pipeline, Handler Pattern, Evaluation Feedback Loop, System Abstraction.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.