microsoft/olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

2,297 stars · Python · 10 components

Optimizes ML models for deployment by applying conversions, quantization, and other transformations

Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.

A 10-component ML training system spanning 519 analyzed files. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

Users specify a model and target hardware through CLI commands, which generate a RunConfig that drives the optimization pipeline. The engine loads the model through format-specific handlers, applies a sequence of transformation passes (quantization, conversion, pruning), evaluates each result against accuracy and performance metrics, and outputs the best optimized model variants. Each pass transforms the model while the evaluator provides feedback to guide optimization decisions.

  1. Parse CLI parameters into RunConfig — CLI commands like 'olive quantize' parse user parameters (model path, target device, precision) and generate a comprehensive RunConfig with default passes and evaluators for the specified optimization type
  2. Load and validate input model — The WorkflowRunner loads the model using format-specific handlers (PyTorchModelHandler, ONNXModelHandler) that wrap the model with metadata and provide unified interfaces for transformation [RunConfig → OliveModelHandler]
  3. Execute optimization passes — The engine applies each configured pass (OnnxQuantization, OnnxConversion, etc.) to transform the model - each pass produces a PassRunResult containing the transformed model and metrics about the optimization [OliveModelHandler → PassRunResult]
  4. Evaluate model quality — Evaluators measure the optimized model against accuracy metrics (using validation datasets) and performance metrics (latency, throughput) on the target system to ensure optimization constraints are met [OliveModelHandler → MetricResult]
  5. Cache and rank results — The engine caches PassRunResults to avoid recomputation and ranks optimization candidates by their metric scores, selecting the best models that meet accuracy thresholds and performance targets [PassRunResult]
  6. Package and save optimized models — The best model candidates are packaged (merging adapters if present) and saved to the output directory with their metrics and configuration metadata [OliveModelHandler]
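
For concreteness, here is a minimal sketch of driving the same pipeline from Python rather than the CLI. olive.workflows.run is Olive's documented entry point, but the model type, pass options, and output path shown are illustrative assumptions, not a verified config:

    # Minimal sketch: drive a workflow from Python instead of the CLI.
    # The pass type names match those described in stage 3 above; the
    # rest of the config values are assumptions for illustration only.
    from olive.workflows import run as olive_run

    config = {
        "input_model": {
            "type": "HfModel",                 # assumed model-type name
            "model_path": "microsoft/phi-2",   # placeholder model id
        },
        "passes": {
            "conversion": {"type": "OnnxConversion"},
            "quantization": {"type": "OnnxQuantization"},
        },
        "output_dir": "models/phi2-int8",
    }
    olive_run(config)  # parsed into a RunConfig, then executes the stages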

Data Models

The data structures that flow between stages — the contracts that hold the system together.

OliveModelHandler olive/model/handler/base.py
Abstract base class with model_path: Union[str, Path], model_file_format: ModelFileFormat, model_attributes: dict, adapter_path: Optional[str]
Created when a model is loaded, passed between optimization passes as the container for model state and metadata, saved as the final optimized output
RunConfig olive/workflows/run/config.py
Pydantic model with input_model: Union[InputModel, str], systems: List[OliveSystem], evaluators: Dict[str, OliveEvaluatorConfig], passes: Dict[str, PassConfig], engine: EngineConfig
Generated from CLI parameters or loaded from JSON config file, validates all optimization settings, drives the entire workflow execution
PassRunResult olive/passes/olive_pass.py
dict with output_model: OliveModelHandler, metrics: Dict[str, MetricResult], input_model_id: str, output_model_id: str
Created after each pass execution with the transformed model and performance metrics, cached to avoid recomputation, used for pass chaining decisions
ModelConfig olive/model/config/model_config.py
Pydantic model with type: ModelType, config: Dict, load_kwargs: Dict, io_config: Optional[Dict], dummy_inputs_func: Optional[str]
Specifies how to load and configure a model from various sources (HuggingFace, local files, custom loaders)
MetricResult olive/evaluator/metric.py
dict with value: Union[float, int, str, dict], priority: int, higher_is_better: bool, data_config: str
Computed by evaluators to measure model quality (accuracy, latency, throughput), used to rank optimization candidates and make pass selection decisions
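
A rough sketch of the two most central contracts, written as Python dataclasses; the field names follow the descriptions above, but these are approximations of the real classes, not their actual definitions:

    # Approximate shape of two core contracts (not the real class bodies).
    from dataclasses import dataclass, field
    from pathlib import Path
    from typing import Optional, Union

    @dataclass
    class OliveModelHandler:               # container passed between passes
        model_path: Union[str, Path]
        model_file_format: str             # a ModelFileFormat enum in Olive
        model_attributes: dict = field(default_factory=dict)
        adapter_path: Optional[str] = None

    @dataclass
    class PassRunResult:                   # produced and cached per pass
        output_model: OliveModelHandler
        metrics: dict                      # metric name -> MetricResult
        input_model_id: str
        output_model_id: str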

Hidden Assumptions

Things this code relies on but never validates; they are what cause silent failures when the system changes.

critical · Environment · unguarded

Python executable path returned by sys.executable is stable and available for subprocess creation throughout the job execution lifecycle

If this fails: when the current Python interpreter moves or becomes unavailable after a job starts, venv creation fails silently and jobs crash with obscure errors about a missing Python executable

mcp/src/olive_mcp/jobs.py:_get_or_create_venv
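
A guard for this assumption is a one-line existence check before the subprocess call. The sketch below is hypothetical (the function name and error type are invented), not the actual _get_or_create_venv code:

    import subprocess
    import sys
    from pathlib import Path

    def create_venv(venv_dir: Path) -> None:
        # Hypothetical preflight: verify the interpreter that launched the
        # job still exists before shelling out, so failure is loud and early.
        python = Path(sys.executable)
        if not python.is_file():
            raise RuntimeError(f"interpreter no longer exists: {python}")
        subprocess.run([str(python), "-m", "venv", str(venv_dir)], check=True)
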
critical · Resource · weakly guarded

System can handle 3 concurrent model optimization jobs without memory exhaustion or thrashing

If this fails: three concurrent optimizations of large models (>1 GB) can exhaust system memory, causing OOM kills or swap thrashing that fails all jobs

mcp/src/olive_mcp/jobs.py:_MAX_CONCURRENT_JOBS
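
A typical way to enforce the 3-job cap is a semaphore. This sketch assumes an asyncio-based runner, which may not match the actual jobs module; note that it bounds concurrency but never validates that three jobs actually fit in memory, which is the weakly guarded part:

    import asyncio

    _MAX_CONCURRENT_JOBS = 3  # the constant this assumption concerns

    _slots = asyncio.Semaphore(_MAX_CONCURRENT_JOBS)

    async def run_optimization(job):
        # A fourth submission waits here until a running job releases its
        # slot -- this caps concurrency but never checks that three
        # large-model jobs actually fit in memory.
        async with _slots:
            return await job()
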
critical · Temporal · unguarded

All optimization jobs complete within 1 hour, or users no longer need the results after that window

If this fails: long-running optimizations (large-model quantization can take 2-4 hours) have their results purged before users can retrieve them, losing hours of computation

mcp/src/olive_mcp/jobs.py:_JOB_TTL_SECONDS
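
The TTL behavior usually reduces to a sweep like the following; the record fields and storage layout here are assumptions, not the real jobs module:

    import time

    _JOB_TTL_SECONDS = 3600  # one hour

    def purge_expired(results: dict) -> None:
        # Drop finished jobs older than the TTL; a 2-4 hour quantization
        # whose caller polls late loses its result here.
        now = time.monotonic()
        expired = [job_id for job_id, rec in results.items()
                   if now - rec["finished_at"] > _JOB_TTL_SECONDS]
        for job_id in expired:
            del results[job_id]
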
critical · Contract · unguarded

The mapping from implementation names to pass types stays synchronized with the selection logic in olive/cli/quantize.py

If this fails: when olive/cli/quantize.py adds new quantization methods or changes its selection logic, the MCP server routes requests to the wrong passes, causing optimization failures or suboptimal results

mcp/src/olive_mcp/packages.py:_IMPL_TO_PASS_TYPES
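
One way to guard this contract is a drift test that fails CI when the CLI grows an implementation the mapping does not cover. The implementation names below are stand-ins, not the verified contents of either file:

    def test_impl_mapping_matches_cli():
        # Hypothetical drift guard: every implementation the CLI can select
        # must have a pass-type mapping on the MCP side.
        from olive_mcp.packages import _IMPL_TO_PASS_TYPES
        cli_implementations = {"awq", "gptq", "onnx"}  # stand-in values
        missing = cli_implementations - set(_IMPL_TO_PASS_TYPES)
        assert not missing, f"unmapped implementations: {missing}"
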
critical · Environment · unguarded

User home directory (~/.olive-mcp/venvs) is writable and has sufficient disk space for multiple virtual environments

If this fails: in containerized or restricted environments where the home directory is read-only or quota-limited, venv creation fails and no optimization jobs can run

olive_mcp/constants.py:VENV_BASE
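
A hypothetical startup preflight could validate both unstated requirements; the 2 GB floor and the function name are assumptions, not Olive defaults:

    import os
    import shutil
    from pathlib import Path

    VENV_BASE = Path.home() / ".olive-mcp" / "venvs"

    def preflight_venv_base(min_free_gb: float = 2.0) -> None:
        # Validate the two unstated requirements: writability and disk
        # headroom. The 2 GB floor is an assumption for illustration.
        VENV_BASE.mkdir(parents=True, exist_ok=True)
        if not os.access(VENV_BASE, os.W_OK):
            raise PermissionError(f"{VENV_BASE} is not writable")
        free_gb = shutil.disk_usage(VENV_BASE).free / 1024**3
        if free_gb < min_free_gb:
            raise OSError(f"only {free_gb:.1f} GB free under {VENV_BASE}")
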
warning · Domain · weakly guarded

olive_config.json structure matches the expected schema, with pass types correctly categorized

If this fails: when the olive_config.json format changes or pass types are reorganized, package resolution fails silently, causing optimization jobs to fail with missing-dependency errors

mcp/src/olive_mcp/packages.py:_load_olive_config
warning · Temporal · unguarded

Virtual environments not used for 14 days are safe to delete and users won't need them again

If this fails: Users returning to projects after vacation or long development cycles find their optimized environments purged, forcing expensive package reinstallation

mcp/src/olive_mcp/constants.py:_VENV_MAX_AGE_DAYS
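
The age-based cleanup implied here typically looks like the sketch below, which uses directory mtime as a proxy for last use; this is an illustration, not the actual constants module:

    import shutil
    import time
    from pathlib import Path

    _VENV_MAX_AGE_DAYS = 14

    def prune_stale_venvs(base: Path) -> None:
        # Delete venvs untouched for 14 days; mtime is a crude proxy for
        # "last used", which is why returning users can get surprised.
        cutoff = time.time() - _VENV_MAX_AGE_DAYS * 86400
        for venv in (p for p in base.iterdir() if p.is_dir()):
            if venv.stat().st_mtime < cutoff:
                shutil.rmtree(venv, ignore_errors=True)
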
warning · Scale · unguarded

RunConfig JSON schema can be serialized to string and fits in memory without size limits

If this fails: As Olive adds more passes and configuration options, the schema grows large enough to cause memory issues or hit string size limits during documentation generation

docs/source/dump_schema.py:RunConfig.schema_json
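
A common mitigation is to stream the schema dict to disk rather than materializing one large string; this sketch assumes pydantic v1's .schema() API, which the entry above implies:

    import json

    from olive.workflows.run.config import RunConfig

    # Stream the schema dict straight to disk instead of materializing
    # RunConfig.schema_json() as one giant string (pydantic v1 API assumed).
    with open("schema.json", "w") as f:
        json.dump(RunConfig.schema(), f, indent=2)
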
warning · Environment · unguarded

Azure DevOps organization 'aiinfra' remains accessible and its API endpoints don't change

If this fails: should the organization move, be renamed, or change API versions, the commit-bisection script fails and debugging CI issues becomes manual

.azure_pipelines/scripts/find_failed_commit.py:_organization_url
warning · Contract · weakly guarded

Package config pass names map correctly to module names and the import_pass_module method exists

If this fails: Documentation generation fails when pass modules can't be imported, leaving configuration docs incomplete or with broken references

docs/source/exts/auto_config_doc/__init__.py:import_class

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Pass result cache (cache)
Stores PassRunResult objects keyed by input model hash and pass configuration to avoid recomputing expensive transformations; a keying sketch follows this list
Model registry (registry)
Maps model types to their corresponding handler classes for loading different formats (PyTorch, ONNX, HuggingFace)
Evaluation results (database)
Persists MetricResult objects with model performance measurements across different systems and configurations
Workflow outputs (file-store)
Stores final optimized models, their metrics, and configuration metadata as deployment artifacts
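
As referenced in the pass result cache entry above, a plausible keying scheme looks like this sketch; the hashing details are an assumption based on the description, not the engine's real code:

    import hashlib
    import json

    def pass_cache_key(input_model_id: str, pass_config: dict) -> str:
        # Same input model + same pass configuration -> same key, so the
        # engine can return a cached PassRunResult instead of re-running.
        payload = json.dumps(pass_config, sort_keys=True)
        digest = hashlib.sha256(f"{input_model_id}:{payload}".encode())
        return digest.hexdigest()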

Technology Stack

PyTorch (framework)
Source model format and training framework for fine-tuning workflows
ONNX Runtime (runtime)
Primary inference engine for optimized models and performance evaluation
Transformers (library)
HuggingFace integration for loading pre-trained models and tokenizers
Pydantic (serialization)
Configuration validation and schema generation for RunConfig and model specifications
FastMCP (framework)
MCP server framework for conversational model optimization interface
OpenVINO (runtime)
Intel inference optimization and execution provider for CPU/NPU targets
LoRA/QLoRA (library)
Efficient fine-tuning techniques for large language models with reduced memory usage

Frequently Asked Questions

What is Olive used for?

Olive optimizes ML models for deployment by applying conversions, quantization, and other transformations. microsoft/olive is a 10-component ML training system written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 519 files.

How is Olive architected?

Olive is organized into 5 architecture layers: CLI Layer, Engine Layer, Pass Layer, Model Layer, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through Olive?

Data moves through 6 stages: Parse CLI parameters into RunConfig → Load and validate input model → Execute optimization passes → Evaluate model quality → Cache and rank results → Package and save optimized models. Users specify a model and target hardware through CLI commands, which generate a RunConfig that drives the pipeline; each pass transforms the model while the evaluator provides feedback to guide optimization decisions.

What technologies does Olive use?

The core stack includes PyTorch (Source model format and training framework for fine-tuning workflows), ONNX Runtime (Primary inference engine for optimized models and performance evaluation), Transformers (HuggingFace integration for loading pre-trained models and tokenizers), Pydantic (Configuration validation and schema generation for RunConfig and model specifications), FastMCP (MCP server framework for conversational model optimization interface), OpenVINO (Intel inference optimization and execution provider for CPU/NPU targets), and LoRA/QLoRA (Efficient fine-tuning techniques for large language models). A focused set of dependencies that keeps the build manageable.

What system dynamics does Olive have?

Olive exhibits 4 data pools (Pass result cache, Model registry, Evaluation results, Workflow outputs), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle convergence and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does Olive use?

4 design patterns detected: Pass-based Pipeline, Handler Pattern, Evaluation Feedback Loop, and System Abstraction.

Analyzed on April 20, 2026 by CodeSea.