microsoft/olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
ML optimization toolkit for ONNX model finetuning, quantization and hardware-specific optimization
Under the hood, the system uses three feedback loops, three data pools, and four control points to manage its runtime behavior.
Structural Verdict
A 10-component ML training system with 22 connections. 484 files analyzed. Highly interconnected — components depend on each other heavily.
How Data Flows Through the System
Models flow through configurable optimization passes managed by the Engine, with data containers providing training/evaluation datasets and caching preserving intermediate results across the pipeline stages.
- Input Model Loading — Load PyTorch, ONNX, or HuggingFace models through ModelConfig with resource path resolution (config: input_model, model_loader)
- Data Preparation — Initialize training and evaluation datasets through DataContainer with optional preprocessing (config: data_configs, datasets)
- Pass Execution — Execute optimization passes sequentially (quantization, conversion, pruning) with caching (config: passes, pass_flows)
- Model Evaluation — Evaluate optimized models using metrics and benchmarks on target hardware (config: evaluators, metrics)
- Search Optimization — Use search algorithms to find optimal hyperparameters across pass configurations (config: search_strategy, search_space)
- Output Generation — Package final optimized models with metadata and performance metrics (config: output_dir, output_name)
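The six stages above map onto a single workflow configuration. The sketch below assembles one using the config keys named in each stage (input_model, data_configs, passes, evaluators, search_strategy, output_dir); the field values and type names inside are illustrative placeholders, not a tested Olive example.

```python
# Sketch of a workflow config using the keys named in the stages above.
# Values are illustrative placeholders, not a verified Olive example.
workflow_config = {
    "input_model": {                             # Input Model Loading
        "type": "HfModel",
        "model_path": "microsoft/phi-2",
    },
    "data_configs": [                            # Data Preparation
        {"name": "calib_data", "type": "HuggingfaceContainer"}
    ],
    "passes": {                                  # Pass Execution
        "conversion": {"type": "OnnxConversion"},
        "quantization": {"type": "OnnxQuantization"},
    },
    "evaluators": {                              # Model Evaluation
        "common": {"metrics": [{"name": "latency", "type": "latency"}]}
    },
    "search_strategy": {                         # Search Optimization
        "execution_order": "joint",
        "sampler": "random",
    },
    "output_dir": "models/optimized",            # Output Generation
}

# With Olive installed, a dict like this would be handed to the workflow
# runner, e.g.:
#   from olive.workflows import run as olive_run
#   olive_run(workflow_config)
```

The point is structural: one config object carries every stage's settings, so the Engine can resolve the whole pipeline before executing any pass.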
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Model Cache — Optimization results and intermediate models stored with hash-based keys
- Azure Blob Storage — Shared cache for model artifacts and optimization results
- Datasets — Preprocessed training and evaluation datasets with lazy loading
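The hash-based keys used by the model cache can be sketched as a deterministic digest over the model identity and the pass configuration. This shows the general technique, not Olive's actual key format:

```python
import hashlib
import json

def cache_key(model_id: str, pass_config: dict) -> str:
    """Derive a deterministic cache key from a model id and a pass config.

    Serializing with sorted keys makes the JSON stable, so the same
    (model, config) pair always hashes to the same key.
    """
    payload = json.dumps({"model": model_id, "config": pass_config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

key_a = cache_key("resnet50", {"type": "OnnxQuantization", "bits": 8})
key_b = cache_key("resnet50", {"bits": 8, "type": "OnnxQuantization"})
key_c = cache_key("resnet50", {"type": "OnnxQuantization", "bits": 4})

assert key_a == key_b   # key order in the config does not matter
assert key_a != key_c   # a changed config yields a different cache entry
```

Because the key is a pure function of (model, config), any change to either naturally misses the cache, which is exactly the invalidation behavior described under Feedback Loops.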
Feedback Loops
- Search Optimization (convergence, balancing) — Trigger: Search algorithm initialization. Action: Generate and evaluate pass configurations. Exit: Convergence criteria met or max iterations.
- Pass Validation Retry (retry, balancing) — Trigger: Pass execution failure. Action: Retry with fallback configuration. Exit: Success or max retries exceeded.
- Cache Lookup (cache-invalidation, balancing) — Trigger: Model or config change. Action: Invalidate cached results and recompute. Exit: Cache consistency restored.
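The pass-validation retry loop above amounts to trying a primary configuration and falling back until one succeeds or the candidates are exhausted. A generic sketch of that loop shape, not Olive's retry code:

```python
def run_with_fallback(run_pass, configs):
    """Try each candidate config in order; return the first successful result.

    `run_pass` is any callable that raises on failure. The exit condition
    mirrors the loop above: success, or all fallback configs exhausted.
    """
    errors = []
    for config in configs:
        try:
            return run_pass(config)
        except Exception as exc:  # real code would catch the pass's specific error
            errors.append((config, exc))
    raise RuntimeError(f"all {len(errors)} configurations failed: {errors}")

# Toy pass that only supports per-tensor quantization.
def quantize(config):
    if config["scheme"] != "per_tensor":
        raise ValueError("unsupported scheme")
    return f"quantized with {config['scheme']}"

result = run_with_fallback(
    quantize,
    [{"scheme": "per_channel"}, {"scheme": "per_tensor"}],
)
assert result == "quantized with per_tensor"
```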
Delays & Async Processing
- Model Loading (async-processing, ~variable) — Models loaded on-demand to reduce memory usage
- Data Preprocessing (batch-window, ~variable) — Dataset preprocessing deferred until first access
- Azure Cache Sync (eventual-consistency, ~network dependent) — Cache updates propagate asynchronously to blob storage
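The on-demand model loading and deferred preprocessing above both follow the standard lazy-initialization pattern: postpone the expensive step until first access, then reuse the result. A generic sketch, not Olive's DataContainer code:

```python
from functools import cached_property

class LazyDataset:
    """Defers preprocessing until the data is first accessed."""

    def __init__(self, raw_rows):
        self.raw_rows = raw_rows
        self.preprocess_calls = 0  # instrumentation for the example

    @cached_property
    def rows(self):
        # Runs once, on first access; later accesses reuse the cached value.
        self.preprocess_calls += 1
        return [r.strip().lower() for r in self.raw_rows]

ds = LazyDataset(["  Hello ", "WORLD"])
assert ds.preprocess_calls == 0       # nothing happens at construction
assert ds.rows == ["hello", "world"]  # first access triggers preprocessing
assert ds.rows == ["hello", "world"]  # second access hits the cache
assert ds.preprocess_calls == 1
```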
Control Points
- Cache Directory (env-var) — Controls: Where optimization results are stored locally. Config key: cache_dir
- Pass Configuration (runtime-toggle) — Controls: Which optimization passes are enabled and their parameters. Config key: passes
- Hardware Target (runtime-toggle) — Controls: Target accelerator type and optimization constraints. Config key: systems
- Search Strategy (feature-flag) — Controls: Hyperparameter search algorithm and search space. Config key: search_strategy
Technology Stack
- PyTorch — Primary ML framework for model training and finetuning
- ONNX Runtime — Target runtime for optimized model inference
- Pydantic — Configuration validation and schema generation
- Transformers — Hugging Face model support and utilities
- Azure Storage — Cloud storage for caching and model artifacts
- MLflow — Experiment tracking and model registry
- pytest — Testing framework with 160+ test modules
- Sphinx — Documentation generation with custom directives
Key Components
- run (function, olive/cli/api.py) — Main API function to execute optimization workflows from configuration
- Engine (class, olive/engine/engine.py) — Core execution engine that runs optimization passes on models through configured pipelines
- Pass (class, olive/passes/pass_base.py) — Base class for all optimization passes defining the interface for model transformations
- ModelConfig (class, olive/model/config/model_config.py) — Unified configuration for different model types with loading and conversion capabilities
- CacheManager (class, olive/cache.py) — Manages caching of optimization results with support for local and Azure blob storage
- AcceleratorSpec (class, olive/hardware/accelerator.py) — Hardware specification abstraction for CPUs, GPUs, NPUs with optimization constraints
- DataContainer (class, olive/data/container.py) — Manages training and evaluation datasets with lazy loading and preprocessing pipelines
- Evaluator (class, olive/evaluator/olive_evaluator.py) — Evaluates model performance using various metrics and benchmarks on target hardware
- SearchAlgorithm (class, olive/search/search_algorithm.py) — Base class for hyperparameter search algorithms to find optimal pass configurations
- WorkflowConfig (class, olive/workflows/run/config.py) — Configuration schema for complete optimization workflows including passes, data, and evaluation
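The Pass base class described above defines the contract every optimization pass implements: take a model, return a transformed model. A minimal sketch of that kind of interface; the method name and signature are illustrative, not Olive's exact API:

```python
from abc import ABC, abstractmethod

class Pass(ABC):
    """Illustrative base class for a model-transformation pass."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def run(self, model):
        """Transform the input model and return the result."""

class DowncastPass(Pass):
    """Toy pass: 'reduces precision' of a list-of-floats model by rounding."""

    def run(self, model):
        ndigits = self.config.get("ndigits", 2)
        return [round(w, ndigits) for w in model]

model = [0.12345, 0.98765]
optimized = DowncastPass({"ndigits": 2}).run(model)
assert optimized == [0.12, 0.99]
```

Keeping every transformation behind the same `run` contract is what lets the Engine sequence arbitrary passes (quantization, conversion, pruning) through one pipeline.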
Sub-Modules
- Command-line interface for running optimization workflows and utilities
- Specialized optimization for Stable Diffusion models with LoRA adapters
- Sphinx-based documentation with custom auto-config directive
Configuration
olive/cache.py (python-dataclass)
Fields: cache_dir (Path), runs (Path), evaluations (Path), resources (Path), mlflow (Path) — default values not documented.
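The python-dataclass config above can be reconstructed roughly as follows. The field names come from the listing; since the defaults are not documented, the root default and the derived sub-directory layout here are assumptions for illustration:

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class CacheConfig:
    """Sketch of the cache layout fields listed above; defaults are illustrative."""
    cache_dir: Path = Path(".olive-cache")
    runs: Path = field(init=False)
    evaluations: Path = field(init=False)
    resources: Path = field(init=False)
    mlflow: Path = field(init=False)

    def __post_init__(self):
        # Derive the sub-directories from the root cache directory.
        self.runs = self.cache_dir / "runs"
        self.evaluations = self.cache_dir / "evaluations"
        self.resources = self.cache_dir / "resources"
        self.mlflow = self.cache_dir / "mlflow"

cfg = CacheConfig(Path("/tmp/olive"))
assert cfg.runs == Path("/tmp/olive/runs")
```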
Science Pipeline
- Model Loading (olive/model/config/model_config.py) — Load model from HuggingFace, PyTorch, or ONNX format [variable → model dependent]
- Quantization (olive/passes/pytorch/quantization.py) — Apply INT8/INT4 quantization with calibration dataset [model dependent → same shape, reduced precision]
- ONNX Conversion (olive/passes/onnx/conversion.py) — Convert PyTorch model to ONNX with dynamic axes [PyTorch tensors → ONNX graph]
- Hardware Optimization (olive/passes/onnx/optimization.py) — Apply target-specific optimizations (graph fusion, kernel selection) [ONNX graph → optimized ONNX graph]
- Evaluation (olive/evaluator/olive_evaluator.py) — Benchmark latency, accuracy, memory on target hardware [optimized model → performance metrics]
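The five stages above chain output-to-input, which can be expressed as plain function composition. A structural sketch with toy stand-ins for each stage, not Olive's engine code:

```python
def run_pipeline(model, stages):
    """Feed each stage's output into the next stage, recording the trace."""
    trace = []
    for name, stage in stages:
        model = stage(model)
        trace.append(name)
    return model, trace

# Toy stand-ins for the five stages; each tags the artifact it produces.
stages = [
    ("load", lambda m: {"weights": m}),
    ("quantize", lambda m: {**m, "precision": "int8"}),
    ("convert", lambda m: {**m, "format": "onnx"}),
    ("optimize", lambda m: {**m, "fused": True}),
    ("evaluate", lambda m: {**m, "latency_ms": 1.0}),
]

result, trace = run_pipeline([0.1, 0.2], stages)
assert trace == ["load", "quantize", "convert", "optimize", "evaluate"]
assert result["format"] == "onnx" and result["precision"] == "int8"
```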
Assumptions & Constraints
- [warning] Assumes a UNet input tensor shape of (2*batch_size, 4, 128, 128) for latent diffusion, but the shape is hardcoded (shape)
- [critical] Quantization passes assume specific tensor dtypes but may not validate input model compatibility (dtype)
- [warning] Evaluation assumes models can be moved to the target device without checking memory constraints (device)
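The critical dtype assumption suggests a guard of this shape at pass entry: check the input model's dtype up front rather than failing partway through quantization. A generic sketch with an illustrative allow-list, not Olive's validation code:

```python
SUPPORTED_DTYPES = {"float32", "float16"}  # illustrative allow-list

def check_quantizable(model_dtype: str) -> None:
    """Fail fast with a clear message instead of erroring mid-quantization."""
    if model_dtype not in SUPPORTED_DTYPES:
        raise TypeError(
            f"quantization expects one of {sorted(SUPPORTED_DTYPES)}, "
            f"got {model_dtype!r}"
        )

check_quantizable("float32")  # passes silently

try:
    check_quantizable("int8")
except TypeError as exc:
    message = str(exc)
assert "int8" in message
```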
Frequently Asked Questions
What is Olive used for?
Olive (microsoft/olive) is an ML optimization toolkit for ONNX model finetuning, quantization, and hardware-specific optimization. It is a 10-component ML training system written in Python, spanning 484 files, and highly interconnected — components depend on each other heavily.
How is Olive architected?
Olive is organized into 5 architecture layers: CLI/API Interface, Workflow Engine, Optimization Passes, Model Abstraction, and 1 more. Highly interconnected — components depend on each other heavily. This layered structure enables tight integration between components.
How does data flow through Olive?
Data moves through 6 stages: Input Model Loading → Data Preparation → Pass Execution → Model Evaluation → Search Optimization → .... Models flow through configurable optimization passes managed by the Engine, with data containers providing training/evaluation datasets and caching preserving intermediate results across the pipeline stages. This pipeline design reflects a complex multi-stage processing system.
What technologies does Olive use?
The core stack includes PyTorch (Primary ML framework for model training and finetuning), ONNX Runtime (Target runtime for optimized model inference), Pydantic (Configuration validation and schema generation), Transformers (Hugging Face model support and utilities), Azure Storage (Cloud storage for caching and model artifacts), MLflow (Experiment tracking and model registry), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does Olive have?
Olive exhibits 3 data pools (Model Cache, Azure Blob Storage), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle convergence and retry. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does Olive use?
5 design patterns detected: Optimization Pass Pipeline, Resource Path Abstraction, Auto Configuration, Hardware Abstraction, Lazy Evaluation.
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.