microsoft/olive

Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

2,284 stars · Python · 10 components · 22 connections

ML optimization toolkit for ONNX model finetuning, quantization and hardware-specific optimization


Under the hood, the system uses 3 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

Structural Verdict

A 10-component ML training toolkit with 22 connections across 484 analyzed files. Highly interconnected: components depend on each other heavily.

How Data Flows Through the System

Models flow through configurable optimization passes managed by the Engine. Data containers provide the training and evaluation datasets, and caching preserves intermediate results across the pipeline stages.

  1. Input Model Loading — Load PyTorch, ONNX, or HuggingFace models through ModelConfig with resource path resolution (config: input_model, model_loader)
  2. Data Preparation — Initialize training and evaluation datasets through DataContainer with optional preprocessing (config: data_configs, datasets)
  3. Pass Execution — Execute optimization passes sequentially (quantization, conversion, pruning) with caching (config: passes, pass_flows)
  4. Model Evaluation — Evaluate optimized models using metrics and benchmarks on target hardware (config: evaluators, metrics)
  5. Search Optimization — Use search algorithms to find optimal hyperparameters across pass configurations (config: search_strategy, search_space)
  6. Output Generation — Package final optimized models with metadata and performance metrics (config: output_dir, output_name)
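The six stages above map onto a single workflow configuration. A minimal sketch as a plain Python dict, where the top-level keys mirror the config fields named in each stage; the specific pass types, model names, and values are illustrative placeholders, not Olive's verified schema:

```python
# Illustrative Olive-style workflow config assembled as a plain dict.
# Top-level keys follow the stages above; values are hypothetical.
workflow = {
    "input_model": {"type": "HfModel", "model_path": "gpt2"},             # stage 1
    "data_configs": [{"name": "calib", "type": "HuggingfaceContainer"}],  # stage 2
    "passes": {                                                           # stage 3
        "convert": {"type": "OnnxConversion"},
        "quantize": {"type": "OnnxQuantization"},
    },
    "evaluators": {"common": {"metrics": [{"name": "latency"}]}},         # stage 4
    "search_strategy": {"execution_order": "joint"},                      # stage 5
    "output_dir": "models/gpt2-optimized",                                # stage 6
}

# Passes run in the order they are declared (dicts preserve insertion order).
print(list(workflow["passes"]))  # ['convert', 'quantize']
```

Declaring passes as an ordered mapping is what lets a single config describe a whole pipeline: the engine can walk the keys in order and feed each pass the previous pass's output.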

System Behavior

How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Model Cache (file-store)
Optimization results and intermediate models stored with hash-based keys
Azure Blob Storage (cache)
Shared cache for model artifacts and optimization results
Data Container (buffer)
Preprocessed training and evaluation datasets with lazy loading
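The "hash-based keys" mentioned for the Model Cache can be illustrated with a stdlib-only sketch (this is a generic technique, not Olive's actual implementation; the function name and key layout are hypothetical):

```python
import hashlib
import json

def cache_key(model_id: str, pass_config: dict) -> str:
    """Derive a deterministic cache key from a model id and a pass config.

    Serializing with sorted keys makes the hash independent of dict
    insertion order, so identical configs always map to the same entry.
    """
    payload = json.dumps({"model": model_id, "pass": pass_config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

k1 = cache_key("gpt2", {"type": "OnnxQuantization", "bits": 8})
k2 = cache_key("gpt2", {"bits": 8, "type": "OnnxQuantization"})
assert k1 == k2  # same config, different key order: same cache entry
```

Keying on the full (model, config) pair is what lets a file store or a shared blob cache safely reuse intermediate models across runs: any change to the config produces a new key instead of overwriting a stale artifact.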

Feedback Loops

Delays & Async Processing

Control Points

Technology Stack

PyTorch (framework)
Primary ML framework for model training and finetuning
ONNX Runtime (framework)
Target runtime for optimized model inference
Pydantic (library)
Configuration validation and schema generation
Transformers (library)
Hugging Face model support and utilities
Azure Storage (infra)
Cloud storage for caching and model artifacts
MLflow (library)
Experiment tracking and model registry
pytest (testing)
Testing framework with 160+ test modules
Sphinx (build)
Documentation generation with custom directives

Key Components

Sub-Modules

CLI Tools (independence: medium)
Command-line interface for running optimization workflows and utilities
Stable Diffusion LoRA (independence: high)
Specialized optimization for Stable Diffusion models with LoRA adapters
Documentation System (independence: high)
Sphinx-based documentation with custom auto-config directive

Configuration

olive/cache.py (python-dataclass)
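Configuration objects of this kind are typically plain dataclasses. A hypothetical sketch of what a cache config could look like; the field names here are illustrative, not Olive's actual schema:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class CacheConfig:
    """Hypothetical cache settings in the dataclass-config style.

    Defaults make a bare CacheConfig() usable out of the box, while
    callers can override individual fields by keyword.
    """
    cache_dir: Path = Path(".olive-cache")
    enable_shared_cache: bool = False  # e.g. an Azure Blob Storage backend
    max_entries: int = 1000

cfg = CacheConfig(enable_shared_cache=True)
print(cfg.cache_dir, cfg.enable_shared_cache)
```

Dataclasses keep such configs declarative: the fields double as documentation, and equality/repr come for free, which is convenient when configs are hashed into cache keys.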

Science Pipeline

  1. Model Loading — Load model from HuggingFace, PyTorch, or ONNX format [variable → model dependent] olive/model/config/model_config.py
  2. Quantization — Apply INT8/INT4 quantization with calibration dataset [model dependent → same shape, reduced precision] olive/passes/pytorch/quantization.py
  3. ONNX Conversion — Convert PyTorch model to ONNX with dynamic axes [PyTorch tensors → ONNX graph] olive/passes/onnx/conversion.py
  4. Hardware Optimization — Apply target-specific optimizations (graph fusion, kernel selection) [ONNX graph → optimized ONNX graph] olive/passes/onnx/optimization.py
  5. Evaluation — Benchmark latency, accuracy, memory on target hardware [optimized model → performance metrics] olive/evaluator/olive_evaluator.py
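The pipeline above can be sketched as a sequence of passes, each consuming the previous model and consulting a cache before recomputing (a stdlib-only illustration of the pattern, with toy pass functions; Olive's real passes and cache are far richer):

```python
import hashlib
import json

def run_pipeline(model, passes, cache=None):
    """Run optimization passes sequentially, reusing cached outputs.

    Each pass is a (name, fn) pair; fn maps one model dict to the next.
    The cache key covers both the pass name and the incoming model, so a
    change anywhere upstream invalidates everything downstream.
    """
    cache = {} if cache is None else cache
    for name, fn in passes:
        key = hashlib.sha256(
            json.dumps([name, model], sort_keys=True).encode()
        ).hexdigest()
        if key in cache:
            model = cache[key]      # intermediate result already computed
        else:
            model = fn(model)
            cache[key] = model
    return model

# Toy passes standing in for quantization and ONNX conversion.
passes = [
    ("quantize", lambda m: {**m, "precision": "int8"}),
    ("convert", lambda m: {**m, "format": "onnx"}),
]
out = run_pipeline({"name": "resnet"}, passes)
print(out)  # {'name': 'resnet', 'precision': 'int8', 'format': 'onnx'}
```

Passing the same cache dict into a second run would skip both pass functions entirely, which is the point of preserving intermediate results between stages.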

Assumptions & Constraints


Frequently Asked Questions

What is Olive used for?

Olive is an ML optimization toolkit for ONNX model finetuning, quantization, and hardware-specific optimization. microsoft/olive is a 10-component ML training toolkit written in Python. It is highly interconnected: components depend on each other heavily. The codebase contains 484 files.

How is Olive architected?

Olive is organized into 5 architecture layers: CLI/API Interface, Workflow Engine, Optimization Passes, Model Abstraction, and 1 more. It is highly interconnected: components depend on each other heavily, and this layered structure enables tight integration between them.

How does data flow through Olive?

Data moves through 6 stages: Input Model Loading → Data Preparation → Pass Execution → Model Evaluation → Search Optimization → .... Models flow through configurable optimization passes managed by the Engine, with data containers providing training/evaluation datasets and caching preserving intermediate results across the pipeline stages. This pipeline design reflects a complex multi-stage processing system.

What technologies does Olive use?

The core stack includes PyTorch (Primary ML framework for model training and finetuning), ONNX Runtime (Target runtime for optimized model inference), Pydantic (Configuration validation and schema generation), Transformers (Hugging Face model support and utilities), Azure Storage (Cloud storage for caching and model artifacts), MLflow (Experiment tracking and model registry), and 2 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does Olive have?

Olive exhibits 3 data pools (Model Cache, Azure Blob Storage, and the Data Container buffer), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle convergence and retry. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does Olive use?

5 design patterns detected: Optimization Pass Pipeline, Resource Path Abstraction, Auto Configuration, Hardware Abstraction, Lazy Evaluation.
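Of the patterns listed, Lazy Evaluation is the simplest to illustrate: defer an expensive load (e.g. a dataset or model) until its value is first requested. A generic sketch of the pattern, not Olive's code; the class and names are hypothetical:

```python
class Lazy:
    """Defer a costly computation until its result is first requested."""

    def __init__(self, loader):
        self._loader = loader
        self._value = None
        self._loaded = False

    def get(self):
        if not self._loaded:
            self._value = self._loader()  # runs only once, on demand
            self._loaded = True
        return self._value

calls = []
dataset = Lazy(lambda: calls.append("load") or [1, 2, 3])
assert calls == []                 # nothing loaded at construction time
assert dataset.get() == [1, 2, 3]  # first access triggers the load
assert dataset.get() == [1, 2, 3]  # second access reuses the cached value
assert calls == ["load"]           # loader ran exactly once
```

This is the same idea as the Data Container's lazy loading noted earlier: workflows that never touch a dataset never pay for preprocessing it.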

Analyzed on March 31, 2026 by CodeSea.