microsoft/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

19,709 stars · C++ · 10 components · 9 connections

Cross-platform ML inference engine for running ONNX models with hardware acceleration

Model loading, session creation, input tensor preparation, inference execution, and output retrieval

Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

Structural Verdict

A 10-component ML training and inference system with 9 connections. 6,752 files analyzed. Well connected, with clear data flow between components.

How Data Flows Through the System

The runtime moves data through model loading, session creation, input tensor preparation, inference execution, and output retrieval:

  1. Model Loading — Load ONNX model from file or buffer
  2. Session Creation — Create inference session with execution providers and optimization settings
  3. Input Preparation — Convert user data to tensors with appropriate data types and shapes
  4. Backend Selection — Choose optimal execution provider based on hardware and priority
  5. Graph Execution — Execute model operations using selected backend (CPU, GPU, WebGL, etc.)
  6. Output Retrieval — Extract results as tensors and convert to user format
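The six stages above can be sketched as a single pass through one function, with each stage stubbed out. All names here are illustrative stand-ins, not ONNX Runtime's actual API:

```python
def run_pipeline(model_path, data, shape, providers, available):
    """Illustrative walk through the six stages; not the real ORT API."""
    # 1. Model loading: stand-in for parsing the ONNX protobuf.
    model = {"path": model_path, "ops": ["MatMul", "Relu"]}
    # 2. Session creation: bind the model to an ordered provider list.
    session = {"model": model, "providers": list(providers)}
    # 3. Input preparation: validate the flat data against the declared shape.
    n = 1
    for dim in shape:
        n *= dim
    if len(data) != n:
        raise ValueError("input data does not match shape")
    # 4. Backend selection: first provider, in priority order, the host supports.
    backend = next(p for p in session["providers"] if p in available)
    # 5. Graph execution: stubbed as an identity pass on the selected backend.
    result = list(data)
    # 6. Output retrieval: hand results back in plain Python form.
    return result, backend

out, backend = run_pipeline("model.onnx", [1.0, 2.0, 3.0, 4.0], (2, 2),
                            providers=["cuda", "cpu"], available={"cpu"})
```

Note how step 4 falls through the priority list: a GPU provider is preferred, but the first provider the hardware actually supports wins.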

System Behavior

How the system actually operates at runtime: where data accumulates, what loops back, what waits, and what controls what.

Data Pools

TextureDataCache (cache)
Caches GPU textures for packed and unpacked tensor data
ProgramArtifactCache (cache)
Compiled WebGL shaders indexed by input shapes and operations
BackendRegistry (state-store)
Registered execution providers with priority ordering
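A cache like ProgramArtifactCache amortizes shader compilation by keying compiled programs on the operation and its input shapes. A minimal sketch of that structure (the real cache lives in the TypeScript WebGL backend; this layout is an assumption):

```python
class ProgramArtifactCache:
    """Sketch of a program cache keyed by (operation, input shapes)."""

    def __init__(self):
        self._cache = {}
        self.compiles = 0  # counts how often the expensive path runs

    def get(self, op: str, shapes: tuple) -> str:
        key = (op, shapes)
        if key not in self._cache:
            self.compiles += 1  # stand-in for an expensive shader compilation
            self._cache[key] = f"compiled:{op}:{shapes}"
        return self._cache[key]

cache = ProgramArtifactCache()
cache.get("Conv", ((1, 3, 224, 224),))
cache.get("Conv", ((1, 3, 224, 224),))  # cache hit: no recompilation
cache.get("Conv", ((1, 3, 112, 112),))  # different shapes: compile again
```

Keying on shapes matters because the generated shader is specialized per input size, so a shape change genuinely requires a new artifact.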

Feedback Loops

2 feedback loops identified, handling retry and cache invalidation.

Delays & Async Processing

3 delay points identified in asynchronous processing paths.

Control Points

4 control points identified, including the execution-provider priority and optimization settings chosen at session creation.

Technology Stack

C++ (language)
Core inference engine and operator implementations
TypeScript (language)
JavaScript/Web API bindings
C# (language)
.NET API bindings
Python (language)
Python API bindings and training tools
WebGL/WebGPU (graphics API)
Browser-based GPU acceleration
CUDA (platform)
NVIDIA GPU acceleration
CMake (build)
Cross-platform build system
Node.js (runtime)
Server-side JavaScript runtime
WebAssembly (runtime)
High-performance web execution
ONNX (specification)
Model format specification

Key Components

Sub-Modules

JavaScript Runtime (independence: high)
Cross-platform JavaScript inference with WebGL, WebGPU, and WebAssembly backends
C# Runtime (independence: high)
.NET bindings for inference with native interop and memory management
Python Tools (independence: medium)
Model optimization, quantization, and transformer-specific utilities
Training Runtime (independence: medium)
PyTorch training acceleration with multi-GPU support
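The quantization utilities mentioned above typically rely on linear (affine) quantization, where a float x maps to an 8-bit integer via a scale and zero point: q = clamp(round(x / scale) + zero_point, 0, 255). A minimal sketch of that mapping (illustrative, not ORT's tooling code):

```python
def quantize_linear(values, scale, zero_point):
    # q = clamp(round(x / scale) + zero_point, 0, 255) for uint8 output.
    return [max(0, min(255, round(x / scale) + zero_point)) for x in values]

def dequantize_linear(q_values, scale, zero_point):
    # Approximate inverse: x ~ (q - zero_point) * scale.
    return [(q - zero_point) * scale for q in q_values]

q = quantize_linear([0.0, 4.0, 600.0], scale=2.0, zero_point=10)
# 600.0 saturates: round(300) + 10 = 310, clamped to 255.
x = dequantize_linear(q, scale=2.0, zero_point=10)
```

The round trip is lossy by design: values outside the representable range saturate, and everything else lands on the nearest step of size `scale`.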

Configuration

lgtm.yml (yaml)

onnxruntime/python/tools/microbench/attention.py (python-dataclass)

onnxruntime/python/tools/microbench/cast.py (python-dataclass)

Inference Pipeline

  1. Model Parsing — Parse ONNX protobuf and build computation graph [ONNX protobuf → Graph structure] js/web/lib/onnxjs/graph.ts
  2. Input Preparation — Convert user data to tensor format with shape validation [User-defined → Tensor with validated dims] js/common/lib/tensor.ts
  3. Backend Dispatch — Route operations to WebGL, WebGPU, or CPU backend [Tensor dims → Backend-specific representation] js/web/lib/onnxjs/backends/webgl/inference-handler.ts
  4. Texture Encoding — Pack tensor data into GPU textures with layout strategy [(N, C, H, W) → 2D texture with packed channels] js/web/lib/onnxjs/backends/webgl/texture-layout.ts
  5. Shader Execution — Run WebGL/WebGPU compute shaders for each operation [Texture dimensions → Output texture] js/web/lib/onnxjs/backends/webgl/inference-handler.ts
  6. Result Extraction — Read GPU textures back to CPU tensors [Output texture → Final tensor dims] js/web/lib/onnxjs/backends/webgl/inference-handler.ts
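Step 4's texture encoding has to fit a flat tensor into a bounded 2D texture, packing several scalars per RGBA texel. A simplified version of that shape arithmetic (the packing factor and row-major layout here are assumptions, not ORT's actual strategy):

```python
import math

def texture_dims(shape, pack=4, max_width=4096):
    # Total scalars in the tensor, e.g. (1, 3, 224, 224) -> 150528.
    n = 1
    for dim in shape:
        n *= dim
    # Pack `pack` scalars per texel (one RGBA texel holds 4 floats).
    texels = math.ceil(n / pack)
    # Lay texels out row-major, capped at the GPU's maximum texture width.
    width = min(texels, max_width)
    height = math.ceil(texels / width)
    return width, height

texture_dims((1, 3, 224, 224))  # (4096, 10): 37632 texels across 10 rows
```

The cap on width matters because WebGL implementations reject textures wider than the device limit, so large tensors must wrap onto multiple rows.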

Assumptions & Constraints



Frequently Asked Questions

What is onnxruntime used for?

microsoft/onnxruntime is a cross-platform ML inference engine for running ONNX models with hardware acceleration. It is a 10-component ML training and inference system written in C++, well connected with clear data flow between components. The codebase contains 6,752 files.

How is onnxruntime architected?

onnxruntime is organized into 4 architecture layers: Language Bindings, Core Runtime, Execution Providers, and Training Support. This layered structure enables tight integration between components while keeping the data flow between them clear.

How does data flow through onnxruntime?

Data moves through 6 stages: Model Loading → Session Creation → Input Preparation → Backend Selection → Graph Execution → Output Retrieval. This pipeline design reflects a complex multi-stage processing system.

What technologies does onnxruntime use?

The core stack includes C++ (Core inference engine and operator implementations), TypeScript (JavaScript/Web API bindings), C# (.NET API bindings), Python (Python API bindings and training tools), WebGL/WebGPU (Browser-based GPU acceleration), CUDA (NVIDIA GPU acceleration), and four more (CMake, Node.js, WebAssembly, ONNX). This broad technology surface reflects a mature project with many integration points.

What system dynamics does onnxruntime have?

onnxruntime exhibits 3 data pools (TextureDataCache, ProgramArtifactCache, BackendRegistry), 2 feedback loops, 4 control points, and 3 delays. The feedback loops handle retry and cache invalidation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
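The retry feedback loop can be illustrated with a small helper that re-runs a failing operation a bounded number of times, feeding each failure back into another attempt (a generic sketch, not code from the repository):

```python
def with_retry(op, attempts=3):
    # Re-run `op` until it succeeds or the attempt budget is exhausted.
    last_error = None
    for _ in range(attempts):
        try:
            return op()
        except RuntimeError as exc:
            last_error = exc  # the failure feeds back into the next attempt
    raise last_error

calls = {"count": 0}

def flaky():
    # Fails twice, then succeeds: a stand-in for a transient backend error.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retry(flaky)
```

Bounding the attempts is what keeps the loop a feedback mechanism rather than a hang: after the budget is spent, the last error propagates to the caller.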

What design patterns does onnxruntime use?

5 design patterns detected: Multi-Platform Bindings, Backend Registration, Tensor View Pattern, Program/Shader Compilation, Disposable Resources.

Analyzed on March 31, 2026 by CodeSea.