microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Cross-platform ML inference engine for running ONNX models with hardware acceleration
Model loading, session creation, input tensor preparation, inference execution, and output retrieval
Under the hood, the system uses two feedback loops, three data pools, and four control points to manage its runtime behavior.
Structural Verdict
A 10-component ML training and inference system with 9 connections. 6752 files analyzed. Well-connected — clear data flow between components.
How Data Flows Through the System
- Model Loading — Load ONNX model from file or buffer
- Session Creation — Create inference session with execution providers and optimization settings
- Input Preparation — Convert user data to tensors with appropriate data types and shapes
- Backend Selection — Choose optimal execution provider based on hardware and priority
- Graph Execution — Execute model operations using selected backend (CPU, GPU, WebGL, etc.)
- Output Retrieval — Extract results as tensors and convert to user format
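The six stages above can be sketched as a toy pipeline. The `Model` and `Session` classes below are illustrative stand-ins, not the real onnxruntime API:

```python
# Illustrative stand-ins for the load -> session -> run flow; not the real API.
class Model:
    def __init__(self, graph):
        self.graph = graph  # list of (op_name, fn) pairs

def load_model(graph):
    # Model Loading: parse a model description into a graph.
    return Model(graph)

class Session:
    def __init__(self, model, providers):
        # Session Creation: bind a parsed model to execution providers.
        self.model = model
        self.providers = providers  # Backend Selection: priority order
    def run(self, x):
        # Graph Execution: apply each node in order.
        for _, fn in self.model.graph:
            x = fn(x)
        return x  # Output Retrieval

model = load_model([("double", lambda v: [2 * e for e in v]),
                    ("inc", lambda v: [e + 1 for e in v])])
sess = Session(model, providers=["webgpu", "wasm"])
print(sess.run([1, 2, 3]))  # [3, 5, 7]
```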
System Behavior
How the system actually operates at runtime — where data accumulates, which loops run, what waits on what, and which settings control behavior.
Data Pools
- Caches GPU textures for packed and unpacked tensor data
- Compiled WebGL shaders indexed by input shapes and operations
- Registered execution providers with priority ordering
Feedback Loops
- Backend Fallback (retry, balancing) — Trigger: Backend execution failure. Action: Try next backend in priority order. Exit: Successful execution or all backends exhausted.
- Shader Compilation (cache-invalidation, reinforcing) — Trigger: New tensor shape combination. Action: Compile and cache new shader program. Exit: Shader cached for future use.
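The Backend Fallback loop can be sketched in a few lines. `run_with_fallback` and `fake_run` are hypothetical names used for illustration, not part of the library:

```python
def run_with_fallback(backends, run):
    """Try each backend in priority order; fall back on failure.
    Exit: successful execution, or RuntimeError when all are exhausted."""
    errors = []
    for name in backends:
        try:
            return run(name)
        except RuntimeError as exc:
            errors.append((name, str(exc)))  # record and try the next one
    raise RuntimeError(f"all backends exhausted: {errors}")

def fake_run(name):
    # Pretend only the CPU backend is available on this machine.
    if name != "cpu":
        raise RuntimeError(f"{name} unavailable")
    return f"ran on {name}"

print(run_with_fallback(["webgpu", "webgl", "cpu"], fake_run))  # ran on cpu
```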
Delays & Async Processing
- Model Loading (async-processing, ~Variable) — Session creation blocked until model parsed
- Shader Compilation (async-processing, ~100ms) — First inference slower, subsequent calls cached
- WebAssembly Initialization (async-processing, ~Variable) — Runtime setup before inference can begin
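The shader-compilation delay is paid once and then amortized by the cache described above. A minimal sketch, assuming a cache keyed on operation name and input shape:

```python
compile_count = 0
shader_cache = {}  # (op, input_shape) -> "compiled program"

def get_program(op, shape):
    """First request for a shape compiles (the slow, ~100ms path);
    later requests for the same key are served from the cache."""
    global compile_count
    key = (op, tuple(shape))
    if key not in shader_cache:
        compile_count += 1  # stands in for the real compile cost
        shader_cache[key] = f"{op}-program-for-{key[1]}"
    return shader_cache[key]

get_program("MatMul", [2, 3]); get_program("MatMul", [2, 3])
print(compile_count)  # 1 -- second call hit the cache
get_program("MatMul", [4, 3])
print(compile_count)  # 2 -- a new shape triggers a fresh compile
```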
Control Points
- GraphOptimizationLevel (feature-flag) — Controls: Model graph optimization aggressiveness. Default: null
- ExecutionProviders (runtime-toggle) — Controls: Which backends are enabled and their priority. Default: null
- IntraOpNumThreads (threshold) — Controls: Parallelism within operations. Default: null
- BUILD_DEFS flags (feature-flag) — Controls: Which backends are compiled into web build. Default: null
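These control points roughly map to session configuration. A hedged sketch as a plain dataclass — the field names are modeled on the control points above, and this is not the real SessionOptions class:

```python
from dataclasses import dataclass, field

@dataclass
class SessionOptionsSketch:
    # Illustrative mirror of the control points above; defaults are invented.
    graph_optimization_level: str = "all"  # GraphOptimizationLevel
    execution_providers: list = field(default_factory=lambda: ["cpu"])
    intra_op_num_threads: int = 0          # 0 = let the runtime decide

opts = SessionOptionsSketch(execution_providers=["cuda", "cpu"],
                            intra_op_num_threads=4)
print(opts.execution_providers[0])  # cuda
```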
Technology Stack
- C++ — Core inference engine and operator implementations
- TypeScript — JavaScript/Web API bindings
- C# — .NET API bindings
- Python — Python API bindings and training tools
- WebGL/WebGPU — Browser-based GPU acceleration
- CUDA — NVIDIA GPU acceleration
- Cross-platform build system
- Server-side JavaScript runtime
- High-performance web execution
- Model format specification
Key Components
- InferenceSession (class) — Main API for loading ONNX models and running inference — js/common/lib/inference-session.ts
- Tensor (class) — Multi-dimensional array abstraction for model inputs/outputs — js/common/lib/tensor.ts
- WebGLInferenceHandler (class) — Executes operations on WebGL backend with texture-based computation — js/web/lib/onnxjs/backends/webgl/inference-handler.ts
- TensorView (class) — Lightweight view over tensor data without ownership for WebAssembly backend — js/web/lib/wasm/jsep/tensor-view.ts
- BroadcastUtil (utility) — Calculates broadcasting shapes for tensor operations — js/web/lib/wasm/jsep/util.ts
- SessionOptions (class) — Configuration for inference sessions including optimization levels and execution providers — csharp/src/Microsoft.ML.OnnxRuntime/SessionOptions.shared.cs
- OrtValue (class) — Wrapper for native tensor data with memory management — csharp/src/Microsoft.ML.OnnxRuntime/OrtValue.shared.cs
- Graph (class) — Represents ONNX model computation graph with nodes and values — js/web/lib/onnxjs/graph.ts
- DataType (type-def) — Enum mapping tensor data types to ONNX specification — js/web/lib/wasm/wasm-common.ts
- registerBackend (function) — Registers execution backends with priority system — onnxruntime-common
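BroadcastUtil's job can be illustrated with a small shape-broadcasting helper. This is a sketch of NumPy-style broadcasting rules, not the actual implementation in js/web/lib/wasm/jsep/util.ts:

```python
def broadcast_shape(a, b):
    """Broadcast two shapes, right-aligned: dimensions must match
    or one of them must be 1 (which stretches to the other)."""
    out = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible dims {da} and {db}")
        out.append(max(da, db))
    return list(reversed(out))

print(broadcast_shape([8, 1, 5], [4, 5]))  # [8, 4, 5]
```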
Sub-Modules
- Cross-platform JavaScript inference with WebGL, WebGPU, and WebAssembly backends
- .NET bindings for inference with native interop and memory management
- Model optimization, quantization, and transformer-specific utilities
- PyTorch training acceleration with multi-GPU support
Configuration
lgtm.yml (yaml)
- path_classifiers.library (array)
- queries (array)
- extraction.cpp.prepare.packages (array) — default: ninja-build
- extraction.cpp.after_prepare (array) — default: mkdir custom_cmake, wget --quiet -O - "https://github.com/Kitware/CMake/releases/download/v3.24.3/cmake-3.24.3-linux-x86_64.tar.gz" | tar --strip-components=1 -xz -C custom_cmake, export PATH=$(pwd)/custom_cmake/bin:${PATH}
- extraction.cpp.index.build_command (array) — default: ./build.sh --cmake_generator Ninja --config Debug --skip_submodule_sync --build_shared_lib --parallel --skip_tests --minimal_build --disable_exceptions --enable_training_ops
- extraction.csharp.index.solution (array) — default: csharp/OnnxRuntime.CSharp.sln
- extraction.csharp.index.buildless (boolean) — default: true
- extraction.csharp.index.nuget_restore (boolean) — default: true
onnxruntime/python/tools/microbench/attention.py (python-dataclass)
- batch_size (int), seq_len (int), hidden_size (int), length (int), data_type (type)
onnxruntime/python/tools/microbench/cast.py (python-dataclass)
- x (int), y (int), m (int), n (int), input_data_type (type), output_data_type (type)
onnxruntime/python/tools/microbench/cast.py (python-dataclass)
- token_type_ids_dim0 (int), input_ids_dim1 (int)
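The attention.py fields above suggest a simple benchmark config dataclass. Only the field names and types come from the listing; the class name and the values used below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class AttentionBenchConfig:
    # Field names/types reconstructed from the listing above;
    # the class name and example values are hypothetical.
    batch_size: int
    seq_len: int
    hidden_size: int
    length: int
    data_type: type

cfg = AttentionBenchConfig(batch_size=1, seq_len=128,
                           hidden_size=768, length=128, data_type=float)
print(cfg.hidden_size)  # 768
```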
Science Pipeline
- Model Parsing — Parse ONNX protobuf and build computation graph [null → Graph structure] — js/web/lib/onnxjs/graph.ts
- Input Preparation — Convert user data to tensor format with shape validation [User-defined → Tensor with validated dims] — js/common/lib/tensor.ts
- Backend Dispatch — Route operations to WebGL, WebGPU, or CPU backend [Tensor dims → Backend-specific representation] — js/web/lib/onnxjs/backends/webgl/inference-handler.ts
- Texture Encoding — Pack tensor data into GPU textures with layout strategy [(N, C, H, W) → 2D texture with packed channels] — js/web/lib/onnxjs/backends/webgl/texture-layout.ts
- Shader Execution — Run WebGL/WebGPU compute shaders for each operation [Texture dimensions → Output texture] — js/web/lib/onnxjs/backends/webgl/inference-handler.ts
- Result Extraction — Read GPU textures back to CPU tensors [Output texture → Final tensor dims] — js/web/lib/onnxjs/backends/webgl/inference-handler.ts
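The Texture Encoding step maps a 4-D (N, C, H, W) index onto a 2-D texture position. A simplified sketch, assuming a plain row-major layout and a hypothetical texture width of H·W; the real layout strategy also handles channel packing and texture size limits:

```python
def nchw_to_texture_coords(n, c, h, w, shape):
    """Map an (N, C, H, W) index to a (row, col) position in a 2D
    texture laid out row-major -- a simplified layout sketch."""
    N, C, H, W = shape
    flat = ((n * C + c) * H + h) * W + w  # row-major flatten
    width = H * W                         # hypothetical texture width
    return divmod(flat, width)            # (row, col)

print(nchw_to_texture_coords(0, 1, 0, 0, (1, 2, 2, 2)))  # (1, 0)
```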
Assumptions & Constraints
- [warning] Assumes input tensors follow numpy-style broadcasting rules but doesn't validate rank compatibility (shape)
- [info] Assumes WebGL texture size limits but relies on layout strategy to handle overflow (device)
- [critical] Maps string types to ONNX enum values but throws on unsupported types (dtype)
- [warning] Validates data length matches tensor size but assumes dims are consistent (shape)
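The [critical] dtype assumption can be illustrated with a small mapping. The numeric codes below follow the ONNX TensorProto enum (FLOAT=1, UINT8=2, INT32=6, INT64=7); the function itself and its name are a hypothetical sketch:

```python
# A subset of the string -> ONNX TensorProto dtype codes (per the ONNX spec).
ONNX_DTYPE = {"float32": 1, "uint8": 2, "int32": 6, "int64": 7}

def to_onnx_dtype(name):
    """Map a string type to its ONNX enum value; throw on unsupported
    types, matching the [critical] assumption above."""
    if name not in ONNX_DTYPE:
        raise TypeError(f"unsupported tensor dtype: {name}")
    return ONNX_DTYPE[name]

print(to_onnx_dtype("float32"))  # 1
```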
Frequently Asked Questions
What is onnxruntime used for?
microsoft/onnxruntime is a cross-platform ML inference engine for running ONNX models with hardware acceleration. It is a 10-component ML training and inference system written in C++, well-connected with clear data flow between components. The codebase contains 6752 files.
How is onnxruntime architected?
onnxruntime is organized into 4 architecture layers: Language Bindings, Core Runtime, Execution Providers, Training Support. Well-connected — clear data flow between components. This layered structure enables tight integration between components.
How does data flow through onnxruntime?
Data moves through 6 stages: Model Loading → Session Creation → Input Preparation → Backend Selection → Graph Execution → ...: model loading, session creation, input tensor preparation, inference execution, and output retrieval. This pipeline design reflects a complex multi-stage processing system.
What technologies does onnxruntime use?
The core stack includes C++ (Core inference engine and operator implementations), TypeScript (JavaScript/Web API bindings), C# (.NET API bindings), Python (Python API bindings and training tools), WebGL/WebGPU (Browser-based GPU acceleration), CUDA (NVIDIA GPU acceleration), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does onnxruntime have?
onnxruntime exhibits 3 data pools (TextureDataCache, ProgramArtifactCache), 2 feedback loops, 4 control points, 3 delays. The feedback loops handle retry and cache-invalidation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does onnxruntime use?
5 design patterns detected: Multi-Platform Bindings, Backend Registration, Tensor View Pattern, Program/Shader Compilation, Disposable Resources.
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.