zenml-io/zenml
ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.
MLOps platform orchestrating AI/ML pipelines from classical ML to agentic workflows
Under the hood, the system uses three feedback loops, three data pools, and three control points to manage its runtime behavior.
Structural Verdict
A 10-component ML training system with 12 connections. 2,156 files analyzed. Highly interconnected — components depend on each other heavily.
How Data Flows Through the System
ML data flows through ingestion, preprocessing, training, evaluation, and deployment stages orchestrated by ZenML pipelines
- Data Ingestion — Load raw datasets from various sources including LakeFS, S3, or local files
- Preprocessing — Clean, transform, and prepare data using sklearn pipelines or custom transformers
- Model Training — Train ML models using frameworks like HuggingFace Transformers with LoRA fine-tuning
- Evaluation — Compute metrics and validate model performance using test datasets
- Deployment — Deploy trained models as web services using FastAPI runners with dashboard interfaces
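The five stages above can be sketched as a minimal, framework-agnostic pipeline. The function names, inline dataset, and threshold "model" below are illustrative stand-ins, not ZenML's actual API or the repository's code:

```python
# Minimal sketch of the five pipeline stages; all names and data are illustrative.

def ingest():
    # Stand-in for loading raw data from S3, LakeFS, or local files
    return [(0.1, 0), (0.9, 1), (0.8, 1), (0.2, 0)]

def preprocess(rows):
    # Stand-in for cleaning/transforming: clamp features into [0, 1]
    return [(min(max(x, 0.0), 1.0), y) for x, y in rows]

def train(rows):
    # Stand-in "model": a threshold at the midpoint of the class means
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def evaluate(model, rows):
    # Accuracy of the threshold model on the evaluation rows
    preds = [1 if x >= model else 0 for x, _ in rows]
    return sum(p == y for p, (_, y) in zip(preds, rows)) / len(rows)

def deploy(model):
    # Stand-in for serving: return a callable "endpoint"
    return lambda x: 1 if x >= model else 0

rows = preprocess(ingest())
model = train(rows)
accuracy = evaluate(model, rows)
endpoint = deploy(model)
```

In ZenML itself, each of these functions would be a decorated step whose outputs are versioned as artifacts between stages.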
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Artifact Store — Versioned ML artifacts, models, and datasets stored across pipeline runs
- Metadata Database — Pipeline run metadata, experiment tracking, and lineage information
- LakeFS Data Lake — Git-like versioned data lake with branch and commit semantics
Feedback Loops
- Pipeline Retry Logic (retry, balancing) — Trigger: Step failure or infrastructure error. Action: Retry failed pipeline steps with exponential backoff. Exit: Max retries reached or step succeeds.
- Model Performance Monitoring (training-loop, balancing) — Trigger: Scheduled evaluation runs. Action: Compare model metrics against thresholds. Exit: Performance degradation detected triggering retraining.
- Agent Outer Loop (recursive, reinforcing) — Trigger: Agent receives task. Action: Plan, execute tools, evaluate results, and iterate. Exit: Task completed or max iterations reached.
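The retry loop described above can be outlined in a few lines. The parameter names (`max_retries`, `base_delay`) are assumptions for this sketch, not ZenML's actual configuration keys:

```python
import time

def run_with_retries(step_fn, max_retries=3, base_delay=0.01):
    """Run a pipeline step, retrying on failure with exponential backoff.

    Exit conditions mirror the loop above: the step succeeds, or the
    retry budget is exhausted and the last error propagates.
    """
    for attempt in range(max_retries + 1):
        try:
            return step_fn()
        except Exception:
            if attempt == max_retries:
                raise
            # Back off 1x, 2x, 4x, ... the base delay between attempts
            time.sleep(base_delay * (2 ** attempt))

# Usage: a flaky step that fails twice, then succeeds on the third attempt
calls = {"n": 0}

def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient infrastructure error")
    return "artifact"

result = run_with_retries(flaky_step)
```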
Delays & Async Processing
- Model Training (async-processing, ~minutes to hours) — Pipeline blocks until training completes with artifacts stored
- Container Build (async-processing, ~1-10 minutes) — Pipeline execution waits for custom environment containerization
- LakeFS Commit (eventual-consistency, ~seconds) — Data changes become visible after commit operation completes
Control Points
- Pipeline Configuration (env-var) — Controls: Model hyperparameters, data paths, and infrastructure settings
- Integration Registry (runtime-toggle) — Controls: Which ML framework integrations are active
- Deployment Settings (feature-flag) — Controls: Endpoint configuration, middleware, and security settings for deployed models
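An env-var control point like the Pipeline Configuration entry above typically reduces to reading variables with defaults. The variable names here (`ZENML_LR`, `ZENML_DATA_PATH`, `ZENML_MAX_RETRIES`) are hypothetical examples, not ZenML's actual keys:

```python
import os

def load_pipeline_config(env=os.environ):
    """Read pipeline settings from environment variables, falling back
    to defaults. Variable names are illustrative, not ZenML's own."""
    return {
        "learning_rate": float(env.get("ZENML_LR", "3e-4")),
        "data_path": env.get("ZENML_DATA_PATH", "data/train.parquet"),
        "max_retries": int(env.get("ZENML_MAX_RETRIES", "3")),
    }

# Usage: override one setting, inherit defaults for the rest
config = load_pipeline_config({"ZENML_LR": "1e-3"})
```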
Technology Stack
- FastAPI — Web framework for deployment services and REST APIs
- Pydantic — Data validation and settings management throughout the codebase
- SQLAlchemy — Database ORM for metadata and experiment tracking
- Click — Command-line interface framework
- Docker — Containerization for pipeline execution environments
- HuggingFace Transformers — Language model training and inference in examples
- Gradio — Web UI for model demos and interfaces
- LakeFS — Data versioning and branch management for ML datasets
Key Components
- Client (class) — Main entry point for interacting with ZenML services, managing pipelines and artifacts (src/zenml/client.py)
- BaseDeploymentAppRunner (class) — Abstract base for creating web applications that serve ML models and pipelines (src/zenml/deployers/server/app.py)
- FastAPIDeploymentAppRunner (class) — FastAPI implementation of the deployment runner with CORS, static files, and templating (src/zenml/deployers/server/fastapi/app.py)
- EndpointAdapter (class) — Converts framework-agnostic endpoint specifications to framework-specific endpoints (src/zenml/deployers/server/adapters.py)
- MiddlewareAdapter (class) — Adapts middleware specifications for different web frameworks (src/zenml/deployers/server/adapters.py)
- LakeFSRef (class) — Lightweight JSON-serializable pointer to datasets stored in LakeFS for data versioning (examples/lakefs_data_versioning/utils/lakefs_ref.py)
- lakefs_utils (module) — Helper functions for LakeFS interaction via SDK and S3-compatible gateway (examples/lakefs_data_versioning/utils/lakefs_utils.py)
- compute_metrics (function) — Computes accuracy metrics for NLP model evaluation using HuggingFace datasets (examples/e2e_nlp/utils/misc.py)
- sentiment_analysis (cli-command) — Launches Gradio interface for text classification sentiment analysis (examples/e2e_nlp/gradio/app.py)
- load_base_model (function) — Loads and configures base language models for fine-tuning with quantization options (examples/llm_finetuning/utils/loaders.py)
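A JSON-serializable pointer like LakeFSRef can be sketched as a small frozen dataclass. The field names below (`repository`, `ref`, `path`) are assumptions about what such a pointer would carry, not the example's actual schema:

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class LakeFSRef:
    """Lightweight pointer to a dataset in LakeFS (illustrative sketch).

    Fields are assumed: a repository, a branch name or commit ID,
    and an object path within that ref.
    """
    repository: str
    ref: str   # branch name or commit ID
    path: str  # object key, e.g. a parquet file

    def to_json(self) -> str:
        # Serialize to a compact JSON string for storage in run metadata
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "LakeFSRef":
        # Reconstruct the pointer from its JSON form
        return cls(**json.loads(raw))

# Usage: round-trip a pointer through JSON
ref = LakeFSRef("ml-data", "main", "datasets/train.parquet")
restored = LakeFSRef.from_json(ref.to_json())
```

Keeping the pointer tiny and immutable means pipelines can pass dataset references between steps without copying TB-scale data.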
Sub-Modules
- src/zenml/deployers/server — Web application framework for deploying ML models and pipelines as HTTP services
- examples/lakefs_data_versioning — Data versioning system using LakeFS for managing TB-scale datasets with git-like semantics
- examples/llm_finetuning — Complete framework for fine-tuning language models with LoRA, quantization, and monitoring
- examples/e2e_nlp — End-to-end NLP workflow with training, evaluation, and Gradio deployment
Configuration
pull_request_cloudbuild.yaml (yaml)
- steps (array of 5 build steps), timeout: 3600s, availableSecrets.secretManager (array of 2 secrets)
release-cloudbuild-nightly.yaml (yaml)
- steps (array of 5 build steps), timeout: 3600s, availableSecrets.secretManager (array of 2 secrets)
release-cloudbuild-preparation.yaml (yaml)
- steps (array of 3 build steps), timeout: 3600s, availableSecrets.secretManager (array of 2 secrets)
release-cloudbuild.yaml (yaml)
- steps (array of 8 build steps), timeout: 3600s, availableSecrets.secretManager (array of 2 secrets)
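The step and secret contents were lost in the export, but a Google Cloud Build file with these top-level keys generally follows the standard shape. The step image, args, and secret names below are placeholders, not the repository's actual values:

```yaml
# Illustrative Cloud Build layout; step contents and secret names are placeholders.
steps:
  - name: gcr.io/cloud-builders/docker   # one entry per build step
    args: ["build", "-t", "example-image", "."]
timeout: 3600s
availableSecrets:
  secretManager:
    - versionName: projects/example-project/secrets/example-secret/versions/latest
      env: EXAMPLE_SECRET
```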
Science Pipeline
- Load NLP dataset — datasets.load_dataset then tokenization (examples/e2e_nlp/steps/)
- Model training — HuggingFace Trainer with LoRA fine-tuning [(batch_size, sequence_length) → (batch_size, num_classes)] (examples/llm_finetuning/utils/loaders.py)
- Metric computation — argmax on logits then accuracy calculation [(batch_size, num_classes) → scalar] (examples/e2e_nlp/utils/misc.py)
- LakeFS data read — S3 gateway read then pandas parquet parse (examples/lakefs_data_versioning/utils/lakefs_utils.py)
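The metric-computation step (argmax over logits, then accuracy) reduces to a few lines. This pure-Python sketch over nested lists stands in for the HuggingFace-based compute_metrics in the repository:

```python
def argmax(row):
    # Index of the largest logit in one row
    return max(range(len(row)), key=row.__getitem__)

def accuracy(logits, labels):
    """(batch_size, num_classes) logits + (batch_size,) labels -> scalar accuracy."""
    preds = [argmax(row) for row in logits]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Usage: a batch of 3 examples with 2 classes, all predicted correctly
logits = [[0.1, 2.3], [1.5, 0.2], [0.0, 0.9]]
labels = [1, 0, 1]
acc = accuracy(logits, labels)
```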
Assumptions & Constraints
- [warning] Assumes logits has shape (batch_size, num_classes) and labels has shape (batch_size,) but no assertion enforces this (shape)
- [critical] Assumes GPU availability when use_accelerate=True but no device validation occurs (device)
- [info] Assumes parquet format but no file type validation before pandas read_parquet call (format)
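The shape warning above could be closed with an explicit guard run before metric computation. This is a suggested check for plain nested-list inputs, not code from the repository:

```python
def check_shapes(logits, labels):
    """Raise early if logits is not (batch, num_classes) matching labels (batch,)."""
    if len(logits) != len(labels):
        raise ValueError(
            f"batch mismatch: {len(logits)} logit rows vs {len(labels)} labels"
        )
    widths = {len(row) for row in logits}
    if len(widths) != 1:
        # Ragged rows mean logits is not a rectangular (batch, num_classes) array
        raise ValueError(f"ragged logits: row widths {sorted(widths)}")

# Usage: a well-formed batch passes silently; a mismatched one raises
check_shapes([[0.1, 0.9], [0.7, 0.3]], [1, 0])
try:
    check_shapes([[0.1, 0.9]], [1, 0])
    raised = False
except ValueError:
    raised = True
```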
Frequently Asked Questions
What is zenml used for?
zenml-io/zenml is an MLOps platform orchestrating AI/ML pipelines from classical ML to agentic workflows. It is a 10-component ML training system written in Python, spanning 2,156 files. Highly interconnected — components depend on each other heavily.
How is zenml architected?
zenml is organized into 4 architecture layers: Core SDK, Server Backend, Integrations, Deployment Framework. Highly interconnected — components depend on each other heavily. This layered structure enables tight integration between components.
How does data flow through zenml?
Data moves through 5 stages: Data Ingestion → Preprocessing → Model Training → Evaluation → Deployment. ML data flows through ingestion, preprocessing, training, evaluation, and deployment stages orchestrated by ZenML pipelines. This pipeline design reflects a complex multi-stage processing system.
What technologies does zenml use?
The core stack includes FastAPI (Web framework for deployment services and REST APIs), Pydantic (Data validation and settings management throughout the codebase), SQLAlchemy (Database ORM for metadata and experiment tracking), Click (Command-line interface framework), Docker (Containerization for pipeline execution environments), HuggingFace Transformers (Language model training and inference in examples), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does zenml have?
zenml exhibits 3 data pools (Artifact Store, Metadata Database, LakeFS data lake), 3 feedback loops, 3 control points, and 3 delays. The feedback loops handle step retries, model-performance monitoring, and agent iteration. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does zenml use?
4 design patterns detected: Plugin Architecture, Pipeline as Code, Framework Abstraction, Data Versioning.
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.