kedro-org/kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Production-ready data science framework with modular pipelines and reproducible workflows
Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.
Structural Verdict
A 10-component ML inference system with 17 connections, across 198 analyzed files. Highly interconnected: components depend on each other heavily.
How Data Flows Through the System
Data flows through pipelines where nodes transform datasets loaded from the catalog, with configuration controlling both data sources and processing parameters
- Load Configuration — ConfigLoader reads YAML/JSON configuration files with environment-specific overrides (config: catalog.*, parameters.*)
- Initialize Catalog — DataCatalog instantiates dataset objects based on catalog configuration (config: catalog.*)
- Resolve Pipeline — Pipeline dependencies are resolved and nodes are ordered for execution
- Execute Nodes — Runner executes nodes with automatic dataset loading/saving and parameter injection (config: parameters.*)
- Persist Results — Node outputs are saved to configured datasets in the catalog (config: catalog.*)
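The five stages above can be sketched in plain Python. All names in this sketch are illustrative stand-ins, not Kedro's actual API:

```python
# Minimal sketch of the config -> catalog -> pipeline -> run flow
# described above. Hypothetical names, not Kedro's real classes.

def load_config():
    # Stage 1: in Kedro, ConfigLoader reads YAML with environment overrides.
    return {"catalog": {"raw": [1, 2, 3]}, "parameters": {"factor": 10}}

class Catalog:
    # Stage 2: a registry mapping dataset names to values.
    def __init__(self, data):
        self._data = dict(data)
    def load(self, name):
        return self._data[name]
    def save(self, name, value):
        self._data[name] = value

def scale(raw, factor):
    return [x * factor for x in raw]

# Stage 3: a "pipeline" here is just an ordered list of (func, inputs, output).
pipeline = [(scale, ["raw", "params:factor"], "scaled")]

def run(pipeline, catalog, parameters):
    # Stages 4-5: load inputs, call the node function, persist the output.
    for func, inputs, output in pipeline:
        args = [parameters[i[7:]] if i.startswith("params:") else catalog.load(i)
                for i in inputs]
        catalog.save(output, func(*args))

config = load_config()
catalog = Catalog(config["catalog"])
run(pipeline, catalog, config["parameters"])
print(catalog.load("scaled"))  # [10, 20, 30]
```

The `params:` prefix mirrors how the report describes parameter injection alongside catalog datasets; in this sketch it simply routes a name to the parameters dict instead of the catalog.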
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- DataCatalog Registry — Runtime registry of dataset configurations and instances
- Configuration Cache — Cached configuration objects loaded from files
- Plugin Hooks — Discovered and registered plugin hooks
Feedback Loops
- Pipeline Validation (recursive, balancing) — Trigger: Pipeline creation or modification. Action: Validates node dependencies and detects circular references. Exit: All dependencies resolved or error raised.
- Configuration Reload (polling, balancing) — Trigger: Development mode or configuration change detection. Action: Reloads configuration files and reinitializes context. Exit: Configuration stabilizes.
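The circular-reference check that the Pipeline Validation loop describes can be sketched as a depth-first search over node dependencies. This is an illustrative sketch, not Kedro's implementation:

```python
# Detect circular dependencies in a node graph via depth-first search.
# `deps` maps each node name to the names it depends on.

def find_cycle(deps):
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / in progress / done
    color = {n: WHITE for n in deps}

    def visit(n):
        color[n] = GRAY
        for d in deps.get(n, []):
            if color.get(d, WHITE) == GRAY:
                return True        # back edge: a cycle exists
            if color.get(d, WHITE) == WHITE and visit(d):
                return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in deps if color[n] == WHITE)

print(find_cycle({"a": ["b"], "b": ["c"], "c": []}))  # False: valid DAG
print(find_cycle({"a": ["b"], "b": ["a"]}))           # True: circular
```

This matches the loop's exit condition: either all dependencies resolve into a DAG, or a cycle is reported as an error.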
Delays & Async Processing
- Dataset Lazy Loading (async-processing, ~varies by dataset size) — Datasets are loaded only when accessed by nodes, improving memory efficiency
- Plugin Discovery (async-processing, ~startup time) — Plugins are discovered at application startup, affecting initial load time
- Template Generation (async-processing, ~varies by template complexity) — Project scaffolding occurs during the kedro new command
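The lazy-loading behavior described above amounts to deferring the expensive load until first access and caching the result. A minimal sketch (illustrative names, not Kedro code):

```python
# Lazy dataset wrapper: the underlying loader runs only on first access,
# then the value is cached for subsequent loads.

class LazyDataset:
    def __init__(self, loader):
        self._loader = loader
        self._value = None
        self._loaded = False

    def load(self):
        if not self._loaded:          # defer expensive I/O until needed
            self._value = self._loader()
            self._loaded = True
        return self._value

calls = []
ds = LazyDataset(lambda: calls.append("load") or [1, 2, 3])
ds.load()
ds.load()
print(len(calls))  # loader ran exactly once: 1
```

A dataset that no node ever reads is never loaded at all, which is where the memory-efficiency claim comes from.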
Control Points
- Environment Configuration (env-var: KEDRO_ENV) — Controls: Which configuration environment to load (base, local, etc.)
- Plugin Discovery Toggle (env-var) — Controls: Whether to automatically discover and load plugins
- Runner Selection (runtime-toggle) — Controls: Which pipeline runner to use (sequential, parallel, etc.). Default: SequentialRunner
- Logging Level (env-var) — Controls: Verbosity of framework logging output
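Env-var control points like these are typically read once at startup with a fallback default. A sketch of that pattern (KEDRO_ENV is named in the report; the variable name for logging and both default values are assumptions here):

```python
import os

# Resolve env-var control points with fallback defaults.
# KEDRO_ENV comes from the control-point list above; the logging
# variable name and the defaults are illustrative assumptions.

def resolve_settings(environ=None):
    env = environ if environ is not None else os.environ
    return {
        "config_env": env.get("KEDRO_ENV", "local"),
        "log_level": env.get("KEDRO_LOGGING_LEVEL", "INFO"),  # hypothetical name
    }

print(resolve_settings({}))                      # all defaults
print(resolve_settings({"KEDRO_ENV": "base"}))   # override via env-var
```

Passing a dict instead of reading `os.environ` directly keeps the resolution testable without mutating the process environment.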
Technology Stack
- Click — Command-line interface framework
- OmegaConf — Configuration management and YAML parsing
- Pluggy — Plugin system and hook management
- Cookiecutter — Project template engine
- FSSpec — Filesystem abstraction for various backends
- Rich — Terminal formatting and progress display
- YAML configuration file parsing
- Testing framework
- Behavior-driven development testing
Key Components
- KedroContext (class) — Central project context managing configuration, catalogs, and pipelines (kedro/framework/context/context.py)
- Pipeline (class) — Directed acyclic graph representing data processing workflows (kedro/pipeline/pipeline.py)
- Node (class) — Individual computational unit that processes inputs to produce outputs (kedro/pipeline/node.py)
- DataCatalog (class) — Registry managing datasets and their load/save operations (kedro/io/data_catalog.py)
- AbstractDataset (class) — Base interface for all dataset implementations with load/save/exists methods (kedro/io/core.py)
- AbstractRunner (class) — Base class for pipeline execution strategies (kedro/runner/runner.py)
- ConfigLoader (class) — Loads and manages configuration from YAML/JSON files with environment support (kedro/config/config.py)
- PluginManager (class) — Discovers and manages Kedro plugins using pluggy hooks (kedro/framework/plugins/manager.py)
- find_run_command (function) — Discovers and returns the main run command for a Kedro project (kedro/framework/cli/utils.py)
- register_pipelines (function) — Project-level function to register all available pipelines (kedro/framework/project/__init__.py)
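The load/save/exists contract that AbstractDataset defines (per the component list) can be sketched as an abstract base class. The concrete in-memory implementation below is an illustrative stand-in, not Kedro's class:

```python
import abc

# Sketch of the dataset contract: every dataset implements load, save,
# and exists. The in-memory subclass is illustrative only.

class AbstractDataset(abc.ABC):
    @abc.abstractmethod
    def load(self): ...
    @abc.abstractmethod
    def save(self, data): ...
    @abc.abstractmethod
    def exists(self): ...

class MemoryDataset(AbstractDataset):
    _MISSING = object()           # sentinel: distinguishes "unset" from None

    def __init__(self):
        self._data = self._MISSING

    def load(self):
        if self._data is self._MISSING:
            raise ValueError("dataset has not been saved yet")
        return self._data

    def save(self, data):
        self._data = data

    def exists(self):
        return self._data is not self._MISSING

ds = MemoryDataset()
ds.save({"rows": 3})
print(ds.exists(), ds.load())  # True {'rows': 3}
```

Because the DataCatalog only depends on this interface, file-backed, database-backed, and in-memory datasets are interchangeable at the catalog level.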
Configuration
asv.conf.json (json)
- version (number) — default: 1
- project (string) — default: Kedro
- project_url (string) — default: https://kedro.org/
- repo (string) — default: .
- install_command (array) — default: pip install -e . kedro-datasets[pandas-csvdataset]
- branches (array) — default: main
- environment_name (string) — default: kedro
- environment_type (string) — default: virtualenv
- +5 more parameters
kedro/pipeline/preview_contract.py (python-dataclass, 3 dataclasses)
- content (str)
- content (str)
- renderer_key (str), content (JSONObject)
Data Science Pipeline
- Configuration Loading — YAML/JSON parsing with OmegaConf, then merge with environment overrides (kedro/config/config.py)
- Dataset Resolution — Instantiate dataset objects from catalog configuration using a factory pattern (kedro/io/data_catalog.py)
- Pipeline Graph Construction — Build DAG from nodes and validate dependencies (kedro/pipeline/pipeline.py)
- Node Execution — Load inputs from catalog, execute node function, save outputs back to catalog (kedro/runner/runner.py)
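The "merge with environment overrides" step amounts to a recursive dict merge in which values from the override environment (e.g. local) win over the base environment. A plain-dict sketch; Kedro's OmegaConf-based loader additionally handles interpolation and typed access, and the catalog entries here are made up for illustration:

```python
# Environment-override merge as a recursive dict update: nested sections
# are merged key by key, and the override environment wins on conflicts.

def merge(base, override):
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)   # descend into nested sections
        else:
            out[key] = val                    # override wins outright
    return out

base = {"catalog": {"raw": {"type": "CSVDataset", "filepath": "data/raw.csv"}}}
local = {"catalog": {"raw": {"filepath": "/tmp/raw.csv"}}}

merged = merge(base, local)
print(merged["catalog"]["raw"]["filepath"])  # /tmp/raw.csv
print(merged["catalog"]["raw"]["type"])      # CSVDataset (kept from base)
```

Note that only the overridden key changes; sibling keys from the base environment survive the merge, which is what makes per-environment overrides lightweight.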
Assumptions & Constraints
- [warning] Assumes node functions accept parameters matching their signature, but there is no runtime validation of parameter types (dependency)
- [info] Assumes datasets can be loaded and saved consistently, but format validation is left to implementations (format)
- [warning] Assumes all node dependencies can be resolved, but circular-dependency detection occurs only at pipeline creation (dependency)
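The first assumption can be partially guarded against by checking provided argument names against a node function's signature before execution. An illustrative pre-flight check, not something the framework does per the warning above (and it still does not validate types):

```python
import inspect

# Compare the inputs a node would receive against its function signature:
# report required parameters that are missing and provided names the
# function does not accept.

def check_inputs(func, provided):
    sig = inspect.signature(func)
    missing = [p.name for p in sig.parameters.values()
               if p.default is inspect.Parameter.empty
               and p.name not in provided]
    extra = [name for name in provided if name not in sig.parameters]
    return missing, extra

def train(features, labels, epochs=10):   # hypothetical node function
    return len(features)

print(check_inputs(train, {"features": [], "labels": []}))  # ([], [])
print(check_inputs(train, {"features": []}))                # (['labels'], [])
```

Running such a check at pipeline-construction time would surface wiring mistakes before any dataset is loaded.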
Frequently Asked Questions
What is kedro used for?
kedro-org/kedro is a production-ready data science framework with modular pipelines and reproducible workflows. It is a 10-component ML inference system written in Python, spanning 198 files, and is highly interconnected, with components depending on each other heavily.
How is kedro architected?
kedro is organized into 4 architecture layers: CLI & Templates, Framework Core, Pipeline Engine, and I/O Layer. The layers are highly interconnected, with components depending on each other heavily; this layered structure enables tight integration between them.
How does data flow through kedro?
Data moves through 5 stages: Load Configuration → Initialize Catalog → Resolve Pipeline → Execute Nodes → Persist Results. Data flows through pipelines where nodes transform datasets loaded from the catalog, with configuration controlling both data sources and processing parameters. This pipeline design reflects a complex multi-stage processing system.
What technologies does kedro use?
The core stack includes Click (Command-line interface framework), OmegaConf (Configuration management and YAML parsing), Pluggy (Plugin system and hook management), Cookiecutter (Project template engine), FSSpec (Filesystem abstraction for various backends), Rich (Terminal formatting and progress display), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does kedro have?
kedro exhibits 3 data pools (including the DataCatalog Registry and Configuration Cache), 2 feedback loops, 4 control points, and 3 delays. The feedback loops cover recursive pipeline validation and configuration-reload polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does kedro use?
5 design patterns detected: Plugin Architecture, Template-based Scaffolding, Configuration by Convention, Abstract I/O Layer, Dependency Injection.
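The plugin-architecture pattern listed above can be sketched as a hook registry: callbacks registered by event name and invoked at defined lifecycle points. This is an illustrative sketch; Kedro's real plugin system is built on pluggy and entry points:

```python
# Minimal hook registry: plugins register callables against named
# lifecycle events; the framework invokes all hooks for an event.

class HookRegistry:
    def __init__(self):
        self._hooks = {}

    def register(self, event, func):
        self._hooks.setdefault(event, []).append(func)

    def call(self, event, **kwargs):
        # Invoke every hook registered for the event, collecting results.
        return [func(**kwargs) for func in self._hooks.get(event, [])]

hooks = HookRegistry()
hooks.register("after_node_run", lambda node: f"ran {node}")  # hypothetical event name
print(hooks.call("after_node_run", node="scale"))  # ['ran scale']
```

Because the framework only iterates over whatever is registered, plugins can extend behavior without the core knowing about them, which is the essence of the pattern.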
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.