kedro-org/kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

10,801 stars · Python · 10 components · 17 connections

Production-ready data science framework with modular pipelines and reproducible workflows

Data flows through pipelines where nodes transform datasets loaded from the catalog, with configuration controlling both data sources and processing parameters.

Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

Structural Verdict

A 10-component ML inference framework with 17 connections, analyzed across 198 files. The components are highly interconnected and depend on each other heavily.

How Data Flows Through the System

  1. Load Configuration — ConfigLoader reads YAML/JSON configuration files with environment-specific overrides (config: catalog.*, parameters.*)
  2. Initialize Catalog — DataCatalog instantiates dataset objects based on catalog configuration (config: catalog.*)
  3. Resolve Pipeline — Pipeline dependencies are resolved and nodes are ordered for execution
  4. Execute Nodes — Runner executes nodes with automatic dataset loading/saving and parameter injection (config: parameters.*)
  5. Persist Results — Node outputs are saved to configured datasets in the catalog (config: catalog.*)
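
The five stages above can be sketched as a minimal, self-contained model: the catalog is a dict of named datasets and a node is a function plus input and output names. This is an illustration of the pattern only, not Kedro's actual API (real Kedro nodes come from kedro.pipeline.node and datasets live in a DataCatalog).

```python
from typing import Callable

class Node:
    """A processing step: a function plus the catalog names it reads and writes."""
    def __init__(self, func: Callable, inputs: list[str], outputs: list[str]):
        self.func, self.inputs, self.outputs = func, inputs, outputs

def run(nodes: list[Node], catalog: dict) -> dict:
    """Execute each node in order: load inputs, call the function, save outputs."""
    for node in nodes:
        args = [catalog[name] for name in node.inputs]   # load from catalog
        results = node.func(*args)
        if len(node.outputs) == 1:
            results = (results,)
        for name, value in zip(node.outputs, results):   # persist results
            catalog[name] = value
    return catalog

# Usage: a two-node pipeline that cleans and then aggregates a dataset.
catalog = {"raw": [1, 2, None, 4]}
pipeline = [
    Node(lambda xs: [x for x in xs if x is not None], ["raw"], ["clean"]),
    Node(sum, ["clean"], ["total"]),
]
run(pipeline, catalog)
print(catalog["total"])  # 7
```

A real runner would first resolve execution order from the dependency graph rather than trusting list order; that corresponds to the Resolve Pipeline stage.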

System Behavior

How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

DataCatalog Registry (in-memory)
Runtime registry of dataset configurations and instances
Configuration Cache (in-memory)
Cached configuration objects loaded from files
Plugin Registry (in-memory)
Discovered and registered plugin hooks
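
All three pools share the same shape: a declarative configuration mapping plus a lazily populated in-memory cache of instances. A stdlib-only sketch of that pattern, with illustrative names rather than Kedro's internals:

```python
class LazyRegistry:
    """Maps names to configs; instances are built on first access and cached."""
    def __init__(self, configs: dict, factory):
        self._configs = configs      # declarative configuration (the pool's source)
        self._factory = factory      # builds an instance from one config entry
        self._instances = {}         # the in-memory pool itself

    def get(self, name):
        if name not in self._instances:
            self._instances[name] = self._factory(self._configs[name])
        return self._instances[name]

# Usage: a tiny "dataset" registry backed by in-memory config entries.
configs = {"cars": {"type": "memory", "data": [1, 2, 3]}}
registry = LazyRegistry(configs, factory=lambda cfg: list(cfg["data"]))
assert registry.get("cars") is registry.get("cars")  # same cached instance
```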

Feedback Loops

Two feedback loops identified: one recursive, one polling.

Delays & Async Processing

Three delays identified in the runtime behavior.

Control Points

Four control points identified.

Technology Stack

Click (framework)
Command-line interface framework
OmegaConf (library)
Configuration management and YAML parsing
Pluggy (framework)
Plugin system and hook management
Cookiecutter (library)
Project template engine
FSSpec (library)
Filesystem abstraction for various backends
Rich (library)
Terminal formatting and progress display
PyYAML (library)
YAML configuration file parsing
Pytest (testing)
Testing framework
Behave (testing)
Behavior-driven development testing
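
Of the stack above, Pluggy supplies the hook-based plugin system. The core idea, named hooks that plugins implement and the framework invokes, can be sketched with the stdlib alone; real Pluggy uses HookspecMarker, HookimplMarker, and a PluginManager, so this registry is only a simplified stand-in:

```python
class HookRegistry:
    """Collects hook implementations from plugins and invokes them by name."""
    def __init__(self):
        self._hooks = {}

    def register(self, plugin):
        # Treat every method starting with "hook_" as a hook implementation.
        for name in dir(plugin):
            if name.startswith("hook_"):
                self._hooks.setdefault(name, []).append(getattr(plugin, name))

    def call(self, name, **kwargs):
        # Call all implementations of a hook and collect their results.
        return [impl(**kwargs) for impl in self._hooks.get(name, [])]

class LoggerPlugin:
    def hook_after_node_run(self, node):
        return f"ran {node}"

hooks = HookRegistry()
hooks.register(LoggerPlugin())
print(hooks.call("hook_after_node_run", node="preprocess"))  # ['ran preprocess']
```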

Key Components

Configuration

asv.conf.json (json)

kedro/pipeline/preview_contract.py (python-dataclass)

Science Pipeline

  1. Configuration Loading — YAML/JSON parsing with OmegaConf, then merging with environment-specific overrides (kedro/config/config.py)
  2. Dataset Resolution — Dataset objects are instantiated from catalog configuration using a factory pattern (kedro/io/data_catalog.py)
  3. Pipeline Graph Construction — A DAG is built from the nodes and dependencies are validated (kedro/pipeline/pipeline.py)
  4. Node Execution — Inputs are loaded from the catalog, the node function is executed, and outputs are saved back to the catalog (kedro/runner/runner.py)
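
Step 3, ordering nodes so each runs only after its inputs exist, amounts to a topological sort over the dataset dependency graph. A sketch using Kahn's algorithm, illustrative rather than Kedro's implementation:

```python
from collections import deque

def topo_order(nodes: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """nodes maps node name -> (input datasets, output datasets)."""
    # Which node produces each dataset.
    producers = {out: n for n, (_, outs) in nodes.items() for out in outs}
    # Node-to-node dependencies: a node depends on the producers of its inputs.
    deps = {n: {producers[i] for i in ins if i in producers}
            for n, (ins, _) in nodes.items()}
    indegree = {n: len(d) for n, d in deps.items()}
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m, d in deps.items():
            if n in d:
                indegree[m] -= 1
                if indegree[m] == 0:
                    ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected in pipeline")  # dependency validation
    return order

# Usage: "train" needs "features", which "preprocess" produces from "raw".
nodes = {
    "train":      ({"features"}, {"model"}),
    "preprocess": ({"raw"}, {"features"}),
}
print(topo_order(nodes))  # ['preprocess', 'train']
```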

Frequently Asked Questions

What is kedro used for?

kedro-org/kedro is a production-ready data science framework with modular pipelines and reproducible workflows. It is a 10-component ML inference system written in Python, with highly interconnected components that depend on each other heavily. The codebase contains 198 files.

How is kedro architected?

kedro is organized into 4 architecture layers: CLI & Templates, Framework Core, Pipeline Engine, and I/O Layer. The components are highly interconnected and depend on each other heavily; this layered structure enables tight integration between them.

How does data flow through kedro?

Data moves through 5 stages: Load Configuration → Initialize Catalog → Resolve Pipeline → Execute Nodes → Persist Results. Data flows through pipelines where nodes transform datasets loaded from the catalog, with configuration controlling both data sources and processing parameters. This pipeline design reflects a complex multi-stage processing system.
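
The Load Configuration stage merges a base config with environment-specific overrides (e.g. conf/base versus conf/local in a Kedro project). A minimal recursive deep-merge, illustrative rather than OmegaConf's actual semantics:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Usage: a local environment overrides only the filepath of one dataset.
base = {"cars": {"type": "pandas.CSVDataset", "filepath": "data/cars.csv"}}
local = {"cars": {"filepath": "/tmp/cars.csv"}}
print(deep_merge(base, local))
# {'cars': {'type': 'pandas.CSVDataset', 'filepath': '/tmp/cars.csv'}}
```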

What technologies does kedro use?

The core stack includes Click (Command-line interface framework), OmegaConf (Configuration management and YAML parsing), Pluggy (Plugin system and hook management), Cookiecutter (Project template engine), FSSpec (Filesystem abstraction for various backends), Rich (Terminal formatting and progress display), and 3 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does kedro have?

kedro exhibits 3 data pools (DataCatalog Registry, Configuration Cache, Plugin Registry), 2 feedback loops, 4 control points, and 3 delays. The feedback loops are recursive and polling in nature. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does kedro use?

5 design patterns detected: Plugin Architecture, Template-based Scaffolding, Configuration by Convention, Abstract I/O Layer, Dependency Injection.

Analyzed on March 31, 2026 by CodeSea.