kedro-org/kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Production-ready data science framework with modular pipelines and reproducible workflows
Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.
Structural Verdict
A 10-component ML inference system with 17 connections, across 198 analyzed files. Highly interconnected: components depend on each other heavily.
How Data Flows Through the System
Data flows through pipelines where nodes transform datasets loaded from the catalog, with configuration controlling both data sources and processing parameters
- Load Configuration — ConfigLoader reads YAML/JSON configuration files with environment-specific overrides (config: catalog.*, parameters.*)
- Initialize Catalog — DataCatalog instantiates dataset objects based on catalog configuration (config: catalog.*)
- Resolve Pipeline — Pipeline dependencies are resolved and nodes are ordered for execution
- Execute Nodes — Runner executes nodes with automatic dataset loading/saving and parameter injection (config: parameters.*)
- Persist Results — Node outputs are saved to configured datasets in the catalog (config: catalog.*)
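The five stages above can be sketched in plain Python. All names in this sketch are illustrative stand-ins, not Kedro's actual API:

```python
# Minimal sketch of the config -> catalog -> pipeline -> run flow
# described above. Hypothetical names, not Kedro's real classes.

def load_config():
    # Stage 1: in Kedro, ConfigLoader reads YAML with environment overrides.
    return {"catalog": {"raw": [1, 2, 3]}, "parameters": {"factor": 10}}

class Catalog:
    # Stage 2: a registry mapping dataset names to values.
    def __init__(self, data):
        self._data = dict(data)
    def load(self, name):
        return self._data[name]
    def save(self, name, value):
        self._data[name] = value

def scale(raw, factor):
    return [x * factor for x in raw]

# Stage 3: a "pipeline" here is just an ordered list of (func, inputs, output).
pipeline = [(scale, ["raw", "params:factor"], "scaled")]

def run(pipeline, catalog, parameters):
    # Stages 4-5: load inputs, call the node function, persist the output.
    for func, inputs, output in pipeline:
        args = [parameters[i[7:]] if i.startswith("params:") else catalog.load(i)
                for i in inputs]
        catalog.save(output, func(*args))

config = load_config()
catalog = Catalog(config["catalog"])
run(pipeline, catalog, config["parameters"])
print(catalog.load("scaled"))  # [10, 20, 30]
```

The `params:` prefix mirrors how the report describes parameter injection alongside catalog datasets; in this sketch it simply routes a name to the parameters dict instead of the catalog.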
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- DataCatalog Registry — Runtime registry of dataset configurations and instances
- Configuration Cache — Cached configuration objects loaded from files
- Plugin Hooks — Discovered and registered plugin hooks
Feedback Loops
- Pipeline Validation (recursive, balancing) — Trigger: Pipeline creation or modification. Action: Validates node dependencies and detects circular references. Exit: All dependencies resolved or error raised.
- Configuration Reload (polling, balancing) — Trigger: Development mode or configuration change detection. Action: Reloads configuration files and reinitializes context. Exit: Configuration stabilizes.
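The circular-reference check that the Pipeline Validation loop describes can be sketched as a depth-first search over node dependencies. This is an illustrative sketch, not Kedro's implementation:

```python
# Detect circular dependencies in a node graph via depth-first search.
# `deps` maps each node name to the names it depends on.

def find_cycle(deps):
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / in progress / done
    color = {n: WHITE for n in deps}

    def visit(n):
        color[n] = GRAY
        for d in deps.get(n, []):
            if color.get(d, WHITE) == GRAY:
                return True        # back edge: a cycle exists
            if color.get(d, WHITE) == WHITE and visit(d):
                return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in deps if color[n] == WHITE)

print(find_cycle({"a": ["b"], "b": ["c"], "c": []}))  # False: valid DAG
print(find_cycle({"a": ["b"], "b": ["a"]}))           # True: circular
```

This matches the loop's exit condition: either all dependencies resolve into a DAG, or a cycle is reported as an error.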
Delays & Async Processing
- Dataset Lazy Loading (async-processing, ~varies by dataset size) — Datasets are loaded only when accessed by nodes, improving memory efficiency
- Plugin Discovery (async-processing, ~startup time) — Plugins are discovered at application startup, affecting initial load time
- Template Generation (async-processing, ~varies by template complexity) — Project scaffolding occurs during the kedro new command
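The lazy-loading behavior described above amounts to deferring the expensive load until first access and caching the result. A minimal sketch (illustrative names, not Kedro code):

```python
# Lazy dataset wrapper: the underlying loader runs only on first access,
# then the value is cached for subsequent loads.

class LazyDataset:
    def __init__(self, loader):
        self._loader = loader
        self._value = None
        self._loaded = False

    def load(self):
        if not self._loaded:          # defer expensive I/O until needed
            self._value = self._loader()
            self._loaded = True
        return self._value

calls = []
ds = LazyDataset(lambda: calls.append("load") or [1, 2, 3])
ds.load()
ds.load()
print(len(calls))  # loader ran exactly once: 1
```

A dataset that no node ever reads is never loaded at all, which is where the memory-efficiency claim comes from.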
Control Points
- Environment Configuration (env-var: KEDRO_ENV) — Controls: Which configuration environment to load (base, local, etc.)
- Plugin Discovery Toggle (env-var) — Controls: Whether to automatically discover and load plugins
- Runner Selection (runtime-toggle) — Controls: Which pipeline runner to use (sequential, parallel, etc.). Default: SequentialRunner
- Logging Level (env-var) — Controls: Verbosity of framework logging output
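Env-var control points like these are typically read once at startup with a fallback default. A sketch of that pattern (KEDRO_ENV is named in the report; the variable name for logging and both default values are assumptions here):

```python
import os

# Resolve env-var control points with fallback defaults.
# KEDRO_ENV comes from the control-point list above; the logging
# variable name and the defaults are illustrative assumptions.

def resolve_settings(environ=None):
    env = environ if environ is not None else os.environ
    return {
        "config_env": env.get("KEDRO_ENV", "local"),
        "log_level": env.get("KEDRO_LOGGING_LEVEL", "INFO"),  # hypothetical name
    }

print(resolve_settings({}))                      # all defaults
print(resolve_settings({"KEDRO_ENV": "base"}))   # override via env-var
```

Passing a dict instead of reading `os.environ` directly keeps the resolution testable without mutating the process environment.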
Technology Stack
- Click — Command-line interface framework
- OmegaConf — Configuration management and YAML parsing
- Pluggy — Plugin system and hook management
- Cookiecutter — Project template engine
- FSSpec — Filesystem abstraction for various backends
- Rich — Terminal formatting and progress display
- YAML configuration file parsing
- Testing framework
- Behavior-driven development testing
Key Components
- KedroContext (class) — Central project context managing configuration, catalogs, and pipelines (kedro/framework/context/context.py)
- Pipeline (class) — Directed acyclic graph representing data processing workflows (kedro/pipeline/pipeline.py)
- Node (class) — Individual computational unit that processes inputs to produce outputs (kedro/pipeline/node.py)
- DataCatalog (class) — Registry managing datasets and their load/save operations (kedro/io/data_catalog.py)
- AbstractDataset (class) — Base interface for all dataset implementations with load/save/exists methods (kedro/io/core.py)
- AbstractRunner (class) — Base class for pipeline execution strategies (kedro/runner/runner.py)
- ConfigLoader (class) — Loads and manages configuration from YAML/JSON files with environment support (kedro/config/config.py)
- PluginManager (class) — Discovers and manages Kedro plugins using pluggy hooks (kedro/framework/plugins/manager.py)
- find_run_command (function) — Discovers and returns the main run command for a Kedro project (kedro/framework/cli/utils.py)
- register_pipelines (function) — Project-level function to register all available pipelines (kedro/framework/project/__init__.py)
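The load/save/exists contract that AbstractDataset defines (per the component list) can be sketched as an abstract base class. The concrete in-memory implementation below is an illustrative stand-in, not Kedro's class:

```python
import abc

# Sketch of the dataset contract: every dataset implements load, save,
# and exists. The in-memory subclass is illustrative only.

class AbstractDataset(abc.ABC):
    @abc.abstractmethod
    def load(self): ...
    @abc.abstractmethod
    def save(self, data): ...
    @abc.abstractmethod
    def exists(self): ...

class MemoryDataset(AbstractDataset):
    _MISSING = object()           # sentinel: distinguishes "unset" from None

    def __init__(self):
        self._data = self._MISSING

    def load(self):
        if self._data is self._MISSING:
            raise ValueError("dataset has not been saved yet")
        return self._data

    def save(self, data):
        self._data = data

    def exists(self):
        return self._data is not self._MISSING

ds = MemoryDataset()
ds.save({"rows": 3})
print(ds.exists(), ds.load())  # True {'rows': 3}
```

Because the DataCatalog only depends on this interface, file-backed, database-backed, and in-memory datasets are interchangeable at the catalog level.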
Configuration
asv.conf.json (json)
- version (number) — default: 1
- project (string) — default: Kedro
- project_url (string) — default: https://kedro.org/
- repo (string) — default: .
- install_command (array) — default: pip install -e . kedro-datasets[pandas-csvdataset]
- branches (array) — default: main
- environment_name (string) — default: kedro
- environment_type (string) — default: virtualenv
- +5 more parameters
kedro/pipeline/preview_contract.py (python-dataclass, 3 dataclasses)
- content (str)
- content (str)
- renderer_key (str), content (JSONObject)
Data Science Pipeline
- Configuration Loading — YAML/JSON parsing with OmegaConf, then merge with environment overrides (kedro/config/config.py)
- Dataset Resolution — Instantiate dataset objects from catalog configuration using a factory pattern (kedro/io/data_catalog.py)
- Pipeline Graph Construction — Build DAG from nodes and validate dependencies (kedro/pipeline/pipeline.py)
- Node Execution — Load inputs from catalog, execute node function, save outputs back to catalog (kedro/runner/runner.py)
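The "merge with environment overrides" step amounts to a recursive dict merge in which values from the override environment (e.g. local) win over the base environment. A plain-dict sketch; Kedro's OmegaConf-based loader additionally handles interpolation and typed access, and the catalog entries here are made up for illustration:

```python
# Environment-override merge as a recursive dict update: nested sections
# are merged key by key, and the override environment wins on conflicts.

def merge(base, override):
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)   # descend into nested sections
        else:
            out[key] = val                    # override wins outright
    return out

base = {"catalog": {"raw": {"type": "CSVDataset", "filepath": "data/raw.csv"}}}
local = {"catalog": {"raw": {"filepath": "/tmp/raw.csv"}}}

merged = merge(base, local)
print(merged["catalog"]["raw"]["filepath"])  # /tmp/raw.csv
print(merged["catalog"]["raw"]["type"])      # CSVDataset (kept from base)
```

Note that only the overridden key changes; sibling keys from the base environment survive the merge, which is what makes per-environment overrides lightweight.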
Assumptions & Constraints
- [warning] Assumes node functions accept parameters matching their signature, but there is no runtime validation of parameter types (dependency)
- [info] Assumes datasets can be loaded and saved consistently, but format validation is left to implementations (format)
- [warning] Assumes all node dependencies can be resolved, but circular-dependency detection occurs only at pipeline creation (dependency)
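The first assumption can be partially guarded against by checking provided argument names against a node function's signature before execution. An illustrative pre-flight check, not something the framework does per the warning above (and it still does not validate types):

```python
import inspect

# Compare the inputs a node would receive against its function signature:
# report required parameters that are missing and provided names the
# function does not accept.

def check_inputs(func, provided):
    sig = inspect.signature(func)
    missing = [p.name for p in sig.parameters.values()
               if p.default is inspect.Parameter.empty
               and p.name not in provided]
    extra = [name for name in provided if name not in sig.parameters]
    return missing, extra

def train(features, labels, epochs=10):   # hypothetical node function
    return len(features)

print(check_inputs(train, {"features": [], "labels": []}))  # ([], [])
print(check_inputs(train, {"features": []}))                # (['labels'], [])
```

Running such a check at pipeline-construction time would surface wiring mistakes before any dataset is loaded.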
Frequently Asked Questions
What is kedro used for?
kedro-org/kedro is a production-ready data science framework with modular pipelines and reproducible workflows. It is a 10-component ML inference system written in Python, spanning 198 files, and is highly interconnected, with components depending on each other heavily.
How is kedro architected?
kedro is organized into 4 architecture layers: CLI & Templates, Framework Core, Pipeline Engine, and I/O Layer. The layers are highly interconnected, with components depending on each other heavily; this layered structure enables tight integration between them.
How does data flow through kedro?
Data moves through 5 stages: Load Configuration → Initialize Catalog → Resolve Pipeline → Execute Nodes → Persist Results. Data flows through pipelines where nodes transform datasets loaded from the catalog, with configuration controlling both data sources and processing parameters. This pipeline design reflects a complex multi-stage processing system.
What technologies does kedro use?
The core stack includes Click (Command-line interface framework), OmegaConf (Configuration management and YAML parsing), Pluggy (Plugin system and hook management), Cookiecutter (Project template engine), FSSpec (Filesystem abstraction for various backends), Rich (Terminal formatting and progress display), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does kedro have?
kedro exhibits 3 data pools (including the DataCatalog Registry and Configuration Cache), 2 feedback loops, 4 control points, and 3 delays. The feedback loops cover recursive pipeline validation and configuration-reload polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does kedro use?
5 design patterns detected: Plugin Architecture, Template-based Scaffolding, Configuration by Convention, Abstract I/O Layer, Dependency Injection.
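The plugin-architecture pattern listed above can be sketched as a hook registry: callbacks registered by event name and invoked at defined lifecycle points. This is an illustrative sketch; Kedro's real plugin system is built on pluggy and entry points:

```python
# Minimal hook registry: plugins register callables against named
# lifecycle events; the framework invokes all hooks for an event.

class HookRegistry:
    def __init__(self):
        self._hooks = {}

    def register(self, event, func):
        self._hooks.setdefault(event, []).append(func)

    def call(self, event, **kwargs):
        # Invoke every hook registered for the event, collecting results.
        return [func(**kwargs) for func in self._hooks.get(event, [])]

hooks = HookRegistry()
hooks.register("after_node_run", lambda node: f"ran {node}")  # hypothetical event name
print(hooks.call("after_node_run", node="scale"))  # ['ran scale']
```

Because the framework only iterates over whatever is registered, plugins can extend behavior without the core knowing about them, which is the essence of the pattern.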
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.