Data Pipeline Repositories

21 open-source data pipeline projects analyzed for architecture, code patterns, and dependencies.

Repositories, sorted by stars (primary language, analyzed components, stars):

redis/redis (C, 9 components, 73,604 stars)
For developers who are building real-time data-driven applications, Redis is...

crewaiinc/crewai (Python, 10 components, 47,716 stars)
Framework for orchestrating role-playing, autonomous AI agents. By fostering ...

dandavison/delta (Rust, 12 components, 29,822 stars)
A syntax-highlighting pager for git, diff, grep, and blame output.

statelyai/xstate (TypeScript, 10 components, 29,372 stars)
State machines, statecharts, and actors for complex logic.

chroma-core/chroma (Rust, 10 components, 27,516 stars)
Data infrastructure for AI.

kestra-io/kestra (Java, 8 components, 26,641 stars)
Event Driven Orchestration & Scheduling Platform for Mission Critical Applica...

apache/flink (Java, 10 components, 25,907 stars)
Apache Flink.

prefecthq/prefect (Python, 10 components, 21,958 stars)
Prefect is a workflow orchestration framework for building resilient data pip...

airbytehq/airbyte (Python, 8 components, 21,107 stars)
The leading data integration platform for ETL / ELT data pipelines from APIs, ...

sveltejs/kit (JavaScript, 10 components, 20,399 stars)
Web development, streamlined.

spotify/luigi (Python, 10 components, 18,705 stars)
Luigi is a Python module that helps you build complex pipelines of batch jobs...

trinodb/trino (Java, 8 components, 12,674 stars)
Official repository of Trino, the distributed SQL query engine for big data, ...

dbt-labs/dbt-core (Python, 10 components, 12,462 stars)
dbt enables data analysts and engineers to transform their data using the sam...

openreplay/openreplay (TypeScript, 10 components, 11,894 stars)
Session replay, cobrowsing and product analytics you can self-host. Best for ...

kyegomez/swarms (Python, 10 components, 6,161 stars)
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. We...

evidence-dev/evidence (JavaScript, 9 components, 6,115 stars)
Business intelligence as code: build fast, interactive data visualizations in...

lightdash/lightdash (TypeScript, 10 components, 5,662 stars)
Self-serve BI to 10x your data team ⚡️.

orchest/orchest (TypeScript, 10 components, 4,136 stars)
Build data pipelines, the easy way 🛠️.

meltano/meltano (Python, 8 components, 2,392 stars)
Meltano: the declarative code-first data integration engine that powers your ...

sodadata/soda-core (Python, 10 components, 2,317 stars)
Data Contracts engine for the modern data stack. https://www.soda.io

elementary-data/elementary (HTML, 12 components, 2,290 stars)
The dbt-native data observability solution for data & analytics engineers. Mo...
