databendlabs/databend

Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. Rebuilt from scratch. Unified architecture on your S3.

9,220 stars · Rust · 10 components · 6 connections

Cloud-native data warehouse built in Rust with vector search, full-text search, and agent orchestration

Data flows from external sources through the query engine's planning and execution layers, with metadata managed by the distributed meta layer, and results returned in Arrow format.

Under the hood, the system uses 3 feedback loops, 3 data pools, and 3 control points to manage its runtime behavior.

Structural Verdict

A 10-component dashboard with 6 connections. 3,733 files analyzed. Loosely coupled: components are relatively independent.

How Data Flows Through the System

  1. SQL Input — SQL queries received via Python API or network protocols
  2. Query Planning — SQL parsed and optimized into execution plans by Planner
  3. Meta Lookup — Schema and cluster metadata retrieved from meta layer
  4. Data Execution — Query executed against storage backends with parallel processing
  5. Result Format — Results returned in Arrow format or as DataBlocks
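
The five stages above can be sketched as a toy pipeline. This is a minimal illustration in plain Python, not Databend's implementation: the `CATALOG` and `STORAGE` dictionaries, the `Plan` class, and every function name are hypothetical stand-ins, and the "planner" only understands `SELECT <columns> FROM <table>`.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the meta layer: table name -> column names.
CATALOG = {"users": ["id", "name"]}
# Hypothetical stand-in for the storage backend: table name -> rows.
STORAGE = {"users": [(1, "ada"), (2, "grace")]}


@dataclass
class Plan:
    table: str
    columns: list[str]


def plan_query(sql: str) -> Plan:
    """Stage 2: parse a tiny 'SELECT a, b FROM t' subset into a plan."""
    tokens = sql.strip().rstrip(";").split()
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("unsupported query")
    from_at = upper.index("FROM")
    columns = [c.strip() for c in " ".join(tokens[1:from_at]).split(",")]
    return Plan(table=tokens[from_at + 1], columns=columns)


def lookup_schema(plan: Plan) -> list[str]:
    """Stage 3: fetch the table schema and validate the requested columns."""
    schema = CATALOG[plan.table]
    missing = [c for c in plan.columns if c not in schema]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    return schema


def execute(plan: Plan, schema: list[str]) -> list[tuple]:
    """Stage 4: project the requested columns out of the stored rows."""
    positions = [schema.index(c) for c in plan.columns]
    return [tuple(row[i] for i in positions) for row in STORAGE[plan.table]]


def format_columnar(plan: Plan, rows: list[tuple]) -> dict[str, list]:
    """Stage 5: pivot rows into columns, loosely mimicking a columnar result."""
    return {c: [row[i] for row in rows] for i, c in enumerate(plan.columns)}


def run_query(sql: str) -> dict[str, list]:
    """Stage 1 entry point: accept SQL text and run it through all stages."""
    plan = plan_query(sql)
    schema = lookup_schema(plan)
    rows = execute(plan, schema)
    return format_columnar(plan, rows)
```

Calling `run_query("SELECT name, id FROM users")` walks the query through each stage in order and returns one list per requested column.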

System Behavior

How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Meta Store (state-store)
Raft-based distributed metadata storage for schema and cluster state
Query Results Cache (cache)
Cached query results and intermediate data
Storage Backend (file-store)
Object storage for data files and backups
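
To make the cache pool more concrete, here is a minimal sketch of a query result cache with least-recently-used eviction. The source only says that query results and intermediate data are cached, so the class name `ResultCache`, the whitespace-based key normalization, and the LRU policy are all assumptions chosen for illustration.

```python
import hashlib
from collections import OrderedDict


class ResultCache:
    """Hypothetical LRU cache keyed by a hash of the SQL text."""

    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self._entries: OrderedDict[str, object] = OrderedDict()

    @staticmethod
    def key_for(sql: str) -> str:
        # Naive normalization: collapse runs of whitespace. A real engine
        # would key on the parsed plan, since this also touches literals.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, sql: str):
        key = self.key_for(sql)
        if key not in self._entries:
            return None
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, sql: str, result) -> None:
        key = self.key_for(sql)
        self._entries[key] = result
        self._entries.move_to_end(key)
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
```

A lookup that returns `None` signals a miss, at which point the caller would run the query and `put` the result.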

Feedback Loops

Delays & Async Processing

Control Points

Technology Stack

Rust (framework)
Core runtime and query engine
PyO3 (library)
Python binding generation
Tokio (framework)
Async runtime
Apache Arrow (library)
Columnar data format
OpenDAL (library)
Storage abstraction layer
Python (framework)
Test infrastructure and bindings
TOML (library)
Configuration file format

Key Components

Sub-Modules

bendpy (independence: medium)
Python bindings for embedded Databend execution with SessionContext API
bendsave (independence: high)
Backup and restore CLI tool for cluster data and metadata
databend_test_helper (independence: high)
Python library for managing Databend clusters during testing

Configuration

benchmark/benchmark_cloud.py (python-dataclass)

tests/nox/suites/copy/conftest.py (python-dataclass)
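
The entries above are tagged `python-dataclass`, which suggests configuration declared as Python dataclasses. The sketch below shows that general pattern only. The class name `BenchmarkConfig` and all of its fields are hypothetical and are not taken from the listed files.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class BenchmarkConfig:
    # Every field name and default below is a hypothetical example.
    dataset: str = "example"
    warehouse_size: str = "small"
    runs: int = 3
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Validate once at construction so bad values fail early.
        if self.runs < 1:
            raise ValueError("runs must be at least 1")


config = BenchmarkConfig(runs=5)
settings = asdict(config)  # plain dict, convenient for logging or serialization
```

Freezing the dataclass keeps a configuration object immutable after it has been validated.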

Frequently Asked Questions

What is databend used for?

databend is a cloud-native data warehouse built in Rust with vector search, full-text search, and agent orchestration. databendlabs/databend is a 10-component dashboard written in Rust. It is loosely coupled: components are relatively independent. The codebase contains 3,733 files.

How is databend architected?

databend is organized into 5 architecture layers: Python API, Query Engine, Meta Layer, Common Libraries, and 1 more. The components are loosely coupled, and the layering keeps concerns separated.

How does data flow through databend?

Data moves through 5 stages: SQL Input → Query Planning → Meta Lookup → Data Execution → Result Format. Data flows from external sources through the query engine's planning and execution layers, with metadata managed by the distributed meta layer, and results returned in Arrow format.

What technologies does databend use?

The core stack includes Rust (Core runtime and query engine), PyO3 (Python binding generation), Tokio (Async runtime), Apache Arrow (Columnar data format), OpenDAL (Storage abstraction layer), Python (Test infrastructure and bindings), and 1 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does databend have?

databend exhibits 3 data pools (Meta Store, Query Results Cache, and Storage Backend), 3 feedback loops, 3 control points, and 3 delays. The feedback loops handle convergence and retry. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
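
As an illustration of what a retry feedback loop can look like, here is a generic sketch with exponential backoff. It is not taken from the Databend codebase: the function name `retry`, its parameters, and the backoff schedule are assumptions.

```python
import time


def retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay * 2**n between failed attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:
            last_error = error
            # Back off before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter lets callers substitute a recorder or a no-op, which keeps the loop easy to exercise without real waiting.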

What design patterns does databend use?

5 design patterns detected: Cloud-Native Architecture, Python Bindings, Workspace Organization, Test Infrastructure, Async Runtime.

Analyzed on March 31, 2026 by CodeSea.