How dbt Works

dbt flipped the analytics engineering model: instead of extracting data out of the warehouse to transform it, you write transformations as SQL models that run inside the warehouse. The architecture that makes this work is simpler than you might expect.

12,646 stars · Python · 7 components · 6-stage pipeline

What dbt-core Does

Transforms SQL analytics code into database models using dependency graphs and configuration

dbt-core is a data transformation framework that enables analysts to write SQL select statements and automatically manages their execution as database models. It builds dependency graphs from model references, executes transformations in proper order, and provides testing, documentation, and deployment capabilities through a command-line interface.

Architecture Overview

dbt-core is organized into four layers spanning seven components.

CLI Interface
Processes user commands and orchestrates the entire transformation pipeline
Resource Management
Parses and validates SQL models, tests, and configurations into standardized resource objects
Graph Engine
Builds dependency graphs from model references and determines execution order
Execution Layer
Compiles SQL with Jinja templating and executes against databases through adapters

How Data Flows Through dbt-core

dbt processes user commands by first parsing all SQL models and configurations into resource objects, building a dependency graph from ref() calls, compiling Jinja templates with resolved references, and executing the resulting SQL in dependency order against the target database. The system maintains incremental parsing through content hashes and generates artifacts for documentation and debugging.

1. Parse CLI arguments and load environment

Click framework processes command-line arguments, loads .env files via load_dotenv(), and creates dbt context with manifest and callbacks
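
As a rough stdlib sketch of this step, with argparse standing in for Click and a hand-rolled parser standing in for load_dotenv() (the flag names and .env contents below are illustrative, not dbt's actual CLI):

```python
import argparse
import os

def load_dotenv_sketch(text):
    """Very rough stand-in for python-dotenv's load_dotenv():
    parse KEY=VALUE lines into the environment, skipping comments.
    Like the real thing, it does not override existing variables."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

def parse_cli(argv):
    """Sketch of command/flag parsing (dbt actually uses Click)."""
    parser = argparse.ArgumentParser(prog="dbt")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--select", default=None)
    run.add_argument("--target", default="dev")
    return parser.parse_args(argv)

load_dotenv_sketch("DBT_PROFILES_DIR=/home/user/.dbt\n# a comment\n")
args = parse_cli(["run", "--select", "my_model"])
```

After this step the parsed arguments and environment feed into building the dbt context.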

2. Parse project resources

Scans dbt project directory for SQL models, tests, and configurations, creates BaseResource objects with FileHash checksums for change detection
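
The hash-based change detection can be sketched as follows, with hashlib standing in for dbt's FileHash (paths and SQL here are made up):

```python
import hashlib

def file_hash(contents):
    """Content checksum used for change detection (sketch of FileHash)."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()

# Cached hashes from the previous run vs. freshly scanned files.
cached = {"models/orders.sql": file_hash("select * from {{ ref('stg_orders') }}")}
current = {
    "models/orders.sql": "select * from {{ ref('stg_orders') }} where amount > 0",
    "models/customers.sql": "select * from {{ ref('stg_customers') }}",
}

# Only files whose hash differs (or that are new) need re-parsing.
changed = [path for path, sql in current.items()
           if cached.get(path) != file_hash(sql)]
```

Here `orders.sql` is re-parsed because its content changed, and `customers.sql` because it has no cached hash at all.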

3. Build dependency graph

Analyzes ref() and source() calls in SQL models to construct parent_map and child_map in Manifest, determining execution order
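
A minimal sketch of turning ref() calls into parent_map/child_map and an execution order, using Python's stdlib graphlib (dbt's real Manifest machinery is far richer, and the models below are invented):

```python
import re
from graphlib import TopologicalSorter

models = {
    "stg_orders": "select * from raw.orders",
    "orders": "select * from {{ ref('stg_orders') }}",
    "revenue": "select sum(amount) from {{ ref('orders') }}",
}

REF = re.compile(r"ref\('([^']+)'\)")

# parent_map: model -> models it depends on; child_map is the inverse.
parent_map = {name: REF.findall(sql) for name, sql in models.items()}
child_map = {name: [] for name in models}
for child, parents in parent_map.items():
    for parent in parents:
        child_map[parent].append(child)

# Execution order: parents before children.
order = list(TopologicalSorter(parent_map).static_order())
```

`TopologicalSorter` takes a mapping from node to predecessors, which is exactly the shape of a parent map, so the static order comes out parents-first.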

4. Compile Jinja templates

Processes SQL files through Jinja engine, resolves ref() calls to actual table names, applies NodeConfig settings, creates CompiledResource with executable SQL

Config: materialized, pre_hook, post_hook
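
The ref()-to-relation resolution can be illustrated with a regex substitution standing in for the real Jinja pipeline (the relation name below is hypothetical):

```python
import re

# Hypothetical mapping from model names to warehouse relations.
relations = {"stg_orders": "analytics.dbt_prod.stg_orders"}

def compile_sql(raw_sql):
    """Resolve {{ ref('name') }} to a concrete relation name.
    A regex substitution standing in for the real Jinja engine,
    which also handles macros, config() calls, and much more."""
    def resolve(match):
        return relations[match.group(1)]
    return re.sub(r"\{\{\s*ref\('([^']+)'\)\s*\}\}", resolve, raw_sql)

compiled = compile_sql("select * from {{ ref('stg_orders') }}")
```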

5. Execute transformations

Sends compiled SQL to database via adapter interface in dependency order, executes pre/post hooks, handles different materialization strategies (view, table, incremental)

Config: materialized, batch_size, incremental_strategy
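
How a materialization setting maps to SQL can be sketched like this (simplified DDL; real dbt materializations are adapter-specific Jinja macros, and the incremental branch here ignores merge and delete+insert strategies):

```python
def materialize(name, compiled_sql, config):
    """Sketch of how a materialization choice shapes the emitted SQL."""
    strategy = config.get("materialized", "view")  # dbt's default is view
    if strategy == "view":
        return f"create or replace view {name} as {compiled_sql}"
    if strategy == "table":
        return f"create or replace table {name} as {compiled_sql}"
    if strategy == "incremental":
        # Append-only sketch; real strategies include merge, delete+insert, ...
        return f"insert into {name} ({compiled_sql})"
    raise ValueError(f"unknown materialization: {strategy}")

ddl = materialize("orders", "select * from stg_orders", {"materialized": "table"})
```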

6. Generate artifacts

Writes manifest.json, run_results.json, and catalog.json files containing execution metadata, model lineage, and table schemas for documentation

Config: persist_docs
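
The artifact shapes can be sketched with a few illustrative fields (the real manifest.json and run_results.json schemas are much larger; node names and timings here are invented):

```python
import json

# Minimal sketch of the manifest.json / run_results.json shapes.
manifest = {
    "nodes": {
        "model.my_project.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.my_project.stg_orders"]},
        }
    },
    "parent_map": {"model.my_project.orders": ["model.my_project.stg_orders"]},
}
run_results = {
    "results": [{"unique_id": "model.my_project.orders",
                 "status": "success", "execution_time": 1.42}],
}

# Serialized to the target/ directory at the end of a run.
manifest_json = json.dumps(manifest, indent=2)
run_results_json = json.dumps(run_results)
```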

System Dynamics

Beyond the pipeline, dbt-core has runtime behaviors that shape how it responds to load, failures, and configuration changes.

Data Pools

manifest.json (file-store)

Serialized dependency graph and resource metadata, persisted between runs for incremental parsing

run_results.json (file-store)

Execution results and timing data from the last run, for debugging and CI/CD integration

Resource registry (in-memory)

Cache of parsed resource objects indexed by unique_id for graph traversal

Feedback Loops

Incremental parsing loop (cache-invalidation)

Trigger: a file's content hash changes. dbt re-parses only the changed files and rebuilds the affected graph sections; the loop exits when all file hashes match their cached versions.
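
One way to picture the invalidation step, assuming a child_map like the one dbt builds (the propagation logic here is an illustrative sketch, not dbt's actual partial-parsing code):

```python
# When a file's hash changes, the model itself and everything
# downstream of it (via child_map) must be rebuilt.
child_map = {"stg_orders": ["orders"], "orders": ["revenue"], "revenue": []}

def affected(changed, child_map):
    """Expand the changed set downstream until it stops growing."""
    result = set(changed)
    frontier = list(changed)
    while frontier:
        node = frontier.pop()
        for child in child_map.get(node, []):
            if child not in result:
                result.add(child)
                frontier.append(child)
    return result

dirty = affected({"stg_orders"}, child_map)
```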

Dependency resolution loop (recursive)

Trigger: an unresolved ref() or source() call is encountered. dbt traverses parent_map to find the referenced resource and compiles its dependencies first; the loop exits when all references are resolved or a circular dependency is detected.
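
The recursive resolution with cycle detection can be sketched as a depth-first traversal (a simplification of dbt's actual graph handling):

```python
def resolve(node, parent_map, resolved, visiting=None):
    """Compile a node's parents before the node itself, raising if a
    circular dependency is detected along the current path."""
    visiting = visiting if visiting is not None else set()
    if node in resolved:
        return
    if node in visiting:
        raise ValueError(f"circular dependency involving {node}")
    visiting.add(node)
    for parent in parent_map.get(node, []):
        resolve(parent, parent_map, resolved, visiting)
    visiting.discard(node)
    resolved.append(node)

parent_map = {"revenue": ["orders"], "orders": ["stg_orders"], "stg_orders": []}
order = []
resolve("revenue", parent_map, order)
```

Resolving `revenue` pulls in `orders`, which pulls in `stg_orders`, so the resolved list comes out parents-first; a graph like `{"a": ["b"], "b": ["a"]}` would raise instead.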

Control Points

materialized

How a model is persisted in the warehouse (view, table, incremental, or ephemeral)

incremental_strategy

How new rows are applied to an existing incremental model (for example append or merge)

batch_size

Granularity of incremental microbatch processing

enabled

Whether a resource is included in parsing and execution
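
These control points layer on top of each other. A sketch of the precedence (project-wide defaults overridden by folder-level settings, overridden by in-model config() calls; the values below are illustrative):

```python
# Each layer is a plain dict; later layers win on key conflicts,
# mirroring dbt's project < folder < model config precedence.
project_defaults = {"materialized": "view", "enabled": True}
folder_config = {"materialized": "table"}
model_config = {"materialized": "incremental",
                "incremental_strategy": "merge", "batch_size": "day"}

effective = {**project_defaults, **folder_config, **model_config}
```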

Delays

Jinja compilation

Duration: 100 ms to 5 s per model

Database execution

Duration: variable per query

Manifest serialization

Duration: 1 to 10 s for large projects

Technology Choices

dbt-core is built with 6 key technologies. Each serves a specific role in the system.

Click
Provides CLI framework for command parsing, argument validation, and help generation
Jinja2
Template engine for SQL compilation, ref() resolution, and macro expansion
Pydantic
Data validation and serialization for resource objects with v1/v2 compatibility shim
dotenv
Environment variable loading from .env files for configuration management
requests
HTTP client for GitHub API access in container tagging workflow
packaging
Version parsing and comparison for release management


Who Should Read This

Analytics engineers building transformation pipelines, or data teams evaluating dbt for their warehouse.

This analysis was generated by CodeSea from the dbt-labs/dbt-core source code. For the full interactive visualization — including pipeline graph, architecture diagram, and system behavior map — see the complete analysis.


Frequently Asked Questions

What is dbt-core?

Transforms SQL analytics code into database models using dependency graphs and configuration

How does dbt-core's pipeline work?

dbt-core processes data through 6 stages: parse CLI arguments and load environment, parse project resources, build dependency graph, compile Jinja templates, execute transformations, and generate artifacts. It first parses all SQL models and configurations into resource objects, builds a dependency graph from ref() calls, compiles Jinja templates with resolved references, and executes the resulting SQL in dependency order against the target database. Incremental parsing is maintained through content hashes, and artifacts are generated for documentation and debugging.

What tech stack does dbt-core use?

dbt-core is built with Click (CLI framework for command parsing, argument validation, and help generation), Jinja2 (template engine for SQL compilation, ref() resolution, and macro expansion), Pydantic (data validation and serialization for resource objects, with a v1/v2 compatibility shim), dotenv (environment variable loading from .env files), requests (HTTP client for GitHub API access in the container tagging workflow), and packaging (version parsing and comparison for release management).

How does dbt-core handle errors and scaling?

dbt-core uses 2 feedback loops, 4 control points, and 3 data pools to manage its runtime behavior. These mechanisms handle error recovery, load distribution, and configuration changes.

How does dbt-core compare to prefect?

CodeSea has detailed side-by-side architecture comparisons of dbt-core with prefect. These cover tech stack differences, pipeline design, and system behavior.

Visualize dbt-core yourself

See the interactive pipeline graph, architecture diagram, and system behavior map.
