How dbt Works

dbt flipped the analytics engineering model: instead of extracting data out of the warehouse to transform it, you write transformations as SQL models that run inside the warehouse. The architecture that makes this work is simpler than you might expect.


What dbt-core Does

Data transformation framework that compiles SQL models into warehouse tables using software engineering practices

dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data using SQL SELECT statements while applying software engineering best practices like version control, testing, and documentation. It compiles SQL models into tables and views in data warehouses, manages dependencies between models, and provides testing and documentation capabilities.
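To make the "SQL plus software engineering" idea concrete, here is a toy sketch of what resolving a model's {{ ref('...') }} call into a fully qualified relation name involves. This is not dbt's actual implementation (dbt renders full Jinja against its manifest); the model registry and regex here are illustrative stand-ins.

```python
import re

# Hypothetical registry: model name -> relation it materializes to.
# Real dbt resolves refs through its parsed manifest.
MODELS = {
    "stg_orders": "analytics.staging.stg_orders",
    "stg_customers": "analytics.staging.stg_customers",
}

def resolve_refs(model_sql: str) -> str:
    """Replace {{ ref('name') }} placeholders with qualified relation names."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in MODELS:
            raise KeyError(f"ref to unknown model: {name}")
        return MODELS[name]
    return re.sub(r"\{\{\s*ref\('([^']+)'\)\s*\}\}", _sub, model_sql)

raw = (
    "select * from {{ ref('stg_orders') }} o "
    "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
)
print(resolve_refs(raw))
```

Because models reference each other only through ref(), dbt can both substitute the right relation name and record the edge in its dependency graph.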

Architecture Overview

dbt-core is organized into 4 layers, with 10 components and 6 connections between them.

CLI Interface: Command-line interface and programmatic runner
Artifacts & Resources: Schema definitions for dbt resources such as models, tests, and exposures
Configuration System: Resource configuration schemas and validation
Core Framework: Base classes and type definitions

How Data Flows Through dbt-core

dbt processes SQL models through parsing, compilation, and execution phases, managing dependencies and generating artifacts

1. Parse Resources: Scan project files and parse SQL models, tests, and configuration into resource objects.

2. Build Manifest: Create a dependency graph and validate resource relationships and configurations.

3. Compile SQL: Transform Jinja templates and refs into executable SQL with dependency resolution.

4. Execute Commands: Run compiled SQL against the data warehouse based on command type (run, test, seed, etc.).

5. Generate Artifacts: Produce manifest.json, catalog.json, and run results for downstream consumption.
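The five stages above can be sketched as a minimal pipeline. Function names and data shapes are illustrative, not dbt-core's actual internals; the stdlib TopologicalSorter stands in for dbt's real graph machinery.

```python
import json
from graphlib import TopologicalSorter  # stdlib topological sort (Python 3.9+)

# Toy project: each model lists the models it depends on.
PROJECT = {
    "stg_orders": [],
    "stg_customers": [],
    "orders_enriched": ["stg_orders", "stg_customers"],
}

def parse_resources(project):
    # Stage 1: turn raw files into resource objects (here, just dicts).
    return {name: {"name": name, "depends_on": deps} for name, deps in project.items()}

def build_manifest(resources):
    # Stage 2: validate the dependency graph and fix an execution order.
    graph = {n: r["depends_on"] for n, r in resources.items()}
    return {"nodes": resources, "order": list(TopologicalSorter(graph).static_order())}

def compile_sql(manifest):
    # Stage 3: emit runnable SQL per model (refs already resolved).
    return {n: f"create table {n} as select ..." for n in manifest["order"]}

def execute(compiled, manifest):
    # Stage 4: run each statement in dependency order (stubbed here).
    return [{"node": n, "status": "success"} for n in manifest["order"]]

def generate_artifacts(manifest, results):
    # Stage 5: serialize run metadata for downstream tools.
    return json.dumps({"results": results, "nodes": list(manifest["nodes"])})

manifest = build_manifest(parse_resources(PROJECT))
artifacts = generate_artifacts(manifest, execute(compile_sql(manifest), manifest))
print(manifest["order"])
```

The key property the sketch preserves: execution order always respects the dependency graph built in stage 2, so orders_enriched can never run before its staging models.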

System Dynamics

Beyond the pipeline, dbt-core has runtime behaviors that shape how it responds to load, failures, and configuration changes.

Data Pools

Resource Manifest (in-memory): Cached parsed representation of all project resources and their dependencies.

Artifact Files (file-store): Serialized JSON artifacts (manifest.json, catalog.json, run_results.json) for external consumption.
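manifest.json is the main interchange format for external tools. A minimal consumer might look like the sketch below; the nodes/depends_on layout follows dbt's published artifact schema, but the sample data is invented and real manifests carry many more fields.

```python
import json

# Tiny stand-in for target/manifest.json.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.jaffle.stg_orders": {"depends_on": {"nodes": []}},
        "model.jaffle.orders": {"depends_on": {"nodes": ["model.jaffle.stg_orders"]}},
    }
}

def downstream_of(manifest: dict, unique_id: str) -> list[str]:
    """List nodes that directly depend on the given node."""
    return [
        node_id
        for node_id, node in manifest["nodes"].items()
        if unique_id in node["depends_on"]["nodes"]
    ]

print(downstream_of(SAMPLE_MANIFEST, "model.jaffle.stg_orders"))
```

This kind of traversal is how lineage tools and orchestrators answer "what breaks if this model changes?" without re-parsing the project.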

Control Points

materialized: Determines how a model is built in the warehouse (view, table, incremental, or ephemeral).

enabled: Toggles whether a resource is parsed and included in runs.

pydantic_major: Compatibility switch keyed to the installed Pydantic major version.
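Controls like materialized and enabled are resolved by layering configuration scopes, with more specific scopes winning. A simplified sketch of that precedence (the scope names and values are illustrative; dbt's real precedence rules involve more layers):

```python
from collections import ChainMap

# Illustrative scopes, most specific first.
project_defaults = {"materialized": "view", "enabled": True}
folder_config = {"materialized": "table"}   # e.g. a models/ subfolder in dbt_project.yml
model_config = {"enabled": False}           # e.g. an in-model config() call

# ChainMap looks up keys left to right, so the most specific scope wins.
effective = ChainMap(model_config, folder_config, project_defaults)
print(dict(effective))
```

Here the model ends up materialized as a table (folder override) but disabled (model override), which matches the intuition that the setting closest to the model takes effect.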

Delays

Compilation Cache (duration: session-based)

Technology Choices

dbt-core is built with 7 key technologies. Each serves a specific role in the system.

Click: CLI framework and command parsing
Pydantic: Data validation and settings management
Mashumaro: JSON schema generation and serialization
dbt_common: Shared contracts and utilities across dbt packages
dbt_semantic_interfaces: Semantic layer type definitions and interfaces
packaging: Version parsing and comparison for Docker tagging
requests: HTTP client for GitHub API interactions


Who Should Read This

Analytics engineers building transformation pipelines, or data teams evaluating dbt for their warehouse.

This analysis was generated by CodeSea from the dbt-labs/dbt-core source code. For the full interactive visualization — including pipeline graph, architecture diagram, and system behavior map — see the complete analysis.

Explore Further

Frequently Asked Questions

What is dbt-core?

Data transformation framework that compiles SQL models into warehouse tables using software engineering practices

How does dbt-core's pipeline work?

dbt-core processes data through five stages: Parse Resources, Build Manifest, Compile SQL, Execute Commands, and Generate Artifacts. It parses SQL models into resource objects, builds a dependency graph, compiles Jinja into executable SQL, runs that SQL against the warehouse, and generates artifacts for downstream consumption.

What tech stack does dbt-core use?

dbt-core is built with Click (CLI framework and command parsing), Pydantic (data validation and settings management), Mashumaro (JSON schema generation and serialization), dbt_common (shared contracts and utilities across dbt packages), dbt_semantic_interfaces (semantic layer type definitions and interfaces), packaging (version parsing and comparison for Docker tagging), and requests (HTTP client for GitHub API interactions).

How does dbt-core handle errors and scaling?

dbt-core uses 3 control points and 2 data pools to manage its runtime behavior. These mechanisms govern error recovery, load distribution, and configuration changes.

How does dbt-core compare to Prefect?

CodeSea has detailed side-by-side architecture comparisons of dbt-core with Prefect. These cover tech stack differences, pipeline design, and system behavior.

Visualize dbt-core yourself

See the interactive pipeline graph, architecture diagram, and system behavior map.

See Full Analysis