How dbt Works
dbt flipped the analytics engineering model: instead of extracting data out of the warehouse to transform it, you write transformations as SQL models that run inside the warehouse. The architecture that makes this work is simpler than you might expect.
What dbt-core Does
Transforms SQL analytics code into database models using dependency graphs and configuration
dbt-core is a data transformation framework that enables analysts to write SQL select statements and automatically manages their execution as database models. It builds dependency graphs from model references, executes transformations in proper order, and provides testing, documentation, and deployment capabilities through a command-line interface.
Architecture Overview
dbt-core is organized into 4 layers comprising 7 key components.
How Data Flows Through dbt-core
dbt processes user commands by first parsing all SQL models and configurations into resource objects, building a dependency graph from ref() calls, compiling Jinja templates with resolved references, and executing the resulting SQL in dependency order against the target database. The system maintains incremental parsing through content hashes and generates artifacts for documentation and debugging.
1. Parse CLI arguments and load environment
Click framework processes command-line arguments, loads .env files via load_dotenv(), and creates dbt context with manifest and callbacks
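dbt-core's dispatcher is built on Click, but the routing idea can be sketched with stdlib argparse: one sub-parser per command, each bound to a handler function. The command and option names below are illustrative, not dbt's actual interface.

```python
# Sketch of command dispatch in the style of dbt's CLI layer.
# dbt-core uses Click; argparse is used here so the example is
# self-contained. The "--select" option is an illustrative stand-in.
import argparse

def run_command(args):
    # A real handler would parse the project and execute models.
    return f"run: select={args.select}"

def build_parser():
    parser = argparse.ArgumentParser(prog="dbt")
    sub = parser.add_subparsers(dest="command", required=True)
    run_p = sub.add_parser("run", help="Execute models")
    run_p.add_argument("--select", default=None)
    run_p.set_defaults(handler=run_command)
    return parser

parser = build_parser()
args = parser.parse_args(["run", "--select", "my_model"])
print(args.handler(args))  # → run: select=my_model
```

Click provides the same shape with decorators (`@click.group()`, `@cli.command()`), plus help generation and argument validation out of the box.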
2. Parse project resources
Scans dbt project directory for SQL models, tests, and configurations, creates BaseResource objects with FileHash checksums for change detection
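The change-detection idea behind FileHash is a plain content checksum: a file is re-parsed only when its hash differs from the cached value. A minimal sketch, with made-up paths and SQL:

```python
# Content-hash change detection, as dbt's FileHash enables:
# skip re-parsing files whose checksum matches the cache.
import hashlib

def file_hash(contents: str) -> str:
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()

# Cache from the previous run (illustrative path and model)
cache = {"models/orders.sql": file_hash("select * from raw.orders")}

def needs_reparse(path: str, contents: str) -> bool:
    return cache.get(path) != file_hash(contents)

print(needs_reparse("models/orders.sql", "select * from raw.orders"))   # → False
print(needs_reparse("models/orders.sql", "select id from raw.orders"))  # → True
```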
3. Build dependency graph
Analyzes ref() and source() calls in SQL models to construct parent_map and child_map in Manifest, determining execution order
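The parent_map/child_map structure and the resulting execution order can be sketched with the stdlib's graph tools. The model names below are invented for illustration:

```python
# parent_map maps each model to the models it ref()s (its upstream
# dependencies); a topological sort yields a valid execution order.
from graphlib import TopologicalSorter

parent_map = {
    "stg_orders": [],
    "stg_customers": [],
    "orders_enriched": ["stg_orders", "stg_customers"],
    "daily_revenue": ["orders_enriched"],
}

# child_map is the inverse, used to find downstream models to rebuild
child_map = {m: [] for m in parent_map}
for child, parents in parent_map.items():
    for p in parents:
        child_map[p].append(child)

order = list(TopologicalSorter(parent_map).static_order())
print(order)  # parents always precede their children
```

Note that models with no mutual dependency (the two staging models here) can appear in either order, which is also what lets dbt execute independent models concurrently.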
4. Compile Jinja templates
Processes SQL files through Jinja engine, resolves ref() calls to actual table names, applies NodeConfig settings, creates CompiledResource with executable SQL
Config: materialized, pre_hook, post_hook
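dbt's compiler does this through a full Jinja2 environment; the core ref()-to-relation substitution can be mimicked with a regex for illustration. The relation names below are hypothetical:

```python
# Mimics the ref() substitution step of Jinja compilation. dbt uses
# Jinja2 with ref() as a context function; this regex sketch only
# shows the mapping from model name to fully qualified relation.
import re

relations = {"stg_orders": '"analytics"."staging"."stg_orders"'}

def resolve_refs(sql: str) -> str:
    def repl(match):
        return relations[match.group(1)]
    return re.sub(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}", repl, sql)

compiled = resolve_refs("select * from {{ ref('stg_orders') }}")
print(compiled)  # → select * from "analytics"."staging"."stg_orders"
```

In real dbt, the same mechanism also records the edge in the dependency graph, which is why step 3 and step 4 share the ref() parsing machinery.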
5. Execute transformations
Sends compiled SQL to database via adapter interface in dependency order, executes pre/post hooks, handles different materialization strategies (view, table, incremental)
Config: materialized, batch_size, incremental_strategy
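A materialization decides what DDL wraps the compiled select. dbt implements these as adapter macros; the templates below are simplified illustrations of the three strategies named above:

```python
# Simplified materialization dispatch: wrap compiled SQL in DDL
# according to the `materialized` config. Real dbt materializations
# are adapter macros with far more logic (schema changes, merges).
def materialize(name: str, sql: str, materialized: str = "view") -> str:
    if materialized == "view":
        return f"create or replace view {name} as {sql}"
    if materialized == "table":
        return f"create table {name} as {sql}"
    if materialized == "incremental":
        # Crude "append" flavor: insert only the new rows the
        # compiled select produces, rather than rebuilding the table.
        return f"insert into {name} {sql}"
    raise ValueError(f"unknown materialization: {materialized}")

print(materialize("daily_revenue", "select * from orders", "table"))
```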
6. Generate artifacts
Writes manifest.json, run_results.json, and catalog.json files containing execution metadata, model lineage, and table schemas for documentation
Config: persist_docs
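Artifacts are plain JSON files. The fields below are a small, illustrative subset of what a real run_results.json records:

```python
# Writing a run_results-style artifact. The shape here is a reduced
# illustration; dbt's actual artifact schemas are versioned and
# considerably richer.
import json
import os
import tempfile

results = {
    "results": [
        {
            "unique_id": "model.my_project.stg_orders",  # hypothetical
            "status": "success",
            "execution_time": 0.42,
        },
    ],
    "elapsed_time": 1.7,
}

path = os.path.join(tempfile.mkdtemp(), "run_results.json")
with open(path, "w") as f:
    json.dump(results, f, indent=2)

with open(path) as f:
    print(json.load(f)["results"][0]["status"])  # → success
```

Because these files are stable JSON, CI/CD systems and documentation sites can consume them without importing dbt at all.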
System Dynamics
Beyond the pipeline, dbt-core has runtime behaviors that shape how it responds to load, failures, and configuration changes.
Data Pools
manifest.json
Serialized dependency graph and resource metadata persisted between runs for incremental parsing
Type: file-store
run_results.json
Execution results and timing data from last run for debugging and CI/CD integration
Type: file-store
Resource registry
In-memory cache of parsed resource objects indexed by unique_id for graph traversal
Type: in-memory
Feedback Loops
Incremental parsing loop
Trigger: File content hash change detected → Re-parse only changed files and rebuild affected graph sections (exits when: All file hashes match cached versions)
Type: cache-invalidation
Dependency resolution loop
Trigger: Unresolved ref() or source() call encountered → Traverse parent_map to find referenced resource, compile dependencies first (exits when: All references resolved or circular dependency detected)
Type: recursive
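The recursive loop above can be sketched as a depth-first traversal with an "in progress" set: parents are resolved before the node itself, and revisiting a node that is still in progress signals a circular dependency. The graphs below are illustrative:

```python
# Recursive reference resolution with cycle detection: resolve all
# parents first, and flag any ref() chain that loops back on itself.
def resolve(node, parent_map, resolved, in_progress=()):
    if node in resolved:
        return
    if node in in_progress:
        raise ValueError(f"circular dependency through {node!r}")
    for parent in parent_map.get(node, []):
        resolve(parent, parent_map, resolved, in_progress + (node,))
    resolved.append(node)  # parents land in the list before the node

parent_map = {"a": ["b"], "b": ["c"], "c": []}
resolved = []
resolve("a", parent_map, resolved)
print(resolved)  # → ['c', 'b', 'a']

cyclic = {"x": ["y"], "y": ["x"]}
try:
    resolve("x", cyclic, [])
except ValueError as e:
    print(e)  # circular dependency detected
```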
Control Points
materialized
incremental_strategy
batch_size
enabled
Delays
Jinja compilation
Duration: 100ms-5s per model
Database execution
Duration: Variable per query
Manifest serialization
Duration: 1-10s for large projects
Technology Choices
dbt-core is built with 6 key technologies. Each serves a specific role in the system.
Key Components
- dbtRunner (orchestrator): Orchestrates complete dbt execution pipeline from CLI args through parsing, compilation, and execution
- cli (dispatcher): Click-based command dispatcher that routes user commands to appropriate execution handlers
- BaseResource (factory): Base factory for creating standardized resource objects from parsed SQL files and configurations
- NodeConfig (validator): Validates and merges configuration from dbt_project.yml, model configs, and defaults using MergeBehavior
- CompiledResource (transformer): Transforms parsed resources by compiling Jinja templates into executable SQL with resolved references
- FileHash (validator): Creates content-based checksums for incremental parsing and change detection between runs
- register_adapter (registry): Registers database adapter implementations for different warehouse types (Postgres, Snowflake, etc.)
Who Should Read This
Analytics engineers building transformation pipelines, or data teams evaluating dbt for their warehouse.
This analysis was generated by CodeSea from the dbt-labs/dbt-core source code. For the full interactive visualization — including pipeline graph, architecture diagram, and system behavior map — see the complete analysis.
Frequently Asked Questions
What is dbt-core?
dbt-core is a data transformation framework that turns SQL analytics code into database models using dependency graphs and configuration.
How does dbt-core's pipeline work?
dbt-core processes data through 6 stages: parse CLI arguments and load environment, parse project resources, build dependency graph, compile Jinja templates, execute transformations, and generate artifacts. dbt processes user commands by first parsing all SQL models and configurations into resource objects, building a dependency graph from ref() calls, compiling Jinja templates with resolved references, and executing the resulting SQL in dependency order against the target database. The system maintains incremental parsing through content hashes and generates artifacts for documentation and debugging.
What tech stack does dbt-core use?
dbt-core is built with Click (CLI framework for command parsing, argument validation, and help generation), Jinja2 (template engine for SQL compilation, ref() resolution, and macro expansion), Pydantic (data validation and serialization for resource objects, with a v1/v2 compatibility shim), dotenv (environment variable loading from .env files for configuration management), requests (HTTP client for GitHub API access in the container tagging workflow), and one more technology.
How does dbt-core handle errors and scaling?
dbt-core uses 2 feedback loops, 4 control points, and 3 data pools to manage its runtime behavior. These mechanisms handle error recovery, load distribution, and configuration changes.
How does dbt-core compare to prefect?
CodeSea has detailed side-by-side architecture comparisons of dbt-core with prefect. These cover tech stack differences, pipeline design, and system behavior.