How dbt Works
dbt flipped the analytics engineering model: instead of extracting data out of the warehouse to transform it, you write transformations as SQL models that run inside the warehouse. The architecture that makes this work is simpler than you might expect.
What dbt-core Does
Data transformation framework that compiles SQL models into warehouse tables using software engineering practices
dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data using SQL SELECT statements while applying software engineering best practices like version control, testing, and documentation. It compiles SQL models into tables and views in data warehouses, manages dependencies between models, and provides testing and documentation capabilities.
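The core idea of "compiling SQL models into tables and views" can be sketched in a few lines. This is an illustrative stand-in, not dbt's actual code: the function name and the exact DDL shape are assumptions, but it shows how a model's SELECT statement gets wrapped into warehouse DDL under a table-style materialization.

```python
# Illustrative sketch (not dbt's real implementation): a model is just a
# SELECT statement; "materializing" it as a table means wrapping it in DDL.
def materialize_as_table(model_name: str, select_sql: str) -> str:
    """Wrap a model's SELECT body in a CREATE TABLE AS statement."""
    return f"create table {model_name} as (\n{select_sql}\n)"

ddl = materialize_as_table("orders_enriched", "select * from raw.orders")
print(ddl)
```

In real dbt, the wrapping logic lives in materialization macros, and the warehouse adapter decides the exact dialect; the principle of "you write the SELECT, dbt writes the DDL" is the same.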
Architecture Overview
dbt-core is organized into 4 layers, with 10 components and 6 connections between them.
How Data Flows Through dbt-core
dbt processes SQL models through parsing, compilation, and execution phases, managing dependencies and generating artifacts
1. Parse Resources: Scan project files and parse SQL models, tests, and configuration into resource objects
2. Build Manifest: Create the dependency graph and validate resource relationships and configurations
3. Compile SQL: Transform Jinja templates and ref() calls into executable SQL with dependency resolution
4. Execute Commands: Run compiled SQL against the data warehouse based on the command type (run, test, seed, etc.)
5. Generate Artifacts: Produce manifest.json, catalog.json, and run results for downstream consumption
System Dynamics
Beyond the pipeline, dbt-core has runtime behaviors that shape how it responds to load, failures, and configuration changes.
Data Pools
Resource Manifest
Cached parsed representation of all project resources and their dependencies
Type: in-memory
Artifact Files
Serialized JSON artifacts (manifest.json, catalog.json, run_results.json) for external consumption
Type: file-store
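The file-store pool above is just serialized JSON, which is why downstream tools can consume it without importing dbt. The artifact shape below is a deliberately simplified stand-in for the real run_results.json schema (which is versioned and much larger); the point is the write-then-read round trip.

```python
import json
import pathlib
import tempfile

# Simplified stand-in for dbt's run_results.json (the real schema is
# versioned and carries timing, thread, and adapter metadata).
run_results = {
    "metadata": {"project_name": "example"},
    "results": [
        {"unique_id": "model.example.stg_orders", "status": "success"},
    ],
}

target = pathlib.Path(tempfile.mkdtemp()) / "run_results.json"
target.write_text(json.dumps(run_results))

# A downstream consumer (CI check, orchestrator) reloads the artifact.
loaded = json.loads(target.read_text())
statuses = [r["status"] for r in loaded["results"]]
print(statuses)
```

This decoupling via files is what lets orchestrators and catalog tools react to a dbt run without sharing a process with it.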
Control Points
materialized
enabled
pydantic_major
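Two of these control points, materialized and enabled, are per-model configs layered over project defaults. The resolution logic below is a hedged sketch (the field names mirror dbt's configs, but DEFAULTS and resolve_config are hypothetical helpers, not dbt's API): it shows how flipping enabled gates a model out of the run entirely.

```python
# Hypothetical config resolution: model-level settings override project
# defaults. Field names mirror dbt's `materialized` and `enabled` configs.
DEFAULTS = {"materialized": "view", "enabled": True}

def resolve_config(model_config: dict) -> dict:
    """Layer one model's config over the project-wide defaults."""
    return {**DEFAULTS, **model_config}

configs = {
    "stg_orders": {"materialized": "table"},
    "scratch":    {"enabled": False},   # excluded from the run
}

runnable = [name for name, cfg in configs.items()
            if resolve_config(cfg)["enabled"]]
print(runnable)
```

Real dbt resolves configs through more layers (dbt_project.yml, model file, inline config blocks), but the override-the-defaults shape is the same.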
Delays
Compilation Cache
Duration: session-based
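A session-scoped cache like this is essentially memoization: the first invocation pays the compilation cost, later ones within the same session reuse the result. The sketch below shows only that idea with a stdlib memoizer; dbt's actual caching is tied to the manifest and re-parsing logic, not an lru_cache.

```python
from functools import lru_cache

# Track how many times real compilation work happens this "session".
calls = []

@lru_cache(maxsize=None)
def compile_model(name: str) -> str:
    calls.append(name)           # expensive work runs once per model
    return f"compiled:{name}"

compile_model("stg_orders")
compile_model("stg_orders")      # cache hit: no recompilation
print(len(calls))  # 1
```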
Technology Choices
dbt-core is built with 7 key technologies. Each serves a specific role in the system.
Key Components
- dbtRunner (class): Programmatic interface for invoking dbt commands with manifest caching and callbacks
- BaseResource (class): Base class defining common fields for all dbt resources (models, tests, etc.)
- NodeType (type-def): Enum defining all supported dbt resource types (Model, Test, Seed, etc.)
- NodeConfig (class): Configuration schema for SQL nodes with materialization, hooks, and warehouse settings
- CompiledResource (class): Base class for resources that get compiled to SQL with dependency tracking
- Analysis (class): Resource type for analytical SQL queries that don't create warehouse objects
- Exposure (class): Resource representing downstream uses of dbt models like dashboards or ML models
- Function (class): Resource for user-defined functions deployed to the data warehouse
- GenericTest (class): Parameterizable test templates that can be applied to multiple models/columns
- IncompatibleSchemaError (class): Exception raised when artifact schema versions are incompatible between dbt versions
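The inheritance relationships among these components can be sketched with dataclasses. This is a heavily simplified model of the hierarchy described above (the real classes in dbt-core carry many more fields, validators, and serialization hooks), but it shows how NodeType tags a BaseResource and how CompiledResource extends it with code and dependency tracking.

```python
from dataclasses import dataclass, field
from enum import Enum

# Simplified sketch of the resource hierarchy; real dbt-core classes
# have many more fields and use Mashumaro/Pydantic for serialization.
class NodeType(str, Enum):
    Model = "model"
    Test = "test"
    Seed = "seed"

@dataclass
class BaseResource:
    """Common fields shared by every dbt resource."""
    name: str
    resource_type: NodeType

@dataclass
class CompiledResource(BaseResource):
    """A resource that compiles to SQL and tracks its dependencies."""
    raw_code: str = ""
    depends_on: list = field(default_factory=list)

node = CompiledResource("stg_orders", NodeType.Model, "select 1")
print(node.resource_type.value)
```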
Who Should Read This
Analytics engineers building transformation pipelines, or data teams evaluating dbt for their warehouse.
This analysis was generated by CodeSea from the dbt-labs/dbt-core source code. For the full interactive visualization — including pipeline graph, architecture diagram, and system behavior map — see the complete analysis.
Explore Further
- Full Analysis: interactive architecture map for dbt-core
- dbt-core vs prefect: side-by-side architecture comparison
- How Apache Airflow Works (Data Pipelines)
- How Prefect Works (Data Pipelines)
Frequently Asked Questions
What is dbt-core?
Data transformation framework that compiles SQL models into warehouse tables using software engineering practices
How does dbt-core's pipeline work?
dbt-core processes data through 5 stages: Parse Resources, Build Manifest, Compile SQL, Execute Commands, and Generate Artifacts. Across these parsing, compilation, and execution phases it manages model dependencies and generates artifacts for downstream tools.
What tech stack does dbt-core use?
dbt-core is built with Click (CLI framework and command parsing), Pydantic (Data validation and settings management), Mashumaro (JSON schema generation and serialization), dbt_common (Shared contracts and utilities across dbt packages), dbt_semantic_interfaces (Semantic layer type definitions and interfaces), and 2 more technologies.
How does dbt-core handle errors and scaling?
dbt-core uses 3 control points and 2 data pools to manage its runtime behavior. These mechanisms handle error recovery, load distribution, and configuration changes.
How does dbt-core compare to prefect?
CodeSea has detailed side-by-side architecture comparisons of dbt-core with prefect. These cover tech stack differences, pipeline design, and system behavior.