dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Transforms SQL analytics code into database models using dependency graphs and configuration
Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.
A 7-component CLI tool. 636 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
dbt processes user commands by first parsing all SQL models and configurations into resource objects, building a dependency graph from ref() calls, compiling Jinja templates with resolved references, and executing the resulting SQL in dependency order against the target database. The system maintains incremental parsing through content hashes and generates artifacts for documentation and debugging.
- Parse CLI arguments and load environment — Click framework processes command-line arguments, loads .env files via load_dotenv(), and creates dbt context with manifest and callbacks
- Parse project resources — Scans dbt project directory for SQL models, tests, and configurations, creates BaseResource objects with FileHash checksums for change detection
- Build dependency graph — Analyzes ref() and source() calls in SQL models to construct parent_map and child_map in Manifest, determining execution order (see the sketch after this list) [BaseResource → Manifest]
- Compile Jinja templates — Processes SQL files through Jinja engine, resolves ref() calls to actual table names, applies NodeConfig settings, creates CompiledResource with executable SQL [ParsedResource → CompiledResource] (config: materialized, pre_hook, post_hook +1)
- Execute transformations — Sends compiled SQL to database via adapter interface in dependency order, executes pre/post hooks, handles different materialization strategies (view, table, incremental) [CompiledResource → RunExecutionResult] (config: materialized, batch_size, incremental_strategy)
- Generate artifacts — Writes manifest.json, run_results.json, and catalog.json files containing execution metadata, model lineage, and table schemas for documentation [RunExecutionResult → Artifact files] (config: persist_docs)
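To make the dependency-graph stage concrete, here is a minimal sketch, not dbt's implementation, of deriving an execution order from a parent_map in the shape the Manifest stores (the node names are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical parent_map in the Manifest's shape: unique_id -> parent ids
# (the upstream models each node ref()s).
parent_map = {
    "model.jaffle.stg_orders": [],
    "model.jaffle.stg_payments": [],
    "model.jaffle.orders": ["model.jaffle.stg_orders", "model.jaffle.stg_payments"],
}

# static_order() yields each node only after all of its parents, which is the
# "execute in dependency order" guarantee described above; it raises
# CycleError when a circular ref() exists.
order = list(TopologicalSorter(parent_map).static_order())
print(order)  # staging models first, then orders
```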
Data Models
The data structures that flow between stages — the contracts that hold the system together.
BaseResource (core/dbt/artifacts/resources/base.py) — dataclass with name: str, resource_type: NodeType, package_name: str, path: str, original_file_path: str, unique_id: str
Created during parsing from SQL files, extended into specific resource types, and stored in manifest for graph construction
CompiledResource (core/dbt/artifacts/resources/v1/components.py) — dataclass extending ParsedResource with alias: str, relation_name: str, compiled_code: str, extra_ctes_injected: bool, extra_ctes: List[InjectedCTE], _event_status: Dict[str, Any]
Created by compiling ParsedResource with Jinja templating, contains executable SQL, used for database execution
NodeConfig (core/dbt/artifacts/resources/v1/config.py) — dataclass with materialized: str, incremental_strategy: str, batch_size: Any, persist_docs: Dict, pre_hook/post_hook: List[Hook], tags: List[str], meta: Dict[str, Any]
Parsed from dbt_project.yml and model configs, merged with defaults, applied during compilation and execution
Manifest (core/dbt/contracts/graph/manifest.py) — contains nodes: Dict[str, Union[ModelNode, TestNode, ...]], sources: Dict[str, SourceDefinition], macros: Dict[str, Macro], child_map/parent_map: Dict[str, List[str]]
Built from all parsed resources, contains complete dependency graph, persisted to manifest.json, loaded for subsequent runs
RunExecutionResult (core/dbt/artifacts/schemas/run.py) — contains results: List[RunResult], elapsed_time: float, generated_at: datetime, args: Dict[str, Any]
Created during transformation execution, aggregates individual node results, returned to user and written to run_results.json
dbtRunnerResult (core/dbt/cli/main.py) — dataclass with success: bool, exception: Optional[BaseException], result: Union[bool, CatalogArtifact, List[str], Manifest, RunExecutionResult]
Returned by dbtRunner.invoke() for programmatic access, wraps execution results with success status and exceptions
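These contracts meet in dbt's documented programmatic entry point: dbtRunner.invoke() returns a dbtRunnerResult whose result field varies by command. A minimal invocation, assuming a dbt project in the working directory and a model named my_model, looks like:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to `dbt run --select my_model` on the command line; for the
# run command, res.result is a RunExecutionResult.
res: dbtRunnerResult = dbt.invoke(["run", "--select", "my_model"])

if res.success:
    for node_result in res.result.results:  # List[RunResult]
        print(node_result.node.name, node_result.status)
else:
    print(res.exception)
```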
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Environment variables INPUT_PACKAGE_NAME, INPUT_NEW_VERSION, INPUT_GITHUB_TOKEN, and GITHUB_OUTPUT are always present and non-empty strings
If this fails: KeyError crashes when any required environment variable is missing, or empty string passed to GitHub API causing authentication failures
.github/actions/latest-wrangler/main.py:main
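A hedged way to harden this is to fail fast with a named error instead of a bare KeyError. The helper below is hypothetical, not code from the action:

```python
import os

def require_env(name: str) -> str:
    # Hypothetical guard: treat empty strings as missing and name the
    # variable in the error instead of crashing with a bare KeyError.
    value = os.environ.get(name, "").strip()
    if not value:
        raise SystemExit(f"required environment variable {name} is missing or empty")
    return value

token = require_env("INPUT_GITHUB_TOKEN")
```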
GitHub API at https://api.github.com/orgs/dbt-labs/packages/container/{package_name}/versions is always available and returns valid JSON
If this fails: requests.exceptions.ConnectionError or JSONDecodeError crashes the process when GitHub API is down or returns non-JSON response
.github/actions/latest-wrangler/main.py:_package_metadata
GitHub API response.json() returns list of objects where each has metadata.container.tags as a list of strings
If this fails: KeyError or TypeError when the API response structure changes, breaking the version-comparison logic
.github/actions/latest-wrangler/main.py:_published_versions
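Both API assumptions can be made observable with a timeout, an HTTP status check, and defensive access into the nested response. This is an illustrative hardening sketch, not the action's actual code; the URL shape is the one quoted above:

```python
import requests

def fetch_published_tags(package_name: str, token: str) -> list[str]:
    # Illustrative hardening: explicit timeout, status check, and defensive
    # access into the nested metadata.container.tags structure.
    url = (
        "https://api.github.com/orgs/dbt-labs/packages/container/"
        f"{package_name}/versions"
    )
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    tags: list[str] = []
    for version in resp.json():
        tags.extend(version.get("metadata", {}).get("container", {}).get("tags", []))
    return tags
```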
published_versions list is never empty when checking max(published_versions) and max(published_patches)
If this fails: ValueError: max() arg is an empty sequence crashes when no published versions exist for a new package
.github/actions/latest-wrangler/main.py:_new_version_tags
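One defensive pattern here, illustrative rather than the action's code, is max() with a default, which turns the empty-sequence ValueError into an explicit branch; Version comes from the packaging dependency listed in the technology stack below:

```python
from packaging.version import Version

published_versions: list[str] = []  # a brand-new package may have none

# max() raises "ValueError: max() arg is an empty sequence" unless a
# default is supplied; default=None makes the empty case an explicit branch.
latest = max((Version(v) for v in published_versions), default=None)
if latest is None:
    print("no published versions; the new version is trivially the latest")
```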
GITHUB_OUTPUT file path exists and is writable, and process has permissions to append to it
If this fails: FileNotFoundError or PermissionError when GitHub Actions runner environment doesn't provide writable output file
.github/actions/latest-wrangler/main.py:_register_tags
.env file format is valid and usecwd=True finds the correct working directory from user context
If this fails: load_dotenv silently fails or loads wrong .env file when called from subprocess with different working directory
core/dbt/cli/main.py:load_dotenv
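python-dotenv's public API can make the lookup explicit. In this sketch (not dbt's code), find_dotenv() returns an empty string when no .env file is found, so the failure becomes observable instead of silent:

```python
from dotenv import find_dotenv, load_dotenv

# find_dotenv(usecwd=True) searches upward from the current working
# directory and returns "" when nothing is found.
dotenv_path = find_dotenv(usecwd=True)
if not dotenv_path:
    print("no .env file found from the current working directory")
else:
    load_dotenv(dotenv_path, override=False)
```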
kwargs keys match valid Click parameter names for the target command, and values are correct types
If this fails: Click parameter validation errors or silent parameter ignoring when kwargs contain invalid parameter names or wrong types
core/dbt/cli/main.py:dbtRunner.invoke
callbacks list contains callable objects that accept EventMsg parameter and don't raise exceptions
If this fails: TypeError or unhandled exceptions when callback functions have wrong signature or raise during event processing
core/dbt/cli/main.py:dbtRunner
path parameter is a valid file system path string suitable for use as a checksum
If this fails: Using path as checksum creates false cache hits when different files have same path string, breaking incremental parsing
core/dbt/artifacts/resources/base.py:FileHash.path
FileHash comparison is only used between FileHash instances with same hash algorithm (name field)
If this fails: False equality when comparing FileHashes with different algorithms but same checksum value, causing incorrect cache invalidation
core/dbt/artifacts/resources/base.py:FileHash.__eq__
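A minimal sketch of a content-based checksum in the spirit of FileHash: the field names mirror the description above, the class name is hypothetical, and equality requires the algorithm to match, which addresses the mismatch assumption:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentHash:
    # Hypothetical stand-in for FileHash: `name` is the hash algorithm and
    # `checksum` the digest. The generated __eq__ compares both fields, so
    # digests from different algorithms never compare equal.
    name: str
    checksum: str

    @classmethod
    def from_contents(cls, contents: str, name: str = "sha256") -> "ContentHash":
        digest = hashlib.new(name, contents.encode("utf-8")).hexdigest()
        return cls(name=name, checksum=digest)

a = ContentHash.from_contents("select 1")
b = ContentHash.from_contents("select 1")
assert a == b  # same algorithm and same content: a cache hit
```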
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- manifest.json — Serialized dependency graph and resource metadata persisted between runs for incremental parsing
- run_results.json — Execution results and timing data from the last run for debugging and CI/CD integration
- In-memory resource cache — Parsed resource objects indexed by unique_id for graph traversal
Feedback Loops
- Incremental parsing loop (cache-invalidation, balancing; sketched after this list) — Trigger: File content hash change detected. Action: Re-parse only changed files and rebuild affected graph sections. Exit: All file hashes match cached versions.
- Dependency resolution loop (recursive, reinforcing) — Trigger: Unresolved ref() or source() call encountered. Action: Traverse parent_map to find referenced resource, compile dependencies first. Exit: All references resolved or circular dependency detected.
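The cache-invalidation loop reduces to one hash comparison per file. A simplified sketch, with a hypothetical cache shape and paths:

```python
import hashlib
from pathlib import Path

def files_to_reparse(model_paths: list[Path], cached_hashes: dict[str, str]) -> list[Path]:
    # Re-parse only files whose content hash no longer matches the hash the
    # previous run stored; everything else reuses the cached parse result.
    stale = []
    for path in model_paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if cached_hashes.get(str(path)) != digest:
            stale.append(path)
    return stale
```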
Delays
- Jinja compilation (compilation, ~100ms-5s per model) — Templates must be fully resolved before SQL execution can begin
- Database execution (async-processing, variable per query) — Transformation execution waits for the database response; long-running queries block dependent models
- Manifest serialization (checkpoint-save, ~1-10s for large projects) — Project metadata persisted to disk after parsing for next run's incremental parsing
Control Points
- materialized (architecture-switch) — Controls: How models are created in the database (view/table/incremental/ephemeral). Default: view. See the config sketch after this list.
- incremental_strategy (sampling-strategy) — Controls: Method for updating incremental models (append/merge/delete+insert)
- batch_size (threshold) — Controls: Number of records processed in single incremental batch
- enabled (feature-flag) — Controls: Whether resource is included in execution. Default: True
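These control points resolve through layered configuration. The sketch below illustrates only the precedence idea, a model-level setting overriding a project default, not dbt's actual MergeBehavior machinery:

```python
# Project-wide defaults (the dbt_project.yml layer) ...
project_defaults = {"materialized": "view", "enabled": True}
# ... overridden by a model-level config() block.
model_config = {"materialized": "incremental", "incremental_strategy": "merge"}

resolved = {**project_defaults, **model_config}
assert resolved["materialized"] == "incremental"  # the model-level value wins
assert resolved["enabled"] is True                # untouched defaults survive
```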
Technology Stack
- Click — Provides the CLI framework for command parsing, argument validation, and help generation
- Jinja2 — Template engine for SQL compilation, ref() resolution, and macro expansion
- Pydantic — Data validation and serialization for resource objects with a v1/v2 compatibility shim
- dotenv — Environment variable loading from .env files for configuration management
- requests — HTTP client for GitHub API access in the container tagging workflow
- packaging — Version parsing and comparison for release management
Key Components
- dbtRunner (orchestrator, core/dbt/cli/main.py) — Orchestrates the complete dbt execution pipeline from CLI args through parsing, compilation, and execution
- cli (dispatcher, core/dbt/cli/main.py) — Click-based command dispatcher that routes user commands to appropriate execution handlers (see the sketch after this list)
- BaseResource (factory, core/dbt/artifacts/resources/base.py) — Base factory for creating standardized resource objects from parsed SQL files and configurations
- NodeConfig (validator, core/dbt/artifacts/resources/v1/config.py) — Validates and merges configuration from dbt_project.yml, model configs, and defaults using MergeBehavior
- CompiledResource (transformer, core/dbt/artifacts/resources/v1/components.py) — Transforms parsed resources by compiling Jinja templates into executable SQL with resolved references
- FileHash (validator, core/dbt/artifacts/resources/base.py) — Creates content-based checksums for incremental parsing and change detection between runs
- register_adapter (registry, core/dbt/cli/main.py) — Registers database adapter implementations for different warehouse types (Postgres, Snowflake, etc.)
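For the dispatcher pattern, a minimal Click sketch; the command and option names are illustrative, not dbt's actual CLI definitions:

```python
import click

@click.group()
def cli() -> None:
    """Illustrative entry point that routes subcommands to handlers."""

@cli.command()
@click.option("--select", default=None, help="limit execution to matching nodes")
def run(select: str | None) -> None:
    # In dbt, a handler like this would kick off parse -> compile -> execute.
    click.echo(f"running models (select={select})")

if __name__ == "__main__":
    cli()
```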
Explore the interactive analysis
See the full architecture map, data flow, and code patterns visualization.
Analyze on CodeSea · Compare dbt-core
Related CLI Tool Repositories
Frequently Asked Questions
What is dbt-core used for?
dbt transforms SQL analytics code into database models using dependency graphs and configuration. dbt-labs/dbt-core is a 7-component CLI tool written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 636 files.
How is dbt-core architected?
dbt-core is organized into 4 architecture layers: CLI Interface, Resource Management, Graph Engine, Execution Layer. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through dbt-core?
Data moves through 6 stages: Parse CLI arguments and load environment → Parse project resources → Build dependency graph → Compile Jinja templates → Execute transformations → Generate artifacts. dbt parses all SQL models and configurations into resource objects, builds a dependency graph from ref() calls, compiles Jinja templates with resolved references, and executes the resulting SQL in dependency order against the target database; incremental parsing is maintained through content hashes, and artifacts are generated for documentation and debugging.
What technologies does dbt-core use?
The core stack includes Click (Provides CLI framework for command parsing, argument validation, and help generation), Jinja2 (Template engine for SQL compilation, ref() resolution, and macro expansion), Pydantic (Data validation and serialization for resource objects with v1/v2 compatibility shim), dotenv (Environment variable loading from .env files for configuration management), requests (HTTP client for GitHub API access in container tagging workflow), packaging (Version parsing and comparison for release management). A focused set of dependencies that keeps the build manageable.
What system dynamics does dbt-core have?
dbt-core exhibits 3 data pools (manifest.json, run_results.json, and an in-memory resource cache), 2 feedback loops, 4 control points, and 3 delays. The feedback loops handle cache invalidation and recursive dependency resolution. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does dbt-core use?
4 design patterns detected: Adapter Pattern, Template Method, Builder Pattern, Command Pattern.
How does dbt-core compare to alternatives?
CodeSea has side-by-side architecture comparisons of dbt-core with prefect. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns. See the comparison pages above for detailed analysis.
Analyzed on April 19, 2026 by CodeSea. Written by Karolina Sarna.