dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Transforms SQL analytics code into database models using dependency graphs and configuration
Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.
A 7-component CLI tool. 636 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
dbt processes user commands by first parsing all SQL models and configurations into resource objects, building a dependency graph from ref() calls, compiling Jinja templates with resolved references, and executing the resulting SQL in dependency order against the target database. The system maintains incremental parsing through content hashes and generates artifacts for documentation and debugging.
- Parse CLI arguments and load environment — Click framework processes command-line arguments, loads .env files via load_dotenv(), and creates dbt context with manifest and callbacks
- Parse project resources — Scans dbt project directory for SQL models, tests, and configurations, creates BaseResource objects with FileHash checksums for change detection
- Build dependency graph — Analyzes ref() and source() calls in SQL models to construct parent_map and child_map in Manifest, determining execution order (see the sketch after this list) [BaseResource → Manifest]
- Compile Jinja templates — Processes SQL files through Jinja engine, resolves ref() calls to actual table names, applies NodeConfig settings, creates CompiledResource with executable SQL [ParsedResource → CompiledResource] (config: materialized, pre_hook, post_hook +1)
- Execute transformations — Sends compiled SQL to database via adapter interface in dependency order, executes pre/post hooks, handles different materialization strategies (view, table, incremental) [CompiledResource → RunExecutionResult] (config: materialized, batch_size, incremental_strategy)
- Generate artifacts — Writes manifest.json, run_results.json, and catalog.json files containing execution metadata, model lineage, and table schemas for documentation [RunExecutionResult → Artifact files] (config: persist_docs)
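To make the dependency-graph stage concrete, here is a minimal sketch, not dbt's implementation, of deriving an execution order from a parent_map in the shape the Manifest stores (the node names are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical parent_map in the Manifest's shape: unique_id -> parent ids
# (the upstream models each node ref()s).
parent_map = {
    "model.jaffle.stg_orders": [],
    "model.jaffle.stg_payments": [],
    "model.jaffle.orders": ["model.jaffle.stg_orders", "model.jaffle.stg_payments"],
}

# static_order() yields each node only after all of its parents, which is the
# "execute in dependency order" guarantee described above; it raises
# CycleError when a circular ref() exists.
order = list(TopologicalSorter(parent_map).static_order())
print(order)  # staging models first, then orders
```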
Data Models
The data structures that flow between stages — the contracts that hold the system together.
BaseResource (core/dbt/artifacts/resources/base.py) — dataclass with name: str, resource_type: NodeType, package_name: str, path: str, original_file_path: str, unique_id: str
Created during parsing from SQL files, extended into specific resource types, and stored in manifest for graph construction
CompiledResource (core/dbt/artifacts/resources/v1/components.py) — dataclass extending ParsedResource with alias: str, relation_name: str, compiled_code: str, extra_ctes_injected: bool, extra_ctes: List[InjectedCTE], _event_status: Dict[str, Any]
Created by compiling ParsedResource with Jinja templating, contains executable SQL, used for database execution
NodeConfig (core/dbt/artifacts/resources/v1/config.py) — dataclass with materialized: str, incremental_strategy: str, batch_size: Any, persist_docs: Dict, pre_hook/post_hook: List[Hook], tags: List[str], meta: Dict[str, Any]
Parsed from dbt_project.yml and model configs, merged with defaults, applied during compilation and execution
Manifest (core/dbt/contracts/graph/manifest.py) — contains nodes: Dict[str, Union[ModelNode, TestNode, ...]], sources: Dict[str, SourceDefinition], macros: Dict[str, Macro], child_map/parent_map: Dict[str, List[str]]
Built from all parsed resources, contains complete dependency graph, persisted to manifest.json, loaded for subsequent runs
RunExecutionResult (core/dbt/artifacts/schemas/run.py) — contains results: List[RunResult], elapsed_time: float, generated_at: datetime, args: Dict[str, Any]
Created during transformation execution, aggregates individual node results, returned to user and written to run_results.json
dbtRunnerResult (core/dbt/cli/main.py) — dataclass with success: bool, exception: Optional[BaseException], result: Union[bool, CatalogArtifact, List[str], Manifest, RunExecutionResult]
Returned by dbtRunner.invoke() for programmatic access, wraps execution results with success status and exceptions
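These contracts meet in dbt's documented programmatic entry point: dbtRunner.invoke() returns a dbtRunnerResult whose result field varies by command. A minimal invocation, assuming a dbt project in the working directory and a model named my_model, looks like:

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to `dbt run --select my_model` on the command line; for the
# run command, res.result is a RunExecutionResult.
res: dbtRunnerResult = dbt.invoke(["run", "--select", "my_model"])

if res.success:
    for node_result in res.result.results:  # List[RunResult]
        print(node_result.node.name, node_result.status)
else:
    print(res.exception)
```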
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Environment variables INPUT_PACKAGE_NAME, INPUT_NEW_VERSION, INPUT_GITHUB_TOKEN, and GITHUB_OUTPUT are always present and non-empty strings
If this fails: KeyError crashes when any required environment variable is missing, or empty string passed to GitHub API causing authentication failures
.github/actions/latest-wrangler/main.py:main
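A hedged way to harden this is to fail fast with a named error instead of a bare KeyError. The helper below is hypothetical, not code from the action:

```python
import os

def require_env(name: str) -> str:
    # Hypothetical guard: treat empty strings as missing and name the
    # variable in the error instead of crashing with a bare KeyError.
    value = os.environ.get(name, "").strip()
    if not value:
        raise SystemExit(f"required environment variable {name} is missing or empty")
    return value

token = require_env("INPUT_GITHUB_TOKEN")
```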
GitHub API at https://api.github.com/orgs/dbt-labs/packages/container/{package_name}/versions is always available and returns valid JSON
If this fails: requests.exceptions.ConnectionError or JSONDecodeError crashes the process when GitHub API is down or returns non-JSON response
.github/actions/latest-wrangler/main.py:_package_metadata
GitHub API response.json() returns list of objects where each has metadata.container.tags as a list of strings
If this fails: KeyError or TypeError when the API response structure changes, breaking the version-comparison logic
.github/actions/latest-wrangler/main.py:_published_versions
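Both API assumptions can be made observable with a timeout, an HTTP status check, and defensive access into the nested response. This is an illustrative hardening sketch, not the action's actual code; the URL shape is the one quoted above:

```python
import requests

def fetch_published_tags(package_name: str, token: str) -> list[str]:
    # Illustrative hardening: explicit timeout, status check, and defensive
    # access into the nested metadata.container.tags structure.
    url = (
        "https://api.github.com/orgs/dbt-labs/packages/container/"
        f"{package_name}/versions"
    )
    resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=30)
    resp.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    tags: list[str] = []
    for version in resp.json():
        tags.extend(version.get("metadata", {}).get("container", {}).get("tags", []))
    return tags
```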
published_versions list is never empty when checking max(published_versions) and max(published_patches)
If this fails: ValueError: max() arg is an empty sequence crashes when no published versions exist for a new package
.github/actions/latest-wrangler/main.py:_new_version_tags
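One defensive pattern here, illustrative rather than the action's code, is max() with a default, which turns the empty-sequence ValueError into an explicit branch; Version comes from the packaging dependency listed in the technology stack below:

```python
from packaging.version import Version

published_versions: list[str] = []  # a brand-new package may have none

# max() raises "ValueError: max() arg is an empty sequence" unless a
# default is supplied; default=None makes the empty case an explicit branch.
latest = max((Version(v) for v in published_versions), default=None)
if latest is None:
    print("no published versions; the new version is trivially the latest")
```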
GITHUB_OUTPUT file path exists and is writable, and process has permissions to append to it
If this fails: FileNotFoundError or PermissionError when GitHub Actions runner environment doesn't provide writable output file
.github/actions/latest-wrangler/main.py:_register_tags
.env file format is valid and usecwd=True finds the correct working directory from user context
If this fails: load_dotenv silently fails or loads wrong .env file when called from subprocess with different working directory
core/dbt/cli/main.py:load_dotenv
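python-dotenv's public API can make the lookup explicit. In this sketch (not dbt's code), find_dotenv() returns an empty string when no .env file is found, so the failure becomes observable instead of silent:

```python
from dotenv import find_dotenv, load_dotenv

# find_dotenv(usecwd=True) searches upward from the current working
# directory and returns "" when nothing is found.
dotenv_path = find_dotenv(usecwd=True)
if not dotenv_path:
    print("no .env file found from the current working directory")
else:
    load_dotenv(dotenv_path, override=False)
```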
kwargs keys match valid Click parameter names for the target command, and values are correct types
If this fails: Click parameter validation errors or silent parameter ignoring when kwargs contain invalid parameter names or wrong types
core/dbt/cli/main.py:dbtRunner.invoke
callbacks list contains callable objects that accept EventMsg parameter and don't raise exceptions
If this fails: TypeError or unhandled exceptions when callback functions have wrong signature or raise during event processing
core/dbt/cli/main.py:dbtRunner
path parameter is a valid file system path string suitable for use as a checksum
If this fails: Using path as checksum creates false cache hits when different files have same path string, breaking incremental parsing
core/dbt/artifacts/resources/base.py:FileHash.path
FileHash comparison is only used between FileHash instances with same hash algorithm (name field)
If this fails: False equality when comparing FileHashes with different algorithms but same checksum value, causing incorrect cache invalidation
core/dbt/artifacts/resources/base.py:FileHash.__eq__
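A minimal sketch of a content-based checksum in the spirit of FileHash: the field names mirror the description above, the class name is hypothetical, and equality requires the algorithm to match, which addresses the mismatch assumption:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentHash:
    # Hypothetical stand-in for FileHash: `name` is the hash algorithm and
    # `checksum` the digest. The generated __eq__ compares both fields, so
    # digests from different algorithms never compare equal.
    name: str
    checksum: str

    @classmethod
    def from_contents(cls, contents: str, name: str = "sha256") -> "ContentHash":
        digest = hashlib.new(name, contents.encode("utf-8")).hexdigest()
        return cls(name=name, checksum=digest)

a = ContentHash.from_contents("select 1")
b = ContentHash.from_contents("select 1")
assert a == b  # same algorithm and same content: a cache hit
```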
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- manifest.json — Serialized dependency graph and resource metadata persisted between runs for incremental parsing
- run_results.json — Execution results and timing data from the last run for debugging and CI/CD integration
- In-memory resource cache — Parsed resource objects indexed by unique_id for graph traversal
Feedback Loops
- Incremental parsing loop (cache-invalidation, balancing; sketched after this list) — Trigger: File content hash change detected. Action: Re-parse only changed files and rebuild affected graph sections. Exit: All file hashes match cached versions.
- Dependency resolution loop (recursive, reinforcing) — Trigger: Unresolved ref() or source() call encountered. Action: Traverse parent_map to find referenced resource, compile dependencies first. Exit: All references resolved or circular dependency detected.
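The cache-invalidation loop reduces to one hash comparison per file. A simplified sketch, with a hypothetical cache shape and paths:

```python
import hashlib
from pathlib import Path

def files_to_reparse(model_paths: list[Path], cached_hashes: dict[str, str]) -> list[Path]:
    # Re-parse only files whose content hash no longer matches the hash the
    # previous run stored; everything else reuses the cached parse result.
    stale = []
    for path in model_paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if cached_hashes.get(str(path)) != digest:
            stale.append(path)
    return stale
```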
Delays
- Jinja compilation (compilation, ~100ms-5s per model) — Templates must be fully resolved before SQL execution can begin
- Database execution (async-processing, variable per query) — Transformation execution waits for the database response; long-running queries block dependent models
- Manifest serialization (checkpoint-save, ~1-10s for large projects) — Project metadata persisted to disk after parsing for next run's incremental parsing
Control Points
- materialized (architecture-switch) — Controls: How models are created in the database (view/table/incremental/ephemeral). Default: view. See the config sketch after this list.
- incremental_strategy (sampling-strategy) — Controls: Method for updating incremental models (append/merge/delete+insert)
- batch_size (threshold) — Controls: Number of records processed in single incremental batch
- enabled (feature-flag) — Controls: Whether resource is included in execution. Default: True
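These control points resolve through layered configuration. The sketch below illustrates only the precedence idea, a model-level setting overriding a project default, not dbt's actual MergeBehavior machinery:

```python
# Project-wide defaults (the dbt_project.yml layer) ...
project_defaults = {"materialized": "view", "enabled": True}
# ... overridden by a model-level config() block.
model_config = {"materialized": "incremental", "incremental_strategy": "merge"}

resolved = {**project_defaults, **model_config}
assert resolved["materialized"] == "incremental"  # the model-level value wins
assert resolved["enabled"] is True                # untouched defaults survive
```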
Technology Stack
- Click — Provides the CLI framework for command parsing, argument validation, and help generation
- Jinja2 — Template engine for SQL compilation, ref() resolution, and macro expansion
- Pydantic — Data validation and serialization for resource objects with a v1/v2 compatibility shim
- dotenv — Environment variable loading from .env files for configuration management
- requests — HTTP client for GitHub API access in the container tagging workflow
- packaging — Version parsing and comparison for release management
Key Components
- dbtRunner (orchestrator, core/dbt/cli/main.py) — Orchestrates the complete dbt execution pipeline from CLI args through parsing, compilation, and execution
- cli (dispatcher, core/dbt/cli/main.py) — Click-based command dispatcher that routes user commands to appropriate execution handlers (see the sketch after this list)
- BaseResource (factory, core/dbt/artifacts/resources/base.py) — Base factory for creating standardized resource objects from parsed SQL files and configurations
- NodeConfig (validator, core/dbt/artifacts/resources/v1/config.py) — Validates and merges configuration from dbt_project.yml, model configs, and defaults using MergeBehavior
- CompiledResource (transformer, core/dbt/artifacts/resources/v1/components.py) — Transforms parsed resources by compiling Jinja templates into executable SQL with resolved references
- FileHash (validator, core/dbt/artifacts/resources/base.py) — Creates content-based checksums for incremental parsing and change detection between runs
- register_adapter (registry, core/dbt/cli/main.py) — Registers database adapter implementations for different warehouse types (Postgres, Snowflake, etc.)
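For the dispatcher pattern, a minimal Click sketch; the command and option names are illustrative, not dbt's actual CLI definitions:

```python
import click

@click.group()
def cli() -> None:
    """Illustrative entry point that routes subcommands to handlers."""

@cli.command()
@click.option("--select", default=None, help="limit execution to matching nodes")
def run(select: str | None) -> None:
    # In dbt, a handler like this would kick off parse -> compile -> execute.
    click.echo(f"running models (select={select})")

if __name__ == "__main__":
    cli()
```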
Explore the interactive analysis
See the full architecture map, data flow, and code patterns visualization.
Analyze on CodeSea · Compare dbt-core
Related CLI Tool Repositories
Frequently Asked Questions
What is dbt-core used for?
dbt transforms SQL analytics code into database models using dependency graphs and configuration. dbt-labs/dbt-core is a 7-component CLI tool written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 636 files.
How is dbt-core architected?
dbt-core is organized into 4 architecture layers: CLI Interface, Resource Management, Graph Engine, Execution Layer. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through dbt-core?
Data moves through 6 stages: Parse CLI arguments and load environment → Parse project resources → Build dependency graph → Compile Jinja templates → Execute transformations → Generate artifacts. dbt parses all SQL models and configurations into resource objects, builds a dependency graph from ref() calls, compiles Jinja templates with resolved references, and executes the resulting SQL in dependency order against the target database; incremental parsing is maintained through content hashes, and artifacts are generated for documentation and debugging.
What technologies does dbt-core use?
The core stack includes Click (Provides CLI framework for command parsing, argument validation, and help generation), Jinja2 (Template engine for SQL compilation, ref() resolution, and macro expansion), Pydantic (Data validation and serialization for resource objects with v1/v2 compatibility shim), dotenv (Environment variable loading from .env files for configuration management), requests (HTTP client for GitHub API access in container tagging workflow), packaging (Version parsing and comparison for release management). A focused set of dependencies that keeps the build manageable.
What system dynamics does dbt-core have?
dbt-core exhibits 3 data pools (manifest.json, run_results.json, and an in-memory resource cache), 2 feedback loops, 4 control points, and 3 delays. The feedback loops handle cache invalidation and recursive dependency resolution. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does dbt-core use?
4 design patterns detected: Adapter Pattern, Template Method, Builder Pattern, Command Pattern.
How does dbt-core compare to alternatives?
CodeSea has side-by-side architecture comparisons of dbt-core with prefect. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns. See the comparison pages above for detailed analysis.
Analyzed on April 19, 2026 by CodeSea. Written by Karolina Sarna.