airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Orchestrates data movement from APIs, databases and files to warehouses and lakes
Under the hood, the system uses 2 feedback loops, 2 data pools, and 3 control points to manage its runtime behavior.
An 8-component data pipeline. 5156 files analyzed. Data flows through 5 distinct pipeline stages.
How Data Flows Through the System
Data flows from CLI arguments through configuration parsing and validation into connector operations. The ConnectorCommandLinePropertySource transforms CLI args into Micronaut properties, factories create validated configuration and catalog objects, and the AirbyteConnectorRunnable executes the specific operation (check, discover, read, write) using the configured instances. State flows in for incremental syncs and flows out as checkpoints during execution.
- Parse CLI arguments into Micronaut properties — ConnectorCommandLinePropertySource extracts operation type and file paths from CLI args and converts to configuration properties like airbyte.connector.config.json
- Load and validate connector configuration — ConfigurationSpecificationSupplier parses JSON config, validates against JSON schema, and creates typed configuration POJOs
- Parse and validate configured catalog — ConfiguredCatalogFactory parses catalog JSON, validates stream names and sync modes, creates ConfiguredAirbyteCatalog with stream configurations
- Parse input state for incremental syncs — InputStateFactory transforms AirbyteStateMessage list into typed InputState (Empty, Global, or Stream-based) for resuming incremental operations [AirbyteStateMessage → InputState]
- Execute connector operation — AirbyteConnectorRunnable gets Operation instance from Micronaut, executes with error handling, and outputs protocol messages via OutputConsumer [ConfigurationSpecification]
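The first stage above can be sketched in a few lines. This is a simplified illustration of the idea behind ConnectorCommandLinePropertySource, not the actual Micronaut implementation: the operation property and the airbyte.connector.config.json key are named in the text, while the catalog and state keys here are hypothetical placeholders following the same pattern.

```java
import java.util.HashMap;
import java.util.Map;

public class CliPropertySketch {
    // Turn CLI arguments into flat configuration properties, the way the
    // property source feeds Micronaut. The parsing logic is illustrative.
    public static Map<String, String> resolve(String[] args) {
        Map<String, String> props = new HashMap<>();
        if (args.length > 0) {
            // First positional argument selects the operation (check/discover/read/write).
            props.put("airbyte.connector.operation", args[0]);
        }
        for (int i = 1; i + 1 < args.length; i += 2) {
            switch (args[i]) {
                // Each file-path option becomes a namespaced property.
                // Only the config key is confirmed by the text; the other
                // two key names are assumptions for illustration.
                case "--config" -> props.put("airbyte.connector.config.json", args[i + 1]);
                case "--catalog" -> props.put("airbyte.connector.catalog.json", args[i + 1]);
                case "--state" -> props.put("airbyte.connector.state.json", args[i + 1]);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(resolve(new String[] {"read", "--config", "/secrets/config.json"}));
    }
}
```

Once the arguments are properties, the rest of the pipeline never touches the raw CLI again; factories read the property keys instead.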
Data Models
The data structures that flow between stages — the contracts that hold the system together.
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConfiguredCatalogFactory.kt
Protocol object containing a list of ConfiguredAirbyteStream with stream metadata, sync modes, and destination sync mode configurations
Parsed from JSON input during connector initialization, validated for required fields, passed to sync operations
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/InputState.kt
Sealed interface with EmptyInputState, GlobalInputState(global: JsonNode, globalStreams: Map<StreamIdentifier, JsonNode>), or StreamInputState(streams: Map<StreamIdentifier, JsonNode>)
Restored from persistent storage at sync start, updated during data extraction, persisted for next incremental run
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/StreamIdentifier.kt
Data class with namespace: String? and name: String, identifying a data stream uniquely across namespaces
Created from AirbyteStream or StreamDescriptor objects, used as map keys throughout sync operations
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConfigurationSpecificationSupplier.kt
Generic connector configuration POJO with connector-specific fields such as host, credentials, and batch sizes; serialized to/from JSON with JSON Schema validation
Loaded from JSON or Micronaut properties, validated against JSON schema, injected into connector components
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/InputStateFactory.kt
Protocol message with a type field and state data for global or per-stream incremental sync checkpointing
Parsed from JSON state input, transformed to InputState sealed types, emitted during sync execution
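The InputState hierarchy described above can be sketched in Java (the actual CDK code is Kotlin). The type and field names mirror InputState.kt and StreamIdentifier.kt as described; a plain String stands in for the JsonNode payloads so the example stays self-contained.

```java
import java.util.Map;

public class InputStateSketch {
    // Identifies a stream by optional namespace plus name, as in StreamIdentifier.kt.
    public record StreamIdentifier(String namespace, String name) {}

    // The sealed hierarchy: exactly three possible shapes of input state.
    public sealed interface InputState permits EmptyInputState, GlobalInputState, StreamInputState {}

    // No prior state: the sync starts from scratch.
    public record EmptyInputState() implements InputState {}

    // Global state plus optional per-stream state.
    public record GlobalInputState(String global, Map<StreamIdentifier, String> globalStreams) implements InputState {}

    // Purely per-stream state.
    public record StreamInputState(Map<StreamIdentifier, String> streams) implements InputState {}

    // Because the interface is sealed, these three branches are exhaustive.
    public static String describe(InputState state) {
        if (state instanceof EmptyInputState) {
            return "empty";
        } else if (state instanceof GlobalInputState g) {
            return "global with " + g.globalStreams().size() + " stream entries";
        } else {
            StreamInputState s = (StreamInputState) state;
            return "per-stream with " + s.streams().size() + " streams";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new EmptyInputState()));
    }
}
```

Sealing the interface is what lets downstream sync code branch on state shape without a fallthrough case for unknown subtypes.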
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
metadata.yaml resource exists in the classpath root and is readable at startup time
If this fails: ResourceNotFoundException causing immediate connector failure with no graceful degradation
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/MetadataYamlPropertySource.kt:loadFromResource
micronautPropertiesFallback object is serializable to valid JSON and matches the same schema as jsonPropertyValue
If this fails: ConfigErrorException with misleading error message when fallback serialization fails or schema mismatch causes validation errors
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConfigurationSpecificationSupplier.kt:get
CommandLine.optionValue() returns non-null for options that were actually provided on command line
If this fails: Silent null values passed to downstream components or incorrect operation selection when CLI parsing is inconsistent
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConnectorCommandLinePropertySource.kt:resolveValues
operationProvider.get() is idempotent and safe to call multiple times during error recovery
If this fails: Resource leaks or initialization side effects if operation creation is retried after partial failure
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/AirbyteConnectorRunnable.kt:run
System.getenv() and Instant.now() are available and functional when creating offset clock in test environment
If this fails: Clock factory fails to initialize in containerized or restricted environments where system time access is limited
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/ClockFactory.kt:offset
AirbyteStateMessage objects with type=null are always safe to filter out and represent empty/initial state
If this fails: Loss of valid state data if null type actually represents a legitimate state format from older protocol versions
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/InputStateFactory.kt:make
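The filtering assumption above can be made concrete with a small sketch. The message shape here is a stand-in for AirbyteStateMessage, not the real protocol class, and the filter is an illustration of the described behavior rather than the factory's actual code.

```java
import java.util.List;
import java.util.Objects;

public class StateFilterSketch {
    // Minimal stand-in for AirbyteStateMessage: a type tag plus opaque state data.
    public record StateMessage(String type, String data) {}

    // Drop messages whose type is null before building InputState.
    // If a null type ever encodes legacy state, this silently discards it --
    // which is exactly the risk the assumption above describes.
    public static List<StateMessage> dropUntyped(List<StateMessage> messages) {
        return messages.stream().filter(m -> Objects.nonNull(m.type())).toList();
    }

    public static void main(String[] args) {
        List<StateMessage> in = List.of(
                new StateMessage(null, "legacy"),
                new StateMessage("STREAM", "cursor=42"));
        // Only the typed message survives; the legacy entry is gone.
        System.out.println(dropUntyped(in).size());
    }
}
```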
Operation.execute() handles its own error reporting and never throws exceptions that should be propagated to users differently than ConfigErrorException
If this fails: Important connector-specific errors get wrapped in generic failure messages, losing debugging context
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/AirbyteConnectorRunnable.kt:run
CLI operations are mutually exclusive and exactly one operation will be specified per invocation
If this fails: Undefined behavior if multiple operations somehow get through validation, potentially executing wrong operation
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConnectorCommandLinePropertySource.kt:resolveValues
Epoch timestamp 3133641600 (year 2069) is always safe as a future test timestamp that won't conflict with real data
If this fails: Test failures or incorrect behavior in systems that validate timestamps against reasonable ranges or perform date arithmetic
airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/ClockFactory.kt:fakeNow
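The fixed-timestamp idea behind ClockFactory's test mode can be sketched with the standard java.time API. The epoch second 3133641600 comes from the assumption above; Clock.fixed is the stock way to make "now" deterministic. The real factory's wiring through Micronaut environments is not reproduced here.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;

public class FixedClockSketch {
    // The test timestamp named above: epoch second 3133641600 (2069-04-20T00:00:00Z).
    static final Instant FAKE_NOW = Instant.ofEpochSecond(3133641600L);

    // A clock frozen at the fake instant: every Instant.now(clock) call
    // returns the same value, so time-dependent tests are repeatable.
    public static Clock testClock() {
        return Clock.fixed(FAKE_NOW, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(Instant.now(testClock()));
    }
}
```

Components that take a Clock as a dependency (rather than calling Instant.now() directly) can swap this in under the test environment without code changes.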
window.analytics Segment SDK is injected by Cloudflare and available when vote tracking executes
If this fails: Silent tracking failure with no user feedback when analytics is unavailable, potentially losing valuable user feedback data
docusaurus/src/theme/TOCItems/index.js:onVote
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Connector metadata registry — Stores connector identity, version, and capability information loaded at startup
- Micronaut application context — Dependency injection container holding configured instances of connectors, operations, and supporting services
Feedback Loops
- Configuration validation loop (retry, balancing) — Trigger: Invalid configuration JSON or schema validation failure. Action: ConfigErrorException thrown with validation details. Exit: Valid configuration parsed or connector exits with error.
- State checkpoint emission loop (checkpoint-save, reinforcing) — Trigger: Periodic intervals during data sync operations. Action: Emit AirbyteStateMessage with current sync progress. Exit: Sync completes or fails.
Delays
- Micronaut context startup (warmup) — Dependency injection container initialization before connector operations can execute
- Configuration parsing and validation (compilation) — JSON parsing and schema validation must complete before connector can start
Control Points
- Connector operation type (property) — Controls: Which operation (check, discover, read, write) the connector executes. Set via: airbyte.connector.operation
- Feature flags via environment (feature-flag) — Controls: Enables cloud deployment mode and other feature variations. Set via: DEPLOYMENT_MODE=CLOUD
- Test vs production clock (runtime-toggle) — Controls: Whether a fixed test timestamp or the system time is used for deterministic testing. Toggled by: the Micronaut test environment
Technology Stack
- Kotlin/Java — Primary language for connector framework and implementation, with null safety and functional constructs
- Micronaut — Dependency injection and application framework providing property binding and environment-based configuration
- Gradle — Build automation with multi-project support for CDK, connectors, and documentation components
- Jackson — JSON parsing and serialization for protocol messages and configuration, with schema validation
- Docusaurus — Documentation site generation with React components for connector specifications and tutorials
- PicoCLI — Command line argument parsing integrated with Micronaut for connector operation selection
Key Components
- AirbyteConnectorRunner (orchestrator) — Bootstraps Micronaut application context and executes connector operations based on CLI commands
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/AirbyteConnectorRunner.kt
- AirbyteConnectorRunnable (executor) — Executes the specific connector operation with error handling and output management
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/AirbyteConnectorRunnable.kt
- ConnectorCommandLinePropertySource (adapter) — Transforms CLI arguments into Micronaut configuration properties for dependency injection
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConnectorCommandLinePropertySource.kt
- ConfigurationSpecificationSupplier (factory) — Supplies validated connector configuration instances from JSON or fallback properties with JSON Schema validation
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConfigurationSpecificationSupplier.kt
- ConfiguredCatalogFactory (factory) — Creates and validates ConfiguredAirbyteCatalog from JSON input with stream validation
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/ConfiguredCatalogFactory.kt
- InputStateFactory (transformer) — Transforms AirbyteStateMessage list into typed InputState sealed interface instances
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/InputStateFactory.kt
- MetadataYamlPropertySource (loader) — Loads connector metadata.yaml as Micronaut properties for connector identity and versioning
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/command/MetadataYamlPropertySource.kt
- ClockFactory (factory) — Provides system or test clock instances based on Micronaut environment for deterministic testing
  airbyte-cdk/bulk/core/base/src/main/kotlin/io/airbyte/cdk/ClockFactory.kt
Explore the interactive analysis
See the full architecture map, data flow, and code patterns visualization.
Analyze on CodeSea
Compare airbyte
Frequently Asked Questions
What is airbyte used for?
airbyte orchestrates data movement from APIs, databases, and files to warehouses and lakes. airbytehq/airbyte is an 8-component data pipeline written primarily in Kotlin and Java. Data flows through 5 distinct pipeline stages. The codebase contains 5156 files.
How is airbyte architected?
airbyte is organized into 3 architecture layers: Connector Development Kit, Connector Implementations, Documentation Platform. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through airbyte?
Data moves through 5 stages: Parse CLI arguments into Micronaut properties → Load and validate connector configuration → Parse and validate configured catalog → Parse input state for incremental syncs → Execute connector operation. The ConnectorCommandLinePropertySource transforms CLI args into Micronaut properties, factories create validated configuration and catalog objects, and the AirbyteConnectorRunnable executes the specific operation (check, discover, read, write) using the configured instances. State flows in for incremental syncs and flows out as checkpoints during execution.
What technologies does airbyte use?
The core stack includes Kotlin/Java (Primary language for connector framework and implementation with null safety and functional constructs), Micronaut (Dependency injection and application framework providing property binding and environment-based configuration), Gradle (Build automation with multi-project support for CDK, connectors, and documentation components), Jackson (JSON parsing and serialization for protocol messages and configuration with schema validation), Docusaurus (Documentation site generation with React components for connector specifications and tutorials), PicoCLI (Command line argument parsing integrated with Micronaut for connector operation selection). A focused set of dependencies that keeps the build manageable.
What system dynamics does airbyte have?
airbyte exhibits 2 data pools (Connector metadata registry, Micronaut application context), 2 feedback loops, 3 control points, 2 delays. The feedback loops handle retry and checkpoint-save. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does airbyte use?
4 design patterns detected: Micronaut Dependency Injection Factory Pattern, Sealed Interface State Machine, Protocol Message Transformation Pipeline, Exception-based Error Classification.
How does airbyte compare to alternatives?
CodeSea has side-by-side architecture comparisons of airbyte with meltano, covering tech stack differences, pipeline design, system behavior, and code patterns.
Analyzed on April 19, 2026 by CodeSea. Written by Karolina Sarna.