cube-js/cube
📊 Cube Core is an open-source semantic layer for AI, BI and embedded analytics
Provides a SQL interface and semantic layer for analytics by translating queries across data sources
Client applications send SQL queries or REST requests to the API gateway, which parses them into internal messages. The query orchestrator determines if results can be served from cache or need fresh computation. For fresh queries, the schema compiler generates database-specific SQL from Cube definitions, database drivers execute queries against data sources, and results flow back through caching layers to clients as JSON or SQL result sets.
Under the hood, the system uses 2 feedback loops, 3 data pools, and 3 control points to manage its runtime behavior.
A 6-component repository. 2303 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Receive client query — API gateway receives HTTP requests with SQL queries, GraphQL queries, or REST API calls, validates authentication, and converts them to internal HttpMessage format using FlatBuffer serialization
- Parse and validate query — QueryMessageParser deserializes the HttpMessage, extracts the SQL query or API parameters, and validates them against available cube schemas and user permissions [HttpMessage → QueryResult]
- Check pre-aggregation cache — Query orchestrator checks if the requested data exists in cubestore pre-aggregations or Redis cache, returning cached results immediately if available and fresh [QueryResult → QueryResult]
- Compile schema to SQL — Schema compiler takes the CubeDefinition for requested measures and dimensions, applies joins and transformations, and generates optimized SQL queries specific to the target database dialect [CubeDefinition → SQL Query String]
- Execute database query — Appropriate database driver (BigQuery, Snowflake, Postgres, etc.) executes the generated SQL against the data source, handling connection pooling and result streaming [SQL Query String → QueryResult]
- Cache and return results — Results are stored in cubestore for future queries, formatted according to the original request type (JSON for REST, tabular for SQL), and returned to the client application [QueryResult → API Response]
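The cache-or-compute path in the stages above can be sketched as follows. This is a minimal illustration, not Cube's implementation: `Query`, `PipelineDeps`, `runQuery`, and the stub SQL are all hypothetical names invented for the example, and a plain `Map` stands in for pre-aggregations and the result cache.

```typescript
// Hypothetical types for illustration only; they are not Cube's actual API.
type Query = { measures: string[]; dimensions: string[] };
type Rows = Array<Record<string, unknown>>;

interface PipelineDeps {
  cache: Map<string, Rows>;           // stands in for pre-aggregations / result cache
  compileToSql: (q: Query) => string; // stands in for the schema compiler
  execute: (sql: string) => Rows;     // stands in for a database driver
}

// Serve from cache when possible; otherwise compile, execute, and cache.
function runQuery(q: Query, deps: PipelineDeps): { rows: Rows; fromCache: boolean } {
  const key = JSON.stringify(q);
  const cached = deps.cache.get(key);
  if (cached !== undefined) {
    return { rows: cached, fromCache: true };
  }
  const sql = deps.compileToSql(q);
  const rows = deps.execute(sql);
  deps.cache.set(key, rows);
  return { rows, fromCache: false };
}

// Usage with stub dependencies: the second identical query never reaches the driver.
const executedSql: string[] = [];
const deps: PipelineDeps = {
  cache: new Map<string, Rows>(),
  compileToSql: (q) => `SELECT ${[...q.dimensions, ...q.measures].join(", ")} FROM orders`,
  execute: (sql) => {
    executedSql.push(sql);
    return [{ status: "shipped", count: 3 }];
  },
};
const first = runQuery({ measures: ["count"], dimensions: ["status"] }, deps);
const second = runQuery({ measures: ["count"], dimensions: ["status"] }, deps);
```

Passing the compiler and driver in as functions keeps the sketch testable without a database; a real orchestrator would also handle freshness, concurrency, and errors.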
Data Models
The data structures that flow between stages — the contracts that hold the system together.
rust/cubeorchestrator/src/query_message_parser.rs (QueryResult): struct with columns: Vec<String>, rows: Vec<Vec<DBResponseValue>>, columns_pos: IndexMap<String, usize> — tabular query results with column metadata and row data
Created by database drivers after query execution, transformed by orchestrator, and serialized for API responses
rust/cubesqlplanner/cubesqlplanner/src/cube_bridge/cube_definition.rs (CubeDefinition): Schema definition containing measures, dimensions, joins, and SQL templates that define how business metrics map to database tables
Loaded from user schema files, compiled by schema-compiler, and used by SQL planner to generate optimized queries
rust/cubeshared/src/codegen/http_message_generated.rs (HttpMessage): FlatBuffer-serialized message containing HTTP commands and query data for inter-service communication
Created by API gateway from HTTP requests, passed to Rust services via FlatBuffer serialization, and processed by query engine
packages/cubejs-backend-cloud/src/cloud.ts: object with auth: string, url?: string, deploymentId?: string — authentication context for cloud API calls
Created from environment variables or user config, passed to CubeCloudClient for API authentication
packages/cubejs-backend-maven/src/maven.ts: object with groupId: string, artifactId: string, version: string — Java dependency specification for JDBC drivers
Defined in driver configs, resolved by Maven to download JAR files for JDBC database connections
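To make the Maven dependency model concrete, here is a rough sketch of turning `groupId`/`artifactId`/`version` objects into a minimal POM. It is an assumption-laden illustration rather than the code in maven.ts: `MavenDependency`, `escapeXml`, `buildPom`, the wrapper project coordinates, and the PostgreSQL driver entry are all chosen for the example.

```typescript
// Dependency coordinates, matching the object shape described above.
interface MavenDependency {
  groupId: string;
  artifactId: string;
  version: string;
}

// Escape characters that are special inside XML text nodes.
function escapeXml(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build a minimal POM declaring the given dependencies.
// The wrapper project coordinates are placeholders chosen for this sketch.
function buildPom(deps: MavenDependency[]): string {
  const entries = deps.map((d) =>
    [
      "    <dependency>",
      `      <groupId>${escapeXml(d.groupId)}</groupId>`,
      `      <artifactId>${escapeXml(d.artifactId)}</artifactId>`,
      `      <version>${escapeXml(d.version)}</version>`,
      "    </dependency>",
    ].join("\n")
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<project xmlns="http://maven.apache.org/POM/4.0.0">',
    "  <modelVersion>4.0.0</modelVersion>",
    "  <groupId>example</groupId>",
    "  <artifactId>jdbc-deps</artifactId>",
    "  <version>1.0.0</version>",
    "  <dependencies>",
    ...entries,
    "  </dependencies>",
    "</project>",
  ].join("\n");
}

// Example input: the PostgreSQL JDBC driver coordinates (illustrative only).
const pom = buildPom([
  { groupId: "org.postgresql", artifactId: "postgresql", version: "42.7.3" },
]);
```

Returning the POM as a string leaves the file write to the caller, which is where a failure to create the file can be surfaced explicitly.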
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Maven executable is in system PATH and responds to 'mvn --version' command
If this fails: The system tries to spawn an 'mvn' process that does not exist, gets null back with no error indication, and falls back to downloading Maven, even when an adequate Maven installation exists outside PATH
packages/cubejs-backend-maven/src/maven.ts:getSystemMavenVersion
All rows in QueryResult have exactly the same number of columns as specified in columns vector, with column ordering matching columns_pos IndexMap
If this fails: Accessing rows[i][j] could panic with index out of bounds, or return wrong data if column count mismatches between rows
rust/cubeorchestrator/src/query_message_parser.rs:QueryResult
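One way to make this assumption explicit is a shape check that runs before any `rows[i][j]` access. The sketch below mirrors the listed fields in TypeScript purely for illustration; `QueryResultLike` and `checkShape` are hypothetical names, and the Rust struct remains the source of truth.

```typescript
// TypeScript mirror of the fields listed for QueryResult (illustrative only).
interface QueryResultLike {
  columns: string[];
  rows: unknown[][];
  columnsPos: Map<string, number>;
}

// Collect shape problems up front instead of failing on first access.
function checkShape(result: QueryResultLike): string[] {
  const problems: string[] = [];
  result.rows.forEach((row, i) => {
    if (row.length !== result.columns.length) {
      problems.push(`row ${i} has ${row.length} values, expected ${result.columns.length}`);
    }
  });
  result.columns.forEach((name, i) => {
    const pos = result.columnsPos.get(name);
    if (pos !== i) {
      problems.push(`columnsPos maps "${name}" to ${String(pos)}, expected ${i}`);
    }
  });
  return problems;
}

const good: QueryResultLike = {
  columns: ["status", "count"],
  rows: [["shipped", 3], ["pending", 1]],
  columnsPos: new Map<string, number>([["status", 0], ["count", 1]]),
};
// One short row and a swapped position map: three problems in total.
const bad: QueryResultLike = {
  columns: ["status", "count"],
  rows: [["shipped", 3], ["pending"]],
  columnsPos: new Map<string, number>([["status", 1], ["count", 0]]),
};
const goodProblems = checkShape(good);
const badProblems = checkShape(bad);
```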
Input data for HyperLogLog sketches fits within the precision parameters configured at sketch creation time
If this fails: Hash collisions increase dramatically if data cardinality exceeds HLL precision bounds, causing severe underestimation of distinct counts in pre-aggregations
rust/cubestore/cubehll/src/lib.rs:HllSketch
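For a sense of scale, the usual HyperLogLog analysis gives a relative standard error of about 1.04 / sqrt(m) for m = 2^p registers. The sketch below only evaluates that formula; the precision values 12 and 14 are illustrative and are not taken from cubehll's configuration, and `hllStandardError` and `errorBand` are hypothetical helper names.

```typescript
// Approximate relative standard error of a HyperLogLog estimate
// with 2^precision registers: 1.04 / sqrt(m).
function hllStandardError(precision: number): number {
  const registers = Math.pow(2, precision);
  return 1.04 / Math.sqrt(registers);
}

// One-standard-error band around an estimated distinct count.
function errorBand(estimate: number, precision: number): [number, number] {
  const delta = estimate * hllStandardError(precision);
  return [estimate - delta, estimate + delta];
}

const errP12 = hllStandardError(12); // 4096 registers: 1.04 / 64 = 0.01625
const errP14 = hllStandardError(14); // 16384 registers: 1.04 / 128 = 0.008125
const band = errorBand(1_000_000, 12); // roughly 983,750 to 1,016,250
```

Each two additional bits of precision halve the expected error at the cost of four times as many registers.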
Network requests to Cube Cloud complete within default fetch timeout and cloud endpoints remain stable during deployment operations
If this fails: Long-running deployment operations timeout silently, leaving deployments in undefined state with no retry mechanism
packages/cubejs-backend-cloud/src/cloud.ts:request
Cube schema definitions loaded from separate modules maintain consistent data types and naming conventions across cube_definition, dimension_definition, and measure_definition
If this fails: SQL planning generates invalid queries when schema modules use incompatible types or column references, causing runtime query failures
rust/cubesqlplanner/cubesqlplanner/src/cube_bridge/mod.rs
Filesystem has write permissions to create POM files in working directory and sufficient disk space for Maven dependency downloads
If this fails: Maven resolution fails silently when POM file creation fails, so JDBC drivers never download and database connections fail with a cryptic ClassNotFoundException
packages/cubejs-backend-maven/src/maven.ts:generateXml
CUBE_CLOUD_HOST environment variable contains valid URL with proper protocol, or defaults to https://cubecloud.dev which remains accessible
If this fails: Authentication requests fail with connection errors if environment points to invalid host or default domain becomes unreachable
packages/cubejs-backend-cloud/src/cloud.ts:getDeploymentToken
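A small guard can turn a bad host value into an immediate, readable error instead of a later connection failure. This is a sketch under assumptions: `resolveCloudHost` and `Env` are hypothetical, the environment is passed in as a plain object, and the regex check is deliberately simpler than full URL parsing.

```typescript
// Default named in the text above.
const DEFAULT_CLOUD_HOST = "https://cubecloud.dev";

type Env = Record<string, string | undefined>;

// Resolve the host from the environment, rejecting values without an http(s) scheme.
function resolveCloudHost(env: Env): string {
  const raw = env["CUBE_CLOUD_HOST"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_CLOUD_HOST;
  }
  const host = raw.trim().replace(/\/+$/, ""); // drop trailing slashes
  if (!/^https?:\/\/[^\s/]+/i.test(host)) {
    throw new Error(`CUBE_CLOUD_HOST must be an http(s) URL, got "${raw}"`);
  }
  return host;
}

const fromDefault = resolveCloudHost({});
const fromEnv = resolveCloudHost({ CUBE_CLOUD_HOST: "https://example.test/" });
```

The check covers only the shape of the value; whether the host is reachable still has to be handled where the request is made.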
Maven version checking logic assumes semantic versioning format 'X.Y.Z' in Maven output and version numbers fit in 32-bit integers
If this fails: Version parsing fails for non-standard Maven builds or future versions with different format, causing incorrect version comparisons
packages/cubejs-backend-maven/src/maven.ts:MINIMAL_VERSION
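A defensive version of this check parses the version numerically and reports "no version found" separately from "version too old". The sketch below is not the logic in maven.ts; `parseVersion`, `compareVersions`, `meetsMinimum`, and the minimum of 3.6.0 used in the usage lines are assumptions made for the example.

```typescript
type Version = [number, number, number];

// Extract the first X.Y.Z triple from `mvn --version` style output; null if none is found.
function parseVersion(output: string): Version | null {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(output);
  if (match === null) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Compare component by component as numbers, so 3.10.0 sorts after 3.9.0.
function compareVersions(a: Version, b: Version): number {
  const diffs = [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
  for (const d of diffs) {
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  return 0;
}

// True only when a version was found and it is at least the minimum.
function meetsMinimum(output: string, minimum: Version): boolean {
  const found = parseVersion(output);
  return found !== null && compareVersions(found, minimum) >= 0;
}

const sampleOutput = "Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)";
const adequate = meetsMinimum(sampleOutput, [3, 6, 0]); // hypothetical minimum
const unparseable = parseVersion("Apache Maven 4.0-beta"); // no X.Y.Z triple
```

Comparing numbers rather than strings avoids the classic mistake of ranking "3.10.0" below "3.9.0".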
Column positions in IndexMap maintain stable ordering that matches the sequence of columns in the database result set
If this fails: Column data gets mapped to wrong field names when database driver returns columns in different order than expected, corrupting query results
rust/cubeorchestrator/src/query_message_parser.rs:columns_pos
HTTP request bodies contain properly shaped JSON that matches expected interface structure for SQL queries, GraphQL queries, or REST API calls
If this fails: Request parsing fails unpredictably when client sends malformed requests, causing unhelpful error messages instead of specific validation feedback
packages/cubejs-api-gateway/src/index.ts:TransformDataRequest
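Validating the body field by field is one way to return specific feedback rather than a generic parse failure. The example assumes a simplified, hypothetical `QueryBody` with only `measures` and `dimensions`; the real request interfaces have more fields, and `validateQueryBody` is not a function from the gateway.

```typescript
// Hypothetical, simplified query body used only for this sketch.
interface QueryBody {
  measures: string[];
  dimensions: string[];
}

type Validation = { ok: true; value: QueryBody } | { ok: false; errors: string[] };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Check the shape field by field so every problem is reported at once.
function validateQueryBody(body: unknown): Validation {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];
  const measures = record["measures"] ?? []; // missing fields default to empty lists
  const dimensions = record["dimensions"] ?? [];
  if (!isStringArray(measures)) errors.push("measures must be an array of strings");
  if (!isStringArray(dimensions)) errors.push("dimensions must be an array of strings");
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { measures: measures as string[], dimensions: dimensions as string[] },
  };
}

const accepted = validateQueryBody(JSON.parse('{"measures":["orders.count"]}'));
const rejected = validateQueryBody({ measures: "orders.count", dimensions: [1] });
```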
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- CubeStore: Columnar storage engine that accumulates pre-aggregated query results and intermediate computations for fast analytical queries
- Schema Registry: In-memory store of compiled cube definitions, measures, dimensions, and joins loaded from user schema files
- Database connection pools maintained by each driver for efficient query execution against data sources
Feedback Loops
- Pre-aggregation Build Loop (scheduled-job, reinforcing) — Trigger: Query patterns and refresh schedules. Action: Query orchestrator identifies frequently-used query patterns and builds optimized pre-aggregations in cubestore. Exit: Pre-aggregation build completes or fails.
- Query Cache Loop (cache-invalidation, balancing) — Trigger: Cache TTL expiration or data source changes. Action: Cached query results are invalidated and fresh queries trigger cache rebuilds. Exit: New results cached with updated TTL.
Delays
- Schema Compilation (compilation, ~100ms-1s) — First query after schema changes requires compilation before execution
- Pre-aggregation Build (batch-window, ~minutes to hours) — Large pre-aggregations build asynchronously while serving stale data or falling back to source queries
- JDBC Driver Download (warmup, ~10-30s) — First connection to new database requires Maven dependency resolution and JAR download
Control Points
- Database Connection String (env-var) — Controls: Which data source each query executes against
- Pre-aggregation Strategy (runtime-toggle) — Controls: Whether to use pre-aggregations, original queries, or hybrid approach
- Cache TTL (threshold) — Controls: How long query results remain valid in cache before refresh
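The Cache TTL control point and the Query Cache Loop can be illustrated with a minimal expiring cache. This is a generic sketch, not Cube's cache: `TtlCache` is a hypothetical class, and the clock is injected so expiry can be exercised without waiting.

```typescript
// Minimal TTL cache with an injectable clock.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined for missing or expired entries; expired entries are evicted.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Usage with a fake clock: the entry is valid at t=999 and expired at t=1000.
let fakeTime = 0;
const cache = new TtlCache<number>(1000, () => fakeTime);
cache.set("orders.count", 42);
fakeTime = 999;
const beforeExpiry = cache.get("orders.count");
fakeTime = 1000;
const afterExpiry = cache.get("orders.count");
```

A shorter TTL trades more source queries for fresher results, which is the balancing behavior the Query Cache Loop describes.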
Technology Stack
- TypeScript: Primary language for API gateway, query orchestration, and schema compilation logic
- Rust: High-performance SQL engine, query planner, and columnar storage implementation
- FlatBuffers: Zero-copy serialization for communication between TypeScript and Rust components
- Node.js: Runtime environment for the main server orchestration and API handling
- Lerna: Monorepo management for coordinating builds and dependencies across 60+ packages
- Jest: Test framework for validation across all TypeScript packages and integration testing
Key Components
- CubeCloudClient (gateway) — Handles authentication and API communication with Cube Cloud platform for deployment management and live preview features
packages/cubejs-backend-cloud/src/cloud.ts
- QueryMessageParser (parser) — Parses FlatBuffer-encoded HTTP messages from the API gateway and converts them into structured QueryResult objects for processing
rust/cubeorchestrator/src/query_message_parser.rs
- CubeBridge (adapter) — Bridges between SQL query planning and Cube's semantic model, providing access to cube definitions, joins, and measure calculations
rust/cubesqlplanner/cubesqlplanner/src/cube_bridge/mod.rs
- HLLDataSketch (processor) — Implements HyperLogLog algorithms for approximate distinct count calculations in pre-aggregations, optimizing memory usage for large datasets
rust/cubestore/cubehll/src/lib.rs
- ApiGateway (gateway) — Exposes REST, GraphQL, and WebSocket endpoints for client applications, handling request parsing, authentication, and response formatting
packages/cubejs-api-gateway/src/index.ts
- MavenResolver (resolver) — Downloads and manages JDBC driver dependencies by generating Maven POM files and executing dependency resolution
packages/cubejs-backend-maven/src/maven.ts
Package Structure
SQL query engine that parses and executes SQL queries against Cube's semantic layer, translating them to backend database queries.
Query planner that optimizes SQL queries by analyzing Cube schemas and generating efficient execution plans.
High-performance columnar storage engine optimized for OLAP workloads and pre-aggregation caching.
Core server runtime that orchestrates query execution, schema compilation, and caching across all data sources.
HTTP gateway that exposes REST, GraphQL, and WebSocket APIs for client applications to query the semantic layer.
Manages query execution lifecycle including caching, pre-aggregation, and coordination between multiple database drivers.
Compiles Cube schema definitions (measures, dimensions, joins) into executable SQL queries for each target database.
Collection of 25+ database drivers that adapt Cube's query interface to specific database backends (BigQuery, Snowflake, Postgres, etc.).
Frontend SDKs for React, Vue, Angular that provide components and hooks for building analytics dashboards.
Cloud deployment and management APIs for Cube Cloud platform integration.
Frequently Asked Questions
What is cube used for?
cube provides a SQL interface and semantic layer for analytics by translating queries across data sources. cube-js/cube is a 6-component repository written in TypeScript and Rust. Data flows through 6 distinct pipeline stages. The codebase contains 2303 files.
How is cube architected?
cube is organized into 6 architecture layers: SQL Interface, Query Orchestration, Schema Layer, Data Access, and 2 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through cube?
Data moves through 6 stages: Receive client query → Parse and validate query → Check pre-aggregation cache → Compile schema to SQL → Execute database query → .... Client applications send SQL queries or REST requests to the API gateway, which parses them into internal messages. The query orchestrator determines if results can be served from cache or need fresh computation. For fresh queries, the schema compiler generates database-specific SQL from Cube definitions, database drivers execute queries against data sources, and results flow back through caching layers to clients as JSON or SQL result sets. This pipeline design reflects a complex multi-stage processing system.
What technologies does cube use?
The core stack includes TypeScript (Primary language for API gateway, query orchestration, and schema compilation logic), Rust (High-performance SQL engine, query planner, and columnar storage implementation), FlatBuffers (Zero-copy serialization for communication between TypeScript and Rust components), Node.js (Runtime environment for the main server orchestration and API handling), Lerna (Monorepo management for coordinating builds and dependencies across 60+ packages), Jest (Test framework for validation across all TypeScript packages and integration testing). A focused set of dependencies that keeps the build manageable.
What system dynamics does cube have?
cube exhibits 3 data pools (including CubeStore and Schema Registry), 2 feedback loops, 3 control points, and 3 delays. The feedback loops cover scheduled pre-aggregation builds and cache invalidation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does cube use?
4 design patterns detected: Multi-Language Bridge, Driver Abstraction, Semantic Schema Compilation, Layered Caching.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.