cube-js/cube

📊 Cube Core is an open-source semantic layer for AI, BI and embedded analytics

19,817 stars · Rust · 6 components

Provides a SQL interface and semantic layer for analytics by translating queries across data sources

Client applications send SQL queries or REST requests to the API gateway, which parses them into internal messages. The query orchestrator determines if results can be served from cache or need fresh computation. For fresh queries, the schema compiler generates database-specific SQL from Cube definitions, database drivers execute queries against data sources, and results flow back through caching layers to clients as JSON or SQL result sets.

Under the hood, the system uses 2 feedback loops, 3 data pools, and 3 control points to manage its runtime behavior.

A 6-component repository. 2303 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

Receive client query — API gateway receives HTTP requests with SQL queries, GraphQL queries, or REST API calls, validates authentication, and converts them to internal HttpMessage format using FlatBuffer serialization
Parse and validate query — QueryMessageParser deserializes the HttpMessage, extracts the SQL query or API parameters, and validates them against available cube schemas and user permissions [HttpMessage → QueryResult]
Check pre-aggregation cache — Query orchestrator checks if the requested data exists in cubestore pre-aggregations or Redis cache, returning cached results immediately if available and fresh [QueryResult → QueryResult]
Compile schema to SQL — Schema compiler takes the CubeDefinition for requested measures and dimensions, applies joins and transformations, and generates optimized SQL queries specific to the target database dialect [CubeDefinition → SQL Query String]
Execute database query — Appropriate database driver (BigQuery, Snowflake, Postgres, etc.) executes the generated SQL against the data source, handling connection pooling and result streaming [SQL Query String → QueryResult]
Cache and return results — Results are stored in cubestore for future queries, formatted according to the original request type (JSON for REST, tabular for SQL), and returned to the client application [QueryResult → API Response]
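
The cache check, compile, execute, and store steps above can be sketched as one function. This is a minimal illustration under stated assumptions, not Cube's code: `runQuery`, `compileSql`, `executeSql`, and the in-memory `Map` are hypothetical stand-ins for the orchestrator, schema compiler, database driver, and cache.

```typescript
// Hypothetical stand-ins for the pipeline stages; none of these names come from Cube.
type Row = Record<string, string | number | null>;

interface Query {
  measures: string[];
  dimensions: string[];
}

interface Stages {
  compileSql: (query: Query) => string; // stage 4: schema compiler
  executeSql: (sql: string) => Row[];   // stage 5: database driver
}

// An in-memory Map stands in for the pre-aggregation / result cache.
const resultCache = new Map<string, Row[]>();

function runQuery(query: Query, stages: Stages): { rows: Row[]; fromCache: boolean } {
  // Stage 3: look for a cached result before doing any work.
  const key = JSON.stringify(query);
  const cached = resultCache.get(key);
  if (cached !== undefined) {
    return { rows: cached, fromCache: true };
  }
  // Stages 4 and 5: compile to SQL, then execute against the data source.
  const sql = stages.compileSql(query);
  const rows = stages.executeSql(sql);
  // Stage 6: store the result for later requests and return it.
  resultCache.set(key, rows);
  return { rows, fromCache: false };
}
```

In this sketch a repeated query never reaches the driver; a real cache would also need the freshness check mentioned in stage 3.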

Data Models

The data structures that flow between stages — the contracts that hold the system together.

QueryResult rust/cubeorchestrator/src/query_message_parser.rs
struct with columns: Vec<String>, rows: Vec<Vec<DBResponseValue>>, columns_pos: IndexMap<String, usize> — tabular query results with column metadata and row data
Created by database drivers after query execution, transformed by orchestrator, and serialized for API responses
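
To make the role of `columns_pos` concrete, here is a TypeScript mirror of the fields listed above. It is a sketch only: `DbValue` is a simplified stand-in for `DBResponseValue`, and `getCell` is a hypothetical helper rather than a function from the repository.

```typescript
// Simplified stand-in for DBResponseValue; the real variants are not shown in this analysis.
type DbValue = string | number | boolean | null;

interface QueryResult {
  columns: string[];
  rows: DbValue[][];
  columnsPos: Map<string, number>; // column name -> index into each row
}

// Hypothetical helper: read one cell by row index and column name.
function getCell(result: QueryResult, row: number, column: string): DbValue | undefined {
  const pos = result.columnsPos.get(column);
  if (pos === undefined) return undefined; // unknown column name
  return result.rows[row]?.[pos];          // undefined when the row does not exist
}
```

Looking up the position once by name and then indexing into each row avoids scanning `columns` for every cell.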

CubeDefinition rust/cubesqlplanner/cubesqlplanner/src/cube_bridge/cube_definition.rs
Schema definition containing measures, dimensions, joins, and SQL templates that define how business metrics map to database tables
Loaded from user schema files, compiled by schema-compiler, and used by SQL planner to generate optimized queries

HttpMessage rust/cubeshared/src/codegen/http_message_generated.rs
FlatBuffer-serialized message containing HTTP commands and query data for inter-service communication
Created by API gateway from HTTP requests, passed to Rust services via FlatBuffer serialization, and processed by query engine

AuthObject packages/cubejs-backend-cloud/src/cloud.ts
object with auth: string, url?: string, deploymentId?: string — authentication context for cloud API calls
Created from environment variables or user config, passed to CubeCloudClient for API authentication

MavenDependency packages/cubejs-backend-maven/src/maven.ts
object with groupId: string, artifactId: string, version: string — Java dependency specification for JDBC drivers
Defined in driver configs, resolved by Maven to download JAR files for JDBC database connections
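
As an illustration of how such a dependency specification might be turned into POM content, the sketch below renders one `<dependency>` element. `renderDependency` and `escapeXml` are hypothetical names, and the repository's own XML generation may differ.

```typescript
interface MavenDependency {
  groupId: string;
  artifactId: string;
  version: string;
}

// Escape the characters that are significant in XML text content.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: render one dependency as a POM <dependency> element.
function renderDependency(dep: MavenDependency): string {
  return [
    "<dependency>",
    `  <groupId>${escapeXml(dep.groupId)}</groupId>`,
    `  <artifactId>${escapeXml(dep.artifactId)}</artifactId>`,
    `  <version>${escapeXml(dep.version)}</version>`,
    "</dependency>",
  ].join("\n");
}
```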

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Environment weakly guarded

Maven executable is in system PATH and responds to 'mvn --version' command

If this fails: The system tries to spawn an 'mvn' process that doesn't exist, returns null with no error indication, and falls back to downloading Maven even when a system installation might be adequate

packages/cubejs-backend-maven/src/maven.ts:getSystemMavenVersion

critical Shape unguarded

All rows in QueryResult have exactly the same number of columns as specified in columns vector, with column ordering matching columns_pos IndexMap

If this fails: Accessing rows[i][j] could panic with index out of bounds, or return wrong data if column count mismatches between rows

rust/cubeorchestrator/src/query_message_parser.rs:QueryResult
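
One way to guard this assumption is an explicit shape check before rows are indexed. The function below is a hypothetical sketch written in TypeScript for illustration; it is not part of the Rust crate, and `findShapeError` is an invented name.

```typescript
// Hypothetical check: returns a description of the first shape problem found,
// or null when columns, rows, and columnsPos agree with each other.
function findShapeError(
  columns: string[],
  rows: unknown[][],
  columnsPos: Map<string, number>,
): string | null {
  if (columnsPos.size !== columns.length) {
    return `columnsPos has ${columnsPos.size} entries, expected ${columns.length}`;
  }
  for (let i = 0; i < columns.length; i++) {
    if (columnsPos.get(columns[i]) !== i) {
      return `column "${columns[i]}" is not at position ${i} in columnsPos`;
    }
  }
  for (let r = 0; r < rows.length; r++) {
    if (rows[r].length !== columns.length) {
      return `row ${r} has ${rows[r].length} values, expected ${columns.length}`;
    }
  }
  return null;
}
```

Running such a check once per result turns a later out-of-bounds access or silently misaligned column into an immediate, descriptive error.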

critical Domain unguarded

Input data for HyperLogLog sketches fits within the precision parameters configured at sketch creation time

If this fails: Hash collisions increase dramatically if data cardinality exceeds HLL precision bounds, causing severe underestimation of distinct counts in pre-aggregations

rust/cubestore/cubehll/src/lib.rs:HllSketch
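
For the classic HyperLogLog estimator, the relative standard error is roughly 1.04 / sqrt(m), where m = 2^p is the number of registers for precision p. The sketch below computes that figure; whether cubehll uses exactly this estimator and these bounds is an assumption, and both function names are hypothetical.

```typescript
// Relative standard error of the classic HyperLogLog estimator
// with m = 2^precision registers: about 1.04 / sqrt(m).
function hllRelativeError(precision: number): number {
  const registers = Math.pow(2, precision);
  return 1.04 / Math.sqrt(registers);
}

// Smallest precision in [minPrecision, maxPrecision] whose expected error is
// at or below the target, or null if none qualifies. The default bounds are
// illustrative, not taken from cubehll.
function precisionForError(
  targetError: number,
  minPrecision = 4,
  maxPrecision = 18,
): number | null {
  for (let p = minPrecision; p <= maxPrecision; p++) {
    if (hllRelativeError(p) <= targetError) return p;
  }
  return null;
}
```

Under this formula, precision 12 (4096 registers) gives an expected error of about 1.6%, and each extra bit of precision doubles the register count while shrinking the error by a factor of sqrt(2).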

critical Temporal unguarded

Network requests to Cube Cloud complete within default fetch timeout and cloud endpoints remain stable during deployment operations

If this fails: Long-running deployment operations time out silently, leaving deployments in an undefined state with no retry mechanism

packages/cubejs-backend-cloud/src/cloud.ts:request
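
A common mitigation is to give every remote call an explicit deadline so that a timeout surfaces as a named error. The wrapper below is a generic sketch, not the repository's `request` function; `withTimeout` and its `label` parameter are hypothetical.

```typescript
// Hypothetical wrapper: rejects with a descriptive error if the wrapped
// promise has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Note that this only stops waiting; it does not cancel the underlying request. Cancelling an in-flight `fetch` would additionally require passing an `AbortController` signal to it.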

critical Contract unguarded

Cube schema definitions loaded from separate modules maintain consistent data types and naming conventions across cube_definition, dimension_definition, and measure_definition

If this fails: SQL planning generates invalid queries when schema modules use incompatible types or column references, causing runtime query failures

rust/cubesqlplanner/cubesqlplanner/src/cube_bridge/mod.rs

warning Resource unguarded

Filesystem has write permissions to create POM files in working directory and sufficient disk space for Maven dependency downloads

If this fails: Maven resolution fails silently when POM file creation fails, so JDBC drivers never download and database connections fail with a cryptic ClassNotFoundException

packages/cubejs-backend-maven/src/maven.ts:generateXml

warning Environment weakly guarded

CUBE_CLOUD_HOST environment variable contains valid URL with proper protocol, or defaults to https://cubecloud.dev which remains accessible

If this fails: Authentication requests fail with connection errors if environment points to invalid host or default domain becomes unreachable

packages/cubejs-backend-cloud/src/cloud.ts:getDeploymentToken
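
Validating the host once, up front, would turn a late connection error into an immediate configuration error. The sketch below assumes the default named above; `resolveCloudHost` is a hypothetical helper and not how the package currently reads the variable.

```typescript
// Default taken from the assumption described above.
const DEFAULT_CLOUD_HOST = "https://cubecloud.dev";

// Hypothetical helper: resolve the Cube Cloud host and fail early, with a
// clear message, when the configured value is not an http(s) URL.
function resolveCloudHost(envValue: string | undefined): string {
  const trimmed = envValue === undefined ? "" : envValue.trim();
  const candidate = trimmed === "" ? DEFAULT_CLOUD_HOST : trimmed;
  let parsed: URL;
  try {
    parsed = new URL(candidate);
  } catch {
    throw new Error(`CUBE_CLOUD_HOST is not a valid URL: ${candidate}`);
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`CUBE_CLOUD_HOST must use http or https, got ${parsed.protocol}`);
  }
  return parsed.origin;
}
```

A caller would typically pass `process.env.CUBE_CLOUD_HOST`; a value such as `localhost:3000` is rejected because the URL parser reads `localhost:` as the protocol.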

warning Scale weakly guarded

Maven version checking logic assumes semantic versioning format 'X.Y.Z' in Maven output and version numbers fit in 32-bit integers

If this fails: Version parsing fails for non-standard Maven builds or future versions with different format, causing incorrect version comparisons

packages/cubejs-backend-maven/src/maven.ts:MINIMAL_VERSION
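
A version check that reports "could not parse" separately from "too old" makes this assumption visible. The following is a sketch, not the code in maven.ts; `parseVersion` and `isAtLeast` are hypothetical names, and the version numbers in the usage note are examples only.

```typescript
type Version = [number, number, number];

// Extract the first X.Y.Z triple from text such as `mvn --version` output.
// Returns null instead of guessing when no such triple is present.
function parseVersion(text: string): Version | null {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (match === null) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `actual` is greater than or equal to `minimal`,
// comparing major, then minor, then patch as numbers.
function isAtLeast(actual: Version, minimal: Version): boolean {
  for (let i = 0; i < 3; i++) {
    if (actual[i] !== minimal[i]) return actual[i] > minimal[i];
  }
  return true;
}
```

Comparing fields as numbers matters: as strings, "3.10.0" would sort below "3.9.6". Text without an X.Y.Z triple, such as "4.0-beta", yields null so the caller can decide what to do instead of comparing garbage.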

warning Ordering unguarded

Column positions in IndexMap maintain stable ordering that matches the sequence of columns in the database result set

If this fails: Column data gets mapped to wrong field names when database driver returns columns in different order than expected, corrupting query results

rust/cubeorchestrator/src/query_message_parser.rs:columns_pos

warning Shape unguarded

HTTP request bodies contain properly shaped JSON that matches expected interface structure for SQL queries, GraphQL queries, or REST API calls

If this fails: Request parsing fails unpredictably when a client sends a malformed request, producing unhelpful error messages instead of specific validation feedback

packages/cubejs-api-gateway/src/index.ts:TransformDataRequest
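
Specific validation feedback can come from checking the parsed body against the expected shape before it is used. The request shape below is invented for illustration, since the gateway's real interfaces are richer; `SqlRequest` and `validateSqlRequest` are hypothetical.

```typescript
// Hypothetical request shape for illustration only.
interface SqlRequest {
  query: string;
}

type ValidationResult =
  | { ok: true; value: SqlRequest }
  | { ok: false; errors: string[] };

// Validate an untrusted, already-parsed JSON body and name the exact problem
// instead of letting a malformed request fail deeper in the stack.
function validateSqlRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const query = (body as Record<string, unknown>).query;
  if (typeof query !== "string") {
    return { ok: false, errors: ["query must be a string"] };
  }
  if (query.trim() === "") {
    return { ok: false, errors: ["query must not be empty"] };
  }
  return { ok: true, value: { query } };
}
```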

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

CubeStore (cache)
Columnar storage engine that accumulates pre-aggregated query results and intermediate computations for fast analytical queries

Schema Registry (registry)
In-memory store of compiled cube definitions, measures, dimensions, and joins loaded from user schema files

Connection Pool (buffer)
Database connection pools maintained by each driver for efficient query execution against data sources

Feedback Loops

Pre-aggregation Build Loop (scheduled-job, reinforcing) — Trigger: Query patterns and refresh schedules. Action: Query orchestrator identifies frequently-used query patterns and builds optimized pre-aggregations in cubestore. Exit: Pre-aggregation build completes or fails.
Query Cache Loop (cache-invalidation, balancing) — Trigger: Cache TTL expiration or data source changes. Action: Cached query results are invalidated and fresh queries trigger cache rebuilds. Exit: New results cached with updated TTL.
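
The Query Cache Loop can be illustrated with a small TTL cache: an expired entry is invalidated on read, and the next request rebuilds it with a fresh TTL. `TtlCache` is a hypothetical class, not Cube's cache implementation, and the clock is injected so expiry can be shown without waiting.

```typescript
// Hypothetical TTL cache illustrating invalidate-on-expiry and rebuild-on-miss.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now,
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // TTL expired: invalidate the entry
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Serve from cache while fresh; otherwise recompute and cache with a new TTL.
  getOrLoad(key: string, load: () => V): V {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = load();
    this.set(key, value);
    return value;
  }
}
```

The loop is balancing in the sense described above: staleness grows until the TTL expires, at which point a fresh query resets it.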

Delays

Schema Compilation (compilation, ~100ms-1s) — First query after schema changes requires compilation before execution
Pre-aggregation Build (batch-window, ~minutes to hours) — Large pre-aggregations build asynchronously while serving stale data or falling back to source queries
JDBC Driver Download (warmup, ~10-30s) — First connection to new database requires Maven dependency resolution and JAR download

Control Points

Database Connection String (env-var) — Controls: Which data source each query executes against
Pre-aggregation Strategy (runtime-toggle) — Controls: Whether to use pre-aggregations, original queries, or hybrid approach
Cache TTL (threshold) — Controls: How long query results remain valid in cache before refresh

Technology Stack

TypeScript (runtime)
Primary language for API gateway, query orchestration, and schema compilation logic

Rust (runtime)
High-performance SQL engine, query planner, and columnar storage implementation

FlatBuffers (serialization)
Zero-copy serialization for communication between TypeScript and Rust components

Node.js (runtime)
Runtime environment for the main server orchestration and API handling

Lerna (build)
Monorepo management for coordinating builds and dependencies across 60+ packages

Jest (testing)
Test framework for validation across all TypeScript packages and integration testing

Key Components

CubeCloudClient (gateway) — Handles authentication and API communication with Cube Cloud platform for deployment management and live preview features packages/cubejs-backend-cloud/src/cloud.ts
QueryMessageParser (parser) — Parses FlatBuffer-encoded HTTP messages from the API gateway and converts them into structured QueryResult objects for processing rust/cubeorchestrator/src/query_message_parser.rs
CubeBridge (adapter) — Bridges between SQL query planning and Cube's semantic model, providing access to cube definitions, joins, and measure calculations rust/cubesqlplanner/cubesqlplanner/src/cube_bridge/mod.rs
HLLDataSketch (processor) — Implements HyperLogLog algorithms for approximate distinct count calculations in pre-aggregations, optimizing memory usage for large datasets rust/cubestore/cubehll/src/lib.rs
ApiGateway (gateway) — Exposes REST, GraphQL, and WebSocket endpoints for client applications, handling request parsing, authentication, and response formatting packages/cubejs-api-gateway/src/index.ts
MavenResolver (resolver) — Downloads and manages JDBC driver dependencies by generating Maven POM files and executing dependency resolution packages/cubejs-backend-maven/src/maven.ts

Package Structure

cubesql (app)
SQL query engine that parses and executes SQL queries against Cube's semantic layer, translating them to backend database queries.

cubesqlplanner (library)
Query planner that optimizes SQL queries by analyzing Cube schemas and generating efficient execution plans.

cubestore (app)
High-performance columnar storage engine optimized for OLAP workloads and pre-aggregation caching.

server-core (app)
Core server runtime that orchestrates query execution, schema compilation, and caching across all data sources.

api-gateway (library)
HTTP gateway that exposes REST, GraphQL, and WebSocket APIs for client applications to query the semantic layer.

query-orchestrator (library)
Manages query execution lifecycle including caching, pre-aggregation, and coordination between multiple database drivers.

schema-compiler (library)
Compiles Cube schema definitions (measures, dimensions, joins) into executable SQL queries for each target database.

database-drivers (library)
Collection of 25+ database drivers that adapt Cube's query interface to specific database backends (BigQuery, Snowflake, Postgres, etc.).

client-libraries (library)
Frontend SDKs for React, Vue, Angular that provide components and hooks for building analytics dashboards.

cloud (library)
Cloud deployment and management APIs for Cube Cloud platform integration.

Frequently Asked Questions

What is cube used for?

cube provides a SQL interface and semantic layer for analytics by translating queries across data sources. cube-js/cube is a 6-component repository written in Rust. Data flows through 6 distinct pipeline stages. The codebase contains 2303 files.

How is cube architected?

cube is organized into 6 architecture layers: SQL Interface, Query Orchestration, Schema Layer, Data Access, and 2 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through cube?

Data moves through 6 stages: Receive client query → Parse and validate query → Check pre-aggregation cache → Compile schema to SQL → Execute database query → .... Client applications send SQL queries or REST requests to the API gateway, which parses them into internal messages. The query orchestrator determines if results can be served from cache or need fresh computation. For fresh queries, the schema compiler generates database-specific SQL from Cube definitions, database drivers execute queries against data sources, and results flow back through caching layers to clients as JSON or SQL result sets. This pipeline design reflects a complex multi-stage processing system.

What technologies does cube use?

The core stack includes TypeScript (primary language for API gateway, query orchestration, and schema compilation logic), Rust (high-performance SQL engine, query planner, and columnar storage implementation), FlatBuffers (zero-copy serialization for communication between TypeScript and Rust components), Node.js (runtime environment for the main server orchestration and API handling), Lerna (monorepo management for coordinating builds and dependencies across 60+ packages), and Jest (test framework for validation across all TypeScript packages and integration testing).

What system dynamics does cube have?

cube exhibits 3 data pools (CubeStore, Schema Registry, Connection Pool), 2 feedback loops, 3 control points, and 3 delays. The feedback loops cover scheduled pre-aggregation builds and cache invalidation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does cube use?

4 design patterns detected: Multi-Language Bridge, Driver Abstraction, Semantic Schema Compilation, Layered Caching.

Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.