ray-project/ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

41,901 stars · Python · 10 components · 4 connections

Distributed AI compute engine with runtime and ML libraries

Ray processes distributed workloads through task graphs and actor systems, with data flowing from user code through the scheduler to workers and back via the object store.

Under the hood, the system uses 3 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

Structural Verdict

A 10-component ML training system with 4 connections. 7,035 files analyzed. Loosely coupled — components are relatively independent.

How Data Flows Through the System


  1. Submit Task/Actor — User submits remote functions or creates actors via Python API
  2. Schedule Work — GCS and Raylet schedule tasks to available worker nodes
  3. Execute on Workers — CoreWorker processes execute tasks and store results in ObjectStore
  4. Return Results — Object references returned to caller, data retrieved on-demand from ObjectStore

System Behavior

How the system actually operates at runtime — where data accumulates, which feedback loops run, where delays occur, and which control points govern behavior.

Data Pools

ObjectStore (in-memory)
Distributed shared memory storing Ray task results and large objects
GCS (database)
Global control store maintaining cluster metadata, job state, and actor registry
Dashboard State (cache)
Frontend caching layer for cluster metrics and status information

Package Structure

This monorepo contains 2 packages:

python (app)
Main Ray Python package containing the core runtime, dashboard, and all AI libraries (Serve, Train, Tune, RLlib, Data).
release (tooling)
Release testing infrastructure with benchmarks, integration tests, and example applications for validating Ray deployments.

Technology Stack

C++ (framework)
Core distributed runtime implementation
Python (framework)
Primary user API and library implementations
React (framework)
Dashboard web interface
gRPC (library)
Inter-service communication
Plasma (database)
Shared memory object store
PyTorch (library)
Deep learning integration
Bazel (build)
Build system
TypeScript (framework)
Dashboard frontend

Key Components

Sub-Modules

RLlib (independence: medium)
Reinforcement learning library with algorithms, environments, and training infrastructure
Ray Serve (independence: medium)
Model serving framework for ML inference with autoscaling and deployment management
Ray Data (independence: medium)
Distributed data processing engine for ML preprocessing and ETL workloads
Ray Tune (independence: medium)
Hyperparameter tuning library with search algorithms and schedulers

Configuration

semgrep.yml (yaml)

ray-images.json (json)

ci/ray_ci/doc/api.py (python-dataclass)

ci/raydepsets/workspace.py (python-dataclass)

Science Pipeline

  1. Data Ingestion — ray.data.read_* functions load from various sources (S3, files, databases) [Variable depending on source format → Ray Dataset with inferred schema] python/ray/data/datasource/
  2. Data Transformation — map_batches applies user functions with configurable batch sizes [(batch_size, *feature_dims) → Transformed batch maintaining batch dimension] python/ray/data/dataset.py
  3. Model Training — Distributed training across multiple workers with gradient aggregation [(batch_size, *input_dims) → Updated model parameters] python/ray/train/trainer.py
  4. Model Serving — Deploy trained models with autoscaling based on request load [HTTP request body or batch of inputs → Model predictions as HTTP response] python/ray/serve/deployment.py

Assumptions & Constraints



Frequently Asked Questions

What is ray used for?

Distributed AI compute engine with runtime and ML libraries. ray-project/ray is a 10-component ML training system written in Python. Loosely coupled — components are relatively independent. The codebase contains 7,035 files.

How is ray architected?

ray is organized into 4 architecture layers: Core Runtime, Python APIs, Dashboard, Testing Infrastructure. Loosely coupled — components are relatively independent. This layered structure keeps concerns separated and modules independent.

How does data flow through ray?

Data moves through 4 stages: Submit Task/Actor → Schedule Work → Execute on Workers → Return Results. Ray processes distributed workloads through task graphs and actor systems, with data flowing from user code through the scheduler to workers and back via the object store. This pipeline design keeps the data transformation process straightforward.

What technologies does ray use?

The core stack includes C++ (Core distributed runtime implementation), Python (Primary user API and library implementations), React (Dashboard web interface), gRPC (Inter-service communication), Plasma (Shared memory object store), PyTorch (Deep learning integration), and 2 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does ray have?

ray exhibits 3 data pools (ObjectStore, GCS, Dashboard State), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle auto-scaling and retry. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does ray use?

5 design patterns detected: Actor Model, Lazy Evaluation, Handle Pattern, Plugin Architecture, Event-Driven State Management.
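The first detected pattern, the Actor Model, can be illustrated with a deliberately tiny stdlib-only sketch (this is not Ray's C++ implementation, just the shape of the pattern): an actor owns private state and drains a mailbox of messages one at a time, so the state is never touched concurrently.

```python
import queue
import threading

class MiniActor:
    """A toy actor: private state plus a mailbox drained by a single
    thread, so state is never mutated concurrently -- the core idea
    behind Ray's actor model."""

    def __init__(self):
        self._count = 0
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # The one thread that may touch self._count: process messages FIFO.
        while True:
            msg, reply = self._mailbox.get()
            if msg == "incr":
                self._count += 1
                reply.put(self._count)
            elif msg == "stop":
                break

    def incr(self):
        # Like actor.method.remote() in Ray: enqueue a message and return
        # a future-like handle (here a Queue) for the eventual result.
        reply = queue.Queue()
        self._mailbox.put(("incr", reply))
        return reply

a = MiniActor()
handles = [a.incr() for _ in range(5)]
print([h.get() for h in handles])  # [1, 2, 3, 4, 5]
```

The caller gets a handle back immediately and blocks only when it asks for the value — the same submit-then-resolve shape as Ray's `ObjectRef`s, which is also why the tool flags the Handle Pattern alongside the Actor Model.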

Analyzed on March 31, 2026 by CodeSea.