flyteorg/flyte

Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows. Flyte 2 now available locally: https://github.com/flyteorg/flyte-sdk

6,911 stars · Go · 10 components · 12 connections

Kubernetes-native workflow orchestration platform for ML/data pipelines

Workflows are registered to FlyteAdmin, executed by FlytePropeller, with data tracked by DataCatalog and scheduled by the Scheduler service.

Under the hood, the system uses 4 feedback loops, 4 data pools, and 5 control points to manage its runtime behavior.

Structural Verdict

A 10-component ML training repository with 12 connections, based on 2,495 files analyzed. Highly interconnected: components depend on each other heavily.

How Data Flows Through the System

  1. Workflow Registration — Users register workflow definitions via flytectl or SDK to FlyteAdmin (config: admin.endpoint)
  2. Execution Request — FlyteAdmin receives execution requests and creates workflow execution records (config: admin.insecure)
  3. Workflow Orchestration — FlytePropeller picks up executions and manages Kubernetes resources (config: propeller.create-flyteworkflow-crd)
  4. Task Execution — Individual tasks run in Kubernetes pods using appropriate plugins
  5. Data Tracking — DataCatalog tracks input/output artifacts and provides caching (config: catalog-cache.endpoint, catalog-cache.type)
  6. Schedule Management — Scheduler service manages cron-based workflow executions
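
The six stages above can be sketched as an ordered list in Go. This is only an illustrative model: the `Stage` type and the `Stages` and `ConfigKeysFor` functions are hypothetical names, not Flyte types, and the config keys are the ones quoted in the list.

```go
package main

import "fmt"

// Stage is a hypothetical type for illustration; it is not part of Flyte.
type Stage struct {
	Name       string
	Owner      string   // component the list associates with the stage
	ConfigKeys []string // config keys quoted in the list, if any
}

// Stages returns the six stages in the order the list gives them.
func Stages() []Stage {
	return []Stage{
		{"Workflow Registration", "FlyteAdmin", []string{"admin.endpoint"}},
		{"Execution Request", "FlyteAdmin", []string{"admin.insecure"}},
		{"Workflow Orchestration", "FlytePropeller", []string{"propeller.create-flyteworkflow-crd"}},
		{"Task Execution", "Kubernetes pods", nil},
		{"Data Tracking", "DataCatalog", []string{"catalog-cache.endpoint", "catalog-cache.type"}},
		{"Schedule Management", "Scheduler", nil},
	}
}

// ConfigKeysFor collects the config keys attached to stages owned by one component.
func ConfigKeysFor(owner string) []string {
	var keys []string
	for _, s := range Stages() {
		if s.Owner == owner {
			keys = append(keys, s.ConfigKeys...)
		}
	}
	return keys
}

func main() {
	for i, s := range Stages() {
		fmt.Printf("%d. %s (%s) %v\n", i+1, s.Name, s.Owner, s.ConfigKeys)
	}
}
```

Grouping keys by owner makes it easy to see which settings matter when only one service is being configured, for example `ConfigKeysFor("FlyteAdmin")`.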

System Behavior

How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

PostgreSQL Database (database)
Workflow metadata, execution state, user data, and scheduling information
Data Catalog Store (database)
Artifact metadata, lineage information, and cached data references
Kubernetes etcd (state-store)
Workflow execution state, custom resource definitions, and pod specifications
Storage Backends (file-store)
Raw data artifacts, logs, and intermediate workflow outputs
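
As a rough illustration, the four pools and their kinds can be written down as a small Go table. The `Pool` and `PoolKind` types and the `NamesByKind` helper are hypothetical names introduced here; only the pool names and kind labels come from the list above.

```go
package main

import "fmt"

// PoolKind mirrors the labels shown in parentheses in the list above.
type PoolKind string

const (
	Database   PoolKind = "database"
	StateStore PoolKind = "state-store"
	FileStore  PoolKind = "file-store"
)

// Pool is a hypothetical record type for illustration; it is not a Flyte type.
type Pool struct {
	Name string
	Kind PoolKind
}

// Pools lists the four data pools in the order given above.
func Pools() []Pool {
	return []Pool{
		{"PostgreSQL Database", Database},
		{"Data Catalog Store", Database},
		{"Kubernetes etcd", StateStore},
		{"Storage Backends", FileStore},
	}
}

// NamesByKind groups pool names under their kind, preserving list order within each kind.
func NamesByKind() map[PoolKind][]string {
	out := make(map[PoolKind][]string)
	for _, p := range Pools() {
		out[p.Kind] = append(out[p.Kind], p.Name)
	}
	return out
}

func main() {
	byKind := NamesByKind()
	for _, kind := range []PoolKind{Database, StateStore, FileStore} {
		fmt.Println(kind, byKind[kind])
	}
}
```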

Feedback Loops

Delays & Async Processing

Control Points

Technology Stack

Go (framework)
Primary programming language for all services
Kubernetes (infra)
Container orchestration and workflow execution platform
gRPC (framework)
Inter-service communication protocol
PostgreSQL (database)
Primary database for metadata storage
GORM (library)
ORM for database operations
Protocol Buffers (framework)
API interface definitions and serialization
robfig/cron (library)
Cron scheduling library
Prometheus (infra)
Metrics collection and monitoring
OpenTelemetry (infra)
Distributed tracing and observability
Cobra (library)
CLI framework for flytectl

Key Components

Sub-Modules

FlyteAdmin (independence: medium)
API server and control plane for workflow management, user interface, and metadata storage
FlytePropeller (independence: medium)
Kubernetes controller that executes workflows by managing pods and orchestrating task execution
DataCatalog (independence: medium)
Service for data artifact tracking, caching, and lineage management
flytectl (independence: high)
Command-line interface for interacting with Flyte clusters and managing workflows
FlyteStdLib (independence: low)
Shared utility libraries for storage, configuration, logging, and common functionality
FlyteIDL (independence: low)
Protocol buffer definitions and generated code for API interfaces
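
FlytePropeller is described above as a Kubernetes controller. A minimal sketch of the general controller idea, repeatedly reconciling an object toward completion, might look like the following. It runs in memory with no Kubernetes client, and the `Phase`, `Execution`, `Reconcile`, and `RunToCompletion` names are assumptions made for illustration; Flyte's actual phases and reconcile logic are not shown in this analysis.

```go
package main

import "fmt"

// Phase is a hypothetical, simplified execution phase; Flyte's real phases may differ.
type Phase int

const (
	Queued Phase = iota
	Running
	Succeeded
)

// Execution is a stand-in for a workflow custom resource.
type Execution struct {
	ID        string
	Phase     Phase
	TasksLeft int // tasks that still have to finish
}

// Reconcile advances one execution by at most one step and
// reports whether anything changed.
func Reconcile(e *Execution) bool {
	switch e.Phase {
	case Queued:
		e.Phase = Running
		return true
	case Running:
		if e.TasksLeft > 0 {
			e.TasksLeft-- // pretend one task pod finished
		}
		if e.TasksLeft == 0 {
			e.Phase = Succeeded
		}
		return true
	default:
		return false // terminal phase: nothing left to do
	}
}

// RunToCompletion calls Reconcile until it reports no change and
// returns the number of passes that changed state.
func RunToCompletion(e *Execution) int {
	passes := 0
	for Reconcile(e) {
		passes++
	}
	return passes
}

func main() {
	e := &Execution{ID: "demo", TasksLeft: 2}
	passes := RunToCompletion(e)
	fmt.Println(passes, e.Phase == Succeeded)
}
```

Each pass does a small, idempotent amount of work, so a crashed or restarted controller can resume from the stored phase rather than from the beginning.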

Configuration

codecov.yml (yaml)

flyte-single-binary-local.yaml (yaml)

monodocs-environment.lock.yaml (yaml)

monodocs-environment.yaml (yaml)

Frequently Asked Questions

What is flyte used for?

Flyte is a Kubernetes-native workflow orchestration platform for ML and data pipelines. flyteorg/flyte is a 10-component ML training repository written in Go. Its components are highly interconnected and depend on each other heavily. The codebase contains 2,495 files.

How is flyte architected?

flyte is organized into 5 architecture layers: Control Plane Services, CLI & Client Tools, Plugin System, Shared Libraries, and 1 more. Highly interconnected — components depend on each other heavily. This layered structure enables tight integration between components.

How does data flow through flyte?

Data moves through 6 stages: Workflow Registration → Execution Request → Workflow Orchestration → Task Execution → Data Tracking → .... Workflows are registered to FlyteAdmin, executed by FlytePropeller, with data tracked by DataCatalog and scheduled by the Scheduler service. This pipeline design reflects a complex multi-stage processing system.

What technologies does flyte use?

The core stack includes Go (Primary programming language for all services), Kubernetes (Container orchestration and workflow execution platform), gRPC (Inter-service communication protocol), PostgreSQL (Primary database for metadata storage), GORM (ORM for database operations), Protocol Buffers (API interface definitions and serialization), and 4 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does flyte have?

flyte exhibits 4 data pools (PostgreSQL Database, Data Catalog Store, Kubernetes etcd, and Storage Backends), 4 feedback loops, 5 control points, and 4 delays. The feedback loops handle polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does flyte use?

6 design patterns detected: Microservice Architecture, Plugin System, Kubernetes Controller Pattern, gRPC with REST Gateway, Repository Pattern, and 1 more.
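
To make one of these concrete, here is a minimal sketch of the Repository Pattern in Go. It hides storage behind an interface and backs it with an in-memory map rather than PostgreSQL or GORM, so it runs on its own. The `ExecutionRepo` interface, `ExecutionRecord` type, and `NewMemoryRepo` constructor are hypothetical names, not taken from the Flyte codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when no record exists for an ID.
var ErrNotFound = errors.New("execution not found")

// ExecutionRecord is a hypothetical, pared-down metadata row.
type ExecutionRecord struct {
	ID       string
	Workflow string
}

// ExecutionRepo is the storage-agnostic interface callers depend on.
type ExecutionRepo interface {
	Create(rec ExecutionRecord) error
	Get(id string) (ExecutionRecord, error)
}

// memoryRepo is an in-memory implementation, standing in for a database-backed one.
type memoryRepo struct {
	rows map[string]ExecutionRecord
}

// NewMemoryRepo returns an empty in-memory repository.
func NewMemoryRepo() ExecutionRepo {
	return &memoryRepo{rows: make(map[string]ExecutionRecord)}
}

// Create stores a record and rejects duplicate IDs.
func (m *memoryRepo) Create(rec ExecutionRecord) error {
	if _, exists := m.rows[rec.ID]; exists {
		return fmt.Errorf("execution %q already exists", rec.ID)
	}
	m.rows[rec.ID] = rec
	return nil
}

// Get looks a record up by ID.
func (m *memoryRepo) Get(id string) (ExecutionRecord, error) {
	rec, ok := m.rows[id]
	if !ok {
		return ExecutionRecord{}, ErrNotFound
	}
	return rec, nil
}

func main() {
	repo := NewMemoryRepo()
	if err := repo.Create(ExecutionRecord{ID: "e1", Workflow: "train"}); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	rec, err := repo.Get("e1")
	fmt.Println(rec.Workflow, err)
}
```

Because callers see only the interface, a database-backed implementation could replace the map without changing the code that uses it.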

Analyzed on March 31, 2026 by CodeSea.