flyteorg/flyte

Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows. Flyte 2 now available locally: https://github.com/flyteorg/flyte-sdk

6,911 stars · Go · 10 components · 12 connections

Kubernetes-native workflow orchestration platform for ML/data pipelines

Workflows are registered to FlyteAdmin, executed by FlytePropeller, with data tracked by DataCatalog and scheduled by the Scheduler service.

Under the hood, the system uses 4 feedback loops, 4 data pools, and 5 control points to manage its runtime behavior.

A 10-component ML training system with 12 connections. 2,495 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

  1. Workflow Registration — Users register workflow definitions via flytectl or SDK to FlyteAdmin (config: admin.endpoint)
  2. Execution Request — FlyteAdmin receives execution requests and creates workflow execution records (config: admin.insecure)
  3. Workflow Orchestration — FlytePropeller picks up executions and manages Kubernetes resources (config: propeller.create-flyteworkflow-crd)
  4. Task Execution — Individual tasks run in Kubernetes pods using appropriate plugins
  5. Data Tracking — DataCatalog tracks input/output artifacts and provides caching (config: catalog-cache.endpoint, catalog-cache.type)
  6. Schedule Management — Scheduler service manages cron-based workflow executions
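The six stages above can be written down as a small lookup table. The following is a minimal sketch, not code from the Flyte repository: the `Stage` type and the `Stages` and `StagesUsing` functions are hypothetical names introduced for illustration, and only the stage names, component names, and config keys come from the list above.

```go
package main

import "fmt"

// Stage is a hypothetical record for one pipeline stage from the list above.
type Stage struct {
	Name       string
	Owner      string   // component the text associates with the stage
	ConfigKeys []string // config keys quoted in the text, if any
}

// Stages returns the six stages in the order the text gives them.
func Stages() []Stage {
	return []Stage{
		{"Workflow Registration", "FlyteAdmin", []string{"admin.endpoint"}},
		{"Execution Request", "FlyteAdmin", []string{"admin.insecure"}},
		{"Workflow Orchestration", "FlytePropeller", []string{"propeller.create-flyteworkflow-crd"}},
		{"Task Execution", "Kubernetes pods", nil},
		{"Data Tracking", "DataCatalog", []string{"catalog-cache.endpoint", "catalog-cache.type"}},
		{"Schedule Management", "Scheduler", nil},
	}
}

// StagesUsing returns the names of the stages that reference a config key.
func StagesUsing(key string) []string {
	var out []string
	for _, s := range Stages() {
		for _, k := range s.ConfigKeys {
			if k == key {
				out = append(out, s.Name)
			}
		}
	}
	return out
}

func main() {
	// Print each stage with its owning component and config keys.
	for i, s := range Stages() {
		fmt.Printf("%d. %s (%s) config=%v\n", i+1, s.Name, s.Owner, s.ConfigKeys)
	}
}
```

A table like this makes it easy to answer "which stage is affected if I change this key?" without rereading the prose.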

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

PostgreSQL Database (database): Workflow metadata, execution state, user data, and scheduling information
Data Catalog Store (database): Artifact metadata, lineage information, and cached data references
Kubernetes etcd (state-store): Workflow execution state, custom resource definitions, and pod specifications
Storage Backends (file-store): Raw data artifacts, logs, and intermediate workflow outputs

Feedback Loops

Delays

Control Points

Technology Stack

Go (language): Primary programming language for all services
Kubernetes (infra): Container orchestration and workflow execution platform
gRPC (framework): Inter-service communication protocol
PostgreSQL (database): Primary database for metadata storage
GORM (library): ORM for database operations
Protocol Buffers (framework): API interface definitions and serialization
robfig/cron (library): Cron scheduling library
Prometheus (infra): Metrics collection and monitoring
OpenTelemetry (infra): Distributed tracing and observability
Cobra (library): CLI framework for flytectl

Key Components

Frequently Asked Questions

What is flyte used for?

Flyte is a Kubernetes-native workflow orchestration platform for ML/data pipelines. flyteorg/flyte is a 10-component ML training system written in Go. Data flows through 6 distinct pipeline stages. The codebase contains 2,495 files.

How is flyte architected?

flyte is organized into 5 architecture layers: Control Plane Services, CLI & Client Tools, Plugin System, Shared Libraries, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure enables tight integration between components.

How does data flow through flyte?

Data moves through 6 stages: Workflow Registration → Execution Request → Workflow Orchestration → Task Execution → Data Tracking → .... Workflows are registered to FlyteAdmin, executed by FlytePropeller, with data tracked by DataCatalog and scheduled by the Scheduler service. This pipeline design reflects a complex multi-stage processing system.

What technologies does flyte use?

The core stack includes Go (Primary programming language for all services), Kubernetes (Container orchestration and workflow execution platform), gRPC (Inter-service communication protocol), PostgreSQL (Primary database for metadata storage), GORM (ORM for database operations), Protocol Buffers (API interface definitions and serialization), and 4 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does flyte have?

flyte exhibits 4 data pools (including PostgreSQL Database and Data Catalog Store), 4 feedback loops, 5 control points, and 4 delays. The feedback loops handle polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
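To make the polling behavior concrete, here is a minimal sketch of a bounded polling loop using only the Go standard library. It is not taken from the Flyte codebase: the `Phase` values, `PollUntilDone`, and `fakeExecution` are hypothetical names, and the fake check function stands in for whatever status source a real loop would query.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Phase is a hypothetical execution phase; the names are illustrative only.
type Phase string

const (
	Queued    Phase = "QUEUED"
	Running   Phase = "RUNNING"
	Succeeded Phase = "SUCCEEDED"
)

// ErrGaveUp is returned when the poll budget is exhausted.
var ErrGaveUp = errors.New("poll budget exhausted")

// PollUntilDone calls check once per interval until it reports Succeeded,
// returns an error, or maxPolls attempts have been made. It returns the
// last observed phase and the number of polls performed.
func PollUntilDone(check func() (Phase, error), interval time.Duration, maxPolls int) (Phase, int, error) {
	last := Queued
	for i := 1; i <= maxPolls; i++ {
		p, err := check()
		if err != nil {
			return last, i, err
		}
		last = p
		if p == Succeeded {
			return p, i, nil
		}
		time.Sleep(interval)
	}
	return last, maxPolls, ErrGaveUp
}

// fakeExecution returns a check function that advances one phase per call
// and then stays at Succeeded. It replaces a real status lookup.
func fakeExecution() func() (Phase, error) {
	phases := []Phase{Queued, Running, Succeeded}
	i := 0
	return func() (Phase, error) {
		p := phases[i]
		if i < len(phases)-1 {
			i++
		}
		return p, nil
	}
}

func main() {
	phase, polls, err := PollUntilDone(fakeExecution(), 10*time.Millisecond, 5)
	fmt.Println(phase, polls, err)
}
```

The poll interval and the attempt budget are the two knobs in this sketch: a shorter interval reacts faster but queries the status source more often, and the budget bounds how long a stuck execution is watched.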

What design patterns does flyte use?

6 design patterns detected: Microservice Architecture, Plugin System, Kubernetes Controller Pattern, gRPC with REST Gateway, Repository Pattern, and 1 more.
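As an illustration of the Repository Pattern named above, the sketch below hides storage behind an interface so that callers do not depend on a specific database. It is an assumption-laden example, not Flyte's implementation: `Execution`, `ExecutionRepo`, and `NewMemRepo` are hypothetical names, and an in-memory map is swapped in where the analysis describes GORM over PostgreSQL.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Execution is a hypothetical record; the fields are illustrative only.
type Execution struct {
	ID    string
	Phase string
}

// ErrNotFound is returned when no execution has the requested ID.
var ErrNotFound = errors.New("execution not found")

// ExecutionRepo is the storage-agnostic interface that callers depend on.
type ExecutionRepo interface {
	Create(e Execution) error
	Get(id string) (Execution, error)
	UpdatePhase(id, phase string) error
}

// memRepo is an in-memory stand-in for a database-backed repository.
type memRepo struct {
	mu   sync.Mutex
	rows map[string]Execution
}

// NewMemRepo returns an empty in-memory repository.
func NewMemRepo() ExecutionRepo {
	return &memRepo{rows: make(map[string]Execution)}
}

// Create stores a new execution and rejects duplicate IDs.
func (r *memRepo) Create(e Execution) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.rows[e.ID]; ok {
		return fmt.Errorf("execution %q already exists", e.ID)
	}
	r.rows[e.ID] = e
	return nil
}

// Get returns the execution with the given ID or ErrNotFound.
func (r *memRepo) Get(id string) (Execution, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.rows[id]
	if !ok {
		return Execution{}, ErrNotFound
	}
	return e, nil
}

// UpdatePhase overwrites the phase of an existing execution.
func (r *memRepo) UpdatePhase(id, phase string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.rows[id]
	if !ok {
		return ErrNotFound
	}
	e.Phase = phase
	r.rows[id] = e
	return nil
}

func main() {
	repo := NewMemRepo()
	_ = repo.Create(Execution{ID: "exec-1", Phase: "QUEUED"})
	_ = repo.UpdatePhase("exec-1", "RUNNING")
	e, err := repo.Get("exec-1")
	fmt.Println(e.Phase, err)
}
```

Because callers only see `ExecutionRepo`, a database-backed implementation could replace `memRepo` without changing them, which is the main point of the pattern.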

Analyzed on March 31, 2026 by CodeSea.