flyteorg/flyte

Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows. Flyte 2 now available locally: https://github.com/flyteorg/flyte-sdk

6,911 stars | Go | 10 components | 12 connections

Kubernetes-native workflow orchestration platform for ML/data pipelines

Workflows are registered to FlyteAdmin, executed by FlytePropeller, with data tracked by DataCatalog and scheduled by the Scheduler service.

Under the hood, the system manages its runtime behavior through 4 feedback loops, 4 data pools, 4 delays, and 5 control points.

A 10-component ML training system with 12 connections. 2,495 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

Workflow Registration — Users register workflow definitions via flytectl or SDK to FlyteAdmin (config: admin.endpoint)
Execution Request — FlyteAdmin receives execution requests and creates workflow execution records (config: admin.insecure)
Workflow Orchestration — FlytePropeller picks up executions and manages Kubernetes resources (config: propeller.create-flyteworkflow-crd)
Task Execution — Individual tasks run in Kubernetes pods using appropriate plugins
Data Tracking — DataCatalog tracks input/output artifacts and provides caching (config: catalog-cache.endpoint, catalog-cache.type)
Schedule Management — Scheduler service manages cron-based workflow executions
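
The six stages above can be written down as a small lookup table. This is only a sketch for orientation: the Stage type and the helper functions are hypothetical and do not come from the Flyte codebase, while the stage names, owning components, and config keys are the ones quoted in the list.

```go
package main

import "fmt"

// Stage is one step of the data flow described above.
// Illustrative type only; it is not part of Flyte.
type Stage struct {
	Name       string
	Owner      string   // component the text names as responsible
	ConfigKeys []string // config keys quoted in the text, if any
}

// Stages lists the six stages in the order the text gives them.
func Stages() []Stage {
	return []Stage{
		{"Workflow Registration", "FlyteAdmin", []string{"admin.endpoint"}},
		{"Execution Request", "FlyteAdmin", []string{"admin.insecure"}},
		{"Workflow Orchestration", "FlytePropeller", []string{"propeller.create-flyteworkflow-crd"}},
		{"Task Execution", "Kubernetes pods", nil},
		{"Data Tracking", "DataCatalog", []string{"catalog-cache.endpoint", "catalog-cache.type"}},
		{"Schedule Management", "Scheduler", nil},
	}
}

// StagesOwnedBy returns the names of the stages attributed to one component.
func StagesOwnedBy(owner string) []string {
	var names []string
	for _, s := range Stages() {
		if s.Owner == owner {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	for i, s := range Stages() {
		fmt.Printf("%d. %s (%s) config=%v\n", i+1, s.Name, s.Owner, s.ConfigKeys)
	}
}
```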

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

PostgreSQL Database (database)
Workflow metadata, execution state, user data, and scheduling information

Data Catalog Store (database)
Artifact metadata, lineage information, and cached data references

Kubernetes etcd (state-store)
Workflow execution state, custom resource definitions, and pod specifications

Storage Backends (file-store)
Raw data artifacts, logs, and intermediate workflow outputs

Feedback Loops

Workflow Status Updates (polling, balancing) — Trigger: FlytePropeller controller reconcile loop. Action: Updates workflow execution status in FlyteAdmin. Exit: Workflow completion or failure.
Schedule Updates (polling, balancing) — Trigger: Updater.UpdateGoCronSchedules periodic execution. Action: Fetches latest schedules from DB and updates scheduler. Exit: Service shutdown.
Scheduler Catchup (retry, balancing) — Trigger: Missed scheduled executions detected. Action: Attempts to execute missed workflows up to catchup time. Exit: All missed executions processed or rate limit hit.
Snapshot Recovery (circuit-breaker, balancing) — Trigger: Scheduler restart or failure. Action: Restores schedule state from last snapshot. Exit: Successful state restoration.
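
The Schedule Updates loop above follows a common shape: a periodic trigger, an action, and an exit condition. A minimal sketch of that shape in Go, using only the standard library, might look like the following. PollSchedules, FetchFunc, and RunDemo are hypothetical names for illustration; this is not the code behind Updater.UpdateGoCronSchedules.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// FetchFunc stands in for reading the latest schedules from the database.
// It is a hypothetical callback, not a Flyte API.
type FetchFunc func() []string

// PollSchedules calls fetch on every tick and hands the result to apply.
// It returns the number of completed polls once ctx is cancelled, which
// models the "service shutdown" exit condition.
func PollSchedules(ctx context.Context, interval time.Duration, fetch FetchFunc, apply func([]string)) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	polls := 0
	for {
		// Check for shutdown first so a pending tick cannot win the select.
		if ctx.Err() != nil {
			return polls
		}
		select {
		case <-ctx.Done():
			return polls
		case <-ticker.C:
			apply(fetch())
			polls++
		}
	}
}

// RunDemo polls a fixed schedule list and cancels after maxPolls polls.
func RunDemo(maxPolls int) (polls int, last []string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	seen := 0
	fetch := func() []string { return []string{"nightly-train", "hourly-etl"} }
	apply := func(s []string) {
		last = s
		seen++
		if seen >= maxPolls {
			cancel() // simulate service shutdown
		}
	}
	polls = PollSchedules(ctx, time.Millisecond, fetch, apply)
	return polls, last
}

func main() {
	polls, last := RunDemo(3)
	fmt.Println("polls:", polls, "last schedules:", last)
}
```

Passing the fetch and apply steps as callbacks keeps the loop itself free of database or scheduler details, which makes the exit condition easy to exercise in isolation.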

Delays

Schedule Snapshots (scheduled-job, ~configurable interval) — Periodic saving of scheduler state for crash recovery
Workflow Reconciliation (eventual-consistency, ~controller resync period) — Delay between workflow state changes and status updates
Rate Limited Execution (rate-limit) — Throttles concurrent workflow executions to prevent resource overload
Database Connection Pools (queue-drain) — Limits concurrent database operations and can cause request queuing
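
To make the Rate Limited Execution delay concrete, here is one way a cap on concurrent executions could be sketched with a buffered channel. The Limiter type is a hypothetical stand-in written for this example; the source does not say which throttling technique the scheduler uses, so treat this as an illustration of the idea rather than Flyte's mechanism.

```go
package main

import "fmt"

// Limiter caps how many executions may be in flight at once.
// Hypothetical type for illustration only.
type Limiter struct {
	slots chan struct{}
}

// NewLimiter creates a limiter that allows up to max concurrent executions.
func NewLimiter(max int) *Limiter {
	return &Limiter{slots: make(chan struct{}, max)}
}

// TryAcquire claims a slot without blocking; false means the caller is throttled.
func (l *Limiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees one slot claimed earlier by TryAcquire.
func (l *Limiter) Release() {
	select {
	case <-l.slots:
	default: // nothing held, nothing to release
	}
}

// InFlight reports the number of slots currently held.
func (l *Limiter) InFlight() int { return len(l.slots) }

func main() {
	l := NewLimiter(2)
	for i := 1; i <= 3; i++ {
		fmt.Printf("execution %d admitted: %v\n", i, l.TryAcquire())
	}
	l.Release()
	fmt.Println("in flight after one release:", l.InFlight())
}
```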

Control Points

Admin Endpoint (env-var) — Controls: FlyteAdmin API server endpoint configuration. Default: localhost:30080
Insecure Mode (env-var) — Controls: Whether to run services without TLS encryption. Default: true
CRD Creation (feature-flag) — Controls: Whether FlytePropeller creates FlyteWorkflow custom resource definitions. Default: true
Rate Limiter (runtime-toggle) — Controls: Throttling of concurrent scheduled workflow executions
Log Level (env-var) — Controls: Verbosity of service logging output. Default: 5
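
The control points above amount to a settings record with defaults and overrides. The sketch below captures the four defaults stated in the list and applies overrides from a lookup function such as os.LookupEnv. The Settings type, the Load function, and the DEMO_* variable names are all assumptions made for this example; they are not Flyte's actual configuration keys or environment variables. The Rate Limiter is omitted because the list gives no default for it.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Settings mirrors the control points listed above with their stated defaults.
// Hypothetical type for illustration only.
type Settings struct {
	AdminEndpoint string
	Insecure      bool
	CreateCRD     bool
	LogLevel      int
}

// Defaults returns the default values quoted in the text.
func Defaults() Settings {
	return Settings{AdminEndpoint: "localhost:30080", Insecure: true, CreateCRD: true, LogLevel: 5}
}

// Load starts from Defaults and applies overrides found via lookup.
// The DEMO_* names are invented for this sketch.
func Load(lookup func(string) (string, bool)) (Settings, error) {
	s := Defaults()
	if v, ok := lookup("DEMO_ADMIN_ENDPOINT"); ok {
		s.AdminEndpoint = v
	}
	if v, ok := lookup("DEMO_INSECURE"); ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return s, fmt.Errorf("DEMO_INSECURE: %w", err)
		}
		s.Insecure = b
	}
	if v, ok := lookup("DEMO_CREATE_CRD"); ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return s, fmt.Errorf("DEMO_CREATE_CRD: %w", err)
		}
		s.CreateCRD = b
	}
	if v, ok := lookup("DEMO_LOG_LEVEL"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return s, fmt.Errorf("DEMO_LOG_LEVEL: %w", err)
		}
		s.LogLevel = n
	}
	return s, nil
}

func main() {
	s, err := Load(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```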

Technology Stack

Go (framework)
Primary programming language for all services

Kubernetes (infra)
Container orchestration and workflow execution platform

gRPC (framework)
Inter-service communication protocol

PostgreSQL (database)
Primary database for metadata storage

GORM (library)
ORM for database operations

Protocol Buffers (framework)
API interface definitions and serialization

robfig/cron (library)
Cron scheduling library

Prometheus (infra)
Metrics collection and monitoring

OpenTelemetry (infra)
Distributed tracing and observability

Cobra (library)
CLI framework for flytectl

Key Components

FlyteAdmin (service) — Main API service handling workflow registration, execution requests, and metadata management. Path: flyteadmin/
FlytePropeller (service) — Kubernetes controller that executes workflows by managing pods and resources. Path: flytepropeller/
DataCatalog (service) — Service for tracking data artifacts, lineage, and caching workflow inputs/outputs. Path: datacatalog/
GoCronScheduler (class) — Cron-based scheduler for executing workflows on time-based triggers using the robfig/cron library. Path: flyteadmin/scheduler/core/gocron_scheduler.go
flytectl (cli-command) — Command-line tool for interacting with Flyte clusters and managing workflows. Path: flytectl/
PluginRegistry (plugin) — Extensible system for integrating different compute backends and task types. Path: flyteplugins/
FlyteStdLib (utility) — Shared libraries providing storage, logging, configuration, and other common functionality. Path: flytestdlib/
single.Execute (handler) — Entry point for running all Flyte services in a single binary for development/testing. Path: cmd/single/
auth (middleware) — Authentication and authorization system supporting OIDC, OAuth2, and other auth methods. Path: flyteadmin/auth/
repositories (service) — Database access layer using GORM for persisting workflow metadata and execution state. Path: flyteadmin/pkg/repositories/
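
As an illustration of what "extensible system for integrating different compute backends and task types" can mean in practice, the sketch below shows a generic registry that maps task types to handlers. Plugin, PluginFunc, and Registry are hypothetical names written for this example and are not the interfaces defined under flyteplugins/; the task type "container" is likewise just a sample value.

```go
package main

import (
	"fmt"
	"sort"
)

// Plugin handles one task type. Hypothetical interface for illustration.
type Plugin interface {
	Run(input string) (string, error)
}

// PluginFunc adapts a plain function to the Plugin interface.
type PluginFunc func(string) (string, error)

// Run calls the wrapped function.
func (f PluginFunc) Run(in string) (string, error) { return f(in) }

// Registry maps task types to the plugin that handles them.
type Registry struct {
	plugins map[string]Plugin
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry { return &Registry{plugins: map[string]Plugin{}} }

// Register binds a task type to a plugin and rejects duplicates.
func (r *Registry) Register(taskType string, p Plugin) error {
	if _, exists := r.plugins[taskType]; exists {
		return fmt.Errorf("task type %q already registered", taskType)
	}
	r.plugins[taskType] = p
	return nil
}

// Dispatch runs the plugin registered for taskType.
func (r *Registry) Dispatch(taskType, input string) (string, error) {
	p, ok := r.plugins[taskType]
	if !ok {
		return "", fmt.Errorf("no plugin for task type %q", taskType)
	}
	return p.Run(input)
}

// TaskTypes returns the registered task types in sorted order.
func (r *Registry) TaskTypes() []string {
	types := make([]string, 0, len(r.plugins))
	for t := range r.plugins {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

func main() {
	r := NewRegistry()
	_ = r.Register("container", PluginFunc(func(in string) (string, error) {
		return "ran:" + in, nil
	}))
	out, err := r.Dispatch("container", "train-model")
	fmt.Println(out, err, r.TaskTypes())
}
```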

Frequently Asked Questions

What is flyte used for?

flyte is a Kubernetes-native workflow orchestration platform for ML and data pipelines. flyteorg/flyte is a 10-component ML training system written in Go. Data flows through 6 distinct pipeline stages. The codebase contains 2,495 files.

How is flyte architected?

flyte is organized into 5 architecture layers: Control Plane Services, CLI & Client Tools, Plugin System, Shared Libraries, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure enables tight integration between components.

How does data flow through flyte?

Data moves through 6 stages: Workflow Registration → Execution Request → Workflow Orchestration → Task Execution → Data Tracking → .... Workflows are registered to FlyteAdmin, executed by FlytePropeller, with data tracked by DataCatalog and scheduled by the Scheduler service. This pipeline design reflects a complex multi-stage processing system.

What technologies does flyte use?

The core stack includes Go (Primary programming language for all services), Kubernetes (Container orchestration and workflow execution platform), gRPC (Inter-service communication protocol), PostgreSQL (Primary database for metadata storage), GORM (ORM for database operations), Protocol Buffers (API interface definitions and serialization), and 4 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does flyte have?

flyte exhibits 4 data pools (including PostgreSQL Database and Data Catalog Store), 4 feedback loops, 5 control points, and 4 delays. The feedback loops cover polling, retry, and circuit-breaker recovery. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does flyte use?

6 design patterns detected: Microservice Architecture, Plugin System, Kubernetes Controller Pattern, gRPC with REST Gateway, Repository Pattern, and 1 more.

Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.

Related ML Training Repositories

tensorflow/tensorflow

automatic1111/stable-diffusion-webui

huggingface/transformers

ggml-org/llama.cpp

pytorch/pytorch

openai/whisper
