prefecthq/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Orchestrates Python data pipeline execution with scheduling, retries, and distributed task management
Flow definitions with @flow/@task decorators are submitted as deployments to the server database. The scheduler creates flow runs and places them in work queues. Workers poll queues, retrieve flow runs, and execute them using the flow engine which manages task execution, state transitions, and result persistence. Events are emitted throughout for observability and automation triggers.
Under the hood, the system uses 5 feedback loops, 5 data pools, and 8 control points to manage its runtime behavior. The repository comprises 10 components across 3,350 analyzed files, and data flows through 7 distinct pipeline stages.
How Data Flows Through the System
- Flow definition and deployment — User defines flows with @flow decorator and creates deployments via CLI using Flow.deploy() method, storing flow code references and configuration in server database [Flow → Deployment]
- Schedule-based flow run creation — Scheduler service queries deployments with schedules, creates FlowRun instances using deployment parameters, and queues them in specified work queues [Deployment → FlowRun]
- Worker polling and run acquisition — Workers poll work queues via PrefectClient.get_runs_in_work_queue(), acquire FlowRun assignments, and prepare execution environment based on infrastructure configuration [WorkQueue → FlowRun]
- Flow execution and task orchestration — FlowRunner loads flow code, resolves parameters from blocks, executes @flow function which creates TaskRun instances for each @task call, managing parallel execution and dependencies [FlowRun → TaskRun]
- Task execution with caching and retries — TaskRunner executes individual tasks, checks cache via cache_key lookup, handles retries on failure, and persists results using ResultStore [TaskRun → Task results]
- State management and persistence — StateManager validates state transitions, updates FlowRun/TaskRun states in database via PrefectClient, and triggers event emission for state changes [State → Event]
- Event processing and automation — EventEmitter publishes events to automation engine, which evaluates trigger conditions and executes actions like creating new flow runs or sending notifications [Event → Automation actions]
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- FlowRun (src/prefect/server/schemas/core.py) — Pydantic model with id: UUID, flow_id: UUID, state: State, parameters: dict, start_time: datetime, end_time: datetime, deployment_id: Optional[UUID]. Created when a flow is called, executed by the engine with state transitions, persisted to the database, and displayed in the UI.
- TaskRun (src/prefect/server/schemas/core.py) — Pydantic model with id: UUID, task_key: str, flow_run_id: UUID, state: State, parameters: dict, cache_key: Optional[str]. Created during flow execution when @task functions are called, executed by the task engine, potentially cached.
- State (src/prefect/client/schemas/objects.py) — Pydantic model with type: StateType (Pending/Running/Completed/Failed/Cancelled), message: str, data: Any, state_details: StateDetails. Represents current execution status, transitions through the run lifecycle, persisted for observability.
- Deployment (src/prefect/server/schemas/core.py) — Pydantic model with id: UUID, flow_id: UUID, name: str, work_queue_name: str, parameters: dict, schedule: Optional[Schedule], infrastructure: dict. Created via CLI/API, stored in the database, used by the scheduler to create flow runs, executed by workers.
- WorkQueue (src/prefect/server/schemas/core.py) — Pydantic model with id: UUID, name: str, filter: Optional[dict], concurrency_limit: Optional[int]. Created for organizing work, receives flow runs from the scheduler, polled by workers for execution.
- Block (src/prefect/blocks/core.py) — Pydantic model with _block_type_name: str, _block_document_name: str, plus dynamic fields for configuration storage. Created to store reusable configuration, referenced in flows/deployments, resolved at runtime.
- Event (src/prefect/events/schemas/events.py) — Pydantic model with event: str, resource: dict, payload: dict, occurred: datetime, id: UUID. Emitted during flow/task execution, stored in the database, processed by automations, displayed in the UI.
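To make the FlowRun/State contract concrete, here is a stdlib-only dataclass mirror of the fields listed above. It is an illustrative approximation, not Prefect's actual Pydantic schemas, and omits fields such as state_details and end_time for brevity:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional
from uuid import UUID, uuid4

class StateType(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"

@dataclass
class State:
    type: StateType
    message: str = ""
    data: Any = None

@dataclass
class FlowRun:
    flow_id: UUID
    id: UUID = field(default_factory=uuid4)
    state: State = field(default_factory=lambda: State(StateType.PENDING))
    parameters: dict = field(default_factory=dict)
    start_time: Optional[datetime] = None
    deployment_id: Optional[UUID] = None

# A run is born Pending, then transitions as the engine picks it up.
run = FlowRun(flow_id=uuid4(), parameters={"n": 10})
run.state = State(StateType.RUNNING)
run.start_time = datetime.now(timezone.utc)
print(run.state.type.value)   # Running
```

The real schemas use Pydantic for validation and serialization over the API; the dataclass version only shows which fields travel together.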
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
- ui/src/main.ts:start — assumes an HTML element with id 'app' exists in the DOM for Vue.js to mount to. If this fails: the Vue app fails to mount and the entire UI doesn't render (white screen of death).
- ui-v2/src/analytics/index.ts:initAmplitude — assumes the VITE_AMPLITUDE_API_KEY environment variable format is valid for the Amplitude SDK, without validation. If this fails: Amplitude initialization silently fails or crashes on an invalid API key format, and analytics data is lost.
- ui-v2/src/analytics/index.ts:trackWebAppLoaded — assumes sessionStorage is available and writable in the browser environment. If this fails: tracking fails in incognito mode or browsers with disabled storage, and duplicate events are sent.
- ui/src/router/index.ts:workspaceRoutes — assumes the createWorkspaceRouteRecords function accepts all provided route components and that they have compatible interfaces. If this fails: route registration fails at runtime and navigation breaks with cryptic errors.
- ui/src/router/index.ts:BASE_URL — assumes BASE_URL from the meta utilities is a valid URL path for web history routing. If this fails: the router creates invalid routes, and navigation fails or redirects to wrong paths.
- ui/src/main.ts:start — assumes plugin installation order (router, PrefectDesign, PrefectUILibrary) doesn't matter for component dependencies. If this fails: Vue components fail to resolve dependencies, causing runtime errors when plugin features are accessed.
- ui/src/maps/index.ts:maps — assumes all mapper functions in the designMaps object have signatures compatible with the spread operator. If this fails: the object spread fails on non-enumerable properties, leaving mapping functions unavailable at runtime.
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:PrefectDbtSettings — assumes DBT_ environment variables are properly formatted for Pydantic field types (Path, etc.). If this fails: Pydantic validation fails with cryptic type errors and the dbt integration breaks during initialization.
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:find_profiles_dir — assumes the current working directory and standard dbt profile locations are readable and accessible. If this fails: the profiles_dir lookup fails in containerized or restricted environments, and dbt commands fail to find configuration.
- ui-v2/src/analytics/index.ts:SESSION_STORAGE_KEY — assumes session storage persists for the lifetime of a single browser session without clearing. If this fails: web-app-loaded events are tracked multiple times per session when storage is cleared, inflating analytics data.
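The environment-variable assumptions above share a fix: validate at the boundary instead of letting a cryptic error surface deep inside initialization. A minimal Python sketch of that idea, with a hypothetical `read_path_env` helper (not a Prefect or prefect-dbt API):

```python
import os
from pathlib import Path

def read_path_env(name: str, default: Path) -> Path:
    """Read an environment variable expected to hold a filesystem path,
    failing loudly and early if the value is unusable (instead of the
    silent or cryptic failures described above)."""
    raw = os.environ.get(name)
    if raw is None:
        return default          # unset: fall back without checking the default
    path = Path(raw).expanduser()
    if not path.exists():
        raise ValueError(f"{name} points to a non-existent path: {path}")
    return path

# With the variable unset, the documented default is returned.
os.environ.pop("DBT_PROFILES_DIR", None)
print(read_path_env("DBT_PROFILES_DIR", Path.home() / ".dbt"))
```

The same pattern applies to the UI assumptions: a one-line existence check on `document.getElementById('app')` or a try/catch around sessionStorage access turns a silent failure into an actionable error message.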
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- SQLite/PostgreSQL database — stores all persistent state including flow runs, task runs, deployments, work queues, and events
- Work queues — queues containing flow runs ready for execution, organized by priority and filtering rules
- Result storage — pluggable storage backends (filesystem, S3, GCS) for persisting flow and task results
- Block storage — in-memory and database storage of configuration blocks for reusable infrastructure and credentials
- Event queue — async message queue for events that trigger automations and provide observability
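The work-queue pool, with its concurrency_limit field, can be sketched as a toy in-memory queue. This is illustrative only; Prefect's actual work queues live in the server database and are polled over the API:

```python
from collections import deque

class WorkQueue:
    """Toy in-memory work queue with a concurrency limit (illustrative only)."""

    def __init__(self, concurrency_limit=None):
        self._pending = deque()       # flow runs waiting for a worker
        self._running = set()         # flow runs currently executing
        self.concurrency_limit = concurrency_limit

    def submit(self, flow_run_id):
        self._pending.append(flow_run_id)

    def poll(self):
        """Hand out the next run unless the concurrency limit is saturated."""
        if self.concurrency_limit is not None and len(self._running) >= self.concurrency_limit:
            return None
        if not self._pending:
            return None
        run = self._pending.popleft()
        self._running.add(run)
        return run

    def complete(self, flow_run_id):
        self._running.discard(flow_run_id)

q = WorkQueue(concurrency_limit=1)
q.submit("run-a")
q.submit("run-b")
print(q.poll())   # run-a
print(q.poll())   # None: the limit of 1 is saturated
q.complete("run-a")
print(q.poll())   # run-b
```

The key behavior is that `poll()` returns nothing while the limit is saturated, which is how a concurrency_limit throttles tagged runs without rejecting them.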
Feedback Loops
- Task retry loop (retry, balancing) — Trigger: task failure with retries configured. Action: TaskRunner waits an exponential backoff delay, then re-executes the task with the same parameters. Exit: task succeeds or retry limit reached.
- Worker heartbeat polling (polling, reinforcing) — Trigger: Worker starts and joins work pool. Action: Worker sends heartbeat and polls for new flow runs every polling_interval seconds. Exit: Worker shutdown or pool deletion.
- Scheduler deployment polling (polling, reinforcing) — Trigger: Scheduler service startup. Action: Queries deployments for due schedules and creates flow runs every scheduling_interval. Exit: Service shutdown.
- Automation trigger evaluation (auto-scale, reinforcing) — Trigger: Event matching automation trigger condition. Action: Creates new flow run or performs configured action. Exit: Trigger condition no longer met or automation disabled.
- Flow run state propagation (self-correction, balancing) — Trigger: Flow run state change. Action: Updates child task run states and parent flow context. Exit: All state updates completed.
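The task retry loop can be sketched in a few lines of generic Python, using the backoff formula `retry_delay_seconds * (2 ** attempt)` from the Delays section below. This is a sketch of the pattern, not Prefect's task engine:

```python
import time

def run_with_retries(task_fn, retries=3, retry_delay_seconds=0.01):
    """Re-execute a failing callable with exponential backoff.
    Generic sketch of the retry loop, not Prefect's implementation."""
    attempt = 0
    while True:
        try:
            return task_fn()          # exit condition: the task succeeds
        except Exception:
            if attempt >= retries:
                raise                 # exit condition: retry limit reached
            delay = retry_delay_seconds * (2 ** attempt)
            time.sleep(delay)         # back off before re-executing
            attempt += 1

calls = {"n": 0}

def flaky():
    """Fails twice, then succeeds, to exercise the loop."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky))   # ok, after two retries
```

Because the loop is "balancing", it converges to one of two fixed points: a successful result or the original exception re-raised once retries are exhausted.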
Delays
- Task retry backoff (rate-limit, ~exponential: retry_delay_seconds * (2 ** attempt)) — Prevents overwhelming failing resources while allowing recovery
- Worker polling interval (scheduled-job, ~PREFECT_WORKER_QUERY_SECONDS (default 10s)) — Controls frequency of work queue polling and resource usage
- Database connection pool (queue-drain, ~Connection acquisition timeout) — Blocks API requests when connection pool exhausted
- Event processing delay (async-processing, ~Event emission to automation trigger evaluation) — Automations react after event processing latency
- Flow run scheduling (scheduled-job, ~Schedule interval or cron expression) — Determines when scheduled flows execute
Control Points
- PREFECT_API_URL (env-var) — Controls: Which Prefect server the client connects to. Default: http://127.0.0.1:4200/api
- PREFECT_WORKER_QUERY_SECONDS (env-var) — Controls: How frequently workers poll for new flow runs. Default: 10
- task_runner (runtime-toggle) — Controls: Execution strategy for tasks (sequential, concurrent, distributed). Default: ConcurrentTaskRunner
- retries (hyperparameter) — Controls: How many times a failed task will retry execution. Default: 0
- cache_policy (feature-flag) — Controls: Whether task results are cached and cache key strategy. Default: None
- infrastructure (architecture-switch) — Controls: Where and how flow runs execute (local, Docker, Kubernetes). Default: Process
- log_level (env-var) — Controls: Verbosity of logging output. Default: INFO
- concurrency_limit (rate-limit) — Controls: Maximum concurrent executions for tagged runs
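The env-var control points above follow a common pattern: an environment variable overrides a documented default. A generic sketch of that lookup (the defaults mirror the list above; the reading logic is not Prefect's settings machinery):

```python
import os

# Documented defaults for two of the control points listed above.
DEFAULTS = {
    "PREFECT_API_URL": "http://127.0.0.1:4200/api",
    "PREFECT_WORKER_QUERY_SECONDS": "10",
}

def setting(name: str) -> str:
    """The environment variable wins; otherwise fall back to the default."""
    return os.environ.get(name, DEFAULTS[name])

# Workers would convert the polling interval to a number before use.
poll_interval = float(setting("PREFECT_WORKER_QUERY_SECONDS"))
print(setting("PREFECT_API_URL"))
print(poll_interval)
```

Prefect's real settings system layers profiles and .env files on top of this, but the precedence idea (explicit environment over shipped default) is the same.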
Technology Stack
- FastAPI — provides the REST API server for flow orchestration and web UI serving
- SQLAlchemy — ORM for database operations with async support for flow/task persistence
- Alembic — database migration system for evolving the Prefect server schema
- Pydantic — data validation and serialization for API schemas and configuration
- Vue.js — frontend framework for both the legacy UI and next-gen UI applications
- Uvicorn — ASGI server running the FastAPI application
- Docker — container infrastructure for flow execution environments
- AnyIO/asyncio — async execution framework for concurrent task processing
- httpx — HTTP client for server communication with async support
- cloudpickle — serialization of flow/task code for remote execution
- Rich — terminal formatting for CLI output and logging
- Typer — command-line interface framework
Key Components
- FlowRunner (orchestrator, src/prefect/flow_engine.py) — orchestrates flow execution including parameter resolution, task scheduling, and state management
- TaskRunner (executor, src/prefect/task_engine.py) — executes individual tasks with retry logic, caching, and result persistence
- PrefectClient (gateway, src/prefect/client/orchestration.py) — HTTP client for communicating with the Prefect server API; handles authentication and retries
- Scheduler (scheduler, src/prefect/server/services/scheduler.py) — creates flow runs from deployments based on schedules and triggers
- Worker (executor, src/prefect/workers/base.py) — polls work queues and executes flow runs in configured infrastructure environments
- DatabaseInterface (adapter, src/prefect/server/database/interface.py) — abstracts database operations across SQLite and PostgreSQL with connection pooling
- StateManager (processor, src/prefect/states.py) — manages state transitions and validation for flows and tasks
- ResultStore (store, src/prefect/results.py) — persists and retrieves flow/task results using pluggable storage backends
- BlockRegistry (registry, src/prefect/blocks/core.py) — manages registration and instantiation of configuration blocks with validation
- EventEmitter (adapter, src/prefect/events/client.py) — emits events during flow/task execution for automation triggers and observability
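The StateManager's core job, validating state transitions, can be sketched as a transition table plus a guard. The table below is illustrative; Prefect's real orchestration rules are richer (they also handle Scheduled, Crashed, and policy-driven transitions):

```python
# Illustrative transition table; Prefect's actual orchestration rules are richer.
ALLOWED = {
    "Pending":   {"Running", "Cancelled"},
    "Running":   {"Completed", "Failed", "Cancelled"},
    "Completed": set(),                 # terminal
    "Failed":    {"Running"},           # e.g. a retry re-enters Running
    "Cancelled": set(),                 # terminal
}

def validate_transition(current: str, proposed: str) -> str:
    """Accept the proposed state only if the transition is legal."""
    if proposed not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {proposed}")
    return proposed

state = "Pending"
state = validate_transition(state, "Running")
state = validate_transition(state, "Completed")
print(state)   # Completed
```

Centralizing the table means every component (engine, API, automations) proposes transitions but only one place decides whether they are legal, which is what keeps run states consistent across workers.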
Frequently Asked Questions
What is prefect used for?
Prefect orchestrates Python data pipeline execution with scheduling, retries, and distributed task management. prefecthq/prefect is a 10-component repository written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 3,350 files.
How is prefect architected?
prefect is organized into 6 architecture layers: Client SDK, Server API, Database Layer, Engine Layer, and 2 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through prefect?
Data moves through 7 stages: Flow definition and deployment → Schedule-based flow run creation → Worker polling and run acquisition → Flow execution and task orchestration → Task execution with caching and retries → …. Flow definitions with @flow/@task decorators are submitted as deployments to the server database. The scheduler creates flow runs and places them in work queues. Workers poll queues, retrieve flow runs, and execute them using the flow engine, which manages task execution, state transitions, and result persistence. Events are emitted throughout for observability and automation triggers. This pipeline design reflects a complex multi-stage processing system.
What technologies does prefect use?
The core stack includes FastAPI (Provides REST API server for flow orchestration and web UI serving), SQLAlchemy (ORM for database operations with async support for flow/task persistence), Alembic (Database migration system for evolving Prefect server schema), Pydantic (Data validation and serialization for API schemas and configuration), Vue.js (Frontend framework for both legacy UI and next-gen UI applications), Uvicorn (ASGI server running the FastAPI application), and 6 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does prefect have?
prefect exhibits 5 data pools (SQLite/PostgreSQL Database, Work Queues), 5 feedback loops, 8 control points, 5 delays. The feedback loops handle retry and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does prefect use?
6 design patterns detected: Decorator-based instrumentation, Async context propagation, Pluggable infrastructure adapters, Event-driven automation, State machine orchestration, and 1 more.
How does prefect compare to alternatives?
CodeSea has side-by-side architecture comparisons of prefect with dbt-core, celery, luigi. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns. See the comparison pages above for detailed analysis.
Analyzed on April 19, 2026 by CodeSea. Written by Karolina Sarna.