kyegomez/swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
Orchestrates multi-agent swarms with enterprise infrastructure for production AI workflows
Tasks enter through the CLI, API, or direct Agent instantiation. Agents process tasks through LLM calls, maintaining conversation history and executing tools as needed. Multi-agent workflows coordinate through swarm structures that handle task routing, result aggregation, and consensus mechanisms. All execution is monitored via telemetry collection.
Under the hood, the system uses 3 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.
A 6-component ML inference framework. 833 files were analyzed; data flows through 4 distinct pipeline stages.
How Data Flows Through the System
- Task Input Processing — Tasks arrive via CLI commands (swarms run), direct Agent.run() calls, or AOP HTTP/MCP requests. Input is validated and converted to ChatMessageInput format with role and content fields.
- Agent Task Execution — Agent.run() method processes the task through an execution loop, calling LLM APIs via litellm with the agent's system prompt and conversation context. Supports tool calling, dynamic temperature, and configurable max_loops. [ChatMessageInput → AgentStep]
- Multi-Agent Coordination — Swarm structures like HeavySwarm execute multiple agents in parallel or LLMCouncil chains them sequentially. Results are aggregated, compared, or passed through depending on the swarm type's coordination strategy. [Agent → Aggregated Results]
- Response Generation — Final results are formatted and returned to the caller. AOP server wraps responses in MCP-compliant JSON, CLI pretty-prints with Rich formatting, direct calls return raw agent output. [AgentStep → ChatMessageResponse]
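The four stages above can be sketched as a minimal pipeline. This is an illustrative sketch in plain Python, not swarms' actual API: the dicts only mirror the ChatMessageInput and AgentStep shapes described below, and every function name here is hypothetical.

```python
# Illustrative sketch of the four pipeline stages using plain Python.
# The dicts mirror the described ChatMessageInput / AgentStep shapes;
# names and helpers are hypothetical, not swarms' actual API.
import time

def process_input(raw_task: str) -> dict:
    # Stage 1: validate and convert to a ChatMessageInput-style dict
    if not raw_task.strip():
        raise ValueError("task must be non-empty")
    return {"role": "user", "content": raw_task}

def execute_agent(message: dict, llm_call) -> dict:
    # Stage 2: one execution step; llm_call stands in for the litellm call
    start = time.monotonic()
    response = llm_call(message["content"])
    return {"step_id": "step-1",
            "time": time.monotonic() - start,
            "response": response}

def coordinate(steps: list) -> list:
    # Stage 3: aggregate results from one or more agents
    return [s["response"] for s in steps]

def respond(results: list) -> str:
    # Stage 4: format the final answer for the caller
    return "\n".join(results)

msg = process_input("Summarize Q3 revenue")
step = execute_agent(msg, lambda prompt: f"echo: {prompt}")
print(respond(coordinate([step])))  # → echo: Summarize Q3 revenue
```

The stub `llm_call` stands in for the real LLM round-trip; in the framework that step also carries the system prompt and full conversation context.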
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- Agent (swarms/structs/agent.py) — Python class with agent_name: str, system_prompt: str, model_name: str, max_loops: int, temperature: float, conversation history: List[dict], tools: List[BaseTool]. Created with configuration, maintains conversation state during execution, and can be serialized for persistence or network transport.
- ChatMessageInput (swarms/schemas/base_schemas.py) — Pydantic model with role: str ('user'|'assistant'|'system') and content: Union[str, List[ContentItem]] supporting text and images. Constructed from user input or agent responses, validated by Pydantic, passed to LLM APIs, and stored in conversation history.
- AgentStep (swarms/schemas/agent_step_schemas.py) — Pydantic model with step_id: str, time: float, and response: AgentChatCompletionResponse, containing execution metadata and the LLM response. Created for each agent execution step; captures timing and response data, aggregated for performance monitoring and debugging.
- AOP task request (swarms/structs/aop.py) — Dict with agent_name: str, task: str, and optional img/imgs for vision tasks. Received via HTTP/MCP, validated, queued for distributed processing, executed by the target agent, and the response returned to the client.
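To make the ChatMessageInput contract concrete, here is a stdlib-only approximation. The real model is a Pydantic class in swarms/schemas/base_schemas.py; this dataclass only mirrors the documented fields and role constraint.

```python
# Stdlib approximation of the ChatMessageInput contract described above.
# The real model is Pydantic; this only mirrors the documented fields.
from dataclasses import dataclass
from typing import List, Union

ALLOWED_ROLES = {"user", "assistant", "system"}

@dataclass
class ChatMessageInput:
    role: str
    content: Union[str, List[dict]]  # text, or a list of content items (e.g. images)

    def __post_init__(self):
        # Enforce the documented role constraint at construction time
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}")

msg = ChatMessageInput(role="user", content="Hello")
print(msg.role)  # → user
```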
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
- swarms/structs/agent.py:Agent.run — Assumes model_name follows litellm's naming convention (e.g., 'anthropic/claude-sonnet-4-5', 'gpt-4') but never validates the format or provider availability before execution. If this fails: the agent fails silently or crashes when given an invalid model name such as 'gpt-5.4' (from example.py); litellm may not recognize the model, raising runtime exceptions without helpful error messages.
- swarms/telemetry/main.py:get_comprehensive_system_info — The @lru_cache(maxsize=1) decorator assumes system info remains static for the lifetime of the process, never invalidating cached hardware and memory data. If this fails: stale system metrics are reported; if memory usage changes significantly or hardware is hot-swapped during a long-running process, telemetry shows outdated values, leading to incorrect capacity planning.
- examples/aop_examples/client/aop_queue_example.py:AOP — max_queue_size_per_agent=100 assumes memory can hold 100 queued tasks per agent but never estimates actual memory usage from task content size. If this fails: memory exhaustion when tasks carry large payloads (images, long documents); 100 tasks of 10 MB each consume roughly 1 GB per agent, potentially crashing the system without warning.
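The queue-memory arithmetic above is easy to make explicit. The guard function below is hypothetical, not part of swarms; it only shows what an explicit memory budget for the queue would look like.

```python
# Back-of-the-envelope check for the queue-memory assumption:
# 100 queued tasks x 10 MB payload each is close to 1 GB per agent.
# The guard below is hypothetical, not part of swarms.
MAX_QUEUE_SIZE_PER_AGENT = 100
MEMORY_BUDGET_BYTES = 512 * 1024 * 1024  # e.g. a 512 MB cap per agent

def can_enqueue(queued_payload_bytes: int, new_payload_bytes: int) -> bool:
    # Reject the enqueue if the projected total would exceed the budget
    return queued_payload_bytes + new_payload_bytes <= MEMORY_BUDGET_BYTES

# 100 tasks at 10 MiB each:
total = MAX_QUEUE_SIZE_PER_AGENT * 10 * 1024 * 1024
print(total / (1024 ** 3))  # → 0.9765625 (≈ 1 GB per agent)
```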
- swarms/cli/main.py:run_autoswarm — The docstring says the task and model parameters must be 'non-empty', but the code only validates that they exist, not their actual content or format. If this fails: empty or whitespace-only inputs pass validation yet cause downstream failures in agent execution or swarm generation with confusing error messages.
- examples/aop_examples/client/aop_raw_task_example.py:call_agent_tool_raw — The streamablehttp_client context manager is assumed to yield at least two items, but variable return lengths are handled inconsistently. If this fails: IndexError or unpacking failures when network conditions or MCP server versions return a different context structure, crashing client connections.
- examples/aop_examples/utils/comprehensive_aop_example.py:AOP — The agents=[agent1, agent2, agent3] list assumes agents keep their order and identity throughout the AOP lifecycle. If this fails: task routing breaks if agents are internally reordered or replaced; a request for 'agent1' might execute on agent3, producing wrong results without detection.
- examples/aop_examples/utils/network_error_example.py:AOP — max_network_retries=5 and network_retry_delay=3.0 assume network issues resolve within a 15-second total retry window. If this fails: permanent network failures in cloud environments with longer recovery times cause task abandonment; legitimate requests fail after 15 s when the infrastructure might need 30-60 s to recover.
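The 15-second figure follows from a fixed delay: 5 retries × 3.0 s. With exponential backoff (which the feedback-loop description later mentions), the same retry count covers a much longer window. Both calculations below are illustrative, not lifted from swarms' source.

```python
# Comparing the total retry window for fixed-delay vs. exponential backoff.
# Neither function is swarms' implementation; they only show the arithmetic.
def fixed_window(retries: int, delay: float) -> float:
    # Same delay before every retry: retries * delay seconds total
    return retries * delay

def exponential_window(retries: int, base_delay: float) -> float:
    # Delays of base, 2*base, 4*base, ... before each successive retry
    return sum(base_delay * (2 ** i) for i in range(retries))

print(fixed_window(5, 3.0))        # → 15.0
print(exponential_window(5, 3.0))  # → 93.0  (3 + 6 + 12 + 24 + 48)
```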
- swarms/telemetry/main.py:get_machine_id — platform.node() assumes the hostname is available and unique across deployments for machine identification. If this fails: telemetry data collides in containerized environments where multiple containers share localhost or generic hostnames; metrics get attributed to the wrong instances, corrupting usage analytics.
- examples/aop_examples/discovery/simple_discovery_example.py:call_discover_agents_sync — json.dumps({}) for empty arguments assumes MCP servers accept empty JSON objects, but different implementations may require specific parameter structures. If this fails: discovery fails against MCP servers expecting explicit parameter schemas; some servers reject empty args while others need version fields or authentication tokens.
- examples/aop_examples/server.py:Agent — dynamic_temperature_enabled=True assumes temperature adjustments improve output quality but never checks whether the model actually supports mid-conversation temperature changes. If this fails: some models ignore temperature changes or behave unpredictably when temperature varies mid-conversation, yielding inconsistent response quality with no feedback to the user.
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Agent Memory — Each agent maintains conversation history, tool call results, and execution state in memory during its lifecycle
- AOP Task Queue — Per-agent queues buffer incoming task requests when queue_enabled=True, with configurable max_queue_size_per_agent and worker pools
- Collects and aggregates execution metrics, system performance data, and usage statistics for monitoring and optimization
Feedback Loops
- Agent Execution Loop (retry, balancing) — Trigger: max_loops configuration in Agent. Action: Re-executes LLM call with updated conversation context, can adjust temperature dynamically based on dynamic_temperature_enabled. Exit: Reaches max_loops limit or task completion condition.
- Dynamic Temperature Adjustment (self-correction, balancing) — Trigger: dynamic_temperature_enabled=True in Agent configuration. Action: Adjusts LLM temperature based on response quality or execution context to optimize output consistency vs creativity. Exit: Task completion or manual override.
- AOP Network Retry (circuit-breaker, balancing) — Trigger: Network failures or timeout errors in AOP requests. Action: Implements exponential backoff and retry logic with max_network_retries and network_retry_delay configuration. Exit: Successful connection or max retry limit reached.
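The agent execution loop above can be sketched from the configuration names alone (max_loops, dynamic_temperature_enabled). The loop shape is inferred, not copied from swarms; the completion check and temperature strategy are stand-ins.

```python
# Sketch of the agent execution loop: re-run the LLM call with growing
# context until a completion condition or max_loops is hit, optionally
# adjusting temperature each pass. Helpers are hypothetical.
import random

def run_loop(task, llm_call, max_loops=3, dynamic_temperature_enabled=True):
    history = [{"role": "user", "content": task}]
    temperature = 0.7
    for _ in range(max_loops):
        if dynamic_temperature_enabled:
            # One possible strategy: resample temperature each iteration
            temperature = round(random.uniform(0.1, 1.0), 2)
        reply = llm_call(history, temperature)
        history.append({"role": "assistant", "content": reply})
        if "DONE" in reply:  # stand-in completion condition
            break
    return history

history = run_loop("count to 3", lambda h, t: f"loop {len(h)} DONE", max_loops=5)
print(len(history))  # → 2 (one user message, one assistant reply)
```

The loop exits either on the completion condition or after max_loops iterations, matching the exit criteria listed above.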
Delays
- LLM API Latency (async-processing, ~variable (typically 1-30 seconds)) — Agent execution blocks waiting for LLM response, affects overall swarm coordination timing
- AOP Queue Processing (queue-drain, ~configurable via processing_timeout) — Task requests wait in queue until worker threads become available, controlled by max_workers_per_agent
- Swarm Coordination Wait (batch-window, ~depends on slowest agent in parallel swarms) — Parallel swarms wait for all agents to complete before aggregating results
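The "Swarm Coordination Wait" delay is the classic gather-on-the-slowest pattern. A minimal asyncio illustration with stub agents (not swarms' real scheduler):

```python
# A parallel swarm is gated on its slowest member: gather() returns only
# after every task finishes. Stub agents simulate work with sleep.
import asyncio
import time

async def agent(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for an LLM call
    return f"{name} done"

async def parallel_swarm():
    start = time.monotonic()
    results = await asyncio.gather(
        agent("fast", 0.01), agent("medium", 0.05), agent("slow", 0.1)
    )
    return results, time.monotonic() - start

results, elapsed = asyncio.run(parallel_swarm())
print(elapsed >= 0.1)  # → True: total wait tracks the slowest agent
```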
Control Points
- max_loops (threshold) — Controls: Number of execution iterations per task, affects agent persistence and reasoning depth. Default: configurable, typically 1-10
- dynamic_temperature_enabled (feature-flag) — Controls: Whether agents adjust LLM temperature dynamically during execution for optimization. Default: boolean flag
- queue_enabled (architecture-switch) — Controls: Enables queue-based task processing vs direct execution in AOP servers. Default: boolean, enables distributed processing
- model_name (runtime-toggle) — Controls: Which LLM model to use (GPT-4, Claude, etc.), affects agent capabilities and costs. Default: string, e.g. 'gpt-4', 'anthropic/claude-sonnet-4-5'
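The four control points can be collected into one configuration sketch. The keys match the names documented above; the validation logic is illustrative, not the framework's.

```python
# The four documented control points as a single configuration dict,
# plus a hypothetical validator enforcing the documented ranges.
agent_config = {
    "max_loops": 3,                        # threshold: reasoning depth per task
    "dynamic_temperature_enabled": True,   # feature flag
    "queue_enabled": False,                # architecture switch (AOP servers)
    "model_name": "gpt-4",                 # runtime toggle for the LLM backend
}

def validate(config: dict) -> list:
    errors = []
    if not 1 <= config["max_loops"] <= 10:  # "typically 1-10" per the docs
        errors.append("max_loops outside typical 1-10 range")
    if not config["model_name"].strip():
        errors.append("model_name must be non-empty")
    return errors

print(validate(agent_config))  # → []
```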
Technology Stack
- litellm — Unified LLM API client supporting OpenAI, Anthropic, and other providers with a consistent interface
- pydantic — Data validation and serialization for agent configurations, message schemas, and API contracts
- rich — Terminal formatting and progress display in the CLI with colored output and status indicators
- asyncio — Asynchronous execution for concurrent agent operations and network communication
- httpx — HTTP client for API calls and AOP network communication with async support
- networkx — Graph-based agent routing and dependency management in complex swarm topologies
- Retry logic and resilience patterns for LLM API calls and network operations
- Model Context Protocol (MCP) implementation for standardized agent communication and tool calling
Key Components
- Agent (executor, swarms/structs/agent.py) — Core agent runtime that executes tasks using LLMs, maintains conversation state, handles tool calling, and manages execution loops with retries and dynamic temperature adjustment
- AOP (orchestrator, swarms/structs/aop.py) — Agent-over-Protocol server that exposes agents as network services via HTTP and MCP, handling task queuing, load balancing, and distributed agent execution with automatic retry and monitoring
- HeavySwarm (orchestrator, swarms/structs/heavy_swarm.py) — Parallel multi-agent executor that runs multiple agents concurrently on the same task, aggregates results, and can apply consensus mechanisms or result-selection strategies
- LLMCouncil (orchestrator, swarms/structs/llm_council.py) — Sequential agent chain that passes tasks through multiple agents in order, where each agent can build on the previous agent's work to create multi-step reasoning workflows
- SwarmCLI (dispatcher, swarms/cli/main.py) — Command-line interface that handles agent creation, swarm configuration generation, YAML-based agent loading, and system management, with rich formatting and progress feedback
- TelemetryCollector (monitor, swarms/telemetry/main.py) — System telemetry collection that gathers agent execution metrics, system performance data, and usage analytics for monitoring and optimizing agent workflows
Frequently Asked Questions
What is swarms used for?
kyegomez/swarms orchestrates multi-agent swarms with enterprise infrastructure for production AI workflows. It is a 6-component ML inference framework written in Python; data flows through 4 distinct pipeline stages, and the codebase contains 833 files.
How is swarms architected?
swarms is organized into 3 architecture layers: Agent Layer, Swarm Orchestration, Infrastructure Services. Data flows through 4 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through swarms?
Data moves through 4 stages: Task Input Processing → Agent Task Execution → Multi-Agent Coordination → Response Generation. Tasks enter through the CLI, API, or direct Agent instantiation. Agents process tasks through LLM calls, maintaining conversation history and executing tools as needed. Multi-agent workflows coordinate through swarm structures that handle task routing, result aggregation, and consensus mechanisms. All execution is monitored via telemetry collection. This pipeline design keeps the data transformation process straightforward.
What technologies does swarms use?
The core stack includes litellm (Unified LLM API client supporting OpenAI, Anthropic, and other providers with consistent interface), pydantic (Data validation and serialization for agent configurations, message schemas, and API contracts), rich (Terminal formatting and progress display in CLI with colored output and status indicators), asyncio (Asynchronous execution for concurrent agent operations and network communication), httpx (HTTP client for API calls and AOP network communication with async support), networkx (Graph-based agent routing and dependency management in complex swarm topologies), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does swarms have?
swarms exhibits 3 data pools (including Agent Memory and the AOP Task Queue), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle retry and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does swarms use?
4 design patterns detected: Agent-over-Protocol, Dynamic Agent Composition, Enterprise Telemetry, Tool Integration Framework.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.