foundationagents/metagpt
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Orchestrates multi-agent teams of LLM-powered roles to collaborate on software development projects
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 10-component ML inference system. 913 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
A user requirement enters through Team.run_project(), gets processed by specialized roles (ProductManager creates PRD, Architect designs system, Engineer writes code) where each role executes actions that call LLMs through providers, store outputs in memory/documents, and send messages to trigger downstream roles. The cycle continues until all deliverables are complete.
- Requirement Intake — Team.run_project() receives user requirement string and publishes UserRequirement message to all hired roles via Environment.publish_message() (config: llm.model, llm.api_key)
- Role Activation — Each Role checks if incoming Message.cause_by matches their _watch set, adds matching messages to memory via Memory.add(), and queues appropriate Actions in _rc.todo [Message → RoleContext]
- Action Execution — Role._act() pops Action from todo queue, calls Action.run() which formats prompts using PROMPT_TEMPLATE and calls LLMProvider.aask() or LLMProvider.acompletion() [RoleContext → ActionOutput] (config: llm.model, llm.base_url, llm.api_type)
- Output Processing — Action.run() processes LLM response, structures it using Pydantic models if defined, stores in ProjectRepo or DocumentStore, and returns ActionOutput [ActionOutput → Document]
- Message Broadcasting — Role._react() converts ActionOutput to Message, publishes via Environment.publish_message() to roles watching for this message type based on Message.cause_by [ActionOutput → Message]
- Team Coordination — Team.run() orchestrates n_round iterations, checking if all roles have completed their actions via Role.get_memories(), and manages team-wide termination conditions [Message]
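Taken together, these stages correspond to a short driver script. The sketch below follows the shape of MetaGPT's documented quickstart; the exact role lineup, idea string, and n_round value are illustrative rather than prescribed by the repo:

```python
import asyncio

from metagpt.roles import Architect, Engineer, ProductManager
from metagpt.team import Team

async def main():
    company = Team()
    company.hire([ProductManager(), Architect(), Engineer()])
    company.invest(investment=3.0)                 # budget threshold for LLM spend
    company.run_project("write a cli snake game")  # stage 1: publish UserRequirement
    await company.run(n_round=5)                   # stages 2-6 loop for up to 5 rounds

asyncio.run(main())
```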
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- Message (metagpt/schema.py) — Pydantic model with content: str, role: str (user/assistant/system), cause_by: Action class, send_to: Role or str, restricted_to: str, tag: str, and metadata fields. Created when agents communicate, routed through the team message bus, stored in agent memory, consumed by receiving agents.
- ActionOutput (metagpt/actions/__init__.py) — Generic container with content: Any (text, code, or structured data) and instruct_content: BaseModel for structured outputs. Generated by Action.run(), consumed by Role._act(), convertible to Message for inter-agent communication.
- Context (metagpt/context.py) — Container with config: Config object, git_repo: GitRepository, src_workspace: Path, project_repo: ProjectRepo. Created during team setup, shared across all agents in a team, provides access to workspace and configuration.
- Document metadata (metagpt/document.py) — Pydantic model with name: str, n_docs: int, n_chars: int, symbols: list for tracking document metadata and content statistics. Created by document-generating actions, stored in the document store, retrieved for context in subsequent actions.
- Operator node (metagpt/ext/aflow/scripts/operator_an.py) — Workflow graph node with id: str, operation: str, inputs: list[str], outputs: list[str] representing atomic operations in AFlow. Constructed during workflow generation, optimized through genetic algorithms, executed in dependency order.
- RoleContext (metagpt/roles/role.py) — Dict containing role_id: str, watch: set[Action types], news: list[Message], memory: Memory, todo: deque[Action]. Maintained throughout the role's lifespan, updated on each message, drives the role's action decisions and execution.
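To make the first of these contracts concrete, here is a minimal Pydantic sketch of the message shape attributed to metagpt/schema.py above. The field names and types follow the listing; the defaults and the example at the bottom are assumptions, not the real class:

```python
from pydantic import BaseModel

class Message(BaseModel):
    """Sketch of the inter-agent message contract (simplified, not the real class)."""
    content: str              # payload: text, code, or a serialized document
    role: str = "user"        # user / assistant / system
    cause_by: str = ""        # Action class that produced this message, used for routing
    send_to: str = ""         # target role; empty is treated here as broadcast
    restricted_to: str = ""   # visibility restriction
    tag: str = ""             # free-form metadata tag

msg = Message(content="Build a CLI snake game", cause_by="UserRequirement")
```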
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Assumes a Minecraft server is running on localhost with admin privileges allowing /clear, /kill, /give, and /item commands without authentication or permission checks
If this fails: Bot fails to execute setup commands if server lacks required permissions or plugins, causing silent initialization failure with equipment/inventory not matching expected state
metagpt/environment/minecraft/mineflayer/index.js:bot.chat()
Assumes LLM API has sufficient rate limits and quota to handle team.run(n_round=5) with multiple roles making concurrent API calls without hitting budget or rate limits
If this fails: Team execution silently fails or produces partial results when API quota exhausted, leaving some roles unable to complete their actions while others succeed
metagpt/team.py:company.invest()
Assumes team.run_project() completes synchronously before await company.run(n_round=5) begins, but run_project only publishes the initial message without waiting for processing to complete
If this fails: Race condition where roles may not have processed the initial requirement message when the n_round execution loop starts, causing first round to execute with empty todo queues
examples/ui_with_chainlit/app.py:company.run_project()
Assumes equipment array has exactly 6 elements corresponding to [head, chest, legs, feet, mainhand, offhand] armor slots, but skips index 4 (mainhand) without validation
If this fails: Array index out of bounds or incorrect equipment assignment if client sends equipment array with different length or ordering than expected
metagpt/environment/minecraft/mineflayer/index.js:equipment array
Assumes genetic algorithm population size and generation limits are sufficient for dataset complexity, with hardcoded operators list per experiment type
If this fails: Optimization may converge to suboptimal solutions for complex datasets or fail to explore solution space adequately if population/generation limits too low
examples/aflow/optimize.py:Optimizer
Assumes LLM response contains valid Python code wrapped in ```python ``` markdown blocks that can be extracted and executed without syntax validation
If this fails: Generated agent code may contain syntax errors, security vulnerabilities, or malformed class definitions that cause runtime failures when instantiated
examples/agent_creator.py:CreateAgent.run()
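A cheap mitigation for this assumption is to syntax-check the extracted block before doing anything with it. A hedged sketch — the regex and helper name are illustrative, not code from the repo:

```python
import ast
import re

def extract_python_block(response: str) -> str:
    """Pull the first fenced ```python block from an LLM response and validate its syntax."""
    match = re.search(r"```python\n(.*?)```", response, re.DOTALL)
    if match is None:
        raise ValueError("no ```python``` block found in LLM response")
    code = match.group(1)
    ast.parse(code)  # raises SyntaxError here instead of failing later at exec time
    return code
```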
Assumes Android device has sufficient storage space in /sdcard/Pictures/Screenshots and /sdcard directories for continuous screenshot and XML file generation
If this fails: Assistant fails when device storage full, causing screenshot capture to fail and breaking the observation-action loop without graceful degradation
examples/android_assistant/run_assistant.py:AndroidEnv
Assumes pathfinder and tool plugins load successfully within the setTimeout delay, but uses an arbitrary 0ms timeout without checking load completion
If this fails: CollectBlock functionality may fail if dependent plugins haven't finished loading when bot tries to use pathfinder or tool capabilities
metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/index.ts:setTimeout
Assumes qa list contains dictionaries with 'question' and 'answer' keys, but converts values to strings and strips whitespace without validating the dictionary structure
If this fails: AttributeError when qa items are not dictionaries or missing expected keys, causing template save to fail and losing user's optimization configuration
metagpt/ext/spo/app.py:save_yaml_template()
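The missing structural check is easy to sketch; normalize_qa below is a hypothetical helper, not code from metagpt/ext/spo/app.py:

```python
def normalize_qa(qa: list) -> list[dict]:
    """Validate that each qa item is a dict with 'question' and 'answer' keys
    before converting to stripped strings, instead of assuming the shape."""
    cleaned = []
    for item in qa:
        if not isinstance(item, dict) or not {"question", "answer"} <= item.keys():
            raise ValueError(f"malformed qa item: {item!r}")
        cleaned.append({
            "question": str(item["question"]).strip(),
            "answer": str(item["answer"]).strip(),
        })
    return cleaned
```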
Assumes all files in company workspace are text-based project deliverables suitable for display, filtering only .git files but not binary files, images, or system files
If this fails: UI may attempt to display binary files or large media files as text, causing display corruption or memory issues in the Chainlit interface
examples/ui_with_chainlit/app.py:files iteration
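A sketch of the filter the iteration lacks; is_displayable is a hypothetical helper that rejects .git paths and non-text content, rather than only .git files:

```python
from pathlib import Path

def is_displayable(path: Path, probe_bytes: int = 4096) -> bool:
    """Return True only for files whose leading bytes decode as UTF-8 text.
    Note: a multi-byte character split at the probe boundary yields a false negative."""
    if ".git" in path.parts:
        return False
    try:
        path.read_bytes()[:probe_bytes].decode("utf-8")
    except (UnicodeDecodeError, OSError):
        return False
    return True
```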
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Agent Memory — Each agent maintains message history, experiences, and learned knowledge with similarity search capabilities
- Document Store — Vector database storing generated documents, code, and knowledge with embedding-based retrieval
- Project Workspace — File system workspace where generated code, documentation, and project artifacts are stored and versioned
- Message Queue — The Environment maintains a message queue for inter-agent communication with role-based routing
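To make the first pool concrete, here is a small usage sketch of the agent-memory contract. Method names follow the Memory component listed under Key Components; the exact semantics of k are an assumption:

```python
from metagpt.memory import Memory
from metagpt.schema import Message

memory = Memory()
memory.add(Message(content="PRD draft v1", role="assistant"))
memory.add(Message(content="Architecture notes", role="assistant"))
recent = memory.get(k=1)  # most recent message(s); k=0 presumably returns everything
```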
Feedback Loops
- Multi-Round Team Execution (convergence, reinforcing) — Trigger: Team.run() starts n_round loop. Action: All roles process messages, execute actions, publish results, check completion criteria. Exit: n_round limit reached or all roles report completion.
- Role Action-Message Cycle (recursive, reinforcing) — Trigger: Role receives Message matching _watch criteria. Action: Role._act() executes action, outputs trigger new messages to other roles via Environment.publish_message(). Exit: No more messages match role's watch criteria or role._rc.todo is empty.
- AFlow Genetic Optimization (training-loop, reinforcing) — Trigger: Optimizer.optimize() called with initial population. Action: Evaluate workflow fitness, select parents, crossover/mutate to create new generation, test against validation data. Exit: Target fitness reached or max generations exceeded.
- LLM Retry with Backoff (retry, balancing) — Trigger: LLM API call fails with rate limit or temporary error. Action: Wait with exponential backoff, retry API call up to max_retry_times. Exit: Successful response or max retries exceeded.
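That last retry loop is the kind of thing tenacity expresses in a few lines. A minimal sketch, assuming a provider object with an async aask method; the decorator parameters stand in for max_retry_times and are not the repo's actual values:

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(wait=wait_exponential(multiplier=1, min=1, max=60),
       stop=stop_after_attempt(6))
async def aask_with_backoff(llm, prompt: str) -> str:
    # Waits ~1s, 2s, 4s, ... between attempts; re-raises after the sixth failure.
    return await llm.aask(prompt)
```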
Delays
- LLM API Latency (async-processing, ~1-30 seconds per call) — Each action execution waits for LLM response before role can proceed to next action
- Vector Embedding Generation (batch-window, ~100ms-2s per document) — Document storage waits for embedding computation before enabling similarity search
- Team Round Synchronization (eventual-consistency, ~variable) — Team waits for all active roles to complete their actions before starting next round
- File System Operations (batch-window, ~10-100ms) — Code generation waits for file writes to complete workspace updates
Control Points
- LLM Model Selection (architecture-switch) — Controls: Which LLM backend powers all agent reasoning and text generation. Default: gpt-4-turbo
- Team Investment Budget (threshold) — Controls: Maximum cost allowed for LLM API calls during team execution. Default: 3.0
- Role React Mode (runtime-toggle) — Controls: Whether roles react to all messages (RoleReactMode.REACT) or only to specific triggers (RoleReactMode.BY_ORDER). Default: RoleReactMode.REACT
- Memory Search Type (feature-flag) — Controls: Use similarity search vs chronological retrieval for agent memory lookups. Default: similarity
- Document Store Backend (architecture-switch) — Controls: Vector database implementation (ChromaDB, Qdrant, Milvus) for document storage and retrieval. Default: ChromaDB
- AFlow Population Size (hyperparameter) — Controls: Number of candidate workflows in each generation during genetic optimization. Default: 50
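Several of these control points surface as configuration keys; the pipeline stages above cite llm.model, llm.api_type, llm.base_url, and llm.api_key. A hedged sketch of those settings as a plain Python dict, with values taken from the defaults listed here (the real config file layout may differ):

```python
config = {
    "llm": {
        "api_type": "openai",                     # architecture-switch: LLM backend
        "model": "gpt-4-turbo",                   # default per LLM Model Selection
        "base_url": "https://api.openai.com/v1",  # provider endpoint
        "api_key": "sk-...",                      # credential; keep out of version control
    },
}
```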
Technology Stack
- Pydantic — Provides data validation and structured outputs for agent messages and action results
- asyncio — Enables concurrent execution of multiple agents and asynchronous LLM API calls
- ChromaDB/Qdrant/Milvus — Vector databases for storing and retrieving documents with semantic similarity search
- OpenAI/Anthropic and other LLM APIs — LLM providers that power agent reasoning, text generation, and structured output parsing
- GitPython — Git operations for managing code repositories and version control in generated projects
- Chainlit/Streamlit — Web UI frameworks for interactive agent interfaces and real-time conversation displays
- Playwright/Selenium-style web automation — Tools enabling agents to interact with web browsers and scrape content
- Mineflayer — JavaScript Minecraft bot framework for creating agents that can interact with Minecraft servers
Key Components
- Team (orchestrator) — Orchestrates multiple agent roles, manages their interactions through a shared environment and message bus, handles team-wide execution loops (metagpt/team.py)
- Role (processor) — Base class for all agent types; manages action execution, message handling, memory, and role-specific behaviors through configurable SOPs (metagpt/roles/role.py)
- Action (executor) — Encapsulates atomic agent capabilities; formats prompts, calls LLMs, processes responses, and produces structured outputs (metagpt/actions/__init__.py)
- Environment (adapter) — Provides execution context and external system interfaces; handles message routing, maintains shared state, bridges to external tools (metagpt/environment/base_env.py)
- LLMProvider (adapter) — Abstracts different LLM APIs (OpenAI, Claude, etc.), providing a unified interface for completion, chat, and embedding calls with retry logic (metagpt/provider/base_llm.py)
- Memory (store) — Manages an agent's persistent knowledge and message history with search, storage, and retrieval capabilities, including similarity search (metagpt/memory/memory.py)
- ToolRegistry (registry) — Central registry for external tools; manages tool discovery, instantiation, and execution for agent-tool interactions (metagpt/tools/tool_registry.py)
- Optimizer (optimizer) — AFlow workflow optimization engine using genetic algorithms to evolve and improve multi-agent workflow structures (metagpt/ext/aflow/scripts/optimizer.py)
- DocumentStore (store) — Persists and retrieves documents with vector search capabilities; supports multiple backends like Chroma, Qdrant, Milvus (metagpt/document_store/base_store.py)
- ProjectRepo (store) — Manages code project structure, file operations, and workspace organization during software development workflows (metagpt/utils/project_repo.py)
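These components compose through one extension pattern: subclass Action for a capability, then subclass Role to own it. The sketch below follows MetaGPT's documented custom-agent recipe; the WriteTests/QaBot names, prompt, and profile string are illustrative:

```python
from metagpt.actions import Action
from metagpt.roles import Role

class WriteTests(Action):
    """Illustrative Action: turn source code into pytest tests via one LLM call."""
    PROMPT_TEMPLATE: str = "Write pytest tests for the following code:\n{code}"

    async def run(self, code: str) -> str:
        return await self._aask(self.PROMPT_TEMPLATE.format(code=code))

class QaBot(Role):
    """Illustrative Role that owns the WriteTests capability."""
    name: str = "QaBot"
    profile: str = "QA Engineer"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_actions([WriteTests])  # queued into the role's todo when triggered
```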
Frequently Asked Questions
What is MetaGPT used for?
MetaGPT orchestrates multi-agent teams of LLM-powered roles to collaborate on software development projects. foundationagents/metagpt is a 10-component ML inference system written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 913 files.
How is MetaGPT architected?
MetaGPT is organized into 4 architecture layers: Framework Core, Specialized Roles, Environment & Tools, Configuration & Infrastructure. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through MetaGPT?
Data moves through 6 stages: Requirement Intake → Role Activation → Action Execution → Output Processing → Message Broadcasting → Team Coordination. A user requirement enters through Team.run_project(), gets processed by specialized roles (ProductManager creates PRD, Architect designs system, Engineer writes code) where each role executes actions that call LLMs through providers, store outputs in memory/documents, and send messages to trigger downstream roles. The cycle continues until all deliverables are complete. This pipeline design reflects a complex multi-stage processing system.
What technologies does MetaGPT use?
The core stack includes Pydantic (data validation and structured outputs for agent messages and action results), asyncio (concurrent execution of multiple agents and asynchronous LLM API calls), ChromaDB/Qdrant/Milvus (vector databases for storing and retrieving documents with semantic similarity search), OpenAI/Anthropic and other LLM APIs (agent reasoning, text generation, and structured output parsing), GitPython (Git operations and version control in generated projects), Chainlit/Streamlit (interactive agent interfaces and real-time conversation displays), plus web automation tooling and the Mineflayer Minecraft bot framework. A focused set of dependencies that keeps the build manageable.
What system dynamics does MetaGPT have?
MetaGPT exhibits 4 data pools (Agent Memory, Document Store, Project Workspace, Message Queue), 4 feedback loops, 6 control points, and 4 delays. The feedback loops span convergence, recursive, training-loop, and retry patterns. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does MetaGPT use?
5 design patterns detected: Agent-Action Composition, Message-Driven Architecture, Provider Pattern for LLMs, SOP (Standard Operating Procedure), Context Injection.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.