foundationagents/metagpt
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Orchestrates multi-agent teams of LLM-powered roles to collaborate on software development projects
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 10-component ML inference system. 913 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
A user requirement enters through Team.run_project(), gets processed by specialized roles (ProductManager creates PRD, Architect designs system, Engineer writes code) where each role executes actions that call LLMs through providers, store outputs in memory/documents, and send messages to trigger downstream roles. The cycle continues until all deliverables are complete.
- Requirement Intake — Team.run_project() receives user requirement string and publishes UserRequirement message to all hired roles via Environment.publish_message() (config: llm.model, llm.api_key)
- Role Activation — Each Role checks if incoming Message.cause_by matches their _watch set, adds matching messages to memory via Memory.add(), and queues appropriate Actions in _rc.todo [Message → RoleContext]
- Action Execution — Role._act() pops Action from todo queue, calls Action.run() which formats prompts using PROMPT_TEMPLATE and calls LLMProvider.aask() or LLMProvider.acompletion() [RoleContext → ActionOutput] (config: llm.model, llm.base_url, llm.api_type)
- Output Processing — Action.run() processes LLM response, structures it using Pydantic models if defined, stores in ProjectRepo or DocumentStore, and returns ActionOutput [ActionOutput → Document]
- Message Broadcasting — Role._react() converts ActionOutput to Message, publishes via Environment.publish_message() to roles watching for this message type based on Message.cause_by [ActionOutput → Message]
- Team Coordination — Team.run() orchestrates n_round iterations, checking if all roles have completed their actions via Role.get_memories(), and manages team-wide termination conditions [Message]
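Taken together, these stages correspond to a short driver script. The sketch below follows the shape of MetaGPT's documented quickstart; the exact role lineup, idea string, and n_round value are illustrative rather than prescribed by the repo:

```python
import asyncio

from metagpt.roles import Architect, Engineer, ProductManager
from metagpt.team import Team

async def main():
    company = Team()
    company.hire([ProductManager(), Architect(), Engineer()])
    company.invest(investment=3.0)                 # budget threshold for LLM spend
    company.run_project("write a cli snake game")  # stage 1: publish UserRequirement
    await company.run(n_round=5)                   # stages 2-6 loop for up to 5 rounds

asyncio.run(main())
```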
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- Message (metagpt/schema.py) — Pydantic model with content: str, role: str (user/assistant/system), cause_by: Action class, send_to: Role or str, restricted_to: str, tag: str, and metadata fields. Created when agents communicate, routed through the team message bus, stored in agent memory, consumed by receiving agents.
- ActionOutput (metagpt/actions/__init__.py) — Generic container with content: Any (text, code, or structured data) and instruct_content: BaseModel for structured outputs. Generated by Action.run(), consumed by Role._act(), convertible to Message for inter-agent communication.
- Context (metagpt/context.py) — Container with config: Config object, git_repo: GitRepository, src_workspace: Path, project_repo: ProjectRepo. Created during team setup, shared across all agents in a team, provides access to workspace and configuration.
- Document metadata (metagpt/document.py) — Pydantic model with name: str, n_docs: int, n_chars: int, symbols: list for tracking document metadata and content statistics. Created by document-generating actions, stored in the document store, retrieved for context in subsequent actions.
- Operator node (metagpt/ext/aflow/scripts/operator_an.py) — Workflow graph node with id: str, operation: str, inputs: list[str], outputs: list[str] representing atomic operations in AFlow. Constructed during workflow generation, optimized through genetic algorithms, executed in dependency order.
- RoleContext (metagpt/roles/role.py) — Dict containing role_id: str, watch: set[Action types], news: list[Message], memory: Memory, todo: deque[Action]. Maintained throughout the role's lifespan, updated on each message, drives the role's action decisions and execution.
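To make the first of these contracts concrete, here is a minimal Pydantic sketch of the message shape attributed to metagpt/schema.py above. The field names and types follow the listing; the defaults and the example at the bottom are assumptions, not the real class:

```python
from pydantic import BaseModel

class Message(BaseModel):
    """Sketch of the inter-agent message contract (simplified, not the real class)."""
    content: str              # payload: text, code, or a serialized document
    role: str = "user"        # user / assistant / system
    cause_by: str = ""        # Action class that produced this message, used for routing
    send_to: str = ""         # target role; empty is treated here as broadcast
    restricted_to: str = ""   # visibility restriction
    tag: str = ""             # free-form metadata tag

msg = Message(content="Build a CLI snake game", cause_by="UserRequirement")
```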
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Assumes a Minecraft server is running on localhost with admin privileges allowing /clear, /kill, /give, and /item commands without authentication or permission checks
If this fails: Bot fails to execute setup commands if server lacks required permissions or plugins, causing silent initialization failure with equipment/inventory not matching expected state
metagpt/environment/minecraft/mineflayer/index.js:bot.chat()
Assumes LLM API has sufficient rate limits and quota to handle team.run(n_round=5) with multiple roles making concurrent API calls without hitting budget or rate limits
If this fails: Team execution silently fails or produces partial results when API quota exhausted, leaving some roles unable to complete their actions while others succeed
metagpt/team.py:company.invest()
Assumes team.run_project() completes synchronously before await company.run(n_round=5) begins, but run_project only publishes the initial message without waiting for processing to complete
If this fails: Race condition where roles may not have processed the initial requirement message when the n_round execution loop starts, causing first round to execute with empty todo queues
examples/ui_with_chainlit/app.py:company.run_project()
Assumes equipment array has exactly 6 elements corresponding to [head, chest, legs, feet, mainhand, offhand] armor slots, but skips index 4 (mainhand) without validation
If this fails: Array index out of bounds or incorrect equipment assignment if client sends equipment array with different length or ordering than expected
metagpt/environment/minecraft/mineflayer/index.js:equipment array
Assumes genetic algorithm population size and generation limits are sufficient for dataset complexity, with hardcoded operators list per experiment type
If this fails: Optimization may converge to suboptimal solutions for complex datasets or fail to explore solution space adequately if population/generation limits too low
examples/aflow/optimize.py:Optimizer
Assumes LLM response contains valid Python code wrapped in ```python ``` markdown blocks that can be extracted and executed without syntax validation
If this fails: Generated agent code may contain syntax errors, security vulnerabilities, or malformed class definitions that cause runtime failures when instantiated
examples/agent_creator.py:CreateAgent.run()
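A cheap mitigation for this assumption is to syntax-check the extracted block before doing anything with it. A hedged sketch — the regex and helper name are illustrative, not code from the repo:

```python
import ast
import re

def extract_python_block(response: str) -> str:
    """Pull the first fenced ```python block from an LLM response and validate its syntax."""
    match = re.search(r"```python\n(.*?)```", response, re.DOTALL)
    if match is None:
        raise ValueError("no ```python``` block found in LLM response")
    code = match.group(1)
    ast.parse(code)  # raises SyntaxError here instead of failing later at exec time
    return code
```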
Assumes Android device has sufficient storage space in /sdcard/Pictures/Screenshots and /sdcard directories for continuous screenshot and XML file generation
If this fails: Assistant fails when device storage full, causing screenshot capture to fail and breaking the observation-action loop without graceful degradation
examples/android_assistant/run_assistant.py:AndroidEnv
Assumes pathfinder and tool plugins load successfully within the setTimeout delay, but uses an arbitrary 0ms timeout without checking load completion
If this fails: CollectBlock functionality may fail if dependent plugins haven't finished loading when bot tries to use pathfinder or tool capabilities
metagpt/environment/minecraft/mineflayer/mineflayer-collectblock/src/index.ts:setTimeout
Assumes qa list contains dictionaries with 'question' and 'answer' keys, but converts values to strings and strips whitespace without validating the dictionary structure
If this fails: AttributeError when qa items are not dictionaries or missing expected keys, causing template save to fail and losing user's optimization configuration
metagpt/ext/spo/app.py:save_yaml_template()
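The missing structural check is easy to sketch; normalize_qa below is a hypothetical helper, not code from metagpt/ext/spo/app.py:

```python
def normalize_qa(qa: list) -> list[dict]:
    """Validate that each qa item is a dict with 'question' and 'answer' keys
    before converting to stripped strings, instead of assuming the shape."""
    cleaned = []
    for item in qa:
        if not isinstance(item, dict) or not {"question", "answer"} <= item.keys():
            raise ValueError(f"malformed qa item: {item!r}")
        cleaned.append({
            "question": str(item["question"]).strip(),
            "answer": str(item["answer"]).strip(),
        })
    return cleaned
```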
Assumes all files in company workspace are text-based project deliverables suitable for display, filtering only .git files but not binary files, images, or system files
If this fails: UI may attempt to display binary files or large media files as text, causing display corruption or memory issues in the Chainlit interface
examples/ui_with_chainlit/app.py:files iteration
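A sketch of the filter the iteration lacks; is_displayable is a hypothetical helper that rejects .git paths and non-text content, rather than only .git files:

```python
from pathlib import Path

def is_displayable(path: Path, probe_bytes: int = 4096) -> bool:
    """Return True only for files whose leading bytes decode as UTF-8 text.
    Note: a multi-byte character split at the probe boundary yields a false negative."""
    if ".git" in path.parts:
        return False
    try:
        path.read_bytes()[:probe_bytes].decode("utf-8")
    except (UnicodeDecodeError, OSError):
        return False
    return True
```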
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Agent Memory — Each agent maintains message history, experiences, and learned knowledge with similarity search capabilities
- Document Store — Vector database storing generated documents, code, and knowledge with embedding-based retrieval
- Project Workspace — File system workspace where generated code, documentation, and project artifacts are stored and versioned
- Message Queue — The Environment maintains a message queue for inter-agent communication with role-based routing
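To make the first pool concrete, here is a small usage sketch of the agent-memory contract. Method names follow the Memory component listed under Key Components; the exact semantics of k are an assumption:

```python
from metagpt.memory import Memory
from metagpt.schema import Message

memory = Memory()
memory.add(Message(content="PRD draft v1", role="assistant"))
memory.add(Message(content="Architecture notes", role="assistant"))
recent = memory.get(k=1)  # most recent message(s); k=0 presumably returns everything
```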
Feedback Loops
- Multi-Round Team Execution (convergence, reinforcing) — Trigger: Team.run() starts n_round loop. Action: All roles process messages, execute actions, publish results, check completion criteria. Exit: n_round limit reached or all roles report completion.
- Role Action-Message Cycle (recursive, reinforcing) — Trigger: Role receives Message matching _watch criteria. Action: Role._act() executes action, outputs trigger new messages to other roles via Environment.publish_message(). Exit: No more messages match role's watch criteria or role._rc.todo is empty.
- AFlow Genetic Optimization (training-loop, reinforcing) — Trigger: Optimizer.optimize() called with initial population. Action: Evaluate workflow fitness, select parents, crossover/mutate to create new generation, test against validation data. Exit: Target fitness reached or max generations exceeded.
- LLM Retry with Backoff (retry, balancing) — Trigger: LLM API call fails with rate limit or temporary error. Action: Wait with exponential backoff, retry API call up to max_retry_times. Exit: Successful response or max retries exceeded.
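That last retry loop is the kind of thing tenacity expresses in a few lines. A minimal sketch, assuming a provider object with an async aask method; the decorator parameters stand in for max_retry_times and are not the repo's actual values:

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(wait=wait_exponential(multiplier=1, min=1, max=60),
       stop=stop_after_attempt(6))
async def aask_with_backoff(llm, prompt: str) -> str:
    # Waits ~1s, 2s, 4s, ... between attempts; re-raises after the sixth failure.
    return await llm.aask(prompt)
```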
Delays
- LLM API Latency (async-processing, ~1-30 seconds per call) — Each action execution waits for LLM response before role can proceed to next action
- Vector Embedding Generation (batch-window, ~100ms-2s per document) — Document storage waits for embedding computation before enabling similarity search
- Team Round Synchronization (eventual-consistency, ~variable) — Team waits for all active roles to complete their actions before starting next round
- File System Operations (batch-window, ~10-100ms) — Code generation waits for file writes to complete workspace updates
Control Points
- LLM Model Selection (architecture-switch) — Controls: Which LLM backend powers all agent reasoning and text generation. Default: gpt-4-turbo
- Team Investment Budget (threshold) — Controls: Maximum cost allowed for LLM API calls during team execution. Default: 3.0
- Role React Mode (runtime-toggle) — Controls: Whether roles react to all messages (RoleReactMode.REACT) or only to specific triggers (RoleReactMode.BY_ORDER). Default: RoleReactMode.REACT
- Memory Search Type (feature-flag) — Controls: Use similarity search vs chronological retrieval for agent memory lookups. Default: similarity
- Document Store Backend (architecture-switch) — Controls: Vector database implementation (ChromaDB, Qdrant, Milvus) for document storage and retrieval. Default: ChromaDB
- AFlow Population Size (hyperparameter) — Controls: Number of candidate workflows in each generation during genetic optimization. Default: 50
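Several of these control points surface as configuration keys; the pipeline stages above cite llm.model, llm.api_type, llm.base_url, and llm.api_key. A hedged sketch of those settings as a plain Python dict, with values taken from the defaults listed here (the real config file layout may differ):

```python
config = {
    "llm": {
        "api_type": "openai",                     # architecture-switch: LLM backend
        "model": "gpt-4-turbo",                   # default per LLM Model Selection
        "base_url": "https://api.openai.com/v1",  # provider endpoint
        "api_key": "sk-...",                      # credential; keep out of version control
    },
}
```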
Technology Stack
- Pydantic — Provides data validation and structured outputs for agent messages and action results
- asyncio — Enables concurrent execution of multiple agents and asynchronous LLM API calls
- ChromaDB/Qdrant/Milvus — Vector databases for storing and retrieving documents with semantic similarity search
- OpenAI/Anthropic and other LLM APIs — LLM providers that power agent reasoning, text generation, and structured output parsing
- GitPython — Git operations for managing code repositories and version control in generated projects
- Chainlit/Streamlit — Web UI frameworks for interactive agent interfaces and real-time conversation displays
- Playwright/Selenium-style web automation — Tools enabling agents to interact with web browsers and scrape content
- Mineflayer — JavaScript Minecraft bot framework for creating agents that can interact with Minecraft servers
Key Components
- Team (orchestrator) — Orchestrates multiple agent roles, manages their interactions through a shared environment and message bus, handles team-wide execution loops (metagpt/team.py)
- Role (processor) — Base class for all agent types; manages action execution, message handling, memory, and role-specific behaviors through configurable SOPs (metagpt/roles/role.py)
- Action (executor) — Encapsulates atomic agent capabilities; formats prompts, calls LLMs, processes responses, and produces structured outputs (metagpt/actions/__init__.py)
- Environment (adapter) — Provides execution context and external system interfaces; handles message routing, maintains shared state, bridges to external tools (metagpt/environment/base_env.py)
- LLMProvider (adapter) — Abstracts different LLM APIs (OpenAI, Claude, etc.), providing a unified interface for completion, chat, and embedding calls with retry logic (metagpt/provider/base_llm.py)
- Memory (store) — Manages an agent's persistent knowledge and message history with search, storage, and retrieval capabilities, including similarity search (metagpt/memory/memory.py)
- ToolRegistry (registry) — Central registry for external tools; manages tool discovery, instantiation, and execution for agent-tool interactions (metagpt/tools/tool_registry.py)
- Optimizer (optimizer) — AFlow workflow optimization engine using genetic algorithms to evolve and improve multi-agent workflow structures (metagpt/ext/aflow/scripts/optimizer.py)
- DocumentStore (store) — Persists and retrieves documents with vector search capabilities; supports multiple backends like Chroma, Qdrant, Milvus (metagpt/document_store/base_store.py)
- ProjectRepo (store) — Manages code project structure, file operations, and workspace organization during software development workflows (metagpt/utils/project_repo.py)
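These components compose through one extension pattern: subclass Action for a capability, then subclass Role to own it. The sketch below follows MetaGPT's documented custom-agent recipe; the WriteTests/QaBot names, prompt, and profile string are illustrative:

```python
from metagpt.actions import Action
from metagpt.roles import Role

class WriteTests(Action):
    """Illustrative Action: turn source code into pytest tests via one LLM call."""
    PROMPT_TEMPLATE: str = "Write pytest tests for the following code:\n{code}"

    async def run(self, code: str) -> str:
        return await self._aask(self.PROMPT_TEMPLATE.format(code=code))

class QaBot(Role):
    """Illustrative Role that owns the WriteTests capability."""
    name: str = "QaBot"
    profile: str = "QA Engineer"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_actions([WriteTests])  # queued into the role's todo when triggered
```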
Frequently Asked Questions
What is MetaGPT used for?
MetaGPT orchestrates multi-agent teams of LLM-powered roles to collaborate on software development projects. foundationagents/metagpt is a 10-component ML inference system written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 913 files.
How is MetaGPT architected?
MetaGPT is organized into 4 architecture layers: Framework Core, Specialized Roles, Environment & Tools, Configuration & Infrastructure. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through MetaGPT?
Data moves through 6 stages: Requirement Intake → Role Activation → Action Execution → Output Processing → Message Broadcasting → Team Coordination. A user requirement enters through Team.run_project(), gets processed by specialized roles (ProductManager creates PRD, Architect designs system, Engineer writes code) where each role executes actions that call LLMs through providers, store outputs in memory/documents, and send messages to trigger downstream roles. The cycle continues until all deliverables are complete. This pipeline design reflects a complex multi-stage processing system.
What technologies does MetaGPT use?
The core stack includes Pydantic (data validation and structured outputs for agent messages and action results), asyncio (concurrent execution of multiple agents and asynchronous LLM API calls), ChromaDB/Qdrant/Milvus (vector databases for storing and retrieving documents with semantic similarity search), OpenAI/Anthropic and other LLM APIs (agent reasoning, text generation, and structured output parsing), GitPython (Git operations and version control in generated projects), Chainlit/Streamlit (interactive agent interfaces and real-time conversation displays), plus web automation tooling and the Mineflayer Minecraft bot framework. A focused set of dependencies that keeps the build manageable.
What system dynamics does MetaGPT have?
MetaGPT exhibits 4 data pools (Agent Memory, Document Store, Project Workspace, Message Queue), 4 feedback loops, 6 control points, and 4 delays. The feedback loops span convergence, recursive, training-loop, and retry patterns. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does MetaGPT use?
5 design patterns detected: Agent-Action Composition, Message-Driven Architecture, Provider Pattern for LLMs, SOP (Standard Operating Procedure), Context Injection.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.