stanfordnlp/dspy
DSPy: The framework for programming—not prompting—language models
Programs language models with declarative Python code and auto-optimizes prompts
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 10-component ML inference framework. 244 files analyzed. Data flows through 7 distinct pipeline stages.
How Data Flows Through the System
Programs flow from signature definition through module execution to LM calls and back. Users define signatures specifying inputs/outputs, create modules like Predict or ChainOfThought that implement these signatures, then execute them with actual data. The adapter layer transforms signatures into LM-specific prompts (with field delimiters and formatting), sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions. A minimal runnable sketch follows the steps below.
- Define signature contract — User creates a Signature class with typed input/output fields and optional instructions — DSPy validates field types, generates descriptions, and creates the execution contract that modules must fulfill
- Create module instance — User instantiates a module (Predict, ChainOfThought, ReAct) with the signature — module validates compatibility and prepares for execution [Signature]
- Execute with input data — Module.__call__ receives input values, validates them against signature fields, and triggers the prediction pipeline [input values]
- Format prompt through adapter — ChatAdapter.format_prompt combines signature, input values, few-shot examples, and conversation history into LM messages — uses field headers like [[ ## question ## ]] to structure content and instructs the LM on output format [Signature → LMMessage]
- Call language model — BaseLM.generate sends formatted messages to the LM API (via LiteLLM), handles retries and caching, returns raw text response with usage metadata [LMMessage → LM response text]
- Parse structured response — Adapter.parse_response extracts field values from LM text using header patterns or JSON parsing, validates types against signature, handles errors with fallbacks [LM response text → Prediction]
- Return prediction result — Module returns Prediction object containing parsed field values, completion metadata, and conversation history — accessible via dot notation like result.answer [Prediction]
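To make the pipeline concrete, here is a minimal end-to-end sketch using the public DSPy API (the model name is illustrative; any LiteLLM-supported model string works):

```python
import dspy

# Configure the language model (model name is illustrative).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Step 1: define the signature contract with typed input/output fields.
class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# Step 2: create a module instance that implements the signature.
qa = dspy.ChainOfThought(AnswerQuestion)

# Steps 3-6: calling the module formats the prompt through the adapter,
# sends it to the LM, and parses the response into a structured Prediction.
result = qa(question="What does DSPy optimize?")

# Step 7: output fields are accessible via dot notation.
print(result.answer)
```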
Data Models
The data structures that flow between stages — the contracts that hold the system together.
dspy/signatures/signature.py: Pydantic model with input_fields: dict[str, FieldInfo], output_fields: dict[str, FieldInfo], instructions: str, and field type annotations
Defined by user, validated during class creation, transformed by adapters into LM prompts, used to parse responses
dspy/primitives/prediction.py: Dict-like object containing field_name: value pairs matching signature outputs, plus metadata like completions and history
Created by adapters from LM responses, returned by modules, used in optimization and evaluation
dspy/primitives/example.py: Dict-like object with input and output field values, plus metadata like reasoning traces
Created from training data or generated during optimization, used as demonstrations in prompts
dspy/clients/base_lm.py: Dict with role: 'user'|'assistant'|'system', content: str|list[dict], and optional tool_calls, citations
Generated by adapters, sent to LM APIs, stored in conversation history for multi-turn interactions
dspy/adapters/types/base_type.py: Pydantic BaseModel subclass with a format() -> list[dict[str, Any]] | str method for custom content types
Instantiated in signature fields, formatted by adapters into LM-compatible content structures
dspy/teleprompt/bootstrap.py: Object with predictor: Module, examples: list[Example], optim_config: dict tracking optimization progress
Created at optimization start, updated during trace generation, used to track learned examples and predictor state
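A brief illustration of the Example container (the values are made up):

```python
import dspy

# An Example holds input and output field values; with_inputs marks which
# fields are inputs, and the remaining fields are treated as labels.
ex = dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")

print(ex.question)  # field access via dot notation, like Prediction
print(ex.inputs())  # an Example containing only the input fields
print(ex.labels())  # an Example containing only the label fields
```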
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Language models will treat field headers like [[ ## field_name ## ]] as meaningful delimiters and not generate them as part of actual content
If this fails: If the LM includes these patterns in its response text, the parser will incorrectly split the content at those points, breaking field extraction and potentially losing data
dspy/adapters/chat_adapter.py:field_header_pattern
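A simplified illustration of why this matters: if the delimiter pattern appears inside generated content, splitting on it corrupts field extraction. The regex below mirrors the cited pattern, but this is a standalone sketch, not DSPy's actual parser.

```python
import re

# Header pattern of the form "[[ ## field_name ## ]]" (sketch, not DSPy's code).
field_header_pattern = re.compile(r"\[\[ ## (\w+) ## \]\]")

# A well-behaved response: one header, then the field content.
good = "[[ ## answer ## ]]\nParis is the capital of France."
print(field_header_pattern.split(good))
# ['', 'answer', '\nParis is the capital of France.']

# A response where the LM echoed a header inside the content:
bad = "[[ ## answer ## ]]\nUse [[ ## answer ## ]] to mark the field."
print(field_header_pattern.split(bad))
# The content is split at the stray header, so part of it is misassigned or lost.
```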
All adapters will successfully fallback to JSONAdapter when parsing fails, and JSONAdapter will always be available and functional
If this fails: If JSONAdapter is misconfigured, unavailable, or fails on the same input, the fallback chain breaks and parsing permanently fails with no recovery mechanism
dspy/adapters/base.py:Adapter
The LITELLM_LOCAL_MODEL_COST_MAP environment variable can be safely set to 'True' without conflicting with the user's existing environment configuration
If this fails: If the user has explicitly set this variable to a different value or path, DSPy's import-time default may interfere with their setup, potentially breaking their cost tracking or billing integration
dspy/__init__.py:os.environ.setdefault
Custom type format() methods return either list[dict[str, Any]] matching OpenAI API content structure or plain strings, never mixed types or nested structures
If this fails: If a custom type returns unexpected formats like nested lists or non-dict objects, the adapter formatting pipeline will crash when trying to serialize the content for LM APIs
dspy/adapters/types/base_type.py:Type.format
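A hedged sketch of a conforming custom type, based on the contract described above (the class and field names are hypothetical):

```python
from dspy.adapters.types.base_type import Type  # path per the analysis above

# Hypothetical custom content type. Per the contract, format() must return
# either a plain string or a list of OpenAI-style content dicts; returning
# mixed or nested structures breaks the adapter formatting pipeline.
class Caption(Type):
    text: str  # Type subclasses Pydantic BaseModel, so fields are declared this way

    def format(self) -> str:
        return f"[caption] {self.text}"
```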
Field processing order from signature input_fields and output_fields dictionaries is deterministic and consistent across Python versions and executions
If this fails: If dict iteration order changes between runs, field headers appear in different orders in prompts, causing LMs trained on specific formats to generate inconsistent responses
dspy/adapters/chat_adapter.py:FieldInfoWithName
Field values can be successfully parsed from string representations back to their original types without precision loss or format ambiguity
If this fails: Complex types like datetime objects, custom classes, or floating-point numbers with specific precision requirements may lose information during string round-trip, silently corrupting data
dspy/adapters/utils.py:parse_value
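A small plain-Python illustration of the round-trip risk, independent of DSPy: a type without a parseable string form cannot survive stringification.

```python
# A type whose str() is not parseable cannot round-trip through a prompt.
class Point:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

p = Point(1.0, 2.0)
s = str(p)
print(s)  # e.g. '<__main__.Point object at 0x7f...>'
# There is no general way to reconstruct Point from s; the values of
# x and y are silently lost once the object is stringified.
```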
The disk cache directory is writable and has sufficient space for storing LM responses across all concurrent DSPy programs
If this fails: If cache directory becomes full or unwritable, all LM calls will fail silently or fall back to uncached mode, drastically increasing API costs and latency without user awareness
dspy/clients/cache.py:DSPY_CACHE
Chat-formatted prompts with field headers and demonstrations will fit within the language model's context window limit
If this fails: Large signatures with many fields or extensive few-shot examples will exceed context limits, causing ContextWindowExceededError but only after expensive prompt formatting has been completed
dspy/adapters/chat_adapter.py:ChatAdapter
All signature field types have meaningful string representations and can be serialized/deserialized through the adapter pipeline
If this fails: Custom types without proper __str__ methods or non-serializable objects will cause cryptic errors during prompt formatting, making debugging difficult
dspy/signatures/signature.py:Signature
Language model API responses arrive in reasonable time and don't time out during streaming or batch processing
If this fails: Long-running optimizations like GEPA or BootstrapFewShot may fail partway through if individual LM calls time out, losing all progress and requiring a complete restart
dspy/clients/base_lm.py:BaseLM
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Disk-based cache (DSPY_CACHE) for LM responses using diskcache — prevents duplicate API calls, persists across sessions
- Global configuration store (settings registry) for the active LM, adapter, and system settings — maintains a context stack for nested configurations; see the sketch after this list
- Module-level storage for demonstration examples used in prompts — populated by optimizers, used during execution
- Per-session message history for multi-turn conversations — maintains context across interactions
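For the settings registry specifically, the context stack can be seen in miniature with dspy.context, which temporarily overrides the global configuration (model names are illustrative):

```python
import dspy

# Global configuration: the settings registry stores the active LM.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# dspy.context pushes an override onto the context stack for this block only.
with dspy.context(lm=dspy.LM("openai/gpt-4o")):
    pass  # modules executed here use the overriding LM

# After the block exits, the previous configuration is restored.
```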
Feedback Loops
- Bootstrap few-shot learning (training-loop, reinforcing) — Trigger: BootstrapFewShot.compile() called. Action: Run program on training examples, collect successful traces, add high-scoring examples as demonstrations. Exit: Max examples reached or no improvement.
- GEPA genetic optimization (training-loop, reinforcing) — Trigger: GEPA.compile() called. Action: Mutate instruction text, crossbreed variants, evaluate fitness, select survivors for next generation. Exit: Max generations reached or convergence.
- Adapter format fallback (retry, balancing) — Trigger: ChatAdapter parsing fails. Action: Fall back to JSONAdapter, attempt structured parsing again. Exit: Successful parse or final failure.
- LM retry with backoff (retry, balancing) — Trigger: LM API call fails or rate limited. Action: Wait with exponential backoff, retry API call. Exit: Success or max retries exceeded.
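The shape of the retry loop, as a generic sketch (DSPy delegates the real logic to its client stack, so this is illustrative, not DSPy's code):

```python
import random
import time

def call_with_backoff(fn, max_retries=4, base_delay=1.0):
    """Generic retry-with-exponential-backoff sketch for a flaky API call."""
    for attempt in range(max_retries + 1):
        try:
            return fn()  # e.g. an LM API call
        except Exception:
            if attempt == max_retries:
                raise  # exit condition: max retries exceeded
            # Exponential backoff with jitter before the next attempt.
            time.sleep(base_delay * (2 ** attempt) + random.random())
```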
Delays
- Cache lookup (cache-ttl, ~immediate if hit) — Eliminates LM API latency for repeated calls
- LM API generation (async-processing, ~variable by model) — Main bottleneck in program execution
- Optimization compilation (batch-window, ~minutes to hours) — One-time cost to improve program quality
- Response streaming (async-processing, ~partial results available immediately) — Enables progressive output display
Control Points
- LM provider selection (runtime-toggle) — Controls: Which language model API to use (OpenAI, Anthropic, etc.). Default: configurable via dspy.configure()
- Adapter choice (architecture-switch) — Controls: How signatures are formatted for LMs (chat, JSON, two-step). Default: ChatAdapter
- Max tokens limit (threshold) — Controls: Maximum response length from language models. Default: model-dependent
- Cache enabled (feature-flag) — Controls: Whether to cache LM responses to disk. Default: enabled
- Few-shot count (hyperparameter) — Controls: Number of demonstration examples to include in prompts. Default: optimizer-dependent
- Temperature setting (hyperparameter) — Controls: Randomness in LM generation. Default: model default or user override
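Several of these control points are set in one place via dspy.configure (the values shown are illustrative, not defaults):

```python
import dspy

dspy.configure(
    # LM provider selection, max tokens, temperature, and caching are all
    # configured on the LM client (values illustrative).
    lm=dspy.LM("openai/gpt-4o-mini", max_tokens=1000, temperature=0.7, cache=True),
    # Adapter choice: how signatures are formatted into prompts.
    adapter=dspy.ChatAdapter(),
)
```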
Technology Stack
- LiteLLM: Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface
- Pydantic: Type validation and serialization for signatures, custom types, and configuration models
- DiskCache: Persistent caching of language model responses to reduce API costs and latency
- Tenacity: Retry logic with exponential backoff for resilient LM API calls
- JSON Repair: Attempts to fix malformed JSON in LM responses before parsing
- Regex: Pattern matching for parsing structured outputs from LM text responses
- Asynchronous execution support for non-blocking LM calls and streaming responses
- Hyperparameter optimization for teleprompt algorithms
- Serialization of complex Python objects for caching and persistence
Key Components
- Signature (validator) — Validates and structures input/output specifications for LM calls — ensures type consistency, generates field descriptions, provides the contract that modules must fulfill (dspy/signatures/signature.py)
- Predict (executor) — Core module that executes signatures by formatting them through adapters, calling LMs, and parsing responses — the basic building block for all DSPy programs (dspy/predict/predict.py)
- ChatAdapter (adapter) — Transforms signatures into chat-formatted prompts using field header delimiters like [[ ## field_name ## ]] and parses structured responses back into predictions (dspy/adapters/chat_adapter.py)
- BaseLM (gateway) — Abstract interface for language model providers — standardizes generation, embedding, and tool calling across different APIs while handling caching and usage tracking (dspy/clients/base_lm.py)
- BootstrapFewShot (optimizer) — Automatically generates effective few-shot examples by running programs on training data, selecting high-quality input-output traces, and including them in future prompts (dspy/teleprompt/bootstrap.py)
- GEPA (optimizer) — Genetic algorithm for prompt evolution — mutates instruction text, crossbreeds successful variants, and selects improvements based on evaluation metrics (dspy/teleprompt/gepa.py)
- ReAct (orchestrator) — Implements the Reasoning-Acting pattern for tool use — alternates between thought, action, and observation steps until reaching a final answer (dspy/predict/react.py)
- Example (store) — Immutable container for training examples and demonstrations — stores input/output pairs with metadata like reasoning traces and quality scores (dspy/primitives/example.py)
- Evaluate (processor) — Evaluates program performance on datasets using custom metrics — runs programs on test cases, computes scores, provides optimization feedback (dspy/evaluate/evaluate.py)
- Type (adapter) — Base class for custom content types like Image, Audio, Code — provides a format() method to convert structured data into LM-compatible content representations (dspy/adapters/types/base_type.py)
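As a sketch of how the optimizer components are used together (the metric and training data here are hypothetical):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name illustrative

# Hypothetical metric: exact match on the answer field.
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Tiny hypothetical training set; real use needs more labeled examples.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

program = dspy.ChainOfThought("question -> answer")
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
compiled = optimizer.compile(program, trainset=trainset)

# The learned demonstrations now live on the compiled program's predictors.
for name, predictor in compiled.named_predictors():
    print(name, len(predictor.demos))
```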
Frequently Asked Questions
What is dspy used for?
dspy programs language models with declarative Python code and auto-optimizes prompts. stanfordnlp/dspy is a 10-component ML inference framework written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 244 files.
How is dspy architected?
dspy is organized into 6 architecture layers: Signatures, Modules, Adapters, Language Models, and 2 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through dspy?
Data moves through 7 stages: Define signature contract → Create module instance → Execute with input data → Format prompt through adapter → Call language model → Parse structured response → Return prediction result. Signatures define the contract, modules execute it, adapters handle prompt formatting and response parsing, and optimizers like BootstrapFewShot and GEPA improve the pipeline over time. This pipeline design reflects a complex multi-stage processing system.
What technologies does dspy use?
The core stack includes LiteLLM (Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface), Pydantic (Type validation and serialization for signatures, custom types, and configuration models), DiskCache (Persistent caching of language model responses to reduce API costs and latency), Tenacity (Retry logic with exponential backoff for resilient LM API calls), JSON Repair (Attempts to fix malformed JSON in LM responses before parsing), Regex (Pattern matching for parsing structured outputs from LM text responses), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does dspy have?
dspy exhibits 4 data pools (DSPY_CACHE, Settings registry), 4 feedback loops, 6 control points, and 4 delays. The feedback loops span reinforcing training loops (BootstrapFewShot, GEPA) and balancing retry loops (adapter fallback, LM backoff). These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does dspy use?
6 design patterns detected: Signature-based programming, Adapter pattern for LM interfaces, Module composition, Meta-learning optimization, Type-driven custom content, and 1 more.
How does dspy compare to alternatives?
CodeSea has side-by-side architecture comparisons of dspy with langchain, llama_index, and guidance. These comparisons cover tech stack differences, pipeline design, system behavior, and code patterns; see the comparison pages for detailed analysis.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.