How DSPy Works
Prompt engineering is manual tuning — you adjust words until the output looks right. DSPy treats it as an optimization problem: define what you want (signatures), provide examples, and let the framework find the prompts and few-shot examples that actually work.
What dspy Does
Programs language models with declarative Python code and auto-optimizes prompts
DSPy is a framework for building modular AI systems by writing compositional Python code instead of brittle prompts. It automatically optimizes LM prompts and weights using algorithms like bootstrap few-shot learning and genetic prompt evolution. The core philosophy is 'programming, not prompting' — you define signatures (input/output specs) and modules (like ChainOfThought), then DSPy teaches the LM to deliver high-quality outputs.
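The "programming, not prompting" idea can be sketched in plain Python. This toy `Signature` dataclass and `predict` function are illustrative stand-ins, not the real dspy API: the point is that the contract is declared once, and the module validates calls against it instead of relying on hand-tuned prompt text.

```python
from dataclasses import dataclass

# Toy illustration of "programming, not prompting" (NOT the real dspy API):
# a signature declares typed inputs/outputs, and a module is a callable
# that must fulfil that contract.
@dataclass
class Signature:
    inputs: list[str]
    outputs: list[str]
    instructions: str = ""

qa = Signature(inputs=["question"], outputs=["answer"],
               instructions="Answer concisely.")

def predict(sig: Signature, **kwargs) -> dict:
    # Validate the call against the declared contract before any LM work.
    missing = [f for f in sig.inputs if f not in kwargs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    # A real module would format a prompt, call the LM, and parse the reply;
    # here we just echo placeholders to show the shape of the result.
    return {f: f"<{f} for {kwargs[sig.inputs[0]]}>" for f in sig.outputs}

result = predict(qa, question="What is DSPy?")
print(result["answer"])
```

The optimizer's job is then to improve how this contract is rendered into a prompt, without the user editing prompt strings by hand.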
Architecture Overview
dspy is organized into 6 layers spanning 10 key components.
How Data Flows Through dspy
Programs flow from signature definition through module execution to LM calls and back. Users define signatures specifying inputs/outputs, create modules like Predict or ChainOfThought that implement these signatures, then execute them with actual data. The adapter layer transforms signatures into LM-specific prompts (with field delimiters and formatting), sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions.
1. Define signature contract
User creates a Signature class with typed input/output fields and optional instructions — DSPy validates field types, generates descriptions, and creates the execution contract that modules must fulfill
2. Create module instance
User instantiates a module (Predict, ChainOfThought, ReAct) with the signature — module validates compatibility and prepares for execution
3. Execute with input data
Module.__call__ receives input values, validates them against signature fields, and triggers the prediction pipeline
4. Format prompt through adapter
ChatAdapter.format_prompt combines signature, input values, few-shot examples, and conversation history into LM messages — uses field headers like [[## question ##]] to structure content and instructs LM on output format
5. Call language model
BaseLM.generate sends formatted messages to the LM API (via LiteLLM), handles retries and caching, returns raw text response with usage metadata
6. Parse structured response
Adapter.parse_response extracts field values from LM text using header patterns or JSON parsing, validates types against signature, handles errors with fallbacks
7. Return prediction result
Module returns Prediction object containing parsed field values, completion metadata, and conversation history — accessible via dot notation like result.answer
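Steps 4 through 7 can be sketched end to end with a stubbed LM. The field-header convention matches the `[[## name ##]]` delimiters described above; `fake_lm` and the regex parser are illustrative stand-ins, not dspy internals:

```python
import re

# Hedged sketch of steps 4-7: format fields with [[## name ##]] headers,
# "call" a stubbed LM, then parse the same headers back out of the reply.
def format_prompt(fields: dict) -> str:
    return "\n".join(f"[[## {k} ##]]\n{v}" for k, v in fields.items())

def fake_lm(prompt: str) -> str:
    # Stand-in for the real LM call (which goes out through LiteLLM).
    return "[[## answer ##]]\n42\n[[## completed ##]]"

def parse_response(text: str) -> dict:
    # Split on [[## header ##]] markers, pairing each header with its body.
    parts = re.split(r"\[\[## (\w+) ##\]\]", text)
    it = iter(parts[1:])
    return {name: body.strip() for name, body in zip(it, it)}

prompt = format_prompt({"question": "What is 6 x 7?"})
pred = parse_response(fake_lm(prompt))
print(pred["answer"])  # -> 42
```

The real adapter also injects instructions, few-shot demonstrations, and conversation history into the messages before the call; the header round-trip shown here is the core mechanism.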
System Dynamics
Beyond the pipeline, dspy has runtime behaviors that shape how it responds to load, failures, and configuration changes.
Data Pools
DSPY_CACHE
Disk-based cache for LM responses using diskcache — prevents duplicate API calls, persists across sessions
Type: cache
Settings registry
Global configuration store for active LM, adapter, and system settings — maintains context stack for nested configurations
Type: registry
Few-shot example store
Module-level storage for demonstration examples used in prompts — populated by optimizers, used during execution
Type: buffer
Conversation history
Per-session message history for multi-turn conversations — maintains context across interactions
Type: state-store
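The cache pool can be illustrated with a hash-keyed lookup. DSPy persists responses to disk via diskcache; the in-memory dict, model name, and response strings below are stand-ins for illustration:

```python
import hashlib
import json

# Illustrative sketch of an LM response cache: the key hashes the model
# name plus the full request, so identical calls never hit the API twice.
_cache: dict[str, str] = {}
calls = 0

def cache_key(model: str, messages: list[dict]) -> str:
    blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def generate(model: str, messages: list[dict]) -> str:
    global calls
    key = cache_key(model, messages)
    if key in _cache:
        return _cache[key]   # cache hit: no API call
    calls += 1               # cache miss: the provider would be called here
    _cache[key] = f"response-{calls}"
    return _cache[key]

msgs = [{"role": "user", "content": "hi"}]
print(generate("some-model", msgs), generate("some-model", msgs), calls)
```

Because the key covers the whole request, any change to the prompt, demonstrations, or model invalidates the entry automatically.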
Feedback Loops
Bootstrap few-shot learning
Trigger: BootstrapFewShot.compile() called → Run program on training examples, collect successful traces, add high-scoring examples as demonstrations (exits when: Max examples reached or no improvement)
Type: training-loop
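A minimal sketch of the bootstrap loop, assuming a program is just a callable and a metric is a boolean check. The function names below are illustrative, not the BootstrapFewShot API:

```python
# Sketch of the bootstrap idea: run the program over training examples and
# keep only the traces the metric scores as successful, up to a cap.
def bootstrap(program, trainset, metric, max_demos=4):
    demos = []
    for example in trainset:
        pred = program(example["question"])
        if metric(example, pred):            # keep only high-scoring traces
            demos.append({"question": example["question"], "answer": pred})
        if len(demos) >= max_demos:          # exit: max examples reached
            break
    return demos

program = lambda q: q.upper()                # stand-in "program"
trainset = [{"question": "a", "answer": "A"},
            {"question": "b", "answer": "X"}]
metric = lambda ex, pred: pred == ex["answer"]
print(bootstrap(program, trainset, metric))  # keeps only the first example
```

The collected demos then become the few-shot examples the adapter inserts into future prompts, which is why the loop only admits traces that passed the metric.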
GEPA genetic optimization
Trigger: GEPA.compile() called → Mutate instruction text, crossbreed variants, evaluate fitness, select survivors for next generation (exits when: Max generations reached or convergence)
Type: training-loop
Adapter format fallback
Trigger: ChatAdapter parsing fails → Fall back to JSONAdapter, attempt structured parsing again (exits when: Successful parse or final failure)
Type: retry
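The fallback can be sketched as a try/except over two parsers. The header regex and error handling below are illustrative, not the actual adapter code:

```python
import json
import re

# Sketch of the adapter fallback: try ChatAdapter-style header parsing
# first; if no field headers are found, attempt a JSON parse instead.
def parse_headers(text: str) -> dict:
    parts = re.split(r"\[\[## (\w+) ##\]\]", text)
    if len(parts) < 3:
        raise ValueError("no field headers found")
    it = iter(parts[1:])
    return {k: v.strip() for k, v in zip(it, it)}

def parse_with_fallback(text: str) -> dict:
    try:
        return parse_headers(text)
    except ValueError:
        return json.loads(text)   # fallback: structured JSON parse

print(parse_with_fallback('{"answer": "42"}'))  # JSON fallback path
```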
LM retry with backoff
Trigger: LM API call fails or rate limited → Wait with exponential backoff, retry API call (exits when: Success or max retries exceeded)
Type: retry
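The retry loop can be sketched as exponential backoff around a flaky call. DSPy delegates this to Tenacity; the hand-rolled version below is for illustration only:

```python
import time

# Sketch of the LM retry loop: exponential backoff between attempts,
# bounded by a maximum retry count.
def call_with_retry(fn, max_retries=3, base_delay=0.01):
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise                           # exit: max retries exceeded
            time.sleep(base_delay * (2 ** attempt))

attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("rate limited")      # simulated transient failure
    return "ok"

result = call_with_retry(flaky)
print(result)  # -> ok (after two simulated failures)
```

Combined with the response cache, this means a transient rate limit costs only the backoff delay, and a repeated request costs nothing at all.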
Control Points
LM provider selection
Adapter choice
Max tokens limit
Cache enabled
Few-shot count
Temperature setting
Delays
Cache lookup
Duration: immediate if hit
LM API generation
Duration: variable by model
Optimization compilation
Duration: minutes to hours
Response streaming
Duration: partial results available immediately
Technology Choices
dspy is built with 9 key technologies. Each serves a specific role in the system.
Key Components
- Signature (validator): Validates and structures input/output specifications for LM calls — ensures type consistency, generates field descriptions, provides the contract that modules must fulfill
- Predict (executor): Core module that executes signatures by formatting them through adapters, calling LMs, and parsing responses — the basic building block for all DSPy programs
- ChatAdapter (adapter): Transforms signatures into chat-formatted prompts using field header delimiters like [[## field_name ##]] and parses structured responses back into predictions
- BaseLM (gateway): Abstract interface for language model providers — standardizes generation, embedding, tool calling across different APIs while handling caching and usage tracking
- BootstrapFewShot (optimizer): Automatically generates effective few-shot examples by running programs on training data, selecting high-quality input-output traces, and including them in future prompts
- GEPA (optimizer): Genetic algorithm for prompt evolution — mutates instruction text, crossbreeds successful variants, and selects improvements based on evaluation metrics
- ReAct (orchestrator): Implements the Reasoning-Acting pattern for tool use — alternates between thought, action, and observation steps until reaching a final answer
- Example (store): Immutable container for training examples and demonstrations — stores input/output pairs with metadata like reasoning traces and quality scores
- Evaluate (processor): Evaluates program performance on datasets using custom metrics — runs programs on test cases, computes scores, provides optimization feedback
- Type (adapter): Base class for custom content types like Image, Audio, Code — provides format() method to convert structured data into LM-compatible content representations
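The ReAct pattern from the component list can be sketched with a scripted transcript standing in for the LM. The calculator tool and the script contents below are invented for illustration:

```python
# Toy sketch of the ReAct loop: alternate thought -> action -> observation
# until the "LM" (here a scripted transcript) emits a final answer.
def calculator(expr: str) -> str:
    return str(eval(expr))        # demo-only tool; never eval untrusted input

script = iter([
    ("think",  "I should compute 6 * 7."),
    ("act",    ("calculator", "6 * 7")),
    ("finish", "42"),
])

def react():
    trace = []
    for kind, payload in script:
        if kind == "act":
            tool, arg = payload
            obs = {"calculator": calculator}[tool](arg)
            trace.append(f"observation: {obs}")   # feed result back in
        elif kind == "finish":
            return payload, trace
        else:
            trace.append(f"thought: {payload}")

answer, trace = react()
print(answer)  # -> 42
```

In the real module, each iteration re-prompts the LM with the accumulated trace, so observations from tools shape the next thought.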
Who Should Read This
ML researchers and engineers who want to move beyond manual prompt engineering, or teams building complex LLM pipelines.
This analysis was generated by CodeSea from the stanfordnlp/dspy source code. For the full interactive visualization — including pipeline graph, architecture diagram, and system behavior map — see the complete analysis.
Explore Further
Full Analysis
Interactive architecture map for dspy
dspy vs langchain
Side-by-side architecture comparison
dspy vs llama_index
Side-by-side architecture comparison
dspy vs guidance
Side-by-side architecture comparison
How LangChain Works
ML Inference & Agents
How LlamaIndex Works
ML Inference & Agents
How vLLM Works
ML Inference & Agents
Frequently Asked Questions
What is dspy?
Programs language models with declarative Python code and auto-optimizes prompts
How does dspy's pipeline work?
dspy processes data through 7 stages: defining a signature contract, creating a module instance, executing with input data, formatting the prompt through an adapter, calling the language model, parsing the structured response, and returning the prediction result. Programs flow from signature definition through module execution to LM calls and back. Users define signatures specifying inputs/outputs, create modules like Predict or ChainOfThought that implement these signatures, then execute them with actual data. The adapter layer transforms signatures into LM-specific prompts (with field delimiters and formatting), sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions.
What tech stack does dspy use?
dspy is built with LiteLLM (Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface), Pydantic (Type validation and serialization for signatures, custom types, and configuration models), DiskCache (Persistent caching of language model responses to reduce API costs and latency), Tenacity (Retry logic with exponential backoff for resilient LM API calls), JSON Repair (Attempts to fix malformed JSON in LM responses before parsing), and 4 more technologies.
How does dspy handle errors and scaling?
dspy uses 4 feedback loops, 6 control points, and 4 data pools to manage its runtime behavior. These mechanisms handle error recovery, load distribution, and configuration changes.
How does dspy compare to langchain?
CodeSea has detailed side-by-side architecture comparisons of dspy with langchain, llama_index, and guidance. These cover tech stack differences, pipeline design, and system behavior.