stanfordnlp/dspy
DSPy: The framework for programming—not prompting—language models
Programs language models with declarative Python code and auto-optimizes prompts
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 10-component ML inference framework. 244 files analyzed. Data flows through 7 distinct pipeline stages.
How Data Flows Through the System
Programs flow from signature definition through module execution to LM calls and back. Users define signatures specifying inputs/outputs, create modules like Predict or ChainOfThought that implement these signatures, then execute them with actual data. The adapter layer transforms signatures into LM-specific prompts (with field delimiters and formatting), sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions. A minimal runnable sketch follows the steps below.
- Define signature contract — User creates a Signature class with typed input/output fields and optional instructions — DSPy validates field types, generates descriptions, and creates the execution contract that modules must fulfill
- Create module instance — User instantiates a module (Predict, ChainOfThought, ReAct) with the signature — module validates compatibility and prepares for execution [Signature]
- Execute with input data — Module.__call__ receives input values, validates them against signature fields, and triggers the prediction pipeline [input values]
- Format prompt through adapter — ChatAdapter.format_prompt combines signature, input values, few-shot examples, and conversation history into LM messages — uses field headers like [[ ## question ## ]] to structure content and instructs the LM on output format [Signature → LMMessage]
- Call language model — BaseLM.generate sends formatted messages to the LM API (via LiteLLM), handles retries and caching, returns raw text response with usage metadata [LMMessage → LM response text]
- Parse structured response — Adapter.parse_response extracts field values from LM text using header patterns or JSON parsing, validates types against signature, handles errors with fallbacks [LM response text → Prediction]
- Return prediction result — Module returns Prediction object containing parsed field values, completion metadata, and conversation history — accessible via dot notation like result.answer [Prediction]
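To make the pipeline concrete, here is a minimal end-to-end sketch using the public DSPy API (the model name is illustrative; any LiteLLM-supported model string works):

```python
import dspy

# Configure the language model (model name is illustrative).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Step 1: define the signature contract with typed input/output fields.
class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# Step 2: create a module instance that implements the signature.
qa = dspy.ChainOfThought(AnswerQuestion)

# Steps 3-6: calling the module formats the prompt through the adapter,
# sends it to the LM, and parses the response into a structured Prediction.
result = qa(question="What does DSPy optimize?")

# Step 7: output fields are accessible via dot notation.
print(result.answer)
```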
Data Models
The data structures that flow between stages — the contracts that hold the system together.
dspy/signatures/signature.py: Pydantic model with input_fields: dict[str, FieldInfo], output_fields: dict[str, FieldInfo], instructions: str, and field type annotations
Defined by user, validated during class creation, transformed by adapters into LM prompts, used to parse responses
dspy/primitives/prediction.py: Dict-like object containing field_name: value pairs matching signature outputs, plus metadata like completions and history
Created by adapters from LM responses, returned by modules, used in optimization and evaluation
dspy/primitives/example.py: Dict-like object with input and output field values, plus metadata like reasoning traces
Created from training data or generated during optimization, used as demonstrations in prompts
dspy/clients/base_lm.py: Dict with role: 'user'|'assistant'|'system', content: str|list[dict], and optional tool_calls, citations
Generated by adapters, sent to LM APIs, stored in conversation history for multi-turn interactions
dspy/adapters/types/base_type.py: Pydantic BaseModel subclass with a format() -> list[dict[str, Any]] | str method for custom content types
Instantiated in signature fields, formatted by adapters into LM-compatible content structures
dspy/teleprompt/bootstrap.py: Object with predictor: Module, examples: list[Example], optim_config: dict tracking optimization progress
Created at optimization start, updated during trace generation, used to track learned examples and predictor state
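A brief illustration of the Example container (the values are made up):

```python
import dspy

# An Example holds input and output field values; with_inputs marks which
# fields are inputs, and the remaining fields are treated as labels.
ex = dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")

print(ex.question)  # field access via dot notation, like Prediction
print(ex.inputs())  # an Example containing only the input fields
print(ex.labels())  # an Example containing only the label fields
```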
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Language models will treat field headers like [[ ## field_name ## ]] as meaningful delimiters and not generate them as part of actual content
If this fails: If the LM includes these patterns in its response text, the parser will incorrectly split the content at those points, breaking field extraction and potentially losing data
dspy/adapters/chat_adapter.py:field_header_pattern
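A simplified illustration of why this matters: if the delimiter pattern appears inside generated content, splitting on it corrupts field extraction. The regex below mirrors the cited pattern, but this is a standalone sketch, not DSPy's actual parser.

```python
import re

# Header pattern of the form "[[ ## field_name ## ]]" (sketch, not DSPy's code).
field_header_pattern = re.compile(r"\[\[ ## (\w+) ## \]\]")

# A well-behaved response: one header, then the field content.
good = "[[ ## answer ## ]]\nParis is the capital of France."
print(field_header_pattern.split(good))
# ['', 'answer', '\nParis is the capital of France.']

# A response where the LM echoed a header inside the content:
bad = "[[ ## answer ## ]]\nUse [[ ## answer ## ]] to mark the field."
print(field_header_pattern.split(bad))
# The content is split at the stray header, so part of it is misassigned or lost.
```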
All adapters will successfully fallback to JSONAdapter when parsing fails, and JSONAdapter will always be available and functional
If this fails: If JSONAdapter is misconfigured, unavailable, or fails on the same input, the fallback chain breaks and parsing permanently fails with no recovery mechanism
dspy/adapters/base.py:Adapter
The LITELLM_LOCAL_MODEL_COST_MAP environment variable can be safely set to 'True' without conflicting with the user's existing environment configuration
If this fails: If the user has explicitly set this variable to a different value or path, DSPy's import-time default may interfere with their setup, potentially breaking their cost tracking or billing integration
dspy/__init__.py:os.environ.setdefault
Custom type format() methods return either list[dict[str, Any]] matching OpenAI API content structure or plain strings, never mixed types or nested structures
If this fails: If a custom type returns unexpected formats like nested lists or non-dict objects, the adapter formatting pipeline will crash when trying to serialize the content for LM APIs
dspy/adapters/types/base_type.py:Type.format
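A hedged sketch of a conforming custom type, based on the contract described above (the class and field names are hypothetical):

```python
from dspy.adapters.types.base_type import Type  # path per the analysis above

# Hypothetical custom content type. Per the contract, format() must return
# either a plain string or a list of OpenAI-style content dicts; returning
# mixed or nested structures breaks the adapter formatting pipeline.
class Caption(Type):
    text: str  # Type subclasses Pydantic BaseModel, so fields are declared this way

    def format(self) -> str:
        return f"[caption] {self.text}"
```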
Field processing order from signature input_fields and output_fields dictionaries is deterministic and consistent across Python versions and executions
If this fails: If dict iteration order changes between runs, field headers appear in different orders in prompts, causing LMs trained on specific formats to generate inconsistent responses
dspy/adapters/chat_adapter.py:FieldInfoWithName
Field values can be successfully parsed from string representations back to their original types without precision loss or format ambiguity
If this fails: Complex types like datetime objects, custom classes, or floating-point numbers with specific precision requirements may lose information during string round-trip, silently corrupting data
dspy/adapters/utils.py:parse_value
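A small plain-Python illustration of the round-trip risk, independent of DSPy: a type without a parseable string form cannot survive stringification.

```python
# A type whose str() is not parseable cannot round-trip through a prompt.
class Point:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

p = Point(1.0, 2.0)
s = str(p)
print(s)  # e.g. '<__main__.Point object at 0x7f...>'
# There is no general way to reconstruct Point from s; the values of
# x and y are silently lost once the object is stringified.
```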
The disk cache directory is writable and has sufficient space for storing LM responses across all concurrent DSPy programs
If this fails: If cache directory becomes full or unwritable, all LM calls will fail silently or fall back to uncached mode, drastically increasing API costs and latency without user awareness
dspy/clients/cache.py:DSPY_CACHE
Chat-formatted prompts with field headers and demonstrations will fit within the language model's context window limit
If this fails: Large signatures with many fields or extensive few-shot examples will exceed context limits, causing ContextWindowExceededError but only after expensive prompt formatting has been completed
dspy/adapters/chat_adapter.py:ChatAdapter
All signature field types have meaningful string representations and can be serialized/deserialized through the adapter pipeline
If this fails: Custom types without proper __str__ methods or non-serializable objects will cause cryptic errors during prompt formatting, making debugging difficult
dspy/signatures/signature.py:Signature
Language model API responses arrive in reasonable time and don't time out during streaming or batch processing
If this fails: Long-running optimizations like GEPA or BootstrapFewShot may fail partway through if individual LM calls time out, losing all progress and requiring a complete restart
dspy/clients/base_lm.py:BaseLM
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Disk-based cache (DSPY_CACHE) for LM responses using diskcache — prevents duplicate API calls, persists across sessions
- Global configuration store (settings registry) for the active LM, adapter, and system settings — maintains a context stack for nested configurations; see the sketch after this list
- Module-level storage for demonstration examples used in prompts — populated by optimizers, used during execution
- Per-session message history for multi-turn conversations — maintains context across interactions
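For the settings registry specifically, the context stack can be seen in miniature with dspy.context, which temporarily overrides the global configuration (model names are illustrative):

```python
import dspy

# Global configuration: the settings registry stores the active LM.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# dspy.context pushes an override onto the context stack for this block only.
with dspy.context(lm=dspy.LM("openai/gpt-4o")):
    pass  # modules executed here use the overriding LM

# After the block exits, the previous configuration is restored.
```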
Feedback Loops
- Bootstrap few-shot learning (training-loop, reinforcing) — Trigger: BootstrapFewShot.compile() called. Action: Run program on training examples, collect successful traces, add high-scoring examples as demonstrations. Exit: Max examples reached or no improvement.
- GEPA genetic optimization (training-loop, reinforcing) — Trigger: GEPA.compile() called. Action: Mutate instruction text, crossbreed variants, evaluate fitness, select survivors for next generation. Exit: Max generations reached or convergence.
- Adapter format fallback (retry, balancing) — Trigger: ChatAdapter parsing fails. Action: Fall back to JSONAdapter, attempt structured parsing again. Exit: Successful parse or final failure.
- LM retry with backoff (retry, balancing) — Trigger: LM API call fails or rate limited. Action: Wait with exponential backoff, retry API call. Exit: Success or max retries exceeded.
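The shape of the retry loop, as a generic sketch (DSPy delegates the real logic to its client stack, so this is illustrative, not DSPy's code):

```python
import random
import time

def call_with_backoff(fn, max_retries=4, base_delay=1.0):
    """Generic retry-with-exponential-backoff sketch for a flaky API call."""
    for attempt in range(max_retries + 1):
        try:
            return fn()  # e.g. an LM API call
        except Exception:
            if attempt == max_retries:
                raise  # exit condition: max retries exceeded
            # Exponential backoff with jitter before the next attempt.
            time.sleep(base_delay * (2 ** attempt) + random.random())
```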
Delays
- Cache lookup (cache-ttl, ~immediate if hit) — Eliminates LM API latency for repeated calls
- LM API generation (async-processing, ~variable by model) — Main bottleneck in program execution
- Optimization compilation (batch-window, ~minutes to hours) — One-time cost to improve program quality
- Response streaming (async-processing, ~partial results available immediately) — Enables progressive output display
Control Points
- LM provider selection (runtime-toggle) — Controls: Which language model API to use (OpenAI, Anthropic, etc.). Default: configurable via dspy.configure()
- Adapter choice (architecture-switch) — Controls: How signatures are formatted for LMs (chat, JSON, two-step). Default: ChatAdapter
- Max tokens limit (threshold) — Controls: Maximum response length from language models. Default: model-dependent
- Cache enabled (feature-flag) — Controls: Whether to cache LM responses to disk. Default: enabled
- Few-shot count (hyperparameter) — Controls: Number of demonstration examples to include in prompts. Default: optimizer-dependent
- Temperature setting (hyperparameter) — Controls: Randomness in LM generation. Default: model default or user override
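Several of these control points are set in one place via dspy.configure (the values shown are illustrative, not defaults):

```python
import dspy

dspy.configure(
    # LM provider selection, max tokens, temperature, and caching are all
    # configured on the LM client (values illustrative).
    lm=dspy.LM("openai/gpt-4o-mini", max_tokens=1000, temperature=0.7, cache=True),
    # Adapter choice: how signatures are formatted into prompts.
    adapter=dspy.ChatAdapter(),
)
```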
Technology Stack
- LiteLLM: Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface
- Pydantic: Type validation and serialization for signatures, custom types, and configuration models
- DiskCache: Persistent caching of language model responses to reduce API costs and latency
- Tenacity: Retry logic with exponential backoff for resilient LM API calls
- JSON Repair: Attempts to fix malformed JSON in LM responses before parsing
- Regex: Pattern matching for parsing structured outputs from LM text responses
- Asynchronous execution support for non-blocking LM calls and streaming responses
- Hyperparameter optimization for teleprompt algorithms
- Serialization of complex Python objects for caching and persistence
Key Components
- Signature (validator) — Validates and structures input/output specifications for LM calls — ensures type consistency, generates field descriptions, provides the contract that modules must fulfill (dspy/signatures/signature.py)
- Predict (executor) — Core module that executes signatures by formatting them through adapters, calling LMs, and parsing responses — the basic building block for all DSPy programs (dspy/predict/predict.py)
- ChatAdapter (adapter) — Transforms signatures into chat-formatted prompts using field header delimiters like [[ ## field_name ## ]] and parses structured responses back into predictions (dspy/adapters/chat_adapter.py)
- BaseLM (gateway) — Abstract interface for language model providers — standardizes generation, embedding, and tool calling across different APIs while handling caching and usage tracking (dspy/clients/base_lm.py)
- BootstrapFewShot (optimizer) — Automatically generates effective few-shot examples by running programs on training data, selecting high-quality input-output traces, and including them in future prompts (dspy/teleprompt/bootstrap.py)
- GEPA (optimizer) — Genetic algorithm for prompt evolution — mutates instruction text, crossbreeds successful variants, and selects improvements based on evaluation metrics (dspy/teleprompt/gepa.py)
- ReAct (orchestrator) — Implements the Reasoning-Acting pattern for tool use — alternates between thought, action, and observation steps until reaching a final answer (dspy/predict/react.py)
- Example (store) — Immutable container for training examples and demonstrations — stores input/output pairs with metadata like reasoning traces and quality scores (dspy/primitives/example.py)
- Evaluate (processor) — Evaluates program performance on datasets using custom metrics — runs programs on test cases, computes scores, provides optimization feedback (dspy/evaluate/evaluate.py)
- Type (adapter) — Base class for custom content types like Image, Audio, Code — provides a format() method to convert structured data into LM-compatible content representations (dspy/adapters/types/base_type.py)
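As a sketch of how the optimizer components are used together (the metric and training data here are hypothetical):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name illustrative

# Hypothetical metric: exact match on the answer field.
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Tiny hypothetical training set; real use needs more labeled examples.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

program = dspy.ChainOfThought("question -> answer")
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
compiled = optimizer.compile(program, trainset=trainset)

# The learned demonstrations now live on the compiled program's predictors.
for name, predictor in compiled.named_predictors():
    print(name, len(predictor.demos))
```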
Frequently Asked Questions
What is dspy used for?
dspy programs language models with declarative Python code and auto-optimizes prompts. stanfordnlp/dspy is a 10-component ML inference framework written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 244 files.
How is dspy architected?
dspy is organized into 6 architecture layers: Signatures, Modules, Adapters, Language Models, and 2 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through dspy?
Data moves through 7 stages: Define signature contract → Create module instance → Execute with input data → Format prompt through adapter → Call language model → Parse structured response → Return prediction result. Signatures define the contract, modules execute it, adapters handle prompt formatting and response parsing, and optimizers like BootstrapFewShot and GEPA improve the pipeline over time. This pipeline design reflects a complex multi-stage processing system.
What technologies does dspy use?
The core stack includes LiteLLM (Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface), Pydantic (Type validation and serialization for signatures, custom types, and configuration models), DiskCache (Persistent caching of language model responses to reduce API costs and latency), Tenacity (Retry logic with exponential backoff for resilient LM API calls), JSON Repair (Attempts to fix malformed JSON in LM responses before parsing), Regex (Pattern matching for parsing structured outputs from LM text responses), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does dspy have?
dspy exhibits 4 data pools (DSPY_CACHE, Settings registry), 4 feedback loops, 6 control points, and 4 delays. The feedback loops span reinforcing training loops (BootstrapFewShot, GEPA) and balancing retry loops (adapter fallback, LM backoff). These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does dspy use?
6 design patterns detected: Signature-based programming, Adapter pattern for LM interfaces, Module composition, Meta-learning optimization, Type-driven custom content, and 1 more.
How does dspy compare to alternatives?
CodeSea has side-by-side architecture comparisons of dspy with langchain, llama_index, and guidance. These comparisons cover tech stack differences, pipeline design, system behavior, and code patterns; see the comparison pages for detailed analysis.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.