stanfordnlp/dspy

DSPy: The framework for programming—not prompting—language models

33,832 stars · Python · 10 components

Programs language models with declarative Python code and auto-optimizes prompts

Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.

A 10-component ML inference framework. 244 files analyzed. Data flows through 7 distinct pipeline stages.

How Data Flows Through the System

Programs flow from signature definition through module execution to LM calls and back. Users define signatures specifying inputs/outputs, create modules like Predict or ChainOfThought that implement these signatures, then execute them with actual data. The adapter layer transforms signatures into LM-specific prompts (with field delimiters and formatting), sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions.

  1. Define signature contract — User creates a Signature class with typed input/output fields and optional instructions — DSPy validates field types, generates descriptions, and creates the execution contract that modules must fulfill
  2. Create module instance — User instantiates a module (Predict, ChainOfThought, ReAct) with the signature — module validates compatibility and prepares for execution [Signature]
  3. Execute with input data — Module.__call__ receives input values, validates them against signature fields, and triggers the prediction pipeline [input values]
  4. Format prompt through adapter — ChatAdapter.format_prompt combines signature, input values, few-shot examples, and conversation history into LM messages — uses field headers like [[ ## question ## ]] to structure content and instructs the LM on the output format [Signature → LMMessage]
  5. Call language model — BaseLM.generate sends formatted messages to the LM API (via LiteLLM), handles retries and caching, returns raw text response with usage metadata [LMMessage → LM response text]
  6. Parse structured response — Adapter.parse_response extracts field values from LM text using header patterns or JSON parsing, validates types against signature, handles errors with fallbacks [LM response text → Prediction]
  7. Return prediction result — Module returns Prediction object containing parsed field values, completion metadata, and conversation history — accessible via dot notation like result.answer [Prediction]
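
What a run of this pipeline looks like in user code. A minimal sketch using DSPy's documented public API; the model name is illustrative:

    import dspy

    # Step 1: a Signature with typed input/output fields and instructions.
    class QA(dspy.Signature):
        """Answer the question in one sentence."""
        question: str = dspy.InputField()
        answer: str = dspy.OutputField()

    # Model name is illustrative; any LiteLLM-supported model string works.
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    qa = dspy.ChainOfThought(QA)            # step 2: module wraps the signature
    result = qa(question="What is DSPy?")   # steps 3-6 happen behind this call
    print(result.answer)                    # step 7: dot access on the Prediction
    print(result.reasoning)                 # extra field added by ChainOfThought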

Data Models

The data structures that flow between stages — the contracts that hold the system together.

Signature dspy/signatures/signature.py
Pydantic model with input_fields: dict[str, FieldInfo], output_fields: dict[str, FieldInfo], instructions: str, and field type annotations
Defined by user, validated during class creation, transformed by adapters into LM prompts, used to parse responses
Prediction dspy/primitives/prediction.py
Dict-like object containing field_name: value pairs matching signature outputs, plus metadata like completions and history
Created by adapters from LM responses, returned by modules, used in optimization and evaluation
Example dspy/primitives/example.py
Dict-like object with input and output field values, plus metadata like reasoning traces
Created from training data or generated during optimization, used as demonstrations in prompts
LMMessage dspy/clients/base_lm.py
Dict with role: 'user'|'assistant'|'system', content: str|list[dict], and optional tool_calls, citations
Generated by adapters, sent to LM APIs, stored in conversation history for multi-turn interactions
Type dspy/adapters/types/base_type.py
Pydantic BaseModel subclass with format() -> list[dict[str, Any]]|str method for custom content types
Instantiated in signature fields, formatted by adapters into LM-compatible content structures
BootstrapState dspy/teleprompt/bootstrap.py
Object with predictor: Module, examples: list[Example], optim_config: dict tracking optimization progress
Created at optimization start, updated during trace generation, used to track learned examples and predictor state
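
A short sketch of how the Example container behaves (Prediction follows the same dot-access convention); values are illustrative:

    import dspy

    # Example is dict-like; with_inputs marks which keys count as inputs.
    ex = dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")
    print(ex.inputs())   # only the fields marked as inputs
    print(ex.labels())   # the remaining (output) fields
    print(ex.question)   # dot access, the same convention Prediction uses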

Hidden Assumptions

Things this code relies on but never validates: the assumptions that cause silent failures when the system changes.

critical Domain unguarded

Language models will treat field headers like [[ ## field_name ## ]] as meaningful delimiters and not generate them as part of actual content

If this fails: If the LM includes these patterns in its response text, the parser will incorrectly split the content at those points, breaking field extraction and potentially losing data

dspy/adapters/chat_adapter.py:field_header_pattern
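
A small illustration of the failure mode, using a pattern resembling (but not necessarily identical to) the real field_header_pattern:

    import re

    # Illustrative stand-in for ChatAdapter's field header pattern.
    header = re.compile(r"\[\[ ## (\w+) ## \]\]")

    # The LM echoes a header-like token inside the answer's content...
    text = "[[ ## answer ## ]]\nWrap sections in [[ ## notes ## ]] markers."
    print(header.split(text))
    # ['', 'answer', '\nWrap sections in ', 'notes', ' markers.']
    # ...so the parser sees a phantom 'notes' field and a truncated answer.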
critical Contract weakly guarded

All adapters will successfully fallback to JSONAdapter when parsing fails, and JSONAdapter will always be available and functional

If this fails: If JSONAdapter is misconfigured, unavailable, or fails on the same input, the fallback chain breaks and parsing permanently fails with no recovery mechanism

dspy/adapters/base.py:Adapter
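
The shape of the risk, as a sketch (names are illustrative; the real logic lives in dspy/adapters/base.py):

    def parse_with_fallback(signature, text, chat_adapter, json_adapter):
        """Parse LM output, falling back from one adapter to another."""
        try:
            return chat_adapter.parse(signature, text)
        except Exception:
            # If json_adapter is misconfigured or fails on the same text,
            # this raises too, and there is no further recovery step.
            return json_adapter.parse(signature, text)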
warning Environment weakly guarded

The LITELLM_LOCAL_MODEL_COST_MAP environment variable can safely default to 'True' without surprising the user's existing environment configuration

If this fails: os.environ.setdefault never overwrites an explicit user value, but it silently enables LiteLLM's local cost map for any user who left the variable unset, which can change their cost tracking or billing integration without their awareness

dspy/__init__.py:os.environ.setdefault
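
What setdefault actually does, which is why the risk is the unset case rather than an override:

    import os

    # setdefault writes only when the key is absent; explicit values survive.
    os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "False"  # user's prior choice
    os.environ.setdefault("LITELLM_LOCAL_MODEL_COST_MAP", "True")
    print(os.environ["LITELLM_LOCAL_MODEL_COST_MAP"])     # 'False' -- kept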
critical Shape unguarded

Custom type format() methods return either list[dict[str, Any]] matching OpenAI API content structure or plain strings, never mixed types or nested structures

If this fails: If a custom type returns unexpected formats like nested lists or non-dict objects, the adapter formatting pipeline will crash when trying to serialize the content for LM APIs

dspy/adapters/types/base_type.py:Type.format
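
A hypothetical custom type that honors the contract described above, returning an OpenAI-style list of content parts:

    from typing import Any

    import pydantic

    class ImageRef(pydantic.BaseModel):
        """Illustrative custom type; not part of DSPy itself."""
        url: str

        def format(self) -> list[dict[str, Any]]:
            # One content part in OpenAI's image_url shape.
            return [{"type": "image_url", "image_url": {"url": self.url}}]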
warning Ordering unguarded

Field processing order from signature input_fields and output_fields dictionaries is deterministic and consistent across Python versions and executions

If this fails: Python 3.7+ guarantees dict insertion order, but if fields are inserted in a different order (for example, when a signature is rebuilt, modified, or deserialized), field headers appear in a different order in prompts, causing LMs primed on a specific format to generate inconsistent responses

dspy/adapters/chat_adapter.py:FieldInfoWithName
warning Domain weakly guarded

Field values can be successfully parsed from string representations back to their original types without precision loss or format ambiguity

If this fails: Complex types like datetime objects, custom classes, or floating-point numbers with specific precision requirements may lose information during string round-trip, silently corrupting data

dspy/adapters/utils.py:parse_value
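
Two innocuous-looking round-trips that illustrate the hazard:

    from datetime import datetime, timezone

    f = 0.1 + 0.2
    print(str(f))      # '0.30000000000000004' -- the full repr survives,
    print(f"{f:.4f}")  # '0.3000' -- but any fixed-width format truncates

    dt = datetime.now(timezone.utc)
    s = str(dt)        # e.g. '2026-04-20 12:00:00.123456+00:00'
    # datetime.fromisoformat(s) recovers this one; arbitrary custom classes
    # have no reliable way back from whatever their str() happens to emit.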
warning Resource unguarded

The disk cache directory is writable and has sufficient space for storing LM responses across all concurrent DSPy programs

If this fails: If cache directory becomes full or unwritable, all LM calls will fail silently or fall back to uncached mode, drastically increasing API costs and latency without user awareness

dspy/clients/cache.py:DSPY_CACHE
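
The disk-backed caching pattern in miniature (the path is illustrative; DSPy's real cache lives in dspy/clients/cache.py):

    import diskcache

    cache = diskcache.Cache("/tmp/dspy_cache_demo")
    key = "hash-of-model-config-and-prompt"
    if key not in cache:
        # A full or read-only disk surfaces as a failure on this write.
        cache[key] = "expensive LM response"
    print(cache[key])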
warning Scale weakly guarded

Chat-formatted prompts with field headers and demonstrations will fit within the language model's context window limit

If this fails: Large signatures with many fields or extensive few-shot examples will exceed context limits, causing ContextWindowExceededError but only after expensive prompt formatting has been completed

dspy/adapters/chat_adapter.py:ChatAdapter
info Contract weakly guarded

All signature field types have meaningful string representations and can be serialized/deserialized through the adapter pipeline

If this fails: Custom types without proper __str__ methods or non-serializable objects will cause cryptic errors during prompt formatting, making debugging difficult

dspy/signatures/signature.py:Signature
info Temporal unguarded

Language model API responses arrive in reasonable time and don't timeout during streaming or batch processing

If this fails: Long-running optimizations like GEPA or BootstrapFewShot may fail partway through if individual LM calls timeout, losing all progress and requiring complete restart

dspy/clients/base_lm.py:BaseLM

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

DSPY_CACHE (cache)
Disk-based cache for LM responses using diskcache — prevents duplicate API calls, persists across sessions
Settings registry (registry)
Global configuration store for the active LM, adapter, and system settings — maintains a context stack for nested configurations (see the sketch after this list)
Few-shot example store (buffer)
Module-level storage for demonstration examples used in prompts — populated by optimizers, used during execution
Conversation history (state-store)
Per-session message history for multi-turn conversations — maintains context across interactions
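
How the settings registry's context stack behaves in practice; the model names are illustrative:

    import dspy

    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))   # global default

    with dspy.context(lm=dspy.LM("openai/gpt-4o")):
        ...  # calls in this block see the overriding LM
    # after the block exits, the globally configured LM is active again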

Technology Stack

LiteLLM (runtime)
Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface
Pydantic (framework)
Type validation and serialization for signatures, custom types, and configuration models
DiskCache (database)
Persistent caching of language model responses to reduce API costs and latency
Tenacity (library)
Retry logic with exponential backoff for resilient LM API calls (see the sketch after this list)
JSON Repair (library)
Attempts to fix malformed JSON in LM responses before parsing
Regex (library)
Pattern matching for parsing structured outputs from LM text responses
Asyncio (runtime)
Asynchronous execution support for non-blocking LM calls and streaming responses
Optuna (library)
Hyperparameter optimization for teleprompt algorithms
CloudPickle (serialization)
Serialization of complex Python objects for caching and persistence
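
A sketch of the retry pattern Tenacity contributes; the decorator parameters are illustrative, not DSPy's actual configuration:

    import tenacity

    @tenacity.retry(
        wait=tenacity.wait_exponential(multiplier=1, max=30),  # backoff curve
        stop=tenacity.stop_after_attempt(4),                   # give up after 4 tries
    )
    def call_lm(prompt: str) -> str:
        ...  # a transient API error raised here triggers backoff and retry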

Frequently Asked Questions

What is dspy used for?

stanfordnlp/dspy programs language models with declarative Python code and auto-optimizes prompts. It is a 10-component ML inference framework written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 244 files.

How is dspy architected?

dspy is organized into 6 architecture layers: Signatures, Modules, Adapters, Language Models, and 2 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through dspy?

Data moves through 7 stages: Define signature contract → Create module instance → Execute with input data → Format prompt through adapter → Call language model → Parse structured response → Return prediction result. Users define signatures specifying inputs/outputs, modules like Predict or ChainOfThought execute them against input data, adapters format LM-specific prompts and parse responses back into structured predictions, and optimizers like BootstrapFewShot and GEPA refine the pipeline with better examples and instructions. This pipeline design reflects a complex multi-stage processing system.

What technologies does dspy use?

The core stack includes LiteLLM (Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface), Pydantic (Type validation and serialization for signatures, custom types, and configuration models), DiskCache (Persistent caching of language model responses to reduce API costs and latency), Tenacity (Retry logic with exponential backoff for resilient LM API calls), JSON Repair (Attempts to fix malformed JSON in LM responses before parsing), Regex (Pattern matching for parsing structured outputs from LM text responses), and 3 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does dspy have?

dspy exhibits 4 data pools (DSPY_CACHE, Settings registry), 4 feedback loops, 6 control points, and 4 delays. The feedback loops are training loops: optimizers evaluate outputs and feed the results back into better examples and instructions. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does dspy use?

6 design patterns detected: Signature-based programming, Adapter pattern for LM interfaces, Module composition, Meta-learning optimization, Type-driven custom content, and 1 more.

How does dspy compare to alternatives?

CodeSea has side-by-side architecture comparisons of dspy with langchain, llama_index, and guidance. These comparisons cover tech stack differences, pipeline design, system behavior, and code patterns.

Analyzed on April 20, 2026 by CodeSea.