How DSPy Works

Prompt engineering is manual tuning — you adjust words until the output looks right. DSPy treats it as an optimization problem: define what you want (signatures), provide examples, and let the framework find the prompts and few-shot examples that actually work.

33,832 GitHub stars · Python · 10 components · 7-stage pipeline

What dspy Does

Programs language models with declarative Python code and auto-optimizes prompts

DSPy is a framework for building modular AI systems by writing compositional Python code instead of brittle prompts. It automatically optimizes LM prompts and weights using algorithms like bootstrap few-shot learning and genetic prompt evolution. The core philosophy is 'programming, not prompting' — you define signatures (input/output specs) and modules (like ChainOfThought), then DSPy teaches the LM to deliver high-quality outputs.

Architecture Overview

dspy is organized into 6 layers comprising 10 components.

Signatures
Declarative specifications of input/output contracts — like function signatures but for LM calls, defining what fields to expect and their types
Modules
Composable building blocks that execute signatures — Predict for simple calls, ChainOfThought for reasoning, ReAct for tool use
Adapters
Transform signatures into LM-specific formats and parse responses back — handles chat formatting, JSON schemas, tool calls
Language Models
Unified interface to various LM providers through LiteLLM — handles calls, caching, usage tracking
Optimizers
Automatic prompt and example optimization algorithms — bootstrap learning, genetic evolution, hyperparameter tuning
Evaluation
Metrics and assessment frameworks for measuring program performance and guiding optimization

How Data Flows Through dspy

Programs flow from signature definition through module execution to LM calls and back. Users define signatures specifying inputs/outputs, create modules like Predict or ChainOfThought that implement these signatures, then execute them with actual data. The adapter layer transforms signatures into LM-specific prompts (with field delimiters and formatting), sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions.

1. Define signature contract

User creates a Signature class with typed input/output fields and optional instructions — DSPy validates field types, generates descriptions, and creates the execution contract that modules must fulfill
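The contract idea can be sketched in plain Python. This is a minimal illustration; `SignatureSketch` and its field layout are invented here and are not DSPy's real `Signature` class.

```python
from dataclasses import dataclass

@dataclass
class SignatureSketch:
    # Hypothetical stand-in for a declarative input/output contract:
    # instructions plus typed field specs that calls are validated against.
    instructions: str
    inputs: dict   # field name -> expected Python type
    outputs: dict  # field name -> expected Python type

    def validate_inputs(self, values: dict) -> None:
        # Every declared input field must be present with the right type.
        for name, typ in self.inputs.items():
            if name not in values:
                raise ValueError(f"missing input field: {name}")
            if not isinstance(values[name], typ):
                raise TypeError(f"{name} must be {typ.__name__}")

qa = SignatureSketch(
    instructions="Answer the question concisely.",
    inputs={"question": str},
    outputs={"answer": str},
)
qa.validate_inputs({"question": "What optimizes prompts in DSPy?"})  # passes
```

The point is only that the contract is data, not prompt text: a module can check it before any LM call is made.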

2. Create module instance

User instantiates a module (Predict, ChainOfThought, ReAct) with the signature — module validates compatibility and prepares for execution

3. Execute with input data

Module.__call__ receives input values, validates them against signature fields, and triggers the prediction pipeline

4. Format prompt through adapter

ChatAdapter.format_prompt combines signature, input values, few-shot examples, and conversation history into LM messages — uses field headers like [[## question ##]] to structure content and instructs LM on output format
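The field-header formatting described above can be sketched as follows. `format_fields` is a made-up helper, not the real ChatAdapter; only the `[[## name ##]]` delimiter style comes from the text.

```python
def format_fields(fields: dict) -> str:
    # Render each field under a [[## name ##]] header, mirroring the
    # delimiter style described above (a sketch, not DSPy's ChatAdapter).
    parts = []
    for name, value in fields.items():
        parts.append(f"[[## {name} ##]]\n{value}")
    return "\n\n".join(parts)

prompt = format_fields({"question": "What is DSPy?"})
```

The same delimiters reappear in step 6, which is what makes the LM's output parseable.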

5. Call language model

BaseLM.generate sends formatted messages to the LM API (via LiteLLM), handles retries and caching, returns raw text response with usage metadata

6. Parse structured response

Adapter.parse_response extracts field values from LM text using header patterns or JSON parsing, validates types against signature, handles errors with fallbacks
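Header-pattern extraction of this kind might look like the following sketch; `parse_fields` is hypothetical, not DSPy's parser.

```python
import re

def parse_fields(text: str, field_names: list) -> dict:
    # Capture each field's value between its [[## name ##]] header and the
    # next header (or end of text). Missing fields come back as None.
    pattern = re.compile(r"\[\[## (\w+) ##\]\]\n(.*?)(?=\n\[\[## |\Z)", re.S)
    found = {m.group(1): m.group(2).strip() for m in pattern.finditer(text)}
    return {name: found.get(name) for name in field_names}

reply = "[[## reasoning ##]]\nIt optimizes prompts.\n[[## answer ##]]\nDSPy"
parsed = parse_fields(reply, ["reasoning", "answer"])
```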

7. Return prediction result

Module returns Prediction object containing parsed field values, completion metadata, and conversation history — accessible via dot notation like result.answer
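A minimal stand-in for the dot-notation access described above; `PredictionSketch` is invented, and DSPy's actual `Prediction` carries completion metadata and history as well.

```python
class PredictionSketch:
    # Toy prediction object: parsed field values become attributes,
    # so callers read result.answer rather than result["answer"].
    def __init__(self, **fields):
        self.__dict__.update(fields)

result = PredictionSketch(answer="Paris", reasoning="Capital of France.")
```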

System Dynamics

Beyond the pipeline, dspy has runtime behaviors that shape how it responds to load, failures, and configuration changes.

Data Pools

DSPY_CACHE (cache)

Disk-based cache for LM responses using diskcache — prevents duplicate API calls, persists across sessions
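The cache-before-API behavior can be sketched in memory; `ResponseCacheSketch` is a toy stand-in for the diskcache-backed store, keyed on a hash of the request.

```python
import hashlib
import json

class ResponseCacheSketch:
    # In-memory stand-in for the disk-backed response cache: identical
    # requests hit the cache instead of triggering another API call.
    def __init__(self):
        self.store = {}
        self.hits = 0

    def key(self, model: str, messages: list) -> str:
        # Canonical JSON keeps the key stable across dict orderings.
        blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def get_or_call(self, model, messages, call):
        k = self.key(model, messages)
        if k in self.store:
            self.hits += 1
            return self.store[k]
        self.store[k] = call()
        return self.store[k]

cache = ResponseCacheSketch()
calls = []
def fake_call():
    calls.append(1)          # counts how often the "API" is actually hit
    return "response"

msgs = [{"role": "user", "content": "hi"}]
r1 = cache.get_or_call("model-x", msgs, fake_call)
r2 = cache.get_or_call("model-x", msgs, fake_call)  # served from cache
```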

Settings registry (registry)

Global configuration store for active LM, adapter, and system settings — maintains context stack for nested configurations
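The context-stack behavior might be sketched like this; `SettingsSketch` is hypothetical, not DSPy's `Settings` object, and the model name is illustrative.

```python
from contextlib import contextmanager

class SettingsSketch:
    # Toy settings registry: a stack of dicts, where entering a context
    # pushes an override layer and exiting pops it.
    def __init__(self, **defaults):
        self.stack = [dict(defaults)]

    def get(self, name):
        return self.stack[-1][name]

    @contextmanager
    def context(self, **overrides):
        self.stack.append({**self.stack[-1], **overrides})
        try:
            yield
        finally:
            self.stack.pop()

settings = SettingsSketch(lm="some-model", temperature=0.0)
with settings.context(temperature=0.7):
    inner = settings.get("temperature")  # override active inside the block
outer = settings.get("temperature")      # restored after exit
```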

Few-shot example store (buffer)

Module-level storage for demonstration examples used in prompts — populated by optimizers, used during execution

Conversation history (state-store)

Per-session message history for multi-turn conversations — maintains context across interactions

Feedback Loops

Bootstrap few-shot learning (training-loop)

Trigger: BootstrapFewShot.compile() called → run the program on training examples, collect successful traces, add high-scoring examples as demonstrations. Exits when max examples are reached or there is no improvement.
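The loop described above, reduced to a sketch: `bootstrap_fewshot_sketch`, the program, and the metric here are all illustrative stand-ins, not DSPy's optimizer.

```python
def bootstrap_fewshot_sketch(program, trainset, metric, max_demos=4):
    # Run the program on each training example; keep the ones the metric
    # scores as successful, up to max_demos demonstrations.
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:
            break
    return demos

def exact_match(example, prediction):
    return prediction == example["answer"]

trainset = [
    {"question": "2+2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]
# A stand-in "program": a lookup table in place of an LM pipeline.
demos = bootstrap_fewshot_sketch(
    lambda q: {"2+2?": "4", "Capital of France?": "Paris"}.get(q),
    trainset,
    exact_match,
)
```

The collected demos are what get injected as few-shot examples on later calls.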

GEPA genetic optimization (training-loop)

Trigger: GEPA.compile() called → mutate instruction text, crossbreed variants, evaluate fitness, select survivors for the next generation. Exits when max generations are reached or on convergence.
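A bare-bones sketch of the generational loop, illustrative only: GEPA itself also uses LM-driven reflection, and `genetic_instruction_search` is invented for this example.

```python
import random

def genetic_instruction_search(seed, mutate, fitness, generations=5,
                               population=6, rng=None):
    # Each generation: keep the current best, produce mutated variants,
    # score everything, and carry the fittest candidate forward.
    rng = rng or random.Random(0)
    best = seed
    for _ in range(generations):
        candidates = [best] + [mutate(best, rng) for _ in range(population - 1)]
        best = max(candidates, key=fitness)
    return best

# Toy demo: mutation appends a clause, fitness is just prompt length,
# so the best candidate grows by one clause per generation.
result = genetic_instruction_search(
    "Answer the question.",
    lambda s, rng: s + " Be concise.",
    len,
    generations=3,
)
```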

Adapter format fallback (retry)

Trigger: ChatAdapter parsing fails → fall back to JSONAdapter and attempt structured parsing again. Exits on a successful parse or final failure.
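The fallback chain can be sketched as follows; `parse_with_fallback` is a made-up helper combining the two parsing strategies the text names.

```python
import json
import re

def parse_with_fallback(text: str, fields: list) -> dict:
    # First try header-style extraction; if no headers are present,
    # fall back to JSON parsing (which may still raise on final failure).
    headers = dict(re.findall(r"\[\[## (\w+) ##\]\]\n([^\[]*)", text))
    if headers:
        return {f: headers.get(f, "").strip() for f in fields}
    data = json.loads(text)
    return {f: data.get(f) for f in fields}

from_headers = parse_with_fallback("[[## answer ##]]\n42", ["answer"])
from_json = parse_with_fallback('{"answer": "42"}', ["answer"])
```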

LM retry with backoff (retry)

Trigger: LM API call fails or is rate limited → wait with exponential backoff, retry the API call. Exits on success or when max retries are exceeded.
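A minimal sketch of retry with exponential backoff; `call_with_backoff` is illustrative, since in practice DSPy delegates this to Tenacity and LiteLLM.

```python
import time

def call_with_backoff(call, max_retries=3, base_delay=0.01):
    # Wait base_delay * 2**attempt between attempts; re-raise once
    # max_retries failures have been exhausted.
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Toy flaky callee: fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = call_with_backoff(flaky, max_retries=3, base_delay=0.001)
```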

Control Points

LM provider selection
Adapter choice
Max tokens limit
Cache enabled
Few-shot count
Temperature setting

Delays

Cache lookup: immediate on a hit
LM API generation: variable by model
Optimization compilation: minutes to hours
Response streaming: partial results available immediately

Technology Choices

dspy is built with 9 key technologies. Each serves a specific role in the system.

LiteLLM
Unified API client for multiple language model providers — handles OpenAI, Anthropic, local models with consistent interface
Pydantic
Type validation and serialization for signatures, custom types, and configuration models
DiskCache
Persistent caching of language model responses to reduce API costs and latency
Tenacity
Retry logic with exponential backoff for resilient LM API calls
JSON Repair
Attempts to fix malformed JSON in LM responses before parsing
Regex
Pattern matching for parsing structured outputs from LM text responses
Asyncio
Asynchronous execution support for non-blocking LM calls and streaming responses
Optuna
Hyperparameter optimization for teleprompt algorithms
CloudPickle
Serialization of complex Python objects for caching and persistence

Who Should Read This

ML researchers and engineers who want to move beyond manual prompt engineering, or teams building complex LLM pipelines.

This analysis was generated by CodeSea from the stanfordnlp/dspy source code. For the full interactive visualization — including pipeline graph, architecture diagram, and system behavior map — see the complete analysis.

Explore Further

Frequently Asked Questions

What is dspy?

Programs language models with declarative Python code and auto-optimizes prompts

How does dspy's pipeline work?

dspy processes data through 7 stages: define the signature contract, create a module instance, execute with input data, format the prompt through an adapter, call the language model, parse the structured response, and return the prediction result. The adapter layer transforms signatures into LM-specific prompts with field delimiters and formatting, sends them to language models via the client layer, then parses responses back into structured predictions. Optimizers like BootstrapFewShot and GEPA improve this pipeline by generating better examples and instructions.

What tech stack does dspy use?

dspy is built with LiteLLM (unified API client for multiple language model providers — handles OpenAI, Anthropic, and local models with a consistent interface), Pydantic (type validation and serialization for signatures, custom types, and configuration models), DiskCache (persistent caching of language model responses to reduce API costs and latency), Tenacity (retry logic with exponential backoff for resilient LM API calls), and JSON Repair (fixes malformed JSON in LM responses before parsing), plus Regex, Asyncio, Optuna, and CloudPickle.

How does dspy handle errors and scaling?

dspy uses 4 feedback loops, 6 control points, and 4 data pools to manage its runtime behavior. These mechanisms handle error recovery, load distribution, and configuration changes.

How does dspy compare to langchain?

CodeSea has detailed side-by-side architecture comparisons of dspy with langchain, llama_index, and guidance. These cover tech stack differences, pipeline design, and system behavior.

Visualize dspy yourself

See the interactive pipeline graph, architecture diagram, and system behavior map.

See Full Analysis