Hidden Assumptions in dspy
12 assumptions this code never checks · 3 critical · spanning Domain, Contract, Environment, Shape, Ordering, Resource, Scale, Temporal
Every codebase relies on things it never checks, and most of them are routine. CodeSea analyzed stanfordnlp/dspy and picked out the 3 assumptions most likely to cause trouble here. Their failure modes are listed first; the rest are minor and appear under "Show everything".
If the LM includes field-header patterns like [[ ## field_name ## ]] in its response text, the parser will incorrectly split the content at those points, breaking field extraction and potentially losing data
If JSONAdapter is misconfigured, unavailable, or fails on the same input, the fallback chain breaks and parsing permanently fails with no recovery mechanism
If a custom type returns unexpected formats like nested lists or non-dict objects, the adapter formatting pipeline will crash when trying to serialize the content for LM APIs
Show everything (9 more)
The LITELLM_LOCAL_MODEL_COST_MAP environment variable can be safely set to 'True' without conflicting with user's existing environment configuration
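If this fails: os.environ.setdefault writes only when the variable is absent, so a value the user set explicitly is preserved. When the variable is unset, however, DSPy sets it to 'True' for the whole process, which may change cost tracking or billing integration that relied on the unset behavior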
dspy/__init__.py:os.environ.setdefault
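The semantics of the cited call are easy to check in isolation. The sketch below uses a hypothetical variable name (DEMO_LOCAL_COST_MAP) and a hypothetical helper so that it does not touch real configuration; it only illustrates standard-library behavior, not DSPy's actual startup code.

```python
import os

# Hypothetical variable name so the demo does not touch real configuration.
DEMO_VAR = "DEMO_LOCAL_COST_MAP"

def apply_library_default(value="True"):
    """Write the default only when the variable is absent; return the effective value."""
    return os.environ.setdefault(DEMO_VAR, value)

# Case 1: the variable is unset, so the default is written.
os.environ.pop(DEMO_VAR, None)
when_unset = apply_library_default()

# Case 2: the user already chose a value; setdefault leaves it untouched.
os.environ[DEMO_VAR] = "custom"
when_user_set = apply_library_default()

os.environ.pop(DEMO_VAR, None)  # clean up after the demo
```

In other words, the practical exposure is the unset case, where the process-wide default appears without the user asking for it.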
Field processing order from signature input_fields and output_fields dictionaries is deterministic and consistent across Python versions and executions
If this fails: If dict iteration order changes between runs, field headers appear in different orders in prompts, causing LMs trained on specific formats to generate inconsistent responses
dspy/adapters/chat_adapter.py:FieldInfoWithName
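A small sketch of where header order comes from, assuming headers are rendered by iterating a plain dict. Since Python 3.7 dicts keep insertion order, so the order is stable across runs as long as the fields are declared in the same order; the field names and the render_headers helper are illustrative, not DSPy's code.

```python
def render_headers(fields):
    """Render one header per field, in the dict's iteration (insertion) order."""
    return [f"[[ ## {name} ## ]]" for name in fields]

# Hypothetical signature fields, declared in two different orders.
declared = {"question": str, "context": str}
reordered = {"context": str, "question": str}

headers_declared = render_headers(declared)
headers_reordered = render_headers(reordered)
```

The two dicts compare equal, yet they produce different prompts, which is why anything that rebuilds the field mapping in a different order can change what the LM sees.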
Field values can be successfully parsed from string representations back to their original types without precision loss or format ambiguity
If this fails: Complex types like datetime objects, custom classes, or floating-point numbers with specific precision requirements may lose information during string round-trip, silently corrupting data
dspy/adapters/utils.py:parse_value
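To make the round-trip risk concrete, here is a minimal sketch with a hypothetical parser (naive_parse) that tries JSON first and falls back to the raw string. It is not the parse_value implementation; it only shows how type information can disappear once a value has been turned into text.

```python
import json
from decimal import Decimal

def naive_parse(text):
    """Hypothetical parser: try JSON first, fall back to the raw string."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text

def round_trip(value):
    """Serialize with str() and parse back, as a string-based pipeline would."""
    return naive_parse(str(value))

id_as_text = round_trip("123")          # a string that merely looks numeric
price = round_trip(Decimal("0.10"))     # exact decimal with a trailing zero
plain = round_trip("hello")             # ordinary text survives unchanged
```

The numeric-looking string comes back as an int and the Decimal comes back as a float, with no error raised in either case.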
The disk cache directory is writable and has sufficient space for storing LM responses across all concurrent DSPy programs
If this fails: If cache directory becomes full or unwritable, all LM calls will fail silently or fall back to uncached mode, drastically increasing API costs and latency without user awareness
dspy/clients/cache.py:DSPY_CACHE
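One way to surface this early is a preflight check before any LM call is made. The sketch below is an assumption about how such a check could look; the function name, the 50 MiB threshold, and the problem labels are all illustrative and not part of DSPy.

```python
import os
import shutil

def cache_dir_problems(path, min_free_bytes=50 * 1024 * 1024):
    """Return a list of human-readable problems with a cache directory."""
    if not os.path.isdir(path):
        return ["missing"]
    problems = []
    if not os.access(path, os.W_OK):
        problems.append("not writable")
    if shutil.disk_usage(path).free < min_free_bytes:
        problems.append("low disk space")
    return problems
```

Calling it once at startup and logging a non-empty result turns a silent degradation into a visible warning.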
Chat-formatted prompts with field headers and demonstrations will fit within the language model's context window limit
If this fails: Large signatures with many fields or extensive few-shot examples will exceed context limits, causing ContextWindowExceededError but only after expensive prompt formatting has been completed
dspy/adapters/chat_adapter.py:ChatAdapter
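A rough budget check before formatting can catch the obvious overflows cheaply. This sketch assumes a crude heuristic of about 4 characters per token and a hypothetical reserve for the model's output; real tokenizers differ, so treat the numbers as placeholders.

```python
def estimate_tokens(text, chars_per_token=4):
    """Crude token estimate: ceiling of len(text) / chars_per_token."""
    return -(-len(text) // chars_per_token)

def fits_budget(parts, context_limit, reserve_for_output=512):
    """Return (fits, estimated_prompt_tokens) for a list of prompt parts."""
    total = sum(estimate_tokens(p) for p in parts)
    return total + reserve_for_output <= context_limit, total

# Hypothetical prompt parts: instructions plus two demonstrations.
parts = ["instructions " * 50, "demo one " * 200, "demo two " * 200]
fits, used = fits_budget(parts, context_limit=2000)
```

If the estimate already exceeds the limit, demonstrations can be dropped before the full prompt is built.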
All signature field types have meaningful string representations and can be serialized/deserialized through the adapter pipeline
If this fails: Custom types without proper __str__ methods or non-serializable objects will cause cryptic errors during prompt formatting, making debugging difficult
dspy/signatures/signature.py:Signature
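The difference a string representation makes is easy to see with two hypothetical classes. This is plain Python behavior, shown here only to illustrate what would end up in a prompt if a custom type is interpolated as text.

```python
class Point:
    """Custom type with no __str__: falls back to the default object repr."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class LabeledPoint(Point):
    """Same data, but with a meaningful string form."""
    def __str__(self):
        return f"({self.x}, {self.y})"

opaque = str(Point(1, 2))           # something like "<...Point object at 0x...>"
readable = str(LabeledPoint(1, 2))  # "(1, 2)"
```

The opaque form carries a memory address rather than the data, so it is useless to the LM and changes on every run.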
Language model API responses arrive in reasonable time and don't timeout during streaming or batch processing
If this fails: Long-running optimizations like GEPA or BootstrapFewShot may fail partway through if individual LM calls timeout, losing all progress and requiring complete restart
dspy/clients/base_lm.py:BaseLM
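A generic mitigation for losing progress is to checkpoint after every completed step. The sketch below is not how GEPA or BootstrapFewShot work internally; run_with_checkpoint, the JSON file layout, and the step callback are hypothetical, and results are assumed to be JSON-serializable.

```python
import json
import os

def run_with_checkpoint(items, step, path):
    """Apply step to each item, persisting results so a rerun can resume."""
    done = {}
    if os.path.exists(path):
        with open(path) as f:
            done = json.load(f)
    for item in items:
        key = str(item)
        if key in done:
            continue  # finished in an earlier run
        done[key] = step(item)  # may raise, e.g. on a timeout
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, path)  # atomic swap so the checkpoint is never half-written
    return done
```

After a timeout, calling the function again with the same path skips the items that already completed.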
The json_repair library can successfully fix malformed JSON from language models without changing semantic meaning
If this fails: If json_repair 'fixes' JSON by altering actual content (e.g., removing quotes from intended string values), parsed results will be semantically incorrect while appearing syntactically valid
dspy/adapters/types/base_type.py:json_repair
Global settings state is thread-safe and won't cause race conditions when multiple DSPy programs run concurrently
If this fails: Concurrent programs sharing the same process may experience configuration bleeding where one program's LM settings affect another's execution, leading to inconsistent results
dspy/dsp/utils/settings.py:settings
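One common way to keep per-program configuration from bleeding is to scope it with contextvars instead of a shared global. The sketch below illustrates that pattern with hypothetical names (_lm, run_with_lm, current_lm); it is not a description of how DSPy's settings object is implemented.

```python
import contextvars

# Hypothetical setting; the default stands in for a globally configured LM.
_lm = contextvars.ContextVar("lm", default="default-model")

def current_lm():
    """Read the LM name visible in the current context."""
    return _lm.get()

def run_with_lm(name, fn):
    """Run fn with a scoped LM override that does not leak to the caller."""
    ctx = contextvars.copy_context()

    def _inner():
        _lm.set(name)  # only affects the copied context
        return fn()

    return ctx.run(_inner)

inside = run_with_lm("model-a", current_lm)
outside = current_lm()
```

The override is visible only inside the scoped call, while code outside it still sees the default.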
See the full structural analysis of dspy: the pipeline, data models, and system behavior that put these assumptions in context.
Frequently Asked Questions
What does dspy assume that could break in production?
The one most likely to cause trouble: language models are assumed to treat field headers like [[ ## field_name ## ]] as meaningful delimiters and never generate them as part of actual content. If this fails and the LM includes these patterns in its response text, the parser will incorrectly split the content at those points, breaking field extraction and potentially losing data.
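The failure is easy to reproduce with a toy parser. The sketch below assumes a simple line-based splitter keyed on the [[ ## field_name ## ]] header format; split_fields is hypothetical and much simpler than any real adapter, but it shows the mechanism.

```python
import re

# Matches a line that consists solely of a field header.
HEADER = re.compile(r"\[\[ ## (\w+) ## \]\]")

def split_fields(text):
    """Toy parser: start a new field at every header line."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = HEADER.fullmatch(line.strip())
        if match:
            current = match.group(1)
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in fields.items()}

# The LM's answer happens to quote a header on its own line.
response = (
    "[[ ## answer ## ]]\n"
    "Use this header:\n"
    "[[ ## note ## ]]\n"
    "in your prompt."
)
parsed = split_fields(response)
```

The answer is cut off at the quoted header and a field named note appears that the signature never declared.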
How many hidden assumptions does dspy have?
CodeSea found 12 assumptions dspy relies on but never validates, 3 of them critical, spanning Domain, Contract, Environment, Shape, Ordering, Resource, Scale, Temporal. Most are routine; the analysis flags the 3 most likely to cause real trouble.
What is a hidden assumption?
Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.