guidance-ai/guidance
A guidance language for controlling large language models.
Controls language model generation with grammars and constraints
The user defines a grammar using @guidance decorated Python functions containing generation rules and constraints. The decorator compiles these functions into GrammarNode trees, which are passed to a TokenParser that works with the model's tokenizer to enforce constraints during generation. Each token is validated against the grammar, with invalid tokens masked out and backtracking applied when needed. Generated tokens flow to visualization widgets that display real-time progress with probability information.
Under the hood, the system uses 2 feedback loops, 3 data pools, and 5 control points to manage its runtime behavior.
A 9-component library. 170 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Parse grammar definition — The @guidance decorator intercepts Python function calls, strips indentation, and converts the function body's library calls (gen(), select(), regex()) into a tree of GrammarNode objects representing the generation constraints (see the sketch after this list) [Function → GrammarNode]
- Compile to parser — TokenParser.__init__ takes the GrammarNode tree and compiles it into an llguidance.LLInterpreter instance along with the model's tokenizer, creating the constraint enforcement engine [GrammarNode → LLInterpreter]
- Generate tokens — The model backend calls TokenParser.process_token for each potential token, which checks if the token violates constraints and returns a GenData object with a mask indicating which tokens are valid for the next step [Token candidates → GenData]
- Validate constraints — The parser processes each generated token through LLInterpreter.advance, which may trigger backtracking if constraints are violated, returning an LLInterpreterResponse with new bytes, probabilities, and any required backtrack amount [Generated tokens → LLInterpreterResponse]
- Update model state — The Model class processes the LLInterpreterResponse to update its internal state, handle any backtracking by removing tokens, and capture named outputs into the model's variable dictionary [LLInterpreterResponse → Updated model state]
- Display in notebook — TokenOutput objects are serialized and sent to the StitchView Jupyter widget, which renders them in a sandboxed iframe showing generation progress, token probabilities, and captured variables in real-time [TokenOutput → Visualization]
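A minimal sketch of the first stage from the user's side, using guidance's public gen() and select() calls (the function, prompt, and backend choice here are illustrative):

```python
import guidance
from guidance import gen, select

@guidance
def triage(lm, ticket):
    # Fixed strings become literal grammar nodes; select()/gen() add
    # constrained generation points with named captures.
    lm += f"Ticket: {ticket}\nSeverity: "
    lm += select(["low", "medium", "high"], name="severity")
    lm += "\nSummary: " + gen(name="summary", stop="\n", max_tokens=30)
    return lm

# lm = guidance.models.Transformers("gpt2")         # any supported backend
# result = lm + triage("Login page returns a 500")  # runs the stages above
# print(result["severity"], result["summary"])
```

Calling the decorated function does not execute it immediately; it yields a grammar fragment that the later stages compile and enforce token by token.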
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- GrammarNode (guidance/_ast.py) — Abstract base class with subclasses like LiteralNode (fixed text), RegexNode (pattern matching), RuleNode (generation with constraints), SelectNode (alternatives), RepeatNode (loops), and SpecialToken (model-specific tokens). Created when parsing @guidance decorated functions, compiled into constraint rules, then consumed during token generation to enforce patterns.
- LLInterpreterResponse (guidance/_schema.py) — Pydantic model with new_bytes: bytes, is_generated: bool, new_bytes_prob: float, capture_groups: dict, backtrack: NonNegativeInt, and latency_ms: NonNegativeInt. Generated by the parser for each token; carries the new bytes to append, whether they were generated or forced, probability scores, and any backtracking needed (see the schema sketch after this list).
- TokenOutput (client/graphpaper-inline/src/stitch.ts) — TypeScript interface extending TextOutput with token: Token (token string, bytes, prob, masked) and top_k: Array<Token> for alternative tokens. Created for each generated token, sent from the Python kernel to the browser widget, and rendered with probability information and alternative options.
- GenData (guidance/_schema.py) — Pydantic model with tokens: list[int], mask: bytes (which tokens are valid), and temperature: float for sampling control. Passed to model backends to specify which tokens are valid at each generation step according to grammar constraints.
- Function (guidance/_ast.py) — Dataclass with name: str, f: Callable (the decorated Python function), and kwargs: dict[str, Any] for parameters. Created when the @guidance decorator is applied to a Python function, stored in a registry, and executed when called to build grammar trees.
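Based on the field list above, a self-contained sketch of the parser-response contract (illustrative; the real model lives in guidance/_schema.py and may differ):

```python
from pydantic import BaseModel, NonNegativeInt

class ParserResponseSketch(BaseModel):
    # Field names and types taken from the analysis above, not from the library.
    new_bytes: bytes            # text produced this step, as raw bytes
    is_generated: bool          # model-generated vs. forced by the grammar
    new_bytes_prob: float       # probability the model assigned to new_bytes
    capture_groups: dict        # named captures filled in so far
    backtrack: NonNegativeInt   # how many tokens to rewind; 0 means none
    latency_ms: NonNegativeInt  # time spent producing this step
```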
Hidden Assumptions
Things this code relies on but never validates. These are the assumptions that cause silent failures when the system changes.
- The tokenizer parameter has a _ll_tokenizer attribute compatible with llguidance.LLInterpreter, but the code never checks that the attribute exists or has the right type. If this fails: with a custom tokenizer lacking _ll_tokenizer, parser initialization dies with an AttributeError instead of a clear validation error. (guidance/_parser.py:TokenParser.__init__; see the guard sketch after this list)
- The model.isNew() check will return false within a reasonable time, yet the widget polls every 100ms with no timeout. If this fails: if model initialization hangs, the widget polls indefinitely, consuming CPU and never displaying content, which makes notebooks unresponsive. (packages/python/stitch/src/widget.ts:initOnReady)
- The llguidance module is available and compatible, but only an import error is caught at module level; version compatibility is never checked at runtime. If this fails: mismatched llguidance versions can silently enforce constraints incorrectly, with no clear error message. (guidance/_parser.py:TokenParser.__init__)
- Tag pool entries are populated before they are referenced in f-strings, but nothing guarantees that tagged functions are defined before use. If this fails: using the {{G|tag_name|G}} syntax before the tagged function is decorated raises a KeyError during string parsing. (guidance/_ast.py:_parse_tags)
- A ThreadPoolExecutor with default settings can handle concurrent parser compilation without memory limits or cleanup. If this fails: heavy grammar-compilation workloads can exhaust memory or file descriptors, making the entire application unresponsive. (guidance/_parser.py:_parser_cache)
- Messages from the iframe contain valid JSON, but the event.data structure is never validated. If this fails: malformed messages from the sandboxed iframe can crash the widget with JSON parse errors or accesses to undefined properties. (packages/python/stitch/src/widget.ts:recvFromClient)
- GenData.mask is a bytes object whose length matches the tokenizer vocabulary size, but neither length nor type is ever validated. If this fails: mask-size mismatches cause numpy indexing errors or mask the wrong tokens, producing invalid generation results. (guidance/_parser.py:TokenParser.process_token)
- The tag delimiters {{G| and |G}} never appear naturally in user strings as literal text to be generated. If this fails: user content containing these exact delimiters is parsed as function tags, breaking generation or raising undefined-tag errors. (guidance/_ast.py:tag_start and tag_end)
- ContextVar state persists correctly across async boundaries and thread switches without corruption. If this fails: in async environments, the stateless flag may leak between guidance function executions, causing unexpected state sharing. (guidance/_guidance.py:_in_stateless_context)
- The 100ms polling interval suits all model initialization speeds. If this fails: fast models waste CPU cycles on unnecessary polls, while very slow models appear frozen to users expecting faster feedback. (packages/python/stitch/src/widget.ts:refreshTimeMs)
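As a sketch of the guard missing from the first assumption (the helper is hypothetical; only the _ll_tokenizer requirement comes from the analysis above):

```python
def require_ll_tokenizer(tokenizer):
    # Hypothetical fail-fast check: surface a clear error instead of an
    # AttributeError deep inside TokenParser.__init__ when a custom
    # tokenizer lacks the expected attribute.
    ll_tok = getattr(tokenizer, "_ll_tokenizer", None)
    if ll_tok is None:
        raise TypeError(
            f"{type(tokenizer).__name__} has no _ll_tokenizer attribute; "
            "expected a tokenizer compatible with llguidance.LLInterpreter"
        )
    return ll_tok
```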
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Tag pool — the _tag_pool dict stores Function and GrammarNode objects referenced by tags in f-strings, enabling guidance constructs to be embedded inside Python string literals (a simplified sketch follows this list)
- Model state — Model.State accumulates generated tokens, captured variables, and conversation history as generation progresses
- Parser cache — a ThreadPoolExecutor-backed cache holds parser instances to avoid recompilation overhead when the same grammar is used multiple times
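A simplified sketch of the tag-pool mechanism (class and variable names are illustrative, not guidance's internals):

```python
import uuid

_tag_pool: dict[str, object] = {}

class NodeSketch:
    # When a grammar object is interpolated into an f-string, it stringifies
    # to a {{G|tag|G}} placeholder and parks itself in the pool, so the
    # string can later be parsed back into a grammar tree with the real node.
    def __str__(self) -> str:
        tag = uuid.uuid4().hex
        _tag_pool[tag] = self
        return "{{G|" + tag + "|G}}"

node = NodeSketch()
prompt = f"Name: {node}\n"  # embeds a resolvable placeholder, not repr() text
```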
Feedback Loops
- Backtrack on constraint violation (self-correction, balancing) — Trigger: a generated token violates grammar constraints. Action: TokenParser.process_token detects the violation, LLInterpreter.advance returns a backtrack amount, and the Model removes the invalid tokens and retries generation from the earlier state. Exit: a valid token satisfying the constraints is found (see the loop sketch after this list).
- Interactive generation loop (polling, reinforcing) — Trigger: the user executes a model with generation rules. Action: the Model repeatedly calls the backend for the next token, validates it against constraints, and updates the visualization. Exit: a stop condition is met, max tokens are reached, or a constraint cannot be satisfied.
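A hedged sketch of the backtracking loop (next_mask, sample, advance_token, and the stop field are hypothetical stand-ins; the real flow runs through TokenParser.process_token and LLInterpreter.advance):

```python
def constrained_generate(model, parser, max_tokens=256):
    # Illustrative only: sample a token under the grammar mask, validate it,
    # and rewind when the parser reports a backtrack (the self-correction loop).
    tokens = []
    while len(tokens) < max_tokens:
        gen_data = parser.next_mask()           # which tokens are legal now
        tok = model.sample(mask=gen_data.mask)  # backend picks a legal token
        resp = parser.advance_token(tok)        # grammar check, may backtrack
        if resp.backtrack:
            del tokens[-resp.backtrack:]        # drop the invalid suffix
        tokens.append(tok)
        if resp.stop:                           # stop condition satisfied
            break
    return tokens
```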
Delays
- Parser compilation (compilation; varies with grammar complexity) — First use of a grammar requires compilation to an llguidance parser; subsequent uses hit the cache
- Widget iframe initialization (async processing; ~100ms refresh intervals) — StitchView polls the model readiness state before initializing bidirectional communication with the Jupyter kernel
- Token generation latency (async processing; varies per model backend) — Each token involves model inference, constraint validation, and potentially multiple retries when backtracking occurs
Control Points
- enable_backtrack (feature-flag) — Controls: Whether the parser can backtrack when constraints are violated or must fail immediately. Default: True
- enable_ff_tokens (feature-flag) — Controls: Whether the parser can fast-forward tokens required by the grammar without generating them. Default: True
- stateless (runtime-toggle) — Controls: Whether guidance functions execute with isolated state or can access and modify the parent context. Default: False
- cache (feature-flag) — Controls: Whether compiled grammar functions are cached for performance. Default: False (see the usage sketch after this list)
- temperature (hyperparameter) — Controls: Sampling temperature for token generation in gen() calls. Default: None
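How two of these control points surface at decoration time, assuming the decorator keywords match the flags above (a sketch, not a verified signature):

```python
import guidance
from guidance import gen

# stateless=True builds a pure grammar fragment with no access to parent
# context; cache=True (per the control points above) memoizes compilation.
@guidance(stateless=True, cache=True)
def four_digit_pin(lm):
    return lm + gen(regex=r"\d{4}", name="pin")
```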
Technology Stack
- llguidance — Rust-based constraint-parsing engine that enforces grammar rules during token generation, with backtracking support
- pydantic — Validates and serializes data models for parser responses, model configurations, and tool definitions
- jinja2 — Template processing for dynamic prompt generation within guidance functions
- numpy — Handles token probability arrays and numerical operations in constraint parsing
- transformers — Interfaces with Hugging Face models for local LLM inference and tokenization
- openai — Connects to OpenAI API endpoints for remote model access with streaming and tool calling
- Reactive frontend framework — Powers the interactive visualization widget frontend with reactive component updates
- TypeScript — Type-safe development for browser widget components and kernel communication interfaces
- stitch (Jupyter widget bridge) — Enables bidirectional communication between the Python kernel and browser-based visualization components
Key Components
- guidance (decorator) — Main decorator that transforms Python functions into grammar-aware generation functions, handling stateless execution, caching, and indentation processing (guidance/_guidance.py)
- TokenParser (processor) — Enforces grammar constraints during token generation using the llguidance library, handles backtracking when constraints are violated, and manages token masking (guidance/_parser.py)
- Model (orchestrator) — Abstract base class that coordinates grammar parsing, token generation, and state management across different LLM backends (guidance/models/_base.py)
- StitchView (adapter) — Jupyter widget that creates a sandboxed iframe for rendering generation progress and handles bidirectional communication between the Python kernel and the browser (packages/python/stitch/src/widget.ts)
- LiteralNode (processor) — Represents fixed text that must appear exactly as specified in the generation; used for prompt templates and structured output formatting (guidance/_ast.py)
- RegexNode (validator) — Enforces regular expression patterns during generation, ensuring output matches specific formats like emails, URLs, or structured data (guidance/_ast.py)
- RuleNode (generator) — Defines generation rules with constraints like stop conditions, temperature, max tokens, and output capture names (guidance/_ast.py)
- SelectNode (dispatcher) — Implements choice between alternative grammar branches, allowing conditional generation paths based on model decisions or explicit selection (guidance/_ast.py; a composition sketch follows this list)
- Tokenizer (encoder) — Wraps model-specific tokenizers with a common interface, handles special tokens, and provides encoding/decoding for constraint parsing (guidance/models/_engine/_tokenizer.py)
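A hedged composition sketch tying these node types to the public API (the node class names come from this analysis; the exact mapping is assumed, not confirmed by the library):

```python
from guidance import gen, select

# A string literal becomes a LiteralNode, select() a SelectNode, and
# gen(regex=...) a RuleNode wrapping a RegexNode; "+" joins them into a
# single GrammarNode tree that the TokenParser enforces token by token.
grammar = (
    "Rating: "
    + select(["good", "bad"], name="rating")
    + ", score "
    + gen(regex=r"[1-9]", name="score")
)
```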
Frequently Asked Questions
What is guidance used for?
guidance-ai/guidance controls language model generation with grammars and constraints. It is a 9-component library whose primary language classification is Jupyter Notebook; data flows through 6 distinct pipeline stages across a 170-file codebase.
How is guidance architected?
guidance is organized into 4 architecture layers: Grammar DSL, AST & Parser, Model Adapters, Visualization. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through guidance?
Data moves through 6 stages: Parse grammar definition → Compile to parser → Generate tokens → Validate constraints → Update model state → Display in notebook. The user defines a grammar with @guidance decorated Python functions; the decorator compiles them into GrammarNode trees, a TokenParser works with the model's tokenizer to enforce constraints token by token (masking invalid tokens and backtracking when needed), and generated tokens flow to visualization widgets showing real-time progress with probability information. This pipeline design reflects a complex multi-stage processing system.
What technologies does guidance use?
The core stack includes llguidance (Rust-based constraint parsing engine that enforces grammar rules during token generation with backtracking support), pydantic (Validates and serializes data models for parser responses, model configurations, and tool definitions), jinja2 (Template processing for dynamic prompt generation within guidance functions), numpy (Handles token probability arrays and numerical operations in constraint parsing), transformers (Interfaces with Hugging Face models for local LLM inference and tokenization), openai (Connects to OpenAI API endpoints for remote model access with streaming and tool calling), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does guidance have?
guidance exhibits 3 data pools (tag pool, model state, parser cache), 2 feedback loops, 5 control points, and 3 delays. The feedback loops handle self-correction and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does guidance use?
5 design patterns detected: Grammar compilation, Real-time constraint enforcement, Sandboxed visualization, Backend abstraction, Tag-based embedding.
How does guidance compare to alternatives?
CodeSea offers side-by-side architecture comparisons of guidance with dspy, covering tech stack differences, pipeline design, system behavior, and code patterns.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.