guidance-ai/guidance
A guidance language for controlling large language models.
Controls language model generation with grammars and constraints
The user defines a grammar using @guidance decorated Python functions containing generation rules and constraints. The decorator compiles these functions into GrammarNode trees, which are passed to a TokenParser that works with the model's tokenizer to enforce constraints during generation. Each token is validated against the grammar, with invalid tokens masked out and backtracking applied when needed. Generated tokens flow to visualization widgets that display real-time progress with probability information.
Under the hood, the system uses 2 feedback loops, 3 data pools, and 5 control points to manage its runtime behavior.
A 9-component library. 170 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Parse grammar definition — The @guidance decorator intercepts Python function calls, strips indentation, and converts the function body's library calls (gen(), select(), regex()) into a tree of GrammarNode objects representing the generation constraints (see the sketch after this list) [Function → GrammarNode]
- Compile to parser — TokenParser.__init__ takes the GrammarNode tree and compiles it into an llguidance.LLInterpreter instance along with the model's tokenizer, creating the constraint enforcement engine [GrammarNode → LLInterpreter]
- Generate tokens — The model backend calls TokenParser.process_token for each potential token, which checks if the token violates constraints and returns a GenData object with a mask indicating which tokens are valid for the next step [Token candidates → GenData]
- Validate constraints — The parser processes each generated token through LLInterpreter.advance, which may trigger backtracking if constraints are violated, returning an LLInterpreterResponse with new bytes, probabilities, and any required backtrack amount [Generated tokens → LLInterpreterResponse]
- Update model state — The Model class processes the LLInterpreterResponse to update its internal state, handle any backtracking by removing tokens, and capture named outputs into the model's variable dictionary [LLInterpreterResponse → Updated model state]
- Display in notebook — TokenOutput objects are serialized and sent to the StitchView Jupyter widget, which renders them in a sandboxed iframe showing generation progress, token probabilities, and captured variables in real-time [TokenOutput → Visualization]
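A minimal sketch of the first stage from the user's side, using guidance's public gen() and select() calls (the function, prompt, and backend choice here are illustrative):

```python
import guidance
from guidance import gen, select

@guidance
def triage(lm, ticket):
    # Fixed strings become literal grammar nodes; select()/gen() add
    # constrained generation points with named captures.
    lm += f"Ticket: {ticket}\nSeverity: "
    lm += select(["low", "medium", "high"], name="severity")
    lm += "\nSummary: " + gen(name="summary", stop="\n", max_tokens=30)
    return lm

# lm = guidance.models.Transformers("gpt2")         # any supported backend
# result = lm + triage("Login page returns a 500")  # runs the stages above
# print(result["severity"], result["summary"])
```

Calling the decorated function does not execute it immediately; it yields a grammar fragment that the later stages compile and enforce token by token.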
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- GrammarNode (guidance/_ast.py) — Abstract base class with subclasses like LiteralNode (fixed text), RegexNode (pattern matching), RuleNode (generation with constraints), SelectNode (alternatives), RepeatNode (loops), and SpecialToken (model-specific tokens). Created when parsing @guidance decorated functions, compiled into constraint rules, then consumed during token generation to enforce patterns.
- LLInterpreterResponse (guidance/_schema.py) — Pydantic model with new_bytes: bytes, is_generated: bool, new_bytes_prob: float, capture_groups: dict, backtrack: NonNegativeInt, and latency_ms: NonNegativeInt. Generated by the parser for each token; carries the new bytes to append, whether they were generated or forced, probability scores, and any backtracking needed (see the schema sketch after this list).
- TokenOutput (client/graphpaper-inline/src/stitch.ts) — TypeScript interface extending TextOutput with token: Token (token string, bytes, prob, masked) and top_k: Array<Token> for alternative tokens. Created for each generated token, sent from the Python kernel to the browser widget, and rendered with probability information and alternative options.
- GenData (guidance/_schema.py) — Pydantic model with tokens: list[int], mask: bytes (which tokens are valid), and temperature: float for sampling control. Passed to model backends to specify which tokens are valid at each generation step according to grammar constraints.
- Function (guidance/_ast.py) — Dataclass with name: str, f: Callable (the decorated Python function), and kwargs: dict[str, Any] for parameters. Created when the @guidance decorator is applied to a Python function, stored in a registry, and executed when called to build grammar trees.
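Based on the field list above, a self-contained sketch of the parser-response contract (illustrative; the real model lives in guidance/_schema.py and may differ):

```python
from pydantic import BaseModel, NonNegativeInt

class ParserResponseSketch(BaseModel):
    # Field names and types taken from the analysis above, not from the library.
    new_bytes: bytes            # text produced this step, as raw bytes
    is_generated: bool          # model-generated vs. forced by the grammar
    new_bytes_prob: float       # probability the model assigned to new_bytes
    capture_groups: dict        # named captures filled in so far
    backtrack: NonNegativeInt   # how many tokens to rewind; 0 means none
    latency_ms: NonNegativeInt  # time spent producing this step
```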
Hidden Assumptions
Things this code relies on but never validates. These are the assumptions that cause silent failures when the system changes.
- The tokenizer parameter has a _ll_tokenizer attribute compatible with llguidance.LLInterpreter, but the code never checks that the attribute exists or has the right type. If this fails: with a custom tokenizer lacking _ll_tokenizer, parser initialization dies with an AttributeError instead of a clear validation error. (guidance/_parser.py:TokenParser.__init__; see the guard sketch after this list)
- The model.isNew() check will return false within a reasonable time, yet the widget polls every 100ms with no timeout. If this fails: if model initialization hangs, the widget polls indefinitely, consuming CPU and never displaying content, which makes notebooks unresponsive. (packages/python/stitch/src/widget.ts:initOnReady)
- The llguidance module is available and compatible, but only an import error is caught at module level; version compatibility is never checked at runtime. If this fails: mismatched llguidance versions can silently enforce constraints incorrectly, with no clear error message. (guidance/_parser.py:TokenParser.__init__)
- Tag pool entries are populated before they are referenced in f-strings, but nothing guarantees that tagged functions are defined before use. If this fails: using the {{G|tag_name|G}} syntax before the tagged function is decorated raises a KeyError during string parsing. (guidance/_ast.py:_parse_tags)
- A ThreadPoolExecutor with default settings can handle concurrent parser compilation without memory limits or cleanup. If this fails: heavy grammar-compilation workloads can exhaust memory or file descriptors, making the entire application unresponsive. (guidance/_parser.py:_parser_cache)
- Messages from the iframe contain valid JSON, but the event.data structure is never validated. If this fails: malformed messages from the sandboxed iframe can crash the widget with JSON parse errors or accesses to undefined properties. (packages/python/stitch/src/widget.ts:recvFromClient)
- GenData.mask is a bytes object whose length matches the tokenizer vocabulary size, but neither length nor type is ever validated. If this fails: mask-size mismatches cause numpy indexing errors or mask the wrong tokens, producing invalid generation results. (guidance/_parser.py:TokenParser.process_token)
- The tag delimiters {{G| and |G}} never appear naturally in user strings as literal text to be generated. If this fails: user content containing these exact delimiters is parsed as function tags, breaking generation or raising undefined-tag errors. (guidance/_ast.py:tag_start and tag_end)
- ContextVar state persists correctly across async boundaries and thread switches without corruption. If this fails: in async environments, the stateless flag may leak between guidance function executions, causing unexpected state sharing. (guidance/_guidance.py:_in_stateless_context)
- The 100ms polling interval suits all model initialization speeds. If this fails: fast models waste CPU cycles on unnecessary polls, while very slow models appear frozen to users expecting faster feedback. (packages/python/stitch/src/widget.ts:refreshTimeMs)
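As a sketch of the guard missing from the first assumption (the helper is hypothetical; only the _ll_tokenizer requirement comes from the analysis above):

```python
def require_ll_tokenizer(tokenizer):
    # Hypothetical fail-fast check: surface a clear error instead of an
    # AttributeError deep inside TokenParser.__init__ when a custom
    # tokenizer lacks the expected attribute.
    ll_tok = getattr(tokenizer, "_ll_tokenizer", None)
    if ll_tok is None:
        raise TypeError(
            f"{type(tokenizer).__name__} has no _ll_tokenizer attribute; "
            "expected a tokenizer compatible with llguidance.LLInterpreter"
        )
    return ll_tok
```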
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Tag pool — the _tag_pool dict stores Function and GrammarNode objects referenced by tags in f-strings, enabling guidance constructs to be embedded inside Python string literals (a simplified sketch follows this list)
- Model state — Model.State accumulates generated tokens, captured variables, and conversation history as generation progresses
- Parser cache — a ThreadPoolExecutor-backed cache holds parser instances to avoid recompilation overhead when the same grammar is used multiple times
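A simplified sketch of the tag-pool mechanism (class and variable names are illustrative, not guidance's internals):

```python
import uuid

_tag_pool: dict[str, object] = {}

class NodeSketch:
    # When a grammar object is interpolated into an f-string, it stringifies
    # to a {{G|tag|G}} placeholder and parks itself in the pool, so the
    # string can later be parsed back into a grammar tree with the real node.
    def __str__(self) -> str:
        tag = uuid.uuid4().hex
        _tag_pool[tag] = self
        return "{{G|" + tag + "|G}}"

node = NodeSketch()
prompt = f"Name: {node}\n"  # embeds a resolvable placeholder, not repr() text
```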
Feedback Loops
- Backtrack on constraint violation (self-correction, balancing) — Trigger: a generated token violates grammar constraints. Action: TokenParser.process_token detects the violation, LLInterpreter.advance returns a backtrack amount, and the Model removes the invalid tokens and retries generation from the earlier state. Exit: a valid token satisfying the constraints is found (see the loop sketch after this list).
- Interactive generation loop (polling, reinforcing) — Trigger: the user executes a model with generation rules. Action: the Model repeatedly calls the backend for the next token, validates it against constraints, and updates the visualization. Exit: a stop condition is met, max tokens are reached, or a constraint cannot be satisfied.
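A hedged sketch of the backtracking loop (next_mask, sample, advance_token, and the stop field are hypothetical stand-ins; the real flow runs through TokenParser.process_token and LLInterpreter.advance):

```python
def constrained_generate(model, parser, max_tokens=256):
    # Illustrative only: sample a token under the grammar mask, validate it,
    # and rewind when the parser reports a backtrack (the self-correction loop).
    tokens = []
    while len(tokens) < max_tokens:
        gen_data = parser.next_mask()           # which tokens are legal now
        tok = model.sample(mask=gen_data.mask)  # backend picks a legal token
        resp = parser.advance_token(tok)        # grammar check, may backtrack
        if resp.backtrack:
            del tokens[-resp.backtrack:]        # drop the invalid suffix
        tokens.append(tok)
        if resp.stop:                           # stop condition satisfied
            break
    return tokens
```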
Delays
- Parser compilation (compilation; varies with grammar complexity) — First use of a grammar requires compilation to an llguidance parser; subsequent uses hit the cache
- Widget iframe initialization (async processing; ~100ms refresh intervals) — StitchView polls the model readiness state before initializing bidirectional communication with the Jupyter kernel
- Token generation latency (async processing; varies per model backend) — Each token involves model inference, constraint validation, and potentially multiple retries when backtracking occurs
Control Points
- enable_backtrack (feature-flag) — Controls: Whether the parser can backtrack when constraints are violated or must fail immediately. Default: True
- enable_ff_tokens (feature-flag) — Controls: Whether the parser can fast-forward tokens required by the grammar without generating them. Default: True
- stateless (runtime-toggle) — Controls: Whether guidance functions execute with isolated state or can access and modify the parent context. Default: False
- cache (feature-flag) — Controls: Whether compiled grammar functions are cached for performance. Default: False (see the usage sketch after this list)
- temperature (hyperparameter) — Controls: Sampling temperature for token generation in gen() calls. Default: None
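How two of these control points surface at decoration time, assuming the decorator keywords match the flags above (a sketch, not a verified signature):

```python
import guidance
from guidance import gen

# stateless=True builds a pure grammar fragment with no access to parent
# context; cache=True (per the control points above) memoizes compilation.
@guidance(stateless=True, cache=True)
def four_digit_pin(lm):
    return lm + gen(regex=r"\d{4}", name="pin")
```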
Technology Stack
- llguidance — Rust-based constraint-parsing engine that enforces grammar rules during token generation, with backtracking support
- pydantic — Validates and serializes data models for parser responses, model configurations, and tool definitions
- jinja2 — Template processing for dynamic prompt generation within guidance functions
- numpy — Handles token probability arrays and numerical operations in constraint parsing
- transformers — Interfaces with Hugging Face models for local LLM inference and tokenization
- openai — Connects to OpenAI API endpoints for remote model access with streaming and tool calling
- Reactive frontend framework — Powers the interactive visualization widget frontend with reactive component updates
- TypeScript — Type-safe development for browser widget components and kernel communication interfaces
- stitch (Jupyter widget bridge) — Enables bidirectional communication between the Python kernel and browser-based visualization components
Key Components
- guidance (decorator) — Main decorator that transforms Python functions into grammar-aware generation functions, handling stateless execution, caching, and indentation processing (guidance/_guidance.py)
- TokenParser (processor) — Enforces grammar constraints during token generation using the llguidance library, handles backtracking when constraints are violated, and manages token masking (guidance/_parser.py)
- Model (orchestrator) — Abstract base class that coordinates grammar parsing, token generation, and state management across different LLM backends (guidance/models/_base.py)
- StitchView (adapter) — Jupyter widget that creates a sandboxed iframe for rendering generation progress and handles bidirectional communication between the Python kernel and the browser (packages/python/stitch/src/widget.ts)
- LiteralNode (processor) — Represents fixed text that must appear exactly as specified in the generation; used for prompt templates and structured output formatting (guidance/_ast.py)
- RegexNode (validator) — Enforces regular expression patterns during generation, ensuring output matches specific formats like emails, URLs, or structured data (guidance/_ast.py)
- RuleNode (generator) — Defines generation rules with constraints like stop conditions, temperature, max tokens, and output capture names (guidance/_ast.py)
- SelectNode (dispatcher) — Implements choice between alternative grammar branches, allowing conditional generation paths based on model decisions or explicit selection (guidance/_ast.py; a composition sketch follows this list)
- Tokenizer (encoder) — Wraps model-specific tokenizers with a common interface, handles special tokens, and provides encoding/decoding for constraint parsing (guidance/models/_engine/_tokenizer.py)
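A hedged composition sketch tying these node types to the public API (the node class names come from this analysis; the exact mapping is assumed, not confirmed by the library):

```python
from guidance import gen, select

# A string literal becomes a LiteralNode, select() a SelectNode, and
# gen(regex=...) a RuleNode wrapping a RegexNode; "+" joins them into a
# single GrammarNode tree that the TokenParser enforces token by token.
grammar = (
    "Rating: "
    + select(["good", "bad"], name="rating")
    + ", score "
    + gen(regex=r"[1-9]", name="score")
)
```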
Frequently Asked Questions
What is guidance used for?
guidance-ai/guidance controls language model generation with grammars and constraints. It is a 9-component library whose primary language classification is Jupyter Notebook; data flows through 6 distinct pipeline stages across a 170-file codebase.
How is guidance architected?
guidance is organized into 4 architecture layers: Grammar DSL, AST & Parser, Model Adapters, Visualization. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through guidance?
Data moves through 6 stages: Parse grammar definition → Compile to parser → Generate tokens → Validate constraints → Update model state → Display in notebook. The user defines a grammar with @guidance decorated Python functions; the decorator compiles them into GrammarNode trees, a TokenParser works with the model's tokenizer to enforce constraints token by token (masking invalid tokens and backtracking when needed), and generated tokens flow to visualization widgets showing real-time progress with probability information. This pipeline design reflects a complex multi-stage processing system.
What technologies does guidance use?
The core stack includes llguidance (Rust-based constraint parsing engine that enforces grammar rules during token generation with backtracking support), pydantic (Validates and serializes data models for parser responses, model configurations, and tool definitions), jinja2 (Template processing for dynamic prompt generation within guidance functions), numpy (Handles token probability arrays and numerical operations in constraint parsing), transformers (Interfaces with Hugging Face models for local LLM inference and tokenization), openai (Connects to OpenAI API endpoints for remote model access with streaming and tool calling), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does guidance have?
guidance exhibits 3 data pools (tag pool, model state, parser cache), 2 feedback loops, 5 control points, and 3 delays. The feedback loops handle self-correction and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does guidance use?
5 design patterns detected: Grammar compilation, Real-time constraint enforcement, Sandboxed visualization, Backend abstraction, Tag-based embedding.
How does guidance compare to alternatives?
CodeSea offers side-by-side architecture comparisons of guidance with dspy, covering tech stack differences, pipeline design, system behavior, and code patterns.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.