guidance-ai/guidance

A guidance language for controlling large language models.

21,399 stars · Jupyter Notebook · 9 components

Controls language model generation with grammars and constraints

A user defines a grammar using @guidance-decorated Python functions containing generation rules and constraints. The decorator compiles these functions into GrammarNode trees, which are passed to a TokenParser that works with the model's tokenizer to enforce constraints during generation. Each token is validated against the grammar, with invalid tokens masked out and backtracking applied when needed. Generated tokens flow to visualization widgets that display real-time progress with probability information.

Under the hood, the system uses 2 feedback loops, 3 data pools, and 5 control points to manage its runtime behavior.

A 9-component library. 170 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System


  1. Parse grammar definition — The @guidance decorator intercepts Python function calls, strips indentation, and converts the function body containing library calls (gen(), select(), regex()) into a tree of GrammarNode objects representing the generation constraints [Function → GrammarNode]
  2. Compile to parser — TokenParser.__init__ takes the GrammarNode tree and compiles it into an llguidance.LLInterpreter instance along with the model's tokenizer, creating the constraint enforcement engine [GrammarNode → LLInterpreter]
  3. Generate tokens — The model backend calls TokenParser.process_token for each potential token, which checks if the token violates constraints and returns a GenData object with a mask indicating which tokens are valid for the next step [Token candidates → GenData]
  4. Validate constraints — The parser processes each generated token through LLInterpreter.advance, which may trigger backtracking if constraints are violated, returning an LLInterpreterResponse with new bytes, probabilities, and any required backtrack amount [Generated tokens → LLInterpreterResponse]
  5. Update model state — The Model class processes the LLInterpreterResponse to update its internal state, handle any backtracking by removing tokens, and capture named outputs into the model's variable dictionary [LLInterpreterResponse → Updated model state]
  6. Display in notebook — TokenOutput objects are serialized and sent to the StitchView Jupyter widget, which renders them in a sandboxed iframe showing generation progress, token probabilities, and captured variables in real-time [TokenOutput → Visualization]
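Steps 3 and 4 can be sketched in a few lines of plain Python. This is a minimal illustration of mask-based constrained decoding, not the actual guidance API; the helper names and the tiny four-token vocabulary are invented for the example.

```python
# Sketch of the mask idea: the parser marks which token ids the grammar
# allows next, and the model may only sample from unmasked tokens.

def compute_mask(allowed_ids, vocab_size):
    """Build a byte mask over the vocabulary: 1 = allowed, 0 = forbidden."""
    mask = bytearray(vocab_size)
    for tid in allowed_ids:
        mask[tid] = 1
    return bytes(mask)

def pick_token(scores, mask):
    """Greedy decoding restricted to tokens the grammar allows."""
    best_id, best_score = None, float("-inf")
    for tid, score in enumerate(scores):
        if mask[tid] and score > best_score:
            best_id, best_score = tid, score
    return best_id

scores = [0.1, 0.9, 0.4, 0.7]       # model scores for a 4-token vocab
mask = compute_mask({0, 2, 3}, 4)   # grammar forbids token 1
print(pick_token(scores, mask))     # prints 3: token 1 is masked out
```

In the real system the mask comes back from llguidance per step, and backtracking can remove already-emitted tokens; this sketch only shows the masking half of that loop.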

Data Models

The data structures that flow between stages — the contracts that hold the system together.

GrammarNode guidance/_ast.py
Abstract base class with subclasses like LiteralNode (fixed text), RegexNode (pattern matching), RuleNode (generation with constraints), SelectNode (alternatives), RepeatNode (loops), and SpecialToken (model-specific tokens)
Created when parsing @guidance decorated functions, compiled into constraint rules, then consumed during token generation to enforce patterns
LLInterpreterResponse guidance/_schema.py
Pydantic model with new_bytes: bytes, is_generated: bool, new_bytes_prob: float, capture_groups: dict, backtrack: NonNegativeInt, latency_ms: NonNegativeInt
Generated by the parser for each token, contains the new bytes to add, whether they came from generation or forced input, probability scores, and any backtracking needed
TokenOutput client/graphpaper-inline/src/stitch.ts
TypeScript interface extending TextOutput with token: Token (containing token string, bytes, prob, masked), top_k: Array<Token> for alternative tokens
Created for each generated token, sent from Python kernel to browser widget, rendered with probability information and alternative options
GenData guidance/_schema.py
Pydantic model with tokens: list[int], mask: bytes (which tokens are valid), temperature: float for sampling control
Passed to model backends to specify which tokens are valid at each generation step according to grammar constraints
Function guidance/_ast.py
Dataclass with name: str, f: Callable (the decorated Python function), kwargs: dict[str, Any] for parameters
Created when @guidance decorator is applied to a Python function, stored in a registry, executed when called to build grammar trees
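The field lists above can be sketched as plain dataclasses. The real definitions in guidance/_schema.py are pydantic models with validation; the defaults below are assumptions added for the example.

```python
from dataclasses import dataclass, field

@dataclass
class GenData:
    """Approximate shape; the real class is a pydantic model."""
    tokens: list        # token ids so far
    mask: bytes         # one byte per vocab entry: is this token valid next?
    temperature: float  # sampling temperature

@dataclass
class LLInterpreterResponse:
    """Approximate shape of the per-token parser response."""
    new_bytes: bytes
    is_generated: bool
    new_bytes_prob: float
    capture_groups: dict = field(default_factory=dict)
    backtrack: int = 0   # tokens to remove before appending new_bytes
    latency_ms: int = 0

resp = LLInterpreterResponse(b"yes", True, 0.93)
print(resp.backtrack)  # prints 0
```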

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Contract unguarded

Assumes the tokenizer parameter has a _ll_tokenizer attribute compatible with llguidance.LLInterpreter, but never checks that the attribute exists or has the right type

If this fails: When using a custom tokenizer without _ll_tokenizer, the parser initialization fails with AttributeError instead of a clear validation error

guidance/_parser.py:TokenParser.__init__
critical Temporal unguarded

Assumes the model.isNew() check will eventually return false within a reasonable time, but polls every 100 ms with no timeout

If this fails: If model initialization hangs, the widget polls indefinitely consuming CPU and never displays content, making notebooks unresponsive

packages/python/stitch/src/widget.ts:initOnReady
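A conventional fix is to give the poll a deadline. Sketched here in Python for brevity (the actual widget code is TypeScript); the function name and defaults are illustrative.

```python
import time

def wait_until(predicate, timeout_s=5.0, interval_s=0.1):
    """Poll predicate every interval_s seconds; give up after timeout_s.

    Returns True if the predicate became true, False on timeout, so the
    caller can surface an error instead of spinning forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

print(wait_until(lambda: True))                   # prints True
print(wait_until(lambda: False, timeout_s=0.05))  # prints False
```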
critical Environment weakly guarded

Assumes the llguidance module is installed and compatible, but only catches the import error at module level; version compatibility is never checked at runtime

If this fails: Mismatched llguidance versions can cause silent failures or wrong constraint enforcement behavior without clear error messages

guidance/_parser.py:TokenParser.__init__
critical Ordering unguarded

Assumes tag pool entries are populated before they are referenced in f-strings, but nothing guarantees that tagged functions are defined before use

If this fails: Using {{G|tag_name|G}} syntax before the tagged function is decorated results in KeyError during string parsing

guidance/_ast.py:_parse_tags
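A defensive lookup could fail loudly with context instead of a bare KeyError. The registry and function names below are illustrative stand-ins, not guidance's internals.

```python
_tag_pool = {}  # illustrative stand-in for guidance's tag registry

def register_tag(name, fn):
    _tag_pool[name] = fn

def resolve_tag(name):
    """Resolve a {{G|name|G}} tag, with a clear error if it is not yet defined."""
    try:
        return _tag_pool[name]
    except KeyError:
        raise NameError(
            f"tag {name!r} referenced before its function was decorated"
        ) from None

register_tag("greet", lambda: "hello")
print(resolve_tag("greet")())  # prints hello
```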
warning Resource unguarded

Assumes a ThreadPoolExecutor with default settings can handle concurrent parser compilation, with no memory limits or cleanup

If this fails: Heavy grammar compilation workloads can exhaust memory or file descriptors, causing the entire application to become unresponsive

guidance/_parser.py:_parser_cache
warning Contract unguarded

Assumes messages from the iframe contain valid JSON, but the event.data structure is never validated

If this fails: Malformed messages from the sandboxed iframe can crash the widget with JSON parse errors or access undefined properties

packages/python/stitch/src/widget.ts:recvFromClient
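A defensive parse would reject malformed messages up front rather than crash on access. Sketched in Python (the real handler is TypeScript); the "type" field is an assumed message key for illustration.

```python
import json

def parse_client_message(raw):
    """Parse an iframe message defensively; return None instead of raising."""
    try:
        msg = json.loads(raw)
    except (TypeError, ValueError):
        return None
    # Reject anything that is not a dict with a string "type" field.
    if not isinstance(msg, dict) or not isinstance(msg.get("type"), str):
        return None
    return msg

print(parse_client_message("not json"))           # prints None
print(parse_client_message('{"type": "token"}'))  # prints {'type': 'token'}
```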
warning Shape weakly guarded

Assumes GenData.mask is a bytes object whose length matches the tokenizer vocabulary size, but neither length nor type is ever validated

If this fails: Mask size mismatches cause numpy indexing errors or wrong tokens being masked, leading to invalid generation results

guidance/_parser.py:TokenParser.process_token
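A cheap shape check before indexing would turn the silent mismatch into an immediate, descriptive error. This guard is an illustrative sketch, not guidance's code.

```python
def validate_mask(mask, vocab_size):
    """Check the mask's type and length before using it to index tokens."""
    if not isinstance(mask, (bytes, bytearray)):
        raise TypeError(f"mask must be bytes, got {type(mask).__name__}")
    if len(mask) != vocab_size:
        raise ValueError(f"mask has {len(mask)} entries, expected {vocab_size}")
    return mask

validate_mask(b"\x01\x00\x01", 3)  # passes silently
try:
    validate_mask(b"\x01\x00", 3)
except ValueError as e:
    print(e)  # prints: mask has 2 entries, expected 3
```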
warning Domain unguarded

Assumes the tag delimiters {{G| and |G}} never appear naturally in user strings as literal text to be generated

If this fails: User content containing these exact delimiters gets incorrectly parsed as function tags, breaking generation or causing undefined tag errors

guidance/_ast.py:tag_start and tag_end
warning Temporal unguarded

Assumes ContextVar state persists correctly across async boundaries and thread switches without corruption

If this fails: In async environments, the stateless flag may leak between different guidance function executions, causing unexpected state sharing

guidance/_guidance.py:_in_stateless_context
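The behavior being relied on is documented ContextVar semantics: each asyncio task runs in a copy of the context it was created in, so a set() inside a task does not leak back to the caller. The variable name below is illustrative.

```python
import asyncio
import contextvars

stateless = contextvars.ContextVar("stateless", default=False)

async def child():
    # Runs in a copy of the parent's context (asyncio copies the
    # context when the task is created).
    stateless.set(True)
    return stateless.get()

async def main():
    inner = await asyncio.create_task(child())
    return inner, stateless.get()

inner, outer = asyncio.run(main())
print(inner, outer)  # prints: True False
```

Leaks become possible when code shares a single context across executions (for example, running callbacks in the same context rather than per-task copies), which is why the assumption is worth stating.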
info Scale unguarded

Assumes the 100 ms polling interval suits all model initialization speeds and causes no performance issues

If this fails: Fast models waste CPU cycles with unnecessary polling; very slow models appear frozen to users who expect faster feedback

packages/python/stitch/src/widget.ts:refreshTimeMs

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Tag pool (registry)
_tag_pool dict stores Function and GrammarNode objects referenced by tags in f-strings, enabling embedding of guidance constructs inside Python string literals
Model state (state-store)
Model.State accumulates generated tokens, captured variables, and conversation history as generation progresses
Parser cache (cache)
ThreadPoolExecutor caches parser instances to avoid recompilation overhead when the same grammar is used multiple times
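The parser cache amounts to memoization keyed by grammar. A minimal sketch of that idea (illustrative; the real cache in guidance/_parser.py also involves a ThreadPoolExecutor for compilation):

```python
_parser_cache = {}

def get_parser(grammar_key, compile_fn):
    """Return a cached parser for grammar_key, compiling it only once."""
    if grammar_key not in _parser_cache:
        _parser_cache[grammar_key] = compile_fn(grammar_key)
    return _parser_cache[grammar_key]

calls = []
def compile_fn(key):
    calls.append(key)          # record each (expensive) compilation
    return f"parser<{key}>"

get_parser("g1", compile_fn)
get_parser("g1", compile_fn)   # cache hit, no second compilation
print(len(calls))              # prints 1
```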

Technology Stack

llguidance (library)
Rust-based constraint parsing engine that enforces grammar rules during token generation with backtracking support
pydantic (serialization)
Validates and serializes data models for parser responses, model configurations, and tool definitions
jinja2 (library)
Template processing for dynamic prompt generation within guidance functions
numpy (compute)
Handles token probability arrays and numerical operations in constraint parsing
transformers (library)
Interfaces with Hugging Face models for local LLM inference and tokenization
openai (library)
Connects to OpenAI API endpoints for remote model access with streaming and tool calling
svelte (framework)
Powers the interactive visualization widget frontend with reactive component updates
typescript (framework)
Type-safe development for browser widget components and kernel communication interfaces
jupyter-widgets (framework)
Enables bidirectional communication between Python kernel and browser-based visualization components

Frequently Asked Questions

What is guidance used for?

guidance controls language model generation with grammars and constraints. guidance-ai/guidance is a 9-component library written primarily in Jupyter Notebook; data flows through 6 distinct pipeline stages, and the codebase contains 170 files.

How is guidance architected?

guidance is organized into 4 architecture layers: Grammar DSL, AST & Parser, Model Adapters, Visualization. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through guidance?

Data moves through 6 stages: Parse grammar definition → Compile to parser → Generate tokens → Validate constraints → Update model state → Display in notebook. A user defines a grammar using @guidance-decorated Python functions containing generation rules and constraints. The decorator compiles these functions into GrammarNode trees, which are passed to a TokenParser that works with the model's tokenizer to enforce constraints during generation. Each token is validated against the grammar, with invalid tokens masked out and backtracking applied when needed. Generated tokens flow to visualization widgets that display real-time progress with probability information. This pipeline design reflects a complex multi-stage processing system.

What technologies does guidance use?

The core stack includes llguidance (Rust-based constraint parsing engine that enforces grammar rules during token generation with backtracking support), pydantic (Validates and serializes data models for parser responses, model configurations, and tool definitions), jinja2 (Template processing for dynamic prompt generation within guidance functions), numpy (Handles token probability arrays and numerical operations in constraint parsing), transformers (Interfaces with Hugging Face models for local LLM inference and tokenization), openai (Connects to OpenAI API endpoints for remote model access with streaming and tool calling), and 3 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does guidance have?

guidance exhibits 3 data pools (Tag pool, Model state, Parser cache), 2 feedback loops, 5 control points, and 3 delays. The feedback loops handle self-correction and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does guidance use?

5 design patterns detected: Grammar compilation, Real-time constraint enforcement, Sandboxed visualization, Backend abstraction, Tag-based embedding.

How does guidance compare to alternatives?

CodeSea has side-by-side architecture comparisons of guidance with dspy, covering tech stack differences, pipeline design, system behavior, and code patterns.

Analyzed on April 20, 2026 by CodeSea.