unslothai/unsloth
Unsloth Studio is a web UI for training and running open models like Gemma 3, Qwen3, DeepSeek, and gpt-oss locally.
Provides a web UI for training and running AI models locally with optimized kernels
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 9-component fullstack. 730 files analyzed. Data flows through 7 distinct pipeline stages.
How Data Flows Through the System
The system supports two primary workflows: chat inference and training data preparation. For chat, user messages flow from the React frontend through WebSocket connections to the FastAPI backend, get processed by optimized model inference using custom Triton kernels, and stream responses back to the UI. For training, users build data recipes using a visual node editor that generates structured payloads, which execute as background jobs with progress updates broadcast via WebSocket to the frontend.
- User authentication — Frontend sends login credentials to the /api/auth/login endpoint; the backend validates them against stored hashes and returns JWT tokens for subsequent requests [LoginCredentials → AuthToken]
- Hardware detection — Backend scans the system for GPU capabilities using nvidia-ml-py, detecting CUDA compute capability and available VRAM to determine optimal model configurations (see the sketch after this list)
- Model selection and loading — User selects a model from the registry; the system checks hardware compatibility and loads model weights with appropriate quantization (4-bit/8-bit) based on available VRAM [ModelWeightsConfig → LoadedModel] (config: training_method, load_in_4bit, optimizer)
- Recipe graph construction — Visual editor in RecipeStudioStore manages node placement and edge connections, validates graph topology, and builds structured recipe payload with model configs and processing steps [NodeConfig → RecipePayload] (config: batch_size, max_seq_length, preview_size)
- Recipe execution — DataRecipeJobManager receives recipe payload, spawns background processing job, and broadcasts progress updates via WebSocket using job status tracking [RecipePayload → RecipeExecutionRecord] (config: target_num_records, merge_batches)
- Chat message processing — ChatRuntimeStore sends user messages in AnthropicRequest format to the inference router, which applies chat templates and streams token generation back to the frontend [AnthropicRequest → AnthropicResponse] (config: temperature, max_tokens, top_p, and 1 more)
- Model optimization — TritonKernels replace standard PyTorch operations with custom CUDA implementations for attention, RMSNorm, and SwiGLU, achieving a 2-5x speedup during training [TensorBatch → OptimizedTensor] (config: gradient_checkpointing)
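The hardware-detection stage can be pictured as follows. This is a minimal sketch using nvidia-ml-py (imported as pynvml), the library named above; the function name and return shape are illustrative, not the backend's actual API.

```python
import pynvml  # nvidia-ml-py

def detect_gpu() -> dict | None:
    """Probe GPU 0 for compute capability and VRAM; None if no NVIDIA GPU."""
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # no NVIDIA driver present: caller falls back to CPU
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return {
            "compute_capability": (major, minor),
            "vram_total_gb": mem.total / 1024**3,
            "vram_free_gb": mem.free / 1024**3,
        }
    finally:
        pynvml.nvmlShutdown()
```

A probe like this is what lets the model-loading stage pick 4-bit versus 8-bit quantization before any weights are downloaded.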
Data Models
The data structures that flow between stages — the contracts that hold the system together.
studio/backend/models/inference.py — Pydantic model with model: str, max_tokens: Optional[int], messages: list[AnthropicMessage], system: Optional[str], tools: Optional[list], temperature: Optional[float], stream: bool
Created by chat UI when user sends messages, processed by inference handler to generate responses
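A minimal Pydantic sketch of this schema, reconstructed from the field list above; the AnthropicMessage shape is an assumption based on the Anthropic Messages API convention, and the defaults are illustrative.

```python
from typing import Optional
from pydantic import BaseModel

class AnthropicMessage(BaseModel):
    role: str            # "user" or "assistant" (assumed)
    content: str | list  # plain text or structured content blocks (assumed)

class AnthropicRequest(BaseModel):
    model: str
    messages: list[AnthropicMessage]
    max_tokens: Optional[int] = None
    system: Optional[str] = None
    tools: Optional[list] = None
    temperature: Optional[float] = None
    stream: bool = False  # default is an assumption
```

Because FastAPI validates request bodies against Pydantic models automatically, a payload missing model or messages is rejected with a 422 before it reaches the inference handler; subtler mismatches (for example, wrong content-block shapes) can still slip through, which is the compatibility risk flagged under Hidden Assumptions below.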
studio/frontend/src/features/recipe-studio/utils/payload/types.ts — TypeScript object with recipe: {model_providers: object[], mcp_providers: object[], model_configs: object[], seed_config?: object, tool_configs: object[], columns: object[], processors: object[]}, run: {rows: number, preview: boolean, output_formats: string[]}, ui: {nodes: object[], edges: object[], layout_direction?: string}
Built by the visual recipe editor from node graph, serialized for backend execution, and stored with execution records
studio/frontend/src/features/recipe-studio/execution-types.ts — TypeScript object with id: string, recipeId: string, jobId: string|null, kind: 'preview'|'full', status: RecipeExecutionStatus, progress: RecipeExecutionProgress|null, dataset: object[], analysis: RecipeExecutionAnalysis|null
Created when recipe execution starts, updated via WebSocket events during processing, stored for historical tracking
studio/backend/utils/hardware/vram_estimation.py — Python dataclass with hidden_size: int, num_hidden_layers: int, num_attention_heads: int, num_key_value_heads: int, intermediate_size: int, vocab_size: int, tie_word_embeddings: bool, num_experts: Optional[int]
Extracted from model config files during model initialization, used to calculate memory requirements for training and inference
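A back-of-envelope version of that memory calculation, built on the dataclass above; the parameter-count formula is a generic transformer approximation (GQA attention, SwiGLU MLP), not Unsloth Studio's actual estimator.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArchitecture:
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    intermediate_size: int
    vocab_size: int
    tie_word_embeddings: bool
    num_experts: Optional[int] = None

def approx_param_count(a: ModelArchitecture) -> int:
    # Embeddings, doubled if the LM head is untied from the input embedding.
    embed = a.vocab_size * a.hidden_size * (1 if a.tie_word_embeddings else 2)
    head_dim = a.hidden_size // a.num_attention_heads
    # Q and O projections are hidden x hidden; K and V shrink under GQA.
    attn = 2 * a.hidden_size * a.hidden_size \
         + 2 * a.hidden_size * a.num_key_value_heads * head_dim
    # SwiGLU uses three projections (gate, up, down), replicated per expert.
    mlp = 3 * a.hidden_size * a.intermediate_size * (a.num_experts or 1)
    return embed + a.num_hidden_layers * (attn + mlp)

def approx_weight_vram_gb(a: ModelArchitecture, load_in_4bit: bool = True) -> float:
    bytes_per_param = 0.5 if load_in_4bit else 2.0  # nf4 vs fp16/bf16
    return approx_param_count(a) * bytes_per_param / 1024**3
```

Weights are only part of the budget; optimizer state, gradients, and activations add more, which is why the training configuration below also matters.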
studio/backend/utils/hardware/vram_estimation.py — Python dataclass with training_method: str='qlora', batch_size: int=4, max_seq_length: int=2048, lora_rank: int=16, target_modules: list, gradient_checkpointing: str='unsloth', optimizer: str='adamw_8bit', load_in_4bit: bool=True
Configured by user through UI, validated against available hardware resources, passed to training pipeline
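Reconstructed as code from the fields and defaults listed above (target_modules needs a factory default in a real dataclass; the empty list here is an assumption):

```python
from dataclasses import dataclass, field

@dataclass
class TrainingConfig:
    training_method: str = "qlora"
    batch_size: int = 4
    max_seq_length: int = 2048
    lora_rank: int = 16
    target_modules: list = field(default_factory=list)
    gradient_checkpointing: str = "unsloth"
    optimizer: str = "adamw_8bit"
    load_in_4bit: bool = True

# Typical override: shrink the batch to fit a longer context on limited VRAM.
cfg = TrainingConfig(batch_size=2, max_seq_length=4096)
```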
studio/frontend/src/features/chat/types.tsTypeScript object with id: string, modelType: string, pairId: string, archived: boolean, createdAt: number, title?: string, modelId?: string
Created when user starts new chat conversation, persisted locally in browser IndexedDB, updated with conversation metadata
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
The _platform_compat module exists and successfully fixes Anaconda/conda-forge platform._sys_version_cache issues before library imports
If this fails: On systems with different Python distributions or missing this module, attrs -> rich -> structlog -> platform imports could crash with cryptic C-level errors during server startup
studio/backend/main.py:platform_compat_import
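What such a shim plausibly looks like (a hedged sketch: the real _platform_compat module is not shown here, and platform._sys_version_cache is a CPython implementation detail):

```python
# Run before importing attrs/rich/structlog, which transitively import platform.
import platform
import sys

def apply_platform_compat() -> None:
    try:
        platform.python_implementation()  # forces sys.version parsing and caching
    except ValueError:
        # Anaconda/conda-forge builds can carry sys.version strings the stdlib
        # regex rejects; seed the private cache with a well-formed entry.
        major, minor, micro = sys.version_info[:3]
        platform._sys_version_cache[sys.version] = (
            "CPython", f"{major}.{minor}.{micro}", "", "", "", "", "",
        )

apply_platform_compat()
```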
MIME type fixes for the Windows registry are only needed on Windows, and mimetypes.add_type() is called before StaticFiles instantiation
If this fails: If the registry contains incorrect mappings and this fix runs after StaticFiles init, browsers will refuse to execute .js files served as text/plain, resulting in blank frontend pages
studio/backend/main.py:mimetypes_registry_fix
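A minimal version of that fix, in the order the assumption requires (register MIME types first, then mount static files); the directory path is illustrative:

```python
import mimetypes
import sys

from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

# On Windows, mimetypes consults the registry, which can map .js to text/plain;
# explicit registrations override whatever the registry said.
if sys.platform == "win32":
    mimetypes.add_type("application/javascript", ".js")
    mimetypes.add_type("text/css", ".css")

app = FastAPI()
app.mount("/", StaticFiles(directory="studio/frontend/dist", html=True), name="static")
```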
Browser localStorage is available and has sufficient storage quota for inference parameters, tokens, and settings persistence
If this fails: In private browsing, storage-disabled browsers, or when quota exceeded, user preferences silently fail to persist and chat configuration resets on page refresh
studio/frontend/src/features/chat/stores/chat-runtime-store.ts:localStorage_availability
Node configurations in the configs record always have corresponding entries in the nodes array, and edge connections reference valid node IDs
If this fails: Orphaned configs or dangling edge references cause recipe execution to fail with 'node not found' errors or corrupt the visual graph state
studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts:node_edge_synchronization
The nextId counter for node generation will never overflow JavaScript's MAX_SAFE_INTEGER (9,007,199,254,740,991) during a single session
If this fails: If the counter ever exceeded MAX_SAFE_INTEGER, ID collisions would occur, causing node replacement, data loss, or graph corruption
studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts:nextId_overflow
GPU hardware detection and VRAM estimation modules (utils.hardware) correctly identify all supported GPU architectures and memory configurations
If this fails: Unsupported or incorrectly detected GPUs could cause model loading to fail, use wrong quantization settings, or exceed available memory leading to OOM crashes
studio/backend/main.py:gpu_hardware_detection
JWT token refresh happens before expiration, and the backend refresh endpoint remains available during token renewal
If this fails: If refresh fails or happens too late, users lose authentication mid-conversation and their chat history/progress is lost without graceful recovery
studio/frontend/src/features/chat/stores/chat-runtime-store.ts:token_refresh_timing
The AnthropicRequest payload from the frontend exactly matches the expected message structure, with required fields (model, messages) present and optional fields in the correct formats
If this fails: Malformed requests cause silent failures or wrong model responses rather than clear validation errors, making debugging difficult
studio/backend/routes/inference.py:anthropic_api_compatibility
WebSocket connections for recipe progress updates can handle concurrent jobs and multiple frontend clients without memory leaks or connection limits
If this fails: Under high load, WebSocket connections could exhaust server resources, causing recipe progress updates to stop working and jobs to appear hung
studio/backend/core/data_recipe/jobs/manager.py:websocket_connection_scaling
Recipe node graphs maintain a valid DAG (directed acyclic graph) structure, and edge connections preserve data-flow semantics
If this fails: Circular dependencies or invalid connections cause infinite loops during recipe execution or produce nonsensical data transformations
studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts:graph_topology_validation
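The store itself is TypeScript, but the invariant is language-agnostic; a sketch of the check via Kahn's algorithm, over the (node IDs, edges) shape described above:

```python
from collections import deque

def is_valid_dag(node_ids: set[str], edges: list[tuple[str, str]]) -> bool:
    """True iff every edge references a known node and the graph has no cycle."""
    # Dangling edge references are exactly the 'node not found' failure above.
    if any(src not in node_ids or dst not in node_ids for src, dst in edges):
        return False
    indegree = {n: 0 for n in node_ids}
    adjacency: dict[str, list[str]] = {n: [] for n in node_ids}
    for src, dst in edges:
        adjacency[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    drained = 0
    while queue:
        node = queue.popleft()
        drained += 1
        for nxt in adjacency[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return drained == len(node_ids)  # any leftover nodes imply a cycle
```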
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- IndexedDB Chat Storage — Browser-local database storing chat threads, messages, and conversation history using the Dexie wrapper
- Recipe Execution Cache — In-memory tracking of active recipe jobs, with progress state and WebSocket connections for real-time updates
- Model Weight Cache — Local filesystem cache of downloaded model weights and tokenizers, avoiding re-downloads on subsequent loads
- Auth Session Store — Runtime storage of active JWT tokens and session state for user authentication validation
Feedback Loops
- Recipe Execution Progress (polling, reinforcing) — Trigger: Background job processing batches. Action: DataRecipeJobManager broadcasts progress updates via WebSocket to frontend. Exit: Job completion or cancellation.
- Streaming Chat Response (async-processing, reinforcing) — Trigger: User sends chat message. Action: Model generates tokens incrementally, each token streamed to frontend via SSE connection. Exit: Model outputs stop token or reaches max_tokens limit.
- Hardware-Based Model Selection (auto-scale, balancing) — Trigger: Model loading request. Action: System checks VRAM availability against model requirements, downgrades to quantized versions if needed. Exit: Compatible model configuration found.
- Authentication Token Refresh (cache-invalidation, balancing) — Trigger: JWT token near expiration. Action: Frontend automatically requests new token using refresh token. Exit: New valid token obtained or refresh fails.
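The first loop's general shape, sketched as backend code; ProgressBroadcaster and run_job are hypothetical names, not DataRecipeJobManager's real API:

```python
import asyncio
from fastapi import WebSocket

class ProgressBroadcaster:
    """Fan out job progress to every connected frontend client."""
    def __init__(self) -> None:
        self.clients: set[WebSocket] = set()

    async def register(self, ws: WebSocket) -> None:
        await ws.accept()
        self.clients.add(ws)

    async def broadcast(self, payload: dict) -> None:
        dead = set()
        for ws in self.clients:
            try:
                await ws.send_json(payload)
            except Exception:
                dead.add(ws)  # client went away; prune it
        self.clients -= dead

async def run_job(job_id: str, batches: list, bus: ProgressBroadcaster) -> None:
    for done, batch in enumerate(batches, start=1):
        await asyncio.sleep(0)  # stand-in for actual batch processing
        await bus.broadcast({"job_id": job_id, "done": done, "total": len(batches)})
    await bus.broadcast({"job_id": job_id, "status": "completed"})  # exit condition
```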
Delays
- Model Loading Latency (warmup, ~5-30 seconds) — First inference request waits while model weights load into GPU memory
- Recipe Job Batching (batch-window, configurable) — Small datasets may wait for a batch to fill before processing begins
- Triton Kernel Compilation (compilation, ~1-3 seconds) — First training step compiles custom CUDA kernels for the specific GPU architecture
- Frontend State Persistence (async-processing, ~100 ms) — Recipe editor changes are debounced to avoid excessive localStorage writes
Control Points
- Training Method Selection (architecture-switch) — Controls: Whether to use LoRA, QLoRA, or full fine-tuning based on available memory. Default: qlora
- Model Quantization (precision-mode) — Controls: Enables 4-bit or 8-bit quantization to reduce VRAM usage at slight accuracy cost. Default: true
- Batch Size Scaling (hyperparameter) — Controls: Training throughput vs memory usage tradeoff. Default: 4
- Gradient Checkpointing (runtime-toggle) — Controls: Trades compute for memory by recomputing activations during backward pass. Default: unsloth
- Chat Auto-Healing (feature-flag) — Controls: Automatically retries failed tool calls with error context. Default: true
- Preview Row Limit (threshold) — Controls: Maximum rows shown in dataset preview to prevent UI lag. Default: 10-50
Technology Stack
- FastAPI — Provides the HTTP/WebSocket server with automatic OpenAPI documentation and dependency injection for backend APIs
- React — Powers the frontend UI with component-based architecture and state management for complex interfaces
- PyTorch — Core deep learning framework for model loading, training, and inference operations
- Triton — GPU programming language for writing custom CUDA kernels that accelerate model operations
- Zustand — Lightweight state management for React components, particularly the recipe editor and chat interface
- TanStack Router — Type-safe routing with authentication guards and nested route layouts for the frontend
- Pydantic — Data validation and serialization for API request/response models and configuration schemas
- IndexedDB (via Dexie) — Browser-local storage for chat history and user preferences
- Transformers — Hugging Face library for model loading, tokenization, and integration with the model hub
- CUDA — GPU compute platform for accelerated training and inference operations
Key Components
- FastAPI (gateway) — Main HTTP server that handles all API requests, serves static frontend files, and manages WebSocket connections for real-time updates during model operations — studio/backend/main.py
- RecipeStudioStore (orchestrator) — Zustand state manager that coordinates the visual recipe editor, managing node graph state, edge connections, configuration panels, and layout operations — studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts
- ChatRuntimeStore (orchestrator) — Manages chat session state, including model selection, inference parameters, conversation history, and real-time streaming responses from the backend — studio/frontend/src/features/chat/stores/chat-runtime-store.ts
- DataRecipeJobManager (scheduler) — Orchestrates execution of data recipe jobs by managing job queues, progress tracking, and WebSocket event broadcasting to update the frontend in real time — studio/backend/core/data_recipe/jobs/manager.py
- HardwareDetection (detector) — Detects available GPU hardware, CUDA capabilities, and memory constraints to determine optimal model loading and training configurations — studio/backend/utils/hardware/hardware.py
- ModelRegistry (registry) — Maintains a catalog of supported model architectures and their optimization patches, and handles model loading with appropriate kernel substitutions — unsloth/models/
- TritonKernels (optimizer) — Custom CUDA kernels written in Triton that accelerate attention computation, matrix operations, and gradient updates for a 2-5x training speedup — unsloth/kernels/
- InferenceRouter (adapter) — Converts between the Anthropic-compatible API format and internal model interfaces, handling streaming responses and tool-calling workflows — studio/backend/routes/inference.py
- AuthenticationMiddleware (gateway) — JWT-based authentication that validates tokens, manages session refresh, and enforces password-change requirements for secure access — studio/backend/auth/authentication.py
Explore the interactive analysis
See the full architecture map, data flow, and code patterns visualization.
Analyze on CodeSea · Compare unsloth
Frequently Asked Questions
What is unsloth used for?
unslothai/unsloth provides a web UI for training and running AI models locally with optimized kernels. It is a 9-component fullstack written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 730 files.
How is unsloth architected?
unsloth is organized into 4 architecture layers: Web Frontend, Backend API, Core ML Library, CLI Interface. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through unsloth?
Data moves through 7 stages: User authentication → Hardware detection → Model selection and loading → Recipe graph construction → Recipe execution → … The system supports two primary workflows: chat inference and training data preparation. For chat, user messages flow from the React frontend through WebSocket connections to the FastAPI backend, get processed by optimized model inference using custom Triton kernels, and stream responses back to the UI. For training, users build data recipes using a visual node editor that generates structured payloads, which execute as background jobs with progress updates broadcast via WebSocket to the frontend. This pipeline design reflects a complex multi-stage processing system.
What technologies does unsloth use?
The core stack includes FastAPI (Provides HTTP/WebSocket server with automatic OpenAPI documentation and dependency injection for backend APIs), React (Powers the frontend UI with component-based architecture and state management for complex interfaces), PyTorch (Core deep learning framework for model loading, training, and inference operations), Triton (GPU programming language for writing custom CUDA kernels that accelerate model operations), Zustand (Lightweight state management for React components, particularly recipe editor and chat interface), TanStack Router (Type-safe routing with authentication guards and nested route layouts for the frontend), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does unsloth have?
unsloth exhibits 4 data pools (IndexedDB Chat Storage, Recipe Execution Cache), 4 feedback loops, 6 control points, and 4 delays. The feedback loops handle polling and async-processing. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does unsloth use?
6 design patterns detected: Optimistic UI Updates, WebSocket Event Broadcasting, Hardware-Aware Adaptation, Plugin Architecture, Kernel Substitution, and 1 more.
How does unsloth compare to alternatives?
CodeSea has side-by-side architecture comparisons of unsloth with peft. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns. See the comparison pages above for detailed analysis.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.