unslothai/unsloth

Unsloth Studio is a web UI for locally training and running open models such as Gemma 4, Qwen3.5, DeepSeek, and gpt-oss.

62,230 stars · Python · 9 components

Provides a web UI for training and running AI models locally with optimized kernels

The system supports two primary workflows: chat inference and training data preparation. For chat, user messages flow from the React frontend through WebSocket connections to the FastAPI backend, get processed by optimized model inference using custom Triton kernels, and stream responses back to the UI. For training, users build data recipes using a visual node editor that generates structured payloads, which execute as background jobs with progress updates broadcast via WebSocket to the frontend.

Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.

A 9-component fullstack. 730 files analyzed. Data flows through 7 distinct pipeline stages.

How Data Flows Through the System


  1. User authentication — Frontend sends login credentials to /api/auth/login endpoint, backend validates against stored hashes and returns JWT tokens for subsequent requests [LoginCredentials → AuthToken]
  2. Hardware detection — Backend scans system for GPU capabilities using nvidia-ml-py, detects CUDA compute capability and available VRAM to determine optimal model configurations
  3. Model selection and loading — User selects model from registry, system checks hardware compatibility and loads model weights with appropriate quantization (4-bit/8-bit) based on available VRAM [ModelWeightsConfig → LoadedModel] (config: training_method, load_in_4bit, optimizer)
  4. Recipe graph construction — Visual editor in RecipeStudioStore manages node placement and edge connections, validates graph topology, and builds structured recipe payload with model configs and processing steps [NodeConfig → RecipePayload] (config: batch_size, max_seq_length, preview_size)
  5. Recipe execution — DataRecipeJobManager receives recipe payload, spawns background processing job, and broadcasts progress updates via WebSocket using job status tracking [RecipePayload → RecipeExecutionRecord] (config: target_num_records, merge_batches)
  6. Chat message processing — ChatRuntimeStore sends user messages through AnthropicRequest format to inference router, which applies chat templates and streams token generation back to frontend [AnthropicRequest → AnthropicResponse] (config: temperature, max_tokens, top_p +1)
  7. Model optimization — TritonKernels replace standard PyTorch operations with custom CUDA implementations for attention, RMSNorm, and SwiGLU to achieve 2-5x speedup during training [TensorBatch → OptimizedTensor] (config: gradient_checkpointing)
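Stage 3's hardware-aware quantization choice can be sketched in a few lines. Everything below (GpuInfo, pick_quantization, the 2 GB overhead figure) is illustrative, not the repository's actual logic:

```python
from dataclasses import dataclass

@dataclass
class GpuInfo:
    name: str
    vram_gb: float  # total VRAM reported by the driver

def pick_quantization(model_weight_gb: float, gpu: GpuInfo, overhead_gb: float = 2.0) -> str:
    """Pick the widest precision whose weight footprint fits in VRAM.

    8-bit weights take ~1/2 the memory of fp16 and 4-bit ~1/4; the
    overhead term reserves room for activations and the KV cache.
    """
    budget = gpu.vram_gb - overhead_gb
    if model_weight_gb <= budget:
        return "fp16"              # full-precision weights fit outright
    if model_weight_gb / 2 <= budget:
        return "8bit"
    if model_weight_gb / 4 <= budget:
        return "4bit"
    raise MemoryError(f"{gpu.name}: model does not fit even at 4-bit")

# A 14 GB (fp16) model on an 8 GB card lands on 4-bit:
print(pick_quantization(14.0, GpuInfo("RTX 3070", 8.0)))  # → 4bit
```

The real backend derives the weight size from the model config and measured free VRAM rather than fixed numbers, but the decision shape is the same.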

Data Models

The data structures that flow between stages — the contracts that hold the system together.

AnthropicRequest studio/backend/models/inference.py
Pydantic model with model: str, max_tokens: Optional[int], messages: list[AnthropicMessage], system: Optional[str], tools: Optional[list], temperature: Optional[float], stream: bool
Created by chat UI when user sends messages, processed by inference handler to generate responses
RecipePayload studio/frontend/src/features/recipe-studio/utils/payload/types.ts
TypeScript object with recipe: {model_providers: object[], mcp_providers: object[], model_configs: object[], seed_config?: object, tool_configs: object[], columns: object[], processors: object[]}, run: {rows: number, preview: boolean, output_formats: string[]}, ui: {nodes: object[], edges: object[], layout_direction?: string}
Built by the visual recipe editor from node graph, serialized for backend execution, and stored with execution records
RecipeExecutionRecord studio/frontend/src/features/recipe-studio/execution-types.ts
TypeScript object with id: string, recipeId: string, jobId: string|null, kind: 'preview'|'full', status: RecipeExecutionStatus, progress: RecipeExecutionProgress|null, dataset: object[], analysis: RecipeExecutionAnalysis|null
Created when recipe execution starts, updated via WebSocket events during processing, stored for historical tracking
ModelWeightsConfig studio/backend/utils/hardware/vram_estimation.py
Python dataclass with hidden_size: int, num_hidden_layers: int, num_attention_heads: int, num_key_value_heads: int, intermediate_size: int, vocab_size: int, tie_word_embeddings: bool, num_experts: Optional[int]
Extracted from model config files during model initialization, used to calculate memory requirements for training and inference
TrainingConfig studio/backend/utils/hardware/vram_estimation.py
Python dataclass with training_method: str='qlora', batch_size: int=4, max_seq_length: int=2048, lora_rank: int=16, target_modules: list, gradient_checkpointing: str='unsloth', optimizer: str='adamw_8bit', load_in_4bit: bool=True
Configured by user through UI, validated against available hardware resources, passed to training pipeline
ThreadRecord studio/frontend/src/features/chat/types.ts
TypeScript object with id: string, modelType: string, pairId: string, archived: boolean, createdAt: number, title?: string, modelId?: string
Created when user starts new chat conversation, persisted locally in browser IndexedDB, updated with conversation metadata
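The two VRAM-estimation dataclasses above lend themselves to a rough memory calculation. The sketch below mirrors ModelWeightsConfig; rough_param_count and weight_memory_gb are my names, not the repository's, and the formula is a back-of-envelope estimate:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelWeightsConfig:
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    intermediate_size: int
    vocab_size: int
    tie_word_embeddings: bool
    num_experts: Optional[int] = None

def rough_param_count(cfg: ModelWeightsConfig) -> int:
    """Back-of-envelope parameter count: attention + SwiGLU MLP per
    layer, plus embeddings. Ignores biases, norms, and MoE experts."""
    h = cfg.hidden_size
    head_dim = h // cfg.num_attention_heads
    kv_dim = head_dim * cfg.num_key_value_heads  # shrinks under GQA
    attn = 2 * h * h + 2 * h * kv_dim            # Q, O and K, V projections
    mlp = 3 * h * cfg.intermediate_size          # gate, up, down
    embeddings = cfg.vocab_size * h
    if not cfg.tie_word_embeddings:
        embeddings *= 2                          # separate lm_head matrix
    return cfg.num_hidden_layers * (attn + mlp) + embeddings

def weight_memory_gb(cfg: ModelWeightsConfig, bits: int) -> float:
    """Weight storage alone at the given quantization width."""
    return rough_param_count(cfg) * bits / 8 / 2**30
```

For a real model the fields come from its config file; the estimate deliberately excludes activations, gradients, and optimizer state, which dominate during training.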

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Environment unguarded

The _platform_compat module exists and successfully fixes Anaconda/conda-forge platform._sys_version_cache issues before library imports

If this fails: On systems with different Python distributions or missing this module, attrs -> rich -> structlog -> platform imports could crash with cryptic C-level errors during server startup

studio/backend/main.py:platform_compat_import
critical Environment weakly guarded

Windows registry MIME type fixes run only on the Windows platform, and mimetypes.add_type() is called before StaticFiles instantiation

If this fails: If the registry contains incorrect mappings and the fix runs after StaticFiles init, browsers will refuse to execute .js files served as text/plain, resulting in blank frontend pages

studio/backend/main.py:mimetypes_registry_fix
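A minimal sketch of the kind of fix this assumption describes, using only the stdlib mimetypes module; the force flag is added here purely so the override can be demonstrated off-Windows, and is not part of the repository's code:

```python
import mimetypes
import sys

def fix_registry_mime_types(force: bool = False) -> None:
    """Pin the MIME types browsers are strict about. On Windows the
    stdlib reads mappings from the registry, which may map .js to
    text/plain; add_type() must run before the static-file handler
    is built, because the handler captures the mapping at startup."""
    if force or sys.platform == "win32":
        mimetypes.add_type("application/javascript", ".js")
        mimetypes.add_type("text/css", ".css")
        mimetypes.add_type("application/json", ".json")

# Must run before e.g. StaticFiles(directory="dist") is instantiated.
fix_registry_mime_types(force=True)
print(mimetypes.guess_type("app.js")[0])  # → application/javascript
```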
warning Resource weakly guarded

Browser localStorage is available and has sufficient storage quota for inference parameters, tokens, and settings persistence

If this fails: In private browsing, storage-disabled browsers, or when quota exceeded, user preferences silently fail to persist and chat configuration resets on page refresh

studio/frontend/src/features/chat/stores/chat-runtime-store.ts:localStorage_availability
critical Contract unguarded

Node configurations in the configs record always have corresponding entries in the nodes array, and edge connections reference valid node IDs

If this fails: Orphaned configs or dangling edge references cause recipe execution to fail with 'node not found' errors or corrupt the visual graph state

studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts:node_edge_synchronization
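The referential-integrity check this assumption leaves implicit can be made explicit. validate_graph is a hypothetical helper sketched in Python, not code from the TypeScript store:

```python
def validate_graph(nodes, edges, configs):
    """Return a list of integrity errors: orphaned configs and
    dangling edge endpoints that the store otherwise trusts implicitly."""
    node_ids = {n["id"] for n in nodes}
    errors = []
    for cfg_id in configs:
        if cfg_id not in node_ids:
            errors.append(f"orphaned config: {cfg_id}")
    for e in edges:
        for end in (e["source"], e["target"]):
            if end not in node_ids:
                errors.append(f"dangling edge endpoint: {end}")
    return errors

nodes = [{"id": "a"}, {"id": "b"}]
edges = [{"source": "a", "target": "b"}, {"source": "b", "target": "c"}]
configs = {"a": {}, "z": {}}
print(validate_graph(nodes, edges, configs))
# → ['orphaned config: z', 'dangling edge endpoint: c']
```

Running such a check before serializing the RecipePayload turns the 'node not found' failure at execution time into a clear pre-flight error.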
info Scale unguarded

The nextId counter for node generation will never overflow JavaScript's MAX_SAFE_INTEGER (9,007,199,254,740,991) during a single session

If this fails: After creating billions of nodes, ID collisions could occur causing node replacement, data loss, or graph corruption

studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts:nextId_overflow
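A quick demonstration of why 9,007,199,254,740,991 is the limit: JavaScript numbers are IEEE-754 doubles, and Python floats share that representation, so the collision can be shown directly:

```python
# JavaScript's Number.MAX_SAFE_INTEGER is 2**53 - 1. Past it,
# consecutive integers collapse onto the same double, which is
# exactly the ID-collision failure mode described above.
MAX_SAFE_INTEGER = 2**53 - 1
print(MAX_SAFE_INTEGER)                  # → 9007199254740991
print(float(2**53) == float(2**53 + 1))  # → True: two distinct IDs, one double
```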
critical Domain unguarded

GPU hardware detection and VRAM estimation modules (utils.hardware) correctly identify all supported GPU architectures and memory configurations

If this fails: Unsupported or incorrectly detected GPUs could cause model loading to fail, use wrong quantization settings, or exceed available memory leading to OOM crashes

studio/backend/main.py:gpu_hardware_detection
warning Temporal weakly guarded

JWT token refresh happens before expiration, and the backend refresh endpoint remains available during token renewal

If this fails: If refresh fails or happens too late, users lose authentication mid-conversation and their chat history/progress is lost without graceful recovery

studio/frontend/src/features/chat/stores/chat-runtime-store.ts:token_refresh_timing
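A proactive-refresh schedule that avoids the race is simple arithmetic; next_refresh_at and needs_refresh are illustrative names, and the 60-second margin is an arbitrary choice:

```python
import time
from typing import Optional

def next_refresh_at(exp_ts: float, margin_s: float = 60.0) -> float:
    """Refresh a safety margin before the JWT exp claim, so renewal
    never races expiry."""
    return exp_ts - margin_s

def needs_refresh(exp_ts: float, now: Optional[float] = None, margin_s: float = 60.0) -> bool:
    now = time.time() if now is None else now
    return now >= next_refresh_at(exp_ts, margin_s)

# A token expiring at t=1000 is due for refresh from t=940 onward:
assert not needs_refresh(exp_ts=1000.0, now=700.0)
assert needs_refresh(exp_ts=1000.0, now=950.0)
```

The remaining failure mode, the refresh endpoint being unreachable, still needs a retry-with-backoff path and graceful session teardown on the frontend.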
warning Contract weakly guarded

The AnthropicRequest format from the frontend exactly matches the expected message structure, with the required fields (model, messages) present and optional fields in the correct formats

If this fails: Malformed requests cause silent failures or wrong model responses rather than clear validation errors, making debugging difficult

studio/backend/routes/inference.py:anthropic_api_compatibility
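In the backend, Pydantic performs this validation; the sketch below shows the shape of the checks in plain Python, so that malformed requests produce explicit errors rather than silent failures. validate_anthropic_request is a hypothetical helper:

```python
def validate_anthropic_request(payload: dict) -> list:
    """Collect every validation error instead of failing on the first."""
    errors = []
    if not isinstance(payload.get("model"), str):
        errors.append("model: required string")
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages: required non-empty list")
    else:
        for i, m in enumerate(messages):
            if not isinstance(m, dict) or m.get("role") not in ("user", "assistant"):
                errors.append(f"messages[{i}]: role must be user|assistant")
    if "temperature" in payload and not isinstance(payload["temperature"], (int, float)):
        errors.append("temperature: must be a number")
    return errors

ok = {"model": "gemma", "messages": [{"role": "user", "content": "hi"}]}
print(validate_anthropic_request(ok))  # → []
```

Returning the full error list lets the route respond with a 422 that names every bad field, which is what makes debugging tractable.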
warning Resource unguarded

WebSocket connections for recipe progress updates can handle concurrent jobs and multiple frontend clients without memory leaks or connection limits

If this fails: Under high load, WebSocket connections could exhaust server resources, causing recipe progress updates to stop working and jobs to appear hung

studio/backend/core/data_recipe/jobs/manager.py:websocket_connection_scaling
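One common mitigation is a broadcaster that drops dead connections on write and releases all subscribers when a job finishes. ProgressBroadcaster below is a synchronous sketch of that pattern, not the DataRecipeJobManager implementation:

```python
class ProgressBroadcaster:
    """Track subscribers per job and prune dead connections during
    broadcast, so finished jobs cannot leak sockets."""

    def __init__(self):
        self._subscribers = {}  # job_id -> set of send callables

    def subscribe(self, job_id, send):
        self._subscribers.setdefault(job_id, set()).add(send)

    def broadcast(self, job_id, event):
        dead = set()
        for send in self._subscribers.get(job_id, set()):
            try:
                send(event)
            except ConnectionError:
                dead.add(send)  # a closed socket raises on write
        self._subscribers.get(job_id, set()).difference_update(dead)

    def finish(self, job_id):
        self._subscribers.pop(job_id, None)  # release all references
```

In the real async setting the same idea applies with awaitable sends and an explicit cap on subscribers per job.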
critical Ordering weakly guarded

Recipe node graphs maintain a valid DAG (directed acyclic graph) structure, and edge connections preserve data-flow semantics

If this fails: Circular dependencies or invalid connections cause infinite loops during recipe execution or produce nonsensical data transformations

studio/frontend/src/features/recipe-studio/stores/recipe-studio.ts:graph_topology_validation
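Validating the DAG property before execution is cheap with Kahn's algorithm; find_cycle_nodes is an illustrative helper, not code from the store:

```python
from collections import deque

def find_cycle_nodes(nodes, edges):
    """Kahn's algorithm: repeatedly peel off zero-in-degree nodes;
    anything left over sits on a cycle and would loop forever at
    recipe execution time."""
    indeg = {n: 0 for n in nodes}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
        indeg[dst] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        for m in out[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return {n for n in nodes if indeg[n] > 0}

print(sorted(find_cycle_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")])))
# → []
print(sorted(find_cycle_nodes(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])))
# → ['a', 'b', 'c']
```

Rejecting a graph whose cycle set is non-empty, and naming the offending nodes in the error, turns an infinite loop into an actionable editor warning.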

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

IndexedDB Chat Storage (browser-store)
Browser-local database storing chat threads, messages, and conversation history using Dexie wrapper
Recipe Execution Cache (state-store)
In-memory tracking of active recipe jobs with progress state and WebSocket connections for real-time updates
Model Weight Cache (file-store)
Local filesystem cache of downloaded model weights and tokenizers to avoid re-downloading on subsequent loads
Authentication Session Store (in-memory)
Runtime storage of active JWT tokens and session state for user authentication validation


Technology Stack

FastAPI (framework)
Provides HTTP/WebSocket server with automatic OpenAPI documentation and dependency injection for backend APIs
React (framework)
Powers the frontend UI with component-based architecture and state management for complex interfaces
PyTorch (framework)
Core deep learning framework for model loading, training, and inference operations
Triton (compute)
GPU programming language for writing custom CUDA kernels that accelerate model operations
Zustand (library)
Lightweight state management for React components, particularly recipe editor and chat interface
TanStack Router (library)
Type-safe routing with authentication guards and nested route layouts for the frontend
Pydantic (serialization)
Data validation and serialization for API request/response models and configuration schemas
IndexedDB (database)
Browser-local storage for chat history and user preferences using Dexie wrapper library
Transformers (library)
Hugging Face library for model loading, tokenization, and integration with model hub
CUDA (runtime)
GPU compute platform for accelerated training and inference operations

Frequently Asked Questions

What is unsloth used for?

unslothai/unsloth provides a web UI for training and running AI models locally with optimized kernels. It is a 9-component fullstack written in Python; data flows through 7 distinct pipeline stages, and the codebase contains 730 files.

How is unsloth architected?

unsloth is organized into 4 architecture layers: Web Frontend, Backend API, Core ML Library, CLI Interface. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through unsloth?

Data moves through 7 stages: User authentication → Hardware detection → Model selection and loading → Recipe graph construction → Recipe execution → .... The system supports two primary workflows, chat inference and training data preparation, and this pipeline design reflects a complex multi-stage processing system.

What technologies does unsloth use?

The core stack includes FastAPI (Provides HTTP/WebSocket server with automatic OpenAPI documentation and dependency injection for backend APIs), React (Powers the frontend UI with component-based architecture and state management for complex interfaces), PyTorch (Core deep learning framework for model loading, training, and inference operations), Triton (GPU programming language for writing custom CUDA kernels that accelerate model operations), Zustand (Lightweight state management for React components, particularly recipe editor and chat interface), TanStack Router (Type-safe routing with authentication guards and nested route layouts for the frontend), and 4 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does unsloth have?

unsloth exhibits 4 data pools (IndexedDB Chat Storage, Recipe Execution Cache), 4 feedback loops, 6 control points, 4 delays. The feedback loops handle polling and async-processing. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does unsloth use?

6 design patterns detected: Optimistic UI Updates, WebSocket Event Broadcasting, Hardware-Aware Adaptation, Plugin Architecture, Kernel Substitution, and 1 more.

How does unsloth compare to alternatives?

CodeSea has side-by-side architecture comparisons of unsloth with peft, covering tech stack differences, pipeline design, system behavior, and code patterns.

Analyzed on April 20, 2026 by CodeSea.