bmaltais/kohya_ss
Trains custom Stable Diffusion models using LoRA, Dreambooth, and fine-tuning with a Gradio web interface
Users configure training parameters through the Gradio interface, which loads defaults from config.toml and validates parameter combinations. The system assembles these parameters into command-line arguments, launches the appropriate sd-scripts training module via subprocess, and monitors the training process. Sample images are generated periodically using prompts written to a file, and the final trained model (LoRA adapters or fine-tuned weights) is saved to the specified output directory.
Under the hood, the system uses 4 feedback loops, 4 data pools, and 8 control points to manage its runtime behavior.
A 9-component ML training system. 86 files analyzed. Data flows through 8 distinct pipeline stages.
How Data Flows Through the System
- Load configuration defaults — KohyaSSGUIConfig reads config.toml and populates default values for all training parameters — sets up folder paths, training hyperparameters, and model-specific options based on saved preferences
- Collect GUI parameters — Gradio components in BasicTraining, AdvancedTraining, and model-specific classes gather user input — validates parameter ranges, handles conditional visibility based on training type, and ensures parameter compatibility [TrainingConfig → TrainingParameters] (config: training.learning_rate, training.batch_size, training.max_train_epochs +1)
- Select training paths — Folders class provides file browsers for selecting training data directory, pretrained model, output location, and optional VAE — creates missing directories and validates path accessibility [TrainingConfig → ModelPaths] (config: folders.output_dir, folders.logging_dir, folders.reg_data_dir)
- Prepare sample prompts — SampleImages.create_prompt_file writes user-provided prompts to sample/prompt.txt in the output directory — formats prompts for the training script's validation image generation system [SamplePrompts → Prompt Files]
- Assemble training command — CommandExecutor combines all parameters into command-line arguments for the appropriate sd-scripts module — selects train_network.py for LoRA, train_db.py for Dreambooth, or fine_tune.py based on training type [TrainingParameters → Training Command]
- Launch training subprocess — CommandExecutor.execute_command runs the assembled command via subprocess.Popen — includes accelerate launch wrapper for distributed training and provides process monitoring with start/stop controls [Training Command → Training Process State] (config: accelerate_launch.num_processes, accelerate_launch.num_machines)
- Monitor training progress — Process stdout/stderr streams are captured and displayed in the Gradio interface — training script writes loss values, sample images, and checkpoint saves to the logging directory [Training Process State → Training Logs]
- Save trained model — Training script saves the final LoRA adapters or fine-tuned model weights to the output directory — includes safetensors format with optional metadata like title, author, and tags [Training State → Trained Model Files] (config: metadata.title, metadata.author, metadata.description)
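The assemble-and-launch stages above can be sketched as follows. This is a minimal illustration, not the real CommandExecutor API: `build_command`, `launch`, and the exact flag set are assumptions, though the script selection mirrors the LoRA/Dreambooth/fine-tune mapping described above and the flags are common sd-scripts options.

```python
import subprocess


def build_command(params: dict) -> list[str]:
    """Assemble an accelerate-wrapped training command from GUI parameters.

    The flag names mirror common sd-scripts options; the real set depends
    on the selected training type and many more GUI fields.
    """
    script = {
        "lora": "train_network.py",
        "dreambooth": "train_db.py",
        "finetune": "fine_tune.py",
    }[params["training_type"]]
    return [
        "accelerate", "launch",
        "--num_processes", str(params.get("num_processes", 1)),
        script,
        "--pretrained_model_name_or_path", params["pretrained_model"],
        "--output_dir", params["output_dir"],
        "--learning_rate", str(params["learning_rate"]),
        "--train_batch_size", str(params["batch_size"]),
    ]


def launch(cmd: list[str]) -> subprocess.Popen:
    # Popen (rather than run) gives the GUI a handle for monitoring,
    # output streaming, and the stop button.
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
```

Keeping the command as an argument list (not a shell string) avoids quoting bugs when paths contain spaces.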
Data Models
The data structures that flow between stages — the contracts that hold the system together.
kohya_gui/class_gui_config.py — TOML-based configuration dict with nested sections: training.learning_rate, training.batch_size, folders.output_dir, sdxl.cache_text_encoder_outputs, accelerate_launch.mixed_precision
Loaded from config.toml on startup, updated through GUI interactions, validated before training execution, and saved back to file
kohya_gui/class_basic_training.py — Gradio component values dict with keys like learning_rate: float, lr_scheduler: str, train_batch_size: int, max_train_epochs: int, mixed_precision: str, gradient_accumulation_steps: int
Collected from Gradio form inputs, validated for parameter compatibility, then formatted into command-line arguments
kohya_gui/class_folders.py — Dict with train_data_dir: str, output_dir: str, logging_dir: str, pretrained_model_name_or_path: str, vae: str, resume: str — file and directory paths for training data and model artifacts
User selects paths through file browsers, paths are validated for existence and permissions, then passed to the training subprocess
kohya_gui/class_lora_tab.py — Configuration dict for LoRA-specific parameters: network_module: str, network_dim: int, network_alpha: float, network_dropout: float, conv_dim: int, conv_alpha: float
Set through LoRA-specific GUI components, merged with the base training config, then used to configure the adapter network architecture
kohya_gui/class_sample_images.py — Text content written to the sample/prompt.txt file, containing newline-separated prompts for validation image generation during training
User enters prompts in a text area; they are written to prompt.txt in the output directory and read by the training script for periodic sample generation
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
The sd-scripts training modules (train_network.py, train_db.py, fine_tune.py) exist at predictable paths relative to the kohya_ss installation and are executable
If this fails: Training fails silently or with cryptic errors if sd-scripts is missing, installed elsewhere, or permissions prevent execution — user sees 'command not found' without understanding that external dependencies are missing
kohya_gui/class_command_executor.py:execute_command
The training script expects sample prompts in exactly 'output_dir/sample/prompt.txt' format and reads this specific file path during training
If this fails: If training script changes expected prompt file location or format, sample image generation silently fails during training without user notification — validation images never appear
kohya_gui/class_sample_images.py:create_prompt_file
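The fixed 'output_dir/sample/prompt.txt' convention can be made explicit in one place. A sketch of what such a writer might look like; the function body is an assumption, not the actual create_prompt_file implementation:

```python
from pathlib import Path


def create_prompt_file(output_dir: str, prompts: str) -> Path:
    """Write newline-separated prompts to the path the training script expects.

    Centralizing the 'sample/prompt.txt' convention here means a change in
    the training script's expected location only needs one edit, and writing
    the file before the subprocess launches sidesteps timing issues.
    """
    sample_dir = Path(output_dir) / "sample"
    sample_dir.mkdir(parents=True, exist_ok=True)
    prompt_file = sample_dir / "prompt.txt"
    prompt_file.write_text(prompts, encoding="utf-8")
    return prompt_file
```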
The system has at least 'num_processes' GPUs available and sufficient VRAM for the selected mixed precision mode and batch size combination
If this fails: Training process crashes with CUDA out-of-memory errors or hangs indefinitely if GPU resources are insufficient — no validation occurs before launch
kohya_gui/class_accelerate_launch.py:num_processes
The TOML config file contains only valid parameter keys that match the expected schema — any typos or deprecated keys are ignored silently
If this fails: Invalid config keys get silently dropped, causing user settings to revert to defaults without warning — user thinks their custom settings are applied but training uses different values
kohya_gui/class_gui_config.py:load_config
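A loader could surface typos instead of silently dropping them by diffing the loaded dict against a known schema. A minimal sketch under that assumption; `find_unknown_keys` is a hypothetical helper, not part of KohyaSSGUIConfig:

```python
def find_unknown_keys(config: dict, schema: dict, prefix: str = "") -> list[str]:
    """Return dotted paths present in config but absent from the schema.

    A loader can log a warning for each returned path so the user learns
    that a misspelled or deprecated key is being ignored.
    """
    unknown = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if key not in schema:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            unknown.extend(find_unknown_keys(value, schema[key], path + "."))
    return unknown
```

For example, a config containing `learning_rte` under `[training]` would be reported as `training.learning_rte` rather than quietly reverting to the default learning rate.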
Default learning rate of '1e-6' is appropriate for all model types (SD 1.5, SDXL, SD3, Flux.1) and training approaches (LoRA, Dreambooth, fine-tuning)
If this fails: Training converges extremely slowly or fails to learn with inappropriate learning rates — SDXL may need 1e-5, LoRA may need 1e-4, but system uses same default for all
kohya_gui/class_basic_training.py:learning_rate_value
Only one training process should run at a time, but process state tracking relies on a single instance variable that could become stale if the process crashes or is killed externally
If this fails: If training process dies unexpectedly, GUI still thinks it's running and prevents new training starts — user must restart entire GUI to recover
kohya_gui/class_command_executor.py:process
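The stale-state problem above comes from trusting a stored flag instead of asking the process itself. A sketch of a liveness check built on `subprocess.Popen.poll()`; the `is_running` helper is illustrative, not CommandExecutor's actual code:

```python
import subprocess
import sys


def is_running(process) -> bool:
    """True only if the tracked subprocess is genuinely alive.

    poll() returns None while the child runs and its exit code afterwards,
    so a crashed or externally killed process is detected on the next check
    instead of leaving a boolean state flag permanently stale.
    """
    return process is not None and process.poll() is None


# Demonstration: a short-lived child is correctly seen as finished.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
proc.wait()
```

Re-checking liveness before refusing to start a new run lets the GUI recover without a restart.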
The user running the GUI has write permissions to create directories in scriptdir/outputs, scriptdir/logs, and scriptdir/reg paths
If this fails: Directory creation fails silently or with permission denied errors, but training continues and then fails when trying to write outputs — confusing delayed failure mode
kohya_gui/class_folders.py:create_directory_if_not_exists
The selected GPU architecture supports the chosen mixed precision mode — fp16 requires compute capability 7.0+, bf16 requires Ampere+, fp8 requires Hopper+
If this fails: Training fails with cryptic CUDA errors or falls back to slower fp32 without notification — user expects performance benefits but gets degraded training speed
kohya_gui/class_accelerate_launch.py:mixed_precision
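The capability thresholds above can be validated before launch. A sketch, assuming the capability tuple comes from `torch.cuda.get_device_capability()` at runtime; the pure function below is a hypothetical helper, kept free of GPU calls so the mapping itself is testable:

```python
def supported_precisions(major: int, minor: int) -> set[str]:
    """Map a CUDA compute capability to usable mixed precision modes.

    Thresholds follow the rule of thumb stated above: fp16 needs compute
    capability 7.0+, bf16 needs Ampere (8.0+), fp8 needs Hopper (9.0+).
    Full fp32 ("no", in accelerate's naming) always works.
    """
    cap = (major, minor)
    modes = {"no"}
    if cap >= (7, 0):
        modes.add("fp16")
    if cap >= (8, 0):
        modes.add("bf16")
    if cap >= (9, 0):
        modes.add("fp8")
    return modes
```

A launcher could warn and fall back explicitly when the user's chosen mode is not in the returned set, instead of letting training fail with a cryptic CUDA error.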
Sample prompts are written to the prompt file before training starts, but the training script may read this file at initialization — race condition if file is created after script launch
If this fails: Sample image generation uses empty or default prompts instead of user-specified ones if timing is wrong — validation images don't match user expectations
kohya_gui/class_sample_images.py:create_prompt_file
SDXL parameters like cache_text_encoder_outputs and no_half_vae are only relevant when SDXL mode is enabled, but parameter validation doesn't enforce this constraint
If this fails: Non-SDXL training may receive SDXL-specific flags that are ignored or cause errors — confusing parameter interaction without clear error messages
kohya_gui/class_sdxl_parameters.py:initialize_accordion
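One way to enforce the constraint above is to strip model-specific keys before command assembly. A minimal sketch; the flag set and `filter_params` helper are illustrative assumptions, not the repository's actual validation logic:

```python
# Illustrative subset of flags that only the SDXL scripts understand.
SDXL_ONLY_FLAGS = {"cache_text_encoder_outputs", "no_half_vae"}


def filter_params(params: dict, sdxl: bool) -> dict:
    """Drop SDXL-specific keys when SDXL mode is off.

    This prevents non-SDXL training scripts from receiving flags they
    either ignore silently or reject with an unrelated-looking error.
    """
    if sdxl:
        return dict(params)
    return {k: v for k, v in params.items() if k not in SDXL_ONLY_FLAGS}
```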
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Training Configuration Store — Persistent storage for user training preferences including default paths, hyperparameters, and GUI settings — allows sessions to resume with previous configurations
- Model Output Directory — Accumulates trained model files, training logs, sample images, and checkpoints — organized by training session with timestamped subdirectories
- Training Dataset Cache — Preprocessed training images with associated caption files — the training script caches tokenized captions and latent encodings to accelerate subsequent epochs
- Process State Tracker — Tracks currently running training processes with PIDs and status — enables process control and prevents multiple concurrent training runs
Feedback Loops
- Training Progress Loop (training-loop, reinforcing) — Trigger: Training subprocess starts. Action: Model processes batches, computes loss, updates weights via backpropagation, generates sample images at specified intervals. Exit: Reaches max epochs or user stops training.
- GUI State Refresh (polling, balancing) — Trigger: User interaction or timer. Action: Gradio updates component visibility, validates parameter combinations, refreshes file browser contents. Exit: User closes interface.
- Process Monitoring (polling, balancing) — Trigger: Training process active. Action: CommandExecutor checks process status via psutil, updates UI buttons, captures output streams. Exit: Process terminates or is killed.
- Parameter Validation Cycle (auto-scale, balancing) — Trigger: Parameter change in GUI. Action: Validates parameter ranges, shows/hides conditional options, updates dependent parameter limits. Exit: Valid configuration reached.
Delays
- Model Loading Warmup (warmup, ~30-120 seconds) — Training script loads pretrained model, VAE, and text encoders into GPU memory before first training step
- Sample Image Generation (scheduled-job, configurable interval) — Training pauses to generate validation images using the current model state — helps monitor training quality but slows overall progress
- Checkpoint Saving (checkpoint-save, ~10-60 seconds per save) — Training pauses to write model weights to disk at specified epoch intervals — provides recovery points but interrupts training flow
- Configuration File I/O (async-processing, ~milliseconds) — TOML config loading/saving happens during GUI initialization and when user saves preferences — minimal impact on training
- Process Launch Latency (compilation, ~5-15 seconds) — Delay between clicking Start Training and actual training beginning — includes Python process startup and import resolution
Control Points
- Mixed Precision Mode (precision-mode) — Controls: GPU memory usage and training speed — fp16/bf16 reduces memory but may affect numerical stability. Default: fp16
- Training Type Selection (architecture-switch) — Controls: Which sd-scripts module is launched and what parameters are available — fundamentally changes the training approach. Default: LoRA/Dreambooth/Fine-tune
- SDXL Model Toggle (feature-flag) — Controls: Enables SDXL-specific parameters and uses different training script paths — affects model architecture and memory requirements. Default: disabled
- Learning Rate Schedule (hyperparameter) — Controls: How learning rate changes during training — affects convergence speed and final model quality. Default: constant
- Batch Size Scaling (hyperparameter) — Controls: Number of images processed simultaneously — balances training speed with memory usage and gradient stability. Default: user-configured
- Multi-GPU Process Count (device-selection) — Controls: Number of parallel training processes and GPU utilization strategy. Default: 1
- Output Directory Path (env-var) — Controls: Where trained models and logs are saved — affects file organization and disk space usage. Default: outputs/
- Gradient Accumulation Steps (hyperparameter) — Controls: Effective batch size multiplication without increasing memory usage — simulates larger batches. Default: 1
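The batch size, gradient accumulation, and process count controls above interact multiplicatively. A one-line sketch of the relationship, assuming the standard definition used by distributed trainers:

```python
def effective_batch_size(train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int) -> int:
    """Effective batch size seen by the optimizer per weight update.

    Example: per-GPU batch 2, accumulation 4, and 2 processes behave like
    a single batch of 16 for gradient statistics, without the VRAM cost.
    """
    return train_batch_size * gradient_accumulation_steps * num_processes
```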
Technology Stack
- Gradio — Provides the web-based GUI framework for creating interactive training parameter forms, file browsers, and progress monitoring interfaces
- sd-scripts — Core training library that performs the actual model training — wrapped and configured through this GUI system
- Hugging Face Accelerate — Handles distributed training coordination and mixed precision across multiple GPUs
- PyTorch — Underlying deep learning framework for model weights, tensor operations, and gradient computation
- Transformers — Provides pretrained model loading, tokenizers, and model architectures for Stable Diffusion components
- Diffusers — Implements diffusion model pipelines and components including schedulers, VAEs, and UNet architectures
- TOML — Configuration file format for storing user preferences and training parameter defaults
- safetensors — Secure tensor serialization format for saving trained model weights and LoRA adapters
- psutil — Process monitoring utilities for tracking training subprocess status and system resource usage
- subprocess — Python standard library module for launching and managing the external training script processes
Key Components
- KohyaSSGUIConfig (registry) — Loads and saves training configurations from TOML files — centralizes all persistent settings and provides get/set methods with defaults for missing values (kohya_gui/class_gui_config.py)
- CommandExecutor (executor) — Manages subprocess execution of training scripts with process control — tracks running processes, provides start/stop buttons, and handles command-line argument assembly (kohya_gui/class_command_executor.py)
- BasicTraining (factory) — Creates Gradio components for core training parameters like learning rate, scheduler, epochs, and batch size — handles parameter validation and SDXL-specific configurations (kohya_gui/class_basic_training.py)
- AdvancedTraining (factory) — Provides GUI components for advanced training options including gradient accumulation, weighted captions, token padding control, and precision settings (kohya_gui/class_advanced_training.py)
- AccelerateLaunch (adapter) — Configures Hugging Face Accelerate for distributed training — sets up multi-GPU parameters, mixed precision modes, and process coordination settings (kohya_gui/class_accelerate_launch.py)
- Folders (resolver) — Manages file and directory selection through Gradio file browsers — validates paths, creates missing directories, and maintains default locations (kohya_gui/class_folders.py)
- SDXLParameters (adapter) — Provides SDXL-specific training options like text encoder caching, half-precision VAE control, and fused backward pass — conditionally visible based on model type selection (kohya_gui/class_sdxl_parameters.py)
- SampleImages (processor) — Creates prompt files for validation image generation during training — takes user prompts and writes them to the expected file format for the training script (kohya_gui/class_sample_images.py)
- GradioMergeLoRaTab (processor) — Provides tools for merging multiple LoRA adapters into base models or combining LoRA weights — includes ratio controls and output format selection (kohya_gui/merge_lora_gui.py)
Frequently Asked Questions
What is kohya_ss used for?
kohya_ss trains custom Stable Diffusion models using LoRA, Dreambooth, and fine-tuning through a Gradio web interface. bmaltais/kohya_ss is a 9-component ML training system written in Python. Data flows through 8 distinct pipeline stages, and the codebase contains 86 files.
How is kohya_ss architected?
kohya_ss is organized into 5 architecture layers: Web Interface, Configuration Management, Command Assembly, Training Execution, and 1 more. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through kohya_ss?
Data moves through 8 stages: Load configuration defaults → Collect GUI parameters → Select training paths → Prepare sample prompts → Assemble training command → .... Parameters collected in the Gradio interface are assembled into command-line arguments, the appropriate sd-scripts module is launched via subprocess, and the final trained model is saved to the specified output directory. This pipeline design reflects a complex multi-stage processing system.
What technologies does kohya_ss use?
The core stack includes Gradio (Provides web-based GUI framework for creating interactive training parameter forms, file browsers, and progress monitoring interfaces), sd-scripts (Core training library that performs the actual model training — wrapped and configured through this GUI system), Hugging Face Accelerate (Handles distributed training coordination and mixed precision across multiple GPUs), PyTorch (Underlying deep learning framework for model weights, tensor operations, and gradient computation), Transformers (Provides pretrained model loading, tokenizers, and model architectures for Stable Diffusion components), Diffusers (Implements diffusion model pipelines and components including schedulers, VAEs, and UNet architectures), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does kohya_ss have?
kohya_ss exhibits 4 data pools (Training Configuration Store, Model Output Directory), 4 feedback loops, 8 control points, 5 delays. The feedback loops handle training-loop and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does kohya_ss use?
5 design patterns detected: GUI Component Factory, Configuration-Driven Defaults, Subprocess Command Assembly, Conditional UI Visibility, Tool Collection Tabs.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.