bmaltais/kohya_ss
Trains custom Stable Diffusion models using LoRA, Dreambooth, and fine-tuning with a Gradio web interface
Users configure training parameters through the Gradio interface, which loads defaults from config.toml and validates parameter combinations. The system assembles these parameters into command-line arguments, launches the appropriate sd-scripts training module via subprocess, and monitors the training process. Sample images are generated periodically using prompts written to a file, and the final trained model (LoRA adapters or fine-tuned weights) is saved to the specified output directory.
Under the hood, the system uses 4 feedback loops, 4 data pools, and 8 control points to manage its runtime behavior.
A 9-component ML training system. 86 files analyzed. Data flows through 8 distinct pipeline stages.
How Data Flows Through the System
- Load configuration defaults — KohyaSSGUIConfig reads config.toml and populates default values for all training parameters — sets up folder paths, training hyperparameters, and model-specific options based on saved preferences
- Collect GUI parameters — Gradio components in BasicTraining, AdvancedTraining, and model-specific classes gather user input — validates parameter ranges, handles conditional visibility based on training type, and ensures parameter compatibility [TrainingConfig → TrainingParameters] (config: training.learning_rate, training.batch_size, training.max_train_epochs +1)
- Select training paths — Folders class provides file browsers for selecting training data directory, pretrained model, output location, and optional VAE — creates missing directories and validates path accessibility [TrainingConfig → ModelPaths] (config: folders.output_dir, folders.logging_dir, folders.reg_data_dir)
- Prepare sample prompts — SampleImages.create_prompt_file writes user-provided prompts to sample/prompt.txt in the output directory — formats prompts for the training script's validation image generation system [SamplePrompts → Prompt Files]
- Assemble training command — CommandExecutor combines all parameters into command-line arguments for the appropriate sd-scripts module — selects train_network.py for LoRA, train_db.py for Dreambooth, or fine_tune.py based on training type [TrainingParameters → Training Command]
- Launch training subprocess — CommandExecutor.execute_command runs the assembled command via subprocess.Popen — includes accelerate launch wrapper for distributed training and provides process monitoring with start/stop controls [Training Command → Training Process State] (config: accelerate_launch.num_processes, accelerate_launch.num_machines)
- Monitor training progress — Process stdout/stderr streams are captured and displayed in the Gradio interface — training script writes loss values, sample images, and checkpoint saves to the logging directory [Training Process State → Training Logs]
- Save trained model — Training script saves the final LoRA adapters or fine-tuned model weights to the output directory — includes safetensors format with optional metadata like title, author, and tags [Training State → Trained Model Files] (config: metadata.title, metadata.author, metadata.description)
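The assemble-and-launch stages above can be sketched as follows. This is a minimal illustration, not the real CommandExecutor API: `build_command`, `launch`, and the exact flag set are assumptions, though the script selection mirrors the LoRA/Dreambooth/fine-tune mapping described above and the flags are common sd-scripts options.

```python
import subprocess


def build_command(params: dict) -> list[str]:
    """Assemble an accelerate-wrapped training command from GUI parameters.

    The flag names mirror common sd-scripts options; the real set depends
    on the selected training type and many more GUI fields.
    """
    script = {
        "lora": "train_network.py",
        "dreambooth": "train_db.py",
        "finetune": "fine_tune.py",
    }[params["training_type"]]
    return [
        "accelerate", "launch",
        "--num_processes", str(params.get("num_processes", 1)),
        script,
        "--pretrained_model_name_or_path", params["pretrained_model"],
        "--output_dir", params["output_dir"],
        "--learning_rate", str(params["learning_rate"]),
        "--train_batch_size", str(params["batch_size"]),
    ]


def launch(cmd: list[str]) -> subprocess.Popen:
    # Popen (rather than run) gives the GUI a handle for monitoring,
    # output streaming, and the stop button.
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
```

Keeping the command as an argument list (not a shell string) avoids quoting bugs when paths contain spaces.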
Data Models
The data structures that flow between stages — the contracts that hold the system together.
kohya_gui/class_gui_config.py — TOML-based configuration dict with nested sections: training.learning_rate, training.batch_size, folders.output_dir, sdxl.cache_text_encoder_outputs, accelerate_launch.mixed_precision
Loaded from config.toml on startup, updated through GUI interactions, validated before training execution, and saved back to file
kohya_gui/class_basic_training.py — Gradio component values dict with keys like learning_rate: float, lr_scheduler: str, train_batch_size: int, max_train_epochs: int, mixed_precision: str, gradient_accumulation_steps: int
Collected from Gradio form inputs, validated for parameter compatibility, then formatted into command-line arguments
kohya_gui/class_folders.py — Dict with train_data_dir: str, output_dir: str, logging_dir: str, pretrained_model_name_or_path: str, vae: str, resume: str — file and directory paths for training data and model artifacts
User selects paths through file browsers, paths are validated for existence and permissions, then passed to the training subprocess
kohya_gui/class_lora_tab.py — Configuration dict for LoRA-specific parameters: network_module: str, network_dim: int, network_alpha: float, network_dropout: float, conv_dim: int, conv_alpha: float
Set through LoRA-specific GUI components, merged with the base training config, then used to configure the adapter network architecture
kohya_gui/class_sample_images.py — Text content written to the sample/prompt.txt file, containing newline-separated prompts for validation image generation during training
User enters prompts in a text area; they are written to prompt.txt in the output directory and read by the training script for periodic sample generation
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
The sd-scripts training modules (train_network.py, train_db.py, fine_tune.py) exist at predictable paths relative to the kohya_ss installation and are executable
If this fails: Training fails silently or with cryptic errors if sd-scripts is missing, installed elsewhere, or permissions prevent execution — user sees 'command not found' without understanding that external dependencies are missing
kohya_gui/class_command_executor.py:execute_command
The training script expects sample prompts in exactly 'output_dir/sample/prompt.txt' format and reads this specific file path during training
If this fails: If training script changes expected prompt file location or format, sample image generation silently fails during training without user notification — validation images never appear
kohya_gui/class_sample_images.py:create_prompt_file
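The fixed 'output_dir/sample/prompt.txt' convention can be made explicit in one place. A sketch of what such a writer might look like; the function body is an assumption, not the actual create_prompt_file implementation:

```python
from pathlib import Path


def create_prompt_file(output_dir: str, prompts: str) -> Path:
    """Write newline-separated prompts to the path the training script expects.

    Centralizing the 'sample/prompt.txt' convention here means a change in
    the training script's expected location only needs one edit, and writing
    the file before the subprocess launches sidesteps timing issues.
    """
    sample_dir = Path(output_dir) / "sample"
    sample_dir.mkdir(parents=True, exist_ok=True)
    prompt_file = sample_dir / "prompt.txt"
    prompt_file.write_text(prompts, encoding="utf-8")
    return prompt_file
```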
The system has at least 'num_processes' GPUs available and sufficient VRAM for the selected mixed precision mode and batch size combination
If this fails: Training process crashes with CUDA out-of-memory errors or hangs indefinitely if GPU resources are insufficient — no validation occurs before launch
kohya_gui/class_accelerate_launch.py:num_processes
The TOML config file contains only valid parameter keys that match the expected schema — any typos or deprecated keys are ignored silently
If this fails: Invalid config keys get silently dropped, causing user settings to revert to defaults without warning — user thinks their custom settings are applied but training uses different values
kohya_gui/class_gui_config.py:load_config
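A loader could surface typos instead of silently dropping them by diffing the loaded dict against a known schema. A minimal sketch under that assumption; `find_unknown_keys` is a hypothetical helper, not part of KohyaSSGUIConfig:

```python
def find_unknown_keys(config: dict, schema: dict, prefix: str = "") -> list[str]:
    """Return dotted paths present in config but absent from the schema.

    A loader can log a warning for each returned path so the user learns
    that a misspelled or deprecated key is being ignored.
    """
    unknown = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if key not in schema:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            unknown.extend(find_unknown_keys(value, schema[key], path + "."))
    return unknown
```

For example, a config containing `learning_rte` under `[training]` would be reported as `training.learning_rte` rather than quietly reverting to the default learning rate.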
Default learning rate of '1e-6' is appropriate for all model types (SD 1.5, SDXL, SD3, Flux.1) and training approaches (LoRA, Dreambooth, fine-tuning)
If this fails: Training converges extremely slowly or fails to learn with inappropriate learning rates — SDXL may need 1e-5, LoRA may need 1e-4, but system uses same default for all
kohya_gui/class_basic_training.py:learning_rate_value
Only one training process should run at a time, but process state tracking relies on a single instance variable that could become stale if the process crashes or is killed externally
If this fails: If training process dies unexpectedly, GUI still thinks it's running and prevents new training starts — user must restart entire GUI to recover
kohya_gui/class_command_executor.py:process
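The stale-state problem above comes from trusting a stored flag instead of asking the process itself. A sketch of a liveness check built on `subprocess.Popen.poll()`; the `is_running` helper is illustrative, not CommandExecutor's actual code:

```python
import subprocess
import sys


def is_running(process) -> bool:
    """True only if the tracked subprocess is genuinely alive.

    poll() returns None while the child runs and its exit code afterwards,
    so a crashed or externally killed process is detected on the next check
    instead of leaving a boolean state flag permanently stale.
    """
    return process is not None and process.poll() is None


# Demonstration: a short-lived child is correctly seen as finished.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
proc.wait()
```

Re-checking liveness before refusing to start a new run lets the GUI recover without a restart.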
The user running the GUI has write permissions to create directories in scriptdir/outputs, scriptdir/logs, and scriptdir/reg paths
If this fails: Directory creation fails silently or with permission denied errors, but training continues and then fails when trying to write outputs — confusing delayed failure mode
kohya_gui/class_folders.py:create_directory_if_not_exists
The selected GPU architecture supports the chosen mixed precision mode — fp16 requires compute capability 7.0+, bf16 requires Ampere+, fp8 requires Hopper+
If this fails: Training fails with cryptic CUDA errors or falls back to slower fp32 without notification — user expects performance benefits but gets degraded training speed
kohya_gui/class_accelerate_launch.py:mixed_precision
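The capability thresholds above can be validated before launch. A sketch, assuming the capability tuple comes from `torch.cuda.get_device_capability()` at runtime; the pure function below is a hypothetical helper, kept free of GPU calls so the mapping itself is testable:

```python
def supported_precisions(major: int, minor: int) -> set[str]:
    """Map a CUDA compute capability to usable mixed precision modes.

    Thresholds follow the rule of thumb stated above: fp16 needs compute
    capability 7.0+, bf16 needs Ampere (8.0+), fp8 needs Hopper (9.0+).
    Full fp32 ("no", in accelerate's naming) always works.
    """
    cap = (major, minor)
    modes = {"no"}
    if cap >= (7, 0):
        modes.add("fp16")
    if cap >= (8, 0):
        modes.add("bf16")
    if cap >= (9, 0):
        modes.add("fp8")
    return modes
```

A launcher could warn and fall back explicitly when the user's chosen mode is not in the returned set, instead of letting training fail with a cryptic CUDA error.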
Sample prompts are written to the prompt file before training starts, but the training script may read this file at initialization — race condition if file is created after script launch
If this fails: Sample image generation uses empty or default prompts instead of user-specified ones if timing is wrong — validation images don't match user expectations
kohya_gui/class_sample_images.py:create_prompt_file
SDXL parameters like cache_text_encoder_outputs and no_half_vae are only relevant when SDXL mode is enabled, but parameter validation doesn't enforce this constraint
If this fails: Non-SDXL training may receive SDXL-specific flags that are ignored or cause errors — confusing parameter interaction without clear error messages
kohya_gui/class_sdxl_parameters.py:initialize_accordion
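One way to enforce the constraint above is to strip model-specific keys before command assembly. A minimal sketch; the flag set and `filter_params` helper are illustrative assumptions, not the repository's actual validation logic:

```python
# Illustrative subset of flags that only the SDXL scripts understand.
SDXL_ONLY_FLAGS = {"cache_text_encoder_outputs", "no_half_vae"}


def filter_params(params: dict, sdxl: bool) -> dict:
    """Drop SDXL-specific keys when SDXL mode is off.

    This prevents non-SDXL training scripts from receiving flags they
    either ignore silently or reject with an unrelated-looking error.
    """
    if sdxl:
        return dict(params)
    return {k: v for k, v in params.items() if k not in SDXL_ONLY_FLAGS}
```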
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Training Configuration Store — Persistent storage for user training preferences including default paths, hyperparameters, and GUI settings — allows sessions to resume with previous configurations
- Model Output Directory — Accumulates trained model files, training logs, sample images, and checkpoints — organized by training session with timestamped subdirectories
- Training Dataset Cache — Preprocessed training images with associated caption files — the training script caches tokenized captions and latent encodings to accelerate subsequent epochs
- Process State Tracker — Tracks currently running training processes with PIDs and status — enables process control and prevents multiple concurrent training runs
Feedback Loops
- Training Progress Loop (training-loop, reinforcing) — Trigger: Training subprocess starts. Action: Model processes batches, computes loss, updates weights via backpropagation, generates sample images at specified intervals. Exit: Reaches max epochs or user stops training.
- GUI State Refresh (polling, balancing) — Trigger: User interaction or timer. Action: Gradio updates component visibility, validates parameter combinations, refreshes file browser contents. Exit: User closes interface.
- Process Monitoring (polling, balancing) — Trigger: Training process active. Action: CommandExecutor checks process status via psutil, updates UI buttons, captures output streams. Exit: Process terminates or is killed.
- Parameter Validation Cycle (auto-scale, balancing) — Trigger: Parameter change in GUI. Action: Validates parameter ranges, shows/hides conditional options, updates dependent parameter limits. Exit: Valid configuration reached.
Delays
- Model Loading Warmup (warmup, ~30-120 seconds) — Training script loads pretrained model, VAE, and text encoders into GPU memory before first training step
- Sample Image Generation (scheduled-job, configurable interval) — Training pauses to generate validation images using the current model state — helps monitor training quality but slows overall progress
- Checkpoint Saving (checkpoint-save, ~10-60 seconds per save) — Training pauses to write model weights to disk at specified epoch intervals — provides recovery points but interrupts training flow
- Configuration File I/O (async-processing, ~milliseconds) — TOML config loading/saving happens during GUI initialization and when user saves preferences — minimal impact on training
- Process Launch Latency (compilation, ~5-15 seconds) — Delay between clicking Start Training and actual training beginning — includes Python process startup and import resolution
Control Points
- Mixed Precision Mode (precision-mode) — Controls: GPU memory usage and training speed — fp16/bf16 reduces memory but may affect numerical stability. Default: fp16
- Training Type Selection (architecture-switch) — Controls: Which sd-scripts module is launched and what parameters are available — fundamentally changes the training approach. Default: LoRA/Dreambooth/Fine-tune
- SDXL Model Toggle (feature-flag) — Controls: Enables SDXL-specific parameters and uses different training script paths — affects model architecture and memory requirements. Default: disabled
- Learning Rate Schedule (hyperparameter) — Controls: How learning rate changes during training — affects convergence speed and final model quality. Default: constant
- Batch Size Scaling (hyperparameter) — Controls: Number of images processed simultaneously — balances training speed with memory usage and gradient stability. Default: user-configured
- Multi-GPU Process Count (device-selection) — Controls: Number of parallel training processes and GPU utilization strategy. Default: 1
- Output Directory Path (env-var) — Controls: Where trained models and logs are saved — affects file organization and disk space usage. Default: outputs/
- Gradient Accumulation Steps (hyperparameter) — Controls: Effective batch size multiplication without increasing memory usage — simulates larger batches. Default: 1
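The batch size, gradient accumulation, and process count controls above interact multiplicatively. A one-line sketch of the relationship, assuming the standard definition used by distributed trainers:

```python
def effective_batch_size(train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int) -> int:
    """Effective batch size seen by the optimizer per weight update.

    Example: per-GPU batch 2, accumulation 4, and 2 processes behave like
    a single batch of 16 for gradient statistics, without the VRAM cost.
    """
    return train_batch_size * gradient_accumulation_steps * num_processes
```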
Technology Stack
- Gradio — Provides the web-based GUI framework for creating interactive training parameter forms, file browsers, and progress monitoring interfaces
- sd-scripts — Core training library that performs the actual model training — wrapped and configured through this GUI system
- Hugging Face Accelerate — Handles distributed training coordination and mixed precision across multiple GPUs
- PyTorch — Underlying deep learning framework for model weights, tensor operations, and gradient computation
- Transformers — Provides pretrained model loading, tokenizers, and model architectures for Stable Diffusion components
- Diffusers — Implements diffusion model pipelines and components including schedulers, VAEs, and UNet architectures
- TOML — Configuration file format for storing user preferences and training parameter defaults
- safetensors — Secure tensor serialization format for saving trained model weights and LoRA adapters
- psutil — Process monitoring utilities for tracking training subprocess status and system resource usage
- subprocess — Python standard library module for launching and managing the external training script processes
Key Components
- KohyaSSGUIConfig (registry) — Loads and saves training configurations from TOML files — centralizes all persistent settings and provides get/set methods with defaults for missing values (kohya_gui/class_gui_config.py)
- CommandExecutor (executor) — Manages subprocess execution of training scripts with process control — tracks running processes, provides start/stop buttons, and handles command-line argument assembly (kohya_gui/class_command_executor.py)
- BasicTraining (factory) — Creates Gradio components for core training parameters like learning rate, scheduler, epochs, and batch size — handles parameter validation and SDXL-specific configurations (kohya_gui/class_basic_training.py)
- AdvancedTraining (factory) — Provides GUI components for advanced training options including gradient accumulation, weighted captions, token padding control, and precision settings (kohya_gui/class_advanced_training.py)
- AccelerateLaunch (adapter) — Configures Hugging Face Accelerate for distributed training — sets up multi-GPU parameters, mixed precision modes, and process coordination settings (kohya_gui/class_accelerate_launch.py)
- Folders (resolver) — Manages file and directory selection through Gradio file browsers — validates paths, creates missing directories, and maintains default locations (kohya_gui/class_folders.py)
- SDXLParameters (adapter) — Provides SDXL-specific training options like text encoder caching, half-precision VAE control, and fused backward pass — conditionally visible based on model type selection (kohya_gui/class_sdxl_parameters.py)
- SampleImages (processor) — Creates prompt files for validation image generation during training — takes user prompts and writes them to the expected file format for the training script (kohya_gui/class_sample_images.py)
- GradioMergeLoRaTab (processor) — Provides tools for merging multiple LoRA adapters into base models or combining LoRA weights — includes ratio controls and output format selection (kohya_gui/merge_lora_gui.py)
Frequently Asked Questions
What is kohya_ss used for?
kohya_ss trains custom Stable Diffusion models using LoRA, Dreambooth, and fine-tuning through a Gradio web interface. bmaltais/kohya_ss is a 9-component ML training system written in Python. Data flows through 8 distinct pipeline stages, and the codebase contains 86 files.
How is kohya_ss architected?
kohya_ss is organized into 5 architecture layers: Web Interface, Configuration Management, Command Assembly, Training Execution, and 1 more. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through kohya_ss?
Data moves through 8 stages: Load configuration defaults → Collect GUI parameters → Select training paths → Prepare sample prompts → Assemble training command → .... Parameters collected in the Gradio interface are assembled into command-line arguments, the appropriate sd-scripts module is launched via subprocess, and the final trained model is saved to the specified output directory. This pipeline design reflects a complex multi-stage processing system.
What technologies does kohya_ss use?
The core stack includes Gradio (Provides web-based GUI framework for creating interactive training parameter forms, file browsers, and progress monitoring interfaces), sd-scripts (Core training library that performs the actual model training — wrapped and configured through this GUI system), Hugging Face Accelerate (Handles distributed training coordination and mixed precision across multiple GPUs), PyTorch (Underlying deep learning framework for model weights, tensor operations, and gradient computation), Transformers (Provides pretrained model loading, tokenizers, and model architectures for Stable Diffusion components), Diffusers (Implements diffusion model pipelines and components including schedulers, VAEs, and UNet architectures), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does kohya_ss have?
kohya_ss exhibits 4 data pools (Training Configuration Store, Model Output Directory), 4 feedback loops, 8 control points, 5 delays. The feedback loops handle training-loop and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does kohya_ss use?
5 design patterns detected: GUI Component Factory, Configuration-Driven Defaults, Subprocess Command Assembly, Conditional UI Visibility, Tool Collection Tabs.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.