bmaltais/kohya_ss

12,195 stars · Python · 9 components

Trains custom Stable Diffusion models using LoRA, Dreambooth, and fine-tuning with a Gradio web interface

Users configure training parameters through the Gradio interface, which loads defaults from config.toml and validates parameter combinations. The system assembles these parameters into command-line arguments, launches the appropriate sd-scripts training module via subprocess, and monitors the training process. Sample images are generated periodically using prompts written to a file, and the final trained model (LoRA adapters or fine-tuned weights) is saved to the specified output directory.

Under the hood, the system uses 4 feedback loops, 4 data pools, and 8 control points to manage its runtime behavior.

A 9-component ML training system. 86 files analyzed. Data flows through 8 distinct pipeline stages.

How Data Flows Through the System

  1. Load configuration defaults — KohyaSSGUIConfig reads config.toml and populates default values for all training parameters — sets up folder paths, training hyperparameters, and model-specific options based on saved preferences
  2. Collect GUI parameters — Gradio components in BasicTraining, AdvancedTraining, and model-specific classes gather user input — validates parameter ranges, handles conditional visibility based on training type, and ensures parameter compatibility [TrainingConfig → TrainingParameters] (config: training.learning_rate, training.batch_size, training.max_train_epochs +1)
  3. Select training paths — Folders class provides file browsers for selecting training data directory, pretrained model, output location, and optional VAE — creates missing directories and validates path accessibility [TrainingConfig → ModelPaths] (config: folders.output_dir, folders.logging_dir, folders.reg_data_dir)
  4. Prepare sample prompts — SampleImages.create_prompt_file writes user-provided prompts to sample/prompt.txt in the output directory — formats prompts for the training script's validation image generation system [SamplePrompts → Prompt Files]
  5. Assemble training command — CommandExecutor combines all parameters into command-line arguments for the appropriate sd-scripts module — selects train_network.py for LoRA, train_db.py for Dreambooth, or fine_tune.py based on training type [TrainingParameters → Training Command]
  6. Launch training subprocess — CommandExecutor.execute_command runs the assembled command via subprocess.Popen — includes accelerate launch wrapper for distributed training and provides process monitoring with start/stop controls [Training Command → Training Process State] (config: accelerate_launch.num_processes, accelerate_launch.num_machines)
  7. Monitor training progress — Process stdout/stderr streams are captured and displayed in the Gradio interface — training script writes loss values, sample images, and checkpoint saves to the logging directory [Training Process State → Training Logs]
  8. Save trained model — Training script saves the final LoRA adapters or fine-tuned model weights to the output directory — includes safetensors format with optional metadata like title, author, and tags [Training State → Trained Model Files] (config: metadata.title, metadata.author, metadata.description)
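
Stages 5 and 6 can be sketched as follows. The flag formatting mirrors sd-scripts CLI conventions, but the helper names (build_command, launch) and the exact formatting rules are illustrative, not the real logic in class_command_executor.py:

```python
import subprocess

def build_command(script: str, params: dict, num_processes: int = 1) -> list[str]:
    """Assemble an accelerate-launch command line from GUI parameters.

    `script` would be train_network.py for LoRA, train_db.py for
    Dreambooth, or fine_tune.py for plain fine-tuning.
    """
    cmd = ["accelerate", "launch", f"--num_processes={num_processes}", script]
    for key, value in params.items():
        if value is True:                     # boolean flags carry no value
            cmd.append(f"--{key}")
        elif value not in (None, False, ""):  # skip unset parameters
            cmd.append(f"--{key}={value}")
    return cmd

def launch(cmd: list[str]) -> subprocess.Popen:
    # Popen rather than run(): the GUI keeps the handle for stop/monitor controls
    return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
```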

Data Models

The data structures that flow between stages — the contracts that hold the system together.

TrainingConfig kohya_gui/class_gui_config.py
TOML-based configuration dict with nested sections: training.learning_rate, training.batch_size, folders.output_dir, sdxl.cache_text_encoder_outputs, accelerate_launch.mixed_precision
Loaded from config.toml on startup, updated through GUI interactions, validated before training execution, and saved back to file
TrainingParameters kohya_gui/class_basic_training.py
Gradio component values dict with keys like learning_rate: float, lr_scheduler: str, train_batch_size: int, max_train_epochs: int, mixed_precision: str, gradient_accumulation_steps: int
Collected from Gradio form inputs, validated for parameter compatibility, then formatted into command-line arguments
ModelPaths kohya_gui/class_folders.py
Dict with train_data_dir: str, output_dir: str, logging_dir: str, pretrained_model_name_or_path: str, vae: str, resume: str — file and directory paths for training data and model artifacts
User selects paths through file browsers, paths are validated for existence and permissions, then passed to training subprocess
LoRAConfig kohya_gui/class_lora_tab.py
Configuration dict for LoRA-specific parameters: network_module: str, network_dim: int, network_alpha: float, network_dropout: float, conv_dim: int, conv_alpha: float
Set through LoRA-specific GUI components, merged with base training config, then used to configure the adapter network architecture
SamplePrompts kohya_gui/class_sample_images.py
Text content written to sample/prompt.txt file containing newline-separated prompts for validation image generation during training
User enters prompts in text area, written to prompt.txt file in output directory, read by training script for periodic sample generation
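
The prompt-file contract can be sketched like this; the real method is SampleImages.create_prompt_file in kohya_gui/class_sample_images.py, and this simplified version only captures the path layout described above:

```python
from pathlib import Path

def create_prompt_file(output_dir: str, prompts: str) -> Path:
    """Write newline-separated prompts to <output_dir>/sample/prompt.txt,
    the path the training script reads for validation image generation."""
    sample_dir = Path(output_dir) / "sample"
    sample_dir.mkdir(parents=True, exist_ok=True)
    prompt_file = sample_dir / "prompt.txt"
    prompt_file.write_text(prompts.strip() + "\n", encoding="utf-8")
    return prompt_file
```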

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Environment unguarded

The sd-scripts training modules (train_network.py, train_db.py, fine_tune.py) exist at predictable paths relative to the kohya_ss installation and are executable

If this fails: Training fails silently or with cryptic errors if sd-scripts is missing, installed elsewhere, or permissions prevent execution — user sees 'command not found' without understanding that external dependencies are missing

kohya_gui/class_command_executor.py:execute_command
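
A sketch of the guard this assumption lacks: checking for the training script and the launcher before spawning the subprocess. The function name (preflight) and path layout are hypothetical:

```python
import shutil
from pathlib import Path

def preflight(scripts_dir: Path, script_name: str) -> list[str]:
    """Return human-readable problems instead of letting the subprocess
    fail later with a cryptic 'command not found'."""
    problems = []
    script = scripts_dir / script_name
    if not script.is_file():
        problems.append(f"training script missing: {script}")
    if shutil.which("accelerate") is None:
        problems.append("accelerate launcher not found on PATH")
    return problems
```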
critical Contract unguarded

The training script expects sample prompts in exactly 'output_dir/sample/prompt.txt' format and reads this specific file path during training

If this fails: If training script changes expected prompt file location or format, sample image generation silently fails during training without user notification — validation images never appear

kohya_gui/class_sample_images.py:create_prompt_file
critical Resource unguarded

The system has at least 'num_processes' GPUs available and sufficient VRAM for the selected mixed precision mode and batch size combination

If this fails: Training process crashes with CUDA out-of-memory errors or hangs indefinitely if GPU resources are insufficient — no validation occurs before launch

kohya_gui/class_accelerate_launch.py:num_processes
warning Contract unguarded

The TOML config file contains only valid parameter keys that match the expected schema — any typos or deprecated keys are ignored silently

If this fails: Invalid config keys get silently dropped, causing user settings to revert to defaults without warning — user thinks their custom settings are applied but training uses different values

kohya_gui/class_gui_config.py:load_config
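
A minimal guard against this failure mode would diff the loaded keys against a known schema; the whitelist below is illustrative, not the project's actual schema:

```python
# Illustrative whitelist of top-level config sections
KNOWN_SECTIONS = {"training", "folders", "sdxl", "accelerate_launch", "metadata"}

def unknown_sections(config: dict) -> list[str]:
    """Report top-level sections the loader would otherwise drop silently,
    e.g. a typo like 'trianing' reverting settings to defaults."""
    return sorted(k for k in config if k not in KNOWN_SECTIONS)
```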
warning Domain weakly guarded

Default learning rate of '1e-6' is appropriate for all model types (SD 1.5, SDXL, SD3, Flux.1) and training approaches (LoRA, Dreambooth, fine-tuning)

If this fails: Training converges extremely slowly or fails to learn with inappropriate learning rates — SDXL may need 1e-5, LoRA may need 1e-4, but system uses same default for all

kohya_gui/class_basic_training.py:learning_rate_value
warning Temporal weakly guarded

Only one training process should run at a time, but process state tracking relies on a single instance variable that could become stale if the process crashes or is killed externally

If this fails: If training process dies unexpectedly, GUI still thinks it's running and prevents new training starts — user must restart entire GUI to recover

kohya_gui/class_command_executor.py:process
warning Environment unguarded

The user running the GUI has write permissions to create directories in scriptdir/outputs, scriptdir/logs, and scriptdir/reg paths

If this fails: Directory creation fails silently or with permission denied errors, but training continues and then fails when trying to write outputs — confusing delayed failure mode

kohya_gui/class_folders.py:create_directory_if_not_exists
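
The missing guard could look like this sketch, which creates each directory and probes write access up front so permission errors surface before training starts (check_writable is a hypothetical name):

```python
import os
from pathlib import Path

def check_writable(*dirs: str) -> list[str]:
    """Create each directory if missing and probe write access,
    returning a list of problems instead of failing mid-run."""
    issues = []
    for d in dirs:
        p = Path(d)
        try:
            p.mkdir(parents=True, exist_ok=True)
        except OSError as exc:
            issues.append(f"{d}: {exc}")
            continue
        if not os.access(p, os.W_OK):
            issues.append(f"{d}: not writable")
    return issues
```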
warning Scale unguarded

The selected GPU architecture supports the chosen mixed precision mode — fp16 requires compute capability 7.0+, bf16 requires Ampere+, fp8 requires Hopper+

If this fails: Training fails with cryptic CUDA errors or falls back to slower fp32 without notification — user expects performance benefits but gets degraded training speed

kohya_gui/class_accelerate_launch.py:mixed_precision
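
A sketch of the missing check. The thresholds follow the capability requirements stated above, and the capability tuple would come from torch.cuda.get_device_capability() at runtime:

```python
# Minimum CUDA compute capability per mixed-precision mode, per the
# assumption above (fp16: Volta 7.0+, bf16: Ampere 8.0+, fp8: Hopper 9.0+)
MIN_CAPABILITY = {"no": (0, 0), "fp16": (7, 0), "bf16": (8, 0), "fp8": (9, 0)}

def precision_supported(mode: str, capability: tuple[int, int]) -> bool:
    """`capability` would come from torch.cuda.get_device_capability();
    unknown modes are rejected rather than silently allowed."""
    return capability >= MIN_CAPABILITY.get(mode, (99, 99))
```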
info Ordering unguarded

Sample prompts are written to the prompt file before training starts, but the training script may read this file at initialization — race condition if file is created after script launch

If this fails: Sample image generation uses empty or default prompts instead of user-specified ones if timing is wrong — validation images don't match user expectations

kohya_gui/class_sample_images.py:create_prompt_file
info Contract weakly guarded

SDXL parameters like cache_text_encoder_outputs and no_half_vae are only relevant when SDXL mode is enabled, but parameter validation doesn't enforce this constraint

If this fails: Non-SDXL training may receive SDXL-specific flags that are ignored or cause errors — confusing parameter interaction without clear error messages

kohya_gui/class_sdxl_parameters.py:initialize_accordion

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Training Configuration Store (file-store)
Persistent storage for user training preferences including default paths, hyperparameters, and GUI settings — allows sessions to resume with previous configurations
Model Output Directory (file-store)
Accumulates trained model files, training logs, sample images, and checkpoints — organized by training session with timestamped subdirectories
Training Data Cache (cache)
Preprocessed training images with associated caption files — training script caches tokenized captions and latent encodings to accelerate subsequent epochs
Active Process Registry (in-memory)
Tracks currently running training processes with PIDs and status — enables process control and prevents multiple concurrent training runs

Feedback Loops

Delays

Control Points

Technology Stack

Gradio (framework)
Provides web-based GUI framework for creating interactive training parameter forms, file browsers, and progress monitoring interfaces
sd-scripts (library)
Core training library that performs the actual model training — wrapped and configured through this GUI system
Hugging Face Accelerate (library)
Handles distributed training coordination and mixed precision across multiple GPUs
PyTorch (library)
Underlying deep learning framework for model weights, tensor operations, and gradient computation
Transformers (library)
Provides pretrained model loading, tokenizers, and model architectures for Stable Diffusion components
Diffusers (library)
Implements diffusion model pipelines and components including schedulers, VAEs, and UNet architectures
TOML (serialization)
Configuration file format for storing user preferences and training parameter defaults
SafeTensors (serialization)
Secure tensor serialization format for saving trained model weights and LoRA adapters
psutil (library)
Process monitoring utilities for tracking training subprocess status and system resource usage
subprocess (runtime)
Python standard library for launching and managing the external training script processes

Key Components


Frequently Asked Questions

What is kohya_ss used for?

bmaltais/kohya_ss is a 9-component ML training system written in Python that trains custom Stable Diffusion models using LoRA, Dreambooth, and fine-tuning through a Gradio web interface. Data flows through 8 distinct pipeline stages, and the codebase contains 86 files.

How is kohya_ss architected?

kohya_ss is organized into 5 architecture layers: Web Interface, Configuration Management, Command Assembly, Training Execution, and 1 more. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through kohya_ss?

Data moves through 8 stages: Load configuration defaults → Collect GUI parameters → Select training paths → Prepare sample prompts → Assemble training command → .... Users configure training parameters through the Gradio interface, which loads defaults from config.toml and validates parameter combinations. The system assembles these parameters into command-line arguments, launches the appropriate sd-scripts training module via subprocess, and monitors the training process. Sample images are generated periodically using prompts written to a file, and the final trained model (LoRA adapters or fine-tuned weights) is saved to the specified output directory. This pipeline design reflects a complex multi-stage processing system.

What technologies does kohya_ss use?

The core stack includes Gradio (Provides web-based GUI framework for creating interactive training parameter forms, file browsers, and progress monitoring interfaces), sd-scripts (Core training library that performs the actual model training — wrapped and configured through this GUI system), Hugging Face Accelerate (Handles distributed training coordination and mixed precision across multiple GPUs), PyTorch (Underlying deep learning framework for model weights, tensor operations, and gradient computation), Transformers (Provides pretrained model loading, tokenizers, and model architectures for Stable Diffusion components), Diffusers (Implements diffusion model pipelines and components including schedulers, VAEs, and UNet architectures), and 4 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does kohya_ss have?

kohya_ss exhibits 4 data pools (including the Training Configuration Store and the Model Output Directory), 4 feedback loops, 8 control points, and 5 delays. The feedback loops handle the training loop and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does kohya_ss use?

5 design patterns detected: GUI Component Factory, Configuration-Driven Defaults, Subprocess Command Assembly, Conditional UI Visibility, Tool Collection Tabs.

Analyzed on April 20, 2026 by CodeSea.