Hidden Assumptions in kohya_ss
15 assumptions this code never checks · 3 critical · spanning Environment, Contract, Resource, Domain, Temporal, Scale, Ordering, Shape
Every codebase relies on things it never checks, and most of them are routine. CodeSea analyzed bmaltais/kohya_ss and picked out the few most likely to cause trouble. The full list follows.
These 3 are the ones most likely to cause trouble here. The other 12 are minor and are listed under "Show everything".
Training fails silently or with cryptic errors if sd-scripts is missing, installed elsewhere, or permissions prevent execution — user sees 'command not found' without understanding that external dependencies are missing
If training script changes expected prompt file location or format, sample image generation silently fails during training without user notification — validation images never appear
Training process crashes with CUDA out-of-memory errors or hangs indefinitely if GPU resources are insufficient — no validation occurs before launch
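The first of these failures could be turned into a clear message by a pre-flight check. The sketch below is one way to do that, not the project's own code: the helper names find_missing_scripts and preflight_message are hypothetical, and the caller is assumed to know which directory sd-scripts should be in. The script names are the ones the analysis lists.

```python
from pathlib import Path

# Script names as listed in the analysis; where they live is an assumption
# that the caller supplies.
TRAINING_SCRIPTS = ("train_network.py", "train_db.py", "fine_tune.py")


def find_missing_scripts(sd_scripts_dir, names=TRAINING_SCRIPTS):
    """Return the script names that are not regular files in sd_scripts_dir."""
    root = Path(sd_scripts_dir)
    return [name for name in names if not (root / name).is_file()]


def preflight_message(sd_scripts_dir):
    """Build a readable error, or return None when every script is present."""
    missing = find_missing_scripts(sd_scripts_dir)
    if not missing:
        return None
    return (
        f"sd-scripts not usable at {sd_scripts_dir}: missing {', '.join(missing)}. "
        "Check that the sd-scripts dependency is installed."
    )
```

Running the check before building the launch command would replace a bare 'command not found' with a message that names the missing dependency.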
Show everything (12 more)
The TOML config file contains only valid parameter keys that match the expected schema — any typos or deprecated keys are ignored silently
If this fails: Invalid config keys get silently dropped, causing user settings to revert to defaults without warning — user thinks their custom settings are applied but training uses different values
kohya_gui/class_gui_config.py:load_config
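One possible guard is to compare the loaded keys against the set the GUI knows about and report the leftovers. This is a minimal sketch that works on an already-parsed dictionary; find_unknown_keys and the key names in the example are hypothetical and do not reflect the real kohya_ss schema.

```python
import difflib


def find_unknown_keys(config, known_keys):
    """Map each unrecognized key to its closest known key, or None."""
    report = {}
    for key in config:
        if key not in known_keys:
            close = difflib.get_close_matches(key, known_keys, n=1)
            report[key] = close[0] if close else None
    return report


# Hypothetical schema, plus a config with one typo and one stray key.
known = ["learning_rate", "train_batch_size", "mixed_precision"]
loaded = {"learning_rate": "1e-4", "train_batch_sise": 2, "legacy_flag": True}

for key, suggestion in find_unknown_keys(loaded, known).items():
    hint = f" (did you mean '{suggestion}'?)" if suggestion else ""
    print(f"Unknown config key '{key}'{hint}")
```

Surfacing the report as a warning, rather than dropping the keys, lets the user see that a setting was not applied.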
Default learning rate of '1e-6' is appropriate for all model types (SD 1.5, SDXL, SD3, Flux.1) and training approaches (LoRA, Dreambooth, fine-tuning)
If this fails: Training converges extremely slowly or fails to learn when the learning rate is inappropriate. SDXL may need 1e-5 and LoRA may need 1e-4, but the system uses the same default for all of them
kohya_gui/class_basic_training.py:learning_rate_value
Only one training process should run at a time, but process state tracking relies on a single instance variable that could become stale if the process crashes or is killed externally
If this fails: If the training process dies unexpectedly, the GUI still thinks it is running and blocks new training runs, so the user must restart the entire GUI to recover
kohya_gui/class_command_executor.py:process
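A common way to avoid stale state is to ask the operating system whether the child is still alive instead of trusting a stored flag. The class below is a sketch of that idea using subprocess.Popen.poll(); TrainingProcess is a hypothetical name and is not the executor class in kohya_ss.

```python
import subprocess
import sys


class TrainingProcess:
    """Tracks one child process and derives 'running' from the OS, not a flag."""

    def __init__(self):
        self.process = None

    def start(self, command):
        if self.is_running():
            raise RuntimeError("A training process is already running")
        self.process = subprocess.Popen(command)

    def is_running(self):
        # poll() returns None while the child is alive and its exit code once
        # it has ended, including when it crashed or was killed externally.
        if self.process is not None and self.process.poll() is not None:
            self.process = None  # clear the stale handle
        return self.process is not None


if __name__ == "__main__":
    runner = TrainingProcess()
    runner.start([sys.executable, "-c", "print('training stand-in')"])
    runner.process.wait()
    print(runner.is_running())
```

Because is_running() re-checks the child on every call, a crashed run no longer blocks the next start.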
The user running the GUI has write permissions to create directories in scriptdir/outputs, scriptdir/logs, and scriptdir/reg paths
If this fails: Directory creation fails silently or with permission denied errors, but training continues and then fails when trying to write outputs — confusing delayed failure mode
kohya_gui/class_folders.py:create_directory_if_not_exists
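The delayed failure can be moved to the front by creating the directory and testing write access in one step. A minimal sketch, with ensure_writable_dir as a hypothetical helper name:

```python
import os


def ensure_writable_dir(path):
    """Create path if needed and fail immediately when it is not writable."""
    # makedirs raises an OSError subclass (for example PermissionError)
    # if the directory cannot be created.
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(f"Directory is not writable: {path}")
    return path
```

Calling this for the outputs, logs, and reg directories before launch would report a permission problem at configuration time rather than partway through training.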
The selected GPU architecture supports the chosen mixed precision mode: fp16 requires compute capability 7.0+, bf16 requires Ampere+ (8.0), and fp8 requires Ada Lovelace or Hopper (8.9+)
If this fails: Training fails with cryptic CUDA errors or falls back to slower fp32 without notification — user expects performance benefits but gets degraded training speed
kohya_gui/class_accelerate_launch.py:mixed_precision
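A capability check of this kind can be expressed as a small lookup. The sketch below takes the compute capability as a (major, minor) tuple, so it runs without a GPU. The table is an assumption: it sets fp8 to 8.9 so that Ada Lovelace cards are included alongside Hopper, and it should be adjusted if the training stack in use needs 9.0. The names MIN_CAPABILITY and unsupported_reason are hypothetical.

```python
# Minimum (major, minor) CUDA compute capability per mixed precision mode.
MIN_CAPABILITY = {
    "no": (0, 0),
    "fp16": (7, 0),
    "bf16": (8, 0),
    "fp8": (8, 9),  # assumption: include Ada Lovelace as well as Hopper
}


def unsupported_reason(mode, capability):
    """Return an explanation if the GPU cannot run mode, else None."""
    required = MIN_CAPABILITY[mode]
    if tuple(capability) >= required:
        return None
    have = f"{capability[0]}.{capability[1]}"
    need = f"{required[0]}.{required[1]}"
    return f"{mode} needs compute capability {need}+, but this GPU reports {have}"


print(unsupported_reason("bf16", (7, 5)))
```

With PyTorch installed, torch.cuda.get_device_capability() returns a tuple in this form and could supply the second argument.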
Sample prompts are written to the prompt file before training starts, but the training script may read this file at initialization — race condition if file is created after script launch
If this fails: Sample image generation uses empty or default prompts instead of user-specified ones if timing is wrong — validation images don't match user expectations
kohya_gui/class_sample_images.py:create_prompt_file
SDXL parameters like cache_text_encoder_outputs and no_half_vae are only relevant when SDXL mode is enabled, but parameter validation doesn't enforce this constraint
If this fails: Non-SDXL training may receive SDXL-specific flags that are ignored or cause errors — confusing parameter interaction without clear error messages
kohya_gui/class_sdxl_parameters.py:initialize_accordion
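One way to enforce the constraint is to strip SDXL-only parameters when SDXL mode is off and tell the user which ones were removed. This is a sketch: drop_sdxl_only is a hypothetical helper, and SDXL_ONLY contains only the two names mentioned above, not a complete list.

```python
# Only the two names quoted in the assumption; the real set may be larger.
SDXL_ONLY = {"cache_text_encoder_outputs", "no_half_vae"}


def drop_sdxl_only(params, sdxl_enabled):
    """Return (clean_params, dropped_names); nothing is dropped in SDXL mode."""
    if sdxl_enabled:
        return dict(params), []
    dropped = sorted(key for key in params if key in SDXL_ONLY)
    clean = {key: value for key, value in params.items() if key not in SDXL_ONLY}
    return clean, dropped


clean, dropped = drop_sdxl_only({"no_half_vae": True, "max_train_steps": 1000}, False)
print(clean, dropped)
```

Returning the dropped names makes it possible to show a notice instead of discarding the flags silently.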
Gradient accumulation steps and batch size are configured independently, but they interact: effective batch size = batch_size * gradient_accumulation_steps, while peak GPU memory is driven mainly by the per-step batch_size
If this fails: Users can pick values that are each valid yet combine badly. A large batch_size can exhaust VRAM regardless of the accumulation count, and a large effective batch size changes training behavior without any warning
kohya_gui/class_advanced_training.py:gradient_accumulation_steps
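The combined value is plain arithmetic, so it could be shown to the user before launch. A minimal sketch, with batch_summary as a hypothetical helper:

```python
def batch_summary(batch_size, gradient_accumulation_steps):
    """Describe how the two settings combine into an effective batch size."""
    if batch_size < 1 or gradient_accumulation_steps < 1:
        raise ValueError("both values must be positive integers")
    effective = batch_size * gradient_accumulation_steps
    return (
        f"{batch_size} samples per step x {gradient_accumulation_steps} "
        f"accumulation steps = effective batch size {effective}"
    )


print(batch_summary(4, 8))
```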
Config file writes are atomic and won't be interrupted — partial writes could corrupt the TOML file if system crashes during save
If this fails: Corrupted config files cause startup failures or silent loss of user settings — no backup or recovery mechanism for configuration data
kohya_gui/class_gui_config.py:save_config
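A standard way to avoid partial writes is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern using only the standard library; atomic_write_text is a hypothetical helper and is not how save_config is known to work.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text so that path holds either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the rename stays on one
    # filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # the rename is atomic on POSIX systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before os.replace runs, the original config file is left untouched and only a stray .tmp file remains.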
The output directory has sufficient disk space for model checkpoints, logs, and sample images — typical LoRA training can generate several GB of data
If this fails: Training fails mid-process when disk fills up, potentially corrupting checkpoint files — no pre-flight disk space validation
kohya_gui/class_folders.py:current_output_dir
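A pre-flight disk check is short to write with shutil.disk_usage. In this sketch, has_free_space is a hypothetical helper and the 5 GiB figure is only an illustrative threshold, not a measured requirement.

```python
import shutil


def has_free_space(path, required_bytes):
    """True if the filesystem holding path has at least required_bytes free."""
    return shutil.disk_usage(path).free >= required_bytes


# Illustrative threshold; real needs depend on model size and save frequency.
REQUIRED = 5 * 1024**3
if not has_free_space(".", REQUIRED):
    print("Warning: less than 5 GiB free in the output directory")
```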
The subprocess environment inherits all necessary environment variables (CUDA_VISIBLE_DEVICES, LD_LIBRARY_PATH) required by the training scripts
If this fails: Training script can't access GPUs or required libraries despite them being available to the GUI process — mysterious 'no GPU found' errors
kohya_gui/class_command_executor.py:execute_command
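Passing an explicit environment to the child process makes this dependency visible and easy to log. The sketch below copies the current environment and reports which of the named variables are absent; build_child_env is a hypothetical helper. An unset variable is not necessarily an error (an unset CUDA_VISIBLE_DEVICES normally leaves every GPU visible), so the list is meant for diagnostics rather than for blocking the launch.

```python
import os


def build_child_env(required=("CUDA_VISIBLE_DEVICES", "LD_LIBRARY_PATH"), base=None):
    """Copy the environment for a child process and list absent variables."""
    env = dict(os.environ if base is None else base)
    missing = [name for name in required if name not in env]
    return env, missing


env, missing = build_child_env()
if missing:
    print("Not set in the GUI environment:", ", ".join(missing))
```

The returned dictionary can be handed to subprocess.Popen through its env argument, so the child sees exactly what was logged.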
Sample prompts are plain text strings that can be written to a file with UTF-8 encoding, but prompts containing special characters or null bytes could corrupt the file
If this fails: Training script fails to parse malformed prompt file or generates unexpected results — sample images don't match intended prompts
kohya_gui/class_sample_images.py:create_prompt_file
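Input of this kind can be checked as the file is written. The sketch below assumes a one-prompt-per-line text format, which is an assumption about the training script's expectations rather than something the analysis states; write_prompt_file is a hypothetical helper.

```python
def write_prompt_file(path, prompts):
    """Write one prompt per line as UTF-8, rejecting null bytes."""
    cleaned = []
    for index, prompt in enumerate(prompts, start=1):
        if "\x00" in prompt:
            raise ValueError(f"prompt {index} contains a null byte")
        # Collapse embedded line breaks so each prompt stays on one line.
        cleaned.append(" ".join(prompt.splitlines()).strip())
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write("\n".join(cleaned) + "\n")
    return len(cleaned)
```

Writing the file with this helper before the training command is launched would also address the ordering concern listed earlier for the same function.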
See the full structural analysis of kohya_ss: the pipeline, data models, and system behavior that put these assumptions in context.
Frequently Asked Questions
What does kohya_ss assume that could break in production?
The one most likely to cause trouble: the sd-scripts training modules (train_network.py, train_db.py, fine_tune.py) are assumed to exist at predictable paths relative to the kohya_ss installation and to be executable. If this fails, training fails silently or with cryptic errors when sd-scripts is missing, installed elsewhere, or blocked by permissions. The user sees 'command not found' without realizing that an external dependency is missing.
How many hidden assumptions does kohya_ss have?
CodeSea found 15 assumptions kohya_ss relies on but never validates, 3 of them critical, spanning Environment, Contract, Resource, Domain, Temporal, Scale, Ordering, Shape. Most are routine; the analysis flags the 3 most likely to cause real problems.
What is a hidden assumption?
Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.