axolotl-ai-cloud/axolotl
Fine-tunes large language models using optimized training loops and LoRA adapters
Training starts by loading YAML configuration files that specify model architecture, datasets, and training parameters. Raw datasets are preprocessed using prompt strategies to convert conversations into chat message format, then tokenized into input_ids and labels tensors. The training loop feeds these batches to the model (optionally with LoRA adapters), computes loss on the output logits, and backpropagates gradients. Checkpoints are saved periodically, and the final model can be served via vLLM with LoRA adapter support.
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 10-component ML training system spanning 655 analyzed files. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Load configuration — ConfigLoader reads YAML files from disk, validates them against AxolotlInputConfig schema using Pydantic, and applies environment variable overrides and model-specific defaults (config: base_model, model_type, sequence_len +1)
- Initialize model and tokenizer — ModelBuilder loads the transformer model from HuggingFace, applies architecture-specific monkey patches, and optionally wraps with LoRA adapters based on configuration [AxolotlInputConfig → Configured model] (config: base_model, lora_r, lora_alpha +1)
- Preprocess datasets — DatasetManager loads raw datasets, applies prompt strategies to format conversations into ChatMessage objects, then tokenizes using the model's tokenizer into TrainingBatch format [AxolotlInputConfig → TrainingBatch] (config: datasets, sequence_len, chat_template)
- Execute training loop — AxolotlTrainer feeds batches through the model forward pass, computes loss on logits vs labels, accumulates gradients over gradient_accumulation_steps, and updates parameters [TrainingBatch → ModelOutput] (config: batch_size, gradient_accumulation_steps, learning_rate +1)
- Save checkpoints — Trainer periodically saves model state, optimizer state, and LoRA adapter weights to disk in HuggingFace format for resuming or inference [ModelOutput → Model checkpoints] (config: output_dir, save_steps, save_total_limit)
- Serve model — VLLMLoRAServer loads the trained model and any LoRA adapters into vLLM engine, accepts HTTP requests with VLLMRequest format, and returns generated text responses [VLLMRequest → Generated text] (config: model_name, lora_modules)
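The stages above can be sketched as a single YAML configuration. This is an illustrative fragment, not a verified config: key names follow the fields listed in this document, and the placeholder values (model ID, dataset path) are assumptions to check against the actual AxolotlInputConfig schema.

```yaml
# Hypothetical minimal config; verify key names against AxolotlInputConfig.
base_model: your-org/your-base-model   # HuggingFace model ID (placeholder)
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
sequence_len: 2048

datasets:
  - path: your-org/your-dataset        # placeholder dataset path
    type: chat_template                # prompt strategy to apply

# LoRA adapter settings (optional)
lora_r: 16
lora_alpha: 32

# Training loop
learning_rate: 2.0e-4
batch_size: 8
gradient_accumulation_steps: 4
num_epochs: 3

# Checkpointing
output_dir: ./outputs
save_steps: 500
save_total_limit: 3
```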
Data Models
The data structures that flow between stages — the contracts that hold the system together.
src/axolotl/utils/schemas/config.py — AxolotlInputConfig: Pydantic model with base_model: str, model_type: str, tokenizer_type: str, sequence_len: int, datasets: list[DatasetConfig], lora_r: int, lora_alpha: int, learning_rate: float, batch_size: int, gradient_accumulation_steps: int, num_epochs: int
Loaded from YAML files at startup, validated by Pydantic, passed through training pipeline to configure all components
src/axolotl/core/trainer_builder.py — TrainingBatch: dict with input_ids: torch.Tensor[B, seq_len], attention_mask: torch.Tensor[B, seq_len], labels: torch.Tensor[B, seq_len], position_ids: torch.Tensor[B, seq_len]
Created by dataset collation from tokenized examples, fed to model during training steps, consumed by loss functions
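The collation step can be sketched in plain Python. This is a simplified stand-in for the real collator: nested lists stand in for tensors, and the pad token ID is an assumption (the real value comes from the tokenizer).

```python
IGNORE_INDEX = -100  # conventional label value ignored by cross-entropy
PAD_TOKEN_ID = 0     # assumption; the real value comes from the tokenizer

def collate(examples, pad_to=None):
    """Pad tokenized examples into rectangular input_ids/attention_mask/labels.

    Each example is a dict with 'input_ids' and 'labels' lists of equal
    length. Returns plain nested lists standing in for [B, seq_len] tensors.
    """
    max_len = pad_to or max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        pad = max_len - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [PAD_TOKEN_ID] * pad)
        batch["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * pad)
        # Padding positions must not contribute to the loss
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)
    return batch

batch = collate([
    {"input_ids": [5, 6, 7], "labels": [-100, 6, 7]},  # prompt tokens masked
    {"input_ids": [8, 9], "labels": [-100, 9]},
])
```

Note the -100 convention: prompt and padding positions are masked out so the loss is computed only on completion tokens.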
src/axolotl/prompt_strategies/ — ChatMessage: dict with role: str (user|assistant|system), content: str, optionally name: str for function calls
Parsed from raw dataset conversations, formatted using chat templates, converted to token sequences
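The formatting step can be sketched as follows. The `<|role|>` markers below are generic placeholders, not any specific model's chat template; real templates (ChatML, Llama, etc.) come from the tokenizer or the chat_template config key.

```python
def apply_chat_template(messages):
    """Render ChatMessage dicts into a single prompt string.

    The "<|role|>\\ncontent" markers are illustrative placeholders;
    real chat templates are model-specific.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    # Trailing assistant marker cues the model to generate a reply
    return "\n".join(parts) + "\n<|assistant|>\n"

prompt = apply_chat_template([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
])
```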
src/axolotl/core/adapters/ — LoRA adapter config: dict with r: int, alpha: int, target_modules: list[str], dropout: float, bias: str, task_type: str, modules_to_save: list[str]
Extracted from main config, used to initialize PEFT LoRA adapters on transformer layers, saves only adapter weights
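The low-rank update these parameters control can be illustrated with a tiny pure-Python sketch of the standard LoRA formulation, W' = W + (alpha / r) * B @ A. This is illustrative arithmetic, not the PEFT implementation.

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, r, alpha):
    """W' = W + (alpha / r) * B @ A, with W frozen and only A, B trained.

    Shapes: W is [d_out, d_in], B is [d_out, r], A is [r, d_in], so the
    trainable parameter count is r * (d_out + d_in) instead of d_out * d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Rank-1 update on a 2x2 frozen identity weight
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]      # [d_out, r]
A = [[3.0, 4.0]]        # [r, d_in]
W_eff = lora_effective_weight(W, A, B, r=1, alpha=2)
```

Because only A and B receive gradients, checkpointing the adapter means saving just these small matrices, which is why adapter-only saves are cheap.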
transformers library — ModelOutput: object with loss: torch.Tensor, logits: torch.Tensor[B, seq_len, vocab_size], hidden_states: tuple[torch.Tensor], attentions: tuple[torch.Tensor]
Generated by model forward pass, loss extracted for backpropagation, logits used for evaluation metrics
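The loss extraction can be sketched as next-token cross-entropy over toy logits. This is pure Python for clarity, assuming the standard shift-by-one and ignore_index=-100 conventions; the real path uses torch's fused kernels.

```python
import math

IGNORE_INDEX = -100

def causal_lm_loss(logits, labels):
    """Mean cross-entropy of logits[t] predicting labels[t + 1].

    logits: [seq_len, vocab] list of rows; labels: [seq_len] list of ints.
    Positions whose shifted label is IGNORE_INDEX are skipped, matching
    the -100 masking convention used in the labels tensor.
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):      # shift: logit at t scores label t+1
        target = labels[t + 1]
        if target == IGNORE_INDEX:
            continue
        row = logits[t]
        log_z = math.log(sum(math.exp(x) for x in row))
        total += log_z - row[target]      # -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0

# Vocab of 2; the model is confidently correct at both supervised positions,
# so the loss is near zero.
logits = [[10.0, -10.0], [-10.0, 10.0], [0.0, 0.0]]
labels = [0, 0, 1]
loss = causal_lm_loss(logits, labels)
```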
src/axolotl/scripts/vllm_serve_lora.py — VLLMRequest: Pydantic model with messages: list[list[dict]], n: int, temperature: float, top_p: float, max_tokens: int, generation_kwargs: dict
Parsed from incoming HTTP requests, validated, passed to vLLM engine for text generation
examples/ebft/ — EBFT example: dict with prompt: list[dict] (chat messages), ground_truth: str for completion tasks, or input_ids: list[int] for pretrain tasks
Created by EBFT-specific dataset transforms, used in strided training loops where anchors are placed at completion boundaries
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
BASE_VOLUME environment variable points to a writable directory with sufficient disk space for model outputs, checkpoints, and datasets
If this fails: A read-only filesystem or exhausted disk space makes checkpoint saves fail silently or crash mid-training with cryptic I/O errors
.runpod/src/handler.py:BASE_VOLUME
CUDA memory allocator configuration is appropriate for the available GPU memory and model size being trained
If this fails: A memory fraction set too high for the actual GPU makes training crash with OOM errors after a successful startup, wasting preprocessing time
src/axolotl/cli/main.py:set_pytorch_cuda_alloc_conf
Preprocessing step completes before training timeout, and cached preprocessed data remains valid between preprocessing and training phases
If this fails: Preprocessing that exceeds the job timeout, or cached data that goes stale, means training starts with corrupted or incomplete datasets and produces wrong model outputs
.runpod/src/train.py:preprocess command
RunPod job input contains 'args' dict with all required training parameters (base_model, datasets, learning_rate, etc.) matching AxolotlInputConfig schema
If this fails: Missing required config keys cause Pydantic validation to fail during config loading, but the error surfaces only after preprocessing completes, wasting computation
.runpod/src/handler.py:inputs.get('args', {})
GPU with specified gpu_id exists and is not already occupied by another process
If this fails: A busy or nonexistent GPU makes CUDA operations fail with device errors, but the process may hang instead of failing fast
.runpod/src/train.py:CUDA_VISIBLE_DEVICES
Preprocessing completes successfully before training begins, and no concurrent access to the dataset cache occurs
If this fails: Preprocessing that partially fails but still returns a success code lets training proceed with incomplete tokenized data, leading to silent training degradation
.runpod/src/train.py:preprocess then train sequence
BASE_VOLUME has unlimited subdirectory creation permissions and no filesystem limits on directory depth
If this fails: Filesystem limits on directory creation, or a run_id containing path traversal characters, make output_dir creation fail silently, causing checkpoint loss
.runpod/src/handler.py:output_dir creation
All values in args dict are YAML-serializable and contain no sensitive data that should be redacted from logs
If this fails: Non-serializable objects in args make yaml.dump fail with cryptic errors, and embedded API keys end up exposed in config files
.runpod/src/handler.py:yaml.dump(args)
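One defensive pattern is to redact secret-looking keys before serializing. This is a hypothetical sketch, not what the handler actually does; the marker list is a heuristic assumption, and the returned dict is what you would then hand to yaml.safe_dump.

```python
SENSITIVE_MARKERS = ("token", "key", "secret", "password")  # heuristic list

def redact_args(args):
    """Return a copy of args that is safe to serialize and log.

    Values under secret-looking keys are replaced; nested dicts are
    redacted recursively. Pass the result to yaml.safe_dump() so
    credentials never reach the written config file.
    """
    redacted = {}
    for k, v in args.items():
        if any(marker in k.lower() for marker in SENSITIVE_MARKERS):
            redacted[k] = "***REDACTED***"
        elif isinstance(v, dict):
            redacted[k] = redact_args(v)
        else:
            redacted[k] = v
    return redacted

safe = redact_args({
    "base_model": "m",
    "hf_token": "hf_abc",
    "env": {"WANDB_API_KEY": "x"},
})
```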
Environment variables loaded from .env file don't conflict with system environment and are applied before any config processing
If this fails: A .env file that overrides critical system variables, or loads after config validation, can make training use wrong model paths or break authentication
src/axolotl/cli/main.py:load_dotenv
Config file path is accessible at startup time and remains readable throughout the training process
If this fails: A config file on a network filesystem that becomes unavailable prevents resuming from checkpoints, since the config is re-read on restart
src/axolotl/cli/main.py:click.Path(exists=True)
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Model checkpoint storage — Periodic saves of model weights, optimizer state, and LoRA adapters during training for resumption and deployment
- Dataset cache — Preprocessed and tokenized datasets cached to avoid reprocessing on subsequent runs
- Model download cache — Downloaded model weights and tokenizer files cached locally to avoid re-downloading
- LoRA adapter registry — In-memory mapping of LoRA adapter names to loaded adapter weights for multi-adapter serving
Feedback Loops
- Training convergence (training-loop, balancing) — Trigger: Forward pass completion. Action: Compute loss, backpropagate gradients, update parameters using optimizer. Exit: Reach max steps or loss threshold.
- Gradient accumulation (gradient-accumulation, reinforcing) — Trigger: Each training step. Action: Accumulate gradients without parameter updates until accumulation_steps reached. Exit: gradient_accumulation_steps batches processed.
- Learning rate scheduling (auto-scale, balancing) — Trigger: Each optimizer step. Action: Adjust learning rate based on schedule (linear, cosine, etc.). Exit: Training completion.
- EBFT strided training (recursive, reinforcing) — Trigger: Batch processing in EBFT mode. Action: Process overlapping sequence windows with anchor tokens to maintain context. Exit: All sequence windows processed.
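The gradient-accumulation loop above can be sketched with scalar "gradients" standing in for tensors. This is an illustrative stand-in for the optimizer step, not trainer code.

```python
def train_steps(micro_grads, accumulation_steps, lr=0.1):
    """Apply an optimizer step only every `accumulation_steps` micro-batches.

    micro_grads: per-micro-batch scalar gradients (stand-ins for tensors).
    Gradients are averaged over the accumulation window, so the effective
    batch size grows without increasing per-step memory.
    Returns (param, n_optimizer_steps).
    """
    param, grad_sum, n_steps = 0.0, 0.0, 0
    for i, g in enumerate(micro_grads, start=1):
        grad_sum += g                        # accumulate; no update yet
        if i % accumulation_steps == 0:
            param -= lr * (grad_sum / accumulation_steps)
            grad_sum, n_steps = 0.0, n_steps + 1
    return param, n_steps

# Four micro-batches with accumulation_steps=2 yield two optimizer steps
param, steps = train_steps([1.0, 3.0, 2.0, 2.0], accumulation_steps=2, lr=0.1)
```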
Delays
- Model compilation (compilation, ~10-60 seconds) — First training step takes longer as CUDA kernels compile and graph optimization occurs
- Checkpoint saving (checkpoint-save, ~5-30 seconds per save) — Training pauses while model state is serialized to disk
- Dataset preprocessing (batch-window, ~minutes to hours) — First run tokenizes and caches datasets before training can begin
- vLLM adapter loading (warmup, ~1-10 seconds per adapter) — Initial serving requests wait for LoRA adapters to load into GPU memory
Control Points
- Learning rate (hyperparameter) — Controls: Speed of parameter updates and training convergence. Default: config-dependent
- LoRA rank (architecture-switch) — Controls: Number of trainable parameters and adapter capacity. Default: 8-64 typical
- Gradient accumulation steps (hyperparameter) — Controls: Effective batch size and memory usage. Default: 1-32 typical
- Mixed precision mode (precision-mode) — Controls: Memory usage and training speed vs numerical precision. Default: bf16 recommended
- Flash attention (feature-flag) — Controls: Memory-efficient attention computation enabling longer sequences. Default: true if supported
- Sequence length (architecture-switch) — Controls: Maximum context window and memory requirements. Default: 2048-32768 typical
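How the batch-related control points interact is visible in the effective batch size, a standard relationship assuming data parallelism across `world_size` GPUs:

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps,
                         world_size=1):
    """Examples seen per optimizer step scales with all three knobs.

    Per-GPU memory depends only on micro_batch_size (and sequence length),
    while gradient accumulation trades wall-clock time for batch size.
    """
    return micro_batch_size * gradient_accumulation_steps * world_size

# e.g. 4 examples per GPU, 8 accumulation steps, 2 GPUs
ebs = effective_batch_size(4, 8, world_size=2)
```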
Technology Stack
- PyTorch — Core tensor computation and automatic differentiation for model training
- HuggingFace Transformers — Pre-trained model loading and transformer architecture implementations
- HuggingFace PEFT — Parameter-efficient fine-tuning methods like LoRA and AdaLoRA
- vLLM — High-performance inference engine with LoRA adapter support for serving
- Triton — Custom GPU kernel development for optimized operations like scatter-MoE
- Pydantic — Configuration validation and parsing with type checking
- Click — Command-line interface framework for the axolotl CLI tool
- Weights & Biases — Experiment tracking and metrics logging during training
- DeepSpeed — Distributed training and memory optimization for large models
Key Components
- AxolotlTrainer (orchestrator) — src/axolotl/core/trainers/base.py — Main training coordinator that wraps the HuggingFace Trainer with Axolotl-specific optimizations like gradient checkpointing, mixed precision, and custom loss functions
- ModelBuilder (factory) — src/axolotl/core/model_builder.py — Creates and configures transformer models with architecture-specific patches, LoRA adapters, and optimization settings based on configuration
- DatasetManager (processor) — src/axolotl/datasets/ — Loads, tokenizes, and formats datasets using prompt strategies and chat templates to create training-ready batches
- PromptStrategy (transformer) — src/axolotl/prompt_strategies/ — Converts raw text conversations into structured chat messages and applies model-specific chat templates for consistent formatting
- LoRAAdapter (adapter) — src/axolotl/core/adapters/ — Implements parameter-efficient fine-tuning by adding low-rank decomposition matrices to transformer layers while freezing base weights
- VLLMLoRAServer (gateway) — src/axolotl/scripts/vllm_serve_lora.py — HTTP server that loads multiple LoRA adapters into the vLLM engine and serves generation requests with adapter selection
- ScatterMoELoRA (optimizer) — src/axolotl/integrations/kernels/libs/scattermoe_lora/ — Custom kernel that efficiently combines mixture-of-experts routing with LoRA computation using scatter-gather operations
- ConfigLoader (loader) — src/axolotl/utils/config/ — Loads and validates YAML configuration files, applies environment variable overrides, and resolves model-specific defaults
- MonkeyPatcher (adapter) — src/axolotl/monkeypatch/ — Runtime modification of transformer model behaviors, including attention mechanisms, embedding layers, and loss functions, for specific architectures
- EBFTTrainer (executor) — src/axolotl/core/trainers/ebft.py — Specialized trainer for Elastic Batch Fine-Tuning that uses strided training with anchor tokens to handle variable-length sequences efficiently
Frequently Asked Questions
What is axolotl used for?
axolotl fine-tunes large language models using optimized training loops and LoRA adapters. axolotl-ai-cloud/axolotl is a 10-component ML training system written in Python; data flows through 6 distinct pipeline stages across a codebase of 655 files.
How is axolotl architected?
axolotl is organized into 6 architecture layers: CLI Interface, Training Orchestration, Model Adapters, Dataset Processing, and 2 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through axolotl?
Data moves through 6 stages: Load configuration → Initialize model and tokenizer → Preprocess datasets → Execute training loop → Save checkpoints → .... Training starts by loading YAML configuration files that specify model architecture, datasets, and training parameters. Raw datasets are preprocessed using prompt strategies to convert conversations into chat message format, then tokenized into input_ids and labels tensors. The training loop feeds these batches to the model (optionally with LoRA adapters), computes loss on the output logits, and backpropagates gradients. Checkpoints are saved periodically, and the final model can be served via vLLM with LoRA adapter support. This pipeline design reflects a complex multi-stage processing system.
What technologies does axolotl use?
The core stack includes PyTorch (Core tensor computation and automatic differentiation for model training), HuggingFace Transformers (Pre-trained model loading and transformer architecture implementations), HuggingFace PEFT (Parameter-efficient fine-tuning methods like LoRA and AdaLoRA), vLLM (High-performance inference engine with LoRA adapter support for serving), Triton (Custom CUDA kernel development for optimized operations like scatter-MoE), Pydantic (Configuration validation and parsing with type checking), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does axolotl have?
axolotl exhibits 4 data pools (Model checkpoint storage, Dataset cache), 4 feedback loops, 6 control points, 4 delays. The feedback loops handle training-loop and gradient-accumulation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does axolotl use?
5 design patterns detected: Monkey Patching, Strategy Pattern, Factory Pattern, Adapter Pattern, Template Method.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.