axolotl-ai-cloud/axolotl
Fine-tunes large language models using optimized training loops and LoRA adapters
Training starts by loading YAML configuration files that specify model architecture, datasets, and training parameters. Raw datasets are preprocessed using prompt strategies to convert conversations into chat message format, then tokenized into input_ids and labels tensors. The training loop feeds these batches to the model (optionally with LoRA adapters), computes loss on the output logits, and backpropagates gradients. Checkpoints are saved periodically, and the final model can be served via vLLM with LoRA adapter support.
Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.
A 10-component ML training system spanning 655 analyzed files. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Load configuration — ConfigLoader reads YAML files from disk, validates them against AxolotlInputConfig schema using Pydantic, and applies environment variable overrides and model-specific defaults (config: base_model, model_type, sequence_len +1)
- Initialize model and tokenizer — ModelBuilder loads the transformer model from HuggingFace, applies architecture-specific monkey patches, and optionally wraps with LoRA adapters based on configuration [AxolotlInputConfig → Configured model] (config: base_model, lora_r, lora_alpha +1)
- Preprocess datasets — DatasetManager loads raw datasets, applies prompt strategies to format conversations into ChatMessage objects, then tokenizes using the model's tokenizer into TrainingBatch format [AxolotlInputConfig → TrainingBatch] (config: datasets, sequence_len, chat_template)
- Execute training loop — AxolotlTrainer feeds batches through the model forward pass, computes loss on logits vs labels, accumulates gradients over gradient_accumulation_steps, and updates parameters [TrainingBatch → ModelOutput] (config: batch_size, gradient_accumulation_steps, learning_rate +1)
- Save checkpoints — Trainer periodically saves model state, optimizer state, and LoRA adapter weights to disk in HuggingFace format for resuming or inference [ModelOutput → Model checkpoints] (config: output_dir, save_steps, save_total_limit)
- Serve model — VLLMLoRAServer loads the trained model and any LoRA adapters into vLLM engine, accepts HTTP requests with VLLMRequest format, and returns generated text responses [VLLMRequest → Generated text] (config: model_name, lora_modules)
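The stages above can be sketched as a single YAML configuration. This is an illustrative fragment, not a verified config: key names follow the fields listed in this document, and the placeholder values (model ID, dataset path) are assumptions to check against the actual AxolotlInputConfig schema.

```yaml
# Hypothetical minimal config; verify key names against AxolotlInputConfig.
base_model: your-org/your-base-model   # HuggingFace model ID (placeholder)
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
sequence_len: 2048

datasets:
  - path: your-org/your-dataset        # placeholder dataset path
    type: chat_template                # prompt strategy to apply

# LoRA adapter settings (optional)
lora_r: 16
lora_alpha: 32

# Training loop
learning_rate: 2.0e-4
batch_size: 8
gradient_accumulation_steps: 4
num_epochs: 3

# Checkpointing
output_dir: ./outputs
save_steps: 500
save_total_limit: 3
```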
Data Models
The data structures that flow between stages — the contracts that hold the system together.
src/axolotl/utils/schemas/config.py — AxolotlInputConfig: Pydantic model with base_model: str, model_type: str, tokenizer_type: str, sequence_len: int, datasets: list[DatasetConfig], lora_r: int, lora_alpha: int, learning_rate: float, batch_size: int, gradient_accumulation_steps: int, num_epochs: int
Loaded from YAML files at startup, validated by Pydantic, passed through training pipeline to configure all components
src/axolotl/core/trainer_builder.py — TrainingBatch: dict with input_ids: torch.Tensor[B, seq_len], attention_mask: torch.Tensor[B, seq_len], labels: torch.Tensor[B, seq_len], position_ids: torch.Tensor[B, seq_len]
Created by dataset collation from tokenized examples, fed to model during training steps, consumed by loss functions
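The collation step can be sketched in plain Python. This is a simplified stand-in for the real collator: nested lists stand in for tensors, and the pad token ID is an assumption (the real value comes from the tokenizer).

```python
IGNORE_INDEX = -100  # conventional label value ignored by cross-entropy
PAD_TOKEN_ID = 0     # assumption; the real value comes from the tokenizer

def collate(examples, pad_to=None):
    """Pad tokenized examples into rectangular input_ids/attention_mask/labels.

    Each example is a dict with 'input_ids' and 'labels' lists of equal
    length. Returns plain nested lists standing in for [B, seq_len] tensors.
    """
    max_len = pad_to or max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        pad = max_len - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [PAD_TOKEN_ID] * pad)
        batch["attention_mask"].append([1] * len(ex["input_ids"]) + [0] * pad)
        # Padding positions must not contribute to the loss
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)
    return batch

batch = collate([
    {"input_ids": [5, 6, 7], "labels": [-100, 6, 7]},  # prompt tokens masked
    {"input_ids": [8, 9], "labels": [-100, 9]},
])
```

Note the -100 convention: prompt and padding positions are masked out so the loss is computed only on completion tokens.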
src/axolotl/prompt_strategies/ — ChatMessage: dict with role: str (user|assistant|system), content: str, optionally name: str for function calls
Parsed from raw dataset conversations, formatted using chat templates, converted to token sequences
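The formatting step can be sketched as follows. The `<|role|>` markers below are generic placeholders, not any specific model's chat template; real templates (ChatML, Llama, etc.) come from the tokenizer or the chat_template config key.

```python
def apply_chat_template(messages):
    """Render ChatMessage dicts into a single prompt string.

    The "<|role|>\\ncontent" markers are illustrative placeholders;
    real chat templates are model-specific.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    # Trailing assistant marker cues the model to generate a reply
    return "\n".join(parts) + "\n<|assistant|>\n"

prompt = apply_chat_template([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
])
```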
src/axolotl/core/adapters/ — LoRA adapter config: dict with r: int, alpha: int, target_modules: list[str], dropout: float, bias: str, task_type: str, modules_to_save: list[str]
Extracted from main config, used to initialize PEFT LoRA adapters on transformer layers, saves only adapter weights
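The low-rank update these parameters control can be illustrated with a tiny pure-Python sketch of the standard LoRA formulation, W' = W + (alpha / r) * B @ A. This is illustrative arithmetic, not the PEFT implementation.

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, r, alpha):
    """W' = W + (alpha / r) * B @ A, with W frozen and only A, B trained.

    Shapes: W is [d_out, d_in], B is [d_out, r], A is [r, d_in], so the
    trainable parameter count is r * (d_out + d_in) instead of d_out * d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Rank-1 update on a 2x2 frozen identity weight
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]      # [d_out, r]
A = [[3.0, 4.0]]        # [r, d_in]
W_eff = lora_effective_weight(W, A, B, r=1, alpha=2)
```

Because only A and B receive gradients, checkpointing the adapter means saving just these small matrices, which is why adapter-only saves are cheap.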
transformers library — ModelOutput: object with loss: torch.Tensor, logits: torch.Tensor[B, seq_len, vocab_size], hidden_states: tuple[torch.Tensor], attentions: tuple[torch.Tensor]
Generated by model forward pass, loss extracted for backpropagation, logits used for evaluation metrics
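The loss extraction can be sketched as next-token cross-entropy over toy logits. This is pure Python for clarity, assuming the standard shift-by-one and ignore_index=-100 conventions; the real path uses torch's fused kernels.

```python
import math

IGNORE_INDEX = -100

def causal_lm_loss(logits, labels):
    """Mean cross-entropy of logits[t] predicting labels[t + 1].

    logits: [seq_len, vocab] list of rows; labels: [seq_len] list of ints.
    Positions whose shifted label is IGNORE_INDEX are skipped, matching
    the -100 masking convention used in the labels tensor.
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):      # shift: logit at t scores label t+1
        target = labels[t + 1]
        if target == IGNORE_INDEX:
            continue
        row = logits[t]
        log_z = math.log(sum(math.exp(x) for x in row))
        total += log_z - row[target]      # -log softmax(row)[target]
        count += 1
    return total / count if count else 0.0

# Vocab of 2; the model is confidently correct at both supervised positions,
# so the loss is near zero.
logits = [[10.0, -10.0], [-10.0, 10.0], [0.0, 0.0]]
labels = [0, 0, 1]
loss = causal_lm_loss(logits, labels)
```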
src/axolotl/scripts/vllm_serve_lora.py — VLLMRequest: Pydantic model with messages: list[list[dict]], n: int, temperature: float, top_p: float, max_tokens: int, generation_kwargs: dict
Parsed from incoming HTTP requests, validated, passed to vLLM engine for text generation
examples/ebft/ — EBFT example: dict with prompt: list[dict] (chat messages), ground_truth: str for completion tasks, or input_ids: list[int] for pretrain tasks
Created by EBFT-specific dataset transforms, used in strided training loops where anchors are placed at completion boundaries
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
BASE_VOLUME environment variable points to a writable directory with sufficient disk space for model outputs, checkpoints, and datasets
If this fails: A read-only filesystem or exhausted disk space makes checkpoint saves fail silently or crash mid-training with cryptic I/O errors
.runpod/src/handler.py:BASE_VOLUME
CUDA memory allocator configuration is appropriate for the available GPU memory and model size being trained
If this fails: A memory fraction set too high for the actual GPU makes training crash with OOM errors after a successful startup, wasting preprocessing time
src/axolotl/cli/main.py:set_pytorch_cuda_alloc_conf
Preprocessing step completes before training timeout, and cached preprocessed data remains valid between preprocessing and training phases
If this fails: Preprocessing that exceeds the job timeout, or cached data that goes stale, means training starts with corrupted or incomplete datasets and produces wrong model outputs
.runpod/src/train.py:preprocess command
RunPod job input contains 'args' dict with all required training parameters (base_model, datasets, learning_rate, etc.) matching AxolotlInputConfig schema
If this fails: Missing required config keys cause Pydantic validation to fail during config loading, but the error surfaces only after preprocessing completes, wasting computation
.runpod/src/handler.py:inputs.get('args', {})
GPU with specified gpu_id exists and is not already occupied by another process
If this fails: A busy or nonexistent GPU makes CUDA operations fail with device errors, but the process may hang instead of failing fast
.runpod/src/train.py:CUDA_VISIBLE_DEVICES
Preprocessing completes successfully before training begins, and no concurrent access to the dataset cache occurs
If this fails: Preprocessing that partially fails but still returns a success code lets training proceed with incomplete tokenized data, leading to silent training degradation
.runpod/src/train.py:preprocess then train sequence
BASE_VOLUME has unlimited subdirectory creation permissions and no filesystem limits on directory depth
If this fails: Filesystem limits on directory creation, or a run_id containing path traversal characters, make output_dir creation fail silently, causing checkpoint loss
.runpod/src/handler.py:output_dir creation
All values in args dict are YAML-serializable and contain no sensitive data that should be redacted from logs
If this fails: Non-serializable objects in args make yaml.dump fail with cryptic errors, and embedded API keys end up exposed in config files
.runpod/src/handler.py:yaml.dump(args)
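One defensive pattern is to redact secret-looking keys before serializing. This is a hypothetical sketch, not what the handler actually does; the marker list is a heuristic assumption, and the returned dict is what you would then hand to yaml.safe_dump.

```python
SENSITIVE_MARKERS = ("token", "key", "secret", "password")  # heuristic list

def redact_args(args):
    """Return a copy of args that is safe to serialize and log.

    Values under secret-looking keys are replaced; nested dicts are
    redacted recursively. Pass the result to yaml.safe_dump() so
    credentials never reach the written config file.
    """
    redacted = {}
    for k, v in args.items():
        if any(marker in k.lower() for marker in SENSITIVE_MARKERS):
            redacted[k] = "***REDACTED***"
        elif isinstance(v, dict):
            redacted[k] = redact_args(v)
        else:
            redacted[k] = v
    return redacted

safe = redact_args({
    "base_model": "m",
    "hf_token": "hf_abc",
    "env": {"WANDB_API_KEY": "x"},
})
```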
Environment variables loaded from .env file don't conflict with system environment and are applied before any config processing
If this fails: A .env file that overrides critical system variables, or loads after config validation, can make training use wrong model paths or break authentication
src/axolotl/cli/main.py:load_dotenv
Config file path is accessible at startup time and remains readable throughout the training process
If this fails: A config file on a network filesystem that becomes unavailable prevents resuming from checkpoints, since the config is re-read on restart
src/axolotl/cli/main.py:click.Path(exists=True)
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Model checkpoint storage — Periodic saves of model weights, optimizer state, and LoRA adapters during training for resumption and deployment
- Dataset cache — Preprocessed and tokenized datasets cached to avoid reprocessing on subsequent runs
- Model download cache — Downloaded model weights and tokenizer files cached locally to avoid re-downloading
- LoRA adapter registry — In-memory mapping of LoRA adapter names to loaded adapter weights for multi-adapter serving
Feedback Loops
- Training convergence (training-loop, balancing) — Trigger: Forward pass completion. Action: Compute loss, backpropagate gradients, update parameters using optimizer. Exit: Reach max steps or loss threshold.
- Gradient accumulation (gradient-accumulation, reinforcing) — Trigger: Each training step. Action: Accumulate gradients without parameter updates until accumulation_steps reached. Exit: gradient_accumulation_steps batches processed.
- Learning rate scheduling (auto-scale, balancing) — Trigger: Each optimizer step. Action: Adjust learning rate based on schedule (linear, cosine, etc.). Exit: Training completion.
- EBFT strided training (recursive, reinforcing) — Trigger: Batch processing in EBFT mode. Action: Process overlapping sequence windows with anchor tokens to maintain context. Exit: All sequence windows processed.
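The gradient-accumulation loop above can be sketched with scalar "gradients" standing in for tensors. This is an illustrative stand-in for the optimizer step, not trainer code.

```python
def train_steps(micro_grads, accumulation_steps, lr=0.1):
    """Apply an optimizer step only every `accumulation_steps` micro-batches.

    micro_grads: per-micro-batch scalar gradients (stand-ins for tensors).
    Gradients are averaged over the accumulation window, so the effective
    batch size grows without increasing per-step memory.
    Returns (param, n_optimizer_steps).
    """
    param, grad_sum, n_steps = 0.0, 0.0, 0
    for i, g in enumerate(micro_grads, start=1):
        grad_sum += g                        # accumulate; no update yet
        if i % accumulation_steps == 0:
            param -= lr * (grad_sum / accumulation_steps)
            grad_sum, n_steps = 0.0, n_steps + 1
    return param, n_steps

# Four micro-batches with accumulation_steps=2 yield two optimizer steps
param, steps = train_steps([1.0, 3.0, 2.0, 2.0], accumulation_steps=2, lr=0.1)
```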
Delays
- Model compilation (compilation, ~10-60 seconds) — First training step takes longer as CUDA kernels compile and graph optimization occurs
- Checkpoint saving (checkpoint-save, ~5-30 seconds per save) — Training pauses while model state is serialized to disk
- Dataset preprocessing (batch-window, ~minutes to hours) — First run tokenizes and caches datasets before training can begin
- vLLM adapter loading (warmup, ~1-10 seconds per adapter) — Initial serving requests wait for LoRA adapters to load into GPU memory
Control Points
- Learning rate (hyperparameter) — Controls: Speed of parameter updates and training convergence. Default: config-dependent
- LoRA rank (architecture-switch) — Controls: Number of trainable parameters and adapter capacity. Default: 8-64 typical
- Gradient accumulation steps (hyperparameter) — Controls: Effective batch size and memory usage. Default: 1-32 typical
- Mixed precision mode (precision-mode) — Controls: Memory usage and training speed vs numerical precision. Default: bf16 recommended
- Flash attention (feature-flag) — Controls: Memory-efficient attention computation enabling longer sequences. Default: true if supported
- Sequence length (architecture-switch) — Controls: Maximum context window and memory requirements. Default: 2048-32768 typical
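How the batch-related control points interact is visible in the effective batch size, a standard relationship assuming data parallelism across `world_size` GPUs:

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps,
                         world_size=1):
    """Examples seen per optimizer step scales with all three knobs.

    Per-GPU memory depends only on micro_batch_size (and sequence length),
    while gradient accumulation trades wall-clock time for batch size.
    """
    return micro_batch_size * gradient_accumulation_steps * world_size

# e.g. 4 examples per GPU, 8 accumulation steps, 2 GPUs
ebs = effective_batch_size(4, 8, world_size=2)
```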
Technology Stack
- PyTorch — Core tensor computation and automatic differentiation for model training
- HuggingFace Transformers — Pre-trained model loading and transformer architecture implementations
- HuggingFace PEFT — Parameter-efficient fine-tuning methods like LoRA and AdaLoRA
- vLLM — High-performance inference engine with LoRA adapter support for serving
- Triton — Custom GPU kernel development for optimized operations like scatter-MoE
- Pydantic — Configuration validation and parsing with type checking
- Click — Command-line interface framework for the axolotl CLI tool
- Weights & Biases — Experiment tracking and metrics logging during training
- DeepSpeed — Distributed training and memory optimization for large models
Key Components
- AxolotlTrainer (orchestrator) — src/axolotl/core/trainers/base.py — Main training coordinator that wraps the HuggingFace Trainer with Axolotl-specific optimizations like gradient checkpointing, mixed precision, and custom loss functions
- ModelBuilder (factory) — src/axolotl/core/model_builder.py — Creates and configures transformer models with architecture-specific patches, LoRA adapters, and optimization settings based on configuration
- DatasetManager (processor) — src/axolotl/datasets/ — Loads, tokenizes, and formats datasets using prompt strategies and chat templates to create training-ready batches
- PromptStrategy (transformer) — src/axolotl/prompt_strategies/ — Converts raw text conversations into structured chat messages and applies model-specific chat templates for consistent formatting
- LoRAAdapter (adapter) — src/axolotl/core/adapters/ — Implements parameter-efficient fine-tuning by adding low-rank decomposition matrices to transformer layers while freezing base weights
- VLLMLoRAServer (gateway) — src/axolotl/scripts/vllm_serve_lora.py — HTTP server that loads multiple LoRA adapters into the vLLM engine and serves generation requests with adapter selection
- ScatterMoELoRA (optimizer) — src/axolotl/integrations/kernels/libs/scattermoe_lora/ — Custom kernel that efficiently combines mixture-of-experts routing with LoRA computation using scatter-gather operations
- ConfigLoader (loader) — src/axolotl/utils/config/ — Loads and validates YAML configuration files, applies environment variable overrides, and resolves model-specific defaults
- MonkeyPatcher (adapter) — src/axolotl/monkeypatch/ — Runtime modification of transformer model behaviors, including attention mechanisms, embedding layers, and loss functions, for specific architectures
- EBFTTrainer (executor) — src/axolotl/core/trainers/ebft.py — Specialized trainer for Elastic Batch Fine-Tuning that uses strided training with anchor tokens to handle variable-length sequences efficiently
Frequently Asked Questions
What is axolotl used for?
axolotl fine-tunes large language models using optimized training loops and LoRA adapters. axolotl-ai-cloud/axolotl is a 10-component ML training system written in Python; data flows through 6 distinct pipeline stages across a codebase of 655 files.
How is axolotl architected?
axolotl is organized into 6 architecture layers: CLI Interface, Training Orchestration, Model Adapters, Dataset Processing, and 2 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through axolotl?
Data moves through 6 stages: Load configuration → Initialize model and tokenizer → Preprocess datasets → Execute training loop → Save checkpoints → .... Training starts by loading YAML configuration files that specify model architecture, datasets, and training parameters. Raw datasets are preprocessed using prompt strategies to convert conversations into chat message format, then tokenized into input_ids and labels tensors. The training loop feeds these batches to the model (optionally with LoRA adapters), computes loss on the output logits, and backpropagates gradients. Checkpoints are saved periodically, and the final model can be served via vLLM with LoRA adapter support. This pipeline design reflects a complex multi-stage processing system.
What technologies does axolotl use?
The core stack includes PyTorch (Core tensor computation and automatic differentiation for model training), HuggingFace Transformers (Pre-trained model loading and transformer architecture implementations), HuggingFace PEFT (Parameter-efficient fine-tuning methods like LoRA and AdaLoRA), vLLM (High-performance inference engine with LoRA adapter support for serving), Triton (Custom CUDA kernel development for optimized operations like scatter-MoE), Pydantic (Configuration validation and parsing with type checking), and 3 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does axolotl have?
axolotl exhibits 4 data pools (Model checkpoint storage, Dataset cache), 4 feedback loops, 6 control points, 4 delays. The feedback loops handle training-loop and gradient-accumulation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does axolotl use?
5 design patterns detected: Monkey Patching, Strategy Pattern, Factory Pattern, Adapter Pattern, Template Method.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.