axolotl-ai-cloud/axolotl


11,724 stars · Python · 10 components

Fine-tunes large language models using optimized training loops and LoRA adapters

Training starts by loading YAML configuration files that specify model architecture, datasets, and training parameters. Raw datasets are preprocessed using prompt strategies to convert conversations into chat message format, then tokenized into input_ids and labels tensors. The training loop feeds these batches to the model (optionally with LoRA adapters), computes loss on the output logits, and backpropagates gradients. Checkpoints are saved periodically, and the final model can be served via vLLM with LoRA adapter support.

Under the hood, the system uses 4 feedback loops, 4 data pools, and 6 control points to manage its runtime behavior.

A 10-component ML training system spanning 655 analyzed files. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System


  1. Load configuration — ConfigLoader reads YAML files from disk, validates them against AxolotlInputConfig schema using Pydantic, and applies environment variable overrides and model-specific defaults (config: base_model, model_type, sequence_len +1)
  2. Initialize model and tokenizer — ModelBuilder loads the transformer model from HuggingFace, applies architecture-specific monkey patches, and optionally wraps with LoRA adapters based on configuration [AxolotlInputConfig → Configured model] (config: base_model, lora_r, lora_alpha +1)
  3. Preprocess datasets — DatasetManager loads raw datasets, applies prompt strategies to format conversations into ChatMessage objects, then tokenizes using the model's tokenizer into TrainingBatch format [AxolotlInputConfig → TrainingBatch] (config: datasets, sequence_len, chat_template)
  4. Execute training loop — AxolotlTrainer feeds batches through the model forward pass, computes loss on logits vs labels, accumulates gradients over gradient_accumulation_steps, and updates parameters [TrainingBatch → ModelOutput] (config: batch_size, gradient_accumulation_steps, learning_rate +1)
  5. Save checkpoints — Trainer periodically saves model state, optimizer state, and LoRA adapter weights to disk in HuggingFace format for resuming or inference [ModelOutput → Model checkpoints] (config: output_dir, save_steps, save_total_limit)
  6. Serve model — VLLMLoRAServer loads the trained model and any LoRA adapters into vLLM engine, accepts HTTP requests with VLLMRequest format, and returns generated text responses [VLLMRequest → Generated text] (config: model_name, lora_modules)
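The gradient-accumulation behavior in step 4 can be sketched with a toy one-parameter linear model standing in for the transformer: the loss gradient is averaged over `accum_steps` micro-batches before a single parameter update, which is exactly what `gradient_accumulation_steps` buys in the real trainer. The model, data, and learning rate here are illustrative, not taken from the codebase.

```python
# Minimal sketch of gradient accumulation, assuming a toy linear model
# y = w * x with MSE loss in place of the transformer forward pass.

def train_step(w, micro_batches, accum_steps, lr):
    """One optimizer step accumulated over `accum_steps` micro-batches."""
    grad_accum = 0.0
    for x, y in micro_batches[:accum_steps]:
        pred = w * x                      # forward pass
        grad = 2 * (pred - y) * x         # d(MSE)/dw for this micro-batch
        grad_accum += grad / accum_steps  # scale so the update matches one big batch
    return w - lr * grad_accum            # single parameter update

w = 0.0
batches = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # targets are y = 2x
for _ in range(200):
    w = train_step(w, batches, accum_steps=4, lr=0.01)
print(round(w, 3))  # converges toward 2.0
```

Dividing each micro-batch gradient by `accum_steps` keeps the effective learning rate independent of how many micro-batches are accumulated.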

Data Models

The data structures that flow between stages — the contracts that hold the system together.

AxolotlInputConfig src/axolotl/utils/schemas/config.py
Pydantic model with base_model: str, model_type: str, tokenizer_type: str, sequence_len: int, datasets: list[DatasetConfig], lora_r: int, lora_alpha: int, learning_rate: float, batch_size: int, gradient_accumulation_steps: int, num_epochs: int
Loaded from YAML files at startup, validated by Pydantic, passed through training pipeline to configure all components
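A hedged sketch of that contract: a plain dataclass standing in for the Pydantic `AxolotlInputConfig`, using a subset of the field names listed above. The real schema lives in src/axolotl/utils/schemas/config.py; the defaults and model name below are illustrative only.

```python
# Dataclass stand-in for the Pydantic config model (illustrative subset).
from dataclasses import dataclass, field

@dataclass
class InputConfigSketch:
    base_model: str
    sequence_len: int = 2048
    lora_r: int = 8
    lora_alpha: int = 16
    learning_rate: float = 2e-4
    gradient_accumulation_steps: int = 1
    datasets: list = field(default_factory=list)

    def __post_init__(self):
        # Fail fast on obviously bad values, as Pydantic validators would.
        if not self.base_model:
            raise ValueError("base_model is required")
        if self.sequence_len <= 0:
            raise ValueError("sequence_len must be positive")

cfg = InputConfigSketch(
    base_model="meta-llama/Llama-2-7b-hf",          # illustrative value
    datasets=[{"path": "alpaca", "type": "chat_template"}],
)
print(cfg.sequence_len)  # 2048
```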
TrainingBatch src/axolotl/core/trainer_builder.py
dict with input_ids: torch.Tensor[B, seq_len], attention_mask: torch.Tensor[B, seq_len], labels: torch.Tensor[B, seq_len], position_ids: torch.Tensor[B, seq_len]
Created by dataset collation from tokenized examples, fed to model during training steps, consumed by loss functions
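The collation step can be illustrated with plain Python lists in place of torch tensors, assuming right-padding with `pad_id=0` and the common convention of masking padded label positions with -100 so the loss ignores them (the real collator's padding side and pad id may differ).

```python
# Illustrative collation: pad variable-length examples to a rectangular batch.

def collate(examples, pad_id=0, ignore_index=-100):
    max_len = max(len(e) for e in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)  # 0 = padding
        batch["labels"].append(ids + [ignore_index] * pad)          # -100 = ignored by loss
    return batch

b = collate([[5, 6, 7], [8, 9]])
print(b["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```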
ChatMessage src/axolotl/prompt_strategies/
dict with role: str (user|assistant|system), content: str, optionally name: str for function calls
Parsed from raw dataset conversations, formatted using chat templates, converted to token sequences
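A toy renderer for that role/content shape: real formatting goes through the tokenizer's chat template, so the `<|role|>` delimiters below are a made-up stand-in, not any model's actual template.

```python
# Hand-rolled chat-template renderer illustrating the ChatMessage contract.

def render_chat(messages):
    parts = []
    for msg in messages:
        if msg["role"] not in ("system", "user", "assistant"):
            raise ValueError(f"unknown role: {msg['role']}")
        parts.append(f"<|{msg['role']}|>\n{msg['content']}")
    # Trailing assistant header prompts the model to generate its reply.
    return "\n".join(parts) + "\n<|assistant|>\n"

prompt = render_chat([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
])
```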
LoRAConfig src/axolotl/core/adapters/
dict with r: int, alpha: int, target_modules: list[str], dropout: float, bias: str, task_type: str, modules_to_save: list[str]
Extracted from main config, used to initialize PEFT LoRA adapters on transformer layers, saves only adapter weights
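The LoRA update those fields parameterize, in miniature: the frozen weight W is perturbed by a low-rank product scaled by alpha/r, so only A and B are trained and saved. Pure-Python matrices stand in for tensors here; the numbers are illustrative.

```python
# Rank-1 LoRA example: W' = W + (alpha / r) * (B @ A).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_weight(W, A, B, alpha):
    r = len(A)                  # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(B, A)        # (d_out x r) @ (r x d_in) low-rank update
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]    # frozen base weight (identity here)
A = [[0.5, 0.5]]                # trained, rank r = 1
B = [[2.0], [0.0]]
print(lora_weight(W, A, B, alpha=1))  # [[2.0, 1.0], [0.0, 1.0]]
```

Only A and B (r · (d_in + d_out) numbers) need to be stored per adapter, which is why checkpointing "saves only adapter weights" is cheap.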
ModelOutput transformers library
object with loss: torch.Tensor, logits: torch.Tensor[B, seq_len, vocab_size], hidden_states: tuple[torch.Tensor], attentions: tuple[torch.Tensor]
Generated by model forward pass, loss extracted for backpropagation, logits used for evaluation metrics
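How that loss treats the labels can be shown with a toy vocabulary of size 3 in place of the `[B, seq_len, vocab_size]` logits tensor: positions labeled -100 are excluded, and each remaining position contributes -log softmax(logits)[label].

```python
import math

# Masked cross-entropy over per-position logit rows (illustrative scale).

def masked_cross_entropy(logits, labels, ignore_index=-100):
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue                       # padding / prompt tokens skipped
        z = max(row)                       # shift for numerical stability
        log_norm = z + math.log(sum(math.exp(v - z) for v in row))
        total += log_norm - row[label]     # -log p(label | logits)
        count += 1
    return total / count

# Second position carries label -100, so only the first contributes.
loss = masked_cross_entropy([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, -100])
```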
VLLMRequest src/axolotl/scripts/vllm_serve_lora.py
Pydantic model with messages: list[list[dict]], n: int, temperature: float, top_p: float, max_tokens: int, generation_kwargs: dict
Parsed from incoming HTTP requests, validated, passed to vLLM engine for text generation
EBFTExample examples/ebft/
dict with prompt: list[dict] (chat messages), ground_truth: str for completion tasks, or input_ids: list[int] for pretrain tasks
Created by EBFT-specific dataset transforms, used in strided training loops where anchors are placed at completion boundaries

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Environment unguarded

BASE_VOLUME environment variable points to a writable directory with sufficient disk space for model outputs, checkpoints, and datasets

If this fails: If BASE_VOLUME points to read-only filesystem or runs out of space during training, checkpoint saves will silently fail or crash mid-training with cryptic I/O errors

.runpod/src/handler.py:BASE_VOLUME
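One way to turn this assumption into an explicit startup check: verify the directory exists, is actually writable (a real write catches read-only mounts that permission bits can miss), and has a minimum amount of free space. The function name and threshold are illustrative, not part of the codebase.

```python
import os
import shutil
import tempfile

def check_volume(path, min_free_gb=10):
    """Fail fast if `path` is missing, read-only, or nearly full."""
    if not os.path.isdir(path):
        raise RuntimeError(f"{path} does not exist")
    try:
        # An actual write catches read-only filesystems.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as e:
        raise RuntimeError(f"{path} is not writable: {e}")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        raise RuntimeError(f"only {free_gb:.1f} GB free on {path}")
    return True
```

Calling this once before preprocessing converts a mid-training I/O crash into an immediate, readable error.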
critical Resource unguarded

CUDA memory allocator configuration is appropriate for the available GPU memory and model size being trained

If this fails: If set_pytorch_cuda_alloc_conf sets memory fractions too high for the actual GPU, training crashes with OOM errors after successful startup, wasting preprocessing time

src/axolotl/cli/main.py:set_pytorch_cuda_alloc_conf
critical Temporal unguarded

Preprocessing step completes before training timeout, and cached preprocessed data remains valid between preprocessing and training phases

If this fails: If preprocessing takes longer than job timeout or cached data becomes stale, training starts with corrupted/incomplete datasets producing wrong model outputs

.runpod/src/train.py:preprocess command
warning Contract unguarded

RunPod job input contains 'args' dict with all required training parameters (base_model, datasets, learning_rate, etc.) matching AxolotlInputConfig schema

If this fails: Missing required config keys cause Pydantic validation to fail during config loading, but error happens after preprocessing completes, wasting computation time

.runpod/src/handler.py:inputs.get('args', {})
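A fail-fast guard for this case might check required keys before kicking off preprocessing, so a missing field surfaces immediately instead of after the preprocessing pass. The key list below is illustrative, not the authoritative schema.

```python
# Hypothetical pre-flight check for the RunPod job input (illustrative keys).
REQUIRED = ("base_model", "datasets", "learning_rate")

def validate_args(args):
    missing = [k for k in REQUIRED if k not in args]
    if missing:
        raise ValueError(f"missing required config keys: {missing}")
    return args

try:
    validate_args({"base_model": "x"})
except ValueError as e:
    print(e)  # missing required config keys: ['datasets', 'learning_rate']
```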
warning Environment unguarded

GPU with specified gpu_id exists and is not already occupied by another process

If this fails: If GPU is busy or doesn't exist, CUDA operations fail with device errors, but process may hang instead of failing fast

.runpod/src/train.py:CUDA_VISIBLE_DEVICES
warning Ordering weakly guarded

Preprocessing must complete successfully before training can begin, and no concurrent access to dataset cache occurs

If this fails: If preprocessing partially fails but returns success code, training proceeds with incomplete tokenized data leading to silent training degradation

.runpod/src/train.py:preprocess then train sequence
warning Resource unguarded

BASE_VOLUME has unlimited subdirectory creation permissions and no filesystem limits on directory depth

If this fails: If filesystem limits directory creation or run_id contains path traversal characters, output_dir creation silently fails causing checkpoint loss

.runpod/src/handler.py:output_dir creation
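A sketch of guarding output_dir creation against the path-traversal case described above: accept only a conservative character set for run_id, then confirm the resolved path stays inside the base volume. Function name and character set are illustrative.

```python
import os
import re

def safe_output_dir(base_volume, run_id):
    """Reject run_ids that contain separators or escape the base volume."""
    if not re.fullmatch(r"[A-Za-z0-9._-]+", run_id):
        raise ValueError(f"invalid run_id: {run_id!r}")
    out = os.path.realpath(os.path.join(base_volume, run_id))
    # A run_id like ".." passes the regex but fails this containment check.
    if not out.startswith(os.path.realpath(base_volume) + os.sep):
        raise ValueError("run_id escapes the base volume")
    return out
```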
warning Domain unguarded

All values in args dict are YAML-serializable and contain no sensitive data that should be redacted from logs

If this fails: If args contains non-serializable objects or API keys, yaml.dump fails with cryptic errors or exposes secrets in config files

.runpod/src/handler.py:yaml.dump(args)
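Redacting secrets before serialization would close the exposure half of this risk. The handler uses yaml.dump; json stands in below so the sketch has no third-party dependency, and the sensitive-key patterns are illustrative.

```python
import json

# Illustrative list of substrings treated as sensitive in config keys.
SENSITIVE = ("token", "key", "secret", "password")

def redact(args):
    """Return a copy of args with likely-secret values masked."""
    return {
        k: "***REDACTED***" if any(s in k.lower() for s in SENSITIVE) else v
        for k, v in args.items()
    }

safe = redact({"base_model": "m", "hf_token": "hf_abc123"})
print(json.dumps(safe))  # {"base_model": "m", "hf_token": "***REDACTED***"}
```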
info Temporal unguarded

Environment variables loaded from .env file don't conflict with system environment and are applied before any config processing

If this fails: If .env overrides critical system variables or loads after config validation, training may use wrong model paths or authentication fails

src/axolotl/cli/main.py:load_dotenv
info Contract weakly guarded

Config file path is accessible at startup time and remains readable throughout the training process

If this fails: If config file is on network filesystem that becomes unavailable, training cannot resume from checkpoints as config is re-read on restart

src/axolotl/cli/main.py:click.Path(exists=True)

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Model checkpoint storage (checkpoint)
Periodic saves of model weights, optimizer state, and LoRA adapters during training for resumption and deployment
Dataset cache (cache)
Preprocessed and tokenized datasets cached to avoid reprocessing on subsequent runs
Model hub cache (cache)
Downloaded model weights and tokenizer files cached locally to avoid re-downloading
vLLM adapter registry (registry)
In-memory mapping of LoRA adapter names to loaded adapter weights for multi-adapter serving
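The adapter-registry pool can be sketched as a lazily populated dict: adapter names map to loaded weights, and each adapter is read from disk only once. `load_fn` is a stand-in for whatever actually deserializes adapter weights, not the real vLLM API.

```python
# Minimal in-memory adapter registry with lazy, load-once semantics.

class AdapterRegistry:
    def __init__(self, load_fn):
        self._load = load_fn
        self._adapters = {}

    def get(self, name, path):
        if name not in self._adapters:      # load once, then reuse
            self._adapters[name] = self._load(path)
        return self._adapters[name]

loads = []
reg = AdapterRegistry(lambda p: loads.append(p) or {"path": p})
reg.get("math-lora", "/adapters/math")
reg.get("math-lora", "/adapters/math")
print(len(loads))  # 1  (second lookup hit the cache)
```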

Feedback Loops

Delays

Control Points

Technology Stack

PyTorch (framework)
Core tensor computation and automatic differentiation for model training
HuggingFace Transformers (library)
Pre-trained model loading and transformer architecture implementations
HuggingFace PEFT (library)
Parameter-efficient fine-tuning methods like LoRA and AdaLoRA
vLLM (runtime)
High-performance inference engine with LoRA adapter support for serving
Triton (compute)
Custom CUDA kernel development for optimized operations like scatter-MoE
Pydantic (library)
Configuration validation and parsing with type checking
Click (framework)
Command-line interface framework for the axolotl CLI tool
WandB (infra)
Experiment tracking and metrics logging during training
DeepSpeed (framework)
Distributed training and memory optimization for large models

Key Components



Frequently Asked Questions

What is axolotl used for?

axolotl-ai-cloud/axolotl fine-tunes large language models using optimized training loops and LoRA adapters. It is a 10-component ML training framework written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 655 files.

How is axolotl architected?

axolotl is organized into 6 architecture layers: CLI Interface, Training Orchestration, Model Adapters, Dataset Processing, and 2 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through axolotl?

Data moves through 6 stages: Load configuration → Initialize model and tokenizer → Preprocess datasets → Execute training loop → Save checkpoints → .... Training starts by loading YAML configuration files that specify model architecture, datasets, and training parameters. Raw datasets are preprocessed using prompt strategies to convert conversations into chat message format, then tokenized into input_ids and labels tensors. The training loop feeds these batches to the model (optionally with LoRA adapters), computes loss on the output logits, and backpropagates gradients. Checkpoints are saved periodically, and the final model can be served via vLLM with LoRA adapter support. This pipeline design reflects a complex multi-stage processing system.

What technologies does axolotl use?

The core stack includes PyTorch (Core tensor computation and automatic differentiation for model training), HuggingFace Transformers (Pre-trained model loading and transformer architecture implementations), HuggingFace PEFT (Parameter-efficient fine-tuning methods like LoRA and AdaLoRA), vLLM (High-performance inference engine with LoRA adapter support for serving), Triton (Custom CUDA kernel development for optimized operations like scatter-MoE), Pydantic (Configuration validation and parsing with type checking), and 3 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does axolotl have?

axolotl exhibits 4 data pools (Model checkpoint storage, Dataset cache), 4 feedback loops, 6 control points, and 4 delays. The feedback loops handle the training loop and gradient accumulation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does axolotl use?

5 design patterns detected: Monkey Patching, Strategy Pattern, Factory Pattern, Adapter Pattern, Template Method.

Analyzed on April 20, 2026 by CodeSea.