axolotl-ai-cloud/axolotl
Open-source LLM fine-tuning framework with configuration-driven training pipeline
Configuration files drive dataset preprocessing, model initialization, training execution, and optional model serving through a linear pipeline with extensive customization points.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 5 control points to manage its runtime behavior.
Structural Verdict
A 10-component ML training system with 4 connections. 639 files analyzed. Loosely coupled — components are relatively independent.
How Data Flows Through the System
- Config Loading — YAML config parsed and validated against AxolotlInputConfig schema
- Dataset Preprocessing — Raw datasets transformed using configurable prompt strategies (config: datasets, dataset_prepared_path, prompt_strategies)
- Model Initialization — Base model loaded with specified architecture and optimization settings (config: base_model, model_type, load_in_8bit +1)
- Training Execution — Model fine-tuned using specified training parameters and techniques (config: learning_rate, num_epochs, batch_size +2)
- Model Output — Trained model saved to configured output directory (config: output_dir, save_strategy)
- Optional Serving — Model deployed for inference using vLLM or other serving backends (config: inference_engine)
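The whole pipeline above is driven by a single YAML file. A minimal illustrative config might look like the following — the keys mirror the options named in each stage, but exact field names and values should be checked against the AxolotlInputConfig schema, and the model and dataset paths here are placeholders:

```yaml
# Hypothetical minimal fine-tuning config; keys follow the pipeline
# stages above (model init, datasets, training, output).
base_model: meta-llama/Llama-2-7b-hf   # placeholder model
model_type: LlamaForCausalLM
load_in_8bit: true

datasets:
  - path: tatsu-lab/alpaca             # placeholder dataset
    type: alpaca
dataset_prepared_path: ./prepared

learning_rate: 0.0002
num_epochs: 3
batch_size: 2

output_dir: ./outputs
save_strategy: epoch
```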
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Downloaded pretrained models and tokenizers
- Preprocessed training datasets
- Checkpoints, final models, and training artifacts
- Persistent storage for cloud training jobs
Feedback Loops
- Training Loop (training-loop, balancing) — Trigger: Training start command. Action: Forward pass, loss calculation, backward pass, optimizer step. Exit: Max epochs or convergence reached.
- Checkpoint Saving (polling, balancing) — Trigger: Save strategy interval. Action: Save model state to disk. Exit: Training completion.
- Learning Rate Schedule (convergence, balancing) — Trigger: Each training step. Action: Adjust learning rate based on schedule. Exit: Training completion.
Delays & Async Processing
- Dataset Preprocessing (batch-window, ~varies by dataset size) — Training waits for dataset preparation
- Model Loading (async-processing, ~varies by model size) — Training startup delay
- Checkpoint Saving (async-processing, ~varies by model size) — Brief training pause during saves
- Cloud Provisioning (queue-drain, ~varies by cloud provider) — Job waits in queue before execution
Control Points
- Learning Rate (threshold) — Controls: Training convergence speed and stability. Default: configurable
- Batch Size (threshold) — Controls: Memory usage and gradient noise. Default: configurable
- LoRA Rank (threshold) — Controls: Adapter capacity and training efficiency. Default: configurable
- Save Strategy (env-var) — Controls: Checkpoint frequency and storage. Default: configurable
- GPU Memory Management (env-var) — Controls: CUDA memory allocation strategy. Default: configurable
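The LoRA rank control point trades adapter capacity against memory and compute. A quick worked example of why it is such an effective knob — for a weight matrix of shape `(d_out, d_in)`, a rank-`r` adapter adds factors `A (r × d_in)` and `B (d_out × r)`, so `r * (d_in + d_out)` trainable parameters per adapted matrix. The dimensions below are illustrative, not tied to any specific model:

```python
# Trainable parameters a rank-r LoRA adapter adds to one weight matrix:
# A has shape (r, d_in), B has shape (d_out, r) -> r * (d_in + d_out).

def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

full = 4096 * 4096                       # parameters in the frozen base matrix
adapter = lora_params(4096, 4096, r=16)  # 16 * 8192
print(adapter)                           # → 131072
print(adapter / full)                    # → 0.0078125 (under 1% of the matrix)
```

Doubling the rank doubles the adapter size and its capacity, which is why it sits alongside learning rate and batch size as a primary tuning threshold.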
Technology Stack
- PyTorch — Deep learning framework
- Transformers — Hugging Face model library
- Pydantic — Configuration validation
- Click — CLI framework
- Triton — GPU kernel optimization
- Modal — Cloud compute orchestration
- RunPod — GPU cloud platform
- Testing framework
- Experiment tracking
- vLLM — Inference serving
Key Components
- cli.main (cli-command) — Provides CLI commands for train, preprocess, quantize, shard, and serve operations (src/axolotl/cli/main.py)
- AxolotlInputConfig (type-def) — Pydantic schema defining all valid configuration parameters for training (src/axolotl/utils/schemas/config.py)
- handler (handler) — Runpod serverless entry point that processes training jobs from cloud requests (.runpod/src/handler.py)
- train (function) — Async training orchestrator that runs preprocessing and training commands (.runpod/src/train.py)
- AxolotlTrainer (class) — Base trainer class extending HuggingFace Trainer with Axolotl-specific features (src/axolotl/core/trainers/base.py)
- entropy_from_logits (function) — Triton-optimized entropy calculation for training metrics (axolotl/monkeypatch/trainer/utils.py)
- ScatterMoELoRA (class) — LoRA implementation for Mixture of Experts models with optimized kernels (axolotl/integrations/kernels/libs/scattermoe_lora/parallel_linear_lora.py)
- swanlab_profile (middleware) — Decorator for adding performance profiling to training methods (axolotl/integrations/swanlab/profiling.py)
- QuartoGenerator (utility) — Generates documentation from Pydantic model schemas automatically (docs/scripts/generate_config_docs.py)
- transform (function) — Dataset transformation function for EBFT training with code instruction data (examples/ebft/ebft_opencode.py)
Sub-Modules
- Cloud-based serverless training deployment system
- Automated testing and deployment infrastructure using Modal
- Automatic documentation generation from Pydantic schemas
- Kernel and optimization benchmarking suite
Configuration
_quarto.yml (yaml)
- project.type (string) — default: website
- project.pre-render (array) — default: docs/scripts/generate_config_docs.py, docs/scripts/generate_examples_docs.py
- quartodoc.dir (string) — default: docs/api
- quartodoc.package (string) — default: axolotl
- quartodoc.title (string) — default: API Reference
- quartodoc.parser (string) — default: google
- quartodoc.sections (array) — default: 16 section entries
- website.title (string) — default: Axolotl
- +22 more parameters
codecov.yml (yaml)
- codecov.require_ci_to_pass (string) — default: yes
- codecov.notify.wait_for_ci (boolean) — default: true
- coverage.precision (number) — default: 2
- coverage.round (string) — default: down
- coverage.range (string) — default: 70...100
- coverage.status.project.default.target (string) — default: auto
- coverage.status.project.default.threshold (string) — default: 1%
- coverage.status.project.default.base (string) — default: auto
- +29 more parameters
docker-compose.yaml (yaml)
- services.axolotl.build.context (string) — default: .
- services.axolotl.build.dockerfile (string) — default: ./docker/Dockerfile
- services.axolotl.volumes (array) — default: .:/workspace/axolotl, ~/.cache/huggingface/:/root/.cache/huggingface/
- services.axolotl.environment (array) — default: GIT_AUTHOR_NAME=${GIT_AUTHOR_NAME}, GIT_AUTHOR_EMAIL=${GIT_AUTHOR_EMAIL}, GIT_COMMITTER_NAME=${GIT_COMMITTER_NAME}, GIT_COMMITTER_EMAIL=${GIT_COMMITTER_EMAIL}, WANDB_API_KEY=${WANDB_API_KEY}
- services.axolotl.deploy.resources.reservations.devices (array) — default: 1 device reservation entry
- services.axolotl.command (string) — default: tail -f /dev/null
src/axolotl/scripts/vllm_serve_lora.py (python-pydantic)
- n (int) — default: 1
- repetition_penalty (float) — default: 1.0
- temperature (float) — default: 1.0
- top_p (float) — default: 1.0
- top_k (int) — default: -1
- min_p (float) — default: 0.0
- max_tokens (int) — default: 16
- generation_kwargs (dict) — default: Field(default_factory=dict)
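These defaults can be reproduced as a small parameter class. The sketch below mirrors the listed fields using a stdlib dataclass rather than Pydantic, so it is not the actual class in vllm_serve_lora.py, just an illustration of the defaults:

```python
from dataclasses import dataclass, field

@dataclass
class SamplingParams:
    # Defaults mirror the fields listed above for the vLLM LoRA serving
    # script; shown as a dependency-free dataclass sketch.
    n: int = 1
    repetition_penalty: float = 1.0
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = -1          # -1 conventionally disables top-k filtering
    min_p: float = 0.0
    max_tokens: int = 16
    generation_kwargs: dict = field(default_factory=dict)

p = SamplingParams()
print(p.top_k, p.max_tokens)  # → -1 16
```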
Science Pipeline
- Load Dataset — load_dataset then apply transform functions [variable (depends on dataset) → tokenized sequences of shape (batch_size, sequence_len)] (src/axolotl/utils/data.py)
- Tokenize & Pad — tokenizer encode with padding to max_length [raw text strings → (batch_size, sequence_len) with attention masks] (examples/ebft/ebft_pretrain.py)
- Model Forward Pass — transformer forward through attention layers and MLP/MoE [(batch_size, sequence_len) → (batch_size, sequence_len, vocab_size) logits] (src/axolotl/core/trainers/base.py)
- Loss Calculation — cross entropy loss with label smoothing and entropy regularization [(batch_size, sequence_len, vocab_size) logits + labels → scalar loss value] (axolotl/monkeypatch/trainer/utils.py)
- Gradient Update — backward pass through LoRA adapters or full model [loss scalar → updated model parameters] (src/axolotl/core/trainers/base.py)
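The loss stage above reduces `(batch_size, sequence_len, vocab_size)` logits plus integer labels to a scalar. A minimal NumPy sketch of plain cross entropy (without the label smoothing or entropy regularization the pipeline mentions) shows the shape flow:

```python
import numpy as np

def cross_entropy(logits, labels):
    """logits: (batch, seq_len, vocab); labels: (batch, seq_len) int ids.
    Flattens logits to (-1, vocab), then averages -log p over positions."""
    vocab = logits.shape[-1]
    flat = logits.reshape(-1, vocab)
    ids = labels.reshape(-1)
    # numerically stable log-softmax
    shifted = flat - flat.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(ids.size), ids].mean()

logits = np.zeros((2, 3, 5))            # uniform logits over a 5-token vocab
labels = np.zeros((2, 3), dtype=int)
print(round(float(cross_entropy(logits, labels)), 4))  # → 1.6094 (= ln 5)
```

With uniform logits every token gets probability 1/5, so the loss is ln(5) — a handy sanity check that shapes and the log-softmax are wired correctly.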
Assumptions & Constraints
- [warning] Assumes logits tensor can be reshaped to (-1, num_classes) but no explicit shape validation (shape)
- [warning] Assumes sequence_len parameter fits within tokenizer limits without validation (value-range)
- [critical] Hardcoded DEVICE = 'cuda' assumes GPU availability without fallback (device)
- [info] Assumes specific tensor dtypes for MoE routing without explicit casting (dtype)
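The critical item above (hardcoded `DEVICE = 'cuda'` with no fallback) is straightforward to guard against. A sketch of a defensive device selection, degrading to CPU when CUDA, or PyTorch itself, is unavailable (this is a suggested pattern, not code from the repo):

```python
# Guard against the hardcoded-'cuda' assumption flagged above: fall back
# to CPU when CUDA (or even PyTorch) is not available.
def pick_device():
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass  # torch not installed; CPU-only fallback
    return "cpu"

DEVICE = pick_device()
print(DEVICE)  # "cuda" on a GPU machine, otherwise "cpu"
```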
Frequently Asked Questions
What is axolotl used for?
axolotl-ai-cloud/axolotl is an open-source LLM fine-tuning framework with a configuration-driven training pipeline — a 10-component ML training system written in Python. Loosely coupled — components are relatively independent. The codebase contains 639 files.
How is axolotl architected?
axolotl is organized into 5 architecture layers: CLI Interface, Core Framework, Configuration Layer, Integrations, and 1 more. Loosely coupled — components are relatively independent. This layered structure keeps concerns separated and modules independent.
How does data flow through axolotl?
Data moves through 6 stages: Config Loading → Dataset Preprocessing → Model Initialization → Training Execution → Model Output → .... Configuration files drive dataset preprocessing, model initialization, training execution, and optional model serving through a linear pipeline with extensive customization points. This pipeline design reflects a complex multi-stage processing system.
What technologies does axolotl use?
The core stack includes PyTorch (Deep learning framework), Transformers (Hugging Face model library), Pydantic (Configuration validation), Click (CLI framework), Triton (GPU kernel optimization), Modal (Cloud compute orchestration), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does axolotl have?
axolotl exhibits 4 data pools (including the HuggingFace model cache and the dataset cache), 3 feedback loops, 5 control points, and 4 delays. The feedback loops include the core training loop and interval-based checkpoint saving. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does axolotl use?
5 design patterns detected: Configuration-Driven Architecture, Plugin System, Monkeypatch Optimizations, Cloud-First Design, Triton Kernel Optimization.
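The "Plugin System" pattern can be sketched as a minimal registry: integrations register themselves by name and are looked up from configuration. All names here are hypothetical illustrations, not Axolotl's actual plugin API:

```python
# Minimal plugin-registry sketch of the "Plugin System" pattern:
# integrations self-register under a name, and the framework looks
# them up by that name at runtime.
PLUGINS = {}

def register(name):
    """Class decorator that records a plugin under the given name."""
    def wrap(cls):
        PLUGINS[name] = cls
        return cls
    return wrap

@register("swanlab")
class SwanLabPlugin:            # hypothetical example integration
    def on_train_start(self):
        return "profiling enabled"

plugin = PLUGINS["swanlab"]()   # lookup driven by a config value
print(plugin.on_train_start())  # → profiling enabled
```

The same decorator-driven registration also underpins configuration-driven architectures: the YAML config only has to name a plugin for the framework to instantiate it.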
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.