hiyouga/llamafactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Unified framework for fine-tuning 100+ large language models
Data flows from raw datasets through formatting and tokenization, then through model training or inference, with support for multi-modal inputs and various output formats.
Under the hood, the system uses three feedback loops, three data pools, and four control points to manage its runtime behavior.
Structural Verdict
A 10-component ML training system with 11 connections. 278 files analyzed. Well-connected, with clear data flow between components.
How Data Flows Through the System
- Dataset Loading — Load datasets from HuggingFace datasets or local files (config: dataset, dataset_dir)
- Data Formatting — Apply conversation templates and format messages using Template and FormatterPlugin (config: template, system_message)
- Multi-modal Processing — Process images, audio, and video inputs through MMPlugin processors (config: mm_plugin, image_processor)
- Tokenization — Convert formatted text to token IDs with proper attention masks and labels (config: cutoff_len, tokenizer_class)
- Training/Inference — Pass through model for fine-tuning with CustomSeq2SeqTrainer or inference with ChatModel (config: learning_rate, num_train_epochs, lora_rank +1)
- Output Generation — Generate responses through API endpoints or save trained model checkpoints (config: output_dir, max_new_tokens, temperature)
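The six stages above can be sketched end to end in a few lines. This is an illustrative toy, under the assumption of a whitespace tokenizer and invented helper names; it is not LlamaFactory's actual internal API.

```python
# Illustrative sketch of the pipeline stages above; the helper names and the
# toy tokenizer are hypothetical, not LlamaFactory's internals.

def load_examples():
    # Stage 1: Dataset Loading (LlamaFactory wraps datasets.load_dataset here)
    return [{"system": "You are helpful.", "user": "Hi", "assistant": "Hello!"}]

def apply_template(ex):
    # Stage 2: Data Formatting — wrap each role in template markers
    return (f"<|system|>{ex['system']}"
            f"<|user|>{ex['user']}"
            f"<|assistant|>{ex['assistant']}")

def tokenize(text, vocab, cutoff_len=32):
    # Stage 4: Tokenization — toy whitespace tokenizer with a growing vocab,
    # truncated at cutoff_len, plus an all-ones attention mask
    ids = [vocab.setdefault(tok, len(vocab)) for tok in text.split()][:cutoff_len]
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}

vocab = {}
batch = [tokenize(apply_template(ex), vocab) for ex in load_examples()]
```

The cutoff_len parameter mirrors the config key of the same name in the Tokenization stage.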
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Model Checkpoints — Trained model weights, LoRA adapters, and configuration files accumulate during training
- Dataset Cache — Preprocessed datasets and tokenized examples cached for repeated use
- Media Files — Images, audio, and video files referenced by dataset entries
Feedback Loops
- Training Loop (training-loop, reinforcing) — Trigger: CustomSeq2SeqTrainer.train(). Action: Forward pass, loss calculation, backprop, optimizer step. Exit: num_train_epochs reached or early stopping.
- Memory Cleanup (polling, balancing) — Trigger: lifespan context manager. Action: torch_gc() every 300 seconds. Exit: FastAPI app shutdown.
- Gradient Accumulation (convergence, reinforcing) — Trigger: Batch processing. Action: Accumulate gradients over micro-batches. Exit: gradient_accumulation_steps reached.
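The Gradient Accumulation loop above can be sketched with toy numbers. This is not LlamaFactory's trainer code; it just shows the shape of the loop: gradients accumulate over micro-batches, and the optimizer step fires once every gradient_accumulation_steps.

```python
# Sketch of the gradient-accumulation loop: the "optimizer step" fires once
# every gradient_accumulation_steps micro-batches, then gradients are zeroed.
gradient_accumulation_steps = 4
micro_batch_losses = [0.9, 0.8, 0.85, 0.7, 0.6, 0.65, 0.5, 0.55]  # toy values

accumulated = 0.0
optimizer_steps = 0
for i, loss in enumerate(micro_batch_losses, start=1):
    accumulated += loss / gradient_accumulation_steps  # scale like averaged grads
    if i % gradient_accumulation_steps == 0:           # exit condition above
        optimizer_steps += 1
        accumulated = 0.0                              # zero grads after each step
```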
Delays & Async Processing
- Model Loading (async-processing, ~varies by model size) — Initial startup delay for chat interface and API
- Memory Cleanup Interval (scheduled-job, ~300 seconds) — Regular GPU memory garbage collection during API serving
- Dataset Processing (batch-window, ~varies by dataset size) — Preprocessing delay before training begins
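The 300-second Memory Cleanup interval above reduces to an elapsed-time check. A minimal sketch with simulated clock readings (so it runs instantly); this is not the actual torch_gc() polling loop.

```python
# Sketch of the 300-second cleanup interval: run cleanup whenever at least
# CLEANUP_INTERVAL seconds have elapsed since the last run. Timestamps are
# simulated here instead of sleeping.
CLEANUP_INTERVAL = 300  # seconds, as described above

def due_for_cleanup(last_run, now):
    return now - last_run >= CLEANUP_INTERVAL

events = []
last_run = 0
for now in (100, 250, 320, 500, 640):  # fake clock readings
    if due_for_cleanup(last_run, now):
        events.append(now)             # stand-in for calling torch_gc()
        last_run = now
```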
Control Points
- Learning Rate (threshold) — Controls: Training step size and convergence behavior. Default: configurable
- Flash Attention (feature-flag) — Controls: Memory-efficient attention implementation. Default: auto
- API Key (env-var) — Controls: API access authentication. Default: os.getenv('API_KEY')
- Max New Tokens (threshold) — Controls: Maximum response length during inference. Default: configurable
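The API Key control point reads os.getenv('API_KEY') and compares it against the caller's token. A minimal sketch of that gate follows; the is_authorized helper is hypothetical, not the actual FastAPI dependency in src/llamafactory/api.

```python
# Sketch of the API-key control point: when API_KEY is unset, authentication
# is effectively disabled; otherwise the request token must match exactly.
import os

def is_authorized(request_token, env=os.environ):
    api_key = env.get("API_KEY")
    if api_key is None:      # no key configured: allow all requests
        return True
    return request_token == api_key
```

Passing env as a parameter makes the check easy to test without mutating the real environment.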
Technology Stack
- PyTorch — Core deep learning framework
- Transformers — Model implementations and tokenizers
- PEFT — Parameter-efficient fine-tuning (LoRA, etc.)
- TRL — Transformer reinforcement learning
- Accelerate — Distributed training and mixed precision
- FastAPI — REST API server
- Gradio — Web UI interface
- Datasets — Dataset loading and processing
- DeepSpeed — Training optimization and FLOPS profiling
- pytest — Unit and integration testing
Key Components
- ChatModel (class) — Main interface for model inference and chat interactions (src/llamafactory/chat/base_engine.py)
- create_app (function) — Creates FastAPI application with chat completion endpoints (src/llamafactory/api/app.py)
- CustomSeq2SeqTrainer (class) — Custom trainer for supervised fine-tuning with LoRA and other techniques (src/llamafactory/train/sft/trainer.py)
- ChatCompletionRequest (type-def) — Pydantic model defining OpenAI-compatible chat completion request format (src/llamafactory/api/protocol.py)
- FormatterPlugin (class) — Handles conversation formatting and template application for different models (src/llamafactory/data/formatter.py)
- MMPlugin (class) — Processes multi-modal inputs including images, videos, and audio for vision-language models (src/llamafactory/data/mm_plugin.py)
- Template (class) — Defines conversation templates with system prompts and formatting rules for different models (src/llamafactory/data/template.py)
- get_train_args (function) — Parses and validates training arguments from config files and command line (src/llamafactory/hparams/__init__.py)
- DummyDataset (class) — Synthetic dataset for performance benchmarking with configurable sequence lengths (scripts/bench_qwen.py)
- convert_mca_to_hf (function) — Converts Megatron-Core checkpoints to HuggingFace format for model interoperability (scripts/megatron_merge.py)
Configuration
src/llamafactory/api/protocol.py (python-pydantic):
- id (str), created (int) — default: Field(default_factory=lambda: int(time.time()))
- data (list[ModelCard]) — default: []
- name (str), arguments (str)
- name (str), description (str), parameters (dict[str, Any])
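The Field(default_factory=...) default above computes a fresh timestamp per instance rather than once at import time. A stdlib sketch of the same pattern using dataclasses (pydantic's Field behaves analogously); the class body here is illustrative, not copied from protocol.py.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # Mirrors the fields shown above: a string id plus a created timestamp
    # computed per instance via default_factory, not frozen at class definition.
    id: str = "llamafactory-model"
    created: int = field(default_factory=lambda: int(time.time()))

card = ModelCard(id="qwen")
```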
Science Pipeline
- Load Raw Dataset — datasets.load_dataset with various formats (json, parquet, arrow) [variable records → structured dataset] (src/llamafactory/data/loader.py)
- Apply Conversation Template — Template.format_example with system/user/assistant roles [raw conversations → formatted messages] (src/llamafactory/data/template.py)
- Process Multi-modal Content — MMPlugin processors handle images/audio/video with tokenization [mixed text+media → tokenized sequences] (src/llamafactory/data/mm_plugin.py)
- Tokenize and Encode — transformers tokenizer with attention masks and labels [formatted text → (batch_size, seq_len)] (src/llamafactory/data/processor/supervised.py)
- Model Forward Pass — transformer forward with loss computation for training or generation [(batch_size, seq_len, hidden_size) → logits or generated tokens] (src/llamafactory/train/sft/trainer.py)
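The (batch_size, seq_len) shapes in the pipeline above come from padding. A minimal collate sketch (a hypothetical helper, not LlamaFactory's collator) that pads a batch to uniform length and masks pad positions in the labels with -100, the ignore index HuggingFace trainers use:

```python
# Pad a batch of token-id sequences to (batch_size, seq_len); pad positions
# get -100 in the labels so they are ignored by the cross-entropy loss.
def collate(sequences, pad_id=0):
    seq_len = max(len(s) for s in sequences)
    input_ids, labels = [], []
    for s in sequences:
        n_pad = seq_len - len(s)
        input_ids.append(s + [pad_id] * n_pad)
        labels.append(s + [-100] * n_pad)  # ignore pads in the loss
    return input_ids, labels

ids, labels = collate([[5, 6, 7], [8, 9]])
```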
Assumptions & Constraints
- [warning] Assumes image tensors have consistent dimensions after processing, but performs no explicit validation (shape)
- [info] Hardcodes vocab_size=32768 and image_token_num calculation without model validation (value-range)
- [info] Assumes CUDA availability for distributed training but has CPU fallbacks (device)
Frequently Asked Questions
What is LlamaFactory used for?
hiyouga/llamafactory is a unified framework for fine-tuning 100+ large language models: a 10-component ML training system written in Python. It is well-connected, with clear data flow between components. The codebase contains 278 files.
How is LlamaFactory architected?
LlamaFactory is organized into 5 architecture layers: API Layer, Chat Interface, Training System, Data Processing, and 1 more. Well-connected — clear data flow between components. This layered structure enables tight integration between components.
How does data flow through LlamaFactory?
Data moves through 6 stages: Dataset Loading → Data Formatting → Multi-modal Processing → Tokenization → Training/Inference → .... Data flows from raw datasets through formatting and tokenization, then through model training or inference, with support for multi-modal inputs and various output formats. This pipeline design reflects a complex multi-stage processing system.
What technologies does LlamaFactory use?
The core stack includes PyTorch (Core deep learning framework), Transformers (Model implementations and tokenizers), PEFT (Parameter-efficient fine-tuning (LoRA, etc.)), TRL (Transformer reinforcement learning), Accelerate (Distributed training and mixed precision), FastAPI (REST API server), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does LlamaFactory have?
LlamaFactory exhibits 3 data pools (including Model Checkpoints and Dataset Cache), 3 feedback loops, 4 control points, and 3 delays. The feedback loops include the training loop and a polling-based memory cleanup. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does LlamaFactory use?
4 design patterns detected: Plugin Architecture, Template System, Adapter Pattern, Configuration Dataclasses.
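The Plugin Architecture pattern listed above can be sketched as a name-to-class registry: new processors (for example, a multi-modal plugin) are added by registration rather than by editing dispatch code. The names here are illustrative, not LlamaFactory's actual registry.

```python
# Minimal sketch of a plugin registry: a decorator maps plugin names to
# classes so dispatch code never needs to know about concrete plugins.
PLUGINS = {}

def register_plugin(name):
    def wrapper(cls):
        PLUGINS[name] = cls
        return cls
    return wrapper

@register_plugin("image")
class ImagePlugin:
    def process(self, item):
        return f"processed image: {item}"

plugin = PLUGINS["image"]()
```

Template System and Adapter Pattern usages in the codebase follow the same spirit: behavior is selected by name or configuration rather than hardcoded.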
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.