hiyouga/llamafactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

69,311 stars · Python · 10 components · 11 connections

Unified framework for fine-tuning 100+ large language models

Data flows from raw datasets through formatting and tokenization, then through model training or inference, with support for multi-modal inputs and various output formats.

Under the hood, the system relies on 3 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

Structural Verdict

A 10-component ML training system with 11 connections. 278 files analyzed. Well-connected, with clear data flow between components.

How Data Flows Through the System

  1. Dataset Loading — Load datasets from HuggingFace datasets or local files (config: dataset, dataset_dir)
  2. Data Formatting — Apply conversation templates and format messages using Template and FormatterPlugin (config: template, system_message)
  3. Multi-modal Processing — Process images, audio, and video inputs through MMPlugin processors (config: mm_plugin, image_processor)
  4. Tokenization — Convert formatted text to token IDs with proper attention masks and labels (config: cutoff_len, tokenizer_class)
  5. Training/Inference — Pass through model for fine-tuning with CustomSeq2SeqTrainer or inference with ChatModel (config: learning_rate, num_train_epochs, lora_rank +1)
  6. Output Generation — Generate responses through API endpoints or save trained model checkpoints (config: output_dir, max_new_tokens, temperature)
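
As a rough sketch of stage 2 above, the fragment below mimics what a conversation template does: it prepends role markers to each message and joins them into a single prompt string. The `ChatTemplate` class and its special tokens are illustrative stand-ins, not LlamaFactory's actual `Template` API.

```python
from dataclasses import dataclass

@dataclass
class ChatTemplate:
    # Role markers are toy values; real templates are model-specific.
    system_prefix: str = "<|system|>\n"
    user_prefix: str = "<|user|>\n"
    assistant_prefix: str = "<|assistant|>\n"
    eos: str = "</s>"

    def format_example(self, system: str, messages: list[dict]) -> str:
        """Render a system message plus user/assistant turns as one prompt."""
        parts = [self.system_prefix + system + self.eos]
        for msg in messages:
            prefix = self.user_prefix if msg["role"] == "user" else self.assistant_prefix
            parts.append(prefix + msg["content"] + self.eos)
        return "".join(parts)

template = ChatTemplate()
prompt = template.format_example(
    "You are a helpful assistant.",
    [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}],
)
```

The real pipeline then hands a string like this to stage 4 (tokenization) before it ever reaches the model.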

System Behavior

How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Model Checkpoints (file-store)
Trained model weights, LoRA adapters, and configuration files accumulate during training
Dataset Cache (cache)
Preprocessed datasets and tokenized examples cached for repeated use
Multi-modal Assets (file-store)
Images, audio, video files referenced by dataset entries


Technology Stack

PyTorch (framework)
Core deep learning framework
Transformers (library)
Model implementations and tokenizers
PEFT (library)
Parameter-efficient fine-tuning (LoRA, etc.)
TRL (library)
Transformer reinforcement learning
Accelerate (library)
Distributed training and mixed precision
FastAPI (framework)
REST API server
Gradio (framework)
Web UI interface
Datasets (library)
Dataset loading and processing
DeepSpeed (library)
Training optimization and FLOPS profiling
pytest (testing)
Unit and integration testing
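
The PEFT entry above refers to techniques like LoRA: instead of updating a full d×d weight matrix, training learns two low-rank factors B (d×r) and A (r×d) and adds a scaled B·A to the frozen weight. The pure-Python toy below shows only the arithmetic; real code uses PEFT's `LoraConfig` and tensor operations, and the numbers here are made up.

```python
def matmul(X, Y):
    """Naive matrix multiply over nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_delta(B, A, lora_alpha: float, r: int):
    """Compute the LoRA weight update: (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r                 # standard LoRA scaling factor
    return [[scale * v for v in row] for row in matmul(B, A)]

# Rank-1 update to a 2x2 weight: B is 2x1, A is 1x2.
delta = lora_delta([[1.0], [2.0]], [[3.0, 4.0]], lora_alpha=2.0, r=1)
# delta == [[6.0, 8.0], [12.0, 16.0]]
```

Since only B and A are trained, a rank-r adapter stores 2·d·r parameters instead of d², which is why the `lora_rank` config knob appears in the training stage above.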

Key Components

Configuration

src/llamafactory/api/protocol.py (python-pydantic)

Science Pipeline

  1. Load Raw Dataset — datasets.load_dataset with various formats (json, parquet, arrow) [variable records → structured dataset] src/llamafactory/data/loader.py
  2. Apply Conversation Template — Template.format_example with system/user/assistant roles [raw conversations → formatted messages] src/llamafactory/data/template.py
  3. Process Multi-modal Content — MMPlugin processors handle images/audio/video with tokenization [mixed text+media → tokenized sequences] src/llamafactory/data/mm_plugin.py
  4. Tokenize and Encode — transformers.tokenizer with attention masks and labels [formatted text → (batch_size, seq_len)] src/llamafactory/data/processor/supervised.py
  5. Model Forward Pass — transformer forward with loss computation for training or generation [(batch_size, seq_len, hidden_size) → logits or generated tokens] src/llamafactory/train/sft/trainer.py
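
Step 4's label construction for supervised fine-tuning can be sketched as follows: prompt tokens are masked with -100 (the index ignored by the cross-entropy loss) so the model is trained only on response tokens, and the sequence is truncated at `cutoff_len`. Token IDs are toy integers here; the real processor works with a tokenizer's output.

```python
IGNORE_INDEX = -100  # PyTorch's default ignore_index for cross-entropy loss

def build_labels(prompt_ids: list[int], response_ids: list[int], cutoff_len: int):
    """Build input_ids, loss-masked labels, and an attention mask."""
    input_ids = (prompt_ids + response_ids)[:cutoff_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:cutoff_len]
    attention_mask = [1] * len(input_ids)
    return input_ids, labels, attention_mask

ids, labels, mask = build_labels([1, 2, 3], [4, 5], cutoff_len=8)
# labels == [-100, -100, -100, 4, 5]
```

Masking the prompt is what distinguishes supervised fine-tuning from plain language modeling: the loss in step 5 is computed only where labels are non-negative.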

Frequently Asked Questions

What is LlamaFactory used for?

hiyouga/llamafactory is a unified framework for fine-tuning 100+ large language models: a 10-component ML training system written in Python. It is well-connected, with clear data flow between components. The codebase contains 278 files.

How is LlamaFactory architected?

LlamaFactory is organized into 5 architecture layers: API Layer, Chat Interface, Training System, Data Processing, and 1 more. Well-connected — clear data flow between components. This layered structure enables tight integration between components.

How does data flow through LlamaFactory?

Data moves through 6 stages: Dataset Loading → Data Formatting → Multi-modal Processing → Tokenization → Training/Inference → .... Data flows from raw datasets through formatting and tokenization, then through model training or inference, with support for multi-modal inputs and various output formats. This pipeline design reflects a complex multi-stage processing system.

What technologies does LlamaFactory use?

The core stack includes PyTorch (Core deep learning framework), Transformers (Model implementations and tokenizers), PEFT (Parameter-efficient fine-tuning (LoRA, etc.)), TRL (Transformer reinforcement learning), Accelerate (Distributed training and mixed precision), FastAPI (REST API server), and 4 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does LlamaFactory have?

LlamaFactory exhibits 3 data pools (Model Checkpoints, Dataset Cache, Multi-modal Assets), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle the training loop and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does LlamaFactory use?

4 design patterns detected: Plugin Architecture, Template System, Adapter Pattern, Configuration Dataclasses.
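
The Plugin Architecture pattern listed above can be sketched as a registry: plugins register themselves under a name, and callers look them up at runtime without hard-coding which implementations exist. The names (`register_plugin`, `get_plugin`, `ImagePlugin`) are illustrative, not LlamaFactory's actual identifiers.

```python
# Global registry mapping plugin names to their classes.
PLUGINS: dict[str, type] = {}

def register_plugin(name: str):
    """Class decorator that records a plugin class under the given name."""
    def decorator(cls):
        PLUGINS[name] = cls
        return cls
    return decorator

@register_plugin("image")
class ImagePlugin:
    def process(self, item):
        return f"processed image: {item}"

def get_plugin(name: str):
    """Instantiate the plugin registered under this name."""
    return PLUGINS[name]()
```

This is the shape that lets a pipeline dispatch on a config value such as `mm_plugin`: adding support for a new modality means registering one more class, with no changes to the dispatch code.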

Analyzed on March 31, 2026 by CodeSea.