huggingface/peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

20,974 stars · Python · 8 components

Applies parameter-efficient fine-tuning methods to pretrained models using various adapter architectures

Users create a PeftConfig specifying the adapter method and hyperparameters, then call get_peft_model() which wraps their base model with adapter functionality. During training, input tensors flow through the base model with active adapters adding their learned parameters to the forward pass. Adapter weights are saved separately from the base model and can be loaded, merged, or composed with other adapters.

Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.

An 8-component library. 381 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

  1. Configuration creation — User instantiates a method-specific config like LoraConfig(r=16, lora_alpha=32, target_modules=['q_proj', 'v_proj']) specifying which layers to adapt and hyperparameters (config: lora_r, lora_alpha, lora_dropout +1)
  2. Model wrapping — get_peft_model() uses PEFT_TYPE_TO_MODEL_MAPPING to find the correct tuner class, then creates a PeftModel wrapper that replaces target modules with adapter-enabled versions [PeftConfig → PeftModel] (config: peft_type, task_type, target_modules)
  3. Layer replacement — The tuner implementation (e.g., LoraModel) identifies target modules by name patterns, replaces them with adapter layers (e.g., LoRALayer), and initializes adapter parameters [PeftModel → LoRALayer] (config: target_modules, r, lora_alpha +1)
  4. Forward pass adaptation — During training, input tensors flow through base model layers, but adapter layers compute additional outputs like (B @ A) * scaling and add them to the original layer output [TrainingBatch → TrainingBatch] (config: lora_alpha, scaling)
  5. Gradient accumulation — Gradients are computed only for adapter parameters while base model weights remain frozen, enabling parameter-efficient training with much smaller memory footprint [TrainingBatch]
  6. Adapter persistence — save_pretrained() extracts only the adapter weights (not base model) and saves them with the config as a small checkpoint, typically <1GB vs >10GB for full models [PeftModel]
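
These steps correspond to only a few lines of user code. A minimal sketch of the full cycle, assuming a causal language model and the standard LoRA entry points (the model id and hyperparameters are illustrative):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    # Step 1: configuration creation -- choose the adapter method and hyperparameters.
    config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
    )

    # Steps 2-3: model wrapping and layer replacement -- target modules are swapped
    # for adapter-enabled versions while the base weights stay frozen.
    base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
    model = get_peft_model(base_model, config)
    model.print_trainable_parameters()  # only the adapter parameters are trainable

    # Steps 4-5 happen inside an ordinary training loop (e.g. transformers.Trainer).

    # Step 6: adapter persistence -- saves only the adapter weights plus the config.
    model.save_pretrained("opt-350m-lora")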

Data Models

The data structures that flow between stages — the contracts that hold the system together.

PeftConfig src/peft/config.py
Base dataclass with peft_type: PeftType, task_type: TaskType, inference_mode: bool, plus method-specific subclasses like LoraConfig with r: int, lora_alpha: int, target_modules: list[str], lora_dropout: float
Created by user code, passed to get_peft_model(), used by tuner implementations to configure adapter parameters
PeftModel src/peft/peft_model.py
Wrapper class containing base_model: PreTrainedModel, peft_config: dict[str, PeftConfig], active_adapters: list[str], plus state tracking for merged/enabled adapters
Created by get_peft_model(), used throughout training/inference, manages adapter lifecycle and state transitions
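
The lifecycle the wrapper manages (load, compose, merge) is driven through a handful of methods. A minimal sketch, assuming adapter checkpoints already exist at the paths shown (paths and adapter names are illustrative):

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

    # Load a saved adapter on top of the frozen base model.
    model = PeftModel.from_pretrained(base, "opt-350m-lora")

    # Attach a second adapter and switch which one is active.
    model.load_adapter("opt-350m-lora-style", adapter_name="style")
    model.set_adapter("style")

    # Fold the active adapter into the base weights and drop the PEFT wrapper.
    merged_model = model.merge_and_unload()
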
LoRALayer src/peft/tuners/lora/layer.py
Module containing lora_A: nn.ModuleDict[str, nn.Linear], lora_B: nn.ModuleDict[str, nn.Linear], scaling: dict[str, float] for low-rank decomposition W + (B @ A) * scaling
Replaces original linear layers in base model, accumulates gradients during training, can be merged back into base weights
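
The decomposition itself is compact enough to sketch. The following is not the library's class, just a self-contained illustration of the W + (B @ A) * scaling update it describes (names and defaults are illustrative):

    import torch
    import torch.nn as nn

    class TinyLoraLinear(nn.Module):
        """A frozen linear layer plus a trainable low-rank update."""

        def __init__(self, base: nn.Linear, r: int = 16, lora_alpha: int = 32):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)            # base weights stay frozen
            self.lora_A = nn.Linear(base.in_features, r, bias=False)
            self.lora_B = nn.Linear(r, base.out_features, bias=False)
            nn.init.zeros_(self.lora_B.weight)     # the update starts at zero
            self.scaling = lora_alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # output = base(x) + (B @ A) applied to x, times scaling
            return self.base(x) + self.lora_B(self.lora_A(x)) * self.scaling
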
ModuleInfo src/peft/peft_model.py
Dataclass with name: str, module_type: str, enabled: bool, active_adapters: list[str], merged_adapters: list[str], available_adapters: list[str], devices: dict[str, list[str]]
Generated by get_model_info(), provides runtime state of all adapters attached to a PeftModel
TrainingBatch examples/*/
Dict with input_ids: torch.Tensor[batch_size, seq_len], attention_mask: torch.Tensor[batch_size, seq_len], labels: torch.Tensor[batch_size, seq_len] for language modeling tasks
Tokenized from raw text, batched by DataCollator, consumed by model.forward() during training
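
For causal language modeling, such a batch typically comes straight out of a tokenizer plus a collator. A minimal sketch (the tokenizer checkpoint and texts are placeholders):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
    texts = ["PEFT keeps the base model frozen.", "Only adapter weights are trained."]

    batch = tokenizer(texts, padding=True, return_tensors="pt")
    batch["labels"] = batch["input_ids"].clone()   # causal LM: labels mirror input_ids

    # batch now holds input_ids, attention_mask and labels as
    # [batch_size, seq_len] tensors, ready for model(**batch) in a training step.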

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical: Shape (unguarded)

The dataframe df contains exactly two numeric columns with names matching metric_x and metric_y parameters, and these columns have no NaN or infinite values

If this fails: If metric columns are missing, contain NaN values, or have mismatched data types, the Pareto frontier computation silently fails or produces wrong dominance relationships between model configurations

examples/adamss_finetuning/glue_adamss_asa_example.py:compute_pareto_frontier
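
A few lines of validation in front of the frontier computation would turn the silent failure into a loud one. A hypothetical guard (df, metric_x and metric_y as named in the assumption above; the helper itself is not in the example script):

    import numpy as np
    import pandas as pd

    def validate_metric_columns(df: pd.DataFrame, metric_x: str, metric_y: str) -> None:
        """Fail loudly instead of computing a wrong Pareto frontier."""
        for col in (metric_x, metric_y):
            if col not in df.columns:
                raise KeyError(f"metric column '{col}' is missing from the dataframe")
            values = pd.to_numeric(df[col], errors="coerce")
            if not np.isfinite(values).all():
                raise ValueError(f"metric column '{col}' has NaN, inf or non-numeric values")
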
critical: Contract (unguarded)

The manual update_and_allocate() calls happen at exactly the right training steps and are never called simultaneously with AdamssAsaCallback - the code assumes users follow the exclusive usage pattern documented in comments

If this fails: If both manual calls and callback are used together, ASA state becomes inconsistent leading to incorrect subspace allocation decisions and degraded adapter performance

examples/adamss_finetuning/glue_adamss_asa_manual_example.py:update_and_allocate
critical: Environment (weakly guarded)

CUDA device 'cuda:0' exists and is available when torch.cuda.is_available() returns True, with sufficient VRAM for the face alignment model plus ControlNet inference

If this fails: If cuda:0 is occupied by another process or lacks sufficient memory, face alignment initialization fails with cryptic CUDA out-of-memory errors during evaluation

examples/boft_controlnet/eval.py:device_selection
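
A defensive version of the device selection would check availability and free memory before initializing the models. A sketch using standard torch calls (the 6 GB threshold is an illustrative number, not one taken from the script):

    import torch

    def pick_device(min_free_gb: float = 6.0) -> torch.device:
        """Fall back to CPU instead of failing mid-evaluation with a CUDA OOM."""
        if not torch.cuda.is_available():
            return torch.device("cpu")
        free_bytes, _total = torch.cuda.mem_get_info(0)   # free / total VRAM on cuda:0
        if free_bytes < min_free_gb * 1024**3:
            return torch.device("cpu")
        return torch.device("cuda:0")
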
warning: Resource (unguarded)

The system has sufficient memory to load a 4-bit quantized phi3-mini model plus multiple task-specific adapters simultaneously - typically requiring 8-12GB GPU memory

If this fails: On systems with <8GB VRAM, model loading fails with CUDA OOM during adapter composition, but the error message doesn't indicate the specific memory requirements

examples/arrow_multitask/arrow_phi3_mini.py:quantization
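
The requirement comes from the 4-bit base model plus the attached adapters. A sketch of the quantized load with an up-front VRAM check (the model id and the 8 GB threshold are assumptions that mirror the estimate above, not values read from the script):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    free_bytes, _total = torch.cuda.mem_get_info(0)
    if free_bytes < 8 * 1024**3:
        raise RuntimeError("about 8 GB of free VRAM is needed for the 4-bit base model plus adapters")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3-mini-4k-instruct",   # assumed phi3-mini checkpoint
        quantization_config=bnb_config,
        device_map="auto",
    )
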
warning: Domain (unguarded)

All metrics follow the hardcoded preference directions in metric_preferences dict (e.g., 'test_accuracy' should be maximized, 'train_loss' minimized) and forgetting metrics contain asterisks for pattern matching

If this fails: If new metrics are added without updating preferences or existing metrics change semantics (e.g., a loss that should be maximized), Pareto frontier computation inverts dominance relationships and recommends worse model configurations

method_comparison/app.py:metric_preferences
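
The direction of each preference decides which configuration dominates which. A hypothetical sketch of a preference-aware dominance check (the dict contents and helper are illustrative, not the app's actual code):

    # True means "higher is better" for that metric.
    metric_preferences = {"test_accuracy": True, "train_loss": False}

    def dominates(a: dict, b: dict) -> bool:
        """a dominates b if it is at least as good on every metric and strictly better on one."""
        def better(m, x, y, strict=False):
            if metric_preferences[m]:
                return x > y if strict else x >= y
            return x < y if strict else x <= y

        return (all(better(m, a[m], b[m]) for m in metric_preferences)
                and any(better(m, a[m], b[m], strict=True) for m in metric_preferences))
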
warning: Ordering (unguarded)

The local server at localhost:8000 can handle num_requests (default 32) concurrent connections without rate limiting or connection refused errors

If this fails: When the server connection pool is exhausted, some async requests hang indefinitely while others succeed, leading to incomplete performance measurements and timeouts

examples/bdlora_finetuning/chat.py:async_requests
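
Bounding the number of in-flight requests sidesteps both the hangs and the refused connections. A hypothetical sketch using httpx and a semaphore (the endpoint path and payload shape are assumptions, not the script's actual protocol):

    import asyncio
    import httpx

    async def query_all(prompts: list[str], max_in_flight: int = 8) -> list[dict]:
        """Send all prompts to the local server with at most max_in_flight concurrent requests."""
        limiter = asyncio.Semaphore(max_in_flight)
        async with httpx.AsyncClient(base_url="http://localhost:8000", timeout=60.0) as client:

            async def one(prompt: str) -> dict:
                async with limiter:
                    resp = await client.post("/v1/completions", json={"prompt": prompt})
                    resp.raise_for_status()
                    return resp.json()

            return await asyncio.gather(*(one(p) for p in prompts))
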
critical: Contract (unguarded)

Adapter checkpoint files are in safetensors format with specific key naming conventions that match the PeftModel.from_pretrained() expectations

If this fails: If checkpoints are saved in different formats or have mismatched tensor names, loading fails silently and the model runs with random adapter weights instead of trained ones

examples/boft_controlnet/test_controlnet.py:safetensors_loading
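
A quick look at the checkpoint's keys can catch the mismatch before inference runs with untrained weights. A hypothetical sketch (the 'lora_' prefix is an assumption about the key naming, not a documented contract):

    from safetensors.torch import load_file

    state_dict = load_file("adapter_model.safetensors")

    # If nothing looks like an adapter weight, the file is probably not what
    # PeftModel.from_pretrained() expects and loading would leave random adapters.
    if not any("lora_" in key for key in state_dict):
        raise ValueError(f"no LoRA keys found; sample keys: {list(state_dict)[:3]}")
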
warning: Temporal (unguarded)

ASA callback triggers happen at consistent intervals during training and the model's subspace allocation state remains coherent across callback invocations

If this fails: If training is interrupted and resumed, or if callback timing is inconsistent due to hardware issues, subspace allocation becomes desynchronized leading to suboptimal adapter performance

examples/adamss_finetuning/image_classification_adamss_asa.py:asa_callback
info: Scale (unguarded)

The hardcoded subset sizes (100 train samples, 50 validation samples) provide meaningful signal for AdaMSS hyperparameter validation across different model architectures

If this fails: For models requiring larger datasets to show adapter effectiveness, the tiny test dataset produces misleading results that don't correlate with full-scale performance

examples/adamss_finetuning/test_adamss_quick.py:dataset_subset
warning: Environment (unguarded)

HF_TOKEN environment variable contains a valid HuggingFace authentication token when push_to_hub is True, and the token has write permissions to the specified hub_model_id repository

If this fails: Upload attempts fail with authentication errors but training continues normally, resulting in trained adapters that exist only locally without any error indication

examples/alora_finetuning/alora_finetuning.py:hf_token
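
Checking the token before training starts surfaces the problem hours earlier than the final upload does. A minimal sketch (the check is hypothetical, not part of the example script):

    import os
    from huggingface_hub import HfApi

    push_to_hub = True  # as configured for the run

    if push_to_hub:
        token = os.environ.get("HF_TOKEN")
        if not token:
            raise EnvironmentError("push_to_hub=True but HF_TOKEN is not set")
        HfApi(token=token).whoami()  # raises early if the token is invalid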

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Adapter Registry (registry)
PeftModel tracks multiple adapters by name, maintaining their configs, states (enabled/disabled/merged), and device locations
Base Model Cache (state-store)
Stores reference to original base model and can restore original weights when adapters are unmerged
HuggingFace Hub Cache (cache)
Downloads and caches adapter checkpoints from HF Hub using transformers' caching mechanism

Technology Stack

PyTorch (framework)
Provides tensor operations, autograd, and nn.Module base classes for implementing adapter layers and managing gradients
Transformers (library)
Supplies base model architectures, tokenizers, and training utilities that PEFT adapters are applied to
Accelerate (library)
Handles distributed training, mixed precision, and device management for PEFT models across multiple GPUs/TPUs
Safetensors (serialization)
Serializes adapter weights in a secure, efficient format for saving/loading checkpoints
HuggingFace Hub (infra)
Hosts and serves pretrained adapters, enabling easy sharing and reuse of PEFT checkpoints
BitsAndBytes (library)
Enables quantized training with PEFT adapters on top of 4-bit/8-bit base models for memory efficiency

Frequently Asked Questions

What is peft used for?

huggingface/peft applies parameter-efficient fine-tuning methods to pretrained models using various adapter architectures. It is an 8-component library written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 381 files.

How is peft architected?

peft is organized into 4 architecture layers: Configuration Layer, Model Wrapper Layer, Tuner Implementation Layer, Integration Layer. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through peft?

Data moves through 6 stages: Configuration creation → Model wrapping → Layer replacement → Forward pass adaptation → Gradient accumulation → Adapter persistence. Users create a PeftConfig specifying the adapter method and hyperparameters, then call get_peft_model() which wraps their base model with adapter functionality. During training, input tensors flow through the base model with active adapters adding their learned parameters to the forward pass. Adapter weights are saved separately from the base model and can be loaded, merged, or composed with other adapters. This pipeline design reflects a complex multi-stage processing system.

What technologies does peft use?

The core stack includes PyTorch (Provides tensor operations, autograd, and nn.Module base classes for implementing adapter layers and managing gradients), Transformers (Supplies base model architectures, tokenizers, and training utilities that PEFT adapters are applied to), Accelerate (Handles distributed training, mixed precision, and device management for PEFT models across multiple GPUs/TPUs), Safetensors (Serializes adapter weights in a secure, efficient format for saving/loading checkpoints), HuggingFace Hub (Hosts and serves pretrained adapters, enabling easy sharing and reuse of PEFT checkpoints), BitsAndBytes (Enables quantized training with PEFT adapters on top of 4-bit/8-bit base models for memory efficiency). A focused set of dependencies that keeps the build manageable.

What system dynamics does peft have?

peft exhibits 3 data pools (Adapter Registry, Base Model Cache, HuggingFace Hub Cache), 2 feedback loops, 4 control points, and 2 delays. The feedback loops are of the training-loop and recursive kinds. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does peft use?

4 design patterns detected: Adapter Pattern, Strategy Pattern, Registry Pattern, Mixin Pattern.

How does peft compare to alternatives?

CodeSea has side-by-side architecture comparisons of peft with unsloth and trl. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns. See CodeSea's comparison pages for detailed analysis.

Analyzed on April 20, 2026 by CodeSea.