huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Applies parameter-efficient fine-tuning methods to pretrained models using various adapter architectures
Users create a PeftConfig specifying the adapter method and hyperparameters, then call get_peft_model() which wraps their base model with adapter functionality. During training, input tensors flow through the base model with active adapters adding their learned parameters to the forward pass. Adapter weights are saved separately from the base model and can be loaded, merged, or composed with other adapters.
Under the hood, the system uses 2 feedback loops, 3 data pools, and 4 control points to manage its runtime behavior.
An 8-component library. 381 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Configuration creation — User instantiates a method-specific config like LoraConfig(r=16, lora_alpha=32, target_modules=['q_proj', 'v_proj']) specifying which layers to adapt and hyperparameters (config: lora_r, lora_alpha, lora_dropout +1)
- Model wrapping — get_peft_model() uses PEFT_TYPE_TO_MODEL_MAPPING to find the correct tuner class, then creates a PeftModel wrapper that replaces target modules with adapter-enabled versions [PeftConfig → PeftModel] (config: peft_type, task_type, target_modules)
- Layer replacement — The tuner implementation (e.g., LoraModel) identifies target modules by name patterns, replaces them with adapter layers (e.g., LoRALayer), and initializes adapter parameters [PeftModel → LoRALayer] (config: target_modules, r, lora_alpha +1)
- Forward pass adaptation — During training, input tensors flow through base model layers, but adapter layers compute additional outputs like (B @ A) * scaling and add them to the original layer output [TrainingBatch → TrainingBatch] (config: lora_alpha, scaling)
- Gradient accumulation — Gradients are computed only for adapter parameters while base model weights remain frozen, enabling parameter-efficient training with much smaller memory footprint [TrainingBatch]
- Adapter persistence — save_pretrained() extracts only the adapter weights (not base model) and saves them with the config as a small checkpoint, typically <1GB vs >10GB for full models [PeftModel]
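The checkpoint-size claim in the last stage follows from simple arithmetic. A back-of-the-envelope sketch (plain Python, no dependencies; the layer dimensions are illustrative, not taken from any specific model):

```python
# For one d_out x d_in linear layer, full fine-tuning updates d_out * d_in
# weights, while a rank-r LoRA adapter trains only B (d_out x r) plus
# A (r x d_in). This is why adapter checkpoints are orders of magnitude
# smaller than full-model checkpoints.

def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters added by one rank-r LoRA adapter."""
    return d_out * r + r * d_in

def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters updated by full fine-tuning of the same layer."""
    return d_out * d_in

# A 4096 x 4096 attention projection, rank 16:
full = full_param_count(4096, 4096)      # 16,777,216 weights
lora = lora_param_count(4096, 4096, 16)  # 131,072 weights
ratio = full / lora

print(f"full: {full:,}  lora: {lora:,}  savings: {ratio:.0f}x")  # savings: 128x
```

Scaled over every adapted layer, this is the gap between a sub-1GB adapter checkpoint and a >10GB full-model checkpoint.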
Data Models
The data structures that flow between stages — the contracts that hold the system together.
src/peft/config.py — Base dataclass with peft_type: PeftType, task_type: TaskType, inference_mode: bool, plus method-specific subclasses like LoraConfig with r: int, lora_alpha: int, target_modules: list[str], lora_dropout: float
Created by user code, passed to get_peft_model(), used by tuner implementations to configure adapter parameters
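A minimal sketch of the shape these configs take, written as a plain dataclass. These are illustrative stand-ins, not the real classes from src/peft/config.py; field names mirror the description above:

```python
from dataclasses import dataclass, field

# Stripped-down stand-ins for the config hierarchy described above.

@dataclass
class PeftConfigSketch:
    peft_type: str = "LORA"        # which adapter method
    task_type: str = "CAUSAL_LM"   # which model outputs to use
    inference_mode: bool = False   # merged/fast vs separate/flexible

@dataclass
class LoraConfigSketch(PeftConfigSketch):
    r: int = 8                     # bottleneck rank
    lora_alpha: int = 16           # scaling numerator (scaling = alpha / r)
    lora_dropout: float = 0.0
    target_modules: list = field(default_factory=lambda: ["q_proj", "v_proj"])

cfg = LoraConfigSketch(r=16, lora_alpha=32)
print(cfg.peft_type, cfg.r, cfg.lora_alpha / cfg.r)  # LORA 16 2.0
```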
src/peft/peft_model.py — Wrapper class containing base_model: PreTrainedModel, peft_config: dict[str, PeftConfig], active_adapters: list[str], plus state tracking for merged/enabled adapters
Created by get_peft_model(), used throughout training/inference, manages adapter lifecycle and state transitions
src/peft/tuners/lora/layer.py — Module containing lora_A: nn.ModuleDict[str, nn.Linear], lora_B: nn.ModuleDict[str, nn.Linear], scaling: dict[str, float] for low-rank decomposition W + (B @ A) * scaling
Replaces original linear layers in base model, accumulates gradients during training, can be merged back into base weights
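The decomposition W + (B @ A) * scaling can be sketched in dependency-free Python. Real adapter layers use nn.Linear modules and tensors; here plain lists stand in, and the tiny matrices are made up for illustration:

```python
# y = W x + (B @ A) x * scaling, with scaling = lora_alpha / r.

def matvec(m, v):
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

def lora_forward(W, A, B, x, lora_alpha, r):
    scaling = lora_alpha / r
    base = matvec(W, x)              # frozen base-layer output
    delta = matvec(B, matvec(A, x))  # low-rank path: x -> A x -> B (A x)
    return [b + scaling * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 "base weight" (identity)
A = [[1.0, 1.0]]              # rank-1 down-projection (1 x 2)
B = [[0.5], [0.5]]            # rank-1 up-projection (2 x 1)
x = [2.0, 4.0]

y = lora_forward(W, A, B, x, lora_alpha=2, r=1)
print(y)  # [8.0, 10.0]
```

Merging an adapter back into base weights amounts to adding scaling * (B @ A) to W once, after which the extra forward-pass computation disappears.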
src/peft/peft_model.py — Dataclass with name: str, module_type: str, enabled: bool, active_adapters: list[str], merged_adapters: list[str], available_adapters: list[str], devices: dict[str, list[str]]
Generated by get_model_info(), provides runtime state of all adapters attached to a PeftModel
examples/*/ — Dict with input_ids: torch.Tensor[batch_size, seq_len], attention_mask: torch.Tensor[batch_size, seq_len], labels: torch.Tensor[batch_size, seq_len] for language modeling tasks
Tokenized from raw text, batched by DataCollator, consumed by model.forward() during training
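A sketch of what such a collator produces, in plain Python lists rather than tensors. The -100 label fill follows the usual Hugging Face convention for masking padding out of the loss; the function and token values here are illustrative, not the real DataCollator:

```python
# Pad tokenized sequences to equal length, build attention masks, and mask
# padding positions out of the labels.

def collate(sequences, pad_id=0):
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask, labels = [], [], []
    for s in sequences:
        pad = max_len - len(s)
        input_ids.append(s + [pad_id] * pad)
        attention_mask.append([1] * len(s) + [0] * pad)
        labels.append(s + [-100] * pad)  # -100: ignored by the LM loss
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = collate([[5, 6, 7], [8, 9]])
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```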
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
The dataframe df contains exactly two numeric columns with names matching metric_x and metric_y parameters, and these columns have no NaN or infinite values
If this fails: If metric columns are missing, contain NaN values, or have mismatched data types, the Pareto frontier computation silently fails or produces wrong dominance relationships between model configurations
examples/adamss_finetuning/glue_adamss_asa_example.py:compute_pareto_frontier
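One way to make this assumption explicit is to validate metric values before computing dominance, so non-finite entries fail loudly instead of silently corrupting the frontier. A hypothetical sketch (the function and sample points are illustrative, not the example script's actual code):

```python
import math

def pareto_frontier(points):
    """Return points not dominated by any other (maximize x, minimize y)."""
    for x, y in points:
        if not (math.isfinite(x) and math.isfinite(y)):
            raise ValueError(f"non-finite metric value: ({x}, {y})")
    frontier = []
    for p in points:
        # p is dominated if some other point is at least as good on both axes.
        dominated = any(q[0] >= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return frontier

pts = [(0.90, 0.4), (0.85, 0.3), (0.80, 0.5)]  # (accuracy up, loss down)
print(pareto_frontier(pts))  # [(0.9, 0.4), (0.85, 0.3)]
```

With a NaN in the input, the loud ValueError replaces the silent wrong-dominance failure described above.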
The manual update_and_allocate() calls happen at exactly the right training steps and are never combined with AdamssAsaCallback; the code assumes users follow the exclusive usage pattern documented in comments
If this fails: If both manual calls and callback are used together, ASA state becomes inconsistent leading to incorrect subspace allocation decisions and degraded adapter performance
examples/adamss_finetuning/glue_adamss_asa_manual_example.py:update_and_allocate
CUDA device 'cuda:0' exists and is available when torch.cuda.is_available() returns True, with sufficient VRAM for face alignment model plus ControlNet inference
If this fails: If cuda:0 is occupied by another process or lacks sufficient memory, face alignment initialization fails with cryptic CUDA out-of-memory errors during evaluation
examples/boft_controlnet/eval.py:device_selection
The system has sufficient memory to load a 4-bit quantized phi3-mini model plus multiple task-specific adapters simultaneously - typically requiring 8-12GB GPU memory
If this fails: On systems with <8GB VRAM, model loading fails with CUDA OOM during adapter composition, but the error message doesn't indicate the specific memory requirements
examples/arrow_multitask/arrow_phi3_mini.py:quantization
All metrics follow the hardcoded preference directions in metric_preferences dict (e.g., 'test_accuracy' should be maximized, 'train_loss' minimized) and forgetting metrics contain asterisks for pattern matching
If this fails: If new metrics are added without updating preferences or existing metrics change semantics (e.g., a loss that should be maximized), Pareto frontier computation inverts dominance relationships and recommends worse model configurations
method_comparison/app.py:metric_preferences
The local server at localhost:8000 can handle num_requests (default 32) concurrent connections without rate limiting or connection refused errors
If this fails: When the server connection pool is exhausted, some async requests hang indefinitely while others succeed, leading to incomplete performance measurements and timeouts
examples/bdlora_finetuning/chat.py:async_requests
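A common mitigation is to cap in-flight requests with a semaphore and wrap each request in a timeout, so an exhausted server pool surfaces as an error rather than a hang. A stdlib-only sketch (the request coroutine is a stand-in for the script's real HTTP client, and all names are illustrative):

```python
import asyncio

async def fake_request(i):
    await asyncio.sleep(0.01)  # stand-in for an HTTP call to localhost:8000
    return f"response-{i}"

async def run_all(num_requests=32, max_concurrent=8, timeout=5.0):
    sem = asyncio.Semaphore(max_concurrent)  # never exceed server capacity

    async def guarded(i):
        async with sem:
            # wait_for turns a hung connection into a TimeoutError
            return await asyncio.wait_for(fake_request(i), timeout=timeout)

    return await asyncio.gather(*(guarded(i) for i in range(num_requests)))

results = asyncio.run(run_all())
print(len(results))  # 32
```

Bounding concurrency below the server's pool size keeps measurements complete even when num_requests exceeds what the server can accept at once.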
Adapter checkpoint files are in safetensors format with specific key naming conventions that match the PeftModel.from_pretrained() expectations
If this fails: If checkpoints are saved in different formats or have mismatched tensor names, loading fails silently and the model runs with random adapter weights instead of trained ones
examples/boft_controlnet/test_controlnet.py:safetensors_loading
ASA callback triggers happen at consistent intervals during training and the model's subspace allocation state remains coherent across callback invocations
If this fails: If training is interrupted and resumed, or if callback timing is inconsistent due to hardware issues, subspace allocation becomes desynchronized leading to suboptimal adapter performance
examples/adamss_finetuning/image_classification_adamss_asa.py:asa_callback
The hardcoded subset sizes (100 train samples, 50 validation samples) provide meaningful signal for AdaMSS hyperparameter validation across different model architectures
If this fails: For models requiring larger datasets to show adapter effectiveness, the tiny test dataset produces misleading results that don't correlate with full-scale performance
examples/adamss_finetuning/test_adamss_quick.py:dataset_subset
HF_TOKEN environment variable contains a valid HuggingFace authentication token when push_to_hub is True, and the token has write permissions to the specified hub_model_id repository
If this fails: Upload attempts fail with authentication errors but training continues normally, resulting in trained adapters that exist only locally without any error indication
examples/alora_finetuning/alora_finetuning.py:hf_token
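A hypothetical preflight check for this assumption: fail fast before training starts if push_to_hub is requested without credentials, instead of discovering the missing token only after hours of training. The function name and error message are illustrative, not the example script's actual code:

```python
import os

def check_hub_credentials(push_to_hub: bool) -> bool:
    """Raise before training if an upload is requested but no token is set."""
    if not push_to_hub:
        return True
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "push_to_hub=True but HF_TOKEN is unset; "
            "the trained adapter would exist only locally."
        )
    return True

os.environ["HF_TOKEN"] = "hf_dummy_token_for_demo"  # demo value only
print(check_hub_credentials(push_to_hub=True))  # True
```

A fuller check could also verify write permission on hub_model_id before training, but even this minimal guard turns a silent post-training failure into an immediate one.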
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Adapter Registry — PeftModel tracks multiple adapters by name, maintaining their configs, states (enabled/disabled/merged), and device locations
- Base Model Cache — Stores a reference to the original base model and can restore original weights when adapters are unmerged
- Downloads and caches adapter checkpoints from HF Hub using transformers' caching mechanism
Feedback Loops
- Adaptive Rank Adjustment (training-loop, balancing) — Trigger: AdaLoRA importance score computation. Action: Prune low-importance adapter dimensions and grow high-importance ones. Exit: Training completion or rank convergence.
- Multi-Adapter Composition (recursive, reinforcing) — Trigger: Multiple adapters enabled simultaneously. Action: Each adapter layer applies its transformation to the output of the previous adapter or base layer. Exit: All active adapters processed.
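The multi-adapter composition loop can be sketched in a few lines. Plain floats stand in for tensors, and the base layer and adapter deltas here are made up for illustration:

```python
# Each enabled adapter adds its delta to the running output; the enable/disable
# toggle (a Control Point below) decides which adapters participate.

def base_layer(x):
    return 2.0 * x  # frozen base transformation

def compose(x, adapter_deltas, enabled):
    y = base_layer(x)
    for name, delta in adapter_deltas.items():
        if enabled.get(name, False):  # runtime enable/disable toggle
            y = y + delta(x)          # each adapter adds its contribution
    return y

adapters = {"task_a": lambda x: 0.1 * x, "task_b": lambda x: 0.5 * x}

print(compose(10.0, adapters, {"task_a": True, "task_b": True}))   # 26.0
print(compose(10.0, adapters, {"task_a": True, "task_b": False}))  # 21.0
```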
Delays
- Model Download (cache-ttl, ~Seconds to minutes) — First-time loading of base models or adapters from HuggingFace Hub requires download time
- Layer Replacement (compilation, ~Milliseconds) — Initial wrapping with get_peft_model() requires traversing model graph and replacing target modules
Control Points
- Adapter Enable/Disable (runtime-toggle) — Controls: Whether specific adapters contribute to forward pass computations. Default: Per-adapter boolean state
- Inference Mode (feature-flag) — Controls: Whether adapters are merged into base weights for faster inference vs kept separate for flexibility. Default: inference_mode boolean in PeftConfig
- Target Module Selection (architecture-switch) — Controls: Which layers receive adapters (query/value projections, all linear layers, specific named modules). Default: target_modules list in config
- Rank Hyperparameter (hyperparameter) — Controls: Bottleneck dimension determining adapter capacity and trainable parameter count. Default: r integer in LoraConfig
Technology Stack
- PyTorch — Provides tensor operations, autograd, and nn.Module base classes for implementing adapter layers and managing gradients
- Transformers — Supplies base model architectures, tokenizers, and training utilities that PEFT adapters are applied to
- Accelerate — Handles distributed training, mixed precision, and device management for PEFT models across multiple GPUs/TPUs
- Safetensors — Serializes adapter weights in a secure, efficient format for saving/loading checkpoints
- HuggingFace Hub — Hosts and serves pretrained adapters, enabling easy sharing and reuse of PEFT checkpoints
- BitsAndBytes — Enables quantized training with PEFT adapters on top of 4-bit/8-bit base models for memory efficiency
Key Components
- get_peft_model (factory) — Creates a PeftModel wrapper around a base model using the specified adapter configuration; the main entry point for applying PEFT methods (src/peft/mapping.py)
- PeftModel (orchestrator) — Manages multiple adapters attached to a base model, handles adapter state transitions (enable/disable/merge), and routes forward passes through active adapters (src/peft/peft_model.py)
- LoraModel (transformer) — Implements Low-Rank Adaptation by replacing target linear layers with LoRALayer modules that learn low-rank updates W + (B @ A) * scaling (src/peft/tuners/lora/model.py)
- TaskType (registry) — Enum defining supported task types (CAUSAL_LM, SEQ_2_SEQ_LM, TOKEN_CLS, etc.) that determine which model outputs to use for different objectives (src/peft/utils/peft_types.py)
- AdaLoraModel (optimizer) — Extends LoRA with adaptive rank selection, dynamically adjusting the rank of each adapter during training based on importance scores (src/peft/tuners/adalora/model.py)
- PromptTuningModel (transformer) — Prepends learnable prompt tokens to input embeddings instead of modifying model weights; useful for frozen model adaptation (src/peft/tuners/prompt_tuning/model.py)
- HubMixin (adapter) — Provides save_pretrained() and from_pretrained() methods for uploading/downloading adapter weights to/from HuggingFace Hub (src/peft/utils/hub_utils.py)
- PEFT_TYPE_TO_MODEL_MAPPING (registry) — Maps PeftType enum values to their corresponding implementation classes (LORA -> LoraModel, ADALORA -> AdaLoraModel, etc.) (src/peft/mapping.py)
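The registry pattern behind PEFT_TYPE_TO_MODEL_MAPPING can be sketched with an enum-keyed dict. The classes below are empty stand-ins, not the real implementations in src/peft/mapping.py:

```python
from enum import Enum

class PeftType(Enum):
    LORA = "LORA"
    ADALORA = "ADALORA"

class LoraModelSketch: ...
class AdaLoraModelSketch: ...

# The registry: enum key -> tuner class, consulted at wrap time.
TYPE_TO_MODEL = {
    PeftType.LORA: LoraModelSketch,
    PeftType.ADALORA: AdaLoraModelSketch,
}

def resolve_tuner(peft_type):
    try:
        return TYPE_TO_MODEL[peft_type]
    except KeyError:
        raise ValueError(f"No tuner registered for {peft_type}") from None

print(resolve_tuner(PeftType.LORA).__name__)  # LoraModelSketch
```

This is what lets get_peft_model() stay method-agnostic: adding a new adapter method means registering one more entry, not changing the factory.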
Frequently Asked Questions
What is peft used for?
peft applies parameter-efficient fine-tuning methods to pretrained models using various adapter architectures. huggingface/peft is an 8-component library written in Python. Data flows through 6 distinct pipeline stages. The codebase contains 381 files.
How is peft architected?
peft is organized into 4 architecture layers: Configuration Layer, Model Wrapper Layer, Tuner Implementation Layer, Integration Layer. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through peft?
Data moves through 6 stages: Configuration creation → Model wrapping → Layer replacement → Forward pass adaptation → Gradient accumulation → .... Users create a PeftConfig specifying the adapter method and hyperparameters, then call get_peft_model() which wraps their base model with adapter functionality. During training, input tensors flow through the base model with active adapters adding their learned parameters to the forward pass. Adapter weights are saved separately from the base model and can be loaded, merged, or composed with other adapters. This pipeline design reflects a complex multi-stage processing system.
What technologies does peft use?
The core stack includes PyTorch (Provides tensor operations, autograd, and nn.Module base classes for implementing adapter layers and managing gradients), Transformers (Supplies base model architectures, tokenizers, and training utilities that PEFT adapters are applied to), Accelerate (Handles distributed training, mixed precision, and device management for PEFT models across multiple GPUs/TPUs), Safetensors (Serializes adapter weights in a secure, efficient format for saving/loading checkpoints), HuggingFace Hub (Hosts and serves pretrained adapters, enabling easy sharing and reuse of PEFT checkpoints), BitsAndBytes (Enables quantized training with PEFT adapters on top of 4-bit/8-bit base models for memory efficiency). A focused set of dependencies that keeps the build manageable.
What system dynamics does peft have?
peft exhibits 3 data pools (Adapter Registry, Base Model Cache), 2 feedback loops, 4 control points, and 2 delays. The feedback loops cover training-loop (balancing) and recursive (reinforcing) dynamics. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does peft use?
4 design patterns detected: Adapter Pattern, Strategy Pattern, Registry Pattern, Mixin Pattern.
How does peft compare to alternatives?
CodeSea has side-by-side architecture comparisons of peft with unsloth and trl. These comparisons show tech stack differences, pipeline design, system behavior, and code patterns.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.