facebookresearch/detectron2
Detectron2 is Facebook AI Research's next-generation platform for object detection, instance segmentation, and other visual recognition tasks.
Under the hood, the system uses three feedback loops, three data pools, and four control points to manage its runtime behavior.
Structural Verdict
A 10-component ML training system with 5 connections. 512 files analyzed. Loosely coupled — components are relatively independent.
How Data Flows Through the System
Images flow through backbone feature extraction, proposal generation (for two-stage models), ROI processing, and final predictions, with loss computation during training.
- Load Dataset — Load COCO annotations and images with dataset-specific formatting (config: DATASETS.TRAIN, DATASETS.TEST)
- Data Augmentation — Apply random augmentations like resize, flip, and crop to training images (config: INPUT.MIN_SIZE_TRAIN, INPUT.MAX_SIZE_TRAIN)
- Backbone Feature Extraction — Extract multi-scale features using ResNet/FPN or other backbone architectures (config: MODEL.BACKBONE.NAME, MODEL.RESNETS.DEPTH)
- Proposal Generation — Generate object proposals using RPN or use precomputed proposals (config: MODEL.RPN.PRE_NMS_TOPK_TRAIN, MODEL.RPN.POST_NMS_TOPK_TRAIN)
- ROI Processing — Extract ROI features and predict boxes/masks/keypoints for each proposal (config: MODEL.ROI_HEADS.NAME, MODEL.ROI_HEADS.NUM_CLASSES)
- Loss Computation — Compute classification, regression, and auxiliary losses during training (config: MODEL.ROI_BOX_HEAD.SMOOTH_L1_BETA, MODEL.RPN.SMOOTH_L1_BETA)
- Optimization — Update model parameters using SGD or other optimizers with learning rate scheduling (config: SOLVER.BASE_LR, SOLVER.IMS_PER_BATCH, SOLVER.STEPS)
- Evaluation — Compute COCO AP metrics on validation set during training (config: TEST.EVAL_PERIOD, MODEL.ROI_HEADS.SCORE_THRESH_TEST)
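The staged flow above can be sketched as a simple composition of steps. The function names and placeholder bodies below are illustrative only, not detectron2's actual APIs:

```python
# Illustrative sketch of the 8-stage flow as composed functions.
# Stage names mirror the list above; the bodies are stand-ins,
# not detectron2's real implementations.

def load_and_augment(record, training):
    if training:
        record["image"] = record["image"][::-1]  # stand-in for a random flip
    return record

def backbone(image):
    return {"p2": image, "p3": image[::2]}       # fake feature pyramid

def proposal_generator(feats):
    return [(0, 0, 4, 4), (1, 1, 3, 3)]          # fake RPN proposals

def roi_heads(feats, proposals):
    return {"boxes": proposals, "scores": [0.9, 0.4]}

def compute_loss(preds):
    # stand-in for classification + regression losses
    return sum(1.0 - s for s in preds["scores"])

def run_pipeline(image, training=True):
    record = load_and_augment({"image": image}, training)
    feats = backbone(record["image"])            # multi-scale features
    proposals = proposal_generator(feats)        # proposal generation
    preds = roi_heads(feats, proposals)          # ROI processing
    if training:
        preds["loss"] = compute_loss(preds)      # loss computation
    return preds
```

At inference time the same composition runs with `training=False`, skipping the loss stage.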
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Model Registry — cached pre-trained models downloaded from remote URLs
- Dataset Cache — preprocessed dataset metadata and annotations
- Checkpoints — model weights and training state snapshots
Feedback Loops
- Learning Rate Scheduler (convergence, balancing) — Trigger: Training step milestone. Action: Reduce learning rate by factor. Exit: Training completion.
- Validation Monitoring (polling, balancing) — Trigger: EVAL_PERIOD iterations. Action: Run evaluation on validation set. Exit: Training completion.
- Loss Backpropagation (training-loop, reinforcing) — Trigger: Forward pass completion. Action: Compute gradients and update weights. Exit: MAX_ITER reached.
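The learning-rate scheduler loop can be illustrated with a step-decay rule. The milestones and the 0.1 decay factor below are assumed values in the style of common detectron2 defaults, not read from this repository's configs:

```python
def lr_at(iteration, base_lr=0.02, steps=(60000, 80000), gamma=0.1):
    """Step-decay schedule: multiply base_lr by gamma at each milestone.

    Mirrors the trigger/action of the loop above: each time a training-step
    milestone in `steps` is passed, the learning rate drops by `gamma`.
    """
    decays = sum(1 for s in steps if iteration >= s)
    return base_lr * (gamma ** decays)
```

For example, the rate stays at 0.02 until iteration 60000, drops to 0.002, and drops again to 0.0002 at iteration 80000.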
Delays & Async Processing
- Model Download (async-processing, duration varies with model size) — Training startup is blocked until pre-trained weights finish downloading
- Evaluation Period (scheduled-job, ~EVAL_PERIOD iterations) — Periodic validation runs during training
- Checkpoint Saving (batch-window, ~CHECKPOINT_PERIOD iterations) — Training state persisted at intervals
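The two periodic delays above amount to modulo checks inside the training loop. A minimal sketch, with illustrative period values rather than ones read from a real config:

```python
def due_events(iteration, eval_period=5000, checkpoint_period=5000,
               max_iter=90000):
    """Return which periodic actions fire at this iteration.

    Events fire on positive multiples of their period and at the final
    iteration; period values here are illustrative defaults.
    """
    events = []
    last = iteration == max_iter
    if iteration > 0 and (iteration % eval_period == 0 or last):
        events.append("evaluate")
    if iteration > 0 and (iteration % checkpoint_period == 0 or last):
        events.append("checkpoint")
    return events
```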
Control Points
- Learning Rate (threshold) — Controls: Optimizer step size and convergence rate. Default: 0.02
- Batch Size (threshold) — Controls: Memory usage and gradient stability. Default: 16
- Score Threshold (threshold) — Controls: Detection confidence filtering. Default: 0.05
- NMS Threshold (threshold) — Controls: Duplicate detection suppression. Default: 0.5
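The score and NMS thresholds interact at inference time: low-confidence boxes are dropped first, then overlapping survivors are suppressed. A minimal pure-Python sketch (detectron2 itself uses batched, vectorized ops):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_detections(boxes, scores, score_thresh=0.05, nms_thresh=0.5):
    """Drop low-confidence boxes, then greedily suppress duplicates."""
    keep = [(s, b) for s, b in zip(scores, boxes) if s >= score_thresh]
    keep.sort(key=lambda t: -t[0])               # highest score first
    out = []
    for s, b in keep:
        if all(iou(b, kb) <= nms_thresh for _, kb in out):
            out.append((s, b))
    return out
```

Raising the score threshold trades recall for speed; raising the NMS threshold keeps more near-duplicate boxes.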
Technology Stack
- PyTorch — deep learning framework
- OmegaConf — configuration management
- COCO API — dataset evaluation
- FVCore — Facebook research utilities
- OpenCV — computer vision operations
- pytest — unit testing
Key Components
- GeneralizedRCNN (model) — Meta-architecture that combines backbone, proposal generator, and ROI heads for two-stage detectors — detectron2/modeling/meta_arch/rcnn.py
- DatasetMapper (class) — Transforms raw dataset annotations into model-ready format with augmentations — detectron2/data/dataset_mapper.py
- DefaultTrainer (class) — Standard training loop with built-in hooks for logging, evaluation, and checkpointing — detectron2/engine/defaults.py
- LazyConfig (config) — New configuration system enabling programmatic config composition with delayed instantiation — detectron2/config/lazy.py
- FPN (model) — Feature Pyramid Network for multi-scale feature extraction from the backbone — detectron2/modeling/backbone/fpn.py
- StandardROIHeads (model) — ROI heads for box and mask prediction in two-stage detectors — detectron2/modeling/roi_heads/roi_heads.py
- COCOEvaluator (class) — Evaluates object detection performance using COCO metrics (AP, AR) — detectron2/evaluation/coco_evaluation.py
- build_detection_train_loader (function) — Constructs the training data loader with sampling, batching, and worker management — detectron2/data/build.py
- RetinaNet (model) — Single-stage detector with focal loss for dense object detection — detectron2/modeling/meta_arch/retinanet.py
- model_zoo (module) — Pre-trained model registry with automatic download and caching — detectron2/model_zoo/model_zoo.py
Sub-Modules
- DensePose — dense human pose estimation extension with specialized models and data handling
- ViTDet — Vision Transformer-based object detection with a specialized backbone and training recipe
- DeepLab — semantic and panoptic segmentation using the DeepLab architecture
Configuration
configs/Base-RCNN-C4.yaml (yaml)
- MODEL.META_ARCHITECTURE (string) — default: GeneralizedRCNN
- MODEL.RPN.PRE_NMS_TOPK_TEST (number) — default: 6000
- MODEL.RPN.POST_NMS_TOPK_TEST (number) — default: 1000
- MODEL.ROI_HEADS.NAME (string) — default: Res5ROIHeads
- DATASETS.TRAIN (string) — default: ("coco_2017_train",)
- DATASETS.TEST (string) — default: ("coco_2017_val",)
- SOLVER.IMS_PER_BATCH (number) — default: 16
- SOLVER.BASE_LR (number) — default: 0.02
- +4 more parameters
configs/Base-RCNN-DilatedC5.yaml (yaml)
- MODEL.META_ARCHITECTURE (string) — default: GeneralizedRCNN
- MODEL.RESNETS.OUT_FEATURES (array) — default: res5
- MODEL.RESNETS.RES5_DILATION (number) — default: 2
- MODEL.RPN.IN_FEATURES (array) — default: res5
- MODEL.RPN.PRE_NMS_TOPK_TEST (number) — default: 6000
- MODEL.RPN.POST_NMS_TOPK_TEST (number) — default: 1000
- MODEL.ROI_HEADS.NAME (string) — default: StandardROIHeads
- MODEL.ROI_HEADS.IN_FEATURES (array) — default: res5
- +14 more parameters
configs/Base-RCNN-FPN.yaml (yaml)
- MODEL.META_ARCHITECTURE (string) — default: GeneralizedRCNN
- MODEL.BACKBONE.NAME (string) — default: build_resnet_fpn_backbone
- MODEL.RESNETS.OUT_FEATURES (array) — default: res2,res3,res4,res5
- MODEL.FPN.IN_FEATURES (array) — default: res2,res3,res4,res5
- MODEL.ANCHOR_GENERATOR.SIZES (array) — default: 32,64,128,256,512
- MODEL.ANCHOR_GENERATOR.ASPECT_RATIOS (array) — default: 0.5,1,2
- MODEL.RPN.IN_FEATURES (array) — default: p2,p3,p4,p5,p6
- MODEL.RPN.PRE_NMS_TOPK_TRAIN (number) — default: 2000
- +19 more parameters
configs/Cityscapes/mask_rcnn_R_50_FPN.yaml (yaml)
- _BASE_ (string) — default: ../Base-RCNN-FPN.yaml
- MODEL.WEIGHTS (string) — default: detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
- MODEL.MASK_ON (boolean) — default: true
- MODEL.ROI_HEADS.NUM_CLASSES (number) — default: 8
- INPUT.MIN_SIZE_TRAIN (string) — default: (800, 832, 864, 896, 928, 960, 992, 1024)
- INPUT.MIN_SIZE_TRAIN_SAMPLING (string) — default: choice
- INPUT.MIN_SIZE_TEST (number) — default: 1024
- INPUT.MAX_SIZE_TRAIN (number) — default: 2048
- +8 more parameters
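The `_BASE_` key above makes a config inherit from a parent file. The mechanics can be sketched as a recursive dict merge; this is an assumption-level sketch, not detectron2's actual loader, which also resolves the `_BASE_` path and handles versioning:

```python
def merge_config(base, override):
    """Deep-merge `override` into `base`: nested dicts merge, scalars replace.

    Sketches how a child YAML with _BASE_ layers its keys over the parent
    config's keys. The _BASE_ key itself is assumed already resolved.
    """
    merged = dict(base)
    for key, value in override.items():
        if key == "_BASE_":
            continue  # path already resolved into `base` by the caller
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Applied to the Cityscapes config above, `MODEL.MASK_ON` and `MODEL.ROI_HEADS.NUM_CLASSES` would override the parent's values while untouched keys pass through.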
Science Pipeline
- Load Images — cv2.imread then convert BGR→RGB if needed [(H, W, 3) → (H, W, 3)] — detectron2/data/detection_utils.py
- Resize Transform — ResizeShortestEdge maintains aspect ratio within bounds [(H, W, 3) → (H', W', 3)] — detectron2/data/transforms/transform.py
- Backbone Forward — ResNet/FPN extracts multi-scale features [(N, 3, H, W) → Dict[str, (N, C, H/stride, W/stride)]] — detectron2/modeling/backbone/
- RPN Proposal Generation — Generate object proposals from feature maps [Dict[str, (N, C, H, W)] → (N, num_proposals, 4)] — detectron2/modeling/proposal_generator/rpn.py
- ROI Feature Extraction — ROIAlign extracts fixed-size features from proposals [features + (N, num_proposals, 4) → (N*num_proposals, C, pool_size, pool_size)] — detectron2/modeling/poolers.py
- Box/Mask Prediction — FC layers predict class scores and box regression [(N*num_proposals, C, pool_size, pool_size) → scores: (N*num_proposals, num_classes), boxes: (N*num_proposals, 4*num_classes)] — detectron2/modeling/roi_heads/
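The shape transitions above can be traced symbolically. The strides, channel count, proposal count, and pool size below are illustrative values; real shapes depend on the backbone and config:

```python
def trace_shapes(n, h, w, strides=(4, 8, 16, 32), channels=256,
                 num_proposals=1000, pool=7):
    """Track tensor shapes through the pipeline stages listed above.

    All numeric defaults are illustrative, not detectron2 constants.
    """
    shapes = {"input": (n, 3, h, w)}
    # one feature map per pyramid level, spatially downsampled by its stride
    shapes["features"] = {f"p{i + 2}": (n, channels, h // s, w // s)
                          for i, s in enumerate(strides)}
    shapes["proposals"] = (n, num_proposals, 4)
    # ROIAlign flattens (batch, proposal) into one leading dimension
    shapes["roi_features"] = (n * num_proposals, channels, pool, pool)
    return shapes
```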
Assumptions & Constraints
- [warning] Assumes input tensors are (N, C, H, W) format but no explicit assertion in forward pass (shape)
- [info] Expects dataset dict with 'image', 'instances' keys but validation is minimal (format)
- [warning] Assumes box coordinates are normalized to image size but no bounds checking (value-range)
- [critical] Assumes box format is (x1, y1, x2, y2) but no validation of coordinate ordering (format)
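The unvalidated assumptions above could be guarded with a small checker. This is a hypothetical helper, not part of detectron2:

```python
def validate_boxes(boxes, image_h, image_w):
    """Guard the (x1, y1, x2, y2) ordering and image-bounds assumptions
    flagged above. Raises ValueError on the first violation.
    """
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if not (x1 < x2 and y1 < y2):
            raise ValueError(
                f"box {i}: coordinates not in (x1, y1, x2, y2) order")
        if x1 < 0 or y1 < 0 or x2 > image_w or y2 > image_h:
            raise ValueError(f"box {i}: outside image bounds")
```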
Frequently Asked Questions
What is detectron2 used for?
detectron2 is Facebook's next-generation object detection and instance segmentation research platform. facebookresearch/detectron2 is a 10-component ML training system written in Python, loosely coupled — components are relatively independent. The codebase contains 512 files.
How is detectron2 architected?
detectron2 is organized into 5 architecture layers: Configuration Layer, Modeling Layer, Data Layer, Training Engine, and 1 more. The layers are loosely coupled — components are relatively independent — and this layered structure keeps concerns separated and modules independent.
How does data flow through detectron2?
Data moves through 8 stages: Load Dataset → Data Augmentation → Backbone Feature Extraction → Proposal Generation → ROI Processing → .... Images flow through backbone feature extraction, proposal generation (for two-stage models), ROI processing, and final predictions, with loss computation during training. This pipeline design reflects a complex multi-stage processing system.
What technologies does detectron2 use?
The core stack includes PyTorch (Deep learning framework), OmegaConf (Configuration management), COCO API (Dataset evaluation), FVCore (Facebook research utilities), OpenCV (Computer vision operations), pytest (Unit testing). A focused set of dependencies that keeps the build manageable.
What system dynamics does detectron2 have?
detectron2 exhibits 3 data pools (Model Registry, Dataset Cache), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle convergence and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does detectron2 use?
4 design patterns detected: Lazy Configuration, Registry Pattern, Hook System, Config Inheritance.
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.