Hidden Assumptions in detectron2

11 assumptions this code never checks · 4 critical · spanning Shape, Domain, Contract, Scale, Resource, Temporal, Ordering, Environment

Every codebase relies on things it never checks, and most of them are routine. CodeSea analyzed facebookresearch/detectron2 and picked out the 3 assumptions most likely to cause trouble. The rest are minor; they're under "Show everything".

Worth your attention first

If the backbone produces feature maps with unexpected spatial ratios, FPN's lateral connections and top-down pathway will misalign them, so the detection heads process spatially inconsistent features and produce wrong bounding box coordinates
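A check along these lines could surface the mismatch before it reaches the detection heads. It is a minimal sketch, not detectron2 code: `check_pyramid_shapes` is a hypothetical helper, the level names p2 to p6 are assumed, and it assumes the image was padded so that each level's size is exactly the image size divided by that level's stride.

```python
from typing import Dict, List, Tuple


def check_pyramid_shapes(
    image_hw: Tuple[int, int],
    level_hw: Dict[str, Tuple[int, int]],
    strides: Dict[str, int],
) -> List[str]:
    """Return human-readable problems; an empty list means the shapes are consistent."""
    problems = []
    img_h, img_w = image_hw
    for name, stride in strides.items():
        if name not in level_hw:
            problems.append(f"{name}: missing feature map")
            continue
        if img_h % stride or img_w % stride:
            problems.append(f"{name}: image {image_hw} not divisible by stride {stride}")
            continue
        expected = (img_h // stride, img_w // stride)
        if level_hw[name] != expected:
            problems.append(f"{name}: got {level_hw[name]}, expected {expected}")
    return problems


# Assumed level names and strides, matching the strides quoted in this report.
strides = {"p2": 4, "p3": 8, "p4": 16, "p5": 32, "p6": 64}
good = {name: (512 // s, 640 // s) for name, s in strides.items()}
bad = dict(good, p3=(60, 80))  # p3 is a few rows short

print(check_pyramid_shapes((512, 640), good, strides))
print(check_pyramid_shapes((512, 640), bad, strides))
```

With real tensors, the (height, width) pairs would come from each feature map's last two dimensions.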

If the input contains HDR images, 16-bit medical images, or images pre-normalized to the [0, 1] range, normalization will push pixel values far outside the expected range, causing backbone features to saturate and the model to fail silently
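One way to make this failure visible is to look at the observed value range before normalization. The sketch below is a heuristic under stated assumptions: `guess_pixel_range` is a hypothetical helper, it takes a flat iterable of raw pixel values, and the thresholds are illustrative. A nearly black 8-bit image would be misread as [0, 1], so it is better applied to a sample of images than to a single one.

```python
from typing import Iterable


def guess_pixel_range(pixels: Iterable[float]) -> str:
    """Classify raw pixel values by their observed minimum and maximum."""
    values = list(pixels)
    if not values:
        raise ValueError("no pixel values given")
    lo, hi = min(values), max(values)
    if lo < 0:
        return "negative values (already mean-subtracted?)"
    if hi <= 1.0:
        return "looks like [0, 1] (pre-normalized?)"
    if hi <= 255:
        return "looks like 8-bit [0, 255]"
    return "exceeds 255 (16-bit or HDR?)"


print(guess_pixel_range([0, 128, 255]))
print(guess_pixel_range([0.0, 0.5, 1.0]))
print(guess_pixel_range([0, 4095]))
```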

If the RPN generates out-of-bounds proposals because of anchor misconfiguration or image-resizing bugs, ROIAlign will sample features from invalid memory locations, causing crashes or corrupted gradients during training
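A defensive step before pooling is to clip proposals to the image and drop any that collapse to nothing. The following is a minimal sketch on plain tuples, not detectron2's own implementation: `clip_and_filter` and its `min_size` parameter are hypothetical, and boxes are assumed to be (x1, y1, x2, y2) in pixels.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def clip_and_filter(
    boxes: List[Box], image_hw: Tuple[int, int], min_size: float = 1.0
) -> List[Box]:
    """Clip boxes to the image bounds, then drop boxes smaller than min_size."""
    h, w = image_hw
    kept = []
    for x1, y1, x2, y2 in boxes:
        # Clamp x coordinates into [0, w] and y coordinates into [0, h].
        x1, x2 = max(0.0, min(x1, w)), max(0.0, min(x2, w))
        y1, y2 = max(0.0, min(y1, h)), max(0.0, min(y2, h))
        if x2 - x1 >= min_size and y2 - y1 >= min_size:
            kept.append((x1, y1, x2, y2))
    return kept


proposals = [(-10, 5, 50, 60), (700, 10, 720, 30), (10, 10, 20, 20)]
print(clip_and_filter(proposals, image_hw=(480, 640)))
```

The second proposal lies entirely to the right of a 640-pixel-wide image, so it clips to zero width and is dropped.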

Show everything (8 more)

Scale

Training assumes balanced positive/negative anchor sampling with hardcoded ratios (positive_fraction=0.5, batch_size_per_image=256) but never checks if the dataset actually contains sufficient positive anchors

If this fails: On datasets with very small objects or sparse annotations, most images may have <128 positive anchors available, causing the sampler to pad with duplicate positives or fall back to fewer samples, leading to unstable gradients and poor convergence

detectron2/modeling/proposal_generator/rpn.py:RPN.forward_training
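The arithmetic behind the shortfall can be sketched as follows. This assumes one common fixed-fraction scheme in which a shortage of positives is filled with negatives; `sample_counts` is a hypothetical helper, and the real sampler's behavior should be confirmed in the source before relying on these numbers.

```python
from typing import Tuple


def sample_counts(
    num_pos_available: int,
    num_neg_available: int,
    batch_size_per_image: int = 256,
    positive_fraction: float = 0.5,
) -> Tuple[int, int]:
    """Return (positives, negatives) actually sampled for one image."""
    target_pos = int(batch_size_per_image * positive_fraction)
    num_pos = min(num_pos_available, target_pos)
    # Assumption: unused positive slots are handed to negatives.
    num_neg = min(num_neg_available, batch_size_per_image - num_pos)
    return num_pos, num_neg


pos, neg = sample_counts(num_pos_available=12, num_neg_available=100_000)
print(pos, neg, f"effective positive fraction = {pos / (pos + neg):.3f}")
```

Under that assumption, an image with only 12 positive anchors trains on a batch that is under 5% positive rather than the configured 50%.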

Resource

DataLoader assumes sufficient GPU memory to hold batch_size * max_image_size * num_workers worth of preprocessed images, but never estimates or validates memory requirements

If this fails: Large images (>2000px) or high batch sizes can cause CUDA out-of-memory errors that manifest as cryptic RuntimeError messages mid-training, losing hours of training progress without clear memory usage guidance

detectron2/engine/defaults.py:DefaultTrainer.build_train_loader
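A back-of-the-envelope estimate of the input batch alone is easy to compute up front. The sketch below covers only the padded image tensor, assuming float32 values and 3 channels; activations, gradients, and optimizer state usually dominate and are not modeled. Both function names are hypothetical.

```python
def batch_tensor_bytes(
    batch_size: int,
    height: int,
    width: int,
    channels: int = 3,
    bytes_per_value: int = 4,  # float32
) -> int:
    """Bytes needed for one padded image batch of shape (N, C, H, W)."""
    return batch_size * channels * height * width * bytes_per_value


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 ** 2)


size = batch_tensor_bytes(batch_size=16, height=2048, width=2048)
print(f"{to_mib(size):.0f} MiB for the input batch alone")
```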

Temporal

Checkpoint loading assumes model architecture hasn't changed between save and load - specifically that all parameter names and shapes match exactly

If this fails: When the config changes backbone depth (ResNet-50 to ResNet-101) or adds new heads between training runs, checkpoint loading fails with a KeyError or shape mismatch, and the error messages don't clearly indicate which architectural change caused the incompatibility

detectron2/checkpoint/checkpoint.py:Checkpointer.load
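A clearer report can be produced by diffing parameter names and shapes before loading. This sketch works on plain dictionaries of name to shape; `diff_state_shapes` is a hypothetical helper and the parameter names in the example are made up for illustration.

```python
from typing import Dict, List, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def diff_state_shapes(model: Shapes, checkpoint: Shapes) -> Dict[str, List[str]]:
    """Group differences into missing, unexpected, and shape-mismatched keys."""
    missing = sorted(k for k in model if k not in checkpoint)
    unexpected = sorted(k for k in checkpoint if k not in model)
    mismatched = sorted(
        f"{k}: model {model[k]} vs checkpoint {checkpoint[k]}"
        for k in model
        if k in checkpoint and model[k] != checkpoint[k]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Illustrative names only. With PyTorch, shapes can be collected via
# {k: tuple(v.shape) for k, v in module.state_dict().items()}.
model_shapes = {"stem.conv.weight": (64, 3, 7, 7), "head.cls.weight": (81, 1024)}
ckpt_shapes = {"stem.conv.weight": (64, 3, 7, 7), "head.cls.weight": (21, 1024),
               "head.old.bias": (21,)}

for kind, items in diff_state_shapes(model_shapes, ckpt_shapes).items():
    print(kind, items)
```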

Ordering

DataLoader iteration assumes dataset records can be accessed in any order via __getitem__(index), but some dataset implementations may expect sequential access or have stateful transforms

If this fails: Multi-worker data loading with random sampling can break datasets that maintain internal state or cache, causing inconsistent augmentations or corrupted batches that lead to training instability

detectron2/data/build.py:build_detection_train_loader
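One way to make a dataset safe under random, multi-worker access is to derive any randomness from the index rather than from shared mutable state. The class below is a hypothetical illustration, not part of detectron2; the seed-mixing constant is arbitrary.

```python
import random


class IndexSeededDataset:
    """Map-style dataset whose augmentation depends only on (base_seed, index)."""

    def __init__(self, records, base_seed=0):
        self.records = list(records)
        self.base_seed = base_seed

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        # A fresh generator per call: no state carries over between accesses,
        # so access order and worker assignment cannot change the result.
        rng = random.Random(self.base_seed * 1_000_003 + index)
        flip = rng.random() < 0.5
        return {"record": self.records[index], "flip": flip}


ds = IndexSeededDataset(["a", "b", "c"])
forward = [ds[i] for i in range(len(ds))]
backward = [ds[i] for i in reversed(range(len(ds)))]
print(forward == list(reversed(backward)))
```

As written, every epoch sees the same augmentation for a given index; mixing an epoch counter into the seed would restore variety while keeping accesses order-independent.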

Environment

All configs hardcode pixel normalization constants (pixel_mean=[103.530, 116.280, 123.675]) assuming BGR channel order and ImageNet statistics, but never validate actual dataset statistics

If this fails: When the dataset uses RGB order, different camera sensors, or domain-specific images (medical, satellite), the hardcoded normalization shifts the data distribution, so pretrained features activate incorrectly and model accuracy drops

configs/common/models/mask_rcnn_fpn.py:model.pixel_mean
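Comparing measured channel means against the configured constants is a cheap sanity check. The sketch below is a heuristic, not a detectron2 feature: both helpers are hypothetical, pixels are assumed to be flat (c0, c1, c2) triples, and the order test is only informative when the first and last channel means differ noticeably.

```python
from typing import List, Sequence

PIXEL_MEAN = [103.530, 116.280, 123.675]  # value quoted in the config above


def channel_means(pixels: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel mean over a flat sequence of pixels."""
    if not pixels:
        raise ValueError("no pixels given")
    n = len(pixels)
    return [sum(p[c] for p in pixels) / n for c in range(len(pixels[0]))]


def closer_channel_order(observed: Sequence[float], configured: Sequence[float]) -> str:
    """Report whether observed means fit the configured order or its reverse."""
    as_is = sum(abs(o - c) for o, c in zip(observed, configured))
    swapped = sum(abs(o - c) for o, c in zip(reversed(observed), configured))
    return "as configured" if as_is <= swapped else "reversed (RGB/BGR swap?)"


print(channel_means([(0, 10, 20), (10, 20, 30)]))
print(closer_channel_order([124.0, 117.0, 104.0], PIXEL_MEAN))
```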

Scale

COCO evaluation assumes detection scores are well-calibrated probabilities in [0,1] range and uses fixed IoU thresholds (0.5:0.95) without checking score distribution

If this fails: Models that output uncalibrated confidence scores or use different output ranges may appear to perform poorly in evaluation even if spatial predictions are accurate, masking model quality issues

detectron2/evaluation/coco_evaluation.py:COCOEvaluator._eval_predictions

Contract

Image batching assumes all input tensors have the same number of channels (3 for RGB/BGR) and will pad spatial dimensions to match the largest image in the batch

If this fails: When a batch mixes grayscale (1-channel) or RGBA (4-channel) images with RGB, tensor concatenation fails with shape mismatch errors that don't clearly point to the channel dimension

detectron2/structures/image_list.py:ImageList.from_tensors
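A pre-batching check can name the channel problem directly. This is a sketch on shape tuples only, assuming (C, H, W) layout; `padded_batch_shape` is a hypothetical helper and it ignores any extra padding to a size multiple that a real batching routine might apply.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (C, H, W)


def padded_batch_shape(shapes: List[Shape]) -> Tuple[int, int, int, int]:
    """Return (N, C, max H, max W), or raise a channel-specific error."""
    if not shapes:
        raise ValueError("empty batch")
    channels = {s[0] for s in shapes}
    if len(channels) != 1:
        offenders = [i for i, s in enumerate(shapes) if s[0] != shapes[0][0]]
        raise ValueError(
            f"mixed channel counts {sorted(channels)}; "
            f"images at indices {offenders} differ from image 0"
        )
    return (len(shapes), shapes[0][0],
            max(s[1] for s in shapes), max(s[2] for s in shapes))


print(padded_batch_shape([(3, 480, 640), (3, 600, 500)]))
try:
    padded_batch_shape([(3, 480, 640), (1, 480, 640)])
except ValueError as err:
    print(err)
```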

Domain

Anchor generation assumes object scales follow COCO distribution with default sizes=[32, 64, 128, 256, 512] and aspect ratios=[0.5, 1.0, 2.0], but never adapts to actual dataset object statistics

If this fails: On datasets with very different object scales (e.g., microscopy with tiny objects or aerial imagery with large structures), the fixed anchor sizes will have poor recall, causing the detector to miss objects systematically

detectron2/modeling/anchor_generator.py:DefaultAnchorGenerator.forward
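A rough coverage figure can show whether the default sizes suit a dataset before any training. The sketch below is a scale-only proxy under stated assumptions: `anchor_coverage` is a hypothetical helper, a box counts as covered when the square root of its area is within a factor `tolerance` of some anchor size, and real anchor matching uses IoU, which this does not model.

```python
import math
from typing import Sequence, Tuple


def anchor_coverage(
    box_whs: Sequence[Tuple[float, float]],
    anchor_sizes: Sequence[float] = (32, 64, 128, 256, 512),
    tolerance: float = 2.0,
) -> float:
    """Fraction of (width, height) boxes whose scale is near some anchor size."""
    if not box_whs:
        raise ValueError("no boxes given")
    covered = 0
    for w, h in box_whs:
        scale = math.sqrt(w * h)
        if any(size / tolerance <= scale <= size * tolerance for size in anchor_sizes):
            covered += 1
    return covered / len(box_whs)


# Two tiny boxes fall below the smallest anchor's tolerance band (16 px).
boxes = [(8, 8), (10, 12), (100, 100), (40, 30)]
print(anchor_coverage(boxes))
```

A low fraction suggests adjusting the anchor sizes to the dataset's object statistics.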

See the full structural analysis of detectron2: the pipeline, data models, and system behavior that put these assumptions in context.

Frequently Asked Questions

What does detectron2 assume that could break in production?

The one most likely to cause trouble: FPN assumes that all feature maps from the backbone have spatial dimensions matching the expected strides (4, 8, 16, 32, 64), but it never validates that each level's height and width are exactly half those of the level before it. If this fails: When the backbone produces feature maps with unexpected spatial ratios, FPN's lateral connections and top-down pathway misalign them, so the detection heads process spatially inconsistent features and produce wrong bounding box coordinates.

How many hidden assumptions does detectron2 have?

CodeSea found 11 assumptions detectron2 relies on but never validates, 4 of them critical, spanning Shape, Domain, Contract, Scale, Resource, Temporal, Ordering, Environment. Most are routine; the analysis flags the two or three most likely to actually bite.

What is a hidden assumption?

Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.