Hidden Assumptions in xarray-spatial

13 assumptions this code never checks · 5 critical · spanning Shape, Domain, Scale, Contract, Resource, Temporal, Ordering, Environment

Every codebase relies on things it never checks, and most of them are routine. CodeSea looked at xarray-contrib/xarray-spatial and picked out the 3 most likely to cause trouble here. Those come first; the remaining 10 are minor and are listed under "Show everything".

Worth your attention first

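Assumes the input raster DataArray has exactly 2 spatial dimensions named 'y' and 'x', in that order. If the input has different dimension names ('lat'/'lon', 'row'/'col') or the wrong order ('x', 'y'), the polygonize algorithm will silently process the wrong axes or crash with confusing dimension errors.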
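A caller can guard against this before handing a raster to polygonize. The sketch below is a hypothetical helper, not part of xarray-spatial. It takes a tuple of dimension names (for an xarray DataArray, that is `raster.dims`) and assumes the expected layout is ('y', 'x').

```python
def check_yx_dims(dims, expected=("y", "x")):
    """Raise ValueError unless dims is exactly the expected (row, column) pair.

    Hypothetical helper for illustration; not part of xarray-spatial.
    """
    dims = tuple(dims)
    if dims == expected:
        return
    if dims == expected[::-1]:
        # Same names, swapped axes: the case that would be processed silently.
        raise ValueError(
            f"dimensions {dims} are in the wrong order; transpose to {expected}"
        )
    raise ValueError(f"expected dimensions {expected}, got {dims}")
```

With xarray, the usual fixes after a failed check are `raster.transpose('y', 'x')` for swapped axes and `raster.rename({'lat': 'y', 'lon': 'x'})` for differently named ones.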


If the input contains inverted values (depth instead of elevation) or non-physical data (categorical codes), flow direction vectors will point uphill instead of downhill, producing wrong watershed boundaries


If the source raster is dense, with many non-zero values, the priority-queue-based expansion will consume excessive memory and run orders of magnitude slower than expected

Show everything (10 more)

Contract

All spatial algorithm implementations across numpy/dask/cupy backends produce numerically identical results for the same input data

If this fails: Users switching backends expecting consistent outputs may get different results due to floating-point precision differences or algorithmic variations between CPU and GPU implementations

xrspatial/utils.py:dispatch_backend
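When comparing outputs across backends, a tolerance-based comparison is safer than exact equality. The sketch below uses only the standard library and works on flat sequences of floats. The function name and the default tolerances are illustrative assumptions, not values taken from the project.

```python
import math

def results_match(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if two flat sequences agree within tolerance.

    NaN is treated as equal to NaN, since rasters often use NaN as nodata.
    The default tolerances are illustrative assumptions.
    """
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if math.isnan(x) and math.isnan(y):
            continue
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

For arrays, `numpy.allclose(a, b, equal_nan=True)` performs the same kind of check.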

Resource

Memory can hold N_sources cost surfaces simultaneously, each the same size as input raster

If this fails: For large rasters with many source points, memory usage scales as (raster_size × N_sources), causing OOM crashes on systems with insufficient RAM

xrspatial/balanced_allocation.py:balanced_allocation
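The scaling described above can be estimated before running anything. This is a back-of-the-envelope sketch; the function name is hypothetical, and the 8-byte cell size assumes float64 surfaces, which the source does not state.

```python
def cost_surface_bytes(ny, nx, n_sources, itemsize=8):
    """Estimate bytes needed to hold one cost surface per source.

    itemsize=8 assumes float64 cells; that dtype is an assumption here.
    """
    return ny * nx * n_sources * itemsize

# A 10,000 x 10,000 raster with 50 sources, float64 cells:
needed = cost_surface_bytes(10_000, 10_000, 50)
print(f"{needed / 1e9:.0f} GB")  # prints "40 GB"
```

Comparing this figure with available RAM gives an early warning before the allocation is attempted.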

Temporal

RTX triangulation cache remains valid between calls with identical input point coordinates - reuses cached triangulation for performance

If this fails: If point coordinates are numerically identical but values changed, cached triangulation produces stale interpolation results instead of recomputing with new values

xrspatial/gpu_rtx/interpolate.py:RTXTriangulator

Ordering

Input bins array is monotonically increasing and new_values array has same length as bins - uses bins as break points for reclassification

If this fails: If bins are unordered or arrays have mismatched lengths, reclassification assigns wrong class values or crashes with index errors during value lookup

xrspatial/classify.py:reclassify
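Both conditions are cheap to check up front. The sketch below is a hypothetical pre-check a caller could run before reclassifying; it is not part of xarray-spatial, and it treats repeated break points as allowed, which is an assumption.

```python
def validate_reclass_args(bins, new_values):
    """Raise ValueError if lengths differ or bins ever decrease.

    Hypothetical pre-check for illustration; not part of xarray-spatial.
    """
    bins, new_values = list(bins), list(new_values)
    if len(bins) != len(new_values):
        raise ValueError(
            f"bins has {len(bins)} entries but new_values has {len(new_values)}"
        )
    # Compare each break point with its successor.
    for lo, hi in zip(bins, bins[1:]):
        if hi < lo:
            raise ValueError(f"bins must be increasing: {lo} is followed by {hi}")
```

Calling `validate_reclass_args([0, 10, 20], [1, 2, 3])` returns quietly, while `[0, 20, 10]` as bins raises a ValueError that names the offending pair.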

Domain

Input red and near-infrared band DataArrays represent surface reflectance values in range [0,1] or digital numbers that can be directly used in vegetation index formulas

If this fails: If bands contain top-of-atmosphere radiance or atmospherically uncorrected values, computed NDVI values will be biased and not comparable to standard vegetation indices

xrspatial/multispectral.py:ndvi
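NDVI is (NIR - Red) / (NIR + Red), so the formula accepts any pair of numbers without complaint. The single-pixel sketch below adds a range check; the helper is hypothetical, and it assumes inputs are reflectance in [0, 1]. A range check like this only catches unscaled values. It cannot tell whether atmospheric correction has been applied.

```python
def ndvi_checked(nir, red):
    """Compute NDVI for one pixel, rejecting values outside [0, 1].

    Assumes inputs are surface reflectance; scaled digital numbers would
    need converting first. Hypothetical helper for illustration.
    """
    for name, value in (("nir", nir), ("red", red)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(
                f"{name}={value} is outside the reflectance range [0, 1]"
            )
    total = nir + red
    if total == 0:
        return float("nan")  # NDVI is undefined when both bands are zero
    return (nir - red) / total
```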

Environment

When type='cupy' is requested, CUDA GPU is available and CuPy is properly installed with compatible CUDA version

If this fails: Benchmarks fail with NotImplementedError if CUDA environment is misconfigured, but error message doesn't specify which CUDA/CuPy version mismatch caused the failure

benchmarks/benchmarks/common.py:get_xr_dataarray
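A small probe can report why the GPU path is unavailable before a benchmark starts. The sketch below is a hypothetical helper, not taken from the benchmark suite. It relies on `cupy.cuda.runtime.getDeviceCount()` and catches exceptions broadly, because the exact error type depends on how the CUDA setup is broken.

```python
def cupy_status():
    """Return (ok, detail) describing whether CuPy can see a CUDA device."""
    try:
        import cupy
    except Exception as exc:  # missing package or unloadable CUDA libraries
        return False, f"CuPy is not importable: {exc}"
    try:
        n_devices = cupy.cuda.runtime.getDeviceCount()
    except Exception as exc:  # raised when no working driver or runtime is found
        return False, f"CuPy {cupy.__version__} imported, but CUDA failed: {exc}"
    if n_devices == 0:
        return False, f"CuPy {cupy.__version__} imported, but no CUDA device found"
    return True, f"CuPy {cupy.__version__} with {n_devices} CUDA device(s)"

ok, detail = cupy_status()
print(detail)
```

Logging `detail` alongside the failure gives the version and device context that a bare NotImplementedError leaves out.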

Scale

Erosion simulation iterations parameter is reasonable (hundreds to low thousands) - each iteration processes entire raster for thermal and hydraulic erosion

If this fails: Runtime grows with iterations × raster size, so large iteration counts combined with big rasters can run for hours without progress indication or early termination options

xrspatial/erosion.py:erode

Contract

ConvolutionKernel has odd dimensions (3x3, 5x5, etc.) so kernel center is unambiguous for neighborhood operations

If this fails: Even-dimensioned kernels have no single center cell, leading to asymmetric filtering and spatial shifts in focal statistics results

xrspatial/focal.py:apply_kernel
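The odd-size requirement is easy to enforce from the kernel's shape alone. The sketch below is a hypothetical helper, not part of xarray-spatial, that returns the center index or rejects the kernel.

```python
def kernel_center(shape):
    """Return the (row, col) index of the kernel center.

    Raises ValueError for non-2D shapes and for shapes with an even side,
    where no single cell sits in the middle. Hypothetical helper.
    """
    if len(shape) != 2:
        raise ValueError(f"expected a 2D kernel, got shape {shape}")
    rows, cols = shape
    if rows % 2 == 0 or cols % 2 == 0:
        raise ValueError(f"kernel shape {shape} has an even side; use odd sizes")
    return rows // 2, cols // 2
```

For a NumPy kernel this would be called as `kernel_center(kernel.shape)`; a 3x3 kernel gives (1, 1), and a 4x4 kernel raises.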

Domain

Input DataArray coordinate reference system follows standard EPSG codes and geotransform uses north-up orientation with regular pixel spacing

If this fails: Rotated or skewed rasters with non-standard CRS get written with incorrect spatial metadata, making output files unusable in GIS software that expects standard projections

xrspatial/geotiff/writer.py:GeoTiffWriter
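The north-up condition can be tested directly on a geotransform. The sketch below assumes the GDAL-style 6-element layout; whether GeoTiffWriter represents its transform this way is an assumption, and the function name is hypothetical.

```python
def is_north_up(geotransform):
    """Return True if a GDAL-style 6-element geotransform is north-up.

    Layout assumed: (x_origin, pixel_width, row_rotation,
                     y_origin, col_rotation, pixel_height).
    North-up means both rotation terms are zero, pixel width is positive,
    and pixel height is negative (row index grows southward).
    """
    if len(geotransform) != 6:
        raise ValueError("geotransform must have 6 elements")
    _, width, row_rot, _, col_rot, height = geotransform
    return row_rot == 0 and col_rot == 0 and width > 0 and height < 0
```

A writer that supports only north-up rasters could call this first and raise a clear error instead of writing incorrect metadata.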

Resource

Upscaling operation using np.repeat can fit intermediate arrays in memory - creates (ny//8+1, nx//8+1) base then expands to full (ny, nx) size

If this fails: For very large benchmark rasters, the repeat operation temporarily doubles memory usage which may cause benchmark setup to fail with OOM before actual polygonize testing

benchmarks/benchmark_polygonize.py:make_few_large

See the full structural analysis of xarray-spatial: the pipeline, data models, and system behavior that put these assumptions in context.

Full analysis of xarray-contrib/xarray-spatial →

Frequently Asked Questions

What does xarray-spatial assume that could break in production?

The one most likely to cause trouble: the input raster DataArray is assumed to have exactly 2 spatial dimensions named 'y' and 'x', in that order, with y representing rows and x representing columns in a regular grid. If the input has different dimension names ('lat'/'lon', 'row'/'col') or the wrong order ('x', 'y'), the polygonize algorithm will silently process the wrong axes or crash with confusing dimension errors.

How many hidden assumptions does xarray-spatial have?

CodeSea found 13 assumptions xarray-spatial relies on but never validates, 5 of them critical, spanning Shape, Domain, Scale, Contract, Resource, Temporal, Ordering, Environment. Most are routine; the analysis flags the 3 most likely to cause trouble.

What is a hidden assumption?

Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.