Hidden Assumptions in xarray-spatial
13 assumptions this code never checks · 5 critical · spanning Shape, Domain, Scale, Contract, Resource, Temporal, Ordering, Environment
Every codebase relies on things it never checks, and most of them are routine. CodeSea analyzed xarray-contrib/xarray-spatial and picked out the 3 assumptions most likely to cause trouble here. The rest are minor; they're under "Show everything".
If the input uses different dimension names ('lat'/'lon', 'row'/'col') or the wrong order ('x', 'y'), the polygonize algorithm will silently process the wrong axes or crash with confusing dimension errors.
If the input contains inverted values (depth instead of elevation) or non-physical data (categorical codes), flow direction vectors will point uphill instead of downhill, producing wrong watershed boundaries.
If the source raster is dense, with many non-zero values, the priority-queue-based expansion will consume excessive memory and run orders of magnitude slower than expected.
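The dimension assumption could be checked before any processing starts. The following is a minimal sketch, not code from xarray-spatial: `check_yx_dims` is a hypothetical helper that receives the `dims` tuple of a DataArray and raises instead of letting the wrong axes through.

```python
def check_yx_dims(dims):
    """Raise ValueError unless dims is exactly ('y', 'x').

    Hypothetical helper for illustration; xarray-spatial does not ship it.
    `dims` is the tuple found on a DataArray's `.dims` attribute.
    """
    dims = tuple(dims)
    if dims == ("y", "x"):
        return
    if dims == ("x", "y"):
        # Right names, wrong order: rows and columns would be swapped.
        raise ValueError("dims are ('x', 'y'); transpose to ('y', 'x') first")
    raise ValueError(f"expected dims ('y', 'x'), got {dims!r}")
```

With an xarray DataArray named `raster`, calling `check_yx_dims(raster.dims)` first would turn a silent axis mix-up into an explicit error. `raster.transpose('y', 'x')` and `raster.rename(...)` are the usual ways to bring an input into the expected layout.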
Show everything (10 more)
All spatial algorithm implementations across numpy/dask/cupy backends produce numerically identical results for the same input data
If this fails: Users switching backends expecting consistent outputs may get different results due to floating-point precision differences or algorithmic variations between CPU and GPU implementations
xrspatial/utils.py:dispatch_backend
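When comparing outputs across backends, a tolerance-based comparison is usually safer than exact equality. The sketch below assumes both results have already been converted to NumPy arrays (for example with `.compute()` on a Dask array or `.get()` on a CuPy array). `backends_agree` is a hypothetical helper, and the float32 round trip only simulates a lower-precision backend.

```python
import numpy as np


def backends_agree(a, b, rtol=1e-5, atol=1e-8):
    """Compare two backend results within a tolerance, treating NaNs as equal."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        return False
    return bool(np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True))


# Simulate a lower-precision backend by round-tripping through float32.
cpu_result = np.array([0.1, 0.2, 0.3])
gpu_like_result = cpu_result.astype(np.float32)

# Exact comparison fails because 0.1 is rounded differently in float32.
exactly_equal = np.array_equal(cpu_result, gpu_like_result.astype(np.float64))
# Tolerance-based comparison accepts the small rounding difference.
within_tolerance = backends_agree(cpu_result, gpu_like_result)
```

The tolerances here are illustrative defaults; a suitable `rtol` depends on the algorithm and the dtype in use.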
Memory can hold N_sources cost surfaces simultaneously, each the same size as input raster
If this fails: For large rasters with many source points, memory usage scales as (raster_size × N_sources), causing OOM crashes on systems with insufficient RAM
xrspatial/balanced_allocation.py:balanced_allocation
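The scaling described above is simple to estimate ahead of time. This is a back-of-the-envelope sketch: the function names are hypothetical, and the 8-byte item size assumes float64 cost surfaces, which the analysis does not confirm.

```python
def cost_surface_bytes(ny, nx, n_sources, itemsize=8):
    """Bytes needed to hold one (ny, nx) cost surface per source at once.

    itemsize=8 assumes float64 surfaces; this is an assumption, not a
    statement about what balanced_allocation actually allocates.
    """
    return ny * nx * n_sources * itemsize


def fits_in_memory(ny, nx, n_sources, available_bytes, itemsize=8):
    """True if all cost surfaces fit within the given memory budget."""
    return cost_surface_bytes(ny, nx, n_sources, itemsize) <= available_bytes


# A 10,000 x 10,000 raster with 20 sources:
needed = cost_surface_bytes(10_000, 10_000, 20)
needed_gib = needed / 2**30  # about 14.9 GiB
```

Under these assumptions, a raster that fits comfortably in memory on its own can exceed RAM once it is multiplied by a modest number of sources.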
RTX triangulation cache remains valid between calls with identical input point coordinates - reuses cached triangulation for performance
If this fails: If point coordinates are numerically identical but values changed, cached triangulation produces stale interpolation results instead of recomputing with new values
xrspatial/gpu_rtx/interpolate.py:RTXTriangulator
Input bins array is monotonically increasing and new_values array has same length as bins - uses bins as break points for reclassification
If this fails: If bins are unordered or arrays have mismatched lengths, reclassification assigns wrong class values or crashes with index errors during value lookup
xrspatial/classify.py:reclassify
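Both conditions are cheap to verify before calling `reclassify`. The sketch below uses a hypothetical helper, `validate_reclass_args`, and treats "monotonically increasing" as strictly increasing, which is stricter than the assumption as stated.

```python
def validate_reclass_args(bins, new_values):
    """Check that bins are strictly increasing and match new_values in length.

    Hypothetical pre-check for illustration; not part of xarray-spatial.
    """
    bins = list(bins)
    new_values = list(new_values)
    if len(bins) != len(new_values):
        raise ValueError(
            f"bins has {len(bins)} entries but new_values has {len(new_values)}"
        )
    for left, right in zip(bins, bins[1:]):
        # `not left < right` also rejects NaN break points.
        if not left < right:
            raise ValueError(f"bins must be increasing, but {left} precedes {right}")
```

Calling `validate_reclass_args(bins, new_values)` right before the reclassification would surface an unordered or mismatched input as a clear `ValueError` rather than wrong class values.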
Input red and near-infrared band DataArrays represent surface reflectance values in range [0,1] or digital numbers that can be directly used in vegetation index formulas
If this fails: If bands contain top-of-atmosphere radiance or atmospherically uncorrected values, computed NDVI values will be biased and not comparable to standard vegetation indices
xrspatial/multispectral.py:ndvi
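A single-pixel worked example shows why the input domain matters. NDVI is (NIR - Red) / (NIR + Red), so scaling both bands by the same factor leaves it unchanged, while an additive offset shifts it. `ndvi_value` below is a hypothetical scalar helper, and the 0.05 offset is an illustrative stand-in for an uncorrected atmospheric contribution, not a calibrated value.

```python
def ndvi_value(nir, red):
    """NDVI for a single pixel: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / denom


# Surface reflectance in [0, 1].
clean = ndvi_value(nir=0.5, red=0.1)
# Same pixel stored as scaled digital numbers: the ratio is unchanged.
scaled = ndvi_value(nir=0.5 * 10_000, red=0.1 * 10_000)
# Same pixel with an additive offset on both bands: the ratio is biased low.
offset = ndvi_value(nir=0.5 + 0.05, red=0.1 + 0.05)
```

Here `clean` and `scaled` both come out near 0.667, while `offset` drops to roughly 0.571. That is the kind of bias that makes uncorrected inputs hard to compare with standard vegetation indices.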
When type='cupy' is requested, CUDA GPU is available and CuPy is properly installed with compatible CUDA version
If this fails: Benchmarks fail with NotImplementedError if CUDA environment is misconfigured, but error message doesn't specify which CUDA/CuPy version mismatch caused the failure
benchmarks/benchmarks/common.py:get_xr_dataarray
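A benchmark setup could probe the GPU environment first and report what it finds. The following is a sketch under stated assumptions: `cupy_status` is a hypothetical helper, not part of the benchmark suite, and it only reports whether CuPy imports and can see at least one CUDA device.

```python
def cupy_status():
    """Return (ok, detail) describing whether CuPy can see a CUDA device."""
    try:
        import cupy
    except Exception as exc:  # missing package or missing CUDA libraries
        return False, f"CuPy is not importable: {exc}"
    try:
        n_devices = cupy.cuda.runtime.getDeviceCount()
    except Exception as exc:  # e.g. driver/runtime mismatch, no driver
        return False, f"CuPy {cupy.__version__} could not query CUDA: {exc}"
    if n_devices < 1:
        return False, f"CuPy {cupy.__version__} found no CUDA device"
    return True, f"CuPy {cupy.__version__}, {n_devices} CUDA device(s)"


ok, detail = cupy_status()
```

Including `detail` in the error raised for `type='cupy'` would tell the user which part of the environment failed, not only that the backend is unavailable.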
Erosion simulation iterations parameter is reasonable (hundreds to low thousands) - each iteration processes entire raster for thermal and hydraulic erosion
If this fails: Because each iteration processes the entire raster, runtime grows with iterations × raster size, so large iteration counts combined with big rasters can run for hours without progress indication or early termination options
xrspatial/erosion.py:erode
ConvolutionKernel has odd dimensions (3x3, 5x5, etc.) so kernel center is unambiguous for neighborhood operations
If this fails: Even-dimensioned kernels will have undefined center point leading to asymmetric filtering and spatial shifts in focal statistics results
xrspatial/focal.py:apply_kernel
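The odd-dimension requirement can be stated as a short check on the kernel's shape. This is a minimal sketch: `kernel_center` is a hypothetical helper that takes a 2D shape tuple such as `kernel.shape`.

```python
def kernel_center(shape):
    """Return the (row, col) center index of a 2D kernel.

    Raises ValueError if the kernel is not 2D or has an even dimension,
    because an even-sized kernel has no single center cell.
    """
    if len(shape) != 2:
        raise ValueError(f"expected a 2D kernel, got shape {tuple(shape)!r}")
    rows, cols = shape
    if rows < 1 or cols < 1 or rows % 2 == 0 or cols % 2 == 0:
        raise ValueError(f"kernel dimensions must be odd, got {rows}x{cols}")
    return rows // 2, cols // 2
```

For a 3x3 kernel this returns `(1, 1)`; a 4x4 kernel raises, where it would otherwise silently pick an off-center cell.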
Input DataArray coordinate reference system follows standard EPSG codes and geotransform uses north-up orientation with regular pixel spacing
If this fails: Rotated or skewed rasters with non-standard CRS get written with incorrect spatial metadata, making output files unusable in GIS software that expects standard projections
xrspatial/geotiff/writer.py:GeoTiffWriter
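A north-up, unrotated grid is easy to recognize from its geotransform. The sketch below assumes the GDAL-style 6-tuple ordering (x origin, pixel width, row rotation, y origin, column rotation, pixel height). How the writer represents its transform internally is not stated in the analysis, and `is_north_up` is a hypothetical helper.

```python
def is_north_up(geotransform, tol=0.0):
    """True if a GDAL-style geotransform is unrotated and north-up.

    Expected order: (x_origin, pixel_width, row_rotation,
                     y_origin, col_rotation, pixel_height).
    North-up means both rotation terms are zero and pixel_height is negative.
    """
    x_origin, pixel_width, row_rot, y_origin, col_rot, pixel_height = geotransform
    no_rotation = abs(row_rot) <= tol and abs(col_rot) <= tol
    return no_rotation and pixel_width > 0 and pixel_height < 0


north_up = is_north_up((500000.0, 30.0, 0.0, 4100000.0, 0.0, -30.0))
rotated = is_north_up((500000.0, 30.0, 5.0, 4100000.0, 5.0, -30.0))
```

A writer that only supports north-up grids could reject inputs where this returns `False`, so that it does not emit metadata that misplaces the raster.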
Upscaling operation using np.repeat can fit intermediate arrays in memory - creates (ny//8+1, nx//8+1) base then expands to full (ny, nx) size
If this fails: For very large benchmark rasters, the repeat operation temporarily doubles memory usage which may cause benchmark setup to fail with OOM before actual polygonize testing
benchmarks/benchmark_polygonize.py:make_few_large
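The upscaling pattern described above can be sketched as follows. This is an illustration of the technique, not the benchmark's actual code: `make_blocky`, its random base values, and the `seed` parameter are assumptions.

```python
import numpy as np


def make_blocky(ny, nx, factor=8, seed=0):
    """Build an (ny, nx) raster of constant factor x factor blocks.

    A small base grid of shape (ny // factor + 1, nx // factor + 1) is
    repeated along both axes, then cropped to the requested size.
    """
    rng = np.random.default_rng(seed)
    base = rng.integers(0, 4, size=(ny // factor + 1, nx // factor + 1))
    # Each np.repeat call allocates a new, larger array.
    expanded = np.repeat(np.repeat(base, factor, axis=0), factor, axis=1)
    # Crop the padded result; .copy() lets the padded array be freed.
    return expanded[:ny, :nx].copy()


raster = make_blocky(20, 30)
```

The expanded array is slightly larger than the final raster before cropping, so peak memory during setup exceeds the size of the array the benchmark ends up using.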
See the full structural analysis of xarray-spatial: the pipeline, data models, and system behavior that put these assumptions in context.
Full analysis of xarray-contrib/xarray-spatial →
Frequently Asked Questions
What does xarray-spatial assume that could break in production?
The one most likely to cause trouble: the input raster DataArray has exactly 2 spatial dimensions named 'y' and 'x', in that order, with y representing rows and x representing columns in a regular grid. If this fails: when the input has different dimension names ('lat'/'lon', 'row'/'col') or the wrong order ('x', 'y'), the polygonize algorithm will silently process the wrong axes or crash with confusing dimension errors.
How many hidden assumptions does xarray-spatial have?
CodeSea found 13 assumptions xarray-spatial relies on but never validates, 5 of them critical, spanning Shape, Domain, Scale, Contract, Resource, Temporal, Ordering, Environment. Most are routine; the analysis flags the 3 most likely to cause trouble.
What is a hidden assumption?
Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.