xarray-contrib/xarray-spatial

Spatial analysis algorithms for xarray implemented in numba

936 stars Python 10 components

High-performance spatial analysis algorithms for raster data in Python

Raster data enters as xarray.DataArray objects from files or in-memory arrays. The dispatch system examines the underlying data type (numpy, dask, cupy) and selects the appropriate backend implementation. Spatial algorithms apply domain-specific transformations using numba-compiled kernels, preserving coordinate reference systems and spatial metadata. Results return as new DataArrays that can be chained through additional operations or exported to standard geospatial formats.

Under the hood, the system uses 2 feedback loops, 2 data pools, 2 control points to manage its runtime behavior.

A 10-component library. 292 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

Raster data enters as xarray.DataArray objects from files or in-memory arrays. The dispatch system examines the underlying data type (numpy, dask, cupy) and selects the appropriate backend implementation. Spatial algorithms apply domain-specific transformations using numba-compiled kernels, preserving coordinate reference systems and spatial metadata. Results return as new DataArrays that can be chained through additional operations or exported to standard geospatial formats.

  1. Load raster data into xarray DataArray — GeoTIFF reader parses file headers to extract spatial metadata (CRS, geotransform), loads pixel data into numpy/dask/cupy arrays, creates DataArray with proper coordinate dimensions
  2. Backend dispatch based on array type — dispatch_backend function inspects DataArray.data type - numpy arrays use CPU kernels, dask arrays enable distributed processing, cupy arrays trigger GPU computation paths [SpatialDataArray]
  3. Execute spatial algorithm with selected backend — Algorithm functions like flow_direction or reclassify call backend-specific implementations - numba.jit kernels for numpy, distributed tasks for dask, CUDA kernels for cupy [SpatialDataArray → SpatialDataArray]
  4. Apply focal operations and convolutions — apply_kernel slides ConvolutionKernel over raster using optimized boundary handling, computing statistics or filtering within neighborhood windows [ConvolutionKernel → Filtered Raster Results]
  5. Generate derived products — Hydrology algorithms chain flow_direction→flow_accumulation→watershed, terrain analysis computes slope→aspect→curvature, multispectral indices combine bands using standard formulas [FlowDirectionGrid → Derived Analysis Products]
  6. Export results to geospatial formats — GeoTiffWriter serializes final DataArrays with preserved spatial metadata, coordinate systems, and compression options for interoperability with GIS software [SpatialDataArray]

Data Models

The data structures that flow between stages — the contracts that hold the system together.

SpatialDataArray xrspatial/_common.py
xarray.DataArray with dims ['y', 'x'], coords containing spatial references (CRS, geotransform), and .data attribute containing numpy.ndarray, dask.array.Array, or cupy.ndarray
Created from raster files or in-memory arrays, passed through dispatch system to select backend, processed by numba-compiled kernels, returned as new DataArray with preserved spatial metadata
ConvolutionKernel xrspatial/convolution.py
2D numpy array with shape (height, width) containing filter weights, wrapped in kernel object with metadata like divisor and offset
Defined as numpy arrays, validated for odd dimensions, applied to raster data through sliding window operations
FlowDirectionGrid xrspatial/hydro/flow_direction.py
Integer-coded DataArray where each cell contains D8 direction values (1,2,4,8,16,32,64,128) indicating flow direction to one of 8 neighbors
Generated from elevation DEM using steepest descent, consumed by downstream hydrology algorithms for watershed delineation and flow routing
ClassificationBins xrspatial/classify.py
numpy array of break values for reclassification, paired with corresponding new_values array of same length
Created from statistical analysis (quantiles, natural breaks, equal intervals), used to map continuous raster values to discrete classes
GeospatialTransform xrspatial/geotiff/_geotags.py
dataclass with 6-parameter affine transformation (scale_x, shear_x, offset_x, shear_y, scale_y, offset_y) plus CRS information
Extracted from raster file headers, used to convert between pixel coordinates and real-world geographic coordinates

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Shape unguarded

Input raster DataArray has exactly 2 spatial dimensions named 'y' and 'x' in that order, with y representing rows and x representing columns in a regular grid

If this fails: If input has different dimension names ('lat'/'lon', 'row'/'col') or wrong order ('x','y'), the polygonize algorithm will silently process wrong axes or crash with confusing dimension errors

xrspatial/polygonize.py:polygonize
critical Domain unguarded

Input elevation DataArray contains physical elevation values where higher numbers mean higher terrain - algorithm computes steepest descent assuming this relationship

If this fails: If input contains inverted values (depth instead of elevation) or non-physical data (categorical codes), flow direction vectors will point uphill instead of downhill, producing completely wrong watershed boundaries

xrspatial/hydro/flow_direction.py:flow_direction
critical Scale unguarded

Source raster contains sparse seed points (non-zero values) with most cells being zero - algorithm efficiency depends on having few source locations to expand from

If this fails: If source raster is dense with many non-zero values, the priority queue-based expansion will consume excessive memory and run orders of magnitude slower than expected

xrspatial/cost_distance.py:cost_distance
critical Contract unguarded

All spatial algorithm implementations across numpy/dask/cupy backends produce numerically identical results for the same input data

If this fails: Users switching backends expecting consistent outputs may get different results due to floating-point precision differences or algorithmic variations between CPU and GPU implementations

xrspatial/utils.py:dispatch_backend
critical Resource unguarded

Memory can hold N_sources cost surfaces simultaneously, each the same size as input raster

If this fails: For large rasters with many source points, memory usage scales as (raster_size × N_sources), causing OOM crashes on systems with insufficient RAM

xrspatial/balanced_allocation.py:balanced_allocation
warning Temporal weakly guarded

RTX triangulation cache remains valid between calls with identical input point coordinates - reuses cached triangulation for performance

If this fails: If point coordinates are numerically identical but values changed, cached triangulation produces stale interpolation results instead of recomputing with new values

xrspatial/gpu_rtx/interpolate.py:RTXTriangulator
warning Ordering weakly guarded

Input bins array is monotonically increasing and new_values array has same length as bins - uses bins as break points for reclassification

If this fails: If bins are unordered or arrays have mismatched lengths, reclassification assigns wrong class values or crashes with index errors during value lookup

xrspatial/classify.py:reclassify
warning Domain unguarded

Input red and near-infrared band DataArrays represent surface reflectance values in range [0,1] or digital numbers that can be directly used in vegetation index formulas

If this fails: If bands contain top-of-atmosphere radiance or atmospherically uncorrected values, computed NDVI values will be biased and not comparable to standard vegetation indices

xrspatial/multispectral.py:ndvi
warning Environment weakly guarded

When type='cupy' is requested, CUDA GPU is available and CuPy is properly installed with compatible CUDA version

If this fails: Benchmarks fail with NotImplementedError if CUDA environment is misconfigured, but error message doesn't specify which CUDA/CuPy version mismatch caused the failure

benchmarks/benchmarks/common.py:get_xr_dataarray
warning Scale unguarded

Erosion simulation iterations parameter is reasonable (hundreds to low thousands) - each iteration processes entire raster for thermal and hydraulic erosion

If this fails: Large iteration counts combined with big rasters cause exponential runtime growth that can run for hours without progress indication or early termination options

xrspatial/erosion.py:erode

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Raster Cache (in-memory)
Intermediate raster results cached during multi-step algorithms to avoid recomputation
Kernel Registry (registry)
Pre-compiled numba kernels indexed by operation type and backend for fast dispatch

Feedback Loops

Delays

Control Points

Technology Stack

xarray (library)
Provides labeled multi-dimensional arrays with coordinate systems for representing georeferenced raster data
numba (compute)
JIT compiles Python functions to native code for high-performance spatial algorithm kernels
dask (compute)
Enables distributed and out-of-core processing of large raster datasets through lazy evaluation and task graphs
cupy (compute)
GPU-accelerated array operations providing CUDA backend for spatial algorithms on NVIDIA hardware
numpy (library)
Foundation array library providing CPU-based numerical computations and memory layout for raster data
rtx (compute)
Hardware-accelerated ray tracing operations for advanced spatial interpolation and rendering on RTX GPUs

Key Components

Explore the interactive analysis

See the full architecture map, data flow, and code patterns visualization.

Analyze on CodeSea

Related Library Repositories

Frequently Asked Questions

What is xarray-spatial used for?

High-performance spatial analysis algorithms for raster data in Python xarray-contrib/xarray-spatial is a 10-component library written in Python. Data flows through 6 distinct pipeline stages. The codebase contains 292 files.

How is xarray-spatial architected?

xarray-spatial is organized into 4 architecture layers: Algorithm Modules, Backend Dispatch, Kernel Definitions, I/O and Format Support. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through xarray-spatial?

Data moves through 6 stages: Load raster data into xarray DataArray → Backend dispatch based on array type → Execute spatial algorithm with selected backend → Apply focal operations and convolutions → Generate derived products → .... Raster data enters as xarray.DataArray objects from files or in-memory arrays. The dispatch system examines the underlying data type (numpy, dask, cupy) and selects the appropriate backend implementation. Spatial algorithms apply domain-specific transformations using numba-compiled kernels, preserving coordinate reference systems and spatial metadata. Results return as new DataArrays that can be chained through additional operations or exported to standard geospatial formats. This pipeline design reflects a complex multi-stage processing system.

What technologies does xarray-spatial use?

The core stack includes xarray (Provides labeled multi-dimensional arrays with coordinate systems for representing georeferenced raster data), numba (JIT compiles Python functions to native code for high-performance spatial algorithm kernels), dask (Enables distributed and out-of-core processing of large raster datasets through lazy evaluation and task graphs), cupy (GPU-accelerated array operations providing CUDA backend for spatial algorithms on NVIDIA hardware), numpy (Foundation array library providing CPU-based numerical computations and memory layout for raster data), rtx (Hardware-accelerated ray tracing operations for advanced spatial interpolation and rendering on RTX GPUs). A focused set of dependencies that keeps the build manageable.

What system dynamics does xarray-spatial have?

xarray-spatial exhibits 2 data pools (Raster Cache, Kernel Registry), 2 feedback loops, 2 control points, 2 delays. The feedback loops handle convergence and training-loop. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does xarray-spatial use?

4 design patterns detected: Backend Polymorphism, Numba Acceleration, Spatial Metadata Preservation, Comprehensive Benchmarking.

Analyzed on April 20, 2026 by CodeSea. Written by .