xarray-contrib/xarray-spatial
Spatial analysis algorithms for xarray implemented in numba
High-performance spatial analysis algorithms for raster data in Python
Under the hood, the system uses 2 feedback loops, 2 data pools, and 2 control points to manage its runtime behavior.
A 10-component library of 292 analyzed files, with data flowing through 6 distinct pipeline stages.
How Data Flows Through the System
Raster data enters as xarray.DataArray objects from files or in-memory arrays. The dispatch system examines the underlying data type (numpy, dask, cupy) and selects the appropriate backend implementation. Spatial algorithms apply domain-specific transformations using numba-compiled kernels, preserving coordinate reference systems and spatial metadata. Results return as new DataArrays that can be chained through additional operations or exported to standard geospatial formats.
- Load raster data into xarray DataArray — GeoTIFF reader parses file headers to extract spatial metadata (CRS, geotransform), loads pixel data into numpy/dask/cupy arrays, creates DataArray with proper coordinate dimensions
- Backend dispatch based on array type — dispatch_backend function inspects DataArray.data type - numpy arrays use CPU kernels, dask arrays enable distributed processing, cupy arrays trigger GPU computation paths [SpatialDataArray]
- Execute spatial algorithm with selected backend — Algorithm functions like flow_direction or reclassify call backend-specific implementations - numba.jit kernels for numpy, distributed tasks for dask, CUDA kernels for cupy [SpatialDataArray → SpatialDataArray]
- Apply focal operations and convolutions — apply_kernel slides ConvolutionKernel over raster using optimized boundary handling, computing statistics or filtering within neighborhood windows [ConvolutionKernel → Filtered Raster Results]
- Generate derived products — Hydrology algorithms chain flow_direction→flow_accumulation→watershed, terrain analysis computes slope→aspect→curvature, multispectral indices combine bands using standard formulas [FlowDirectionGrid → Derived Analysis Products]
- Export results to geospatial formats — GeoTiffWriter serializes final DataArrays with preserved spatial metadata, coordinate systems, and compression options for interoperability with GIS software [SpatialDataArray]
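The backend-dispatch stage above can be sketched in a few lines. This is an illustrative stand-in for the library's dispatch_backend, not its actual implementation: it keys off the array type's module name, which is how numpy/dask/cupy arrays are commonly told apart.

```python
import numpy as np

def dispatch_backend(data):
    """Pick a backend from the array type, mirroring the
    numpy/dask/cupy dispatch described above (simplified sketch)."""
    module = type(data).__module__.split(".")[0]
    if module == "numpy":
        return "cpu"          # numba.jit kernels
    if module == "dask":
        return "distributed"  # dask task graphs
    if module == "cupy":
        return "gpu"          # CUDA kernels
    raise TypeError(f"unsupported array type: {type(data)!r}")

print(dispatch_backend(np.zeros((2, 2))))  # cpu
```

Because an xarray.DataArray wraps any of these array types in its .data attribute, a single check like this lets every algorithm route to the right kernel without the caller doing anything.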
Data Models
The data structures that flow between stages — the contracts that hold the system together.
xrspatial/_common.py — xarray.DataArray with dims ['y', 'x'], coords containing spatial references (CRS, geotransform), and .data attribute containing numpy.ndarray, dask.array.Array, or cupy.ndarray
Created from raster files or in-memory arrays, passed through dispatch system to select backend, processed by numba-compiled kernels, returned as new DataArray with preserved spatial metadata
xrspatial/convolution.py — 2D numpy array with shape (height, width) containing filter weights, wrapped in a kernel object with metadata such as divisor and offset
Defined as numpy arrays, validated for odd dimensions, applied to raster data through sliding window operations
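The sliding-window application described above can be illustrated with a minimal focal-mean sketch. This is not xrspatial's apply_kernel (which uses numba-compiled kernels and configurable boundary handling); it is a plain-numpy version showing the odd-dimension check and edge-padded window sweep.

```python
import numpy as np

def focal_mean(raster, kernel):
    """Slide an odd-sized kernel over the raster with edge padding,
    computing a weighted mean per window (simplified sketch)."""
    kh, kw = kernel.shape
    assert kh % 2 == 1 and kw % 2 == 1, "kernel dims must be odd"
    padded = np.pad(raster, ((kh // 2, kh // 2), (kw // 2, kw // 2)),
                    mode="edge")
    out = np.zeros_like(raster, dtype=float)
    for i in range(raster.shape[0]):
        for j in range(raster.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel) / kernel.sum()
    return out

r = np.arange(9, dtype=float).reshape(3, 3)
print(focal_mean(r, np.ones((3, 3)))[1, 1])  # 4.0 (mean of all 9 cells)
```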
xrspatial/hydro/flow_direction.py — Integer-coded DataArray where each cell contains a D8 direction value (1, 2, 4, 8, 16, 32, 64, 128) indicating flow toward one of 8 neighbors
Generated from elevation DEM using steepest descent, consumed by downstream hydrology algorithms for watershed delineation and flow routing
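The steepest-descent D8 encoding above can be sketched directly. This is a simplified illustration (interior cells only, borders left as 0), not the library's numba kernel; the code-to-offset pairing follows the conventional ESRI-style D8 numbering listed in the data model.

```python
import numpy as np

# (drow, dcol) neighbor offsets paired with D8 direction codes
D8 = [(0, 1, 1), (1, 1, 2), (1, 0, 4), (1, -1, 8),
      (0, -1, 16), (-1, -1, 32), (-1, 0, 64), (-1, 1, 128)]

def flow_direction(dem):
    """Assign each interior cell the D8 code of its steepest
    downhill neighbor, normalizing drop by neighbor distance."""
    out = np.zeros(dem.shape, dtype=np.int32)
    for i in range(1, dem.shape[0] - 1):
        for j in range(1, dem.shape[1] - 1):
            best_drop, best_code = 0.0, 0
            for di, dj, code in D8:
                dist = np.hypot(di, dj)  # 1 or sqrt(2)
                drop = (dem[i, j] - dem[i + di, j + dj]) / dist
                if drop > best_drop:
                    best_drop, best_code = drop, code
            out[i, j] = best_code
    return out

dem = np.array([[3., 3., 3.],
                [3., 2., 3.],
                [3., 3., 1.]])
print(flow_direction(dem)[1, 1])  # 2 (flows SE toward the 1.0 cell)
```

Note how the hidden assumption below about physical elevation values shows up here: if the DEM were inverted, every `drop` would flip sign and the codes would point uphill.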
xrspatial/classify.py — numpy array of break values for reclassification, paired with a new_values array of the same length
Created from statistical analysis (quantiles, natural breaks, equal intervals), used to map continuous raster values to discrete classes
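The break-point mapping can be sketched with numpy's searchsorted. This is a hedged stand-in for xrspatial.classify.reclassify, assuming the "monotonically increasing bins, equal-length new_values" contract noted in the Hidden Assumptions section; the exact boundary semantics of the real function may differ.

```python
import numpy as np

def reclassify(values, bins, new_values):
    """Map continuous values to discrete classes: a value gets the
    class of the first break point >= it (simplified sketch)."""
    bins = np.asarray(bins)
    new_values = np.asarray(new_values)
    assert len(bins) == len(new_values), "bins/new_values must match"
    assert np.all(np.diff(bins) > 0), "bins must be strictly increasing"
    idx = np.searchsorted(bins, values, side="left")
    idx = np.clip(idx, 0, len(bins) - 1)  # values past the last break
    return new_values[idx]

v = np.array([0.1, 0.5, 0.9])
print(reclassify(v, bins=[0.3, 0.6, 1.0], new_values=[1, 2, 3]))  # [1 2 3]
```

The two asserts make the hidden assumptions explicit: unordered bins or mismatched lengths fail loudly here instead of silently misclassifying.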
xrspatial/geotiff/_geotags.py — dataclass with a 6-parameter affine transformation (scale_x, shear_x, offset_x, shear_y, scale_y, offset_y) plus CRS information
Extracted from raster file headers, used to convert between pixel coordinates and real-world geographic coordinates
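The pixel-to-world conversion is a direct application of the 6-parameter affine transform above. The parameter ordering here follows the tuple listed in the data model, but the actual dataclass layout in _geotags.py may differ; treat this as a worked example of the math, not the file's API.

```python
def pixel_to_world(col, row, gt):
    """Convert pixel indices to world coordinates with a
    6-parameter affine geotransform (illustrative ordering)."""
    scale_x, shear_x, offset_x, shear_y, scale_y, offset_y = gt
    x = offset_x + col * scale_x + row * shear_x
    y = offset_y + col * shear_y + row * scale_y
    return x, y

# 30 m pixels, origin at (500000, 4200000), north-up grid
# (negative y scale, since row indices increase downward)
gt = (30.0, 0.0, 500000.0, 0.0, -30.0, 4200000.0)
print(pixel_to_world(10, 5, gt))  # (500300.0, 4199850.0)
```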
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Input raster DataArray has exactly 2 spatial dimensions named 'y' and 'x' in that order, with y representing rows and x representing columns in a regular grid
If this fails: If input has different dimension names ('lat'/'lon', 'row'/'col') or wrong order ('x','y'), the polygonize algorithm will silently process wrong axes or crash with confusing dimension errors
xrspatial/polygonize.py:polygonize
Input elevation DataArray contains physical elevation values where higher numbers mean higher terrain - algorithm computes steepest descent assuming this relationship
If this fails: If input contains inverted values (depth instead of elevation) or non-physical data (categorical codes), flow direction vectors will point uphill instead of downhill, producing completely wrong watershed boundaries
xrspatial/hydro/flow_direction.py:flow_direction
Source raster contains sparse seed points (non-zero values) with most cells being zero - algorithm efficiency depends on having few source locations to expand from
If this fails: If source raster is dense with many non-zero values, the priority queue-based expansion will consume excessive memory and run orders of magnitude slower than expected
xrspatial/cost_distance.py:cost_distance
All spatial algorithm implementations across numpy/dask/cupy backends produce numerically identical results for the same input data
If this fails: Users switching backends expecting consistent outputs may get different results due to floating-point precision differences or algorithmic variations between CPU and GPU implementations
xrspatial/utils.py:dispatch_backend
Memory can hold N_sources cost surfaces simultaneously, each the same size as input raster
If this fails: For large rasters with many source points, memory usage scales as (raster_size × N_sources), causing OOM crashes on systems with insufficient RAM
xrspatial/balanced_allocation.py:balanced_allocation
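The (raster_size × N_sources) scaling above can be sanity-checked with a quick back-of-envelope estimate. The helper name and the float64 bytes-per-cell default are illustrative assumptions, not part of the library.

```python
def cost_surface_memory_gb(rows, cols, n_sources, bytes_per_cell=8):
    """Worst-case memory if n_sources cost surfaces (float64 by
    default) are held simultaneously, per the scaling noted above."""
    return rows * cols * n_sources * bytes_per_cell / 1e9

# A 20000x20000 raster with 16 source points already needs ~51 GB
print(cost_surface_memory_gb(20000, 20000, 16))  # 51.2
```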
RTX triangulation cache remains valid between calls with identical input point coordinates - reuses cached triangulation for performance
If this fails: If point coordinates are numerically identical but values changed, cached triangulation produces stale interpolation results instead of recomputing with new values
xrspatial/gpu_rtx/interpolate.py:RTXTriangulator
Input bins array is monotonically increasing and new_values array has same length as bins - uses bins as break points for reclassification
If this fails: If bins are unordered or arrays have mismatched lengths, reclassification assigns wrong class values or crashes with index errors during value lookup
xrspatial/classify.py:reclassify
Input red and near-infrared band DataArrays represent surface reflectance values in range [0,1] or digital numbers that can be directly used in vegetation index formulas
If this fails: If bands contain top-of-atmosphere radiance or atmospherically uncorrected values, computed NDVI values will be biased and not comparable to standard vegetation indices
xrspatial/multispectral.py:ndvi
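The NDVI formula itself is standard, (NIR − Red) / (NIR + Red), so the reflectance assumption above is about the inputs, not the arithmetic. A minimal numpy sketch (the real xrspatial.multispectral.ndvi operates on DataArrays and preserves spatial metadata):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), assuming surface
    reflectance inputs; zero denominators yield NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom == 0, np.nan, (nir - red) / denom)

# Healthy vegetation reflects strongly in NIR, so NDVI is high
print(ndvi([0.5, 0.4], [0.1, 0.4]))  # [0.666..., 0.0]
```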
When type='cupy' is requested, CUDA GPU is available and CuPy is properly installed with compatible CUDA version
If this fails: Benchmarks fail with NotImplementedError if CUDA environment is misconfigured, but error message doesn't specify which CUDA/CuPy version mismatch caused the failure
benchmarks/benchmarks/common.py:get_xr_dataarray
Erosion simulation iterations parameter is reasonable (hundreds to low thousands) - each iteration processes entire raster for thermal and hydraulic erosion
If this fails: Large iteration counts on big rasters make runtime grow in proportion to iterations × raster size, so runs can take hours with no progress indication or early-termination option
xrspatial/erosion.py:erode
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Raster Cache — Intermediate raster results cached during multi-step algorithms to avoid recomputation
- Kernel Registry — Pre-compiled numba kernels indexed by operation type and backend for fast dispatch
Feedback Loops
- Iterative Flow Accumulation (convergence, reinforcing) — Trigger: Flow direction calculation identifies pour points. Action: Traverse upstream cells accumulating drainage area until reaching watershed boundaries. Exit: All cells processed or no more upstream contributors.
- Erosion Simulation Steps (training-loop, balancing) — Trigger: User-specified iteration count and random seed. Action: Apply thermal and hydraulic erosion operators to elevation surface, update heights based on sediment transport. Exit: Iteration count reached.
Delays
- Numba JIT Compilation (compilation, first call only, ~1-5 seconds) — The initial call to a function triggers kernel compilation; subsequent calls execute at native speed
- Dask Task Scheduling (async-processing, duration varies with cluster size) — Large raster operations are split into tasks; results materialize only when .compute() is called
Control Points
- Backend Selection (architecture-switch) — Controls: Whether operations run on CPU (numpy), distributed (dask), or GPU (cupy/rtx). Default: Runtime detection
- Chunk Size Configuration (hyperparameter) — Controls: Memory usage and parallelization granularity for large raster operations. Default: max(1, dimension//2)
Technology Stack
- xarray — Provides labeled multi-dimensional arrays with coordinate systems for representing georeferenced raster data
- numba — JIT compiles Python functions to native code for high-performance spatial algorithm kernels
- dask — Enables distributed and out-of-core processing of large raster datasets through lazy evaluation and task graphs
- cupy — GPU-accelerated array operations providing a CUDA backend for spatial algorithms on NVIDIA hardware
- numpy — Foundation array library providing CPU-based numerical computations and memory layout for raster data
- rtx — Hardware-accelerated ray tracing for advanced spatial interpolation and rendering on RTX GPUs
Key Components
- dispatch_backend (dispatcher, xrspatial/utils.py) — Inspects input DataArray.data type and routes computation to numpy, dask, cupy, or rtx backend implementations
- apply_kernel (executor, xrspatial/focal.py) — Performs sliding window convolution operations on raster data using numba-compiled kernels for focal statistics and filtering
- flow_direction (processor, xrspatial/hydro/flow_direction.py) — Computes D8 flow direction from digital elevation model using steepest descent algorithm
- flow_accumulation (processor, xrspatial/hydro/flow_accumulation.py) — Calculates upstream contributing area for each cell by following flow direction paths and accumulating drainage area
- reclassify (transformer, xrspatial/classify.py) — Maps continuous raster values to discrete classes using user-defined or statistically-derived break points
- GeoTiffWriter (serializer, xrspatial/geotiff/writer.py) — Writes xarray DataArrays to GeoTIFF format with proper spatial metadata, tags, and compression without requiring GDAL
- multispectral_indices (processor, xrspatial/multispectral.py) — Computes vegetation and land cover indices (NDVI, EVI, ARVI, etc.) from multispectral satellite imagery bands
- polygonize (transformer, xrspatial/polygonize.py) — Converts raster regions with identical values into vector polygon geometries using contour tracing algorithms
- surface_curvature (processor, xrspatial/terrain.py) — Calculates terrain curvature measures (plan, profile, mean) from elevation surfaces using second derivatives
- RTXTriangulator (accelerator, xrspatial/gpu_rtx/interpolate.py) — Uses RTX GPU hardware-accelerated triangulation for fast spatial interpolation of scattered point data
Frequently Asked Questions
What is xarray-spatial used for?
xarray-contrib/xarray-spatial provides high-performance spatial analysis algorithms for raster data in Python. It is a 10-component library written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 292 files.
How is xarray-spatial architected?
xarray-spatial is organized into 4 architecture layers: Algorithm Modules, Backend Dispatch, Kernel Definitions, I/O and Format Support. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through xarray-spatial?
Data moves through 6 stages: Load raster data into xarray DataArray → Backend dispatch based on array type → Execute spatial algorithm with selected backend → Apply focal operations and convolutions → Generate derived products → Export results to geospatial formats. Raster data enters as xarray.DataArray objects from files or in-memory arrays. The dispatch system examines the underlying data type (numpy, dask, cupy) and selects the appropriate backend implementation. Spatial algorithms apply domain-specific transformations using numba-compiled kernels, preserving coordinate reference systems and spatial metadata. Results return as new DataArrays that can be chained through additional operations or exported to standard geospatial formats. This pipeline design reflects a multi-stage processing system.
What technologies does xarray-spatial use?
The core stack includes xarray (Provides labeled multi-dimensional arrays with coordinate systems for representing georeferenced raster data), numba (JIT compiles Python functions to native code for high-performance spatial algorithm kernels), dask (Enables distributed and out-of-core processing of large raster datasets through lazy evaluation and task graphs), cupy (GPU-accelerated array operations providing CUDA backend for spatial algorithms on NVIDIA hardware), numpy (Foundation array library providing CPU-based numerical computations and memory layout for raster data), rtx (Hardware-accelerated ray tracing operations for advanced spatial interpolation and rendering on RTX GPUs). A focused set of dependencies that keeps the build manageable.
What system dynamics does xarray-spatial have?
xarray-spatial exhibits 2 data pools (Raster Cache, Kernel Registry), 2 feedback loops, 2 control points, and 2 delays. The feedback loops handle convergence (flow accumulation) and an iteration-bounded simulation loop (erosion). These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does xarray-spatial use?
4 design patterns detected: Backend Polymorphism, Numba Acceleration, Spatial Metadata Preservation, Comprehensive Benchmarking.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.