scitools/iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data

713 stars · Python · 9 components

Loads, analyses, and visualizes multi-dimensional Earth science datasets from various formats

Under the hood, the system uses 2 feedback loops, 3 data pools, and 3 control points to manage its runtime behavior.

A 9-component library. 721 files analyzed. Data flows through 6 distinct pipeline stages.

How Data Flows Through the System

Data enters Iris through format-specific loaders that detect file types and create data proxies for lazy loading. These proxies are wrapped in Cube objects with coordinates extracted from file metadata. Analysis operations transform the cubes by manipulating their data arrays and coordinate systems, often triggering lazy evaluation only when results are needed. Finally, data exits through format-specific writers or visualization functions that render the multi-dimensional arrays with proper geospatial context.

  1. Format detection and file parsing — The load function examines file extensions and headers to determine the appropriate format loader (NetCDF, GRIB, UM, etc.), then delegates to format-specific parsers that extract metadata without loading large data arrays
  2. Data proxy creation — Format loaders create data proxy objects (like NetCDFDataProxy) that provide array-like interfaces to file data without loading it into memory, enabling lazy evaluation throughout the pipeline [File metadata → LazyArray]
  3. Coordinate extraction and standardization — Coordinate variables from files are converted into DimCoord and AuxCoord objects with proper units, standard names, and CF metadata compliance, establishing the dimensional structure of the dataset [File metadata → DimCoord]
  4. Cube construction and metadata enrichment — The Cube constructor combines data arrays, coordinates, and metadata into a unified object, applying CF conventions and resolving coordinate systems for proper geospatial reference [LazyArray → Cube]
  5. Analysis operations and transformations — User-requested operations like aggregation, interpolation, or regridding are applied to Cube objects, manipulating data arrays and coordinate systems while maintaining metadata consistency and lazy evaluation where possible [Cube → Cube]
  6. Data realization and output — When concrete results are needed (for saving or plotting), lazy arrays are computed through dask; the data is then written to output formats with appropriate metadata or passed to matplotlib/cartopy for visualization [Cube]
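
A minimal sketch of these six stages in code, assuming a local NetCDF file with a time coordinate (both file names here are placeholders, not files shipped with Iris):

```python
import iris
import iris.analysis

# Stages 1-4: format detection, proxy creation, coordinate extraction,
# and cube construction all happen inside load_cube.
cube = iris.load_cube("air_temperature.nc")
print(cube.has_lazy_data())  # True -- the data is still a dask proxy

# Stage 5: analysis stays lazy where possible.
mean = cube.collapsed("time", iris.analysis.MEAN)

# Stage 6: saving (or touching .data) realizes the dask graph.
iris.save(mean, "air_temperature_mean.nc")
```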

Data Models

The data structures that flow between stages — the contracts that hold the system together.

Cube lib/iris/cube.py
Container with data: np.ndarray or dask.array (shape varies), dim_coords: list[DimCoord], aux_coords: list[AuxCoord], cell_measures: list[CellMeasure], ancillary_variables: list[AncillaryVariable], attributes: dict, cell_methods: list[CellMethod]
Created by file loaders from raw data, enriched with coordinates and metadata, passed through analysis operations that may transform dimensions or data, and serialized back to files or rendered as plots
DimCoord lib/iris/coords.py
Coordinate with points: np.ndarray (1D, length matches cube dimension), bounds: np.ndarray or None (2D for intervals), standard_name: str, units: cf_units.Unit, attributes: dict
Created during file loading from dimension variables, used to define cube structure and enable coordinate-based indexing and operations like aggregation along axes
AuxCoord lib/iris/coords.py
Coordinate with points: np.ndarray (N-D, arbitrary shape), bounds: np.ndarray or None, standard_name: str or None, long_name: str or None, units: cf_units.Unit, attributes: dict
Created for auxiliary coordinate variables that don't define cube dimensions but provide additional spatial or temporal reference information
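
A short sketch showing how Cube, DimCoord, and AuxCoord fit together; all values are illustrative:

```python
import numpy as np
from iris.coords import AuxCoord, DimCoord
from iris.cube import Cube

# 1-D, monotonic DimCoords define the cube's grid dimensions.
lat = DimCoord(np.linspace(-90.0, 90.0, 4), standard_name="latitude",
               units="degrees")
lon = DimCoord(np.arange(0.0, 360.0, 45.0), standard_name="longitude",
               units="degrees")

# An AuxCoord may span several dimensions without defining any of them.
rough = AuxCoord(np.random.rand(4, 8), long_name="surface_roughness",
                 units="1")

cube = Cube(np.zeros((4, 8), dtype=np.float32),
            standard_name="air_temperature", units="K",
            dim_coords_and_dims=[(lat, 0), (lon, 1)],
            aux_coords_and_dims=[(rough, (0, 1))])
print(cube.summary(shorten=True))
```
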
Mesh lib/iris/experimental/ugrid/mesh.py
UGRID mesh with topology_dimension: int, node_coords_and_axes: list[tuple[Coord, str]], face_node_connectivity: Connectivity, face_coords_and_axes: list[tuple[Coord, str]], edge_node_connectivity: Connectivity or None
Created from UGRID mesh files defining connectivity between nodes, edges, and faces, used to create MeshCoords for unstructured data analysis
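
A hedged sketch of loading mesh data via the experimental UGRID API matching the module path above (the file name is a placeholder, and this API has moved in more recent Iris releases):

```python
import iris
from iris.experimental.ugrid import PARSE_UGRID_ON_LOAD

# UGRID parsing is opt-in while the API is experimental.
# "lfric_output.nc" stands in for any UGRID-conformant file.
with PARSE_UGRID_ON_LOAD.context():
    cube = iris.load_cube("lfric_output.nc")

print(cube.mesh)      # the shared Mesh object behind the MeshCoords
print(cube.location)  # the mesh element the data lives on, e.g. "face"
```
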
LazyArray lib/iris/_lazy_data.py
Dask array wrapper with array: dask.array.Array, dtype: np.dtype, shape: tuple[int, ...], chunks: tuple[tuple[int, ...], ...]
Created during lazy loading to defer data reading from disk, manipulated through dask operations during analysis, and realized to numpy arrays only when needed
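
Any dask array can serve as a cube's lazy payload, as this sketch shows:

```python
import dask.array as da
import numpy as np
from iris.cube import Cube

# A chunked dask array stands in for data that would come from disk.
lazy = da.zeros((1000, 1000), chunks=(500, 500), dtype=np.float32)
cube = Cube(lazy, long_name="example", units="1")

print(cube.has_lazy_data())   # True -- nothing computed yet
_ = cube.core_data()          # the dask array, still unrealized
real = cube.data              # triggers dask compute; cube now holds numpy
print(cube.has_lazy_data())   # False
```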

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Contract weakly guarded

The DATA_GEN_PYTHON environment variable points to a Python executable that has all required dependencies (this repo, Mule, test modules) installed in its environment

If this fails: Data generation functions will fail with import errors or missing dependencies at runtime, corrupting benchmark datasets or causing benchmark failures

benchmarks/benchmarks/generate_data/__init__.py:DATA_GEN_PYTHON
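
A hypothetical guard illustrating the missing validation (check_data_gen_python is not part of the repo, and Mule's import name is assumed to be `mule`):

```python
import os
import shutil
import subprocess

def check_data_gen_python() -> str:
    """Fail fast if DATA_GEN_PYTHON is unset or missing key imports."""
    exe = os.environ.get("DATA_GEN_PYTHON")
    if not exe or not shutil.which(exe):
        raise RuntimeError("DATA_GEN_PYTHON must point to a Python executable")
    # Probe the external interpreter for the dependencies the benchmarks need.
    probe = subprocess.run([exe, "-c", "import iris, mule"],
                           capture_output=True, text=True)
    if probe.returncode != 0:
        raise RuntimeError(f"Missing dependencies: {probe.stderr.strip()}")
    return exe
```
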
critical Temporal unguarded

The external Python environment remains stable and available throughout the entire benchmarking run sequence, which may span multiple commits and hours

If this fails: Mid-benchmark failures when the external environment becomes unavailable, requiring full benchmark restart and invalidating timing comparisons across commits

benchmarks/benchmarks/generate_data/__init__.py:run_function_elsewhere
warning Resource unguarded

The system has sufficient memory to load UM fields of shape (1920, 2560) in float32 (~19 MB per cube), plus coordinate arrays, without memory pressure

If this fails: Silent memory swapping causes benchmark timing to include disk I/O, making results unreliable, or OOM kills benchmark process

benchmarks/benchmarks/cperf/__init__.py:_UM_DIMS_YX
warning Domain guarded

UM files always load longitude/latitude as DimCoords (which are always realized) while LFRic files load them as MeshCoords (which are lazy by default)

If this fails: Benchmark assertions fail if file format behavior changes, and timing comparisons become invalid if coordinate realization strategy differs between formats

benchmarks/benchmarks/cperf/load.py:time_load
critical Environment unguarded

The iris.tests.stock.netcdf module exists and contains the expected functions at the time run_function_elsewhere executes

If this fails: Data generation fails with AttributeError when checking out older Iris commits that don't have expected stock functions, breaking benchmark runs across commit history

benchmarks/benchmarks/generate_data/stock.py:_create_file__xios_common
warning Ordering unguarded

Coordinate dimensions returned by c.cube_dims(source_cube) remain stable throughout the lifetime of the benchmark setup and match the source_cube's dimensional structure

If this fails: Cube construction fails with dimension mismatch errors if source cube's coordinate mapping changes between setup and benchmark execution

benchmarks/benchmarks/cube.py:setup
warning Temporal unguarded

Previously generated benchmark data files remain valid and compatible with current Iris version when REUSE_DATA is enabled

If this fails: Benchmarks use stale data that doesn't match current Iris behavior, producing misleading performance measurements or silent failures due to format incompatibilities

benchmarks/benchmarks/generate_data/__init__.py:REUSE_DATA
warning Scale unguarded

The cubesphere size calculation int(np.sqrt(np.prod(_UM_DIMS_YX) / 6)) produces a valid cubesphere dimension that can be handled by LFRic mesh generation

If this fails: Mesh generation fails when calculated cubesphere size exceeds implementation limits or produces invalid mesh topology, causing benchmark crashes

benchmarks/benchmarks/cperf/__init__.py:_N_CUBESPHERE_UM_EQUIVALENT
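
The arithmetic, worked through with the dimensions above:

```python
import numpy as np

_UM_DIMS_YX = (1920, 2560)

# A C_N cubesphere has 6 * N**2 faces, so the N matching the UM grid's
# cell count solves 6 * N**2 = 1920 * 2560 = 4,915,200.
n = int(np.sqrt(np.prod(_UM_DIMS_YX) / 6))  # sqrt(819200) ~= 905.1
print(n)  # 905 -> 6 * 905**2 = 4,914,150 faces, just under the UM count
```
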
critical Resource unguarded

Object persistence between ASV repeat runs behaves consistently: objects created in setup() remain modified after the first benchmark run

If this fails: Subsequent benchmark runs operate on already-modified objects, producing invalid timing measurements that don't reflect real-world performance

benchmarks/benchmarks/aggregate_collapse.py:disable_repeat_between_setup
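
A hypothetical ASV-style benchmark class illustrating the hazard (this is not the repo's code; TimeCollapse is invented for the example):

```python
import numpy as np
import iris.analysis
from iris.coords import DimCoord
from iris.cube import Cube

class TimeCollapse:
    """Hypothetical ASV-style benchmark, not the repo's code."""

    def setup(self):
        time = DimCoord(np.arange(100.0), standard_name="time",
                        units="days since 2000-01-01")
        self.cube = Cube(np.zeros((100, 10)), units="K",
                         dim_coords_and_dims=[(time, 0)])

    def time_collapse(self):
        # collapsed() returns a new cube, so self.cube is untouched.
        # A method that mutated self.cube in place would leave later
        # repeats timing an already-modified object -- the hazard above.
        self.cube.collapsed("time", iris.analysis.MEAN)
```
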
warning Environment unguarded

The checked-out commit of Iris contains parseable setup.py/pyproject.toml with standard Python packaging metadata for dependency extraction

If this fails: Environment preparation fails when checking out commits with non-standard build configurations, breaking benchmark runs for historical commits

benchmarks/asv_delegated.py:_prep_env_override

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Lazy data arrays (in-memory)
Dask arrays that represent file data without loading it into memory, enabling scalable processing of large Earth science datasets
Coordinate caches (cache)
Cached coordinate values and bounds to avoid repeated computation during coordinate operations and cube manipulations
File format registry (registry)
Registry mapping file extensions and format signatures to appropriate loader classes for automatic format detection

Technology Stack

Dask (compute)
Provides lazy array operations and parallel computation for scalable processing of large Earth science datasets
NumPy (library)
Core array processing and mathematical operations, serving as the foundation for all numerical computations
NetCDF4 (library)
Primary interface for reading and writing NetCDF files, the most common format in Earth sciences
CF-Units (library)
Handles unit conversion and validation according to Climate and Forecast metadata conventions
Matplotlib (library)
Plotting and visualization backend for creating scientific plots and charts from Cube data
Cartopy (library)
Geospatial visualization with map projections and coordinate system transformations for Earth science data
SciPy (library)
Scientific computing algorithms including interpolation and statistical operations used in analysis module
pytest (testing)
Testing framework for comprehensive unit and integration tests across the codebase
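
As one concrete example, CF-Units' conversion and validation can be exercised in isolation (a minimal sketch):

```python
import cf_units

# Unit objects validate names and convert values between compatible units.
speed = cf_units.Unit("m s-1")
print(speed.convert(10.0, "km h-1"))  # 36.0

# Incompatible units raise rather than silently mis-convert.
try:
    speed.convert(10.0, "kelvin")
except ValueError as err:
    print(err)
```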

Frequently Asked Questions

What is iris used for?

scitools/iris loads, analyses, and visualizes multi-dimensional Earth science datasets from various formats. It is a 9-component library written in Python; data flows through 6 distinct pipeline stages, and the codebase contains 721 files.

How is iris architected?

iris is organized into 5 architecture layers: File Format Layer, Core Data Model, Analysis Operations, Mesh Support, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through iris?

Data moves through 6 stages: Format detection and file parsing → Data proxy creation → Coordinate extraction and standardization → Cube construction and metadata enrichment → Analysis operations and transformations → Data realization and output. Data enters through format-specific loaders that create lazy data proxies, which are wrapped in Cube objects with coordinates extracted from file metadata; analysis operations transform the cubes, and results are realized only when written out or visualized. See "How Data Flows Through the System" above for the full stage descriptions.

What technologies does iris use?

The core stack includes Dask (lazy array operations and parallel computation for scalable processing of large datasets), NumPy (core array processing and mathematical operations), NetCDF4 (the primary interface for reading and writing NetCDF files, the most common format in Earth sciences), CF-Units (unit conversion and validation per Climate and Forecast metadata conventions), Matplotlib (plotting backend for Cube data), and Cartopy (geospatial visualization with map projections and coordinate transformations), plus SciPy and pytest. A focused set of dependencies that keeps the build manageable.

What system dynamics does iris have?

iris exhibits 3 data pools (lazy data arrays, coordinate caches, file format registry), 2 feedback loops, 3 control points, and 3 delays. The feedback loops handle recursive processing and convergence. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does iris use?

4 design patterns detected: Lazy Loading with Proxies, Plugin Architecture for File Formats, Metadata-Rich Data Containers, Coordinate System Abstraction.

Analyzed on April 20, 2026 by CodeSea.