scitools/iris
A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
Loads, analyses, and visualises multi-dimensional Earth science datasets from a variety of formats
Under the hood, the system uses 2 feedback loops, 3 data pools, and 3 control points to manage its runtime behavior.
A 9-component library. 721 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
Data enters Iris through format-specific loaders that detect file types and create data proxies for lazy loading. These proxies are wrapped in Cube objects with coordinates extracted from file metadata. Analysis operations transform the cubes by manipulating their data arrays and coordinate systems, often triggering lazy evaluation only when results are needed. Finally, data exits through format-specific writers or visualization functions that render the multi-dimensional arrays with proper geospatial context.
- Format detection and file parsing — The load function examines file extensions and headers to determine the appropriate format loader (NetCDF, GRIB, UM, etc.), then delegates to format-specific parsers that extract metadata without loading large data arrays
- Data proxy creation — Format loaders create data proxy objects (like NetCDFDataProxy) that provide array-like interfaces to file data without loading it into memory, enabling lazy evaluation throughout the pipeline [File metadata → LazyArray]
- Coordinate extraction and standardization — Coordinate variables from files are converted into DimCoord and AuxCoord objects with proper units, standard names, and CF metadata compliance, establishing the dimensional structure of the dataset [File metadata → DimCoord]
- Cube construction and metadata enrichment — The Cube constructor combines data arrays, coordinates, and metadata into a unified object, applying CF conventions and resolving coordinate systems for proper geospatial reference [LazyArray → Cube]
- Analysis operations and transformations — User-requested operations like aggregation, interpolation, or regridding are applied to Cube objects, manipulating data arrays and coordinate systems while maintaining metadata consistency and lazy evaluation where possible [Cube → Cube]
- Data realization and output — When concrete results are needed (for saving or plotting), lazy arrays are computed through dask, data is written to output formats with appropriate metadata, or passed to matplotlib/cartopy for visualization [Cube]
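The six stages above can be condensed into a minimal, iris-free sketch. All class and function names here are illustrative stand-ins, not the real iris API; the point is the shape of the pipeline: a proxy defers reading, a container bundles data with coordinates, and realization happens only when a result is demanded.

```python
import numpy as np

class DataProxy:
    """Array-like handle that defers reading until realised (stage 2)."""
    def __init__(self, shape, dtype, reader):
        self.shape, self.dtype = shape, dtype
        self._reader = reader          # called only on first access
        self._reads = 0
    def realise(self):
        self._reads += 1
        return self._reader()

class Cube:
    """Data + coordinates + metadata container (stage 4)."""
    def __init__(self, proxy, coords, attributes):
        self._proxy = proxy
        self.coords = coords
        self.attributes = attributes
    @property
    def data(self):                    # stage 6: realisation on demand
        return self._proxy.realise()

# Stages 1-3: "load" a file by building a proxy plus coordinates from metadata.
proxy = DataProxy((3,), np.float32, lambda: np.array([1.0, 2.0, 3.0]))
cube = Cube(proxy, {"time": np.arange(3)}, {"source": "toy"})

assert proxy._reads == 0    # nothing read yet: loading was lazy
total = cube.data.sum()     # stages 5-6: analysis triggers the read
assert proxy._reads == 1 and total == 6.0
```

The real system swaps the toy reader for dask graphs and NetCDF/GRIB/UM parsers, but the deferral contract is the same.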
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- Cube (lib/iris/cube.py) — Container with data: np.ndarray or dask.array (shape varies), dim_coords: list[DimCoord], aux_coords: list[AuxCoord], cell_measures: list[CellMeasure], ancillary_variables: list[AncillaryVariable], attributes: dict, cell_methods: list[CellMethod]
Created by file loaders from raw data, enriched with coordinates and metadata, passed through analysis operations that may transform dimensions or data, and serialized back to files or rendered as plots
- DimCoord (lib/iris/coords.py) — Coordinate with points: np.ndarray (1-D, length matches cube dimension), bounds: np.ndarray or None (2-D for intervals), standard_name: str, units: cf_units.Unit, attributes: dict
Created during file loading from dimension variables, used to define cube structure and enable coordinate-based indexing and operations like aggregation along axes
- AuxCoord (lib/iris/coords.py) — Coordinate with points: np.ndarray (N-D, arbitrary shape), bounds: np.ndarray or None, standard_name: str or None, long_name: str or None, units: cf_units.Unit, attributes: dict
Created for auxiliary coordinate variables that don't define cube dimensions but provide additional spatial or temporal reference information
- Mesh (lib/iris/experimental/ugrid/mesh.py) — UGRID mesh with topology_dimension: int, node_coords_and_axes: list[tuple[Coord, str]], face_node_connectivity: Connectivity, face_coords_and_axes: list[tuple[Coord, str]], edge_node_connectivity: Connectivity or None
Created from UGRID mesh files defining connectivity between nodes, edges, and faces, used to create MeshCoords for unstructured data analysis
- LazyArray (lib/iris/_lazy_data.py) — Dask array wrapper with array: dask.array.Array, dtype: np.dtype, shape: tuple[int], chunks: tuple[tuple[int]]
Created during lazy loading to defer data reading from disk, manipulated through dask operations during analysis, and realized to numpy arrays only when needed
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
The DATA_GEN_PYTHON environment variable points to a Python executable that has all required dependencies (this repo, Mule, test modules) installed in its environment
If this fails: Data generation functions will fail with import errors or missing dependencies at runtime, corrupting benchmark datasets or causing benchmark failures
benchmarks/benchmarks/generate_data/__init__.py:DATA_GEN_PYTHON
The external Python environment remains stable and available throughout the entire benchmarking run sequence, which may span multiple commits and hours
If this fails: Mid-benchmark failures when the external environment becomes unavailable, requiring full benchmark restart and invalidating timing comparisons across commits
benchmarks/benchmarks/generate_data/__init__.py:run_function_elsewhere
System has sufficient memory to load UM files with shape (1920, 2560) of float32 data (~19MB per cube) plus coordinate arrays without memory pressure
If this fails: Silent memory swapping causes benchmark timing to include disk I/O, making results unreliable, or OOM kills benchmark process
benchmarks/benchmarks/cperf/__init__.py:_UM_DIMS_YX
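The ~19 MB figure quoted above can be checked directly from the stated shape and dtype:

```python
import numpy as np

shape = (1920, 2560)  # _UM_DIMS_YX from the benchmarks
nbytes = np.prod(shape) * np.dtype(np.float32).itemsize
assert nbytes == 19_660_800  # roughly 19.7 MB per field, before coordinates
```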
UM files always load longitude/latitude as DimCoords (which are always realized) while LFRic files load them as MeshCoords (which are lazy by default)
If this fails: Benchmark assertions fail if file format behavior changes, and timing comparisons become invalid if coordinate realization strategy differs between formats
benchmarks/benchmarks/cperf/load.py:time_load
The iris.tests.stock.netcdf module exists and contains the expected functions at the time run_function_elsewhere executes
If this fails: Data generation fails with AttributeError when checking out older Iris commits that don't have expected stock functions, breaking benchmark runs across commit history
benchmarks/benchmarks/generate_data/stock.py:_create_file__xios_common
Coordinate dimensions returned by c.cube_dims(source_cube) remain stable throughout the lifetime of the benchmark setup and match the source_cube's dimensional structure
If this fails: Cube construction fails with dimension mismatch errors if source cube's coordinate mapping changes between setup and benchmark execution
benchmarks/benchmarks/cube.py:setup
Previously generated benchmark data files remain valid and compatible with current Iris version when REUSE_DATA is enabled
If this fails: Benchmarks use stale data that doesn't match current Iris behavior, producing misleading performance measurements or silent failures due to format incompatibilities
benchmarks/benchmarks/generate_data/__init__.py:REUSE_DATA
The cubesphere size calculation int(np.sqrt(np.prod(_UM_DIMS_YX) / 6)) produces a valid cubesphere dimension that can be handled by LFRic mesh generation
If this fails: Mesh generation fails when calculated cubesphere size exceeds implementation limits or produces invalid mesh topology, causing benchmark crashes
benchmarks/benchmarks/cperf/__init__.py:_N_CUBESPHERE_UM_EQUIVALENT
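Working the cubesphere size calculation through shows why it is plausible but unvalidated: a cubesphere has 6 faces of n×n cells, so the benchmark solves 6·n² ≈ total UM grid points and truncates.

```python
import numpy as np

_UM_DIMS_YX = (1920, 2560)
# Solve 6 * n**2 ~= 1920 * 2560 = 4_915_200 and truncate to int.
n = int(np.sqrt(np.prod(_UM_DIMS_YX) / 6))
assert n == 905  # 6 * 905**2 = 4_914_150, a close but inexact match
```

Nothing checks that 905 is within the mesh generator's supported range, which is exactly the hidden assumption described above.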
Object persistence between ASV repeat runs behaves consistently: objects created in setup() remain modified after the first benchmark run
If this fails: Subsequent benchmark runs operate on already-modified objects, producing invalid timing measurements that don't reflect real-world performance
benchmarks/benchmarks/aggregate_collapse.py:disable_repeat_between_setup
The checked-out commit of Iris contains parseable setup.py/pyproject.toml with standard Python packaging metadata for dependency extraction
If this fails: Environment preparation fails when checking out commits with non-standard build configurations, breaking benchmark runs for historical commits
benchmarks/asv_delegated.py:_prep_env_override
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Lazy data arrays — Dask arrays that represent file data without loading it into memory, enabling scalable processing of large Earth science datasets
- Coordinate caches — Cached coordinate values and bounds to avoid repeated computation during coordinate operations and cube manipulations
- Format registry — Registry mapping file extensions and format signatures to appropriate loader classes for automatic format detection
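The format registry pool works roughly like a signature-to-loader mapping. This is an illustrative sketch, not iris's actual format-agent API; the signatures shown (classic NetCDF files begin with `CDF`, NetCDF4/HDF5 files with `\x89HDF`) are real magic bytes.

```python
# Toy format registry: map magic-byte signatures to loader callables.
FORMAT_REGISTRY = {}

def register_format(name, signature, loader):
    FORMAT_REGISTRY[name] = (signature, loader)

def detect_and_load(header_bytes):
    """Pick the first loader whose signature matches the file header."""
    for name, (signature, loader) in FORMAT_REGISTRY.items():
        if header_bytes.startswith(signature):
            return name, loader(header_bytes)
    raise ValueError("no registered loader matches this file")

register_format("NetCDF", b"CDF", lambda h: "netcdf-cube")
register_format("NetCDF4/HDF5", b"\x89HDF", lambda h: "netcdf4-cube")

name, cube = detect_and_load(b"\x89HDF\r\n...")
assert (name, cube) == ("NetCDF4/HDF5", "netcdf4-cube")
```

Registering loaders rather than hard-coding them is what makes the file-format layer a plugin architecture.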
Feedback Loops
- Lazy evaluation deferral (recursive, balancing) — Trigger: Analysis operations on large datasets. Action: Operations create new dask computation graphs without triggering computation, allowing chaining of operations. Exit: When concrete data is requested through .data property or save/plot operations.
- Coordinate alignment during merge (convergence, balancing) — Trigger: CubeList.merge() with misaligned coordinates. Action: Iteratively compares coordinate metadata and values to find common dimensional structure. Exit: When all cubes align on common coordinates or merge fails due to incompatibility.
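The lazy-evaluation loop can be modelled as deferred operation chaining. This is a sketch of the pattern, not dask's real graph machinery: each operation extends a computation graph without doing work, and the loop exits only when `.compute()` forces realisation.

```python
class Lazy:
    """Chainable deferred computation: each op extends the graph;
    nothing runs until .compute() is called (the loop's exit point)."""
    def __init__(self, thunk):
        self._thunk = thunk
    def map(self, fn):
        return Lazy(lambda: fn(self._thunk()))  # extend graph, no work yet
    def compute(self):
        return self._thunk()

calls = []
source = Lazy(lambda: (calls.append("read"), [1, 2, 3])[1])
pipeline = source.map(lambda xs: [x * 2 for x in xs]).map(sum)

assert calls == []      # chaining triggered no reads
result = pipeline.compute()
assert result == 12     # realisation happens once, on demand
assert calls == ["read"]
```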
Delays
- Lazy data loading (async-processing; latency varies with file size and disk I/O) — File metadata is available immediately, but large data arrays are only loaded when accessed
- Dask computation scheduling (batch-window; latency varies with computation graph complexity) — Multiple operations are batched together and executed when compute() is called
- Coordinate bound calculation (cache-ttl) — Coordinate bounds are calculated on first access and cached for subsequent operations
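The compute-on-first-access delay for coordinate bounds is the standard cached-property pattern. A minimal sketch (the Coord class and midpoint bounds here are illustrative, not iris's implementation):

```python
import functools

class Coord:
    def __init__(self, points):
        self.points = points
        self.bound_calls = 0

    @functools.cached_property
    def bounds(self):
        """Computed on first access, cached for all later accesses."""
        self.bound_calls += 1
        mids = [(a + b) / 2 for a, b in zip(self.points, self.points[1:])]
        return list(zip([self.points[0]] + mids, mids + [self.points[-1]]))

c = Coord([0.0, 1.0, 2.0])
_ = c.bounds
_ = c.bounds
assert c.bound_calls == 1  # second access hit the cache
```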
Control Points
- IRIS_FUTURE (feature-flag) — Controls: whether experimental features and forthcoming API changes are enabled, via opt-in flags
- Dask chunk size (runtime-toggle) — Controls: Memory usage and parallelization strategy for large array operations
- NetCDF engine selection (architecture-switch) — Controls: Which NetCDF library backend to use (netCDF4-python, h5netcdf)
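A feature-flag control point of this kind typically reads a flag list once and gates behaviour on it. The following is a generic sketch of environment-variable flagging, not iris's actual FUTURE mechanism; the variable name TOY_FUTURE and flag names are made up.

```python
import os

def feature_enabled(name, env="TOY_FUTURE"):
    """Parse a comma-separated flag list from the environment,
    e.g. TOY_FUTURE=lazy_save,strict_units (a generic sketch)."""
    flags = os.environ.get(env, "")
    return name in {f.strip() for f in flags.split(",") if f.strip()}

os.environ["TOY_FUTURE"] = "lazy_save, strict_units"
assert feature_enabled("lazy_save")
assert not feature_enabled("datum_support")
```

Flags like these let behaviour change per-deployment without code changes, which is what makes them control points rather than configuration constants.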
Technology Stack
- Dask — Provides lazy array operations and parallel computation for scalable processing of large Earth science datasets
- NumPy — Core array processing and mathematical operations, serving as the foundation for all numerical computations
- NetCDF4 — Primary interface for reading and writing NetCDF files, the most common format in Earth sciences
- CF-Units — Handles unit conversion and validation according to Climate and Forecast metadata conventions
- Matplotlib — Plotting and visualization backend for creating scientific plots and charts from Cube data
- Cartopy — Geospatial visualization with map projections and coordinate system transformations for Earth science data
- SciPy — Scientific computing algorithms including interpolation and statistical operations used in the analysis module
- pytest — Testing framework for comprehensive unit and integration tests across the codebase
Key Components
- Cube (registry, lib/iris/cube.py) — Central data container that holds n-dimensional arrays with associated coordinates, metadata, and lazy evaluation support, providing the primary interface for all data operations
- load (orchestrator, lib/iris/__init__.py) — Coordinates the loading process by detecting file formats, calling appropriate format-specific loaders, and assembling the results into Cube objects with proper metadata
- NetCDFDataProxy (adapter, lib/iris/fileformats/_nc_load_rules/engine.py) — Provides lazy access to NetCDF data arrays without loading them into memory, implementing the array interface for seamless integration with numpy/dask operations
- CubeList (processor, lib/iris/cube.py) — Manages collections of Cube objects and provides operations like merge and concatenate that combine multiple cubes based on coordinate alignment and metadata compatibility
- analysis (transformer, lib/iris/analysis/) — Implements mathematical operations on Cube data including aggregation (mean, sum), interpolation, regridding, and statistical analysis while preserving coordinate relationships
- CoordSystem (resolver, lib/iris/coord_systems.py) — Defines coordinate reference systems and map projections, enabling transformation between different spatial coordinate systems and proper geospatial visualization
- MeshCoord (adapter, lib/iris/experimental/ugrid/mesh.py) — Provides coordinate interface for unstructured mesh data, mapping between mesh topology and coordinate values for UGRID-compliant datasets
- FieldsFileVariant (adapter, lib/iris/fileformats/um/_ff_replacement.py) — Handles UK Met Office UM fieldsfile format by parsing binary headers and data fields, converting them into Iris-compatible data structures
- as_lazy_data (factory, lib/iris/_lazy_data.py) — Creates dask arrays from various input sources including numpy arrays and data proxies, enabling consistent lazy evaluation throughout the system
Frequently Asked Questions
What is iris used for?
scitools/iris is a 9-component library written in Python that loads, analyses, and visualises multi-dimensional Earth science datasets from a variety of formats. Data flows through 6 distinct pipeline stages, and the codebase contains 721 files.
How is iris architected?
iris is organized into 5 architecture layers: File Format Layer, Core Data Model, Analysis Operations, Mesh Support, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through iris?
Data moves through 6 stages: Format detection and file parsing → Data proxy creation → Coordinate extraction and standardization → Cube construction and metadata enrichment → Analysis operations and transformations → .... Data enters through format-specific loaders that build lazy data proxies, flows through Cube objects that carry coordinates and metadata, and exits through format-specific writers or visualization functions. This pipeline design reflects a multi-stage processing system built around deferred evaluation.
What technologies does iris use?
The core stack includes Dask (Provides lazy array operations and parallel computation for scalable processing of large Earth science datasets), NumPy (Core array processing and mathematical operations, serving as the foundation for all numerical computations), NetCDF4 (Primary interface for reading and writing NetCDF files, the most common format in Earth sciences), CF-Units (Handles unit conversion and validation according to Climate and Forecast metadata conventions), Matplotlib (Plotting and visualization backend for creating scientific plots and charts from Cube data), Cartopy (Geospatial visualization with map projections and coordinate system transformations for Earth science data), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does iris have?
iris exhibits 3 data pools (lazy data arrays, coordinate caches, and a format registry), 2 feedback loops, 3 control points, and 3 delays. The feedback loops handle recursive lazy-evaluation deferral and convergent coordinate alignment during merges. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does iris use?
4 design patterns detected: Lazy Loading with Proxies, Plugin Architecture for File Formats, Metadata-Rich Data Containers, Coordinate System Abstraction.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.