scitools/iris
A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
Earth science data analysis library for NetCDF, GRIB and CF-compliant formats
Data flows from various scientific file formats through format-specific readers into unified Cube objects, then through analysis operations to visualization or output formats
Under the hood, the system relies on 2 feedback loops, 2 data pools, and 3 control points to manage its runtime behavior.
Structural Verdict
An 8-component weather and climate package with 2 connections. 717 files analyzed. Minimal connections — components operate mostly in isolation.
How Data Flows Through the System
- File Input — Load NetCDF, GRIB, PP, or other Earth science formats
- Format Detection — Automatically detect file format and invoke appropriate reader
- Cube Creation — Convert file data into Iris Cube objects with coordinates and metadata
- Analysis Operations — Apply mathematical operations, aggregations, or transformations
- Output Generation — Save processed data to various formats or create visualizations
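The detection-and-dispatch step above can be sketched as a small registry keyed on file magic bytes. This is a simplified illustration of the idea, not Iris's actual loader code; the reader names are assumptions, though the magic-byte prefixes are the real signatures of these formats.

```python
# Toy format-detection registry: map magic-byte prefixes to reader names.
# NetCDF-4 files are HDF5 containers, classic NetCDF starts with "CDF\x01",
# and GRIB messages begin with the ASCII bytes "GRIB".
MAGIC_READERS = {
    b"\x89HDF": "netcdf4_reader",
    b"CDF\x01": "netcdf3_reader",
    b"GRIB": "grib_reader",
}

def detect_format(header: bytes) -> str:
    """Return the reader name whose magic bytes prefix the file header."""
    for magic, reader in MAGIC_READERS.items():
        if header.startswith(magic):
            return reader
    raise ValueError("unrecognised file format")
```

A dispatcher like this lets new formats be supported by registering one more entry, which is the essence of a format-agnostic front door such as `iris.load`.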
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
Cache directory for synthetic benchmark data files
Dask arrays storing uncomputed data operations
Feedback Loops
- ASV Environment Refresh (polling, balancing) — Trigger: New Git commit to benchmark. Action: Rebuild isolated Python environment with new Iris version. Exit: Environment successfully created.
- Benchmark Data Reuse (cache-invalidation, balancing) — Trigger: REUSE_DATA flag and missing cache file. Action: Generate synthetic data using external Python process. Exit: Data file exists in cache.
Delays & Async Processing
- External Data Generation (async-processing; duration varies with data complexity) — Benchmark setup waits for synthetic data creation in an isolated environment
- Lazy Data Realization (async-processing; duration varies with operation complexity) — Analysis operations defer computation until the data is actually needed
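The lazy-realization behaviour can be illustrated with a tiny deferred-computation wrapper. This is a toy stand-in for the Dask arrays Iris actually uses; the class and method names are illustrative only.

```python
class Lazy:
    """Toy deferred value: records operations, computes only on realise()."""

    def __init__(self, func):
        self._func = func

    def map(self, op):
        # Chain another operation without computing anything yet.
        return Lazy(lambda: op(self._func()))

    def realise(self):
        # Computation happens only here, when the data is actually needed.
        return self._func()

calls = []

def load():
    calls.append("load")  # side effect proves *when* the work runs
    return [1, 2, 3]

pipeline = Lazy(load).map(lambda xs: [x * 2 for x in xs])
assert calls == []        # nothing has been computed yet
result = pipeline.realise()
assert result == [2, 4, 6] and calls == ["load"]
```

Deferring work this way is what lets an analysis chain be built over data far larger than memory, paying the cost only when values are finally requested.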
Control Points
- REUSE_DATA (env-var) — Controls: Whether to reuse existing benchmark data files or regenerate them
- DATA_GEN_PYTHON (env-var) — Controls: Python executable path for isolated data generation
- lazy_run parameter (runtime-toggle) — Controls: Whether operations use lazy Dask arrays or eager NumPy arrays
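The REUSE_DATA control point describes a common cache-or-regenerate pattern. The sketch below shows that pattern under stated assumptions: the helper name, file name, and generator are illustrative, not Iris's benchmark code.

```python
import os
import tempfile
from pathlib import Path

def ensure_benchmark_data(cache_dir: Path, name: str, generate) -> Path:
    """Return a cached data file, regenerating unless REUSE_DATA permits reuse."""
    path = cache_dir / name
    reuse = os.environ.get("REUSE_DATA", "false").lower() == "true"
    if reuse and path.exists():
        return path                # cache hit: skip expensive generation
    path.write_bytes(generate())   # cache miss (or reuse disabled): regenerate
    return path

# Usage: with REUSE_DATA=true, the second call reuses the first call's file.
os.environ["REUSE_DATA"] = "true"
with tempfile.TemporaryDirectory() as d:
    calls = []
    gen = lambda: (calls.append(1), b"synthetic bytes")[1]
    p1 = ensure_benchmark_data(Path(d), "sample.nc", gen)
    p2 = ensure_benchmark_data(Path(d), "sample.nc", gen)
    assert p1 == p2 and len(calls) == 1   # generator ran exactly once
```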
Technology Stack
Core array operations and data structures
Lazy evaluation and parallel computing
NetCDF file format support
Data visualization and plotting
Benchmarking framework for performance tracking
UM file format support (Met Office)
Package building and distribution
Code linting and formatting
Key Components
- Cube (class, lib/iris/cube.py) — Core data structure representing n-dimensional arrays with coordinates and metadata
- load (function, lib/iris/__init__.py) — Main function for loading data from various file formats into Cube objects
- save (function, lib/iris/__init__.py) — Main function for saving Cube data to various file formats
- realistic_4d_w_everything (function, benchmarks/benchmarks/generate_data/stock.py) — Generates realistic 4D test cubes with all metadata for benchmarking
- Delegated (class, benchmarks/asv_delegated.py) — Custom ASV environment manager that uses external scripts for benchmark environment setup
- run_function_elsewhere (function, benchmarks/benchmarks/generate_data/__init__.py) — Executes data generation functions in an isolated Python environment with a fixed Iris version
- SingleDiagnosticMixin (class, benchmarks/benchmarks/cperf/__init__.py) — Base class for CPerf benchmarks comparing UM vs LFRic file format performance
- AggregationMixin (class, benchmarks/benchmarks/aggregate_collapse.py) — Base class for benchmarking aggregation operations on cube data
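The Cube entry above, data travelling together with named coordinates and metadata, can be made concrete with a stripped-down stand-in. This toy class is in no way Iris's real Cube; the name and the single supported operation are illustrative.

```python
class MiniCube:
    """Toy analogue of an Iris Cube: data + dimension coords + metadata."""

    def __init__(self, data, coords, attributes=None):
        self.data = data                    # nested lists standing in for an array
        self.coords = coords                # e.g. {"time": [...], "lat": [...]}
        self.attributes = attributes or {}

    def collapsed_mean(self, axis_name):
        # Collapse the leading dimension by averaging (only axis 0 supported).
        assert list(self.coords)[0] == axis_name
        n = len(self.data)
        averaged = [sum(col) / n for col in zip(*self.data)]
        remaining = {k: v for k, v in self.coords.items() if k != axis_name}
        return MiniCube(averaged, remaining, self.attributes)

cube = MiniCube([[1, 2], [3, 4]], {"time": [0, 1], "lat": [10, 20]})
mean = cube.collapsed_mean("time")
# mean.data == [2.0, 3.0]; the "time" coordinate is gone, "lat" survives
```

The key design point mirrored here is that collapsing an axis removes its coordinate while preserving the rest of the cube's metadata.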
Configuration
codecov.yml (yaml)
- coverage.status.project.default.target (string) — default: auto
- coverage.status.project.default.threshold (string) — default: 3%
- coverage.status.patch (string) — default: off
lib/iris/common/resolve.py (python-dataclass)
- metadata (Any)
- points (Any)
- bounds (Any)
- dims (Any)
- container (Any)
- mesh (Any) — default: None
- location (Any) — default: None
- axis (Any) — default: None
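Read back into Python, that field listing corresponds to a dataclass of roughly the following shape. This is a reconstruction from the listed fields and defaults, not the verbatim source of lib/iris/common/resolve.py, and the class name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class PreparedItem:  # hypothetical name; only the fields come from the listing
    metadata: Any
    points: Any
    bounds: Any
    dims: Any
    container: Any
    mesh: Any = None       # optional: only set for mesh-based cubes
    location: Any = None
    axis: Any = None
```

Note the ordering constraint dataclasses impose: the three fields with `None` defaults must come after the required ones, which matches the listing above.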
Science Pipeline
- Load raw data (lib/iris/__init__.py) — File format detection and reader dispatch [Variable (file format dependent) → n-dimensional array]
- Create Cube object (lib/iris/cube.py) — Wrap data with coordinates and metadata [n-dimensional array → Cube with typical dims (time, level, lat, lon)]
- Apply analysis operations (benchmarks/benchmarks/aggregate_collapse.py) — Aggregations, interpolations, mathematical operations [Original cube dimensions → Transformed based on operation]
- Generate synthetic data (benchmarks/benchmarks/generate_data/stock.py) — Create test cubes with specified dimensions for benchmarking [Parameter-defined (x, y, z, t) → (t, z, y, x), typical Earth-science order]
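The synthetic-data step can be sketched with NumPy. The function name and sizes below are illustrative; only the (t, z, y, x) axis order comes from the pipeline description above.

```python
import numpy as np

def synthetic_field(t: int, z: int, y: int, x: int, seed: int = 0) -> np.ndarray:
    """Generate a reproducible 4-D field in (t, z, y, x) order.

    Axis order follows the typical Earth-science convention:
    time, vertical level, latitude, longitude.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((t, z, y, x)).astype(np.float32)

data = synthetic_field(t=4, z=3, y=10, x=20)
```

Seeding the generator is what makes benchmark runs comparable: every environment regenerates byte-identical input data.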
Assumptions & Constraints
- [warning] Assumes UM data has shape (1920, 2560) and calculates equivalent cubesphere size without validation (shape)
- [warning] Assumes XIOS functions control save location and moves files without checking success (format)
- [critical] Assumes Mule package is available for UM file generation but no graceful fallback (dependency)
Frequently Asked Questions
What is iris used for?
iris is an Earth science data analysis library for NetCDF, GRIB, and CF-compliant formats. scitools/iris is an 8-component weather and climate package written in Python. Minimal connections — components operate mostly in isolation. The codebase contains 717 files.
How is iris architected?
iris is organized into 4 architecture layers: Core Library, File Formats, Benchmarks, Documentation. Minimal connections — components operate mostly in isolation. This layered structure keeps concerns separated and modules independent.
How does data flow through iris?
Data moves through 5 stages: File Input → Format Detection → Cube Creation → Analysis Operations → Output Generation. Data flows from various scientific file formats through format-specific readers into unified Cube objects, then through analysis operations to visualization or output formats. This pipeline design reflects a complex multi-stage processing system.
What technologies does iris use?
The core stack includes NumPy (core array operations and data structures), Dask (lazy evaluation and parallel computing), NetCDF4 (NetCDF file format support), Matplotlib (data visualization and plotting), ASV (benchmarking framework for performance tracking), Mule (UM file format support, from the Met Office), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does iris have?
iris exhibits 2 data pools (BENCHMARK_DATA, Cube lazy data), 2 feedback loops, 3 control points, 2 delays. The feedback loops handle polling and cache-invalidation. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does iris use?
4 design patterns detected: Mixin Classes, External Data Generation, Lazy Data Operations, Format-Agnostic Interface.
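Among the patterns listed, the mixin approach can be sketched in a few lines. The class and method names here are illustrative, not taken from the Iris benchmark suite; the point is that shared setup logic is composed into benchmark classes via inheritance rather than duplicated.

```python
class AggregationSetupMixin:
    """Toy mixin: contributes aggregation setup to any benchmark class."""

    def setup_aggregation(self):
        self.operation = "mean"   # shared state used by concrete benchmarks

class CubeCollapseBenchmark(AggregationSetupMixin):
    """Concrete benchmark composing the mixin's behaviour with its own."""

    def run(self):
        self.setup_aggregation()
        return f"collapse with {self.operation}"

assert CubeCollapseBenchmark().run() == "collapse with mean"
```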
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.