ecmwf/cfgrib
A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Python interface to map GRIB meteorological files to xarray/NetCDF using CF conventions
GRIB files are parsed into messages, indexed by coordinates, transformed to CF-compliant fields, and exposed as xarray datasets
Under the hood, the system uses 2 data pools and 3 control points to manage its runtime behavior.
Structural Verdict
A 10-component weather and climate codebase with 18 connections. 34 files analyzed. Highly interconnected: components depend on each other heavily.
How Data Flows Through the System
- File Reading — FileStream opens GRIB file and provides sequential access to messages using ecCodes
- Message Parsing — Message objects wrap ecCodes handles with lazy data access and metadata extraction
- Indexing — FieldsetIndex groups messages by coordinate keys to identify coherent datasets
- CF Transformation — CfField computes CF-compliant coordinates and metadata from GRIB keys
- Dataset Building — Dataset assembles fields into structured arrays with proper dimensions and attributes
- Coordinate Translation — cf2cdm translates coordinate names and units according to specified data models
- xarray Integration — CfGribDataStore exposes the dataset through xarray's backend interface
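The Indexing stage can be illustrated with a minimal, self-contained sketch. This is not cfgrib's FieldsetIndex: the message dictionaries, the `build_index` and `unique_values` helpers, and the chosen keys are hypothetical stand-ins, used only to show how grouping by key values separates messages into coherent sets.

```python
from collections import defaultdict

# Hypothetical stand-ins for decoded GRIB message headers (not real ecCodes output).
MESSAGES = [
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 850, "step": 0},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 500, "step": 0},
    {"shortName": "2t", "typeOfLevel": "surface", "level": 0, "step": 0},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "level": 850, "step": 6},
]


def build_index(messages, index_keys):
    """Group message positions by the tuple of values found under index_keys."""
    index = defaultdict(list)
    for position, message in enumerate(messages):
        key = tuple(message[k] for k in index_keys)
        index[key].append(position)
    return dict(index)


def unique_values(messages, key):
    """Sorted distinct values of one key: the candidate coordinate values."""
    return sorted({m[key] for m in messages})


index = build_index(MESSAGES, ("shortName", "typeOfLevel"))
for key, positions in index.items():
    print(key, positions)
print(unique_values(MESSAGES, "level"))
```

Each group holds the positions of messages that share a variable and level type, which is the property a dataset builder needs before it can lay the messages out along time and level axes.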
System Behavior
How the system actually operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- GRIB File Cache — ecCodes file handles and message indices are cached for repeated access
- Fieldset Index — Coordinate-based groupings of GRIB messages for dataset construction
Delays & Async Processing
- Lazy Data Loading (async-processing) — GRIB message data arrays are not loaded until explicitly accessed
- Index Cache TTL (cache-ttl) — Index files are cached to disk to avoid re-parsing large GRIB files
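The lazy loading behavior listed above can be sketched in a few lines. This is an illustration of the general technique only, not cfgrib's Message class: `LazyMessage`, its `decode_values` callable, and the `decode_calls` counter are hypothetical names introduced here.

```python
from functools import cached_property


class LazyMessage:
    """Toy message: metadata is available at once, values are decoded on first use."""

    def __init__(self, metadata, decode_values):
        self.metadata = metadata
        self._decode_values = decode_values  # callable standing in for the GRIB decoder
        self.decode_calls = 0

    @cached_property
    def values(self):
        # Runs only on the first access; the result is then stored on the instance.
        self.decode_calls += 1
        return self._decode_values()


msg = LazyMessage({"shortName": "t"}, lambda: [273.15, 274.0, 275.5])
print(msg.decode_calls)  # no decoding has happened yet
first = msg.values
second = msg.values
print(msg.decode_calls, first is second)
```

The counter makes the deferral visible: constructing the object costs nothing, and repeated access reuses the first decoded result.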
Control Points
- backend_kwargs (runtime-toggle) — Controls: GRIB reading behavior including indexing and filtering options
- filter_by_keys (runtime-toggle) — Controls: Which GRIB messages to include based on key-value filters
- data_model (feature-flag) — Controls: Whether to use CDS, ECMWF, or default coordinate naming conventions
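As a rough illustration of how these three controls are typically supplied, the sketch below assembles the arguments for `xarray.open_dataset` with `engine="cfgrib"` and optionally applies `cf2cdm.translate_coords`. The helper names `build_open_kwargs` and `open_grib` are hypothetical; `filter_by_keys` and `indexpath` are passed inside `backend_kwargs`, and the data model is assumed to be one of the objects exposed by cf2cdm, such as `cf2cdm.ECMWF` or `cf2cdm.CDS`.

```python
def build_open_kwargs(filter_by_keys=None, indexpath=None):
    """Assemble keyword arguments for xarray.open_dataset with the cfgrib engine."""
    backend_kwargs = {}
    if filter_by_keys:
        backend_kwargs["filter_by_keys"] = dict(filter_by_keys)
    if indexpath is not None:
        # An empty string asks cfgrib not to write an index file next to the GRIB file.
        backend_kwargs["indexpath"] = indexpath
    return {"engine": "cfgrib", "backend_kwargs": backend_kwargs}


def open_grib(path, data_model=None, **options):
    """Open a GRIB file and optionally rename coordinates to a cf2cdm data model."""
    import xarray as xr  # imported lazily so the helper above works without xarray

    ds = xr.open_dataset(path, **build_open_kwargs(**options))
    if data_model is not None:
        import cf2cdm  # distributed together with cfgrib

        ds = cf2cdm.translate_coords(ds, data_model)
    return ds


kwargs = build_open_kwargs(filter_by_keys={"typeOfLevel": "surface"}, indexpath="")
print(kwargs)
```

A call such as `open_grib("example.grib", data_model=cf2cdm.ECMWF, filter_by_keys={"typeOfLevel": "surface"})` would then combine all three controls; the file name is a placeholder.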
Technology Stack
- ecCodes — GRIB file parsing and data extraction
- xarray — N-dimensional array data structure and NetCDF-like interface
- numpy — Numerical array operations and data types
- attrs — Class definition with automatic method generation
- click — Command-line interface framework
- pytest — Testing framework
- Package building and distribution
- Environment and dependency management
Key Components
- open_dataset (function) — Main entry point for opening GRIB files as xarray datasets
  cfgrib/xarray_store.py
- Dataset (class) — Core class that builds structured datasets from GRIB fieldsets with proper metadata
  cfgrib/dataset.py
- CfGribDataStore (class) — xarray backend store implementation for reading GRIB files
  cfgrib/xarray_plugin.py
- FieldsetIndex (class) — Indexes GRIB messages by coordinate keys for efficient access
  cfgrib/messages.py
- Message (class) — Wrapper around ecCodes GRIB message with lazy data loading
  cfgrib/messages.py
- CfField (class) — Converts GRIB messages to CF-compliant field objects with computed coordinates
  cfgrib/cfmessage.py
- translate_coords (function) — Transforms coordinate names and metadata according to specified data models
  cf2cdm/cfcoords.py
- CDS (config) — Coordinate mapping configuration for Copernicus Climate Data Store format
  cf2cdm/datamodels.py
- ECMWF (config) — Coordinate mapping configuration for ECMWF data format
  cf2cdm/datamodels.py
- FileStream (class) — Provides sequential access to GRIB messages in a file with caching
  cfgrib/messages.py
Sub-Modules
- cf2cdm — Coordinate translation utilities for mapping GRIB coordinates to CF conventions
Configuration
appveyor.yml (yaml)
- environment.matrix (array, unknown)
- install (array, unknown) — default: powershell ./ci/install_python.ps1
- build (boolean, unknown) — default: false
- test_script (array, unknown) — default: echo Pass
environment-minimal.in.yml (yaml)
- channels (array, unknown) — default: defaults, conda-forge
- dependencies (array, unknown) — default: attrs>=19.2, click, nomkl, numpy, pytest-cov, python-eccodes, tomli
environment-minver.in.yml (yaml)
- channels (array, unknown) — default: defaults, conda-forge
- dependencies (array, unknown) — default: attrs=19.2.0, click=7.0.0, eccodes=2.16.0, numpy=1.15.0, pandas=0.25.0, pytest-cov, python-eccodes=1.4.0, tomli, xarray=0.15.0
environment.in.yml (yaml)
- channels (array, unknown) — default: defaults, conda-forge
- dependencies (array, unknown) — default: attrs>=19.2, click, eccodes>=2.20.0, mypy=0.812, nomkl, numpy, pytest-cov, python-eccodes, scipy, tomli, xarray>=0.20.2
Science Pipeline
- Parse GRIB Messages — eccodes.codes_new_from_file then extract metadata keys
  cfgrib/messages.py
- Group by Coordinates — Build index mapping coordinate tuples to message lists [(n_messages,) → (n_coordinate_groups,)]
  cfgrib/messages.py
- Compute CF Coordinates — Transform GRIB coordinate metadata to CF-compliant format
  cfgrib/cfmessage.py
- Build Dataset Arrays — Stack message data into multi-dimensional arrays with proper coordinate alignment [(n_messages, lat, lon) → (time, level, lat, lon)]
  cfgrib/dataset.py
- Apply Coordinate Translation — Rename coordinates and convert units according to data model specifications
  cf2cdm/cfcoords.py
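The shape change in the Build Dataset Arrays step, from (n_messages, lat, lon) to (time, level, lat, lon), can be made concrete with a small pure-Python sketch. It does not reproduce the code in cfgrib/dataset.py: the message dictionaries and the `stack_messages` and `shape` helpers are hypothetical, and nested lists stand in for numpy arrays so the example has no dependencies.

```python
# Hypothetical decoded messages: one 2-D (lat, lon) grid per (time, level) pair.
MESSAGES = [
    {"time": 0, "level": 850, "values": [[1.0, 2.0], [3.0, 4.0]]},
    {"time": 0, "level": 500, "values": [[5.0, 6.0], [7.0, 8.0]]},
    {"time": 6, "level": 850, "values": [[9.0, 10.0], [11.0, 12.0]]},
    {"time": 6, "level": 500, "values": [[13.0, 14.0], [15.0, 16.0]]},
]


def stack_messages(messages):
    """Arrange (n_messages, lat, lon) grids into a (time, level, lat, lon) nested list."""
    times = sorted({m["time"] for m in messages})
    levels = sorted({m["level"] for m in messages}, reverse=True)
    by_coords = {(m["time"], m["level"]): m["values"] for m in messages}
    missing = [(t, lev) for t in times for lev in levels if (t, lev) not in by_coords]
    if missing:
        # An incomplete hypercube cannot be stacked without choosing a fill value.
        raise ValueError(f"no message for coordinates {missing}")
    cube = [[by_coords[(t, lev)] for lev in levels] for t in times]
    return {"time": times, "level": levels}, cube


def shape(nested):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


coords, cube = stack_messages(MESSAGES)
print(coords, shape(cube))
```

The explicit check for missing (time, level) pairs shows why coordinate alignment matters: four messages only fill a 2 × 2 grid of coordinates if every combination is present.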
Assumptions & Constraints
- [warning] Assumes GRIB time encoding follows standard conventions but no validation ensures time units are correctly interpreted (format)
- [warning] Assumes coordinate dimensions can be inferred from GRIB message groupings but irregular grids may cause issues (shape)
- [info] Returns numpy arrays with dtypes determined by ecCodes but no explicit type validation (dtype)
- [info] Unit conversion assumes standard atmospheric pressure ranges without bounds checking (value-range)
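To show what the missing bounds check in the last note might look like, here is a minimal sketch of a pressure conversion with an explicit range test. It is not taken from cf2cdm: the `pa_to_hpa` helper is hypothetical, and the default range of 0 to 1100 hPa is an illustrative assumption rather than a value used by the library.

```python
def pa_to_hpa(value_pa, valid_range_hpa=(0.0, 1100.0)):
    """Convert pascals to hectopascals and reject values outside valid_range_hpa."""
    value_hpa = value_pa / 100.0
    low, high = valid_range_hpa  # assumed plausible range, chosen for illustration
    if not low <= value_hpa <= high:
        raise ValueError(
            f"{value_hpa} hPa is outside the expected range {valid_range_hpa}"
        )
    return value_hpa


print(pa_to_hpa(85000.0))
```

A value that is already in hPa but labelled as Pa, or the reverse, would fall far outside the range and fail loudly instead of producing a silently wrong coordinate.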
Frequently Asked Questions
What is cfgrib used for?
cfgrib is a Python interface that maps GRIB meteorological files to xarray/NetCDF using CF conventions. ecmwf/cfgrib is a 10-component weather and climate project written in Python. Its components are highly interconnected and depend on each other heavily. The codebase contains 34 files.
How is cfgrib architected?
cfgrib is organized into 5 architecture layers: Abstract Layer, Message Layer, Dataset Layer, xarray Integration, and 1 more. Highly interconnected — components depend on each other heavily. This layered structure enables tight integration between components.
How does data flow through cfgrib?
Data moves through 7 stages: File Reading → Message Parsing → Indexing → CF Transformation → Dataset Building → .... GRIB files are parsed into messages, indexed by coordinates, transformed to CF-compliant fields, and exposed as xarray datasets. This pipeline design reflects a complex multi-stage processing system.
What technologies does cfgrib use?
The core stack includes ecCodes (GRIB file parsing and data extraction), xarray (N-dimensional array data structure and NetCDF-like interface), numpy (Numerical array operations and data types), attrs (Class definition with automatic method generation), click (Command-line interface framework), pytest (Testing framework), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does cfgrib have?
cfgrib exhibits 2 data pools (GRIB File Cache, Fieldset Index), 3 control points, and 2 delays. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does cfgrib use?
5 design patterns detected: Abstract Base Classes, Lazy Loading, Plugin Architecture, Data Model Translation, Computed Keys.
Analyzed on March 31, 2026 by CodeSea. Written by Karolina Sarna.