unidata/netcdf4-python
netcdf4-python: python/numpy interface to the netCDF C library
Wraps netCDF C library to read/write scientific array data with numpy integration
Under the hood, the system uses 2 feedback loops, 2 data pools, and 4 control points to manage its runtime behavior.
A 6-component library. 83 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
Data enters through the Dataset constructor, which opens netCDF files via the C library and creates Python proxy objects for the file's hierarchical structure. Variables act as lazy array proxies: reads trigger netCDF C library calls that return raw data, which is converted to numpy arrays with proper dtype mapping and missing value handling. Writes flow the opposite direction: numpy arrays get validated, optionally compressed/chunked, and passed to the C library for storage. Complex numbers get special handling through the nc_complex extension, either as compound types or dimensional encoding. The six stages below trace this cycle; a minimal usage sketch follows the list.
- File Open and Introspection — Dataset.__init__ calls nc_open from the netCDF C library, then queries the file structure using nc_inq_* functions to populate Python objects for groups, dimensions, and variables without loading actual array data [file path → Dataset] (config: format, diskless, persist +1)
- Variable Access Setup — When accessing dataset.variables['name'], a Variable proxy object is created that holds dimension info, chunking parameters, and compression settings from the netCDF file metadata [Dataset → Variable]
- Array Data Reading — Variable.__getitem__ with slice notation triggers nc_get_vars calls to the C library, followed by automatic conversion to numpy arrays with proper dtype mapping and masked array creation for missing values [Variable → numpy.ndarray] (config: auto_mask, auto_scale)
- Array Data Writing — Variable.__setitem__ validates numpy array shape/dtype compatibility, optionally applies compression (zlib, quantization), then calls nc_put_vars to write data through the C library [numpy.ndarray] (config: zlib, complevel, shuffle +1)
- Complex Number Encoding — When auto_complex=True, the nc_complex extension detects complex arrays and either encodes them as compound types with r/i fields or as real arrays with an extra 'complex' dimension for storage compatibility [numpy.ndarray → CompoundType] (config: auto_complex)
- File Close and Sync — Dataset.close() or context manager exit calls nc_close to flush pending writes and release the file handle, ensuring data integrity [Dataset]
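To make these stages concrete, here is a minimal end-to-end sketch using the public netCDF4 API (file and variable names are illustrative):

```python
import numpy as np
from netCDF4 import Dataset

# File Open: nc_create/nc_open runs under the hood and the file's
# hierarchical structure is mirrored as Python proxy objects.
with Dataset("example.nc", "w", format="NETCDF4") as ds:
    ds.createDimension("time", None)   # unlimited dimension
    ds.createDimension("lat", 73)
    # Variable Access Setup: the returned Variable is a lazy proxy
    # holding dimension, chunking, and compression metadata.
    v = ds.createVariable("temp", "f4", ("time", "lat"),
                          zlib=True, complevel=4, shuffle=True)
    v.units = "K"
    # Array Data Writing: the numpy array is validated, compressed,
    # and handed to the C library for storage.
    v[0, :] = np.random.uniform(250, 300, size=73).astype("f4")

# Array Data Reading: slicing triggers C library reads and returns a
# (possibly masked) numpy array with the mapped dtype.
with Dataset("example.nc", "r") as ds:
    temp = ds["temp"][0, :]
    print(temp.dtype, temp.shape)      # float32 (73,)
# File Close and Sync: the context manager calls close(), flushing
# pending writes and releasing the file handle.
```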
Data Models
The data structures that flow between stages — the contracts that hold the system together.
- Dataset (src/netCDF4) — Python object wrapping a netCDF file handle with attributes: groups (dict), dimensions (dict), variables (dict), plus netCDF global attributes. Created when opening a netCDF file, provides hierarchical access to all file contents, closed to flush writes and release the file handle.
- Variable (src/netCDF4) — Array-like object with numpy array data plus netCDF metadata: dimensions (tuple), dtype, attributes (dict), chunking/compression parameters. Defined with dimensions and a data type, accepts numpy array assignments, returns numpy arrays on read.
- numpy.ndarray (numpy) — N-dimensional array with dtype (float32/64, int32/64, etc.), shape tuple, and optional mask for missing values. Returned from Variable reads, passed to Variable writes, automatically converted between netCDF and numpy data types.
- CompoundType (src/netCDF4) — Structured data type with named fields, each having a numpy dtype; used for complex numbers as {r: float64, i: float64}. Created to define custom data structures, used in Variable creation, automatically converted to/from numpy structured arrays (see the sketch after this list).
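As a quick illustration of the CompoundType contract, this sketch registers the {r, i} structure and writes a structured numpy array through it (file and type names are illustrative):

```python
import numpy as np
from netCDF4 import Dataset

complex128 = np.dtype([("r", np.float64), ("i", np.float64)])

with Dataset("cmplx.nc", "w") as ds:
    # Register the structured dtype with the file; reads and writes
    # then convert between numpy structured arrays and the compound type.
    ctype = ds.createCompoundType(complex128, "complex128")
    ds.createDimension("x", 3)
    v = ds.createVariable("data", ctype, ("x",))
    buf = np.zeros(3, dtype=complex128)
    buf["r"] = [1.0, 2.0, 3.0]
    buf["i"] = [0.5, -0.5, 0.0]
    v[:] = buf
```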
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
The subprocess.run() call to nc-config or pkg-config will always succeed and produce valid stdout - no timeout, permission checks, or command existence validation
If this fails: Build process crashes with subprocess.CalledProcessError or AttributeError on flags.stdout if nc-config is missing, corrupted, or hangs indefinitely
_build/utils.py:get_config_flags
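A hardened variant of that call might look like the sketch below; get_config_flags_safe is hypothetical and not the project's code:

```python
import shutil
import subprocess

def get_config_flags_safe(tool: str = "nc-config", args=("--cflags",)) -> str:
    # Hypothetical hardening: verify the tool exists, bound its runtime,
    # and fail loudly instead of assuming valid stdout.
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    result = subprocess.run(
        [tool, *args], capture_output=True, text=True,
        timeout=30, check=True,
    )
    return result.stdout.strip()
```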
The presence of string 'nc_inq_compound' anywhere in netcdf.h indicates a valid netCDF4 installation - but this could match in comments, string literals, or documentation sections that don't represent actual API availability
If this fails: Build system incorrectly detects netCDF4 support when only netCDF3 is available, leading to link-time failures or runtime crashes when netCDF4-specific functions are called
_build/utils.py:is_netcdf4_include_dir
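The check described above boils down to a substring test, roughly like this simplified sketch (not the project's exact code):

```python
from pathlib import Path

def is_netcdf4_include_dir_naive(inc_dir: str) -> bool:
    # A bare substring search matches comments and documentation text
    # too, not just an actual nc_inq_compound declaration.
    header = Path(inc_dir) / "netcdf.h"
    return "nc_inq_compound" in header.read_text(encoding="utf-8")
```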
MPI.COMM_WORLD and MPI.Info() objects remain valid throughout the Dataset lifetime and all MPI processes can simultaneously open the same file path without filesystem conflicts
If this fails: Parallel writes fail with EACCES or deadlock if filesystem doesn't support concurrent access, or segfault if MPI communicator becomes invalid during file operations
examples/mpi_example.py:Dataset constructor
All MPI processes call set_collective(True) in the same order and before any collective write operations - no synchronization barrier enforces this ordering
If this fails: Collective I/O operations deadlock or produce corrupted data if processes don't coordinate their collective mode transitions, especially with processes joining at different times
examples/mpi_example.py:v.set_collective
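The pattern in question, condensed from the shape of examples/mpi_example.py (a sketch assuming an MPI-enabled netCDF build, mpi4py installed, and 4 ranks, e.g. mpirun -np 4):

```python
from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

rank = MPI.COMM_WORLD.rank

# All ranks open the same file; communicator validity and concurrent
# filesystem access are assumed, never checked.
nc = Dataset("parallel.nc", "w", parallel=True,
             comm=MPI.COMM_WORLD, info=MPI.Info())
nc.createDimension("dim", 4)
v = nc.createVariable("var", np.int64, ("dim",))
# Every rank must switch to collective mode in the same order before
# collective writes; no barrier enforces this.
v.set_collective(True)
v[rank] = rank
nc.close()
```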
NC_MAX_VAR_DIMS (currently 1024) static array size is sufficient for all netCDF variable dimensionalities that will ever be encountered
If this fails: Buffer overflow and memory corruption when processing variables with more dimensions than NC_MAX_VAR_DIMS, potentially allowing arbitrary code execution
external/nc_complex/src/nc_complex.c:coord_one
Complex number detection relies on dimension names exactly matching '_pfnc_complex', 'complex', or 'ri' - case sensitive string comparison with no normalization or fuzzy matching
If this fails: Complex arrays with dimensions named 'Complex', 'COMPLEX', or 'real_imag' get treated as regular float arrays, silently losing complex number semantics and producing wrong mathematical results
external/nc_complex/src/nc_complex.c:known_dim_names
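For contrast, the supported path relies on the auto_complex flag rather than user-chosen dimension names; a minimal sketch (requires a netCDF4-python build with nc_complex support):

```python
import numpy as np
from netCDF4 import Dataset

data = np.array([1 + 2j, 3 - 1j, 0.5j], dtype=np.complex128)

with Dataset("complex.nc", "w", auto_complex=True) as ds:
    ds.createDimension("x", len(data))
    # Stored either as a compound {r, i} type or as a real array with
    # an extra 'complex' dimension, depending on the file format.
    v = ds.createVariable("data", np.complex128, ("x",))
    v[:] = data
```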
All three build functions (wheel, sdist, editable) have identical dependency requirements and netcdf4_has_parallel_support() returns consistent results across multiple calls during the same build
If this fails: Inconsistent builds where some build artifacts include mpi4py dependency while others don't, causing import errors when parallel-enabled wheels are installed in environments without mpi4py
_build/backend.py:get_requires_for_build_*
All files opened with utf-8 encoding are actually UTF-8 encoded - no BOM detection, encoding validation, or fallback for files that might be ASCII, Latin-1, or other encodings
If this fails: UnicodeDecodeError when processing netcdf.h files or setup.cfg files that contain non-UTF-8 characters, breaking the build process on systems with different default encodings
_build/utils.py:OPEN_KWARGS
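A tolerant read with an encoding fallback would close this gap; the helper below is hypothetical hardening, not the project's code:

```python
def read_text_tolerant(path: str) -> str:
    # Try UTF-8 first, then fall back to Latin-1, which decodes any
    # byte sequence rather than raising UnicodeDecodeError.
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, encoding="latin-1") as f:
            return f.read()
```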
File close/reopen sequence assumes all MPI processes complete their writes and close the file before any process attempts to reopen it - no explicit barrier synchronization
If this fails: Race condition where some processes try to read while others are still writing, leading to incomplete data reads, file corruption, or 'file locked' errors
examples/mpi_example.py:nc.close() and reopen
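The missing synchronization is a single barrier between close and reopen; a sketch assuming mpi4py:

```python
from mpi4py import MPI
from netCDF4 import Dataset

comm = MPI.COMM_WORLD

nc = Dataset("parallel.nc", "w", parallel=True,
             comm=comm, info=MPI.Info())
# ... parallel writes happen here ...
nc.close()      # each rank flushes and releases its handle

comm.Barrier()  # absent in the example: no rank may reopen early

nc = Dataset("parallel.nc", "r", parallel=True,
             comm=comm, info=MPI.Info())
```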
NC_COMPLEX_GIT_VERSION macro is always defined at compile time and contains a valid version string - no fallback version or validation of version format
If this fails: Compilation fails with undefined macro error if built outside git repository or with incomplete build configuration, making it impossible to build the extension
external/nc_complex/src/nc_complex.c:pfnc_libvers
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- netCDF file — Self-describing binary files containing hierarchical scientific data with metadata following CF conventions
- numpy array cache — Temporary storage for array data during read/write operations before conversion between numpy and netCDF formats
Feedback Loops
- parallel write coordination (polling, balancing) — Trigger: MPI collective write mode. Action: Each process polls for completion of other processes' writes before proceeding. Exit: All processes complete their portion.
- compression optimization (benchmarking, reinforcing) — Trigger: benchmark script execution. Action: Tests different compression levels and algorithms, measures performance. Exit: All parameter combinations tested.
Delays
- build-time feature detection (compilation, ~seconds) — Build process pauses to run nc-config and inspect headers to determine available netCDF features
- array chunking (batch-window, ~variable) — Large array operations get broken into chunks based on memory constraints and file chunk size
- diskless buffering (cache-ttl, ~until close) — Diskless files accumulate all changes in memory before writing to disk on close
Control Points
- parallel support detection (architecture-switch) — Controls: Whether mpi4py gets added as build dependency based on netCDF C library parallel support. Default: runtime detection
- compression level (hyperparameter) — Controls: Trade-off between file size and write/read performance. Range: 0-9; createVariable defaults to complevel=4 (see the sketch after this list)
- auto_complex (feature-flag) — Controls: Whether to automatically detect and convert complex number representations. Default: False
- diskless mode (runtime-toggle) — Controls: Whether file operations happen in memory with optional persistence to disk. Default: False
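These control points map directly onto Dataset and createVariable keywords; a brief sketch with illustrative values:

```python
from netCDF4 import Dataset

# diskless=True would keep the file in memory (persist=True to flush
# on close); here it is written straight to disk.
with Dataset("tuned.nc", "w", diskless=False) as ds:
    ds.createDimension("x", 1000)
    # zlib toggles deflate compression; complevel (0-9) trades file
    # size against speed; shuffle reorders bytes to improve ratios.
    ds.createVariable("v", "f8", ("x",),
                      zlib=True, complevel=4, shuffle=True)
```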
Technology Stack
- netCDF C library — Provides the actual file I/O, data format handling, and compression algorithms that Python code calls through Cython bindings
- numpy — Handles all array data representation and mathematical operations, providing the primary data container for scientific arrays
- Cython — Compiles Python-like code to C extensions that can directly call netCDF C library functions with minimal overhead
- mpi4py — Enables parallel I/O operations by providing Python bindings to MPI for coordinating multiple processes writing to the same file
- HDF5 — Serves as the underlying storage layer for netCDF4 format files, providing chunking, compression, and hierarchical organization
- setuptools — Manages package building and distribution, extended with a custom build backend to handle netCDF C library detection
Key Components
- Dataset (gateway, src/netCDF4) — Main entry point that opens netCDF files and provides hierarchical access to groups, dimensions, variables, and attributes with automatic resource management
- build backend (adapter, _build/backend.py) — Custom setuptools build backend that detects netCDF C library features at build time and conditionally adds the mpi4py dependency for parallel support
- config detector (resolver, _build/utils.py) — Introspects the netCDF C library installation by parsing nc-config output and checking header files to determine available features like parallel I/O
- compression benchmarks (validator, examples/bench_compress*.py) — Performance testing suite that measures read/write speeds across different compression algorithms (zlib, quantization) and compression levels
- parallel I/O handler (orchestrator, examples/mpi_example.py) — Demonstrates collective and independent parallel file access using MPI, coordinating multiple processes writing to the same netCDF file
- nc_complex (encoder, external/nc_complex/src/nc_complex.c) — Extends netCDF with complex number support by encoding them either as compound types with real/imaginary fields or as arrays with an extra dimension
Frequently Asked Questions
What is netcdf4-python used for?
netcdf4-python wraps the netCDF C library to read and write scientific array data with numpy integration. unidata/netcdf4-python is a 6-component library written in Cython; data flows through 6 distinct pipeline stages, and the codebase contains 83 files.
How is netcdf4-python architected?
netcdf4-python is organized into 4 architecture layers: Build System, Python API, Example Code, External Extensions. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through netcdf4-python?
Data moves through 6 stages: File Open and Introspection → Variable Access Setup → Array Data Reading → Array Data Writing → Complex Number Encoding → File Close and Sync. Data enters through the Dataset constructor, which opens netCDF files via the C library and creates Python proxy objects for the file's hierarchical structure. Variables act as lazy array proxies: reads trigger netCDF C library calls whose raw results are converted to numpy arrays with dtype mapping and missing-value handling, while writes flow the opposite direction, with numpy arrays validated, optionally compressed/chunked, and passed to the C library for storage, and complex numbers handled by the nc_complex extension. This pipeline design reflects a multi-stage processing system.
What technologies does netcdf4-python use?
The core stack includes netCDF C library (Provides the actual file I/O, data format handling, and compression algorithms that Python code calls through Cython bindings), numpy (Handles all array data representation and mathematical operations, providing the primary data container for scientific arrays), Cython (Compiles Python-like code to C extensions that can directly call netCDF C library functions with minimal overhead), mpi4py (Enables parallel I/O operations by providing Python bindings to MPI for coordinating multiple processes writing to the same file), HDF5 (Serves as the underlying storage layer for netCDF4 format files, providing chunking, compression, and hierarchical organization), setuptools (Manages package building and distribution, extended with custom build backend to handle netCDF C library detection). A focused set of dependencies that keeps the build manageable.
What system dynamics does netcdf4-python have?
netcdf4-python exhibits 2 data pools (netCDF file, numpy array cache), 2 feedback loops, 4 control points, 3 delays. The feedback loops handle polling and adaptive behavior. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does netcdf4-python use?
4 design patterns detected: Proxy Pattern, Adapter Pattern, Bridge Pattern, Template Method.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.