Hidden Assumptions in netcdf4-python

12 assumptions this code never checks · 5 critical · spanning Environment, Domain, Resource, Ordering, Scale, Contract, Temporal

Every codebase relies on things it never checks, and most of them are routine. CodeSea looked at unidata/netcdf4-python and picked out the 3 most likely to cause trouble. Those come first; the rest are minor and are listed under "Show everything".

Worth your attention first

The build crashes with subprocess.CalledProcessError, or with AttributeError on flags.stdout, if nc-config is missing or corrupted, and stalls if nc-config hangs indefinitely
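A minimal sketch of how a call to a config tool could be guarded, assuming the build only needs the tool's stdout. The helper name query_config_tool and the 10-second default timeout are hypothetical and are not part of netcdf4-python.

```python
import subprocess
from typing import Optional, Sequence


def query_config_tool(cmd: Sequence[str], timeout: float = 10.0) -> Optional[str]:
    """Run a config tool and return its stripped stdout, or None if unusable.

    Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            check=True,       # a non-zero exit status raises CalledProcessError
            timeout=timeout,  # a hung tool raises TimeoutExpired
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing or non-executable command;
        # SubprocessError covers CalledProcessError and TimeoutExpired.
        return None
    output = result.stdout.strip()
    return output or None  # treat empty output as "no answer"
```

A caller could then fall back to another detection method, or stop with a clear message, whenever the helper returns None.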

Build system incorrectly detects netCDF4 support when only netCDF3 is available, leading to link-time failures or runtime crashes when netCDF4-specific functions are called

Parallel writes fail with EACCES or deadlock if the filesystem doesn't support concurrent access, or segfault if the MPI communicator becomes invalid during file operations

Show everything (9 more)

Ordering

All MPI processes call set_collective(True) in the same order and before any collective write operations - no synchronization barrier enforces this ordering

If this fails: Collective I/O operations deadlock or produce corrupted data if processes don't coordinate their collective mode transitions, especially with processes joining at different times

examples/mpi_example.py:v.set_collective

Scale

A static array of size NC_MAX_VAR_DIMS (currently 1024) is large enough for every netCDF variable that will ever be encountered, whatever its number of dimensions

If this fails: Buffer overflow and memory corruption when processing variables with more dimensions than NC_MAX_VAR_DIMS, potentially allowing arbitrary code execution

external/nc_complex/src/nc_complex.c:coord_one

Domain

Complex number detection relies on dimension names exactly matching '_pfnc_complex', 'complex', or 'ri' - case sensitive string comparison with no normalization or fuzzy matching

If this fails: Complex arrays with dimensions named 'Complex', 'COMPLEX', or 'real_imag' get treated as regular float arrays, silently losing complex number semantics and producing wrong mathematical results

external/nc_complex/src/nc_complex.c:known_dim_names
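The difference between exact and normalized matching can be sketched in Python. This is an illustration only: the real check lives in C, and the function names below are hypothetical. Only the three dimension names come from the finding above.

```python
# The three names listed in the finding above.
KNOWN_COMPLEX_DIM_NAMES = ("_pfnc_complex", "complex", "ri")


def is_complex_dim_exact(name: str) -> bool:
    """Case-sensitive comparison, as the finding describes."""
    return name in KNOWN_COMPLEX_DIM_NAMES


def is_complex_dim_normalized(name: str) -> bool:
    """Hypothetical variant: trim whitespace and ignore case before comparing."""
    return name.strip().lower() in KNOWN_COMPLEX_DIM_NAMES
```

Normalizing would accept 'Complex' and 'COMPLEX', but a name such as 'real_imag' is still rejected, because it is not in the list under any casing.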

Contract

All three build functions (wheel, sdist, editable) have identical dependency requirements and netcdf4_has_parallel_support() returns consistent results across multiple calls during the same build

If this fails: Inconsistent builds where some build artifacts include mpi4py dependency while others don't, causing import errors when parallel-enabled wheels are installed in environments without mpi4py

_build/backend.py:get_requires_for_build_*
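One way to keep the three hooks consistent is to compute the answer once and share it. The sketch below assumes the hooks differ only in whether mpi4py is required. The probe is a stand-in that always returns False, and every name other than the standard PEP 517 hook names is hypothetical.

```python
from functools import lru_cache
from typing import List


def _probe_parallel_support() -> bool:
    """Stand-in for the real detection; always False in this sketch."""
    return False


@lru_cache(maxsize=None)
def parallel_support_cached() -> bool:
    """Run the probe once per process and reuse the answer afterwards."""
    return _probe_parallel_support()


def _shared_requires() -> List[str]:
    """Single source of truth for the extra build requirements."""
    requires: List[str] = []
    if parallel_support_cached():
        requires.append("mpi4py")
    return requires


def get_requires_for_build_wheel(config_settings=None) -> List[str]:
    return _shared_requires()


def get_requires_for_build_sdist(config_settings=None) -> List[str]:
    return _shared_requires()


def get_requires_for_build_editable(config_settings=None) -> List[str]:
    return _shared_requires()
```

Because each hook delegates to the same helper and the probe result is cached, the hooks cannot disagree within one build process. Separate processes would still probe independently.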

Environment

All files opened with utf-8 encoding are actually UTF-8 encoded - no BOM detection, encoding validation, or fallback for files that might be ASCII, Latin-1, or other encodings

If this fails: UnicodeDecodeError when processing netcdf.h files or setup.cfg files that contain non-UTF-8 characters, breaking the build process on systems with different default encodings

_build/utils.py:OPEN_KWARGS
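A tolerant reader could look roughly like the sketch below. It assumes that a Latin-1 fallback is acceptable for header and config files. The function name read_text_tolerant is hypothetical, and the sketch does not reflect what OPEN_KWARGS actually contains.

```python
from pathlib import Path


def read_text_tolerant(path) -> str:
    """Decode a file as UTF-8 (dropping a BOM if present), else as Latin-1.

    Hypothetical helper for illustration only.
    """
    data = Path(path).read_bytes()
    try:
        # "utf-8-sig" behaves like UTF-8 but strips a leading byte-order mark.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail,
        # although non-Latin-1 input will decode to the wrong characters.
        return data.decode("latin-1")
```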

Temporal

File close/reopen sequence assumes all MPI processes complete their writes and close the file before any process attempts to reopen it - no explicit barrier synchronization

If this fails: Race condition where some processes try to read while others are still writing, leading to incomplete data reads, file corruption, or 'file locked' errors

examples/mpi_example.py:nc.close() and reopen
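The missing synchronization can be sketched as a small helper. It assumes comm is an mpi4py communicator (for example MPI.COMM_WORLD), whose Barrier() method blocks until every rank has reached it. The helper name and its callback parameters are hypothetical.

```python
def close_then_reopen(comm, close_file, open_file):
    """Close on every rank, wait for all ranks, then reopen.

    comm       -- object with a Barrier() method (assumed: an mpi4py communicator)
    close_file -- zero-argument callable that closes this rank's handle
    open_file  -- zero-argument callable that reopens the file and returns the handle
    """
    close_file()
    comm.Barrier()  # no rank continues until every rank has finished closing
    return open_file()
```

In a script like the example, close_file might be nc.close and open_file a lambda that calls Dataset again. That wiring is an assumption and is not taken from the example file.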

Domain

NC_COMPLEX_GIT_VERSION macro is always defined at compile time and contains a valid version string - no fallback version or validation of version format

If this fails: Compilation fails with undefined macro error if built outside git repository or with incomplete build configuration, making it impossible to build the extension

external/nc_complex/src/nc_complex.c:pfnc_libvers

Resource

File handles opened for reading netcdf.h are automatically closed by Python garbage collection - explicit close() calls are missing

If this fails: Resource leak accumulation during build process if many include directories are tested, potentially exhausting file descriptors on systems with low ulimits

_build/utils.py:open netcdf.h
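Closing the handle deterministically only takes a with block. The sketch below assumes the build scans a header for some text. The function name header_mentions and the choice to treat unreadable files as "not found" are hypothetical.

```python
def header_mentions(path: str, needle: str) -> bool:
    """Return True if any line of the file contains `needle`.

    Hypothetical helper for illustration only.
    """
    try:
        # The with block closes the file as soon as the scan finishes,
        # even if an exception is raised while reading.
        with open(path, encoding="utf-8", errors="replace") as handle:
            return any(needle in line for line in handle)
    except OSError:
        # Missing or unreadable header: report "not found" rather than crash.
        return False
```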

Contract

The netCDF4 module is already successfully imported and all version attributes (__version__, __hdf5libversion__, __netcdf4libversion__) are guaranteed to exist as strings

If this fails: AttributeError if netCDF4 import succeeds but version attributes are missing due to incomplete installation or version skew between Python package and C libraries

checkversion.py:netCDF4.__version__
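A defensive version report could read each attribute with getattr and a default. The three attribute names come from the finding above; the function name collect_versions and the "unknown" placeholder are hypothetical.

```python
# Attribute names taken from the finding above.
VERSION_ATTRS = ("__version__", "__hdf5libversion__", "__netcdf4libversion__")


def collect_versions(module) -> dict:
    """Map each version attribute to its string value, or "unknown".

    Hypothetical helper for illustration only.
    """
    versions = {}
    for attr in VERSION_ATTRS:
        value = getattr(module, attr, None)  # None if the attribute is missing
        versions[attr] = value if isinstance(value, str) else "unknown"
    return versions
```

Passing the imported netCDF4 module to this function would report "unknown" for a missing or non-string attribute instead of raising AttributeError.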

See the full structural analysis of netcdf4-python: the pipeline, data models, and system behavior that put these assumptions in context.

Frequently Asked Questions

What does netcdf4-python assume that could break in production?

The one most likely to cause trouble: the code assumes that the subprocess.run() call to nc-config or pkg-config will always succeed and produce valid stdout, with no timeout, permission check, or check that the command exists. If this fails, the build crashes with subprocess.CalledProcessError, or with AttributeError on flags.stdout, when nc-config is missing or corrupted, and stalls when nc-config hangs indefinitely.

How many hidden assumptions does netcdf4-python have?

CodeSea found 12 assumptions that netcdf4-python relies on but never validates, 5 of them critical, spanning Environment, Domain, Resource, Ordering, Scale, Contract, and Temporal. Most are routine; the analysis flags the 3 most likely to actually bite.

What is a hidden assumption?

Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.