Hidden Assumptions in netcdf4-python
12 assumptions this code never checks · 5 critical · spanning Environment, Domain, Resource, Ordering, Scale, Contract, Temporal
Every codebase relies on things it never checks, and most of them are routine. CodeSea looked at unidata/netcdf4-python and picked out the 3 assumptions most likely to cause trouble here. They are listed first, each described by what goes wrong when it fails. The rest are minor; they're under "Show everything".
Build process crashes with subprocess.CalledProcessError or AttributeError on flags.stdout if nc-config is missing or corrupted, and stalls indefinitely if nc-config hangs, because the call has no timeout
Build system incorrectly detects netCDF4 support when only netCDF3 is available, leading to link-time failures or runtime crashes when netCDF4-specific functions are called
Parallel writes fail with EACCES or deadlock if filesystem doesn't support concurrent access, or segfault if MPI communicator becomes invalid during file operations
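The first item above concerns the build's call to nc-config or pkg-config. A minimal sketch of a more defensive call could look like the following. The helper name run_config_tool and the 10-second default timeout are assumptions made for illustration; neither is part of netcdf4-python.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def run_config_tool(cmd: Sequence[str], timeout: float = 10.0) -> Optional[str]:
    """Return the tool's stdout, or None if it is missing, fails, or hangs.

    Hypothetical helper for illustration; not part of netcdf4-python.
    """
    # Check that the executable can be found before trying to run it.
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            check=True,       # a non-zero exit raises CalledProcessError
            timeout=timeout,  # a hung tool raises TimeoutExpired
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout.strip()
```

A caller would then treat None as "tool unavailable" and fall back to another detection path, for example flags = run_config_tool(["nc-config", "--libs"]).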
Show everything (9 more)
All MPI processes call set_collective(True) in the same order and before any collective write operations - no synchronization barrier enforces this ordering
If this fails: Collective I/O operations deadlock or produce corrupted data if processes don't coordinate their collective mode transitions, especially with processes joining at different times
examples/mpi_example.py:v.set_collective
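One way to make the ordering explicit is to bracket the mode switch with barriers. The sketch below is not taken from mpi_example.py. The function name is hypothetical, comm is assumed to behave like an mpi4py communicator, and each variable is assumed to behave like a netCDF4 Variable in a file opened for parallel access.

```python
def set_collective_in_lockstep(comm, variables, collective=True):
    """Switch every variable's I/O mode between two barriers.

    Hypothetical helper: `comm` needs a Barrier() method and each item in
    `variables` needs a set_collective() method.
    """
    comm.Barrier()  # wait until every rank is ready to switch modes
    for var in variables:  # every rank must pass the variables in the same order
        var.set_collective(collective)
    comm.Barrier()  # no rank continues until all ranks have switched
```

With mpi4py, comm would typically be MPI.COMM_WORLD. The barriers only mark a common transition point; they do not protect against ranks passing different variables or skipping the call.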
NC_MAX_VAR_DIMS (currently 1024) static array size is sufficient for all netCDF variable dimensionalities that will ever be encountered
If this fails: Buffer overflow and memory corruption when processing variables with more dimensions than NC_MAX_VAR_DIMS, potentially allowing arbitrary code execution
external/nc_complex/src/nc_complex.c:coord_one
Complex number detection relies on dimension names exactly matching '_pfnc_complex', 'complex', or 'ri' - case sensitive string comparison with no normalization or fuzzy matching
If this fails: Complex arrays with dimensions named 'Complex', 'COMPLEX', or 'real_imag' get treated as regular float arrays, silently losing complex number semantics and producing wrong mathematical results
external/nc_complex/src/nc_complex.c:known_dim_names
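The effect of exact matching can be modelled in a few lines of Python. This is an illustration of the behaviour described above, not a port of the C code; the function names are hypothetical, and the normalized variant is one possible mitigation rather than anything the library does.

```python
# The three names listed in the analysis above.
KNOWN_COMPLEX_DIM_NAMES = ("_pfnc_complex", "complex", "ri")


def is_complex_dim_exact(name: str) -> bool:
    """Case-sensitive comparison, modelling the behaviour described above."""
    return name in KNOWN_COMPLEX_DIM_NAMES


def is_complex_dim_normalized(name: str) -> bool:
    """Hypothetical alternative: ignore case and surrounding whitespace."""
    return name.strip().lower() in KNOWN_COMPLEX_DIM_NAMES


for candidate in ("ri", "Complex", "COMPLEX", "real_imag"):
    print(candidate, is_complex_dim_exact(candidate), is_complex_dim_normalized(candidate))
```

Normalizing case recovers 'Complex' and 'COMPLEX', but a name such as 'real_imag' still goes unrecognized under either check, so writers would still need to agree on a naming convention.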
All three build functions (wheel, sdist, editable) have identical dependency requirements and netcdf4_has_parallel_support() returns consistent results across multiple calls during the same build
If this fails: Inconsistent builds where some build artifacts include mpi4py dependency while others don't, causing import errors when parallel-enabled wheels are installed in environments without mpi4py
_build/backend.py:get_requires_for_build_*
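A sketch of how the three hooks could share one cached answer follows. The hook names come from PEP 517 and PEP 660; everything else is an assumption: _probe_parallel_support is a stand-in for the real detection, and the real backend may be structured differently.

```python
from functools import lru_cache


def _probe_parallel_support() -> bool:
    """Stand-in for the real detection logic (hypothetical; always False here)."""
    return False


@lru_cache(maxsize=None)
def parallel_support_cached() -> bool:
    """Run the probe once per process and reuse the answer."""
    return _probe_parallel_support()


def _extra_requires() -> list:
    # A single source of truth for all three hooks.
    return ["mpi4py"] if parallel_support_cached() else []


def get_requires_for_build_wheel(config_settings=None):
    return _extra_requires()


def get_requires_for_build_sdist(config_settings=None):
    return _extra_requires()


def get_requires_for_build_editable(config_settings=None):
    return _extra_requires()
```

An in-process cache only guarantees consistency within one Python process. Build frontends may invoke each hook in a separate subprocess, in which case the probe itself has to be deterministic.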
All files opened with utf-8 encoding are actually UTF-8 encoded - no BOM detection, encoding validation, or fallback for files that might be ASCII, Latin-1, or other encodings
If this fails: UnicodeDecodeError when processing netcdf.h files or setup.cfg files that contain non-UTF-8 characters, breaking the build process on systems with different default encodings
_build/utils.py:OPEN_KWARGS
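A tolerant read could decode with a fallback instead of assuming UTF-8. The sketch below is an illustration only: the function name is hypothetical, and Latin-1 as the fallback is an assumption that suits scanning ASCII preprocessor lines, not a general answer to unknown encodings.

```python
from pathlib import Path


def read_text_tolerant(path) -> str:
    """Read a text file as UTF-8, falling back to Latin-1 on decode errors.

    Hypothetical helper for illustration; not part of netcdf4-python.
    """
    data = Path(path).read_bytes()
    try:
        # "utf-8-sig" also strips a leading byte order mark if one is present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1")
```

Non-ASCII text decoded through the fallback may be wrong for files in other encodings, but ASCII content such as #define lines survives intact.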
File close/reopen sequence assumes all MPI processes complete their writes and close the file before any process attempts to reopen it - no explicit barrier synchronization
If this fails: Race condition where some processes try to read while others are still writing, leading to incomplete data reads, file corruption, or 'file locked' errors
examples/mpi_example.py:nc.close() and reopen
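The close/reopen sequence can be made explicit with a barrier between the two steps. This is a sketch under assumptions, not the example script's code: the function name is hypothetical, comm is assumed to behave like an mpi4py communicator, and reopen is any zero-argument callable that opens the file again.

```python
def close_then_reopen(comm, dataset, reopen):
    """Close the dataset, wait for all ranks, then reopen it.

    Hypothetical helper: `comm` needs Barrier(), `dataset` needs close(),
    and `reopen` is a callable that returns the newly opened dataset.
    """
    dataset.close()  # this rank's writes are finished and the file is closed
    comm.Barrier()   # wait until every other rank has closed as well
    return reopen()  # only now is it safe to open the file for reading
```

In a real script, reopen might be a lambda that constructs a new netCDF4 Dataset on the same path.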
NC_COMPLEX_GIT_VERSION macro is always defined at compile time and contains a valid version string - no fallback version or validation of version format
If this fails: Compilation fails with undefined macro error if built outside git repository or with incomplete build configuration, making it impossible to build the extension
external/nc_complex/src/nc_complex.c:pfnc_libvers
File handles opened for reading netcdf.h are automatically closed by Python garbage collection - explicit close() calls are missing
If this fails: Resource leak accumulation during build process if many include directories are tested, potentially exhausting file descriptors on systems with low ulimits
_build/utils.py:open netcdf.h
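A context manager closes each handle deterministically, however many include directories are tried. The sketch below assumes a hypothetical helper named find_netcdf_header; the real lookup in _build/utils.py may differ.

```python
import os


def find_netcdf_header(include_dirs):
    """Return (path, text) for the first readable netcdf.h, or None.

    Hypothetical helper for illustration; not part of netcdf4-python.
    """
    for directory in include_dirs:
        path = os.path.join(directory, "netcdf.h")
        try:
            # The with-block closes the file even if read() raises.
            with open(path, encoding="utf-8") as header:
                return path, header.read()
        except OSError:
            # Missing or unreadable in this directory; try the next one.
            continue
    return None
```

Because each file is closed before the next directory is tried, at most one descriptor is open at a time.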
The netCDF4 module is already successfully imported and all version attributes (__version__, __hdf5libversion__, __netcdf4libversion__) are guaranteed to exist as strings
If this fails: AttributeError if netCDF4 import succeeds but version attributes are missing due to incomplete installation or version skew between Python package and C libraries
checkversion.py:netCDF4.__version__
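A version report can tolerate missing attributes by reading them with a default. The sketch below uses the three attribute names listed above; the function name and the "unknown" placeholder are assumptions made for illustration.

```python
VERSION_ATTRS = ("__version__", "__hdf5libversion__", "__netcdf4libversion__")


def describe_versions(module) -> dict:
    """Collect version attributes, substituting 'unknown' for missing ones.

    Hypothetical helper for illustration; not part of netcdf4-python.
    """
    # getattr with a default avoids AttributeError on partial installs.
    return {name: str(getattr(module, name, "unknown")) for name in VERSION_ATTRS}
```

Passing the imported netCDF4 module would then yield a dictionary that is safe to print even when one of the attributes is absent.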
See the full structural analysis of netcdf4-python: the pipeline, data models, and system behavior that put these assumptions in context.
Full analysis of unidata/netcdf4-python →
Frequently Asked Questions
What does netcdf4-python assume that could break in production?
The one most likely to cause trouble: the build assumes the subprocess.run() call to nc-config or pkg-config will always succeed and produce valid stdout. There is no timeout, permission check, or check that the command exists. If this fails, the build process crashes with subprocess.CalledProcessError or an AttributeError on flags.stdout when nc-config is missing or corrupted, and stalls indefinitely when it hangs.
How many hidden assumptions does netcdf4-python have?
CodeSea found 12 assumptions netcdf4-python relies on but never validates, 5 of them critical, spanning Environment, Domain, Resource, Ordering, Scale, Contract, Temporal. Most are routine; the analysis flags the 3 most likely to actually bite.
What is a hidden assumption?
Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails, often silently.