Hidden Assumptions in scipy

12 assumptions this code never checks · 5 critical · spanning Scale, Environment, Contract, Ordering, Domain, Resource, Temporal

Every codebase relies on things it never checks, and most of them are routine. CodeSea analyzed scipy/scipy and picked out the three assumptions most likely to cause trouble. The remaining nine are minor and are listed under "Show everything".

Worth your attention first

If HAVE_BLAS_ILP64 is set incorrectly at build time (e.g., the ILP64 flag is set while linking against a 32-bit BLAS), function calls will use the wrong symbol names, causing undefined-symbol errors or silent data corruption once array indices exceed the 32-bit limit
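
To make the 32-bit limit concrete, here is a minimal Python sketch of what happens to an index that no longer fits in a 32-bit integer. The helper names `fits_in_32bit_int` and `as_int32` are hypothetical and exist only for this illustration; they are not part of scipy or of any BLAS interface.

```python
import ctypes

INT32_MAX = 2**31 - 1


def fits_in_32bit_int(n: int) -> bool:
    """Return True if n is representable as a signed 32-bit integer."""
    return -INT32_MAX - 1 <= n <= INT32_MAX


def as_int32(n: int) -> int:
    """Return the value a signed 32-bit integer holds after truncating n."""
    # ctypes integer types do no overflow checking, so the value wraps.
    return ctypes.c_int32(n).value


n_elements = 50_000 * 50_000  # 2.5e9, larger than INT32_MAX
print(fits_in_32bit_int(n_elements))  # False
print(as_int32(n_elements))           # -1794967296
```

A library built for 32-bit integers that receives such a value sees a negative or otherwise wrong size, which is one way the silent corruption described above can arise.
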

If MKL changes its symbol naming convention, or if the fix flag is applied to a non-MKL library, the build will fail with linker errors or the wrong functions will be called silently

Direct calls to 'w'-prefixed wrappers from Cython/F2PY code will cause segfaults on x86 machines because the return-value ABI is incompatible: the calling code receives garbage data instead of proper complex values

Show everything (9 more)
Ordering

This header must be included after fortran_defs.h because it redefines the F_FUNC macro, but nothing enforces this include order

If this fails: Included in the wrong order, the F_FUNC macro gets the wrong definition, which leads to incorrect symbol name mangling and linker errors for all BLAS/LAPACK function calls

scipy/_build_utils/src/_blas64_defines.h:include order
Environment

The NumPy C API function import_array() has been called in the current scope before this header is included; a comment in the header warns that otherwise 'version dependency will misbehave'

If this fails: Runtime version detection will fail silently, potentially causing crashes when NumPy 2.0 features are used with NumPy 1.x or vice versa

scipy/_build_utils/src/npy_2_compat.h:import_array requirement
Domain

Version strings follow PEP 440 format with optional epoch, release numbers, pre/post/dev suffixes, and local version identifiers, but the VERSION_PATTERN regex is the only validation

If this fails: Malformed version strings that don't match the expected pattern will raise InvalidVersion, but edge cases in the regex could let invalid versions through, causing incorrect version comparisons

scipy/_external/packaging_version/src/version.py:version string format
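
The reliance on a single regex can be illustrated with a deliberately simplified pattern. The sketch below is not the VERSION_PATTERN from the vendored module; `SIMPLE_PEP440` and `parse_release` are hypothetical names, and the pattern accepts fewer spellings than PEP 440 allows. It only shows the general shape: match the whole string, then compare parsed integers rather than raw text.

```python
import re

# Simplified pattern for illustration only; it is NOT the VERSION_PATTERN
# from the vendored module and accepts fewer spellings than PEP 440 allows.
SIMPLE_PEP440 = re.compile(
    r"""
    (?:(?P<epoch>[0-9]+)!)?                        # optional epoch, e.g. "1!"
    (?P<release>[0-9]+(?:\.[0-9]+)*)               # release, e.g. "1.11.4"
    (?P<pre>(?:a|b|rc)[0-9]+)?                     # optional pre-release
    (?P<post>\.post[0-9]+)?                        # optional post-release
    (?P<dev>\.dev[0-9]+)?                          # optional dev release
    (?:\+(?P<local>[a-z0-9]+(?:\.[a-z0-9]+)*))?    # optional local label
    """,
    re.VERBOSE,
)


def parse_release(version: str) -> tuple:
    """Return the release segment as a tuple of ints, or raise ValueError."""
    match = SIMPLE_PEP440.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"invalid version string: {version!r}")
    return tuple(int(part) for part in match.group("release").split("."))


# Comparing parsed tuples avoids the string-ordering trap where "1.10" < "1.9".
print(parse_release("1.10.0") > parse_release("1.9.3"))  # True
print("1.10.0" > "1.9.3")                                # False
```

Using `fullmatch` rather than `match` matters here: a pattern that only anchors at the start would accept trailing garbage, which is exactly the kind of edge case the assumption above is about.
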
Resource

The operating system cache directory selected by pooch.os_cache() is writable and has sufficient disk space for dataset files, with no validation of disk space or write permissions

If this fails: Dataset downloads will fail with unclear error messages if cache directory is read-only, full, or on a filesystem that doesn't support the file sizes being downloaded

scipy/datasets/_fetchers.py:pooch.os_cache
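
A pre-flight check for the cache directory could look roughly like the following. This is a sketch using only the Python standard library; `check_cache_dir` and its parameters are hypothetical and are not part of scipy or pooch. The caller is assumed to know the approximate download size.

```python
import shutil
import tempfile
from pathlib import Path


def check_cache_dir(path, required_bytes):
    """Return a list of human-readable problems; an empty list means OK."""
    cache = Path(path)
    try:
        cache.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        return [f"cannot create cache directory {cache}: {exc}"]

    problems = []
    # Try a real write instead of trusting permission bits alone.
    try:
        with tempfile.NamedTemporaryFile(dir=cache):
            pass
    except OSError as exc:
        problems.append(f"cache directory {cache} is not writable: {exc}")

    free = shutil.disk_usage(cache).free
    if free < required_bytes:
        problems.append(
            f"cache directory {cache} has {free} bytes free, "
            f"but {required_bytes} are required"
        )
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_cache_dir(Path(tmp) / "cache", required_bytes=1024))
```

Returning a list of problems instead of raising on the first one lets the caller report a read-only directory and a full disk in a single, clear message.
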
Temporal

The scipy module version string in sys.modules['scipy'].__version__ is available and properly formatted for HTTP User-Agent header construction

If this fails: A malformed or missing scipy version produces an invalid User-Agent header, so HTTP requests could fail and dataset downloads could be rejected by servers that validate headers

scipy/datasets/_fetchers.py:HTTPDownloader with User-Agent
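
One defensive way to build the header is to validate the version string first and fall back to a bare product name. In this sketch, `build_user_agent`, the `"SciPy <version>"` format, and the character whitelist are all assumptions made for illustration; they are not taken from scipy and do not reflect the rules of any particular server.

```python
import re

# Conservative whitelist of characters for the version part of the header.
# This is an assumption for the sketch, not a rule from scipy or from HTTP.
_SAFE_VERSION = re.compile(r"[A-Za-z0-9.+!_-]{1,64}")


def build_user_agent(version, product="SciPy"):
    """Return '<product> <version>', or just '<product>' if version is unsafe."""
    if isinstance(version, str) and _SAFE_VERSION.fullmatch(version):
        return f"{product} {version}"
    # Missing or malformed version: send a valid header without it.
    return product


print(build_user_agent("1.11.4"))             # SciPy 1.11.4
print(build_user_agent(None))                 # SciPy
print(build_user_agent("1.0\r\nX-Extra: 1"))  # SciPy
```

The last call shows why `fullmatch` is used: a version string containing a line break is rejected outright instead of being spliced into the header.
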
Contract

Python callback functions provided by users match the expected C function signature defined in ccallback_signature_t, but signature validation only happens at callback registration time

If this fails: A callback with the wrong argument types or count will crash during execution inside C/Fortran code, potentially corrupting the Python interpreter state

scipy/_lib/src/ccallback.h:callback signature matching
Environment

When ACCELERATE_NEW_LAPACK is defined, the build environment has macOS SDK 13.3 or later (__MAC_OS_X_VERSION_MAX_ALLOWED >= 130300), but this is only checked at compile time

If this fails: Building with older macOS SDK while ACCELERATE_NEW_LAPACK is enabled will cause compilation errors, and the error only appears during build rather than being caught by configure/setup

scipy/_build_utils/src/scipy_blas_defines.h:macOS SDK version
Scale

Version comparison operations never encounter arithmetic overflow when comparing against Infinity/NegativeInfinity objects, and Python's object comparison protocol handles the mixed types correctly

If this fails: Extremely large version components that overflow during comparison, or unexpected object types compared against Infinity, could make version ordering inconsistent and lead to wrong dependency resolution

scipy/_external/packaging_version/src/_structures.py:InfinityType comparison
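
The mixed-type comparison this relies on can be sketched with a small sentinel class. `InfinitySentinel` below is an illustrative stand-in written for this page, not the InfinityType from `_structures.py`. It also shows that Python integers are arbitrary-precision, so a very large version component does not overflow in pure Python.

```python
class InfinitySentinel:
    """Illustrative sentinel that compares greater than any other object."""

    def __repr__(self):
        return "Infinity"

    def __hash__(self):
        return hash(repr(self))

    def __eq__(self, other):
        return isinstance(other, type(self))

    def __lt__(self, other):
        return False

    def __le__(self, other):
        return isinstance(other, type(self))

    def __gt__(self, other):
        return not isinstance(other, type(self))

    def __ge__(self, other):
        return True


INF = InfinitySentinel()
huge = 10**100  # Python ints are arbitrary-precision, so this cannot overflow

# int.__lt__ returns NotImplemented for the sentinel, so Python falls back
# to the reflected InfinitySentinel.__gt__, which answers True.
print(huge < INF)                        # True
print((1, 2, INF) > (1, 2, huge))        # True
print(sorted([(1, INF), (1, 5), (1, huge)])[-1])  # (1, Infinity)
```

The ordering stays consistent only as long as the other operand returns NotImplemented for unknown types; an object that answers the comparison itself would bypass the sentinel's methods.
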
Domain

The registry and registry_urls imported from _registry follow the expected format, with dataset names as keys and properly formatted hashes and URLs as values

If this fails: Malformed registry entries could cause pooch to attempt downloads with invalid URLs or fail hash verification with cryptic error messages that don't indicate the source of the problem

scipy/datasets/_fetchers.py:registry format
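
A registry check run at import or test time could surface malformed entries with a clear message. The sketch below assumes hashes are bare lowercase SHA-256 hex digests and URLs are HTTPS; both are assumptions for illustration, as are the function name `validate_registry` and the example file name and URL.

```python
import re
from urllib.parse import urlparse

# Assumption for this sketch: hashes are bare lowercase SHA-256 hex digests.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def validate_registry(registry, registry_urls):
    """Return a list of problems found in the two mappings (empty if none)."""
    problems = []
    for name, digest in registry.items():
        if not isinstance(name, str) or not name:
            problems.append(f"{name!r}: dataset name must be a non-empty string")
        if not isinstance(digest, str) or not _SHA256_HEX.fullmatch(digest):
            problems.append(f"{name!r}: hash is not a 64-character hex digest")
        url = registry_urls.get(name)
        if not isinstance(url, str):
            problems.append(f"{name!r}: no URL registered")
            continue
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{name!r}: URL {url!r} is not a valid https URL")
    for name in registry_urls:
        if name not in registry:
            problems.append(f"{name!r}: URL registered without a hash")
    return problems


# Hypothetical entries, used only to exercise the checks.
good = {"example.dat": "0" * 64}
urls = {"example.dat": "https://example.org/example.dat"}
print(validate_registry(good, urls))            # []
print(validate_registry({"example.dat": "xyz"}, {}))
```

Each message names the offending dataset, which addresses the complaint above that failures do not indicate where the problem comes from.
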

Frequently Asked Questions

What does scipy assume that could break in production?

The one most likely to cause trouble: the HAVE_BLAS_ILP64 preprocessor macro is defined during compilation when ILP64 (64-bit integer) BLAS libraries are used, but nothing validates this at runtime. If the macro is set incorrectly at build time (e.g., the ILP64 flag is set while linking against a 32-bit BLAS), function calls will use the wrong symbol names, causing undefined-symbol errors or silent data corruption once array indices exceed the 32-bit limit.

How many hidden assumptions does scipy have?

CodeSea found 12 assumptions that scipy relies on but never validates, 5 of them critical, spanning Scale, Environment, Contract, Ordering, Domain, Resource, and Temporal. Most are routine; the analysis flags the three most likely to cause real trouble.

What is a hidden assumption?

Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.