Hidden Assumptions in determined
12 assumptions this code never checks · 4 critical · spanning Environment, Contract, Temporal, Resource, Ordering, Domain, Scale
Every codebase relies on things it never checks, and most of them are routine. CodeSea looked at determined-ai/determined and picked out the 3 most likely to cause trouble here; they come first. The remaining 9 are minor and are listed under "Show everything".
If user code has import dependencies that are not available on the container's PYTHONPATH, or has circular imports, the trial fails with a confusing ImportError during initialization
If the master sends a malformed config (missing required keys or wrong types), det.ExperimentConfig() construction silently succeeds, but trials crash later when they access expected config values such as experiment_config['resources']['slots_per_trial']
If the master provides mismatched array lengths or out-of-order mappings, workers in distributed training bind to the wrong GPU devices, causing CUDA errors or suboptimal performance
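For the malformed-config case above, here is a minimal sketch of an up-front check, assuming the config arrives as a plain nested dict. The helper name require_path is hypothetical and is not part of Determined; only the resources / slots_per_trial path comes from the text above.

```python
from typing import Any, Mapping, Sequence


def require_path(config: Mapping[str, Any], path: Sequence[str], expected_type: type) -> Any:
    """Walk `path` through nested mappings and type-check the value at the end."""
    node: Any = config
    for depth, key in enumerate(path):
        if not isinstance(node, Mapping) or key not in node:
            missing = ".".join(path[: depth + 1])
            raise ValueError(f"experiment config is missing required key: {missing}")
        node = node[key]

    dotted = ".".join(path)
    # bool is a subclass of int in Python, so reject it when an int is expected.
    wrong_type = not isinstance(node, expected_type) or (
        expected_type is int and isinstance(node, bool)
    )
    if wrong_type:
        raise TypeError(
            f"{dotted} must be {expected_type.__name__}, got {type(node).__name__}"
        )
    return node


# Fail at startup with a named key instead of a KeyError deep inside a trial.
experiment_config = {"resources": {"slots_per_trial": 4}}
slots = require_path(experiment_config, ["resources", "slots_per_trial"], int)
```

Running a check like this once, right after the config is received, moves the failure to the point where the bad input arrived.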
Show everything (9 more)
Authentication credentials remain valid for the entire duration of a test session, and the master certificate does not change
If this fails: Long-running e2e tests fail mid-execution with authentication errors when the master rotates certificates or credentials expire, causing flaky test results
e2e_tests/tests/api_utils.py:make_session
The 'det agent list --json' command completes within the default subprocess timeout and produces valid JSON output
If this fails: If the cluster has hundreds of agents or network latency is high, the command times out and test setup fails without indicating whether the cause is performance or connectivity
e2e_tests/tests/cluster/managed_cluster.py:get_agent_data
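One way to make this failure easier to diagnose is to set an explicit timeout and report which stage failed. The sketch below uses only the standard library; run_json_command and AgentListError are hypothetical names and do not describe how get_agent_data is actually written.

```python
import json
import subprocess
from typing import Any, List


class AgentListError(RuntimeError):
    """Raised with a message that says which stage of the call failed."""


def run_json_command(cmd: List[str], timeout_s: float = 30.0) -> Any:
    """Run `cmd`, then parse its stdout as JSON, keeping the failure modes distinct."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired as e:
        raise AgentListError(f"timed out after {timeout_s}s: {' '.join(cmd)}") from e
    except subprocess.CalledProcessError as e:
        stderr = (e.stderr or "").strip()
        raise AgentListError(f"exit code {e.returncode}: {stderr}") from e

    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as e:
        raise AgentListError(f"output was not valid JSON: {e}") from e


# Intended use (requires the det CLI on PATH, so it is left commented out):
# agents = run_json_command(["det", "agent", "list", "--json"], timeout_s=60.0)
```

A timeout, a non-zero exit, and unparseable output each produce a different message, so a failed test setup says whether the cluster was slow or the output was wrong.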
When stdin is not a character device, it contains valid YAML that can be unmarshaled into map[string]interface{}
If this fails: If piped input contains malformed YAML or binary data, template processing calls log.Fatal(), which terminates the entire process instead of handling the error gracefully
master/cmd/determined-gotmpl/main.go:stdinData
The steps_completed value from TrialInfo is the exact number of training steps completed, and checkpoints are saved synchronously
If this fails: If a trial crashes between completing a training step and saving its checkpoint, resuming from latest_checkpoint repeats the last step, which can skew learning rate scheduling or data shuffling
harness/determined/_env_context.py:steps_completed initialization
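A common mitigation is to store the step count inside the checkpoint itself and write the file atomically, so the counter and the restored state cannot disagree after a crash. The sketch below illustrates that idea with a JSON file; save_checkpoint and load_checkpoint are hypothetical helpers, and this does not describe how Determined's harness stores checkpoints.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple


def save_checkpoint(path: str, steps_completed: int, state: Dict[str, Any]) -> None:
    """Write the step count and state together, then swap the file into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"steps_completed": steps_completed, "state": state}, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic on POSIX when both paths are on the same filesystem,
        # so readers see either the old checkpoint or the new one, never a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path: str) -> Tuple[int, Dict[str, Any]]:
    """Return the step count recorded in the checkpoint along with its state."""
    with open(path) as f:
        payload = json.load(f)
    return payload["steps_completed"], payload["state"]
```

On resume, the training loop would take its step counter from load_checkpoint rather than from a separately tracked value, so schedulers and data loaders restart from the step that matches the restored state.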
Go source files being parsed fit entirely in memory and do not exceed the parser's internal limits
If this fails: For very large generated code files (hundreds of thousands of lines), the AST parser runs out of memory or fails, breaking the build
master/cmd/stream-gen/main.go:parser.ParseFiles
The command-line argument slice os.Args always contains at least one element (the program name) before it is manipulated
If this fails: If the agent is invoked through unusual process spawning that provides an empty os.Args, the index access os.Args[0] panics with index out of range
agent/cmd/determined-agent/main.go:maybeInjectRootAlias
The session singleton starts successfully, and the external dependencies for performance testing are available
If this fails: If performance testing infrastructure (databases, monitoring tools) is unavailable, TestProgramWithConfig.pre() fails, and the error handling calls sys.exit() without cleanup, leaving partial test state
performance/daist/daist/framework/main.py:session.start()
GPU device IDs in the container_gpus list are valid CUDA device identifiers that exist and are accessible within the container
If this fails: If the master assigns non-existent GPU IDs or the container lacks proper device permissions, CUDA initialization fails with cryptic device errors rather than clear resource allocation messages
harness/determined/_env_context.py:container_gpus List[str]
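A pre-flight check can turn the cryptic CUDA error into a resource allocation message. The sketch below is a pure function under one loud assumption: the caller already has the list of device identifiers visible inside the container (for example, from the CUDA runtime or driver tooling). check_assigned_gpus is a hypothetical name, not a Determined API.

```python
from typing import Iterable, List


def check_assigned_gpus(container_gpus: List[str], visible_gpus: Iterable[str]) -> None:
    """Raise a descriptive error if an assigned GPU ID is not visible or is repeated."""
    visible = set(visible_gpus)
    missing = [gpu for gpu in container_gpus if gpu not in visible]
    repeated = sorted({gpu for gpu in container_gpus if container_gpus.count(gpu) > 1})

    if missing:
        raise RuntimeError(
            f"assigned GPU IDs not visible in this container: {missing}; "
            f"visible devices: {sorted(visible)}"
        )
    if repeated:
        raise RuntimeError(f"GPU IDs assigned more than once: {repeated}")


# Passes silently when every assigned ID is visible and unique.
check_assigned_gpus(["GPU-a", "GPU-b"], ["GPU-a", "GPU-b", "GPU-c"])
```

Calling this before any CUDA initialization means a bad assignment is reported in terms of the IDs the master sent.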
API sessions cached by lru_cache remain valid for the entire lifetime of the test process and never need refreshing
If this fails: If the master restarts or network connectivity is lost during test execution, cached sessions become stale but lru_cache keeps returning them, causing subsequent API calls to fail with authentication errors
e2e_tests/tests/api_utils.py:@functools.lru_cache decorators
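functools.lru_cache has no notion of age, so one alternative is a small cache that rebuilds an entry after a maximum age or on explicit invalidation. The sketch below is a generic illustration; ExpiringCache and the factory callback are hypothetical and are not taken from api_utils.py.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

T = TypeVar("T")


class ExpiringCache(Generic[T]):
    """Cache values per key, rebuilding any entry older than `max_age_s` seconds."""

    def __init__(
        self,
        factory: Callable[[Hashable], T],
        max_age_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._factory = factory
        self._max_age_s = max_age_s
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, T]] = {}

    def get(self, key: Hashable) -> T:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._max_age_s:
            return hit[1]
        # Missing or too old: build a fresh value and remember when it was made.
        value = self._factory(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, for example after an authentication error."""
        self._entries.pop(key, None)
```

A test helper could call invalidate when an API call returns an authentication error and then retry once with a fresh session, which covers a master restart without waiting for the age limit.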
See the full structural analysis of determined: the pipeline, data models, and system behavior that put these assumptions in context.
Full analysis of determined-ai/determined →
Frequently Asked Questions
What does determined assume that could break in production?
The one most likely to cause trouble: user training code is assumed to be importable from filesystem paths, with no validation of module structure or Python path setup. If this fails, a trial whose code has import dependencies missing from the container's PYTHONPATH, or has circular imports, fails with a confusing ImportError during initialization.
How many hidden assumptions does determined have?
CodeSea found 12 assumptions that determined relies on but never validates, 4 of them critical, spanning Environment, Contract, Temporal, Resource, Ordering, Domain, and Scale. Most are routine; the analysis flags the 3 most likely to actually bite.
What is a hidden assumption?
Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, and then it fails, often without a clear error.