Hidden Assumptions in gatus

11 assumptions this code never checks · 5 critical · spanning Temporal, Environment, Resource, Ordering, Contract, Scale

Every codebase relies on things it never checks, and most of them are routine. CodeSea looked at twin/gatus and picked out the 3 most likely to cause trouble here. The rest are minor; they're under "Show everything".

Worth your attention first

The shutdown sequence has no timeout. If watchdog.Shutdown() or controller.Shutdown() hangs because of a stuck goroutine or blocking I/O, the process never finishes its graceful shutdown and has to be stopped with SIGKILL, which can corrupt the storage layer
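One way to bound a shutdown is to wait for it with a deadline. The sketch below is not gatus code: shutdownWithTimeout, errShutdownTimeout, and the stand-in shutdown function are hypothetical names, and the approach assumes it is acceptable to give up waiting and exit with an error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errShutdownTimeout is a hypothetical sentinel error for this sketch.
var errShutdownTimeout = errors.New("shutdown timed out")

// shutdownWithTimeout runs shutdown in its own goroutine and waits at most
// limit for it to return. It cannot cancel a stuck shutdown; it only lets the
// caller stop waiting and decide what to do next (for example, exit non-zero).
func shutdownWithTimeout(shutdown func(), limit time.Duration) error {
	done := make(chan struct{})
	go func() {
		defer close(done)
		shutdown()
	}()
	select {
	case <-done:
		return nil
	case <-time.After(limit):
		return errShutdownTimeout
	}
}

func main() {
	// Stand-in for a shutdown sequence that blocks forever.
	stuck := func() { select {} }
	if err := shutdownWithTimeout(stuck, 100*time.Millisecond); err != nil {
		fmt.Println("forcing exit:", err)
	}
}
```

The stuck goroutine is left behind on purpose. The point is only that the process regains control and can exit on its own terms rather than waiting for SIGKILL.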

Under high endpoint counts or rapid config reloads, goroutine exhaustion could crash the process before any component can report the resource constraint
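A common way to keep goroutine counts predictable is a counting semaphore built from a buffered channel. The sketch below is an illustration under assumptions, not how gatus schedules its checks: runBounded and its parameters are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// runBounded calls check once per endpoint but never runs more than limit
// checks at the same time. It returns the highest concurrency it observed.
func runBounded(endpoints []string, limit int, check func(string)) int {
	if limit < 1 {
		limit = 1 // an unbuffered semaphore would block forever
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, ep := range endpoints {
		sem <- struct{}{} // blocks while limit checks are in flight
		wg.Add(1)
		go func(ep string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			check(ep)

			mu.Lock()
			running--
			mu.Unlock()
		}(ep)
	}
	wg.Wait()
	return peak
}

func main() {
	endpoints := make([]string, 100)
	for i := range endpoints {
		endpoints[i] = fmt.Sprintf("endpoint-%d", i)
	}
	peak := runBounded(endpoints, 4, func(string) {})
	fmt.Println("peak concurrency within limit:", peak <= 4)
}
```

Because the slot is acquired before the goroutine is started, the number of live goroutines is capped by limit, not only the number of checks running.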

If the config file becomes unreadable due to permission changes after startup, config reloads will fail silently or crash the monitoring loop, stopping all health checks
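A reload can be made to fail safe by keeping the last known good configuration whenever the file cannot be read or parsed. The sketch below assumes a hypothetical Config type and a caller-supplied parse function; it is not taken from gatus.

```go
package main

import (
	"fmt"
	"os"
)

// Config is a hypothetical stand-in for a parsed configuration.
type Config struct{ Raw string }

// reloadOrKeep tries to read and parse path. On any failure it returns the
// previous configuration together with the error, so the caller can keep
// monitoring with the last known good settings and log the problem.
func reloadOrKeep(path string, previous *Config, parse func([]byte) (*Config, error)) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return previous, fmt.Errorf("read %s: %w", path, err)
	}
	next, err := parse(data)
	if err != nil {
		return previous, fmt.Errorf("parse %s: %w", path, err)
	}
	return next, nil
}

func main() {
	last := &Config{Raw: "last known good"}
	parse := func(b []byte) (*Config, error) { return &Config{Raw: string(b)}, nil }

	// The path is assumed not to exist, which simulates an unreadable file.
	cfg, err := reloadOrKeep("/nonexistent/config.yaml", last, parse)
	fmt.Println("still using:", cfg.Raw)
	fmt.Println("reload error reported:", err != nil)
}
```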

Show everything (8 more)
Environment

Assumes GATUS_LOG_LEVEL environment variable, if present, contains a valid log level string that logr.LevelFromString() can parse

If this fails: Invalid log levels fall back to INFO silently, but malformed values could cause unexpected logging behavior or performance issues if the logging library doesn't handle edge cases properly

main.go:configureLogging
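Validating the value explicitly turns the silent fallback into a visible one. The sketch below does not use the logging library's API: parseLogLevel and the validLevels set are hypothetical, and the accepted names are an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// validLevels is a hypothetical set of accepted names for this sketch; it is
// not taken from the logging library gatus uses.
var validLevels = map[string]bool{
	"DEBUG": true, "INFO": true, "WARN": true, "ERROR": true, "FATAL": true,
}

// parseLogLevel normalizes raw and reports whether it was recognized.
// An empty value is treated as "not set" and yields the default without error.
func parseLogLevel(raw, def string) (string, error) {
	name := strings.ToUpper(strings.TrimSpace(raw))
	if name == "" {
		return def, nil
	}
	if !validLevels[name] {
		return def, fmt.Errorf("unrecognized log level %q, using %s", raw, def)
	}
	return name, nil
}

func main() {
	for _, raw := range []string{"debug", "", "verbose"} {
		level, err := parseLogLevel(raw, "INFO")
		fmt.Println(level, err)
	}
}
```

The fallback to the default is kept, but the caller now gets an error it can log at startup.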
Ordering

Assumes watchdog.Shutdown() will complete before controller.Shutdown() and that metrics.UnregisterPrometheusMetrics() can safely execute while HTTP handlers might still be processing requests

If this fails: Race conditions during shutdown could cause HTTP handlers to access unregistered metrics or attempt to send alerts through a shutdown watchdog, leading to panics or incomplete shutdown

main.go:stop
Temporal

Assumes the GATUS_DELAY_START_SECONDS environment variable, if present, contains a valid integer and that delaying startup by that duration won't exceed container orchestration timeouts

If this fails: For a non-integer value, strconv.Atoi returns 0 together with an error; if that error is ignored, the delay is silently disabled. A very large delay could cause Kubernetes or Docker to kill the container before Gatus finishes starting

main.go:main
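Checking the error from strconv.Atoi and capping the result covers both failure modes. The sketch below is an assumption about how such a check could look: parseStartDelay and the 60-second cap are hypothetical, not part of gatus.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseStartDelay converts raw (a number of seconds) into a duration.
// An empty value means no delay. Non-integers, negatives, and values above
// max are reported as errors instead of silently becoming zero.
func parseStartDelay(raw string, max time.Duration) (time.Duration, error) {
	if raw == "" {
		return 0, nil
	}
	seconds, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("start delay %q is not an integer: %w", raw, err)
	}
	if seconds < 0 {
		return 0, fmt.Errorf("start delay %d is negative", seconds)
	}
	// Compare in seconds before multiplying so huge inputs cannot overflow.
	if int64(seconds) > int64(max/time.Second) {
		return 0, fmt.Errorf("start delay %ds exceeds limit %s", seconds, max)
	}
	return time.Duration(seconds) * time.Second, nil
}

func main() {
	// The cap is a hypothetical value chosen for the example.
	const limit = 60 * time.Second
	for _, raw := range []string{"5", "abc", "86400"} {
		delay, err := parseStartDelay(raw, limit)
		fmt.Println(delay, err)
	}
}
```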
Contract

Assumes store.Get() returns a non-nil storage provider that implements the Save() method, and that Save() is safe to call during shutdown when other goroutines might still be writing

If this fails: If the storage provider is nil or Save() isn't thread-safe, shutdown could panic or corrupt the database, losing historical monitoring data

main.go:save
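Both halves of this assumption can be checked cheaply: a nil guard before the call, and a lock shared by writers and Save(). The sketch below uses a hypothetical Saver interface and guardedStore type; it does not describe gatus's storage providers.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Saver is a hypothetical one-method interface for this sketch.
type Saver interface {
	Save() error
}

// guardedStore is a hypothetical in-memory store. One mutex protects both
// Write and Save, so a save during shutdown never sees a half-applied write.
type guardedStore struct {
	mu      sync.Mutex
	pending []string
	saved   int
}

func (s *guardedStore) Write(result string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending = append(s.pending, result)
}

func (s *guardedStore) Save() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.saved += len(s.pending)
	s.pending = nil
	return nil
}

// saveOnShutdown returns an error for a nil provider instead of panicking.
// It detects a nil interface value only, not a nil pointer wrapped in a
// non-nil interface.
func saveOnShutdown(p Saver) error {
	if p == nil {
		return errors.New("no storage provider to save")
	}
	return p.Save()
}

func main() {
	store := &guardedStore{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			store.Write(fmt.Sprintf("result-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(saveOnShutdown(store), store.saved)
	fmt.Println(saveOnShutdown(nil))
}
```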
Resource

Assumes storage initialization will succeed and that the configured storage backend (SQLite/PostgreSQL) is available with sufficient disk space and proper permissions

If this fails: Storage failures cause a panic during startup, and there is no fallback to in-memory storage or graceful degradation, so the entire monitoring system becomes unavailable

main.go:initializeStorage
Contract

Assumes the config file watcher can distinguish between complete config writes and partial/temporary files created during atomic file operations

If this fails: If the config is updated via non-atomic operations (direct writes instead of write-then-rename), the watcher might trigger on incomplete YAML, causing parsing errors and stopping all monitoring

main.go:listenToConfigurationFileChanges
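On the writer's side, the write-then-rename pattern mentioned above can be sketched as follows. writeFileAtomic and demo are hypothetical helpers, and the atomicity claim assumes a POSIX-style filesystem where the temporary file and the target live in the same directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file next to path, then renames
// it over path, so a reader sees either the old file or the complete new one.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-config-*")
	if err != nil {
		return err
	}
	// Cleans up on failure; fails harmlessly once the rename has succeeded.
	defer os.Remove(tmp.Name())

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), perm); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo writes two versions of a file in a scratch directory and returns the
// content that ends up on disk.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "config.yaml")
	for _, body := range []string{"version: 1\n", "version: 2\n"} {
		if err := writeFileAtomic(path, []byte(body), 0o644); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	content, err := demo()
	fmt.Printf("%q %v\n", content, err)
}
```

A watcher on the directory may still see events for the temporary file, so the reader side would likely need to filter by file name as well.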
Scale

Assumes the 20+ imported alert providers can all be initialized simultaneously without hitting system limits on network connections, file descriptors, or memory

If this fails: With many endpoints using different alert providers, concurrent initialization during config reload could exhaust connection pools or hit API rate limits, causing alerts to fail silently

alerting/config.go:imports
Environment

Assumes the signal channel buffer size of 1 is sufficient and that only one termination signal will be received before the shutdown completes

If this fails: If multiple rapid signals are sent (common in container restart scenarios), subsequent signals are dropped, potentially leaving the impression that the process is unresponsive to termination

main.go:signalChannel
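One option is to treat a second signal as a request to stop waiting. The sketch below is a simplified illustration with hypothetical names (waitForShutdown, simulate). It feeds the channel by hand so the behavior is deterministic; in a real program the channel would be registered with signal.Notify, which drops signals when the channel's buffer is full.

```go
package main

import (
	"fmt"
	"os"
)

// waitForShutdown blocks until a first signal arrives, starts graceful in the
// background, and returns "graceful" if it finishes or "forced" if a second
// signal arrives before it does.
func waitForShutdown(signals <-chan os.Signal, graceful func()) string {
	<-signals
	done := make(chan struct{})
	go func() {
		defer close(done)
		graceful()
	}()
	select {
	case <-done:
		return "graceful"
	case <-signals:
		return "forced"
	}
}

// simulate queues count signals (count must be at least 1) and runs
// waitForShutdown against them. It exists only to make the sketch testable.
func simulate(count int, graceful func()) string {
	signals := make(chan os.Signal, count)
	for i := 0; i < count; i++ {
		signals <- os.Interrupt
	}
	return waitForShutdown(signals, graceful)
}

func main() {
	// Two signals while shutdown is stuck: the second one forces the exit.
	fmt.Println(simulate(2, func() { select {} }))
	// One signal and a shutdown that returns: the normal path.
	fmt.Println(simulate(1, func() {}))
}
```

For the second signal to be seen at all, the channel needs a buffer larger than 1, or the receiver must already be waiting on it when the signal arrives.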

See the full structural analysis of gatus: the pipeline, data models, and system behavior that put these assumptions in context.

Frequently Asked Questions

What does gatus assume that could break in production?

The one most likely to cause trouble: the system assumes termination signals (SIGTERM, SIGINT) will be handled gracefully within a reasonable time, but the shutdown sequence has no timeout. If watchdog.Shutdown() or controller.Shutdown() hangs because of a stuck goroutine or blocking I/O, the process never finishes its graceful shutdown and has to be stopped with SIGKILL, which can corrupt the storage layer.

How many hidden assumptions does gatus have?

CodeSea found 11 assumptions gatus relies on but never validates, 5 of them critical, spanning Temporal, Environment, Resource, Ordering, Contract, Scale. Most are routine; the analysis flags the 3 most likely to actually bite.

What is a hidden assumption?

Something the code depends on but never checks: a data shape, an ordering, an environment condition, a scale limit, or a contract with another service. It holds until the world it runs in changes, then fails silently.