twin/gatus
Automated developer-oriented status page with alerting and incident support
Monitors service health via HTTP/TCP/DNS checks and alerts on failures
Configuration is loaded from YAML defining endpoints and alert providers. The watchdog starts monitoring goroutines for each endpoint, which execute protocol-specific health checks at intervals, evaluate success conditions, store results in the database, and trigger alerts when thresholds are exceeded. The controller serves a web dashboard displaying current status and handles API requests for historical data.
Under the hood, the system uses 3 feedback loops, 3 data pools, and 5 control points to manage its runtime behavior.
An 8-component dashboard spanning 235 analyzed files, with data flowing through 8 distinct pipeline stages.
How Data Flows Through the System
- Load configuration from YAML — Config.Load() parses the main config file, validates endpoint definitions (required fields like name and URL), validates alert provider credentials, and applies defaults like 30-second intervals (config: endpoints)
- Start endpoint monitoring goroutines — Monitor() iterates through all configured endpoints and launches monitorEndpoint() in separate goroutines, each running its own check-sleep loop with the configured interval [EndpointConfig]
- Execute protocol-specific health checks — executeRequest() performs HTTP GET/POST, TCP connection, ICMP ping, or DNS lookup based on endpoint URL scheme, measuring response time and capturing response data [EndpointConfig → Result]
- Evaluate success conditions — evaluateConditions() parses expressions like '[STATUS] == 200' or '[BODY] contains "healthy"' and applies them to the result data, setting Success: true/false on the Result [Result → Result]
- Store health check results — Store.Insert() persists the Result to the configured backend (SQLite by default), maintaining a rolling history for uptime calculations and dashboard charts [Result]
- Evaluate alert conditions — checkAndHandleAlert() compares consecutive failures against FailureThreshold and successes against SuccessThreshold, tracking alert state per endpoint to avoid duplicate notifications [AlertConfig]
- Send alert notifications — AlertProvider.Send() dispatches notifications through configured providers - formatting messages with endpoint details and sending via webhooks, SMTP, or APIs [ProviderConfig]
- Serve web dashboard and API — Handle() serves the Vue.js status page, REST API endpoints for historical data, and WebSocket connections that push real-time status updates to connected browsers
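A minimal configuration sketch tying these stages together — the field names follow Gatus's documented YAML format, but the endpoint name, URL, and webhook value are placeholders:

```yaml
endpoints:
  - name: website
    url: "https://example.org"
    interval: 30s                 # default when omitted
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"
    alerts:
      - type: discord
        failure-threshold: 3
        success-threshold: 2
        send-on-resolved: true

alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/<id>/<token>"
```

Each endpoint entry feeds one monitoring goroutine; the `conditions` strings are the expressions evaluated in stage four, and the `alerts` block supplies the thresholds checked in stage six.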
Data Models
The data structures that flow between stages — the contracts that hold the system together.
config/endpoint/endpoint.go — struct with Name: string, URL: string, Method: string, Conditions: []Condition, Interval: time.Duration, Alerts: []Alert, Headers: map[string]string
Loaded from YAML config at startup, validated for required fields and protocol support, then passed to watchdog for continuous monitoring
config/endpoint/result.go — struct with Success: bool, HTTPStatus: int, Duration: time.Duration, Body: []byte, Timestamp: time.Time, Errors: []string, ConditionResults: []ConditionResult
Created after each endpoint check with protocol-specific data, stored in database, and used to determine alert states and dashboard display
alerting/alert/alert.go — struct with Type: Type, Enabled: *bool, FailureThreshold: int, SuccessThreshold: int, Description: *string, SendOnResolved: *bool, MinimumReminderInterval: time.Duration
Defined per endpoint in config, evaluated against result history to determine when to trigger or resolve alerts
alerting/provider/*/ — provider-specific struct (e.g., Discord: {WebhookURL: string, Title: string}, Email: {From: string, To: string, Host: string, Port: int})
Loaded from YAML with provider-specific fields, validated for required credentials, then used to dispatch alert messages
config/endpoint/condition.go — struct with Expression: string (e.g., '[STATUS] == 200', '[RESPONSE_TIME] < 1000ms', '[BODY] contains "OK"')
Parsed from endpoint config strings, compiled into evaluatable expressions, then applied to each health check result
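The Condition model can be illustrated with a deliberately simplified evaluator. This sketch handles only the `[STATUS] == N` form and is not Gatus's actual implementation, which supports many more placeholders and operators:

```go
// Illustrative sketch only — not Gatus's evaluator. Supports just "[STATUS] == N".
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result is a trimmed-down stand-in for the struct in config/endpoint/result.go.
type Result struct {
	HTTPStatus int
	Success    bool
}

// evaluateStatusCondition applies a single "[STATUS] == N" expression to a result.
func evaluateStatusCondition(expr string, r *Result) (bool, error) {
	parts := strings.Fields(expr) // e.g. ["[STATUS]", "==", "200"]
	if len(parts) != 3 || parts[0] != "[STATUS]" || parts[1] != "==" {
		return false, fmt.Errorf("unsupported expression: %q", expr)
	}
	want, err := strconv.Atoi(parts[2])
	if err != nil {
		return false, err
	}
	return r.HTTPStatus == want, nil
}

func main() {
	r := &Result{HTTPStatus: 200}
	ok, err := evaluateStatusCondition("[STATUS] == 200", r)
	if err != nil {
		panic(err)
	}
	r.Success = ok
	fmt.Println(r.Success) // prints "true"
}
```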
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
System assumes termination signals (SIGTERM, SIGINT) will be handled gracefully within reasonable time bounds, but has no timeout for the shutdown sequence
If this fails: If watchdog.Shutdown() or controller.Shutdown() hang due to stuck goroutines or blocking I/O, the process becomes unkillable and requires SIGKILL, potentially corrupting the storage layer
main.go:main
Assumes GATUS_LOG_LEVEL environment variable, if present, contains a valid log level string that logr.LevelFromString() can parse
If this fails: Invalid log levels fall back to INFO silently, but malformed values could cause unexpected logging behavior or performance issues if the logging library doesn't handle edge cases properly
main.go:configureLogging
Assumes goroutines can be created without limit when starting controller.Handle(), metrics initialization, and watchdog.Monitor() concurrently, with no coordination or resource caps
If this fails: Under high endpoint counts or rapid config reloads, goroutine exhaustion could crash the process before any component can report the resource constraint
main.go:start
Assumes watchdog.Shutdown() will complete before controller.Shutdown() and that metrics.UnregisterPrometheusMetrics() can safely execute while HTTP handlers might still be processing requests
If this fails: Race conditions during shutdown could cause HTTP handlers to access unregistered metrics or attempt to send alerts through a shutdown watchdog, leading to panics or incomplete shutdown
main.go:stop
Assumes GATUS_CONFIG_PATH or deprecated GATUS_CONFIG_FILE environment variables point to readable YAML files with correct filesystem permissions
If this fails: If config file becomes unreadable due to permission changes after startup, config reloads will fail silently or crash the monitoring loop, stopping all health checks
main.go:loadConfiguration
Assumes the GATUS_DELAY_START_SECONDS environment variable, if present, contains a valid integer and that delaying startup by that duration won't exceed container orchestration timeouts
If this fails: For non-integer values, strconv.Atoi returns 0 together with an error; if the error is discarded, the delay feature is silently disabled. Very large delays could cause Kubernetes/Docker to kill the container before Gatus finishes starting
main.go:main
Assumes store.Get() returns a non-nil storage provider that implements the Save() method, and that Save() is safe to call during shutdown when other goroutines might still be writing
If this fails: If storage provider is nil or Save() isn't thread-safe, shutdown could panic or corrupt the database, losing historical monitoring data
main.go:save
Assumes storage initialization will succeed and that the configured storage backend (SQLite/PostgreSQL) is available with sufficient disk space and proper permissions
If this fails: Storage failures cause panic during startup, but there's no fallback to in-memory storage or graceful degradation - the entire monitoring system becomes unavailable
main.go:initializeStorage
Assumes config file watcher can distinguish between complete config writes and partial/temporary files created during atomic file operations
If this fails: If config is updated via non-atomic operations (direct writes instead of write-then-rename), the watcher might trigger on incomplete YAML, causing parsing errors and stopping all monitoring
main.go:listenToConfigurationFileChanges
Assumes the 20+ imported alert providers can all be initialized simultaneously without hitting system limits on network connections, file descriptors, or memory
If this fails: With many endpoints using different alert providers, concurrent initialization during config reload could exhaust connection pools or hit API rate limits, causing alerts to fail silently
alerting/config.go:imports
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Results Database — Stores historical health check results with timestamps, enabling uptime calculations and trend analysis
- Endpoint Registry — Active endpoint configurations loaded from YAML, used by monitoring goroutines
- Per-endpoint alert state — Tracks current alert status per endpoint to prevent duplicate notifications
Feedback Loops
- Health Check Loop (polling, balancing) — Trigger: Interval timer (default 30s per endpoint). Action: Execute health check, evaluate conditions, store result, check alert thresholds. Exit: Endpoint removed from config or system shutdown.
- Alert Resolution Loop (self-correction, balancing) — Trigger: Endpoint starts passing after failures. Action: Count successive successes against SuccessThreshold. Exit: Alert marked as resolved and notification sent.
- Config Reload Loop (polling, balancing) — Trigger: File system watcher detects YAML changes. Action: Reload config, restart affected endpoint goroutines, preserve existing alert states. Exit: No further config changes.
Delays
- Health Check Intervals (scheduled-job, configurable per endpoint, ~30s default) — Time between detecting an issue and generating an alert depends on check frequency
- Alert Threshold Delays (batch-window, ~FailureThreshold × check interval) — Prevents spurious alerts by requiring consecutive failures before triggering
- Minimum Reminder Interval (rate-limit, configurable, ~5m minimum) — Prevents alert spam by limiting notification frequency for ongoing incidents
Control Points
- Check Intervals (hyperparameter) — Controls: How frequently each endpoint is monitored. Default: 30s
- Alert Thresholds (threshold) — Controls: Number of consecutive failures/successes required to trigger/resolve alerts. Default: 3 failures, 2 successes
- Storage Backend (architecture-switch) — Controls: Whether to use SQLite, PostgreSQL, or in-memory storage. Default: SQLite
- Log Level (env-var) — Controls: Verbosity of application logging. Set via: GATUS_LOG_LEVEL (falls back to INFO when unset or invalid)
- Provider Credentials (env-var) — Controls: Authentication for alert providers such as Slack tokens and SMTP passwords. Set via: provider-specific environment variables
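The storage architecture-switch corresponds to a small YAML block; a sketch using Gatus's documented storage settings (the path is illustrative):

```yaml
storage:
  type: sqlite          # sqlite, postgres, or memory
  path: /data/gatus.db  # connection string when type is postgres
```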
Technology Stack
- Fiber — HTTP framework serving the web dashboard, REST API, and WebSocket endpoints
- SQLite/PostgreSQL — Persists health check results and historical data for uptime calculations
- Gorilla WebSocket — Enables real-time status updates to connected web browsers
- Various HTTP clients — Protocol-specific clients for HTTP, ICMP ping, DNS queries, and TCP connections
- YAML — Configuration format for defining endpoints, conditions, and alert providers
- Prometheus — Exports metrics for external monitoring and integration with Prometheus/Grafana
- Vue.js — Frontend framework for the status dashboard with real-time updates
Key Components
- Monitor (orchestrator) — watchdog/watchdog.go — Coordinates all endpoint monitoring by starting goroutines for each endpoint and managing their lifecycle during config reloads
- monitorEndpoint (executor) — watchdog/watchdog.go — Runs the monitoring loop for a single endpoint: executes checks at intervals, evaluates conditions, stores results, and triggers alerts
- Handle (gateway) — controller/controller.go — HTTP router that serves the web dashboard, REST API endpoints, and WebSocket connections for real-time updates, and handles authentication
- AlertProvider (adapter) — alerting/provider/*/ — Protocol-specific implementations for sending alerts (Discord webhooks, SMTP email, Slack API, etc.) with provider-specific formatting and authentication
- Store (store) — storage/store/store.go — Abstraction layer over storage backends (SQLite, PostgreSQL, in-memory) for persisting endpoint results, metrics, and pagination
- Config (loader) — config/config.go — Loads and validates the main YAML configuration file, merging defaults and validating endpoint definitions and alert provider configs
- evaluateConditions (validator) — config/endpoint/condition.go — Parses condition expressions and evaluates them against endpoint results using placeholders like [STATUS], [BODY], [RESPONSE_TIME]
- executeRequest (executor) — client/client.go — Performs the actual HTTP/TCP/ICMP/DNS requests with timeout handling, DNS resolution, and protocol-specific connection logic
Frequently Asked Questions
What is gatus used for?
twin/gatus monitors service health via HTTP/TCP/DNS checks and alerts on failures. It is an 8-component dashboard written in Go; data flows through 8 distinct pipeline stages, and the codebase contains 235 files.
How is gatus architected?
gatus is organized into 6 architecture layers: Main Orchestrator, Monitoring Engine, Web Interface, Alerting System, and 2 more. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through gatus?
Data moves through 8 stages: Load configuration from YAML → Start endpoint monitoring goroutines → Execute protocol-specific health checks → Evaluate success conditions → Store health check results → .... In short, the watchdog runs a monitoring goroutine per endpoint that checks at intervals, evaluates conditions, stores results, and alerts when thresholds are exceeded, while the controller serves the status dashboard and the API for historical data. This pipeline design reflects a complex multi-stage processing system.
What technologies does gatus use?
The core stack includes Fiber (HTTP framework serving the web dashboard, REST API, and WebSocket endpoints), SQLite/PostgreSQL (persists health check results and historical data for uptime calculations), Gorilla WebSocket (enables real-time status updates to connected web browsers), various HTTP clients (protocol-specific clients for HTTP, ICMP ping, DNS queries, and TCP connections), YAML (configuration format for defining endpoints, conditions, and alert providers), Prometheus (exports metrics for external monitoring and integration with Prometheus/Grafana), and Vue.js (frontend framework for the status dashboard). A focused set of dependencies that keeps the build manageable.
What system dynamics does gatus have?
gatus exhibits 3 data pools (Results Database, Endpoint Registry), 3 feedback loops, 5 control points, and 3 delays. The feedback loops handle polling and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does gatus use?
5 design patterns detected: Protocol Abstraction, Provider Plugin System, Configuration Override Hierarchy, Goroutine-per-Endpoint, Graceful Degradation.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.