twin/gatus

Automated developer-oriented status page with alerting and incident support

10,712 stars · Go · 8 components

Monitors service health via HTTP/TCP/DNS checks and alerts on failures

Configuration is loaded from YAML defining endpoints and alert providers. The watchdog starts monitoring goroutines for each endpoint, which execute protocol-specific health checks at intervals, evaluate success conditions, store results in the database, and trigger alerts when thresholds are exceeded. The controller serves a web dashboard displaying current status and handles API requests for historical data.

Under the hood, the system uses 3 feedback loops, 3 data pools, and 5 control points to manage its runtime behavior.

An 8-component dashboard. 235 files analyzed. Data flows through 8 distinct pipeline stages.

How Data Flows Through the System

  1. Load configuration from YAML — Config.Load() parses the main config file, validates endpoint definitions (required fields like name and URL), validates alert provider credentials, and applies defaults like 30-second intervals (config: endpoints)
  2. Start endpoint monitoring goroutines — Monitor() iterates through all configured endpoints and launches monitorEndpoint() in separate goroutines, each running its own check-sleep loop with the configured interval [EndpointConfig]
  3. Execute protocol-specific health checks — executeRequest() performs HTTP GET/POST, TCP connection, ICMP ping, or DNS lookup based on endpoint URL scheme, measuring response time and capturing response data [EndpointConfig → Result]
  4. Evaluate success conditions — evaluateConditions() parses expressions like '[STATUS] == 200' or '[BODY] contains "healthy"' and applies them to the result data, setting Success: true/false on the Result [Result → Result]
  5. Store health check results — Store.Insert() persists the Result to the configured backend (SQLite by default), maintaining a rolling history for uptime calculations and dashboard charts [Result]
  6. Evaluate alert conditions — checkAndHandleAlert() compares consecutive failures against FailureThreshold and successes against SuccessThreshold, tracking alert state per endpoint to avoid duplicate notifications [AlertConfig]
  5. Send alert notifications — AlertProvider.Send() dispatches notifications through configured providers, formatting messages with endpoint details and sending them via webhooks, SMTP, or APIs [ProviderConfig]
  8. Serve web dashboard and API — Handle() serves the Vue.js status page, REST API endpoints for historical data, and WebSocket connections that push real-time status updates to connected browsers

Data Models

The data structures that flow between stages — the contracts that hold the system together.

EndpointConfig config/endpoint/endpoint.go
struct with Name: string, URL: string, Method: string, Conditions: []Condition, Interval: time.Duration, Alerts: []Alert, Headers: map[string]string
Loaded from YAML config at startup, validated for required fields and protocol support, then passed to watchdog for continuous monitoring
Result config/endpoint/result.go
struct with Success: bool, HTTPStatus: int, Duration: time.Duration, Body: []byte, Timestamp: time.Time, Errors: []string, ConditionResults: []ConditionResult
Created after each endpoint check with protocol-specific data, stored in database, and used to determine alert states and dashboard display
AlertConfig alerting/alert/alert.go
struct with Type: Type, Enabled: *bool, FailureThreshold: int, SuccessThreshold: int, Description: *string, SendOnResolved: *bool, MinimumReminderInterval: time.Duration
Defined per endpoint in config, evaluated against result history to determine when to trigger or resolve alerts
ProviderConfig alerting/provider/*/
provider-specific struct (e.g., Discord: {WebhookURL: string, Title: string}, Email: {From: string, To: string, Host: string, Port: int})
Loaded from YAML with provider-specific fields, validated for required credentials, then used to dispatch alert messages
Condition config/endpoint/condition.go
struct with Expression: string (e.g., '[STATUS] == 200', '[RESPONSE_TIME] < 1000ms', '[BODY] contains "OK"')
Parsed from endpoint config strings, compiled into evaluatable expressions, then applied to each health check result
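The condition expressions above can be approximated with a toy evaluator. The sketch handles only two of the documented forms ([STATUS] equality and [BODY] contains); the real parser in config/endpoint/condition.go supports many more placeholders and operators.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result carries the two fields this toy evaluator can reference,
// a simplified subset of the Result model described above.
type Result struct {
	HTTPStatus int
	Body       []byte
}

// evaluateCondition handles two illustrative forms:
//   "[STATUS] == <int>"          — numeric equality on the HTTP status
//   "[BODY] contains \"<text>\"" — substring match on the response body
func evaluateCondition(expr string, r Result) bool {
	switch {
	case strings.HasPrefix(expr, "[STATUS] == "):
		want, err := strconv.Atoi(strings.TrimPrefix(expr, "[STATUS] == "))
		return err == nil && r.HTTPStatus == want
	case strings.HasPrefix(expr, "[BODY] contains "):
		needle := strings.Trim(strings.TrimPrefix(expr, "[BODY] contains "), `"`)
		return strings.Contains(string(r.Body), needle)
	}
	return false // unknown expression forms fail closed
}

func main() {
	r := Result{HTTPStatus: 200, Body: []byte(`{"status":"healthy"}`)}
	fmt.Println(evaluateCondition("[STATUS] == 200", r))           // true
	fmt.Println(evaluateCondition(`[BODY] contains "healthy"`, r)) // true
	fmt.Println(evaluateCondition("[STATUS] == 500", r))           // false
}
```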

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Temporal unguarded

System assumes termination signals (SIGTERM, SIGINT) will be handled gracefully within reasonable time bounds, but has no timeout for the shutdown sequence

If this fails: If watchdog.Shutdown() or controller.Shutdown() hang due to stuck goroutines or blocking I/O, the process becomes unkillable and requires SIGKILL, potentially corrupting the storage layer

main.go:main
warning Environment weakly guarded

Assumes GATUS_LOG_LEVEL environment variable, if present, contains a valid log level string that logr.LevelFromString() can parse

If this fails: Invalid log levels fall back to INFO silently, but malformed values could cause unexpected logging behavior or performance issues if the logging library doesn't handle edge cases properly

main.go:configureLogging
critical Resource unguarded

Assumes unlimited goroutine creation when starting controller.Handle(), metrics initialization, and watchdog.Monitor() concurrently without any coordination or resource limits

If this fails: Under high endpoint counts or rapid config reloads, goroutine exhaustion could crash the process before any component can report the resource constraint

main.go:start
warning Ordering unguarded

Assumes watchdog.Shutdown() will complete before controller.Shutdown() and that metrics.UnregisterPrometheusMetrics() can safely execute while HTTP handlers might still be processing requests

If this fails: Race conditions during shutdown could cause HTTP handlers to access unregistered metrics or attempt to send alerts through a shutdown watchdog, leading to panics or incomplete shutdown

main.go:stop
critical Environment weakly guarded

Assumes GATUS_CONFIG_PATH or deprecated GATUS_CONFIG_FILE environment variables point to readable YAML files with correct filesystem permissions

If this fails: If config file becomes unreadable due to permission changes after startup, config reloads will fail silently or crash the monitoring loop, stopping all health checks

main.go:loadConfiguration
warning Temporal weakly guarded

Assumes the GATUS_DELAY_START_SECONDS environment variable, if present, contains a valid integer and that delaying startup by that duration won't exceed container orchestration timeouts

If this fails: Non-integer values make strconv.Atoi return 0 (with an error that is ignored), silently disabling the delay feature. Very large delays could cause Kubernetes/Docker to kill the container before Gatus finishes starting

main.go:main
critical Contract weakly guarded

Assumes store.Get() returns a non-nil storage provider that implements the Save() method, and that Save() is safe to call during shutdown when other goroutines might still be writing

If this fails: If storage provider is nil or Save() isn't thread-safe, shutdown could panic or corrupt the database, losing historical monitoring data

main.go:save
critical Resource weakly guarded

Assumes storage initialization will succeed and that the configured storage backend (SQLite/PostgreSQL) is available with sufficient disk space and proper permissions

If this fails: Storage failures cause a panic during startup, with no fallback to in-memory storage or graceful degradation: the entire monitoring system becomes unavailable

main.go:initializeStorage
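One way to get the graceful degradation this note says is missing: fall back to an in-memory store when the configured backend can't be opened, trading history for availability. Everything below (Store, openSQLite, initializeStorage) is a hypothetical sketch, not the repository's storage package.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a minimal stand-in for the storage interface.
type Store interface{ Name() string }

type sqliteStore struct{ path string }

func (s sqliteStore) Name() string { return "sqlite" }

type memoryStore struct{}

func (memoryStore) Name() string { return "memory" }

// openSQLite pretends to open the configured backend; a real
// implementation would open the database file and can fail.
func openSQLite(path string) (Store, error) {
	if path == "" {
		return nil, errors.New("no path configured")
	}
	return sqliteStore{path: path}, nil
}

// initializeStorage degrades to an in-memory store (losing history
// but keeping health checks alive) instead of panicking.
func initializeStorage(path string) Store {
	s, err := openSQLite(path)
	if err != nil {
		fmt.Println("storage unavailable, falling back to memory:", err)
		return memoryStore{}
	}
	return s
}

func main() {
	fmt.Println(initializeStorage("/var/lib/gatus/data.db").Name()) // sqlite
	fmt.Println(initializeStorage("").Name())                       // memory fallback
}
```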
warning Contract unguarded

Assumes config file watcher can distinguish between complete config writes and partial/temporary files created during atomic file operations

If this fails: If config is updated via non-atomic operations (direct writes instead of write-then-rename), the watcher might trigger on incomplete YAML, causing parsing errors and stopping all monitoring

main.go:listenToConfigurationFileChanges
warning Scale unguarded

Assumes the 20+ imported alert providers can all be initialized simultaneously without hitting system limits on network connections, file descriptors, or memory

If this fails: With many endpoints using different alert providers, concurrent initialization during config reload could exhaust connection pools or hit API rate limits, causing alerts to fail silently

alerting/config.go:imports

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Results Database (database)
Stores historical health check results with timestamps, enabling uptime calculations and trend analysis
Endpoint Registry (in-memory)
Active endpoint configurations loaded from YAML, used by monitoring goroutines
Alert State Cache (in-memory)
Tracks current alert status per endpoint to prevent duplicate notifications
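The per-endpoint alert state described above can be modeled as a small counter machine. alertState and observe are illustrative names; the actual logic lives in checkAndHandleAlert and the AlertConfig thresholds.

```go
package main

import "fmt"

// alertState tracks consecutive failures and successes for one
// endpoint, mirroring the FailureThreshold/SuccessThreshold logic.
type alertState struct {
	failureThreshold int
	successThreshold int
	failures         int
	successes        int
	triggered        bool
}

// observe records one check result and reports whether a notification
// should be sent: "trigger" on crossing the failure threshold,
// "resolve" on crossing the success threshold, "" otherwise. Keeping
// the triggered flag prevents duplicate notifications.
func (s *alertState) observe(success bool) string {
	if success {
		s.successes++
		s.failures = 0
		if s.triggered && s.successes >= s.successThreshold {
			s.triggered = false
			return "resolve"
		}
		return ""
	}
	s.failures++
	s.successes = 0
	if !s.triggered && s.failures >= s.failureThreshold {
		s.triggered = true
		return "trigger"
	}
	return ""
}

func main() {
	s := &alertState{failureThreshold: 3, successThreshold: 2}
	for _, ok := range []bool{false, false, false, false, true, true} {
		if action := s.observe(ok); action != "" {
			fmt.Println(action) // "trigger" on the 3rd failure, "resolve" on the 2nd success
		}
	}
}
```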

Technology Stack

Fiber (framework)
HTTP framework serving the web dashboard, REST API, and WebSocket endpoints
SQLite/PostgreSQL (database)
Persists health check results and historical data for uptime calculations
Gorilla WebSocket (library)
Enables real-time status updates to connected web browsers
Various HTTP clients (library)
Protocol-specific clients for HTTP, ICMP ping, DNS queries, and TCP connections
YAML (serialization)
Configuration format for defining endpoints, conditions, and alert providers
Prometheus (library)
Exports metrics for external monitoring and integration with Prometheus/Grafana
Vue.js (framework)
Frontend framework for the status dashboard with real-time updates

Frequently Asked Questions

What is gatus used for?

Gatus monitors service health via HTTP/TCP/DNS checks and alerts on failures. twin/gatus is an 8-component dashboard written in Go. Data flows through 8 distinct pipeline stages. The codebase contains 235 files.

How is gatus architected?

gatus is organized into 6 architecture layers: Main Orchestrator, Monitoring Engine, Web Interface, Alerting System, and 2 more. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through gatus?

Data moves through 8 stages: Load configuration from YAML → Start endpoint monitoring goroutines → Execute protocol-specific health checks → Evaluate success conditions → Store health check results → .... Configuration is loaded from YAML defining endpoints and alert providers. The watchdog starts monitoring goroutines for each endpoint, which execute protocol-specific health checks at intervals, evaluate success conditions, store results in the database, and trigger alerts when thresholds are exceeded. The controller serves a web dashboard displaying current status and handles API requests for historical data. This pipeline design reflects a complex multi-stage processing system.

What technologies does gatus use?

The core stack includes Fiber (HTTP framework serving the web dashboard, REST API, and WebSocket endpoints), SQLite/PostgreSQL (Persists health check results and historical data for uptime calculations), Gorilla WebSocket (Enables real-time status updates to connected web browsers), Various HTTP clients (Protocol-specific clients for HTTP, ICMP ping, DNS queries, and TCP connections), YAML (Configuration format for defining endpoints, conditions, and alert providers), Prometheus (Exports metrics for external monitoring and integration with Prometheus/Grafana), and 1 more. A focused set of dependencies that keeps the build manageable.

What system dynamics does gatus have?

gatus exhibits 3 data pools (Results Database, Endpoint Registry, Alert State Cache), 3 feedback loops, 5 control points, and 3 delays. The feedback loops handle polling and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does gatus use?

5 design patterns detected: Protocol Abstraction, Provider Plugin System, Configuration Override Hierarchy, Goroutine-per-Endpoint, Graceful Degradation.

Analyzed on April 20, 2026 by CodeSea.