twin/gatus
Automated developer-oriented status page with alerting and incident support
Monitors service health via HTTP/TCP/DNS checks and alerts on failures
Configuration is loaded from YAML defining endpoints and alert providers. The watchdog starts monitoring goroutines for each endpoint, which execute protocol-specific health checks at intervals, evaluate success conditions, store results in the database, and trigger alerts when thresholds are exceeded. The controller serves a web dashboard displaying current status and handles API requests for historical data.
Under the hood, the system uses 3 feedback loops, 3 data pools, and 5 control points to manage its runtime behavior.
An 8-component dashboard spanning 235 analyzed files, with data flowing through 8 distinct pipeline stages.
How Data Flows Through the System
- Load configuration from YAML — Config.Load() parses the main config file, validates endpoint definitions (required fields like name and URL), validates alert provider credentials, and applies defaults like 30-second intervals (config: endpoints)
- Start endpoint monitoring goroutines — Monitor() iterates through all configured endpoints and launches monitorEndpoint() in separate goroutines, each running its own check-sleep loop with the configured interval [EndpointConfig]
- Execute protocol-specific health checks — executeRequest() performs HTTP GET/POST, TCP connection, ICMP ping, or DNS lookup based on endpoint URL scheme, measuring response time and capturing response data [EndpointConfig → Result]
- Evaluate success conditions — evaluateConditions() parses expressions like '[STATUS] == 200' or '[BODY] contains "healthy"' and applies them to the result data, setting Success: true/false on the Result [Result → Result]
- Store health check results — Store.Insert() persists the Result to the configured backend (SQLite by default), maintaining a rolling history for uptime calculations and dashboard charts [Result]
- Evaluate alert conditions — checkAndHandleAlert() compares consecutive failures against FailureThreshold and successes against SuccessThreshold, tracking alert state per endpoint to avoid duplicate notifications [AlertConfig]
- Send alert notifications — AlertProvider.Send() dispatches notifications through configured providers - formatting messages with endpoint details and sending via webhooks, SMTP, or APIs [ProviderConfig]
- Serve web dashboard and API — Handle() serves the Vue.js status page, REST API endpoints for historical data, and WebSocket connections that push real-time status updates to connected browsers
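A minimal configuration sketch tying these stages together — the field names follow Gatus's documented YAML format, but the endpoint name, URL, and webhook value are placeholders:

```yaml
endpoints:
  - name: website
    url: "https://example.org"
    interval: 30s                 # default when omitted
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 1000"
    alerts:
      - type: discord
        failure-threshold: 3
        success-threshold: 2
        send-on-resolved: true

alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/<id>/<token>"
```

Each endpoint entry feeds one monitoring goroutine; the `conditions` strings are the expressions evaluated in stage four, and the `alerts` block supplies the thresholds checked in stage six.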
Data Models
The data structures that flow between stages — the contracts that hold the system together.
config/endpoint/endpoint.go — struct with Name: string, URL: string, Method: string, Conditions: []Condition, Interval: time.Duration, Alerts: []Alert, Headers: map[string]string
Loaded from YAML config at startup, validated for required fields and protocol support, then passed to watchdog for continuous monitoring
config/endpoint/result.go — struct with Success: bool, HTTPStatus: int, Duration: time.Duration, Body: []byte, Timestamp: time.Time, Errors: []string, ConditionResults: []ConditionResult
Created after each endpoint check with protocol-specific data, stored in database, and used to determine alert states and dashboard display
alerting/alert/alert.go — struct with Type: Type, Enabled: *bool, FailureThreshold: int, SuccessThreshold: int, Description: *string, SendOnResolved: *bool, MinimumReminderInterval: time.Duration
Defined per endpoint in config, evaluated against result history to determine when to trigger or resolve alerts
alerting/provider/*/ — provider-specific struct (e.g., Discord: {WebhookURL: string, Title: string}, Email: {From: string, To: string, Host: string, Port: int})
Loaded from YAML with provider-specific fields, validated for required credentials, then used to dispatch alert messages
config/endpoint/condition.go — struct with Expression: string (e.g., '[STATUS] == 200', '[RESPONSE_TIME] < 1000ms', '[BODY] contains "OK"')
Parsed from endpoint config strings, compiled into evaluatable expressions, then applied to each health check result
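The Condition model can be illustrated with a deliberately simplified evaluator. This sketch handles only the `[STATUS] == N` form and is not Gatus's actual implementation, which supports many more placeholders and operators:

```go
// Illustrative sketch only — not Gatus's evaluator. Supports just "[STATUS] == N".
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Result is a trimmed-down stand-in for the struct in config/endpoint/result.go.
type Result struct {
	HTTPStatus int
	Success    bool
}

// evaluateStatusCondition applies a single "[STATUS] == N" expression to a result.
func evaluateStatusCondition(expr string, r *Result) (bool, error) {
	parts := strings.Fields(expr) // e.g. ["[STATUS]", "==", "200"]
	if len(parts) != 3 || parts[0] != "[STATUS]" || parts[1] != "==" {
		return false, fmt.Errorf("unsupported expression: %q", expr)
	}
	want, err := strconv.Atoi(parts[2])
	if err != nil {
		return false, err
	}
	return r.HTTPStatus == want, nil
}

func main() {
	r := &Result{HTTPStatus: 200}
	ok, err := evaluateStatusCondition("[STATUS] == 200", r)
	if err != nil {
		panic(err)
	}
	r.Success = ok
	fmt.Println(r.Success) // prints "true"
}
```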
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
System assumes termination signals (SIGTERM, SIGINT) will be handled gracefully within reasonable time bounds, but has no timeout for the shutdown sequence
If this fails: If watchdog.Shutdown() or controller.Shutdown() hang due to stuck goroutines or blocking I/O, the process becomes unkillable and requires SIGKILL, potentially corrupting the storage layer
main.go:main
Assumes GATUS_LOG_LEVEL environment variable, if present, contains a valid log level string that logr.LevelFromString() can parse
If this fails: Invalid log levels fall back to INFO silently, but malformed values could cause unexpected logging behavior or performance issues if the logging library doesn't handle edge cases properly
main.go:configureLogging
Assumes goroutines can be created without limit when starting controller.Handle(), metrics initialization, and watchdog.Monitor() concurrently, with no coordination or resource caps
If this fails: Under high endpoint counts or rapid config reloads, goroutine exhaustion could crash the process before any component can report the resource constraint
main.go:start
Assumes watchdog.Shutdown() will complete before controller.Shutdown() and that metrics.UnregisterPrometheusMetrics() can safely execute while HTTP handlers might still be processing requests
If this fails: Race conditions during shutdown could cause HTTP handlers to access unregistered metrics or attempt to send alerts through a shutdown watchdog, leading to panics or incomplete shutdown
main.go:stop
Assumes GATUS_CONFIG_PATH or deprecated GATUS_CONFIG_FILE environment variables point to readable YAML files with correct filesystem permissions
If this fails: If config file becomes unreadable due to permission changes after startup, config reloads will fail silently or crash the monitoring loop, stopping all health checks
main.go:loadConfiguration
Assumes the GATUS_DELAY_START_SECONDS environment variable, if present, contains a valid integer and that delaying startup by that duration won't exceed container orchestration timeouts
If this fails: For non-integer values, strconv.Atoi returns 0 together with an error; if the error is discarded, the delay feature is silently disabled. Very large delays could cause Kubernetes/Docker to kill the container before Gatus finishes starting
main.go:main
Assumes store.Get() returns a non-nil storage provider that implements the Save() method, and that Save() is safe to call during shutdown when other goroutines might still be writing
If this fails: If storage provider is nil or Save() isn't thread-safe, shutdown could panic or corrupt the database, losing historical monitoring data
main.go:save
Assumes storage initialization will succeed and that the configured storage backend (SQLite/PostgreSQL) is available with sufficient disk space and proper permissions
If this fails: Storage failures cause panic during startup, but there's no fallback to in-memory storage or graceful degradation - the entire monitoring system becomes unavailable
main.go:initializeStorage
Assumes config file watcher can distinguish between complete config writes and partial/temporary files created during atomic file operations
If this fails: If config is updated via non-atomic operations (direct writes instead of write-then-rename), the watcher might trigger on incomplete YAML, causing parsing errors and stopping all monitoring
main.go:listenToConfigurationFileChanges
Assumes the 20+ imported alert providers can all be initialized simultaneously without hitting system limits on network connections, file descriptors, or memory
If this fails: With many endpoints using different alert providers, concurrent initialization during config reload could exhaust connection pools or hit API rate limits, causing alerts to fail silently
alerting/config.go:imports
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Results Database — Stores historical health check results with timestamps, enabling uptime calculations and trend analysis
- Endpoint Registry — Active endpoint configurations loaded from YAML, used by monitoring goroutines
- Per-endpoint alert state — Tracks current alert status per endpoint to prevent duplicate notifications
Feedback Loops
- Health Check Loop (polling, balancing) — Trigger: Interval timer (default 30s per endpoint). Action: Execute health check, evaluate conditions, store result, check alert thresholds. Exit: Endpoint removed from config or system shutdown.
- Alert Resolution Loop (self-correction, balancing) — Trigger: Endpoint starts passing after failures. Action: Count successive successes against SuccessThreshold. Exit: Alert marked as resolved and notification sent.
- Config Reload Loop (polling, balancing) — Trigger: File system watcher detects YAML changes. Action: Reload config, restart affected endpoint goroutines, preserve existing alert states. Exit: No further config changes.
Delays
- Health Check Intervals (scheduled-job, configurable per endpoint, ~30s default) — Time between detecting an issue and generating an alert depends on check frequency
- Alert Threshold Delays (batch-window, ~FailureThreshold × check interval) — Prevents spurious alerts by requiring consecutive failures before triggering
- Minimum Reminder Interval (rate-limit, configurable, ~5m minimum) — Prevents alert spam by limiting notification frequency for ongoing incidents
Control Points
- Check Intervals (hyperparameter) — Controls: How frequently each endpoint is monitored. Default: 30s
- Alert Thresholds (threshold) — Controls: Number of consecutive failures/successes required to trigger/resolve alerts. Default: 3 failures, 2 successes
- Storage Backend (architecture-switch) — Controls: Whether to use SQLite, PostgreSQL, or in-memory storage. Default: SQLite
- Log Level (env-var) — Controls: Verbosity of application logging. Set via: GATUS_LOG_LEVEL (falls back to INFO when unset or invalid)
- Provider Credentials (env-var) — Controls: Authentication for alert providers such as Slack tokens and SMTP passwords. Set via: provider-specific environment variables
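The storage architecture-switch corresponds to a small YAML block; a sketch using Gatus's documented storage settings (the path is illustrative):

```yaml
storage:
  type: sqlite          # sqlite, postgres, or memory
  path: /data/gatus.db  # connection string when type is postgres
```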
Technology Stack
- Fiber — HTTP framework serving the web dashboard, REST API, and WebSocket endpoints
- SQLite/PostgreSQL — Persists health check results and historical data for uptime calculations
- Gorilla WebSocket — Enables real-time status updates to connected web browsers
- Various HTTP clients — Protocol-specific clients for HTTP, ICMP ping, DNS queries, and TCP connections
- YAML — Configuration format for defining endpoints, conditions, and alert providers
- Prometheus — Exports metrics for external monitoring and integration with Prometheus/Grafana
- Vue.js — Frontend framework for the status dashboard with real-time updates
Key Components
- Monitor (orchestrator) — watchdog/watchdog.go — Coordinates all endpoint monitoring by starting goroutines for each endpoint and managing their lifecycle during config reloads
- monitorEndpoint (executor) — watchdog/watchdog.go — Runs the monitoring loop for a single endpoint: executes checks at intervals, evaluates conditions, stores results, and triggers alerts
- Handle (gateway) — controller/controller.go — HTTP router that serves the web dashboard, REST API endpoints, and WebSocket connections for real-time updates, and handles authentication
- AlertProvider (adapter) — alerting/provider/*/ — Protocol-specific implementations for sending alerts (Discord webhooks, SMTP email, Slack API, etc.) with provider-specific formatting and authentication
- Store (store) — storage/store/store.go — Abstraction layer over storage backends (SQLite, PostgreSQL, in-memory) for persisting endpoint results, metrics, and pagination
- Config (loader) — config/config.go — Loads and validates the main YAML configuration file, merging defaults and validating endpoint definitions and alert provider configs
- evaluateConditions (validator) — config/endpoint/condition.go — Parses condition expressions and evaluates them against endpoint results using placeholders like [STATUS], [BODY], [RESPONSE_TIME]
- executeRequest (executor) — client/client.go — Performs the actual HTTP/TCP/ICMP/DNS requests with timeout handling, DNS resolution, and protocol-specific connection logic
Frequently Asked Questions
What is gatus used for?
twin/gatus monitors service health via HTTP/TCP/DNS checks and alerts on failures. It is an 8-component dashboard written in Go; data flows through 8 distinct pipeline stages, and the codebase contains 235 files.
How is gatus architected?
gatus is organized into 6 architecture layers: Main Orchestrator, Monitoring Engine, Web Interface, Alerting System, and 2 more. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through gatus?
Data moves through 8 stages: Load configuration from YAML → Start endpoint monitoring goroutines → Execute protocol-specific health checks → Evaluate success conditions → Store health check results → .... In short, the watchdog runs a monitoring goroutine per endpoint that checks at intervals, evaluates conditions, stores results, and alerts when thresholds are exceeded, while the controller serves the status dashboard and the API for historical data. This pipeline design reflects a complex multi-stage processing system.
What technologies does gatus use?
The core stack includes Fiber (HTTP framework serving the web dashboard, REST API, and WebSocket endpoints), SQLite/PostgreSQL (persists health check results and historical data for uptime calculations), Gorilla WebSocket (enables real-time status updates to connected web browsers), various HTTP clients (protocol-specific clients for HTTP, ICMP ping, DNS queries, and TCP connections), YAML (configuration format for defining endpoints, conditions, and alert providers), Prometheus (exports metrics for external monitoring and integration with Prometheus/Grafana), and Vue.js (frontend framework for the status dashboard). A focused set of dependencies that keeps the build manageable.
What system dynamics does gatus have?
gatus exhibits 3 data pools (Results Database, Endpoint Registry), 3 feedback loops, 5 control points, and 3 delays. The feedback loops handle polling and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does gatus use?
5 design patterns detected: Protocol Abstraction, Provider Plugin System, Configuration Override Hierarchy, Goroutine-per-Endpoint, Graceful Degradation.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.