openstatushq/openstatus

🫖 Status page with uptime monitoring & API monitoring as code 🫖

8,580 stars · TypeScript · 9 components

Runs uptime monitoring checks, serves status pages, and sends incident notifications across global regions

Under the hood, the system uses 4 feedback loops, 4 data pools, and 5 control points to manage its runtime behavior.

A 9-component fullstack application. 1,483 files analyzed. Data flows through 8 distinct pipeline stages.

How Data Flows Through the System

The monitoring pipeline starts when distributed checker applications probe monitored endpoints every few minutes from multiple global regions, measuring response times and status codes. These results flow into the central workflows service which detects status changes and creates incident records when services go down. Status changes trigger notification workflows that send alerts via email, Slack, Discord, and other channels. Public status pages query the incident database to display current service health, while the admin dashboard allows users to configure monitors and view analytics. Screenshots are captured during incidents for documentation purposes.

  1. Schedule monitoring checks — The cron service in the workflows app sends HTTP requests to checker applications across regions every 30s/1m/5m/10m/30m, depending on monitor configuration, triggering distributed health checks [Monitor configurations → Check triggers] (config: monitors.periodicity, regions.enabled)
  2. Execute endpoint probes — JobRunner in checker apps makes HTTP requests to monitored URLs, measuring response time and status code, handling timeouts and connection errors with configurable retry logic [Check triggers → MonitorResult] (config: monitors.timeout, monitors.retry_count)
  3. Aggregate regional results — The checkerRoute handler in the workflows service receives MonitorResult payloads from all regions and applies consensus logic to determine overall service status; see the sketch after this list [MonitorResult → Status aggregation] (config: monitors.regions, monitors.degraded_after_failures)
  4. Detect status transitions — upsertMonitorStatus compares new status against previous status in database, identifying transitions from operational→degraded→down or recovery patterns [Status aggregation → Status change events] (config: monitors.status_threshold)
  5. Create and manage incidents — When failures are detected, findOpenIncident checks for existing incidents and either creates new IncidentRecord or updates resolution timestamp for recovery [Status change events → IncidentRecord]
  6. Trigger notifications — triggerNotifications sends alerts to configured channels (Slack, Discord, email, webhooks) when incidents are created or resolved, with customizable message templates [IncidentRecord → Notification messages] (config: notifications.channels, notifications.templates)
  7. Display public status — Status page apps query incident and monitor tables to render current service health, historical uptime percentages, and incident timeline for public viewing [IncidentRecord → Status page HTML] (config: status_page.theme, status_page.custom_domain)
  8. Capture incident screenshots — Screenshot service receives QStash webhooks when incidents occur, uses Playwright to capture full-page screenshots, and stores them in R2 bucket with incident ID [ScreenshotRequest → Screenshot URLs] (config: screenshot.enabled, storage.r2_bucket)
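
To make stages 3 and 4 concrete, here is a minimal TypeScript sketch of consensus aggregation and transition detection. The function names, status values, and threshold semantics are illustrative assumptions, not the actual implementation in apps/workflows:

```typescript
type MonitorStatus = "active" | "degraded" | "error";

interface RegionResult {
  region: string;
  statusCode: number;
  latency: number;
}

// Assumed consensus rule: every region failing means "error"; failures at or
// above the configured threshold mean "degraded". The real thresholds come
// from monitor configuration (degraded_after_failures).
function aggregateStatus(
  results: RegionResult[],
  degradedAfterFailures = 1,
): MonitorStatus {
  const failures = results.filter(
    (r) => r.statusCode < 200 || r.statusCode >= 300,
  ).length;
  if (failures === 0) return "active";
  if (failures === results.length) return "error";
  return failures >= degradedAfterFailures ? "degraded" : "active";
}

// Assumed transition detection: compare against the previously stored status
// and emit an event only when the status actually changes.
function detectTransition(
  previous: MonitorStatus,
  next: MonitorStatus,
): "incident" | "recovery" | null {
  if (previous === next) return null;
  if (next === "error") return "incident";
  if (previous === "error") return "recovery";
  return null;
}
```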

Data Models

The data structures that flow between stages — the contracts that hold the system together. Two of them are sketched as Zod schemas at the end of this section.

MonitorResult apps/workflows/src/checker/index.ts
object with monitorId: string, statusCode: number, region: monitorRegions enum, cronTimestamp: number, status: monitorStatusSchema, latency: number, message: string
Created by checker agents during endpoint probes, processed by workflows service to detect incidents, stored in database for analytics
IncidentRecord apps/workflows/src/checker/index.ts
database record with id, monitorId, resolvedAt timestamp, autoResolved boolean flag, and incident metadata
Created when monitor failures are detected, displayed on status pages during outages, resolved when monitors recover
AuthSession apps/dashboard/src/lib/auth/index.ts
NextAuth session object with user id, email, OAuth provider details, and profile information
Generated during OAuth flow with GitHub/Google, persisted in auth adapter, validated on each authenticated request
StatusPageConfig apps/status-page/src/lib/resolve-route.ts
ResolvedRoute object with type: 'hostname'|'pathname', prefix: string, locale: Locale, localeExplicit: boolean, rewritePath: string
Resolved from incoming HTTP requests based on hostname or path, used to determine which status page to display and in what language
ScreenshotRequest apps/screenshot-service/src/index.ts
Zod schema with url: URL, incidentId: number, kind: 'incident'|'recovery' enum
Triggered via QStash when incidents occur, processed by Playwright to capture page screenshots, stored in R2 bucket for incident documentation
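
For illustration, the two contracts that bracket the pipeline (MonitorResult in, ScreenshotRequest out) might look like this as Zod schemas. Field names follow the descriptions above; the concrete enum values are assumptions:

```typescript
import { z } from "zod";

// Assumed enum values for illustration; the repository defines these centrally.
const monitorRegion = z.enum(["ams", "iad", "syd", "gru"]);
const monitorStatus = z.enum(["active", "degraded", "error"]);

// Mirrors the MonitorResult fields listed above.
export const monitorResultSchema = z.object({
  monitorId: z.string(),
  statusCode: z.number(),
  region: monitorRegion,
  cronTimestamp: z.number(), // assumed Unix milliseconds (see Hidden Assumptions)
  status: monitorStatus,
  latency: z.number(),
  message: z.string(),
});
export type MonitorResult = z.infer<typeof monitorResultSchema>;

// Mirrors the ScreenshotRequest fields listed above.
export const screenshotRequestSchema = z.object({
  url: z.string().url(),
  incidentId: z.number(),
  kind: z.enum(["incident", "recovery"]),
});
export type ScreenshotRequest = z.infer<typeof screenshotRequestSchema>;
```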

Hidden Assumptions

Things this code relies on but never validates; when the system changes, these cause silent failures.

critical Environment unguarded

OPENSTATUS_KEY environment variable contains a valid API key that never expires and has sufficient permissions

If this fails: An invalid, expired, or under-permissioned API key makes all monitor checks fail silently without alerting operators; the checker appears to run but produces no results

apps/checker/cmd/private/main.go:main
warning Temporal unguarded

Monitor configuration updates can wait up to 10 minutes to be picked up by checker agents

If this fails: Critical monitors added during outages won't be checked for up to 10 minutes, and disabled monitors continue running unnecessary checks, wasting resources and potentially triggering false alerts

apps/checker/cmd/private/main.go:configRefreshInterval
critical Resource unguarded

Container has sufficient memory to launch Chromium browser instances without being killed by OOM

If this fails: Screenshot capture fails silently during high incident volume when multiple Chromium instances exhaust container memory, leaving incidents without visual evidence

apps/screenshot-service/src/index.ts:playwright.chromium.launch
critical Scale weakly guarded

Railway region headers contain exactly one of four hardcoded region values

If this fails: When Railway adds new regions or changes region identifiers, requests route to an undefined targetUrl and panic, taking down the entire proxy service

apps/railway-proxy/main.go:proxy
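
The failure mode corresponds to a lookup along these lines (a TypeScript rendering of the Go proxy's logic for illustration; the header name and region identifiers are assumptions):

```typescript
// Illustrative: a hardcoded region-to-target map with no fallback branch.
// An unknown or renamed region yields `undefined`, and using that value as
// a URL crashes the request path.
const targetByRegion: Record<string, string> = {
  "region-a": "https://checker-a.example.com",
  "region-b": "https://checker-b.example.com",
  "region-c": "https://checker-c.example.com",
  "region-d": "https://checker-d.example.com",
};

function resolveTarget(regionHeader: string): string {
  const targetUrl = targetByRegion[regionHeader];
  // No guard here: a fifth region value leaves targetUrl undefined, and the
  // proxy would attempt to forward to "undefined".
  return targetUrl;
}
```
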
critical Contract weakly guarded

MonitorResult payloads from regional checkers always include cronTimestamp as Unix milliseconds in the same timezone

If this fails: Checkers sending timestamps in different formats or timezones corrupt incident timing, causing false recovery notifications and incorrect SLA calculations

apps/workflows/src/checker/index.ts:payloadSchema
warning Ordering unguarded

Only one incident can be open per monitor at any given time

If this fails: When multiple incident creation requests race during rapid status changes, duplicate incidents are created but only one gets resolved, leaving phantom open incidents that block future incident creation

apps/workflows/src/checker/index.ts:findOpenIncident
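
A minimal sketch of the unguarded check-then-insert pattern behind this assumption (the table, column, and interface names are illustrative, not the repository's schema):

```typescript
// Minimal database interface for the sketch.
interface Database {
  query(sql: string, params: unknown[]): Promise<Array<{ id: number }>>;
  execute(sql: string, params: unknown[]): Promise<void>;
}

// Two concurrent callers can both observe "no open incident" before either
// inserts one, producing duplicate open incidents. A unique partial index
// (or a transaction with the right isolation) would close the window.
async function handleFailure(db: Database, monitorId: string): Promise<void> {
  const open = await db.query(
    "SELECT id FROM incident WHERE monitor_id = ? AND resolved_at IS NULL",
    [monitorId],
  );
  if (open.length === 0) {
    // Both racers reach this branch if they ran the SELECT before either INSERT.
    await db.execute(
      "INSERT INTO incident (monitor_id, started_at) VALUES (?, ?)",
      [monitorId, Date.now()],
    );
  }
}
```
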
warning Domain weakly guarded

Custom domain hostnames always have exactly 3+ segments separated by dots, with the subdomain as the first segment

If this fails: Status pages hosted on unusual domains (e.g., single-level domains, IPv6 addresses, or domains with multiple subdomain levels) are misrouted, showing wrong status pages or 404 errors to customers

apps/status-page/src/lib/resolve-route.ts:resolveRoute
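
The assumption reduces to subdomain extraction along these lines (illustrative; the actual resolveRoute logic may differ):

```typescript
// Encodes the "3+ dot-separated segments, slug first" assumption. It breaks
// for single-level hosts, IPv6 literals, and nested subdomains.
function extractStatusPageSlug(hostname: string): string | null {
  const segments = hostname.split(".");
  if (segments.length < 3) return null; // e.g. "localhost" or "example.com"
  // Wrong for "eu.status.example.com"-style hosts, where the first segment
  // is a region rather than the status-page slug.
  return segments[0];
}
```
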
info Environment weakly guarded

OAuth profile objects from Google and GitHub providers always contain expected fields (given_name, family_name, picture, avatar_url)

If this fails: Authentication succeeds but user profile updates fail silently when OAuth providers change their response schema, leaving users with incomplete profiles and broken avatars

apps/dashboard/src/lib/auth/index.ts:signIn
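
Illustratively, the unguarded field access looks like this (the field names come from the assumption above; the mapping function itself is hypothetical):

```typescript
// Provider profiles are shaped differently: Google sends given_name,
// family_name, and picture; GitHub sends avatar_url. None of these reads
// are validated, so a provider-side schema change silently yields undefined.
interface OAuthProfile {
  given_name?: string; // Google
  family_name?: string; // Google
  picture?: string; // Google
  avatar_url?: string; // GitHub
  name?: string;
}

function toUserProfile(profile: OAuthProfile) {
  return {
    firstName: profile.given_name ?? profile.name?.split(" ")[0] ?? "",
    lastName: profile.family_name ?? "",
    avatar: profile.picture ?? profile.avatar_url ?? null,
  };
}
```
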
warning Temporal unguarded

Screenshot filenames using Date.now() are globally unique across all incident captures

If this fails: Simultaneous incident screenshots for the same incident ID overwrite each other in R2 storage, leaving only the last screenshot and losing evidence of the incident progression

apps/screenshot-service/src/index.ts:Date.now
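
The collision is easy to state in code (the key format is an assumption for illustration):

```typescript
// Two captures for the same incident in the same millisecond produce the
// same key, so the second PUT silently overwrites the first in R2.
function screenshotKey(incidentId: number): string {
  return `${incidentId}-${Date.now()}.png`;
}

// A collision-resistant variant appends a random suffix (crypto.randomUUID
// is global in Node 19+ and modern browsers).
function safeScreenshotKey(incidentId: number): string {
  return `${incidentId}-${Date.now()}-${crypto.randomUUID()}.png`;
}
```
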
warning Resource unguarded

AXIOM_TOKEN environment variable provides unlimited log ingestion quota

If this fails: When Axiom quota is exceeded, all application logging silently stops without fallback, making debugging production issues impossible during high-traffic periods

apps/server/src/index.ts:configure

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Monitor Status Database (database)
LibSQL database storing monitor configurations, status history, incident records, and user accounts with real-time status updates
Task Scheduler (in-memory)
In-memory task queue managed by github.com/madflojo/tasks that schedules periodic monitor checks with configurable intervals
QStash Queue (queue)
Upstash QStash message queue that buffers screenshot requests during incidents with webhook delivery and retry logic (see the publish sketch after this list)
R2 Screenshot Storage (file-store)
Cloudflare R2 bucket storing incident screenshots with public URLs for viewing in incident reports
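
As a sketch of how the QStash pool is fed, assuming the @upstash/qstash client (the destination URL and body shape are illustrative):

```typescript
import { Client } from "@upstash/qstash";

const qstash = new Client({ token: process.env.QSTASH_TOKEN! });

// Publish a screenshot request; QStash delivers it to the screenshot
// service as a webhook and retries failed deliveries.
await qstash.publishJSON({
  url: "https://screenshot-service.example.com/capture", // illustrative endpoint
  body: { url: "https://status.example.com", incidentId: 42, kind: "incident" },
});
```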

Technology Stack

LibSQL/Turso (database)
Primary database storing monitor configurations, incidents, user data with edge replication
Hono (framework)
HTTP framework for API servers and webhook handlers with TypeScript support
Next.js (framework)
React framework powering dashboard, status pages, and marketing site with SSR/SSG
Go (runtime)
High-performance language for checker agents and proxy services requiring low latency
NextAuth.js (library)
Authentication handling for OAuth providers (GitHub, Google) and session management
Playwright (library)
Browser automation for capturing incident screenshots in screenshot-service (see the capture sketch after this list)
QStash (infra)
Message queue for asynchronous screenshot processing with webhook delivery
Cloudflare R2 (infra)
Object storage for incident screenshots with CDN delivery
OpenTelemetry (infra)
Observability stack providing structured logging, metrics, and distributed tracing
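
As an illustration of how Playwright and R2 fit together in the screenshot path, a minimal sketch (the bucket name, key format, and credentials are assumptions; R2 is addressed through its S3-compatible API):

```typescript
import { chromium } from "playwright";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// R2 exposes an S3-compatible endpoint; the account ID and credential
// variables here are illustrative.
const r2 = new S3Client({
  region: "auto",
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID!,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
  },
});

async function captureAndStore(url: string, incidentId: number): Promise<void> {
  // Launching Chromium can fail under memory pressure (see Hidden Assumptions).
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle" });
    const png = await page.screenshot({ fullPage: true });
    await r2.send(
      new PutObjectCommand({
        Bucket: "incident-screenshots", // illustrative bucket name
        Key: `${incidentId}-${Date.now()}.png`,
        Body: png,
        ContentType: "image/png",
      }),
    );
  } finally {
    await browser.close();
  }
}
```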

Key Components

Package Structure

checker (app)
Go-based monitoring agent that performs HTTP checks and reports results to the central platform
dashboard (app)
Next.js admin interface for configuring monitors, viewing analytics, and managing incidents
docs (app)
Astro-powered documentation site with API reference and user guides
private-location (app)
Self-hosted Go server that runs monitoring checks from private networks
railway-proxy (app)
Go reverse proxy that routes checker requests based on Railway region headers
screenshot-service (app)
Hono service that captures web page screenshots during incidents using Playwright
server (app)
Main Hono API server handling checker results, user management, and external integrations
ssh-server (app)
Go SSH server that displays service status over SSH connections
status-page (app)
Public Next.js status pages that customers can view to see service health
web (app)
Marketing website and blog built with Next.js and MDX content
workflows (app)
Hono service managing cron jobs for monitoring tasks and notification workflows

Explore the interactive analysis

See the full architecture map, data flow, and code patterns visualization.

Analyze on CodeSea

Frequently Asked Questions

What is openstatus used for?

openstatushq/openstatus runs uptime monitoring checks, serves status pages, and sends incident notifications across global regions. It is a 9-component fullstack application written in TypeScript. Data flows through 8 distinct pipeline stages, and the codebase contains 1,483 files.

How is openstatus architected?

openstatus is organized into 4 architecture layers: Monitoring Layer, API Gateway, User Interfaces, Support Services. Data flows through 8 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through openstatus?

Data moves through 8 stages: Schedule monitoring checks → Execute endpoint probes → Aggregate regional results → Detect status transitions → Create and manage incidents → .... Checker applications probe endpoints every few minutes from multiple global regions; the workflows service aggregates the results, detects status changes, and opens or resolves incidents; notifications fan out to email, Slack, Discord, and other channels; and public status pages render current service health. This pipeline design reflects a complex multi-stage processing system.

What technologies does openstatus use?

The core stack includes LibSQL/Turso (Primary database storing monitor configurations, incidents, user data with edge replication), Hono (HTTP framework for API servers and webhook handlers with TypeScript support), Next.js (React framework powering dashboard, status pages, and marketing site with SSR/SSG), Go (High-performance language for checker agents and proxy services requiring low latency), NextAuth.js (Authentication handling for OAuth providers (GitHub, Google) and session management), Playwright (Browser automation for capturing incident screenshots in screenshot-service), and 3 more. This broad technology surface reflects a mature project with many integration points.

What system dynamics does openstatus have?

openstatus exhibits 4 data pools (including Monitor Status Database and Task Scheduler), 4 feedback loops, 5 control points, and 4 delays. The feedback loops handle polling and self-correction. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does openstatus use?

5 design patterns detected: Multi-Region Consensus, Event-Driven Incident Management, Domain-Based Routing, Graceful Degradation, Observability First.

Analyzed on April 20, 2026 by CodeSea.