signoz/signoz
SigNoz is an open-source, OpenTelemetry-native observability platform with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. Open source Application Performance Monitoring (APM) & Observability tool
12 hidden assumptions · 5-stage pipeline · 8 components
This full-stack application relies on 12 assumptions it never validates, 3 of them critical. They hold until the system changes, then fail silently.
Collects, processes and visualizes telemetry data from applications for observability monitoring
Telemetry data flows from instrumented applications through OpenTelemetry collectors into SigNoz's ingestion endpoints. Telemetry is stored in ClickHouse and metadata in a SQL database, and the frontend queries both through the query service to render dashboards, traces, and logs. Alert rules are evaluated against this data on a schedule to trigger notifications.
Under the hood, the system manages its runtime behavior with 4 feedback loops, 4 data pools, and 5 control points.
An 8-component full-stack application: 4,513 files analyzed, with data flowing through 5 distinct pipeline stages.
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Assumes AWS CloudFormation template URLs follow a specific S3 path format with an embedded agent version, but never validates that the template exists at that URL or that the agent version is compatible with it
If this fails: Users get broken CloudFormation links that fail silently during AWS stack deployment, wasting time on infrastructure setup before discovering the template is missing or incompatible
frontend/src/api/integration/aws/index.ts:generateConnectionUrl
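A minimal TypeScript sketch of checking the inputs before a template link is shown to the user. The report does not show the real S3 layout, so the base URL, path shape, and supported-version list below are hypothetical placeholders. A fuller check could also confirm that the template actually exists, for example with an HTTP HEAD request.

```typescript
// Hypothetical placeholders: the real S3 layout used by generateConnectionUrl is not
// shown in the report, so TEMPLATE_BASE and the path shape are illustrative only.
const TEMPLATE_BASE = "https://example-bucket.s3.amazonaws.com/cloudformation";

// Hypothetical: agent versions for which this build knows a template was published.
const SUPPORTED_AGENT_VERSIONS: ReadonlySet<string> = new Set(["v0.0.8", "v0.0.9"]);

// Accepts versions of the form vMAJOR.MINOR.PATCH.
const VERSION_PATTERN = /^v\d+\.\d+\.\d+$/;

type TemplateUrlResult =
  | { ok: true; url: string }
  | { ok: false; error: string };

function buildTemplateUrl(agentVersion: string): TemplateUrlResult {
  // Reject malformed versions instead of embedding them in a broken link.
  if (!VERSION_PATTERN.test(agentVersion)) {
    return { ok: false, error: `malformed agent version: "${agentVersion}"` };
  }
  // Reject versions with no known template, which would otherwise fail in the AWS console.
  if (!SUPPORTED_AGENT_VERSIONS.has(agentVersion)) {
    return { ok: false, error: `no known template for agent version ${agentVersion}` };
  }
  return { ok: true, url: `${TEMPLATE_BASE}/${agentVersion}/integration.yaml` };
}
```

Returning a result object instead of a bare string forces the caller to handle the failure case before it renders a link.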
Assumes the session rotation endpoint '/sessions/rotate' is always available and handles 401s gracefully; if the endpoint is down or misconfigured, the auth recovery mechanism fails completely
If this fails: Users get stuck in infinite 401 loops unable to refresh expired tokens, requiring manual logout/login even when backend auth is working normally
frontend/src/api/index.ts:interceptorRejected
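One way to keep the 401 path from looping is to express it as a small decision policy. In this sketch, only the '/sessions/rotate' endpoint comes from the report. `RotationGuard`, `AuthAction`, and the failure limit are hypothetical. An interceptor would call `decide()` on each failed response and `recordRotateResult()` after each rotation attempt.

```typescript
// Hypothetical 401 recovery policy; names are illustrative, not SigNoz's code.
type AuthAction = "pass-through" | "rotate-and-retry" | "force-logout";

interface RequestState {
  status: number;
  alreadyRetried: boolean; // true once this request has been replayed after a rotation
  isRotateCall: boolean; // true when the failing request is /sessions/rotate itself
}

class RotationGuard {
  private consecutiveRotateFailures = 0;
  private readonly maxRotateFailures: number;

  constructor(maxRotateFailures = 2) {
    this.maxRotateFailures = maxRotateFailures;
  }

  decide(req: RequestState): AuthAction {
    if (req.status !== 401) return "pass-through";
    // A 401 from the rotate endpoint must never trigger another rotation.
    if (req.isRotateCall) return "force-logout";
    // Replay each request at most once; this is what breaks the infinite loop.
    if (req.alreadyRetried) return "force-logout";
    // Stop calling a rotate endpoint that keeps failing.
    if (this.consecutiveRotateFailures >= this.maxRotateFailures) return "force-logout";
    return "rotate-and-retry";
  }

  recordRotateResult(succeeded: boolean): void {
    this.consecutiveRotateFailures = succeeded ? 0 : this.consecutiveRotateFailures + 1;
  }
}
```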
Assumes translation files exist at '/locales/{language}/{namespace}.json' with valid cache-busting hash parameters from 'i18n-translations-hash.json', but never checks whether the hash file exists or contains the expected keys
If this fails: A missing or corrupted hash file causes translation loading to fail silently, falling back to untranslated keys with no user-visible error
frontend/src/ReactI18/index.tsx:init
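A defensive version of the load-path logic could degrade to an un-hashed URL when the hash file is missing or malformed, rather than failing or emitting `?h=undefined`. This is a sketch under assumptions. The '/locales/{language}/{namespace}.json' layout comes from the report. The hash-map key format ("language/namespace") and the `h` query parameter are hypothetical.

```typescript
// The path layout is from the report; the hash key format and "h" parameter are hypothetical.
type TranslationHashes = Record<string, unknown>;

function parseHashes(raw: string | undefined): TranslationHashes {
  if (raw === undefined) return {}; // hash file missing entirely
  try {
    const parsed: unknown = JSON.parse(raw);
    // Accept only a plain JSON object; anything else is treated as "no hashes".
    return parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)
      ? (parsed as TranslationHashes)
      : {};
  } catch {
    return {}; // corrupted JSON
  }
}

function translationUrl(language: string, namespace: string, hashes: TranslationHashes): string {
  const base = `/locales/${language}/${namespace}.json`;
  const hash = hashes[`${language}/${namespace}`];
  // Without a usable hash, fall back to the plain URL (loses cache-busting, keeps translations).
  return typeof hash === "string" && hash.length > 0 ? `${base}?h=${encodeURIComponent(hash)}` : base;
}
```

A real loader might also log the fallback, so a broken hash file shows up somewhere instead of staying silent.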
Assumes CloudFormation template parameters (SigNozApiUrl, IngestionUrl, etc.) are URL-safe strings that need no escaping; malformed URLs in credentials could break CloudFormation deployment
If this fails: CloudFormation stack creation fails with cryptic parameter validation errors when credential URLs contain special characters or are malformed
ee/modules/cloudintegration/implcloudintegration/implcloudprovider/awscloudprovider.go:GetConnectionArtifact
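The fix has two halves: validate URL-valued parameters up front, and percent-encode every value before it goes into the link. The backend is Go, so this TypeScript sketch only illustrates the idea. The parameter names come from the report. The `param_` prefix reflects our understanding of AWS console quick-create links and should be treated as an assumption. The validation regex is deliberately simple.

```typescript
// Parameter names are from the report; the "param_" prefix and the URL rule are assumptions.
type StackParams = Record<string, string>;

// http(s) scheme, a non-empty host (optionally with port), optional path/query, no whitespace.
const HTTP_URL = /^https?:\/\/[^\s/?#]+(?:[/?#]\S*)?$/;

function validateUrlParams(params: StackParams, urlKeys: readonly string[]): string[] {
  const errors: string[] = [];
  for (const key of urlKeys) {
    const value = params[key];
    if (value === undefined || !HTTP_URL.test(value)) {
      errors.push(`${key} is not a valid http(s) URL: ${JSON.stringify(value)}`);
    }
  }
  return errors;
}

function encodeStackParams(params: StackParams): string {
  // encodeURIComponent escapes &, =, ?, #, spaces, etc., so a value cannot split the query.
  return Object.keys(params)
    .map((k) => `param_${encodeURIComponent(k)}=${encodeURIComponent(params[k] ?? "")}`)
    .join("&");
}
```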
Assumes a 5-second response threshold is appropriate for all API calls regardless of query complexity, data volume, or network conditions, with no per-endpoint configuration
If this fails: Complex telemetry queries on large datasets trigger false-positive slow-API warnings, flooding logs with noise that hides genuinely problematic performance issues
frontend/src/api/index.ts:RESPONSE_TIMEOUT_THRESHOLD
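A per-endpoint threshold table is one way to remove the one-size-fits-all budget. Only the 5-second default reflects the report. The endpoint prefixes and their budgets in this sketch are made-up examples. The longest matching prefix wins, so a specific route can override a broader one.

```typescript
// Only the 5 s default is from the report; prefixes and budgets are hypothetical examples.
const DEFAULT_SLOW_THRESHOLD_MS = 5000;

const SLOW_THRESHOLD_OVERRIDES: ReadonlyArray<readonly [string, number]> = [
  ["/api/v3/query_range", 30000], // heavy telemetry queries get a larger budget
  ["/api/v1/traces", 15000],
  ["/api/v1/health", 1000], // trivial endpoints get a tighter one
];

function slowThresholdFor(path: string): number {
  let best: readonly [string, number] | undefined;
  for (const entry of SLOW_THRESHOLD_OVERRIDES) {
    // Prefer the longest matching prefix.
    if (path.startsWith(entry[0]) && (best === undefined || entry[0].length > best[0].length)) {
      best = entry;
    }
  }
  return best === undefined ? DEFAULT_SLOW_THRESHOLD_MS : best[1];
}

function isSlowResponse(path: string, elapsedMs: number): boolean {
  return elapsedMs > slowThresholdFor(path);
}
```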
Analytics initialization waits for 'isFetchingActiveLicense' to become false, assuming license loading always either completes or fails definitively; hung requests are never handled
If this fails: If the license API hangs indefinitely, analytics never initializes, breaking user tracking and feature flag evaluation with no visible error
frontend/src/AppRoutes/index.tsx:enableAnalytics
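A bounded wait removes the hang. The sketch below is a pure gate function that a component could re-evaluate on a timer. `isFetchingActiveLicense` mirrors the flag named in the report. The 10-second timeout and the three outcomes are assumptions.

```typescript
// isFetchingActiveLicense is from the report; the timeout and outcome names are assumptions.
type AnalyticsGate = "wait" | "init" | "init-on-timeout";

const LICENSE_WAIT_TIMEOUT_MS = 10000;

function analyticsGate(
  isFetchingActiveLicense: boolean,
  fetchStartedAtMs: number,
  nowMs: number,
  timeoutMs: number = LICENSE_WAIT_TIMEOUT_MS,
): AnalyticsGate {
  // License request settled (success or failure): initialize normally.
  if (!isFetchingActiveLicense) return "init";
  // Still fetching: wait, but only up to the timeout, then initialize with defaults.
  return nowMs - fetchStartedAtMs >= timeoutMs ? "init-on-timeout" : "wait";
}
```

The separate "init-on-timeout" outcome lets the caller log or surface the hung request instead of hiding it.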
Azure cloud provider methods are placeholder implementations that will panic when called, but the module registration system doesn't validate implementations before registering providers
If this fails: Selecting Azure as a cloud provider causes immediate application crashes with 'implement me' panics in production, breaking the entire cloud integration feature
ee/modules/cloudintegration/implcloudintegration/implcloudprovider/azurecloudprovider.go
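Registration could probe each provider before accepting it, so a placeholder implementation is rejected at startup rather than panicking on first use. SigNoz's registration code is Go. This TypeScript model only illustrates the "probe before register" idea, and the `CloudProvider` interface and `selfCheck` method are hypothetical.

```typescript
// Hypothetical model of probe-before-register; interface and method names are invented.
interface CloudProvider {
  name: string;
  // Cheap self-check; a placeholder implementation throws here (like Go's "implement me" panic).
  selfCheck(): void;
}

class ProviderRegistry {
  private readonly providers = new Map<string, CloudProvider>();
  readonly rejected: string[] = [];

  register(provider: CloudProvider): boolean {
    try {
      provider.selfCheck();
    } catch (err) {
      // Record why the provider was refused instead of crashing later in a request.
      const reason = err instanceof Error ? err.message : String(err);
      this.rejected.push(`${provider.name}: ${reason}`);
      return false;
    }
    this.providers.set(provider.name, provider);
    return true;
  }

  get(name: string): CloudProvider | undefined {
    return this.providers.get(name);
  }
}
```

In Go, the equivalent would recover from the panic during registration, or better, use an explicit capability check rather than calling real methods.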
Assumes dashboard module dependencies (store, analytics, orgGetter, queryParser) are fully initialized before the module is created, but initialization order isn't enforced
If this fails: If modules are initialized in the wrong order, dashboard operations fail with nil pointer dereferences or incomplete functionality during startup race conditions
ee/modules/dashboard/impldashboard/module.go:NewModule
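A fail-fast constructor does not enforce ordering by itself. It does turn a wrong order into an immediate startup error instead of a nil dereference later. The dependency names below come from the report. Their types are stand-ins, and the real `NewModule` is Go, where the equivalent check compares each dependency against nil.

```typescript
// Dependency names are from the report; the object types are hypothetical stand-ins.
type DepName = "store" | "analytics" | "orgGetter" | "queryParser";
type DashboardDeps = Partial<Record<DepName, object | null>>;

const REQUIRED_DEPS: readonly DepName[] = ["store", "analytics", "orgGetter", "queryParser"];

class DashboardModule {
  private readonly deps: Record<DepName, object>;

  private constructor(deps: Record<DepName, object>) {
    this.deps = deps;
  }

  // Fail at construction time, listing every missing dependency at once.
  static create(deps: DashboardDeps): DashboardModule {
    const missing = REQUIRED_DEPS.filter((name) => deps[name] == null); // null or undefined
    if (missing.length > 0) {
      throw new Error(`dashboard module: missing dependencies: ${missing.join(", ")}`);
    }
    return new DashboardModule(deps as Record<DepName, object>);
  }

  dependencyNames(): DepName[] {
    return Object.keys(this.deps) as DepName[];
  }
}
```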
Assumes all funnel API endpoints are available under the '/trace-funnels' prefix, but doesn't verify that the trace funnels feature is enabled or that the backend supports these endpoints
If this fails: Funnel operations silently fail or return 404s when the trace funnels feature is disabled in the backend configuration, confusing users with missing functionality
frontend/src/api/traceFunnels/index.ts:FUNNELS_BASE_PATH
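A client could gate funnel calls on a feature flag and raise a named error that the UI can explain, rather than let an opaque 404 surface. The '/trace-funnels' base path is from the report. The flag key and the flag-map shape are hypothetical.

```typescript
// FUNNELS_BASE_PATH's value is from the report; the flag key is hypothetical.
const FUNNELS_BASE_PATH = "/trace-funnels";
const TRACE_FUNNELS_FLAG = "trace_funnels";

class FeatureDisabledError extends Error {
  constructor(feature: string) {
    super(`feature "${feature}" is disabled on this backend`);
    this.name = "FeatureDisabledError";
  }
}

function funnelsEndpoint(flags: ReadonlyMap<string, boolean>, subPath = ""): string {
  // Treat a missing flag the same as "off" so older backends fail clearly too.
  if (flags.get(TRACE_FUNNELS_FLAG) !== true) {
    throw new FeatureDisabledError(TRACE_FUNNELS_FLAG);
  }
  return `${FUNNELS_BASE_PATH}${subPath}`;
}
```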
Assumes AWS region names in account configuration are valid regions that support CloudFormation and the specified services, but never validates them against the AWS API
If this fails: Invalid or unsupported regions cause CloudFormation deployments to fail in the AWS console after the user completes SigNoz configuration, creating a confusing disconnect between setup success and deployment failure
ee/modules/cloudintegration/implcloudintegration/implcloudprovider/awscloudprovider.go:BuildIntegrationConfig
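Region input can be checked at configuration time, before the user ever reaches the AWS console. In this sketch, the allowlist is a small illustrative subset of real AWS region codes. It is not the list SigNoz or AWS actually supports. A production check would source regions from AWS (for example the EC2 DescribeRegions API) or from a maintained config.

```typescript
// Illustrative subset only; not an authoritative list of supported regions.
const SUPPORTED_REGIONS: ReadonlySet<string> = new Set([
  "us-east-1", "us-east-2", "us-west-2", "eu-west-1", "eu-central-1", "ap-south-1", "ap-southeast-1",
]);

// Shape of region codes such as "us-east-1" or "ap-southeast-2".
const REGION_SHAPE = /^[a-z]{2}(-[a-z]+)+-\d$/;

function validateRegions(regions: readonly string[]): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const r of regions) {
    const region = r.trim().toLowerCase(); // normalize user input before checking
    (REGION_SHAPE.test(region) && SUPPORTED_REGIONS.has(region) ? valid : invalid).push(region);
  }
  return { valid, invalid };
}
```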
Assumes the QueryClient default configuration (no retries, no refetch on window focus) is appropriate for all telemetry data queries regardless of data criticality or user context
If this fails: Transient network failures cause permanent query failures requiring manual refresh, and stale dashboard data persists when users return to browser tabs
frontend/src/api/index.ts:interceptorsResponse
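Instead of disabling retries everywhere, a retry callback can retry only failures that are likely transient. This sketch assumes the query library accepts a `(failureCount, error) => boolean` retry callback, as TanStack Query's `retry` option does. The error shape is a simplified stand-in for an Axios-style error, and the limit of 3 is an assumption.

```typescript
// Simplified stand-in for an HTTP client error; not the exact Axios type.
interface HttpishError {
  status?: number; // undefined when no response was received (network failure)
}

const MAX_RETRIES = 3;

function shouldRetry(failureCount: number, error: HttpishError): boolean {
  if (failureCount >= MAX_RETRIES) return false;
  // No response at all: network blip, DNS failure, connection reset.
  if (error.status === undefined) return true;
  // Retry server-side, timeout, and throttling errors; other 4xx won't fix themselves.
  return error.status >= 500 || error.status === 429 || error.status === 408;
}

// Hypothetical defaults: retry transient failures and refetch when the tab regains focus.
const queryDefaults = {
  retry: shouldRetry,
  refetchOnWindowFocus: true,
};
```

With TanStack Query, these defaults would typically be passed as `new QueryClient({ defaultOptions: { queries: queryDefaults } })`. Individual queries can still override them.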
Assumes the domain extraction utility correctly handles all hostname formats, including localhost, IP addresses, and internationalized domain names, for analytics and Sentry configuration
If this fails: Analytics and error tracking may be misconfigured for non-standard deployment domains, leading to data attribution issues or privacy policy violations
frontend/src/AppRoutes/index.tsx:extractDomain
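A hypothetical domain extractor that covers the cases the report lists is sketched below. It is not the AppRoutes implementation. The "last two labels" rule is a known simplification that returns "co.uk" for hosts under multi-part public suffixes; a Public Suffix List lookup is the usual fix. Hostnames taken from the browser's URL parser are already in ASCII (punycode) form, so internationalized names arrive as "xn--" labels.

```typescript
// Hypothetical sketch; the two-label rule is wrong for suffixes like "co.uk".
const IPV4 = /^(\d{1,3})(\.\d{1,3}){3}$/;

function extractDomain(hostname: string): string {
  const host = hostname.trim().toLowerCase().replace(/\.$/, ""); // drop a trailing root dot
  // IPv6 literals (bracketed or bare) and IPv4 addresses have no registrable domain.
  if (host.includes(":") || IPV4.test(host)) return host;
  const labels = host.split(".").filter((label) => label.length > 0);
  // Single-label hosts such as "localhost" (or an intranet name) are returned as-is.
  if (labels.length <= 2) return labels.join(".");
  return labels.slice(-2).join(".");
}
```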
How Data Flows Through the System
- Telemetry Ingestion — Applications send OTLP data (traces, logs, metrics) via OpenTelemetry SDK to SigNoz collectors which validate and forward to storage engines [Raw OTLP Data → TelemetryData] (config: global.ingestion_url)
- Data Storage — TelemetryStore writes processed telemetry data to ClickHouse tables optimized for time-series queries, with separate schemas for traces, logs, and metrics [TelemetryData]
- Query Processing — QueryService receives dashboard/explorer requests, parses Query objects, and delegates to Querier which generates optimized ClickHouse SQL with time ranges and filters [Query → TelemetryData]
- Dashboard Rendering — Frontend React components fetch Dashboard configurations from SQL store and execute panel queries to populate charts, tables, and visualizations [Dashboard → Visualization Data]
- Alert Evaluation — Alertmanager runs scheduled jobs evaluating Alert rules by executing queries against telemetry data and comparing results to configured thresholds [Alert → Notification Events]
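The Query Processing stage (a builder-style query turned into SQL with a time range and filters) can be illustrated with a toy translator. The real Querier is Go and targets SigNoz's own ClickHouse schema. The table name, column names, `?` placeholders, and millisecond timestamps here are all assumptions. Column names come from an allowlist and values go into bound parameters, which keeps user input out of the SQL text.

```typescript
// Hypothetical schema: table "logs", columns below, timestamps assumed to be in milliseconds.
type FilterOp = "=" | "!=" | ">" | "<";

interface QueryFilter {
  key: string;
  op: FilterOp;
  value: string | number;
}

interface LogsQuery {
  startMs: number;
  endMs: number;
  filters: QueryFilter[];
  limit: number;
}

const ALLOWED_KEYS: ReadonlySet<string> = new Set(["severity_text", "service_name", "body"]);

function buildLogsSql(q: LogsQuery): { sql: string; args: Array<string | number> } {
  if (q.endMs <= q.startMs) throw new Error("empty time range");
  if (!Number.isInteger(q.limit) || q.limit <= 0) throw new Error("limit must be a positive integer");
  // The time range is always applied first so the query can prune partitions.
  const where = ["timestamp >= ?", "timestamp < ?"];
  const args: Array<string | number> = [q.startMs, q.endMs];
  for (const f of q.filters) {
    if (!ALLOWED_KEYS.has(f.key)) throw new Error(`unknown filter key: ${f.key}`);
    where.push(`${f.key} ${f.op} ?`); // op is a closed union, value is bound as a parameter
    args.push(f.value);
  }
  const sql = `SELECT timestamp, body FROM logs WHERE ${where.join(" AND ")} ORDER BY timestamp DESC LIMIT ${q.limit}`;
  return { sql, args };
}
```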
Data Models
The data structures that flow between stages — the contracts that hold the system together.
TelemetryData (pkg/types/): OpenTelemetry-standard traces (spans with timing/context), logs (structured records with severity/attributes), and metrics (time series with labels/values)
Ingested via OTLP from applications, transformed into storage format, and queried for visualization
Query (pkg/types/querybuildertypes/): Query builder structure with QueryData containing MetricQuery/LogsQuery/TracesQuery, filters, aggregations, and time ranges
Built in the frontend query builder, processed by the query service, and translated into ClickHouse/SQL queries
Dashboard (pkg/types/dashboardtypes/): Metadata (id, title, description) and a panels array containing widget configurations and queries
Created/edited in the frontend, persisted in the SQL database, and rendered by fetching panel data
Alert rule (pkg/types/): Query conditions, thresholds, evaluation frequency, and notification channels
Configured via the UI, evaluated periodically by Alertmanager, and triggers notifications when conditions are met
Account (pkg/types/cloudintegrationtypes/): Cloud provider type, credentials, and a services array defining monitoring configurations
Configured for AWS/Azure; generates deployment artifacts and collects cloud metrics via agents
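The Dashboard model's life cycle ends with "rendered by fetching panel data". In practice, that means collecting each panel's queries and running them. The TypeScript shapes below are hypothetical mirrors of the descriptions above. The field names are illustrative and are not copied from pkg/types.

```typescript
// Hypothetical shapes mirroring the data-model descriptions; not SigNoz's real types.
type Signal = "metrics" | "logs" | "traces";

interface BuilderQuery {
  signal: Signal;
  expression: string;
}

interface Panel {
  id: string;
  title: string;
  queries: BuilderQuery[];
}

interface Dashboard {
  id: string;
  title: string;
  description?: string;
  panels: Panel[];
}

// Flatten a dashboard into the queries to execute; panels with no queries contribute nothing.
function queriesToRun(d: Dashboard): Array<{ panelId: string; query: BuilderQuery }> {
  const out: Array<{ panelId: string; query: BuilderQuery }> = [];
  for (const panel of d.panels) {
    for (const query of panel.queries) out.push({ panelId: panel.id, query });
  }
  return out;
}
```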
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
ClickHouse Telemetry Database: primary storage for traces, logs, and metrics, with time-partitioned tables optimized for observability queries
SQL Metadata Store: PostgreSQL/SQLite storing dashboards, alerts, users, and configuration metadata
Redis cache: query result caching and session storage for improved response times
In-memory buffers: buffering of query results during processing and aggregation
Feedback Loops
- Alert Rule Evaluation (polling, balancing) — Trigger: Scheduled cron jobs. Action: Execute alert queries and check thresholds. Exit: Alert disabled or system shutdown.
- Query Cache Invalidation (cache-invalidation, balancing) — Trigger: New telemetry data ingestion. Action: Clear related cached query results. Exit: Cache TTL expires.
- Database Connection Pool (auto-scale, reinforcing) — Trigger: High query load. Action: Adjust ClickHouse connection pool size. Exit: Load returns to normal.
- Frontend Auto-Refresh (polling, reinforcing) — Trigger: Dashboard view with refresh interval. Action: Re-execute panel queries. Exit: User navigates away or disables refresh.
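The Alert Rule Evaluation loop comes down to a state transition run once per scheduled tick. This is a hypothetical sketch. The operator names and the "fire after N consecutive breaches" rule are assumptions for illustration, not SigNoz's actual rule semantics.

```typescript
// Hypothetical rule semantics for illustration only.
type CompareOp = "above" | "below" | "equal";

interface AlertRule {
  threshold: number;
  op: CompareOp;
  requiredConsecutive: number; // breaches needed in a row before firing
}

interface RuleState {
  consecutiveBreaches: number;
  firing: boolean;
}

function breaches(value: number, rule: AlertRule): boolean {
  switch (rule.op) {
    case "above": return value > rule.threshold;
    case "below": return value < rule.threshold;
    case "equal": return value === rule.threshold;
  }
}

// Pure transition: the scheduler calls this once per evaluation interval with the latest value.
function evaluateTick(rule: AlertRule, state: RuleState, latestValue: number): RuleState {
  const consecutiveBreaches = breaches(latestValue, rule) ? state.consecutiveBreaches + 1 : 0;
  return { consecutiveBreaches, firing: consecutiveBreaches >= rule.requiredConsecutive };
}
```

Keeping the transition pure separates the scheduling (the loop) from the decision, which makes the balancing behavior easy to test.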
Delays
- Telemetry Data Ingestion (async-processing, ~seconds) — Recent telemetry data may not appear immediately in queries
- Query Result Caching (cache-ttl, ~configurable minutes) — Stale data served until cache expires
- Alert Evaluation Interval (scheduled-job, ~1-60 minutes) — Alerts may not fire immediately when conditions are met
- Dashboard Auto-Refresh (scheduled-job, ~5s-5m) — Dashboards show slightly stale data between refresh cycles
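The Query Result Caching delay follows directly from how a TTL cache works: an entry is served until it expires, and that window is where stale reads come from. Below is a minimal TTL cache with an injected clock so its behavior is deterministic. It illustrates the mechanism and is not SigNoz's cache implementation.

```typescript
// Minimal TTL cache; illustrative, not SigNoz's implementation.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict on read
      return undefined;
    }
    return entry.value; // may be stale relative to newly ingested data
  }

  // Explicit invalidation, e.g. when new data arrives for this key's time range.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```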
Control Points
- Instrumentation Level (env-var) — Controls: Logging verbosity and internal telemetry collection. Default: info
- Cache TTL (runtime-toggle) — Controls: How long query results are cached before expiration
- Query Timeout (threshold) — Controls: Maximum execution time for database queries before cancellation
- Alert Evaluation Frequency (hyperparameter) — Controls: How often alert rules are checked against telemetry data
- Feature Flags (feature-flag) — Controls: Frontend feature availability and UI behavior
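Several of these control points are typically read from configuration at startup. The sketch below reads three of them from an environment-style map with defaults. Only the "info" default for instrumentation level comes from the report. The `EXAMPLE_*` variable names and the other defaults are hypothetical. This version silently falls back on invalid values, and a stricter loader might reject them instead.

```typescript
// Variable names and non-"info" defaults are hypothetical.
type LogLevel = "debug" | "info" | "warn" | "error";
const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

interface ControlPoints {
  logLevel: LogLevel;
  cacheTtlMs: number;
  queryTimeoutMs: number;
}

function positiveIntOr(raw: string | undefined, fallback: number): number {
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback; // reject NaN, 0, negatives, fractions
}

function readControlPoints(env: Record<string, string | undefined>): ControlPoints {
  const level = (env["EXAMPLE_LOG_LEVEL"] ?? "info").toLowerCase();
  return {
    logLevel: (LOG_LEVELS as readonly string[]).indexOf(level) >= 0 ? (level as LogLevel) : "info",
    cacheTtlMs: positiveIntOr(env["EXAMPLE_CACHE_TTL_MS"], 60000),
    queryTimeoutMs: positiveIntOr(env["EXAMPLE_QUERY_TIMEOUT_MS"], 30000),
  };
}
```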
Technology Stack
Go: Primary backend language for high-performance telemetry processing and API services
React: Frontend framework for the observability dashboard and exploration interfaces
ClickHouse: Primary database for storing and querying time-series telemetry data with columnar optimization
PostgreSQL/SQLite: Metadata storage for dashboards, alerts, users, and system configuration
OpenTelemetry: Telemetry standards compliance for ingesting traces, logs, and metrics from applications
Redis: Query result caching and session storage for improved performance
Metrics collection format compatibility and internal system monitoring
HTTP web framework for REST API endpoints and middleware
React UI component library providing dashboard widgets and forms
Containerization for deployment and local development environments
Key Components
- QueryService (orchestrator) — Central coordinator handling all telemetry queries, dashboard data fetching, and API request routing to the appropriate processors
  pkg/query-service/
- Querier (processor) — Executes actual database queries against ClickHouse and SQL stores, translating query builder requests into optimized database queries
  pkg/querier/
- Alertmanager (scheduler) — Periodically evaluates alert rules against telemetry data and triggers notifications when thresholds are breached
  pkg/alertmanager/
- TelemetryStore (adapter) — Database abstraction layer managing connections to ClickHouse for telemetry data storage and retrieval
  pkg/telemetrystore/
- Gateway (gateway) — HTTP proxy routing frontend requests to backend services with authentication middleware and CORS handling
  pkg/gateway/
- Instrumentation (monitor) — Internal telemetry collection for SigNoz itself, emitting metrics and traces about system performance
  pkg/instrumentation/
- CloudIntegrationModule (adapter) — Enterprise module managing connections to cloud providers (AWS/Azure) for infrastructure monitoring and agent deployment
  ee/modules/cloudintegration/
- DashboardModule (processor) — Enterprise dashboard features with enhanced querying, caching, and advanced visualization capabilities
  ee/modules/dashboard/
Frequently Asked Questions
What is signoz used for?
Collects, processes and visualizes telemetry data from applications for observability monitoring. signoz/signoz is an 8-component full-stack application written in TypeScript (frontend) and Go (backend). Data flows through 5 distinct pipeline stages, and the codebase contains 4,513 files.
How is signoz architected?
signoz is organized into 5 architecture layers: Frontend Layer, API Gateway Layer, Query Service Layer, Storage Layer, and 1 more. Data flows through 5 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through signoz?
Data moves through 5 stages: Telemetry Ingestion → Data Storage → Query Processing → Dashboard Rendering → Alert Evaluation. Telemetry data flows from instrumented applications through OpenTelemetry collectors into SigNoz's ingestion endpoints, is stored in ClickHouse (telemetry) and a SQL database (metadata), and is queried by the frontend via the query service to render dashboards, traces, and logs. Alert rules are evaluated against this data on a schedule to trigger notifications.
What technologies does signoz use?
The core stack includes Go (Primary backend language for high-performance telemetry processing and API services), React (Frontend framework building the observability dashboard and exploration interfaces), ClickHouse (Primary database for storing and querying time-series telemetry data with columnar optimization), PostgreSQL/SQLite (Metadata storage for dashboards, alerts, users, and system configuration), OpenTelemetry (Telemetry standards compliance for ingesting traces, logs, and metrics from applications), Redis (Query result caching and session storage for improved performance), and 4 more. This broad technology surface reflects a mature project with many integration points.
What system dynamics does signoz have?
signoz has 4 data pools (among them the ClickHouse Telemetry Database and the SQL Metadata Store), 4 feedback loops, 5 control points, and 4 delays. The feedback loops handle polling, cache invalidation, and connection-pool scaling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does signoz use?
5 design patterns detected: OpenTelemetry Integration, Enterprise Module Architecture, Query Builder Pattern, Multi-Store Architecture, Frontend API Integration.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.