matomo-org/matomo
Matomo is the leading open-source alternative to Google Analytics, giving you complete control and built-in privacy. Easily collect, visualise, and analyse data from websites & apps.
Tracks website visitor behavior and creates privacy-focused analytics reports
Website visitors trigger JavaScript tracking codes that send HTTP requests containing interaction data to matomo.php. The Tracker validates and enriches this data with geolocation and device detection, storing it in MySQL log tables. Scheduled CronArchive jobs read from log tables to generate aggregated reports stored in archive tables. When users request reports through the web interface, the API system retrieves data from archives, formats it into DataTable objects, and serves it to Vue.js components for visualization.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
A 9-component fullstack. 3870 files analyzed. Data flows through 7 distinct pipeline stages.
How Data Flows Through the System
- Capture tracking requests — JavaScript tracking code (matomo.js) captures visitor interactions and sends HTTP POST requests to matomo.php with visitor data including page URL, referrer, custom variables, and goal conversions
- Validate and enrich tracking data — Tracker validates request parameters, applies privacy settings, enriches with IP geolocation via GeoIP2, detects devices/browsers using DeviceDetector, and handles bot filtering [TrackingData → TrackingData]
- Store raw visits — Tracker inserts validated tracking data into MySQL log tables (log_visit for sessions, log_link_visit_action for page views, log_conversion for goals) using the Db adapter [TrackingData → LogTables]
- Process into archives — CronArchive spawns ArchiveProcessor instances that read from log tables, calculate aggregated metrics through plugin-specific logic, and store results in archive_numeric and archive_blob tables [LogTables → ArchiveData]
- Generate reports — API Proxy routes requests to plugin Report classes which retrieve data from Archive tables, construct DataTable objects with filtering and formatting applied, ready for serialization [ArchiveData → DataTable]
- Render visualizations — Vue.js components in plugins/*/vue/ receive DataTable JSON from API endpoints and render them as charts, tables, or custom visualizations using the configured visualization type [DataTable → VisualizationData]
- Display dashboards — CoreHome Vue components compose individual report visualizations into dashboards, handling URL routing, period selection, and user interactions through MatomoUrl and other stores [VisualizationData]
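The first stage above can be sketched as a plain HTTP request. The parameter names (idsite, rec, action_name, url) follow Matomo's documented Tracking HTTP API; the endpoint URL is hypothetical, and the real matomo.js tracker sends many more fields than this minimal sketch.

```typescript
// Sketch of stage 1: composing the HTTP request matomo.js sends to matomo.php.
// Parameter names follow Matomo's Tracking HTTP API; this is a simplification.
function buildTrackingUrl(matomoBase: string, pageUrl: string, pageTitle: string): string {
  const params = new URLSearchParams({
    idsite: "1",            // the site this hit is attributed to
    rec: "1",               // required flag: tells the Tracker to record the hit
    action_name: pageTitle, // label shown in Actions reports
    url: pageUrl,           // the page the visitor is viewing
  });
  return `${matomoBase}/matomo.php?${params.toString()}`;
}

// Hypothetical deployment URL; a real site would use its own Matomo instance.
console.log(buildTrackingUrl("https://analytics.example.com", "https://shop.example.com/cart", "Cart"));
```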
Data Models
The data structures that flow between stages — the contracts that hold the system together.
core/Tracker/ — PHP arrays with visitor info (idsite: int, rec: 1, action_name: string, url: string, urlref: string, _id: visitor_id, _idts: timestamp, custom variables, goal data, ecommerce items)
Created from JavaScript tracker HTTP requests, validated and enriched with IP geolocation and device detection, then inserted into log tables
core/DataAccess/ — MySQL tables (log_visit: visitor sessions, log_link_visit_action: page views/downloads/outlinks, log_conversion: goal completions, log_conversion_item: ecommerce items)
Raw tracking data is inserted into these tables, then read during archiving to calculate aggregated metrics
core/Archive/ — Serialized PHP objects in archive_numeric (single values) and archive_blob (complex data like DataTable) tables, keyed by period, site, segment, and metric name
Generated from log tables during scheduled archiving, cached for fast report serving, invalidated when new data arrives
core/DataTable/ — PHP object with rows (each containing columns: label, nb_visits, nb_actions, conversion_rate, etc.) plus metadata for totals, filters applied, and period info
Constructed from archived data when reports are requested, transformed through filters and formatters, then serialized to JSON for API responses
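The DataTable contract above can be sketched as a TypeScript shape. The interface itself is hypothetical, but the column names (label, nb_visits, nb_actions, conversion_rate) are the ones listed above.

```typescript
// Hypothetical typing of a DataTable row as serialized to JSON for the
// Reporting API; column names mirror those listed above.
interface DataTableRow {
  label: string;            // dimension value, e.g. a page URL or country
  nb_visits: number;        // visit count for this row
  nb_actions?: number;      // optional: not every report carries every metric
  conversion_rate?: string; // formatted metrics arrive as strings, e.g. "3.2%"
}

// Once serialized, a report response is essentially an array of rows.
const sample: DataTableRow[] = [
  { label: "Germany", nb_visits: 120, nb_actions: 340 },
  { label: "France", nb_visits: 95 },
];

// Consumers can aggregate over rows without knowing which report produced them.
const totalVisits = sample.reduce((sum, row) => sum + row.nb_visits, 0);
console.log(totalVisits);
```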
plugins/*/Reports/ — PHP arrays defining report properties (name, module, action, dimension, metrics, category, subcategory, order, documentation, processedMetrics, default visualizations)
Defined statically in plugin Report classes, collected during plugin loading, used to build menus and determine report processing
plugins/CoreHome/vue/ — TypeScript objects with reportData (DataTable as JSON), reportMetadata (report config), requestConfig (period, site, segment), and visualization-specific options
Assembled by combining API report data with metadata and user preferences, passed to Vue components for rendering charts and tables
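The request configuration described above might be typed roughly as follows. idSite, period, date, and segment are Matomo's standard reporting parameters, but the RequestConfig interface and toQueryString helper are illustrative inventions, not Matomo's actual types.

```typescript
// Hypothetical shape of the request configuration passed to visualization
// components; field names mirror Matomo's reporting parameters.
interface RequestConfig {
  idSite: number;          // which site to report on
  period: "day" | "week" | "month" | "year" | "range";
  date: string;            // e.g. "2026-04-01", "yesterday", or "last30"
  segment?: string;        // optional visitor segment expression
}

// Illustrative helper: turn a config into the query string an API call uses.
function toQueryString(cfg: RequestConfig): string {
  const params = new URLSearchParams({
    idSite: String(cfg.idSite),
    period: cfg.period,
    date: cfg.date,
  });
  if (cfg.segment) params.set("segment", cfg.segment);
  return params.toString();
}

console.log(toQueryString({ idSite: 1, period: "day", date: "yesterday" }));
```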
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
JavaScript tracking code always sends HTTP requests with required parameters (idsite, rec=1) and proper encoding, but Tracker doesn't validate parameter presence before processing
If this fails: Missing idsite parameter would cause database insertion failures or tracking data attributed to wrong sites, while missing rec parameter might skip tracking entirely without clear error messages
core/Tracker/Tracker.php:main tracking flow
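A guard against this assumption could look like the following sketch (not Matomo's actual code): reject a request up front when idsite or rec is malformed, instead of failing later at database insertion.

```typescript
// Sketch of the parameter validation the assumption above says is missing.
// Illustrative only; returns human-readable errors rather than failing silently.
function validateTrackingParams(params: URLSearchParams): string[] {
  const errors: string[] = [];
  const idsite = params.get("idsite");
  if (!idsite || !/^\d+$/.test(idsite) || Number(idsite) < 1) {
    errors.push("idsite must be a positive integer");
  }
  if (params.get("rec") !== "1") {
    errors.push("rec=1 is required for the request to be recorded");
  }
  return errors;
}

console.log(validateTrackingParams(new URLSearchParams("idsite=1&rec=1"))); // no errors
console.log(validateTrackingParams(new URLSearchParams("rec=1")));          // missing idsite
```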
Archive processing completes before the next scheduled run begins, with no overlap detection or queue management for long-running archiving jobs
If this fails: Multiple CronArchive instances could process the same data simultaneously, leading to duplicate calculations, race conditions in archive table updates, or incomplete/corrupted aggregated reports
core/CronArchive/CronArchive.php:archiving scheduler
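One way to close this gap is a timestamped lock with a staleness window, so a second run can detect a live run and skip it, while still reclaiming locks left behind by a crashed job. The RunLock class below is a purely illustrative sketch, not Matomo's locking mechanism.

```typescript
// Illustrative overlap detection for a scheduled job: a lock that records
// when it was taken and treats old locks as reclaimable.
class RunLock {
  private heldSince: number | null = null;
  constructor(private staleAfterMs: number) {}

  tryAcquire(now: number): boolean {
    if (this.heldSince !== null && now - this.heldSince < this.staleAfterMs) {
      return false; // a live run already holds the lock — overlap detected
    }
    this.heldSince = now; // lock was free, or stale enough to reclaim
    return true;
  }

  release(): void {
    this.heldSince = null;
  }
}

const lock = new RunLock(60 * 60 * 1000); // consider a run stale after one hour
console.log(lock.tryAcquire(0));    // first run starts
console.log(lock.tryAcquire(1000)); // second run is refused
lock.release();
console.log(lock.tryAcquire(2000)); // next run may start
```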
Log tables (log_visit, log_link_visit_action) contain manageable amounts of data that can be processed in single queries without memory limits or timeouts
If this fails: Sites with millions of daily visits would cause archiving queries to exceed PHP memory limits or MySQL query timeouts, resulting in incomplete archives and missing report data
core/ArchiveProcessor/ArchiveProcessor.php:log table queries
Archive data retrieved from archive_blob tables contains properly serialized DataTable objects with expected column structure (label, nb_visits, nb_actions, etc.)
If this fails: Corrupted or differently-structured archived data would cause report generation to fail silently or display wrong metrics, especially after plugin updates that change report schemas
plugins/*/Reports/*.php:DataTable construction
MySQL database remains available and responsive throughout request processing, with no connection pooling or retry logic for temporary network issues
If this fails: Database connection drops during long-running archiving jobs would cause partial data loss and require manual recovery, while connection issues during tracking would result in lost visitor data
core/Db/Adapter.php:MySQL connection handling
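The missing retry logic could be sketched as a small wrapper. withRetry is a hypothetical helper, shown synchronously for simplicity; real database retries would also back off between attempts and distinguish transient errors from permanent ones.

```typescript
// Sketch of retry-on-transient-failure: run an operation up to `attempts`
// times, rethrowing the last error only when all attempts fail.
function withRetry<T>(op: () => T, attempts: number): T {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return op();
    } catch (e) {
      lastError = e; // e.g. a transient "server has gone away" error
    }
  }
  throw lastError;
}

// Simulated flaky operation: fails twice, then succeeds on the third call.
let calls = 0;
const result = withRetry(() => {
  calls += 1;
  if (calls < 3) throw new Error("connection lost");
  return "ok";
}, 5);
console.log(result, calls);
```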
Archive invalidation events are processed in the order they're created, ensuring dependent archives are recalculated after their dependencies
If this fails: Out-of-order invalidation could cause child archives to be recalculated with stale parent data, leading to inconsistent report hierarchies and incorrect drill-down analytics
core/Archive/ArchiveInvalidator.php:invalidation processing
All visitor IP addresses are valid IPv4/IPv6 addresses that exist in the GeoIP2 database, without handling for private networks, VPNs, or proxy servers
If this fails: Corporate users behind NAT or VPN would be geolocated to incorrect countries, while IPv6 addresses might fail lookup entirely, skewing geographic reports
plugins/GeoIp2/:IP geolocation
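A pre-lookup guard for this assumption might skip reserved IPv4 ranges entirely. isPrivateIPv4 below is an illustrative helper covering the RFC 1918 blocks plus loopback; it is not Matomo's actual handling, which would also need IPv6 and further reserved ranges.

```typescript
// Illustrative pre-lookup check: private/reserved IPv4 addresses will never
// resolve to a meaningful location, so skip GeoIP resolution for them.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p) || p < 0 || p > 255)) {
    return false; // not a well-formed IPv4 address
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    a === 127                            // 127.0.0.0/8 loopback
  );
}

console.log(isPrivateIPv4("192.168.1.5")); // private — skip lookup
console.log(isPrivateIPv4("8.8.8.8"));     // public — safe to geolocate
```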
Plugin classes contain methods matching the API request format (PluginName.methodName) and return DataTable objects or primitive values
If this fails: Plugin methods that return unexpected types or throw exceptions would cause API responses to fail without helpful error messages, breaking dashboard widgets and report displays
core/API/Proxy.php:plugin method routing
All plugin directories contain valid plugin.json files with required metadata (name, version, php version) and PHP files are syntactically correct
If this fails: Malformed plugin files would cause the entire plugin system to fail loading, potentially breaking the entire Matomo installation during startup
core/Plugin/Manager.php:plugin loading
User's browser clock is reasonably accurate for timestamp generation, and tracking requests are sent within a reasonable time window of the actual page view
If this fails: Users with significantly wrong system clocks would generate tracking data with incorrect timestamps, skewing time-based reports and making visitor session reconstruction unreliable
matomo.js:JavaScript tracking
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Log Tables — Raw tracking data accumulates here as visitors interact with tracked websites: sessions in log_visit, page views in log_link_visit_action, conversions in log_conversion
- Archive Tables — Pre-calculated report data cached here by period and segment: numeric metrics in archive_numeric, complex data structures in archive_blob
- Plugin Registry — Loaded plugin instances and their metadata accumulate here during application bootstrap, enabling dynamic report and visualization discovery
- Session Storage — User authentication state and preferences stored here for web interface sessions
Feedback Loops
- Archive Invalidation (cache-invalidation, balancing) — Trigger: New tracking data arrives. Action: ArchiveInvalidator marks affected archives as outdated and schedules re-processing. Exit: All dependent archives are recalculated.
- Scheduled Archiving (polling, balancing) — Trigger: Cron job runs or API request detects missing archive. Action: CronArchive checks for sites needing processing and spawns ArchiveProcessor instances. Exit: All required archives are up to date.
- Plugin Loading (self-correction, balancing) — Trigger: Missing plugin dependency detected. Action: Plugin Manager loads required plugins and rebuilds dependency graph. Exit: All plugin dependencies satisfied.
Delays
- Archive Processing (batch-window, configurable intervals: hourly/daily) — Report data may be stale until the next archiving cycle completes
- GeoIP Lookup (async-processing, network-dependent) — Geolocation lookups during tracking add latency to request processing
- Database Archiving (eventual-consistency, minutes to hours for large datasets) — Recent tracking data is not visible in reports until processing completes
Control Points
- Archive Processing Mode (feature-flag) — Controls: Whether archives are processed on-demand during API requests or only via scheduled jobs
- Tracking Configuration (env-var) — Controls: Which visitor data is collected (cookies, user ID, custom dimensions, goals)
- Plugin Activation (runtime-toggle) — Controls: Which analytics features are enabled and visible in the interface
- Privacy Settings (threshold) — Controls: Data retention periods, IP anonymization level, and visitor consent handling
Technology Stack
- PHP — Server-side runtime for all backend logic including tracking, data processing, and API serving
- MySQL — Primary data store for raw tracking data, aggregated archives, and configuration settings
- Vue.js — Frontend framework for building interactive dashboards and administration interfaces
- jQuery — DOM manipulation and AJAX requests in legacy frontend code and tracking scripts
- GeoIP2 — IP geolocation service for enriching visitor data with country/city information
- DeviceDetector — User-agent parsing to identify visitor devices, browsers, and operating systems
- Twig — Template engine for rendering HTML in PHP backend components
- PHPUnit — Backend testing framework for unit and integration tests
Key Components
- Tracker (processor) — Receives tracking requests from JavaScript, validates data, enriches with IP geolocation and device detection, then stores visits and actions in log tables
  core/Tracker/Tracker.php
- ArchiveProcessor (processor) — Transforms raw log data into aggregated metrics for specific periods and sites, calculating totals, unique counts, and complex reports through plugin-specific aggregation logic
  core/ArchiveProcessor/ArchiveProcessor.php
- CronArchive (scheduler) — Orchestrates the archiving process by identifying which sites and periods need processing, spawning ArchiveProcessor instances, and managing concurrency limits
  core/CronArchive/CronArchive.php
- API (gateway) — Routes API requests to appropriate plugin methods, handles authentication, applies common filters, and serializes responses to JSON/XML/CSV formats
  core/API/Proxy.php
- Plugin Manager (registry) — Discovers and loads plugins from the plugins directory, manages their lifecycle and dependency injection, and provides hooks for extending core functionality
  core/Plugin/Manager.php
- DataTable (transformer) — Provides a structured data container with filtering, sorting, and formatting capabilities, used to manipulate report data before visualization
  core/DataTable/DataTable.php
- Request (dispatcher) — Parses incoming HTTP requests, validates parameters, determines which plugin method to call, and coordinates the request/response cycle
  core/Request.php
- ArchiveInvalidator (processor) — Manages archive invalidation when new data arrives, determining which cached reports need to be recalculated and scheduling the appropriate archiving tasks
  core/Archive/ArchiveInvalidator.php
- Db (adapter) — Database abstraction layer providing a consistent interface to MySQL operations, including connection management, query building, and transaction handling
  core/Db/Adapter.php
Frequently Asked Questions
What is matomo used for?
Tracks website visitor behavior and creates privacy-focused analytics reports. matomo-org/matomo is a 9-component fullstack application written in PHP. Data flows through 7 distinct pipeline stages. The codebase contains 3870 files.
How is matomo architected?
matomo is organized into 5 architecture layers: Tracking Layer, Core Framework, Plugin System, Data Processing, and 1 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through matomo?
Data moves through 7 stages: Capture tracking requests → Validate and enrich tracking data → Store raw visits → Process into archives → Generate reports → .... Visitors trigger JavaScript tracking requests to matomo.php; the Tracker validates, enriches, and stores the data in log tables, CronArchive aggregates it into archive tables, and the API serves it as DataTable objects to Vue.js components. This pipeline design reflects a complex multi-stage processing system.
What technologies does matomo use?
The core stack includes PHP (Server-side runtime for all backend logic including tracking, data processing, and API serving), MySQL (Primary data store for raw tracking data, aggregated archives, and configuration settings), Vue.js (Frontend framework for building interactive dashboards and administration interfaces), jQuery (DOM manipulation and AJAX requests in legacy frontend code and tracking scripts), GeoIP2 (IP geolocation service for enriching visitor data with country/city information), DeviceDetector (User-agent parsing to identify visitor devices, browsers, and operating systems), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does matomo have?
matomo exhibits 4 data pools (Log Tables, Archive Tables), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle cache-invalidation and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does matomo use?
4 design patterns detected: Plugin Architecture, Data Table Pattern, Archive-First Design, Multi-Layer Caching.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.