matomo-org/matomo
Matomo is the leading open-source alternative to Google Analytics, giving you complete control and built-in privacy. Easily collect, visualise, and analyse data from websites & apps.
Tracks website visitor behavior and creates privacy-focused analytics reports
Website visitors trigger JavaScript tracking codes that send HTTP requests containing interaction data to matomo.php. The Tracker validates and enriches this data with geolocation and device detection, storing it in MySQL log tables. Scheduled CronArchive jobs read from log tables to generate aggregated reports stored in archive tables. When users request reports through the web interface, the API system retrieves data from archives, formats it into DataTable objects, and serves it to Vue.js components for visualization.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
A 9-component fullstack. 3870 files analyzed. Data flows through 7 distinct pipeline stages.
How Data Flows Through the System
- Capture tracking requests — JavaScript tracking code (matomo.js) captures visitor interactions and sends HTTP POST requests to matomo.php with visitor data including page URL, referrer, custom variables, and goal conversions
- Validate and enrich tracking data — Tracker validates request parameters, applies privacy settings, enriches with IP geolocation via GeoIP2, detects devices/browsers using DeviceDetector, and handles bot filtering [TrackingData → TrackingData]
- Store raw visits — Tracker inserts validated tracking data into MySQL log tables (log_visit for sessions, log_link_visit_action for page views, log_conversion for goals) using the Db adapter [TrackingData → LogTables]
- Process into archives — CronArchive spawns ArchiveProcessor instances that read from log tables, calculate aggregated metrics through plugin-specific logic, and store results in archive_numeric and archive_blob tables [LogTables → ArchiveData]
- Generate reports — API Proxy routes requests to plugin Report classes which retrieve data from Archive tables, construct DataTable objects with filtering and formatting applied, ready for serialization [ArchiveData → DataTable]
- Render visualizations — Vue.js components in plugins/*/vue/ receive DataTable JSON from API endpoints and render them as charts, tables, or custom visualizations using the configured visualization type [DataTable → VisualizationData]
- Display dashboards — CoreHome Vue components compose individual report visualizations into dashboards, handling URL routing, period selection, and user interactions through MatomoUrl and other stores [VisualizationData]
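The first stage above can be sketched as a plain HTTP request. The parameter names (idsite, rec, action_name, url) follow Matomo's documented Tracking HTTP API; the endpoint URL is hypothetical, and the real matomo.js tracker sends many more fields than this minimal sketch.

```typescript
// Sketch of stage 1: composing the HTTP request matomo.js sends to matomo.php.
// Parameter names follow Matomo's Tracking HTTP API; this is a simplification.
function buildTrackingUrl(matomoBase: string, pageUrl: string, pageTitle: string): string {
  const params = new URLSearchParams({
    idsite: "1",            // the site this hit is attributed to
    rec: "1",               // required flag: tells the Tracker to record the hit
    action_name: pageTitle, // label shown in Actions reports
    url: pageUrl,           // the page the visitor is viewing
  });
  return `${matomoBase}/matomo.php?${params.toString()}`;
}

// Hypothetical deployment URL; a real site would use its own Matomo instance.
console.log(buildTrackingUrl("https://analytics.example.com", "https://shop.example.com/cart", "Cart"));
```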
Data Models
The data structures that flow between stages — the contracts that hold the system together.
core/Tracker/ — PHP arrays with visitor info (idsite: int, rec: 1, action_name: string, url: string, urlref: string, _id: visitor_id, _idts: timestamp, custom variables, goal data, ecommerce items)
Created from JavaScript tracker HTTP requests, validated and enriched with IP geolocation and device detection, then inserted into log tables
core/DataAccess/ — MySQL tables (log_visit: visitor sessions, log_link_visit_action: page views/downloads/outlinks, log_conversion: goal completions, log_conversion_item: ecommerce items)
Raw tracking data is inserted into these tables, then read during archiving to calculate aggregated metrics
core/Archive/ — Serialized PHP objects in archive_numeric (single values) and archive_blob (complex data like DataTable) tables, keyed by period, site, segment, and metric name
Generated from log tables during scheduled archiving, cached for fast report serving, invalidated when new data arrives
core/DataTable/ — PHP object with rows (each containing columns: label, nb_visits, nb_actions, conversion_rate, etc.) plus metadata for totals, filters applied, and period info
Constructed from archived data when reports are requested, transformed through filters and formatters, then serialized to JSON for API responses
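The DataTable contract above can be sketched as a TypeScript shape. The interface itself is hypothetical, but the column names (label, nb_visits, nb_actions, conversion_rate) are the ones listed above.

```typescript
// Hypothetical typing of a DataTable row as serialized to JSON for the
// Reporting API; column names mirror those listed above.
interface DataTableRow {
  label: string;            // dimension value, e.g. a page URL or country
  nb_visits: number;        // visit count for this row
  nb_actions?: number;      // optional: not every report carries every metric
  conversion_rate?: string; // formatted metrics arrive as strings, e.g. "3.2%"
}

// Once serialized, a report response is essentially an array of rows.
const sample: DataTableRow[] = [
  { label: "Germany", nb_visits: 120, nb_actions: 340 },
  { label: "France", nb_visits: 95 },
];

// Consumers can aggregate over rows without knowing which report produced them.
const totalVisits = sample.reduce((sum, row) => sum + row.nb_visits, 0);
console.log(totalVisits);
```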
plugins/*/Reports/ — PHP arrays defining report properties (name, module, action, dimension, metrics, category, subcategory, order, documentation, processedMetrics, default visualizations)
Defined statically in plugin Report classes, collected during plugin loading, used to build menus and determine report processing
plugins/CoreHome/vue/ — TypeScript objects with reportData (DataTable as JSON), reportMetadata (report config), requestConfig (period, site, segment), and visualization-specific options
Assembled by combining API report data with metadata and user preferences, passed to Vue components for rendering charts and tables
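The request configuration described above might be typed roughly as follows. idSite, period, date, and segment are Matomo's standard reporting parameters, but the RequestConfig interface and toQueryString helper are illustrative inventions, not Matomo's actual types.

```typescript
// Hypothetical shape of the request configuration passed to visualization
// components; field names mirror Matomo's reporting parameters.
interface RequestConfig {
  idSite: number;          // which site to report on
  period: "day" | "week" | "month" | "year" | "range";
  date: string;            // e.g. "2026-04-01", "yesterday", or "last30"
  segment?: string;        // optional visitor segment expression
}

// Illustrative helper: turn a config into the query string an API call uses.
function toQueryString(cfg: RequestConfig): string {
  const params = new URLSearchParams({
    idSite: String(cfg.idSite),
    period: cfg.period,
    date: cfg.date,
  });
  if (cfg.segment) params.set("segment", cfg.segment);
  return params.toString();
}

console.log(toQueryString({ idSite: 1, period: "day", date: "yesterday" }));
```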
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
JavaScript tracking code always sends HTTP requests with required parameters (idsite, rec=1) and proper encoding, but Tracker doesn't validate parameter presence before processing
If this fails: Missing idsite parameter would cause database insertion failures or tracking data attributed to wrong sites, while missing rec parameter might skip tracking entirely without clear error messages
core/Tracker/Tracker.php:main tracking flow
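A guard against this assumption could look like the following sketch (not Matomo's actual code): reject a request up front when idsite or rec is malformed, instead of failing later at database insertion.

```typescript
// Sketch of the parameter validation the assumption above says is missing.
// Illustrative only; returns human-readable errors rather than failing silently.
function validateTrackingParams(params: URLSearchParams): string[] {
  const errors: string[] = [];
  const idsite = params.get("idsite");
  if (!idsite || !/^\d+$/.test(idsite) || Number(idsite) < 1) {
    errors.push("idsite must be a positive integer");
  }
  if (params.get("rec") !== "1") {
    errors.push("rec=1 is required for the request to be recorded");
  }
  return errors;
}

console.log(validateTrackingParams(new URLSearchParams("idsite=1&rec=1"))); // no errors
console.log(validateTrackingParams(new URLSearchParams("rec=1")));          // missing idsite
```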
Archive processing completes before the next scheduled run begins, with no overlap detection or queue management for long-running archiving jobs
If this fails: Multiple CronArchive instances could process the same data simultaneously, leading to duplicate calculations, race conditions in archive table updates, or incomplete/corrupted aggregated reports
core/CronArchive/CronArchive.php:archiving scheduler
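One way to close this gap is a timestamped lock with a staleness window, so a second run can detect a live run and skip it, while still reclaiming locks left behind by a crashed job. The RunLock class below is a purely illustrative sketch, not Matomo's locking mechanism.

```typescript
// Illustrative overlap detection for a scheduled job: a lock that records
// when it was taken and treats old locks as reclaimable.
class RunLock {
  private heldSince: number | null = null;
  constructor(private staleAfterMs: number) {}

  tryAcquire(now: number): boolean {
    if (this.heldSince !== null && now - this.heldSince < this.staleAfterMs) {
      return false; // a live run already holds the lock — overlap detected
    }
    this.heldSince = now; // lock was free, or stale enough to reclaim
    return true;
  }

  release(): void {
    this.heldSince = null;
  }
}

const lock = new RunLock(60 * 60 * 1000); // consider a run stale after one hour
console.log(lock.tryAcquire(0));    // first run starts
console.log(lock.tryAcquire(1000)); // second run is refused
lock.release();
console.log(lock.tryAcquire(2000)); // next run may start
```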
Log tables (log_visit, log_link_visit_action) contain manageable amounts of data that can be processed in single queries without memory limits or timeouts
If this fails: Sites with millions of daily visits would cause archiving queries to exceed PHP memory limits or MySQL query timeouts, resulting in incomplete archives and missing report data
core/ArchiveProcessor/ArchiveProcessor.php:log table queries
Archive data retrieved from archive_blob tables contains properly serialized DataTable objects with expected column structure (label, nb_visits, nb_actions, etc.)
If this fails: Corrupted or differently-structured archived data would cause report generation to fail silently or display wrong metrics, especially after plugin updates that change report schemas
plugins/*/Reports/*.php:DataTable construction
MySQL database remains available and responsive throughout request processing, with no connection pooling or retry logic for temporary network issues
If this fails: Database connection drops during long-running archiving jobs would cause partial data loss and require manual recovery, while connection issues during tracking would result in lost visitor data
core/Db/Adapter.php:MySQL connection handling
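The missing retry logic could be sketched as a small wrapper. withRetry is a hypothetical helper, shown synchronously for simplicity; real database retries would also back off between attempts and distinguish transient errors from permanent ones.

```typescript
// Sketch of retry-on-transient-failure: run an operation up to `attempts`
// times, rethrowing the last error only when all attempts fail.
function withRetry<T>(op: () => T, attempts: number): T {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return op();
    } catch (e) {
      lastError = e; // e.g. a transient "server has gone away" error
    }
  }
  throw lastError;
}

// Simulated flaky operation: fails twice, then succeeds on the third call.
let calls = 0;
const result = withRetry(() => {
  calls += 1;
  if (calls < 3) throw new Error("connection lost");
  return "ok";
}, 5);
console.log(result, calls);
```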
Archive invalidation events are processed in the order they're created, ensuring dependent archives are recalculated after their dependencies
If this fails: Out-of-order invalidation could cause child archives to be recalculated with stale parent data, leading to inconsistent report hierarchies and incorrect drill-down analytics
core/Archive/ArchiveInvalidator.php:invalidation processing
All visitor IP addresses are valid IPv4/IPv6 addresses that exist in the GeoIP2 database, without handling for private networks, VPNs, or proxy servers
If this fails: Corporate users behind NAT or VPN would be geolocated to incorrect countries, while IPv6 addresses might fail lookup entirely, skewing geographic reports
plugins/GeoIp2/:IP geolocation
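A pre-lookup guard for this assumption might skip reserved IPv4 ranges entirely. isPrivateIPv4 below is an illustrative helper covering the RFC 1918 blocks plus loopback; it is not Matomo's actual handling, which would also need IPv6 and further reserved ranges.

```typescript
// Illustrative pre-lookup check: private/reserved IPv4 addresses will never
// resolve to a meaningful location, so skip GeoIP resolution for them.
function isPrivateIPv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p) || p < 0 || p > 255)) {
    return false; // not a well-formed IPv4 address
  }
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    a === 127                            // 127.0.0.0/8 loopback
  );
}

console.log(isPrivateIPv4("192.168.1.5")); // private — skip lookup
console.log(isPrivateIPv4("8.8.8.8"));     // public — safe to geolocate
```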
Plugin classes contain methods matching the API request format (PluginName.methodName) and return DataTable objects or primitive values
If this fails: Plugin methods that return unexpected types or throw exceptions would cause API responses to fail without helpful error messages, breaking dashboard widgets and report displays
core/API/Proxy.php:plugin method routing
All plugin directories contain valid plugin.json files with required metadata (name, version, php version) and PHP files are syntactically correct
If this fails: Malformed plugin files would cause the entire plugin system to fail loading, potentially breaking the entire Matomo installation during startup
core/Plugin/Manager.php:plugin loading
User's browser clock is reasonably accurate for timestamp generation, and tracking requests are sent within a reasonable time window of the actual page view
If this fails: Users with significantly wrong system clocks would generate tracking data with incorrect timestamps, skewing time-based reports and making visitor session reconstruction unreliable
matomo.js:JavaScript tracking
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Log Tables — Raw tracking data accumulates here as visitors interact with tracked websites: sessions in log_visit, page views in log_link_visit_action, conversions in log_conversion
- Archive Tables — Pre-calculated report data cached here by period and segment: numeric metrics in archive_numeric, complex data structures in archive_blob
- Plugin Registry — Loaded plugin instances and their metadata accumulate here during application bootstrap, enabling dynamic report and visualization discovery
- Session Storage — User authentication state and preferences stored here for web interface sessions
Feedback Loops
- Archive Invalidation (cache-invalidation, balancing) — Trigger: New tracking data arrives. Action: ArchiveInvalidator marks affected archives as outdated and schedules re-processing. Exit: All dependent archives are recalculated.
- Scheduled Archiving (polling, balancing) — Trigger: Cron job runs or API request detects missing archive. Action: CronArchive checks for sites needing processing and spawns ArchiveProcessor instances. Exit: All required archives are up to date.
- Plugin Loading (self-correction, balancing) — Trigger: Missing plugin dependency detected. Action: Plugin Manager loads required plugins and rebuilds dependency graph. Exit: All plugin dependencies satisfied.
Delays
- Archive Processing (batch-window, configurable intervals: hourly/daily) — Report data may be stale until the next archiving cycle completes
- GeoIP Lookup (async-processing, network-dependent) — Geolocation lookups during tracking add latency to request processing
- Database Archiving (eventual-consistency, minutes to hours for large datasets) — Recent tracking data is not visible in reports until processing completes
Control Points
- Archive Processing Mode (feature-flag) — Controls: Whether archives are processed on-demand during API requests or only via scheduled jobs
- Tracking Configuration (env-var) — Controls: Which visitor data is collected (cookies, user ID, custom dimensions, goals)
- Plugin Activation (runtime-toggle) — Controls: Which analytics features are enabled and visible in the interface
- Privacy Settings (threshold) — Controls: Data retention periods, IP anonymization level, and visitor consent handling
Technology Stack
- PHP — Server-side runtime for all backend logic including tracking, data processing, and API serving
- MySQL — Primary data store for raw tracking data, aggregated archives, and configuration settings
- Vue.js — Frontend framework for building interactive dashboards and administration interfaces
- jQuery — DOM manipulation and AJAX requests in legacy frontend code and tracking scripts
- GeoIP2 — IP geolocation service for enriching visitor data with country/city information
- DeviceDetector — User-agent parsing to identify visitor devices, browsers, and operating systems
- Twig — Template engine for rendering HTML in PHP backend components
- PHPUnit — Backend testing framework for unit and integration tests
Key Components
- Tracker (processor) — Receives tracking requests from JavaScript, validates data, enriches with IP geolocation and device detection, then stores visits and actions in log tables
  core/Tracker/Tracker.php
- ArchiveProcessor (processor) — Transforms raw log data into aggregated metrics for specific periods and sites, calculating totals, unique counts, and complex reports through plugin-specific aggregation logic
  core/ArchiveProcessor/ArchiveProcessor.php
- CronArchive (scheduler) — Orchestrates the archiving process by identifying which sites and periods need processing, spawning ArchiveProcessor instances, and managing concurrency limits
  core/CronArchive/CronArchive.php
- API (gateway) — Routes API requests to appropriate plugin methods, handles authentication, applies common filters, and serializes responses to JSON/XML/CSV formats
  core/API/Proxy.php
- Plugin Manager (registry) — Discovers and loads plugins from the plugins directory, manages their lifecycle and dependency injection, and provides hooks for extending core functionality
  core/Plugin/Manager.php
- DataTable (transformer) — Provides a structured data container with filtering, sorting, and formatting capabilities, used to manipulate report data before visualization
  core/DataTable/DataTable.php
- Request (dispatcher) — Parses incoming HTTP requests, validates parameters, determines which plugin method to call, and coordinates the request/response cycle
  core/Request.php
- ArchiveInvalidator (processor) — Manages archive invalidation when new data arrives, determining which cached reports need to be recalculated and scheduling the appropriate archiving tasks
  core/Archive/ArchiveInvalidator.php
- Db (adapter) — Database abstraction layer providing a consistent interface to MySQL operations, including connection management, query building, and transaction handling
  core/Db/Adapter.php
Frequently Asked Questions
What is matomo used for?
Tracks website visitor behavior and creates privacy-focused analytics reports. matomo-org/matomo is a 9-component fullstack application written in PHP. Data flows through 7 distinct pipeline stages. The codebase contains 3870 files.
How is matomo architected?
matomo is organized into 5 architecture layers: Tracking Layer, Core Framework, Plugin System, Data Processing, and 1 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through matomo?
Data moves through 7 stages: Capture tracking requests → Validate and enrich tracking data → Store raw visits → Process into archives → Generate reports → .... Visitors trigger JavaScript tracking requests to matomo.php; the Tracker validates, enriches, and stores the data in log tables, CronArchive aggregates it into archive tables, and the API serves it as DataTable objects to Vue.js components. This pipeline design reflects a complex multi-stage processing system.
What technologies does matomo use?
The core stack includes PHP (Server-side runtime for all backend logic including tracking, data processing, and API serving), MySQL (Primary data store for raw tracking data, aggregated archives, and configuration settings), Vue.js (Frontend framework for building interactive dashboards and administration interfaces), jQuery (DOM manipulation and AJAX requests in legacy frontend code and tracking scripts), GeoIP2 (IP geolocation service for enriching visitor data with country/city information), DeviceDetector (User-agent parsing to identify visitor devices, browsers, and operating systems), and 2 more. A focused set of dependencies that keeps the build manageable.
What system dynamics does matomo have?
matomo exhibits 4 data pools (Log Tables, Archive Tables), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle cache-invalidation and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does matomo use?
4 design patterns detected: Plugin Architecture, Data Table Pattern, Archive-First Design, Multi-Layer Caching.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.