matomo-org/matomo

Empowering People Ethically 🚀 — Matomo is hiring! Join us → https://matomo.org/jobs. Matomo is the leading open-source alternative to Google Analytics, giving you complete control and built-in privacy. Easily collect, visualise, and analyse data from websites & apps. Star us on GitHub ⭐️ – Pull Requests welcome!

21,436 stars · PHP · 9 components

Tracks website visitor behavior and creates privacy-focused analytics reports

Website visitors trigger JavaScript tracking codes that send HTTP requests containing interaction data to matomo.php. The Tracker validates and enriches this data with geolocation and device detection, storing it in MySQL log tables. Scheduled CronArchive jobs read from log tables to generate aggregated reports stored in archive tables. When users request reports through the web interface, the API system retrieves data from archives, formats it into DataTable objects, and serves it to Vue.js components for visualization.

Under the hood, the system uses 3 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.

A 9-component fullstack. 3870 files analyzed. Data flows through 7 distinct pipeline stages.

How Data Flows Through the System

  1. Capture tracking requests — JavaScript tracking code (matomo.js) captures visitor interactions and sends HTTP POST requests to matomo.php with visitor data including page URL, referrer, custom variables, and goal conversions
  2. Validate and enrich tracking data — Tracker validates request parameters, applies privacy settings, enriches with IP geolocation via GeoIP2, detects devices/browsers using DeviceDetector, and handles bot filtering [TrackingData → TrackingData]
  3. Store raw visits — Tracker inserts validated tracking data into MySQL log tables (log_visit for sessions, log_link_visit_action for page views, log_conversion for goals) using the Db adapter [TrackingData → LogTables]
  4. Process into archives — CronArchive spawns ArchiveProcessor instances that read from log tables, calculate aggregated metrics through plugin-specific logic, and store results in archive_numeric and archive_blob tables [LogTables → ArchiveData]
  5. Generate reports — API Proxy routes requests to plugin Report classes which retrieve data from Archive tables, construct DataTable objects with filtering and formatting applied, ready for serialization [ArchiveData → DataTable]
  6. Render visualizations — Vue.js components in plugins/*/vue/ receive DataTable JSON from API endpoints and render them as charts, tables, or custom visualizations using the configured visualization type [DataTable → VisualizationData]
  7. Display dashboards — CoreHome Vue components compose individual report visualizations into dashboards, handling URL routing, period selection, and user interactions through MatomoUrl and other stores [VisualizationData]
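Step 1 of the pipeline can be sketched as assembling a tracking request. The parameter names (idsite, rec, action_name, url) come from the pipeline description above; `buildTrackingUrl` itself is a hypothetical helper, not Matomo's actual API:

```javascript
// Hypothetical sketch of step 1: building a tracking request URL
// the way a JavaScript tracker would before sending it to matomo.php.
function buildTrackingUrl(endpoint, params) {
  const query = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join('&');
  return `${endpoint}?${query}`;
}

const url = buildTrackingUrl('https://example.org/matomo.php', {
  idsite: 1,            // which tracked site this hit belongs to
  rec: 1,               // flag: record this request
  action_name: 'Home',  // page title
  url: 'https://example.org/',
});
```

A real tracker would send this via an HTTP POST (or image beacon) rather than exposing the URL directly.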

Data Models

The data structures that flow between stages — the contracts that hold the system together.

TrackingData core/Tracker/
PHP arrays with visitor info (idsite: int, rec: 1, action_name: string, url: string, urlref: string, _id: visitor_id, _idts: timestamp, custom variables, goal data, ecommerce items)
Created from JavaScript tracker HTTP requests, validated and enriched with IP geolocation and device detection, then inserted into log tables
LogTables core/DataAccess/
MySQL tables (log_visit: visitor sessions, log_link_visit_action: page views/downloads/outlinks, log_conversion: goal completions, log_conversion_item: ecommerce items)
Raw tracking data is inserted into these tables, then read during archiving to calculate aggregated metrics
ArchiveData core/Archive/
Serialized PHP objects in archive_numeric (single values) and archive_blob (complex data like DataTable) tables, keyed by period, site, segment, and metric name
Generated from log tables during scheduled archiving, cached for fast report serving, invalidated when new data arrives
DataTable core/DataTable/
PHP object with rows (each containing columns: label, nb_visits, nb_actions, conversion_rate, etc.) plus metadata for totals, filters applied, and period info
Constructed from archived data when reports are requested, transformed through filters and formatters, then serialized to JSON for API responses
ReportMetadata plugins/*/Reports/
PHP arrays defining report properties (name, module, action, dimension, metrics, category, subcategory, order, documentation, processedMetrics, default visualizations)
Defined statically in plugin Report classes, collected during plugin loading, used to build menus and determine report processing
VisualizationData plugins/CoreHome/vue/
TypeScript objects with reportData (DataTable as JSON), reportMetadata (report config), requestConfig (period, site, segment), and visualization-specific options
Assembled by combining API report data with metadata and user preferences, passed to Vue components for rendering charts and tables
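To make the DataTable contract above concrete, here is a hedged sketch of what a serialized DataTable might look like as JSON on the Vue side. The column names (label, nb_visits, nb_actions) come from the model list above; the exact serialized shape is an assumption:

```javascript
// Hypothetical minimal shape of a DataTable serialized to JSON,
// as a Vue component might receive it from an API endpoint.
const dataTableJson = [
  { label: '/home',    nb_visits: 120, nb_actions: 340 },
  { label: '/pricing', nb_visits: 45,  nb_actions: 80 },
];

// A consumer can derive totals before rendering a chart or table:
const totalVisits = dataTableJson.reduce((sum, row) => sum + row.nb_visits, 0);
```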

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Contract weakly guarded

JavaScript tracking code always sends HTTP requests with required parameters (idsite, rec=1) and proper encoding, but Tracker doesn't validate parameter presence before processing

If this fails: A missing idsite parameter would cause database insertion failures or tracking data attributed to the wrong site, while a missing rec parameter might skip tracking entirely without a clear error message

core/Tracker/Tracker.php:main tracking flow
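A minimal guard for this assumption would check parameter presence before any processing. This sketch is hypothetical (Matomo's tracker is PHP; `validateTrackingParams` is not its actual API), but it shows the shape of the missing check:

```javascript
// Hypothetical presence check for required tracking parameters.
function validateTrackingParams(params) {
  if (!params.idsite || Number.isNaN(Number(params.idsite))) {
    throw new Error('Missing or non-numeric idsite');
  }
  if (String(params.rec) !== '1') {
    throw new Error('rec=1 is required to record this request');
  }
  return { idsite: Number(params.idsite), rec: 1 };
}
```

Rejecting the request loudly at the boundary beats a silent wrong-site insert further down the pipeline.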
critical Temporal unguarded

Archive processing completes before the next scheduled run begins, with no overlap detection or queue management for long-running archiving jobs

If this fails: Multiple CronArchive instances could process the same data simultaneously, leading to duplicate calculations, race conditions in archive table updates, or incomplete/corrupted aggregated reports

core/CronArchive/CronArchive.php:archiving scheduler
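Overlap detection is commonly done with an advisory lock plus a TTL so that a crashed run cannot hold the lock forever. A minimal sketch, assuming an in-memory Map stands in for a shared store (a DB row or lock file in practice); all names are hypothetical:

```javascript
// Hypothetical advisory lock with TTL to prevent overlapping archive runs.
const locks = new Map(); // lockName -> expiry timestamp (ms)

function tryAcquireLock(name, ttlMs, now = Date.now()) {
  const expiry = locks.get(name);
  if (expiry !== undefined && expiry > now) return false; // still held
  locks.set(name, now + ttlMs); // acquire (or steal an expired lock)
  return true;
}

// First archiver wins; a second concurrent run for the same scope is refused.
const first = tryAcquireLock('archive:site1:day', 60_000);
const second = tryAcquireLock('archive:site1:day', 60_000);
```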
critical Scale unguarded

Log tables (log_visit, log_link_visit_action) contain manageable amounts of data that can be processed in single queries without memory limits or timeouts

If this fails: Sites with millions of daily visits would cause archiving queries to exceed PHP memory limits or MySQL query timeouts, resulting in incomplete archives and missing report data

core/ArchiveProcessor/ArchiveProcessor.php:log table queries
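The usual mitigation is to page through log rows in fixed-size chunks so memory stays bounded regardless of table size. In this sketch `fetchPage` is a stand-in for a LIMIT/OFFSET (or keyset) query against a log table; the aggregation is deliberately trivial:

```javascript
// Hypothetical chunked aggregation: bound memory by paging through rows
// instead of loading an entire log table in one query.
function aggregateInChunks(fetchPage, chunkSize) {
  let offset = 0;
  let totalActions = 0;
  for (;;) {
    const rows = fetchPage(offset, chunkSize); // one bounded "query"
    if (rows.length === 0) break;              // no more rows
    for (const row of rows) totalActions += row.actions;
    offset += rows.length;
  }
  return totalActions;
}

// Fake data source: 5 rows, paged 2 at a time.
const fakeRows = [1, 2, 3, 4, 5].map((n) => ({ actions: n }));
const total = aggregateInChunks(
  (off, limit) => fakeRows.slice(off, off + limit),
  2,
);
```

At scale, keyset pagination (WHERE id > lastSeen) avoids the O(n) cost OFFSET pays on large tables.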
critical Shape weakly guarded

Archive data retrieved from archive_blob tables contains properly serialized DataTable objects with expected column structure (label, nb_visits, nb_actions, etc.)

If this fails: Corrupted or differently-structured archived data would cause report generation to fail silently or display wrong metrics, especially after plugin updates that change report schemas

plugins/*/Reports/*.php:DataTable construction
critical Resource unguarded

MySQL database remains available and responsive throughout request processing, with no connection pooling or retry logic for temporary network issues

If this fails: Database connection drops during long-running archiving jobs would cause partial data loss and require manual recovery, while connection issues during tracking would result in lost visitor data

core/Db/Adapter.php:MySQL connection handling
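The missing retry logic the assumption describes can be sketched as a wrapper that re-attempts a transient failure a bounded number of times. This synchronous version is hypothetical and simplified; real code would sleep with backoff between attempts and distinguish transient errors (connection reset) from permanent ones:

```javascript
// Hypothetical bounded-retry wrapper for a flaky operation.
function withRetry(fn, attempts) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return fn(); // success: return immediately
    } catch (err) {
      lastErr = err; // remember the failure and try again
    }
  }
  throw lastErr; // exhausted all attempts
}

// Simulated query that fails twice, then succeeds.
let calls = 0;
const result = withRetry(() => {
  calls += 1;
  if (calls < 3) throw new Error('connection reset'); // transient failure
  return 'row';
}, 5);
```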
warning Ordering unguarded

Archive invalidation events are processed in the order they're created, ensuring dependent archives are recalculated after their dependencies

If this fails: Out-of-order invalidation could cause child archives to be recalculated with stale parent data, leading to inconsistent report hierarchies and incorrect drill-down analytics

core/Archive/ArchiveInvalidator.php:invalidation processing
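One way to guard this ordering is to process invalidations children-first, so a week archive is only rebuilt after its day archives. The period ranking and record shape here are hypothetical, not Matomo's actual invalidation schema:

```javascript
// Hypothetical ordering: rebuild smaller periods before the periods
// that aggregate them, breaking ties by creation time.
const PERIOD_RANK = { day: 0, week: 1, month: 2, year: 3 };

function orderInvalidations(invalidations) {
  return [...invalidations].sort(
    (a, b) => PERIOD_RANK[a.period] - PERIOD_RANK[b.period] || a.ts - b.ts,
  );
}

const ordered = orderInvalidations([
  { period: 'month', ts: 1 },
  { period: 'day', ts: 2 },
  { period: 'week', ts: 3 },
]);
```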
warning Domain weakly guarded

All visitor IP addresses are valid IPv4/IPv6 addresses that exist in the GeoIP2 database, without handling for private networks, VPNs, or proxy servers

If this fails: Corporate users behind NAT or VPN would be geolocated to incorrect countries, while IPv6 addresses might fail lookup entirely, skewing geographic reports

plugins/GeoIp2/:IP geolocation
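A minimal guard for this is to detect private IPv4 ranges (RFC 1918 plus loopback) before attempting a lookup and label those visitors explicitly instead of geolocating garbage. This sketch is an assumption, not GeoIP2's API, and a real check would also cover IPv6:

```javascript
// Hypothetical private-range check for IPv4 addresses.
function isPrivateIpv4(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p))) return false;
  const [a, b] = parts;
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

// Only look up public addresses; label the rest explicitly.
function geolocate(ip, lookup) {
  return isPrivateIpv4(ip) ? { country: 'unknown' } : lookup(ip);
}
```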
warning Contract weakly guarded

Plugin classes contain methods matching the API request format (PluginName.methodName) and return DataTable objects or primitive values

If this fails: Plugin methods that return unexpected types or throw exceptions would cause API responses to fail without helpful error messages, breaking dashboard widgets and report displays

core/API/Proxy.php:plugin method routing
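The guard this assumption calls for would resolve a `PluginName.methodName` request to a callable and sanity-check the result's shape before serializing. All names below are hypothetical stand-ins for the core/API/Proxy.php logic:

```javascript
// Hypothetical routing guard: resolve "Plugin.method", verify it exists,
// call it, and reject results of unsupported types with a clear error.
function callPluginMethod(plugins, request) {
  const [pluginName, methodName] = request.split('.');
  const plugin = plugins[pluginName];
  if (!plugin || typeof plugin[methodName] !== 'function') {
    throw new Error(`Unknown API method: ${request}`);
  }
  const result = plugin[methodName]();
  const ok = Array.isArray(result) ||
    ['string', 'number', 'boolean'].includes(typeof result);
  if (!ok) throw new Error(`${request} returned an unsupported type`);
  return result;
}

const plugins = { VisitsSummary: { get: () => [{ nb_visits: 42 }] } };
const rows = callPluginMethod(plugins, 'VisitsSummary.get');
```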
warning Environment weakly guarded

All plugin directories contain valid plugin.json files with required metadata (name, version, php version) and PHP files are syntactically correct

If this fails: Malformed plugin files would cause the entire plugin system to fail loading, potentially breaking the entire Matomo installation during startup

core/Plugin/Manager.php:plugin loading
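Validating plugin metadata before activation keeps one malformed file from taking down the whole plugin system. The required fields follow the text above (name, version); `validatePluginJson` is a hypothetical sketch, not Matomo's loader:

```javascript
// Hypothetical plugin.json validation: parse defensively and check
// required fields, returning an error instead of crashing the loader.
function validatePluginJson(raw) {
  let meta;
  try {
    meta = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'plugin.json is not valid JSON' };
  }
  for (const field of ['name', 'version']) {
    if (typeof meta[field] !== 'string' || meta[field] === '') {
      return { ok: false, error: `missing required field: ${field}` };
    }
  }
  return { ok: true, meta };
}
```

A loader using this can skip and report the one broken plugin rather than aborting startup.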
warning Temporal unguarded

The user's browser clock is reasonably accurate for timestamp generation, and tracking requests are sent within a reasonable time window of the actual page view

If this fails: Users with significantly wrong system clocks would generate tracking data with incorrect timestamps, skewing time-based reports and making visitor session reconstruction unreliable

matomo.js:JavaScript tracking
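A common server-side defense against skewed client clocks is to clamp the client-reported timestamp to the server's receipt time when the difference exceeds a threshold. The one-day threshold here is an assumption, and `normalizeTimestamp` is a hypothetical helper:

```javascript
// Hypothetical clock-skew guard: trust the client timestamp only if it
// is within maxSkewSeconds of the server clock, otherwise clamp it.
function normalizeTimestamp(clientTs, serverTs, maxSkewSeconds = 86400) {
  const skew = Math.abs(serverTs - clientTs);
  return skew > maxSkewSeconds ? serverTs : clientTs;
}

const server = 1_700_000_000;
const sane = normalizeTimestamp(server - 30, server);            // small skew: kept
const clamped = normalizeTimestamp(server - 10 * 86400, server); // 10 days off: clamped
```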

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Log Tables (database)
Raw tracking data accumulates here as visitors interact with tracked websites: sessions in log_visit, page views in log_link_visit_action, conversions in log_conversion
Archive Tables (database)
Pre-calculated report data cached here by period and segment: numeric metrics in archive_numeric, complex data structures in archive_blob
Plugin Registry (registry)
Loaded plugin instances and their metadata accumulate here during application bootstrap, enabling dynamic report and visualization discovery
Session Storage (cache)
User authentication state and preferences stored here for web interface sessions


Technology Stack

PHP (runtime)
Server-side runtime for all backend logic including tracking, data processing, and API serving
MySQL (database)
Primary data store for raw tracking data, aggregated archives, and configuration settings
Vue.js (framework)
Frontend framework for building interactive dashboards and administration interfaces
jQuery (library)
DOM manipulation and AJAX requests in legacy frontend code and tracking scripts
GeoIP2 (library)
IP geolocation service for enriching visitor data with country/city information
DeviceDetector (library)
User-agent parsing to identify visitor devices, browsers, and operating systems
Twig (framework)
Template engine for rendering HTML in PHP backend components
PHPUnit (testing)
Backend testing framework for unit and integration tests


Frequently Asked Questions

What is matomo used for?

matomo-org/matomo tracks website visitor behavior and creates privacy-focused analytics reports. It is a 9-component fullstack written in PHP; data flows through 7 distinct pipeline stages, and the codebase contains 3870 files.

How is matomo architected?

matomo is organized into 5 architecture layers: Tracking Layer, Core Framework, Plugin System, Data Processing, and 1 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through matomo?

Data moves through 7 stages: Capture tracking requests → Validate and enrich tracking data → Store raw visits → Process into archives → Generate reports → Render visualizations → Display dashboards. Website visitors trigger JavaScript tracking code that sends HTTP requests to matomo.php; the Tracker validates and enriches the data and stores it in MySQL log tables; scheduled CronArchive jobs aggregate the log tables into archive tables; and the API serves archived data as DataTable objects to Vue.js components for visualization. This pipeline design reflects a complex multi-stage processing system.

What technologies does matomo use?

The core stack includes PHP (server-side runtime for all backend logic including tracking, data processing, and API serving), MySQL (primary data store for raw tracking data, aggregated archives, and configuration settings), Vue.js (frontend framework for building interactive dashboards and administration interfaces), jQuery (DOM manipulation and AJAX requests in legacy frontend code and tracking scripts), GeoIP2 (IP geolocation for enriching visitor data with country/city information), DeviceDetector (user-agent parsing to identify visitor devices, browsers, and operating systems), Twig (template engine for rendering HTML in PHP backend components), and PHPUnit (backend testing framework for unit and integration tests). A focused set of dependencies that keeps the build manageable.

What system dynamics does matomo have?

matomo exhibits 4 data pools (Log Tables, Archive Tables, Plugin Registry, Session Storage), 3 feedback loops, 4 control points, and 3 delays. The feedback loops handle cache invalidation and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does matomo use?

4 design patterns detected: Plugin Architecture, Data Table Pattern, Archive-First Design, Multi-Layer Caching.

Analyzed on April 20, 2026 by CodeSea.