valkey-io/valkey
A flexible distributed key-value database that is optimized for caching and other realtime workloads.
Stores and retrieves key-value pairs with low-latency access patterns and data structure operations
Under the hood, the system uses 4 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.
A 10-component repository. 868 files analyzed. Data flows through 7 distinct pipeline stages.
How Data Flows Through the System
Client connections send Redis protocol commands over TCP, which are parsed into command arguments, validated, and dispatched to data structure operations. The operations modify the in-memory database and generate responses back to clients. Write operations are logged to AOF files and propagated to replicas, while periodic RDB snapshots provide point-in-time backups.
- Accept client connection — The event loop accepts TCP connections and creates Client structures with input/output buffers, associating them with the default database
- Parse Redis protocol — processInputBuffer reads from client querybuf, parses RESP protocol into argc/argv arrays, and validates command syntax [Client → Client]
- Dispatch command — processCommand looks up the command in the command table, checks client authorization, validates argument count, and calls the command implementation function [Client → Client]
- Execute data operation — Command functions like setCommand or getCommand access the ValkeyDb keyspace, create or modify ValkeyObject values, and handle type-specific operations [ValkeyObject → ValkeyObject]
- Log to AOF — feedAppendOnlyFile serializes write commands to the append-only file for persistence, buffering writes and flushing based on fsync policy [Client]
- Replicate to slaves — replicationFeedSlaves sends write commands to connected replica servers over their replication connections [Client]
- Generate response — addReply functions format responses according to RESP protocol and buffer them in the client output buffer for transmission [Client → Client]
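The dispatch stage above is the hinge of this pipeline: a lookup in a command table, an arity check, then a call through a function pointer. The sketch below is an illustrative reduction of that idea, not the real processCommand; the table layout, handler signatures, and the dispatchCommand helper are assumptions made for the example.

```c
/* Minimal sketch of command-table dispatch, loosely modeled on the
 * "Dispatch command" stage above. Names and signatures are illustrative,
 * not the actual Valkey APIs. */
#include <stdio.h>
#include <strings.h>

typedef struct client {
    int argc;
    char **argv;      /* parsed RESP arguments, argv[0] is the command name */
} client;

typedef void commandProc(client *c);

typedef struct command {
    const char *name;
    commandProc *proc;
    int arity;        /* expected argc; negative means "at least |arity|" */
} command;

static void getCommand(client *c) { printf("GET %s\n", c->argv[1]); }
static void setCommand(client *c) { printf("SET %s = %s\n", c->argv[1], c->argv[2]); }

static command commandTable[] = {
    {"get", getCommand, 2},
    {"set", setCommand, -3},
};

/* Look up a command by name, validate arity, then call its handler. */
static int dispatchCommand(client *c) {
    for (size_t i = 0; i < sizeof(commandTable)/sizeof(commandTable[0]); i++) {
        command *cmd = &commandTable[i];
        if (strcasecmp(c->argv[0], cmd->name) != 0) continue;
        int ok = cmd->arity >= 0 ? c->argc == cmd->arity : c->argc >= -cmd->arity;
        if (!ok) return -1;          /* wrong number of arguments */
        cmd->proc(c);
        return 0;
    }
    return -1;                       /* unknown command */
}

int main(void) {
    char *args[] = {"set", "greeting", "hello"};
    client c = { .argc = 3, .argv = args };
    return dispatchCommand(&c) == 0 ? 0 : 1;
}
```

Keeping dispatch table-driven is also what lets moduleLoad (see Key Components) register new commands at runtime without touching the core loop.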
Data Models
The data structures that flow between stages — the contracts that hold the system together.
ValkeyObject (src/server.h) — C struct with type: int (object type), encoding: int (storage format), lru: unsigned (LRU timestamp), refcount: int, ptr: void* (actual data)
Created when values are stored, reference-counted for memory safety, serialized during persistence, and freed when evicted or expired
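A minimal sketch of a reference-counted value object along the lines of the fields listed above. The bitfield widths, helper names, and lack of error handling are simplified assumptions, not the exact definitions in src/server.h.

```c
/* Illustrative reference-counted object mirroring the fields described
 * above (type, encoding, lru, refcount, ptr). Not the exact layout in
 * src/server.h. */
#include <stdlib.h>
#include <string.h>

typedef struct ValkeyObject {
    unsigned type : 4;      /* logical type: string, list, hash, ... */
    unsigned encoding : 4;  /* physical storage format */
    unsigned lru : 24;      /* LRU clock / LFU data used by eviction */
    int refcount;           /* shared ownership count */
    void *ptr;              /* pointer to the actual value */
} ValkeyObject;

static ValkeyObject *createObject(int type, void *ptr) {
    ValkeyObject *o = malloc(sizeof(*o));
    o->type = type;
    o->encoding = 0;
    o->lru = 0;
    o->refcount = 1;        /* caller owns the first reference */
    o->ptr = ptr;
    return o;
}

static void incrRefCount(ValkeyObject *o) { o->refcount++; }

static void decrRefCount(ValkeyObject *o) {
    /* When the last reference is dropped, the payload and the wrapper
     * are freed together. */
    if (--o->refcount == 0) {
        free(o->ptr);
        free(o);
    }
}

int main(void) {
    char *payload = malloc(6);
    memcpy(payload, "hello", 6);
    ValkeyObject *o = createObject(0, payload);
    incrRefCount(o);    /* a second owner, e.g. the keyspace dict */
    decrRefCount(o);
    decrRefCount(o);    /* last reference: payload and wrapper freed */
    return 0;
}
```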
ValkeyDb (src/server.h) — C struct with dict: dict* (main keyspace), expires: dict* (key TTL mapping), avg_ttl: long long, id: int
Initialized at server startup for each logical database, populated by SET/GET commands, cleaned during key expiration, persisted in RDB snapshots
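Pairing the main keyspace with a separate expires dictionary is what makes lazy expiration possible: a key's TTL is only consulted when the key is touched. The toy below compresses that idea into a single-slot "database"; the real code walks two dict instances and is closer to expireIfNeeded in src/db.c. All names here are illustrative.

```c
/* Sketch: lazy expiration on lookup. A real ValkeyDb holds two dicts
 * (keyspace and expires); this toy uses one slot to keep it short. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef struct toyDb {
    const char *key;      /* toy keyspace: a single key */
    const char *value;
    int64_t expire_at_ms; /* absolute expiry in ms; 0 means no TTL */
} toyDb;

static int64_t mstime(void) {
    return (int64_t)time(NULL) * 1000;
}

/* Return the value, deleting the key first if its TTL has passed. */
static const char *dbLookup(toyDb *db, const char *key) {
    if (!db->key || strcmp(db->key, key) != 0) return NULL;
    if (db->expire_at_ms && db->expire_at_ms <= mstime()) {
        db->key = NULL;   /* lazily expire on access */
        db->value = NULL;
        return NULL;
    }
    return db->value;
}

int main(void) {
    toyDb db = { "greeting", "hello", mstime() - 1 };  /* already expired */
    printf("%s\n", dbLookup(&db, "greeting") ? "hit" : "expired");
    return 0;
}
```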
Client (src/server.h) — C struct with fd: int (socket), argv: ValkeyObject** (command args), argc: int, db: ValkeyDb*, querybuf: sds (input buffer), buf: char[16384] (output buffer)
Created on connection accept, populated during command parsing, executes database operations, writes responses, destroyed on disconnect
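Responses are staged in the client's fixed output buffer before the event loop writes them to the socket. Below is a rough sketch of that buffering for a RESP simple-string reply; the struct and helper name are invented for the example.

```c
/* Sketch: queueing a RESP reply into a fixed output buffer, as in the
 * "Generate response" stage. Buffer size matches the description above;
 * everything else is illustrative. */
#include <stdio.h>
#include <string.h>

typedef struct toyClient {
    char buf[16384];   /* fixed output buffer */
    size_t bufpos;     /* bytes already queued for transmission */
} toyClient;

/* Append "+<msg>\r\n" (a RESP simple string). Returns 0 on success,
 * -1 if the reply does not fit in the remaining buffer space. */
static int addReplySimpleString(toyClient *c, const char *msg) {
    int n = snprintf(c->buf + c->bufpos, sizeof(c->buf) - c->bufpos,
                     "+%s\r\n", msg);
    if (n < 0 || (size_t)n >= sizeof(c->buf) - c->bufpos) return -1;
    c->bufpos += (size_t)n;
    return 0;
}

int main(void) {
    toyClient c = { .bufpos = 0 };
    addReplySimpleString(&c, "OK");
    fwrite(c.buf, 1, c.bufpos, stdout);   /* would be written to the socket */
    return 0;
}
```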
dict (src/dict.h) — Hash table with ht: dictht[2] (dual hash tables for rehashing), type: dictType* (callbacks), rehashidx: long (rehash progress)
Powers the main keyspace and hash objects, grows incrementally during rehashing, and distributes keys across buckets with a seeded hash function
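Incremental rehashing is the detail worth internalizing: instead of rehashing the whole table at once, the dict keeps two tables and migrates one bucket per step, tracked by rehashidx. The toy below shows only the stepping logic, with single-entry buckets; the real dict chains colliding entries and spreads the steps across commands and a timer job.

```c
/* Toy sketch of incremental rehashing: keys migrate from ht0 to ht1
 * one bucket per step, tracked by rehashidx, so no single operation
 * pays the whole rehash cost. Buckets hold at most one key here purely
 * to keep the illustration short. */
#include <stdio.h>

#define OLD_SIZE 4
#define NEW_SIZE 8

static const char *ht0[OLD_SIZE] = { "a", NULL, "b", "c" };  /* old table */
static const char *ht1[NEW_SIZE];                            /* new, larger table */
static long rehashidx = 0;              /* next ht0 bucket to migrate; -1 = idle */

static size_t slot(const char *key, size_t size) {
    return (size_t)key[0] % size;       /* toy hash */
}

/* Move at most one non-empty bucket; return 0 once migration is done. */
static int rehashStep(void) {
    if (rehashidx < 0) return 0;
    while (rehashidx < OLD_SIZE && ht0[rehashidx] == NULL) rehashidx++;
    if (rehashidx >= OLD_SIZE) { rehashidx = -1; return 0; }
    const char *key = ht0[rehashidx];
    ht1[slot(key, NEW_SIZE)] = key;
    ht0[rehashidx++] = NULL;
    return 1;
}

int main(void) {
    int steps = 0;
    while (rehashStep()) steps++;       /* in Valkey this is spread over time */
    printf("migrated in %d incremental steps\n", steps);
    return 0;
}
```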
src/rdb.h — C struct with rio: rio* (I/O abstraction), error: int, key_counter: int, keys_processed: long long
Created during BGSAVE or server restart, tracks serialization progress, handles I/O errors, destroyed after completion
Hidden Assumptions
Things this code relies on but never validates. When these assumptions stop holding, the failures are silent rather than explicit.
The histogram index calculations assume integer overflow will not occur when computing normalized_index adjustments, but index values can grow very large with high significant_figures and bucket_count parameters
If this fails: Integer overflow in index calculations causes incorrect bucket placement, silently corrupting histogram data and producing wrong percentile calculations
deps/hdr_histogram/hdr_histogram.c:normalize_index
The counts array length (counts_len) fits within int32_t range, but with highest_trackable_value up to INT64_MAX and small sub_bucket_count, the required array size can exceed 2^31 elements
If this fails: Large histograms silently truncate their counts array, losing data for high-value buckets and producing incorrect statistics
deps/hdr_histogram/hdr_histogram.h:struct hdr_histogram
The zcalloc_num function expects num * size to not overflow size_t, but callers pass arbitrary histogram dimensions without checking the multiplication
If this fails: Integer overflow in size calculation causes undersized memory allocation, leading to buffer overruns when the histogram writes beyond allocated bounds
deps/hdr_histogram/hdr_redis_malloc.h:zcalloc_num
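The guard for that multiplication is a one-line check before allocating. The sketch below shows the idiom with an invented checked_calloc wrapper; it is not the existing zcalloc_num code.

```c
/* Sketch: reject num * size requests that would overflow size_t before
 * they reach the allocator. The wrapper name is illustrative. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void *checked_calloc(size_t num, size_t size) {
    if (size != 0 && num > SIZE_MAX / size) return NULL;  /* would overflow */
    return calloc(num, size);
}

int main(void) {
    /* A request whose byte count would overflow size_t. */
    void *p = checked_calloc(SIZE_MAX / 2, 4);
    printf("%s\n", p ? "allocated" : "rejected oversized request");
    free(p);
    return 0;
}
```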
64-bit atomic loads assume the underlying platform supports naturally aligned 64-bit reads, but on 32-bit systems or with packed structures, reads may not be atomic
If this fails: Torn reads of 64-bit values in multi-threaded environments produce inconsistent histogram state, causing race conditions and incorrect measurement results
deps/hdr_histogram/hdr_atomic.h:hdr_atomic_load_64
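On platforms where a plain 64-bit read can tear, the portable fix is to route the load through C11 atomics. A minimal sketch, assuming <stdatomic.h> is available:

```c
/* Sketch: a 64-bit counter read through C11 atomics instead of relying
 * on the platform making plain aligned reads atomic. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static _Atomic int64_t total_count;

int main(void) {
    atomic_store_explicit(&total_count, 42, memory_order_relaxed);
    /* On 32-bit targets a plain read of an int64_t may be split into two
     * loads; atomic_load_explicit guarantees the value is read whole. */
    int64_t v = atomic_load_explicit(&total_count, memory_order_relaxed);
    printf("%lld\n", (long long)v);
    return 0;
}
```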
The test runner assumes fork() is available and behaves like Unix fork, but on some platforms (Windows, embedded systems) process creation works differently
If this fails: Test parallelization fails silently or hangs when fork() is unavailable, causing tests to run serially or not at all without clear error messages
deps/gtest-parallel/gtest_parallel.py:multiprocessing
The test runner assumes system time is monotonic and stable across test execution, but system clock adjustments (NTP, manual changes) can cause time to go backwards
If this fails: Negative test durations break timing calculations and cause test timeout logic to malfunction, potentially killing tests prematurely or never timing out
deps/gtest-parallel/gtest_parallel.py:time.time
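The usual remedy is to measure test durations with a monotonic clock rather than wall-clock time. A C sketch of the idea, assuming POSIX clock_gettime with CLOCK_MONOTONIC is available:

```c
/* Sketch: timing with a monotonic clock, which is immune to NTP and
 * manual clock adjustments, unlike wall-clock time. */
#include <stdio.h>
#include <time.h>

static double monotonic_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    double start = monotonic_seconds();
    /* ... run the test ... */
    double elapsed = monotonic_seconds() - start;
    printf("elapsed: %.6f s (never negative)\n", elapsed);
    return 0;
}
```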
File descriptor limits allow creating multiple concurrent subprocesses, but the parallel runner doesn't check ulimit -n before spawning workers
If this fails: Too many parallel tests exhaust file descriptors, causing subprocess creation to fail with cryptic errors rather than graceful degradation
deps/gtest-parallel/gtest_parallel.py:subprocess.Popen
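A defensive runner could size its worker pool from the soft descriptor limit before spawning anything. The sketch below shows that check in C with getrlimit; the per-worker descriptor estimate is an invented number for illustration.

```c
/* Sketch: derive a worker-pool bound from RLIMIT_NOFILE instead of
 * spawning until subprocess creation fails. */
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 1;
    /* Assume each worker needs a handful of descriptors (pipes, logs). */
    const unsigned long fds_per_worker = 8;
    unsigned long max_workers = (unsigned long)rl.rlim_cur / fds_per_worker;
    printf("fd soft limit %lu -> at most %lu workers\n",
           (unsigned long)rl.rlim_cur, max_workers);
    return 0;
}
```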
Test execution order doesn't matter for correctness, but some tests may have hidden dependencies on execution sequence or shared global state
If this fails: Tests pass when run sequentially but fail unpredictably in parallel due to race conditions or state pollution between test cases
deps/gtest-parallel/gtest_parallel_tests.py:threading
Double precision floating point values follow IEEE 754 format with specific bit layouts for mantissa and exponent, but some platforms use different floating point representations
If this fails: Float-to-string conversion produces wrong results on non-IEEE platforms, causing data serialization corruption and protocol violations
deps/fpconv/fpconv_dtoa.c:Grisu algorithm
Pre-computed power-of-10 tables are accurate for the target platform's floating point precision, but different compiler optimizations or math libraries may introduce slight variations
If this fails: Accumulated rounding errors in float conversion cause string representations to differ slightly between platforms, breaking cross-platform data compatibility
deps/fpconv/fpconv_powers.h:power table
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
- Main keyspace — Hash table storing all key-value pairs in memory with O(1) average access time
- AOF file — Append-only log file that records write commands for durability and crash recovery
- RDB snapshots — Point-in-time binary snapshots of the entire database for backup and replication
- Replication backlog — Circular buffer of recent commands for partial resynchronization of lagging replicas
Feedback Loops
- Incremental rehashing (recursive, balancing) — Trigger: Hash table load factor exceeds threshold. Action: dictRehashMilliseconds moves buckets from old to new hash table over multiple operations. Exit: All buckets migrated to new table.
- Memory usage control (circuit-breaker, balancing) — Trigger: Memory usage exceeds maxmemory limit. Action: Eviction policies remove keys based on LRU, LFU, or random selection until memory is reclaimed. Exit: Memory usage falls below threshold.
- Background AOF rewrite (self-correction, balancing) — Trigger: AOF file size grows beyond configured ratio. Action: rewriteAppendOnlyFileBackground forks child process to create compact AOF from current state. Exit: New AOF replaces old file.
- Replica reconnection (retry, reinforcing) — Trigger: Master-replica connection lost. Action: Replica attempts reconnection with exponential backoff and requests resynchronization. Exit: Connection restored and data synced.
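The memory-usage control loop above is a classic balancing loop: measure, compare against maxmemory, evict, and repeat until usage is back under the limit or nothing is evictable. A toy sketch with invented helpers (usedMemory, evictOneKey) and unit-less numbers:

```c
/* Sketch of a maxmemory balancing loop. The helpers and quantities are
 * illustrative stand-ins, not the real eviction code. */
#include <stdio.h>
#include <stddef.h>

static size_t used = 150, maxmemory = 100;    /* toy numbers, not bytes */

static size_t usedMemory(void) { return used; }
static int evictOneKey(void) {                /* pretend each key frees 10 units */
    if (used < 10) return 0;
    used -= 10;
    return 1;
}

/* Balancing loop: exits as soon as memory is back under the limit. */
static void enforceMaxmemory(void) {
    while (maxmemory && usedMemory() > maxmemory) {
        if (!evictOneKey()) break;   /* nothing evictable: give up */
    }
}

int main(void) {
    enforceMaxmemory();
    printf("used=%zu maxmemory=%zu\n", usedMemory(), maxmemory);
    return 0;
}
```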
Delays
- Background save (async-processing, ~seconds to minutes depending on dataset size) — RDB creation happens in forked child process without blocking main server
- Key expiration (scheduled-job, ~100ms intervals) — Background task samples expired keys and removes them to reclaim memory
- Replica lag (eventual-consistency, ~network latency + processing time) — Replicas may serve slightly stale data during high write loads
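The background-save delay comes from fork-based copy-on-write: the child serializes the dataset as it looked at fork time while the parent keeps serving traffic. A minimal sketch of that split; the file name and the "snapshot" written here stand in for the real rdbSave.

```c
/* Sketch: background snapshot via fork. The child writes against its
 * copy-on-write view of memory; the parent keeps running. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) return 1;                   /* fork failed */
    if (pid == 0) {
        /* Child: snapshot the dataset it inherited at fork time. */
        FILE *f = fopen("dump.rdb.tmp", "w");
        if (!f) _exit(1);
        fputs("toy snapshot\n", f);          /* stand-in for rdbSave */
        fclose(f);
        _exit(0);
    }
    /* Parent: would continue the event loop and reap the child later;
     * here we simply wait so the demo terminates cleanly. */
    printf("background save started in pid %d\n", (int)pid);
    waitpid(pid, NULL, 0);
    return 0;
}
```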
Control Points
- maxmemory (threshold) — Controls: Triggers eviction policies when memory usage exceeds this limit. Default: 0 (unlimited)
- save intervals (schedule) — Controls: Automatic RDB snapshot frequency based on time and number of changes. Default: 900 1, 300 10, 60 10000
- appendfsync (durability-mode) — Controls: AOF file synchronization policy (no/always/everysec) balancing performance vs durability. Default: everysec
- tcp-keepalive (network-tuning) — Controls: TCP keepalive timeout for detecting dead client connections. Default: 300
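Expressed as configuration directives, those control points look roughly like the snippet below. The values are the defaults listed above; check the valkey.conf shipped with your version for the authoritative syntax and current defaults.

```
# 0 means unlimited; eviction policies kick in above this limit
maxmemory 0

# Snapshot if at least N changes happened within the given seconds
save 900 1
save 300 10
save 60 10000

# AOF fsync policy: no | always | everysec
appendfsync everysec

# Seconds of idleness before probing client sockets
tcp-keepalive 300
```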
Technology Stack
- jemalloc — Memory allocator optimized for multi-threaded applications with reduced fragmentation
- linenoise — Command-line editing library providing history and completion for the CLI client
- Lua — Embedded scripting engine for server-side computation and atomic multi-command operations
- hdr_histogram — High dynamic range histogram for latency measurement and performance monitoring
- CMake — Cross-platform build system that handles compilation, linking, and test execution
- libvalkey — Client library providing C API for connecting to and interacting with Valkey servers
Key Components
- serverMain (orchestrator, src/server.c) — Initializes the server, loads configuration, starts the event loop, and coordinates all subsystems
- aeMain (executor, src/ae.c) — Event loop that multiplexes I/O events, timers, and file events using epoll/kqueue/select
- processCommand (dispatcher, src/server.c) — Parses Redis protocol commands, validates arguments, checks authorization, and calls command implementations
- dictAdd (store, src/dict.c) — Inserts key-value pairs into hash tables with collision handling and incremental rehashing
- rdbSave (serializer, src/rdb.c) — Serializes the entire database to RDB format for point-in-time snapshots and persistence
- feedAppendOnlyFile (logger, src/aof.c) — Appends write commands to the AOF file for incremental persistence and replay-based recovery
- zmalloc (allocator, src/zmalloc.c) — Memory allocation wrapper that tracks usage statistics and integrates with jemalloc for performance
- expireIfNeeded (validator, src/db.c) — Checks key expiration times and removes expired keys during access or background cleanup
- replicationFeedSlaves (dispatcher, src/replication.c) — Propagates write commands from master to replica servers for data synchronization
- moduleLoad (loader, src/module.c) — Dynamically loads shared libraries as modules and registers their commands and data types
Frequently Asked Questions
What is valkey used for?
Valkey stores and retrieves key-value pairs with low-latency access patterns and data structure operations. valkey-io/valkey is a 10-component repository written in C. Data flows through 7 distinct pipeline stages. The codebase contains 868 files.
How is valkey architected?
valkey is organized into 7 architecture layers: Network Layer, Command Processing, Data Structures, Memory Management, and 3 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through valkey?
Data moves through 7 stages: Accept client connection → Parse Redis protocol → Dispatch command → Execute data operation → Log to AOF → .... Client connections send Redis protocol commands over TCP, which are parsed into command arguments, validated, and dispatched to data structure operations. The operations modify the in-memory database and generate responses back to clients. Write operations are logged to AOF files and propagated to replicas, while periodic RDB snapshots provide point-in-time backups. This pipeline design reflects a complex multi-stage processing system.
What technologies does valkey use?
The core stack includes jemalloc (Memory allocator optimized for multi-threaded applications with reduced fragmentation), linenoise (Command-line editing library providing history and completion for the CLI client), Lua (Embedded scripting engine for server-side computation and atomic multi-command operations), hdr_histogram (High dynamic range histogram for latency measurement and performance monitoring), CMake (Cross-platform build system that handles compilation, linking, and test execution), libvalkey (Client library providing C API for connecting to and interacting with Valkey servers). A focused set of dependencies that keeps the build manageable.
What system dynamics does valkey have?
valkey exhibits 4 data pools (Main keyspace, AOF file, RDB snapshots, replication backlog), 4 feedback loops, 4 control points, and 3 delays. The feedback loops include recursive, circuit-breaker, self-correction, and retry dynamics. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does valkey use?
5 design patterns detected: Event-driven architecture, Copy-on-write persistence, Incremental rehashing, Plugin architecture, Command table dispatch.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.