valkey-io/valkey

A flexible distributed key-value database that is optimized for caching and other realtime workloads.

25,518 stars · C · 10 components

Stores and retrieves key-value pairs with low-latency access patterns and data structure operations

Client connections send Redis protocol commands over TCP, which are parsed into command arguments, validated, and dispatched to data structure operations. The operations modify the in-memory database and generate responses back to clients. Write operations are logged to AOF files and propagated to replicas, while periodic RDB snapshots provide point-in-time backups.

Under the hood, the system uses 4 feedback loops, 4 data pools, and 4 control points to manage its runtime behavior.

A 10-component repository. 868 files analyzed. Data flows through 7 distinct pipeline stages.

How Data Flows Through the System


  1. Accept client connection — The event loop accepts TCP connections and creates Client structures with input/output buffers, associating them with the default database
  2. Parse Redis protocol — processInputBuffer reads from client querybuf, parses RESP protocol into argc/argv arrays, and validates command syntax [Client → Client]
  3. Dispatch command — processCommand looks up the command in the command table, checks client authorization, validates argument count, and calls the command implementation function [Client → Client]
  4. Execute data operation — Command functions like setCommand or getCommand access the ValkeyDb keyspace, create or modify ValkeyObject values, and handle type-specific operations [ValkeyObject → ValkeyObject]
  5. Log to AOF — feedAppendOnlyFile serializes write commands to the append-only file for persistence, buffering writes and flushing based on fsync policy [Client]
  6. Replicate to replicas — replicationFeedSlaves sends write commands to connected replicas over their replication connections [Client]
  7. Generate response — addReply functions format responses according to RESP protocol and buffer them in the client output buffer for transmission [Client → Client]

Data Models

The data structures that flow between stages — the contracts that hold the system together.

ValkeyObject src/server.h
C struct with type: int (object type), encoding: int (storage format), lru: unsigned (LRU timestamp), refcount: int, ptr: void* (actual data)
Created when values are stored, reference-counted for memory safety, serialized during persistence, and freed when evicted or expired
ValkeyDb src/server.h
C struct with dict: dict* (main keyspace), expires: dict* (key TTL mapping), avg_ttl: long long, id: int
Initialized at server startup for each logical database, populated by SET/GET commands, cleaned during key expiration, persisted in RDB snapshots
Client src/server.h
C struct with fd: int (socket), argv: ValkeyObject** (command args), argc: int, db: ValkeyDb*, querybuf: sds (input buffer), buf: char[16384] (output buffer)
Created on connection accept, populated during command parsing, executes database operations, writes responses, destroyed on disconnect
Dict src/dict.h
Hash table with ht: dictht[2] (dual hash tables for rehashing), type: dictType* (callbacks), rehashidx: long (rehash progress)
Powers the main keyspace and hash objects, and grows incrementally by rehashing entries between its two hash tables a few buckets at a time
RdbState src/rdb.h
C struct with rio: rio* (I/O abstraction), error: int, key_counter: int, keys_processed: long long
Created during BGSAVE or server restart, tracks serialization progress, handles I/O errors, destroyed after completion

Hidden Assumptions

Things this code relies on but never validates. These are the things that cause silent failures when the system changes.

critical Resource unguarded

The histogram index calculations assume integer overflow will not occur when computing normalized_index adjustments, but index values can grow very large with high significant_figures and bucket_count parameters

If this fails: Integer overflow in index calculations causes incorrect bucket placement, silently corrupting histogram data and producing wrong percentile calculations

deps/hdr_histogram/hdr_histogram.c:normalize_index
critical Scale unguarded

The histogram assumes the counts array length (counts_len) fits within int32_t range, but with highest_trackable_value up to INT64_MAX and a small sub_bucket_count, the required array size can exceed 2^31 elements

If this fails: Large histograms silently truncate their counts array, losing data for high-value buckets and producing incorrect statistics

deps/hdr_histogram/hdr_histogram.h:struct hdr_histogram
critical Contract unguarded

The zcalloc_num function expects num * size to not overflow size_t, but callers pass arbitrary histogram dimensions without checking the multiplication

If this fails: Integer overflow in size calculation causes undersized memory allocation, leading to buffer overruns when the histogram writes beyond allocated bounds

deps/hdr_histogram/hdr_redis_malloc.h:zcalloc_num
critical Domain unguarded

64-bit atomic loads assume the underlying platform supports naturally aligned 64-bit reads, but on 32-bit systems or with packed structures, reads may not be atomic

If this fails: Torn reads of 64-bit values in multi-threaded environments produce inconsistent histogram state, causing race conditions and incorrect measurement results

deps/hdr_histogram/hdr_atomic.h:hdr_atomic_load_64
warning Environment unguarded

The test runner assumes fork() is available and behaves like Unix fork, but on some platforms (Windows, embedded systems) process creation works differently

If this fails: Test parallelization fails silently or hangs when fork() is unavailable, causing tests to run serially or not at all without clear error messages

deps/gtest-parallel/gtest_parallel.py:multiprocessing
warning Temporal unguarded

The test runner assumes system time is monotonic and stable across test execution, but system clock adjustments (NTP, manual changes) can make time go backwards

If this fails: Negative test durations break timing calculations and cause test timeout logic to malfunction, potentially killing tests prematurely or never timing out

deps/gtest-parallel/gtest_parallel.py:time.time
warning Resource unguarded

File descriptor limits allow creating multiple concurrent subprocesses, but the parallel runner doesn't check ulimit -n before spawning workers

If this fails: Too many parallel tests exhaust file descriptors, causing subprocess creation to fail with cryptic errors rather than graceful degradation

deps/gtest-parallel/gtest_parallel.py:subprocess.Popen
warning Ordering unguarded

The parallel runner assumes test execution order doesn't matter for correctness, but some tests may have hidden dependencies on execution sequence or shared global state

If this fails: Tests pass when run sequentially but fail unpredictably in parallel due to race conditions or state pollution between test cases

deps/gtest-parallel/gtest_parallel_tests.py:threading
warning Shape unguarded

The converter assumes double-precision values follow the IEEE 754 binary64 format, with its specific bit layout for sign, exponent, and mantissa, but some platforms use different floating-point representations

If this fails: Float-to-string conversion produces wrong results on non-IEEE platforms, causing data serialization corruption and protocol violations

deps/fpconv/fpconv_dtoa.c:Grisu algorithm
info Domain unguarded

The conversion assumes the pre-computed power-of-10 tables are accurate for the target platform's floating-point precision, but different compiler optimizations or math libraries may introduce slight variations

If this fails: Accumulated rounding errors in float conversion cause string representations to differ slightly between platforms, breaking cross-platform data compatibility

deps/fpconv/fpconv_powers.h:power table

System Behavior

How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.

Data Pools

Main keyspace (in-memory)
Hash table storing all key-value pairs in memory with O(1) average access time
AOF file (file-store)
Append-only log file that records write commands for durability and crash recovery
RDB snapshots (checkpoint)
Point-in-time binary snapshots of the entire database for backup and replication
Replication backlog (buffer)
Circular buffer of recent commands for partial resynchronization of lagging replicas

Technology Stack

jemalloc (library)
Memory allocator optimized for multi-threaded applications with reduced fragmentation
linenoise (library)
Command-line editing library providing history and completion for the CLI client
Lua (runtime)
Embedded scripting engine for server-side computation and atomic multi-command operations
hdr_histogram (library)
High dynamic range histogram for latency measurement and performance monitoring
CMake (build)
Cross-platform build system that handles compilation, linking, and test execution
libvalkey (sdk)
Client library providing C API for connecting to and interacting with Valkey servers

Frequently Asked Questions

What is valkey used for?

Stores and retrieves key-value pairs with low-latency access patterns and data structure operations. valkey-io/valkey is a 10-component repository written in C. Data flows through 7 distinct pipeline stages. The codebase contains 868 files.

How is valkey architected?

valkey is organized into 7 architecture layers: Network Layer, Command Processing, Data Structures, Memory Management, and 3 more. Data flows through 7 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.

How does data flow through valkey?

Data moves through 7 stages: Accept client connection → Parse Redis protocol → Dispatch command → Execute data operation → Log to AOF → .... Client connections send Redis protocol commands over TCP, which are parsed into command arguments, validated, and dispatched to data structure operations. The operations modify the in-memory database and generate responses back to clients. Write operations are logged to AOF files and propagated to replicas, while periodic RDB snapshots provide point-in-time backups. This pipeline design reflects a complex multi-stage processing system.

What technologies does valkey use?

The core stack includes jemalloc (Memory allocator optimized for multi-threaded applications with reduced fragmentation), linenoise (Command-line editing library providing history and completion for the CLI client), Lua (Embedded scripting engine for server-side computation and atomic multi-command operations), hdr_histogram (High dynamic range histogram for latency measurement and performance monitoring), CMake (Cross-platform build system that handles compilation, linking, and test execution), libvalkey (Client library providing C API for connecting to and interacting with Valkey servers). A focused set of dependencies that keeps the build manageable.

What system dynamics does valkey have?

valkey exhibits 4 data pools (Main keyspace, AOF file), 4 feedback loops, 4 control points, 3 delays. The feedback loops handle recursive and circuit-breaker. These runtime behaviors shape how the system responds to load, failures, and configuration changes.

What design patterns does valkey use?

5 design patterns detected: Event-driven architecture, Copy-on-write persistence, Incremental rehashing, Plugin architecture, Command table dispatch.

Analyzed on April 20, 2026 by CodeSea.