milvus-io/milvus
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Distributes vector similarity search across multiple nodes with real-time indexing
Vector data flows from client through proxy validation to DataNode for ingestion and index building, gets stored as segments in object storage, then loaded by QueryNodes for ANN search execution. Search requests follow the reverse path - proxy routes to QueryCoord for node assignment, QueryNodes execute parallel search on their segments, and results are merged before returning to client.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 3 control points to manage its runtime behavior.
An 8-component repository. 4309 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Client Request Validation — Proxy receives client requests, validates authentication using Casbin RBAC, checks collection existence against RootCoord metadata, and performs request format validation before routing [SearchRequest → Validated Request]
- Vector Data Ingestion — DataNode receives vector batches from streaming layer, validates schema compliance, builds growing segments with columnar storage, and constructs incremental indexes using configured index types [VectorData → Growing Segment]
- Segment Sealing and Indexing — When segments reach size threshold, DataCoord triggers sealing operation where DataNode builds final indexes (HNSW, IVF_FLAT, etc.), compresses data, and uploads to object storage [Growing Segment → Sealed Segment]
- Query Planning and Distribution — QueryCoordV2 receives search requests, determines which segments contain relevant data based on time ranges and bloom filters, and assigns query tasks to available QueryNodes [SearchRequest → Query Plan]
- Parallel ANN Search Execution — QueryNodes load assigned segment indexes into memory, execute approximate nearest neighbor search using FAISS/other engines, and return top-K candidates with distance scores [Query Plan → Search Results]
- Result Merging and Response — Proxy collects partial results from all QueryNodes, performs global top-K merge sorting by distance scores, applies output field projection, and returns final results to client [Search Results → Final Response]
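The final stage above — the Proxy's global top-K merge by distance score — can be sketched in a few lines of Go. The names `partialResult` and `mergeTopK` are illustrative, not the Proxy's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// partialResult is a hypothetical stand-in for the per-QueryNode
// top-K candidates: an entity ID plus a distance score.
type partialResult struct {
	ID    int64
	Score float32 // smaller = closer, e.g. for an L2 metric
}

// mergeTopK sketches the global merge the Proxy performs:
// concatenate partial results from all QueryNodes, sort by
// distance, and keep the K best overall.
func mergeTopK(partials [][]partialResult, k int) []partialResult {
	var all []partialResult
	for _, p := range partials {
		all = append(all, p...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].Score < all[j].Score })
	if len(all) > k {
		all = all[:k]
	}
	return all
}

func main() {
	node1 := []partialResult{{ID: 1, Score: 0.1}, {ID: 2, Score: 0.5}}
	node2 := []partialResult{{ID: 3, Score: 0.2}, {ID: 4, Score: 0.9}}
	for _, r := range mergeTopK([][]partialResult{node1, node2}, 3) {
		fmt.Printf("%d %.1f\n", r.ID, r.Score)
	}
}
```

A production merge would use a k-way heap rather than a full sort, but the contract is the same: each node returns its local top-K, so the global top-K is always contained in the union.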
Data Models
The data structures that flow between stages — the contracts that hold the system together.
internal/core/entity.Collection with Name: string, Schema: entity.Schema containing FieldSchema array with Name, DataType, Dimension for vectors, PrimaryKey flag, and IndexParams
Created via CreateCollection with schema definition, populated with vectors via Insert, queried via Search, and managed through lifecycle operations
client/column/column.Vector interface with FloatVector: [][]float32 (inner slices of length dim), BinaryVector: [][]byte, or SparseFloatVector with indices and values, plus metadata fields
Ingested through Insert operations, transformed by DataNode into segments with indexes, and retrieved during Search queries
internal/proxy/milvuspb.SearchRequest with CollectionName: string, Vectors: [][]float32, TopK: int, SearchParams: key-value pairs for index parameters, OutputFields: []string
Created by client SDK, validated and routed by Proxy, executed across QueryNodes, and results merged before returning to client
internal/storage/Segment with ID: int64, vectors stored in columnar format, inverted indexes, bloom filters, and metadata including row count and time range
Created during data ingestion, sealed when size threshold reached, indexed by background compaction, and queried during search operations
examples/telemetry_demo/main.go telemetryOperation with Operation: string, Count: int64, LatencyP99: float64, Database: string, Collection: string, plus client info and status
Collected during client operations, aggregated in heartbeat intervals, pushed to server via telemetry protocol for monitoring and analysis
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Vector dimension is fixed at 128 for all collections and operations, with numEntities hardcoded to 500
If this fails: Demo breaks silently if real data has different dimensions - vector operations will fail with cryptic FAISS errors or dimension mismatch panics
examples/telemetry_demo/main.go:dim
Milvus server is running on localhost:19530 with proxy HTTP server enabled on localhost:9091, and both are accepting connections
If this fails: Test hangs indefinitely on connection attempts if server is down, on different host, or behind firewall - no timeout or health check implemented
examples/telemetry_e2e_test/main.go:address
10 second heartbeat interval is sufficient for telemetry data collection and command propagation during test execution
If this fails: Test may pass/fail randomly if network latency exceeds heartbeat window - metrics could be lost or commands delayed beyond test timeout
examples/telemetry_e2e_test/main.go:HeartbeatInterval
HTTP telemetry API returns JSON with exact field names (client_id, client_info, metrics) and nested structure matching struct tags
If this fails: JSON unmarshaling fails silently with zero values if server changes field names, adds required fields, or changes nesting - leads to empty metrics display
examples/telemetry_demo/main.go:telemetryClientResponse
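Go's `encoding/json` will happily leave fields at their zero values when names do not match, which is exactly the silent failure described. A hedged sketch of a stricter decode — the `telemetryClientResponse` fields here are assumptions mirroring the names in the text, not the demo's verified struct:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// telemetryClientResponse is a hypothetical mirror of the
// response struct named above.
type telemetryClientResponse struct {
	ClientID   string            `json:"client_id"`
	ClientInfo map[string]string `json:"client_info"`
	Metrics    []json.RawMessage `json:"metrics"`
}

// decodeResponse fails loudly instead of displaying empty metrics:
// a missing or renamed client_id surfaces as an error, not a zero value.
func decodeResponse(data []byte) (*telemetryClientResponse, error) {
	var r telemetryClientResponse
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	if r.ClientID == "" {
		return nil, fmt.Errorf("response missing client_id: was the field renamed server-side?")
	}
	return &r, nil
}

func main() {
	_, err := decodeResponse([]byte(`{"clientId":"abc"}`)) // wrong casing
	fmt.Println(err)
}
```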
Atomic operations on global state (receivedPushConfig, lastPushConfigPayload) are sufficient for thread safety without additional synchronization
If this fails: Race conditions during concurrent command processing could corrupt telemetry state or lose commands - atomic.Value doesn't guarantee consistency between reads and writes
examples/telemetry_demo/main.go:receivedPushConfig
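The inconsistency risk is that the flag and the payload live in two independently-updated atomics, so a reader can observe one without the other. Bundling both under a single mutex makes every read a consistent snapshot. A sketch (the `pushConfigState` type is illustrative, not the demo's code):

```go
package main

import (
	"fmt"
	"sync"
)

// pushConfigState bundles the flag and payload that the demo keeps
// in two separate atomic values. One mutex guards both, so readers
// always see a matching (received, payload) pair.
type pushConfigState struct {
	mu       sync.Mutex
	received bool
	payload  string
}

func (s *pushConfigState) Set(payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.received = true
	s.payload = payload
}

func (s *pushConfigState) Get() (received bool, payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.received, s.payload
}

func main() {
	var s pushConfigState
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Set(fmt.Sprintf("config-%d", i))
		}(i)
	}
	wg.Wait()
	ok, payload := s.Get()
	fmt.Println(ok, payload != "")
}
```

An alternative with the same property is a single `atomic.Value` holding an immutable struct containing both fields, swapped in one store.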
Collection categories array contains valid string values that match expected telemetry filtering dimensions
If this fails: Telemetry filtering breaks if categories contain special characters, are empty, or don't match server-side enum values - results in missing or incorrectly categorized metrics
examples/telemetry_demo/main.go:collections
Command handlers are registered before any telemetry operations occur, and handler registration completes synchronously
If this fails: Commands received during startup window are lost if handlers aren't ready - no queuing or replay mechanism for early commands
examples/telemetry_e2e_test/main.go:registerCommandHandlers
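One way to close that startup window is to queue commands that arrive before a handler exists and replay them on registration. A minimal sketch under that assumption — the `dispatcher` type and its methods are illustrative names, not the test's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// dispatcher buffers commands that arrive before any handler is
// registered, instead of silently dropping them.
type dispatcher struct {
	mu      sync.Mutex
	handler func(string)
	pending []string
}

func (d *dispatcher) Dispatch(cmd string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.handler == nil {
		d.pending = append(d.pending, cmd) // queue, don't drop
		return
	}
	d.handler(cmd)
}

func (d *dispatcher) Register(h func(string)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.handler = h
	for _, cmd := range d.pending { // replay early commands in order
		h(cmd)
	}
	d.pending = nil
}

func main() {
	var d dispatcher
	d.Dispatch("push_config") // arrives before registration
	var got []string
	d.Register(func(cmd string) { got = append(got, cmd) })
	fmt.Println(got)
}
```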
JSON config file fits in memory and contains reasonable number of message types (likely < 1000 entries)
If this fails: Code generator crashes with OOM if config file is huge - no streaming parser or size limits implemented
pkg/streaming/util/message/codegen/main.go:MessageReflectInfoTable
Command line arguments are well-formed strings without special characters, and file system supports creating files in current directory
If this fails: Tool crashes if args contain null bytes or current directory is read-only - no validation of arguments or output path permissions
cmd/tools/config/main.go:os.Args
Input JSON config exactly matches JSONConfig struct fields with correct types - no extra fields or missing required fields
If this fails: Code generation produces invalid Go code if JSON contains unexpected fields or wrong types - compilation fails downstream with unclear errors
pkg/streaming/util/message/codegen/main.go:JSONConfig
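Strict decoding pushes that failure to the earliest possible point: `json.Decoder.DisallowUnknownFields` rejects a drifted config with a clear error, instead of generating Go code that only fails at compile time downstream. The `jsonConfig` fields below are a hypothetical stand-in for the generator's actual `JSONConfig`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// jsonConfig is a hypothetical stand-in for the generator's JSONConfig.
type jsonConfig struct {
	Messages []string `json:"messages"`
}

// strictDecode rejects unexpected fields up front, so config drift
// surfaces here rather than as invalid generated code.
func strictDecode(data []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	var cfg jsonConfig
	err := strictDecode([]byte(`{"messages":["Insert"],"typo_field":1}`), &cfg)
	fmt.Println(err != nil) // unknown field is caught early
}
```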
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
Distributed metadata store containing collection schemas, segment metadata, node assignments, and cluster configuration state
Persistent storage for sealed vector segments, index files, and logs using S3-compatible backends like MinIO
Event streaming system (Pulsar/Kafka) for coordinating data flow between components and ensuring ordered message delivery
In-memory segment and index cache for fast ANN search, with LRU eviction when memory pressure occurs
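The LRU behavior of that segment cache can be sketched with the standard library's `container/list`. This toy version evicts by segment count; the real cache evicts under memory pressure, and `segmentCache` is an illustrative name, not a Milvus type:

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	id   int64
	data []byte
}

// segmentCache is a minimal LRU: front of the list is the most
// recently used segment, the back is evicted first.
type segmentCache struct {
	cap   int
	order *list.List
	items map[int64]*list.Element
}

func newSegmentCache(cap int) *segmentCache {
	return &segmentCache{cap: cap, order: list.New(), items: make(map[int64]*list.Element)}
}

func (c *segmentCache) Get(id int64) ([]byte, bool) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el) // mark as recently used
		return el.Value.(*entry).data, true
	}
	return nil, false
}

func (c *segmentCache) Put(id int64, data []byte) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).data = data
		return
	}
	if c.order.Len() >= c.cap { // evict least recently used
		back := c.order.Back()
		delete(c.items, back.Value.(*entry).id)
		c.order.Remove(back)
	}
	c.items[id] = c.order.PushFront(&entry{id: id, data: data})
}

func main() {
	c := newSegmentCache(2)
	c.Put(1, []byte("seg1"))
	c.Put(2, []byte("seg2"))
	c.Get(1)                 // touch 1, so 2 becomes LRU
	c.Put(3, []byte("seg3")) // evicts segment 2
	_, ok := c.Get(2)
	fmt.Println(ok)
}
```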
Feedback Loops
- Compaction Loop (auto-scale, balancing) — Trigger: DataCoord monitors segment sizes and row counts against configured thresholds. Action: Merges small segments or splits large ones, rebuilds indexes, and updates metadata in etcd. Exit: Segment sizes stabilize within optimal range for query performance.
- Load Balancing Loop (auto-scale, balancing) — Trigger: QueryCoordV2 detects uneven query load distribution across QueryNodes via metrics. Action: Migrates segment replicas between nodes, updates routing tables, and adjusts resource allocation. Exit: Query latency and throughput reach acceptable levels across all nodes.
- Telemetry Heartbeat (polling, reinforcing) — Trigger: Client telemetry timer expires based on configured heartbeat interval. Action: Aggregates operation metrics, pushes to server, and processes any received commands like configuration updates. Exit: Continuous loop until client disconnection.
Delays
- Index Building Delay (batch-window, ~varies by segment size and index type) — New data unavailable for search until indexing completes, affecting real-time search freshness
- Segment Loading Delay (warmup, ~proportional to segment size and network bandwidth) — QueryNode cannot serve requests for segments until indexes are loaded from object storage into memory
- etcd Consensus Delay (eventual-consistency, ~typically milliseconds for cluster agreement) — Metadata updates may not be immediately visible across all components, requiring retry logic
Control Points
- Segment Size Threshold (threshold) — Controls: When growing segments are sealed and indexed, affecting ingestion throughput vs search latency tradeoff. Default: configurable
- Index Type Selection (architecture-switch) — Controls: ANN algorithm used (HNSW, IVF_FLAT, IVF_PQ) determining search accuracy vs speed tradeoff. Default: per-collection configuration
- Telemetry Sampling Rate (sampling-strategy) — Controls: Percentage of operations that generate telemetry metrics, balancing observability with performance overhead. Default: 1.0 (100%)
Technology Stack
Primary language for distributed system logic, gRPC services, and client SDK implementation
Inter-service communication protocol providing type-safe, high-performance RPC between Milvus components
Distributed key-value store for cluster metadata, service discovery, and configuration management with strong consistency
Message queue for coordinating data flow between components with guaranteed message ordering and durability
Object storage backend for persisting vector segments, index files, and write-ahead logs with high availability
Vector similarity search library providing optimized ANN algorithms (HNSW, IVF) with GPU acceleration support
Key Components
- MilvusClient (gateway) — Primary client interface that manages gRPC connections to Milvus, provides high-level APIs for collection management and vector operations, and handles request serialization/response parsing (client/milvusclient/)
- Proxy (gateway) — Request router that validates client requests, enforces authentication/authorization, coordinates with metadata services, and distributes operations to appropriate worker nodes (internal/proxy/)
- RootCoord (orchestrator) — Cluster metadata manager that maintains collection schemas, handles DDL operations, manages global timestamps, and coordinates with other coordinators for schema changes (internal/rootcoord/)
- DataCoord (orchestrator) — Data ingestion coordinator that manages segment allocation, triggers compaction when segments reach size limits, coordinates index building, and maintains data node assignments (internal/datacoord/)
- QueryCoordV2 (scheduler) — Query execution coordinator that manages QueryNode resource allocation, distributes segment replicas for load balancing, handles failover, and optimizes query routing (internal/querycoordv2/)
- DataNode (processor) — Vector data ingestion worker that receives vector batches, builds growing segments, constructs indexes using FAISS and other libraries, and persists sealed segments to object storage (internal/datanode/)
- QueryNodeV2 (executor) — Search execution engine that loads segment indexes into memory, performs parallel ANN search across assigned segments, and returns top-K results with scores (internal/querynodev2/)
- StreamingNode (processor) — Real-time data flow manager that handles streaming ingestion, maintains write-ahead logs, coordinates with message queues, and ensures data consistency during ingestion (internal/streamingnode/)
Package Structure
Go client SDK for Milvus, providing high-level APIs for vector database operations including collections, indexes, search, and bulk operations.
Demonstration programs showing Milvus telemetry features including client metrics collection, command push, and multi-database scenarios.
End-to-end test for Milvus telemetry system, verifying command push and client metrics flow.
Shared package containing common utilities, configuration, streaming, message queues, storage adapters, and telemetry infrastructure used across Milvus components.
Test suite for the Go client SDK, providing test utilities and comprehensive validation of client operations.
Frequently Asked Questions
What is milvus used for?
milvus-io/milvus distributes vector similarity search across multiple nodes with real-time indexing. It is an 8-component repository written in Go; data flows through 6 distinct pipeline stages, and the codebase contains 4309 files.
How is milvus architected?
milvus is organized into 5 architecture layers: Client Layer, Proxy Layer, Coordinator Layer, Worker Node Layer, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through milvus?
Data moves through 6 stages: Client Request Validation → Vector Data Ingestion → Segment Sealing and Indexing → Query Planning and Distribution → Parallel ANN Search Execution → .... Vector data flows from client through proxy validation to DataNode for ingestion and index building, gets stored as segments in object storage, then loaded by QueryNodes for ANN search execution. Search requests follow the reverse path - proxy routes to QueryCoord for node assignment, QueryNodes execute parallel search on their segments, and results are merged before returning to client. This pipeline design reflects a complex multi-stage processing system.
What technologies does milvus use?
The core stack includes Go (Primary language for distributed system logic, gRPC services, and client SDK implementation), gRPC/Protocol Buffers (Inter-service communication protocol providing type-safe, high-performance RPC between Milvus components), etcd (Distributed key-value store for cluster metadata, service discovery, and configuration management with strong consistency), Apache Pulsar/Kafka (Message queue for coordinating data flow between components with guaranteed message ordering and durability), MinIO/S3 (Object storage backend for persisting vector segments, index files, and write-ahead logs with high availability), FAISS (Vector similarity search library providing optimized ANN algorithms (HNSW, IVF) with GPU acceleration support). A focused set of dependencies that keeps the build manageable.
What system dynamics does milvus have?
milvus exhibits 4 data pools (etcd, Object Storage), 3 feedback loops, 3 control points, and 3 delays. The feedback loops handle auto-scaling/balancing and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does milvus use?
4 design patterns detected: Coordinator-Worker Pattern, Segment-Based Storage, Pluggable Backend Architecture, Streaming Data Pipeline.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.