milvus-io/milvus
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Distributes vector similarity search across multiple nodes with real-time indexing
Vector data flows from client through proxy validation to DataNode for ingestion and index building, gets stored as segments in object storage, then loaded by QueryNodes for ANN search execution. Search requests follow the reverse path - proxy routes to QueryCoord for node assignment, QueryNodes execute parallel search on their segments, and results are merged before returning to client.
Under the hood, the system uses 3 feedback loops, 4 data pools, and 3 control points to manage its runtime behavior.
An 8-component repository. 4309 files analyzed. Data flows through 6 distinct pipeline stages.
How Data Flows Through the System
- Client Request Validation — Proxy receives client requests, validates authentication using Casbin RBAC, checks collection existence against RootCoord metadata, and performs request format validation before routing [SearchRequest → Validated Request]
- Vector Data Ingestion — DataNode receives vector batches from streaming layer, validates schema compliance, builds growing segments with columnar storage, and constructs incremental indexes using configured index types [VectorData → Growing Segment]
- Segment Sealing and Indexing — When segments reach size threshold, DataCoord triggers sealing operation where DataNode builds final indexes (HNSW, IVF_FLAT, etc.), compresses data, and uploads to object storage [Growing Segment → Sealed Segment]
- Query Planning and Distribution — QueryCoordV2 receives search requests, determines which segments contain relevant data based on time ranges and bloom filters, and assigns query tasks to available QueryNodes [SearchRequest → Query Plan]
- Parallel ANN Search Execution — QueryNodes load assigned segment indexes into memory, execute approximate nearest neighbor search using FAISS/other engines, and return top-K candidates with distance scores [Query Plan → Search Results]
- Result Merging and Response — Proxy collects partial results from all QueryNodes, performs global top-K merge sorting by distance scores, applies output field projection, and returns final results to client [Search Results → Final Response]
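The final stage above — the Proxy's global top-K merge by distance score — can be sketched in a few lines of Go. The names `partialResult` and `mergeTopK` are illustrative, not the Proxy's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// partialResult is a hypothetical stand-in for the per-QueryNode
// top-K candidates: an entity ID plus a distance score.
type partialResult struct {
	ID    int64
	Score float32 // smaller = closer, e.g. for an L2 metric
}

// mergeTopK sketches the global merge the Proxy performs:
// concatenate partial results from all QueryNodes, sort by
// distance, and keep the K best overall.
func mergeTopK(partials [][]partialResult, k int) []partialResult {
	var all []partialResult
	for _, p := range partials {
		all = append(all, p...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].Score < all[j].Score })
	if len(all) > k {
		all = all[:k]
	}
	return all
}

func main() {
	node1 := []partialResult{{ID: 1, Score: 0.1}, {ID: 2, Score: 0.5}}
	node2 := []partialResult{{ID: 3, Score: 0.2}, {ID: 4, Score: 0.9}}
	for _, r := range mergeTopK([][]partialResult{node1, node2}, 3) {
		fmt.Printf("%d %.1f\n", r.ID, r.Score)
	}
}
```

A production merge would use a k-way heap rather than a full sort, but the contract is the same: each node returns its local top-K, so the global top-K is always contained in the union.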
Data Models
The data structures that flow between stages — the contracts that hold the system together.
internal/core/entity.Collection with Name: string, Schema: entity.Schema containing FieldSchema array with Name, DataType, Dimension for vectors, PrimaryKey flag, and IndexParams
Created via CreateCollection with schema definition, populated with vectors via Insert, queried via Search, and managed through lifecycle operations
client/column/column.Vector interface with FloatVector: [][]float32 (inner slices of length dim), BinaryVector: [][]byte, or SparseFloatVector with indices and values, plus metadata fields
Ingested through Insert operations, transformed by DataNode into segments with indexes, and retrieved during Search queries
internal/proxy/milvuspb.SearchRequest with CollectionName: string, Vectors: [][]float32, TopK: int, SearchParams: key-value pairs for index parameters, OutputFields: []string
Created by client SDK, validated and routed by Proxy, executed across QueryNodes, and results merged before returning to client
internal/storage/Segment with ID: int64, vectors stored in columnar format, inverted indexes, bloom filters, and metadata including row count and time range
Created during data ingestion, sealed when size threshold reached, indexed by background compaction, and queried during search operations
examples/telemetry_demo/main.go telemetryOperation with Operation: string, Count: int64, LatencyP99: float64, Database: string, Collection: string, plus client info and status
Collected during client operations, aggregated in heartbeat intervals, pushed to server via telemetry protocol for monitoring and analysis
Hidden Assumptions
Things this code relies on but never validates. These are the things that cause silent failures when the system changes.
Vector dimension is fixed at 128 for all collections and operations, with numEntities hardcoded to 500
If this fails: Demo breaks silently if real data has different dimensions - vector operations will fail with cryptic FAISS errors or dimension mismatch panics
examples/telemetry_demo/main.go:dim
Milvus server is running on localhost:19530 with proxy HTTP server enabled on localhost:9091, and both are accepting connections
If this fails: Test hangs indefinitely on connection attempts if server is down, on different host, or behind firewall - no timeout or health check implemented
examples/telemetry_e2e_test/main.go:address
10 second heartbeat interval is sufficient for telemetry data collection and command propagation during test execution
If this fails: Test may pass/fail randomly if network latency exceeds heartbeat window - metrics could be lost or commands delayed beyond test timeout
examples/telemetry_e2e_test/main.go:HeartbeatInterval
HTTP telemetry API returns JSON with exact field names (client_id, client_info, metrics) and nested structure matching struct tags
If this fails: JSON unmarshaling fails silently with zero values if server changes field names, adds required fields, or changes nesting - leads to empty metrics display
examples/telemetry_demo/main.go:telemetryClientResponse
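Go's `encoding/json` will happily leave fields at their zero values when names do not match, which is exactly the silent failure described. A hedged sketch of a stricter decode — the `telemetryClientResponse` fields here are assumptions mirroring the names in the text, not the demo's verified struct:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// telemetryClientResponse is a hypothetical mirror of the
// response struct named above.
type telemetryClientResponse struct {
	ClientID   string            `json:"client_id"`
	ClientInfo map[string]string `json:"client_info"`
	Metrics    []json.RawMessage `json:"metrics"`
}

// decodeResponse fails loudly instead of displaying empty metrics:
// a missing or renamed client_id surfaces as an error, not a zero value.
func decodeResponse(data []byte) (*telemetryClientResponse, error) {
	var r telemetryClientResponse
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	if r.ClientID == "" {
		return nil, fmt.Errorf("response missing client_id: was the field renamed server-side?")
	}
	return &r, nil
}

func main() {
	_, err := decodeResponse([]byte(`{"clientId":"abc"}`)) // wrong casing
	fmt.Println(err)
}
```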
Atomic operations on global state (receivedPushConfig, lastPushConfigPayload) are sufficient for thread safety without additional synchronization
If this fails: Race conditions during concurrent command processing could corrupt telemetry state or lose commands - atomic.Value doesn't guarantee consistency between reads and writes
examples/telemetry_demo/main.go:receivedPushConfig
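The inconsistency risk is that the flag and the payload live in two independently-updated atomics, so a reader can observe one without the other. Bundling both under a single mutex makes every read a consistent snapshot. A sketch (the `pushConfigState` type is illustrative, not the demo's code):

```go
package main

import (
	"fmt"
	"sync"
)

// pushConfigState bundles the flag and payload that the demo keeps
// in two separate atomic values. One mutex guards both, so readers
// always see a matching (received, payload) pair.
type pushConfigState struct {
	mu       sync.Mutex
	received bool
	payload  string
}

func (s *pushConfigState) Set(payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.received = true
	s.payload = payload
}

func (s *pushConfigState) Get() (received bool, payload string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.received, s.payload
}

func main() {
	var s pushConfigState
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Set(fmt.Sprintf("config-%d", i))
		}(i)
	}
	wg.Wait()
	ok, payload := s.Get()
	fmt.Println(ok, payload != "")
}
```

An alternative with the same property is a single `atomic.Value` holding an immutable struct containing both fields, swapped in one store.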
Collection categories array contains valid string values that match expected telemetry filtering dimensions
If this fails: Telemetry filtering breaks if categories contain special characters, are empty, or don't match server-side enum values - results in missing or incorrectly categorized metrics
examples/telemetry_demo/main.go:collections
Command handlers are registered before any telemetry operations occur, and handler registration completes synchronously
If this fails: Commands received during startup window are lost if handlers aren't ready - no queuing or replay mechanism for early commands
examples/telemetry_e2e_test/main.go:registerCommandHandlers
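One way to close that startup window is to queue commands that arrive before a handler exists and replay them on registration. A minimal sketch under that assumption — the `dispatcher` type and its methods are illustrative names, not the test's actual API:

```go
package main

import (
	"fmt"
	"sync"
)

// dispatcher buffers commands that arrive before any handler is
// registered, instead of silently dropping them.
type dispatcher struct {
	mu      sync.Mutex
	handler func(string)
	pending []string
}

func (d *dispatcher) Dispatch(cmd string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.handler == nil {
		d.pending = append(d.pending, cmd) // queue, don't drop
		return
	}
	d.handler(cmd)
}

func (d *dispatcher) Register(h func(string)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.handler = h
	for _, cmd := range d.pending { // replay early commands in order
		h(cmd)
	}
	d.pending = nil
}

func main() {
	var d dispatcher
	d.Dispatch("push_config") // arrives before registration
	var got []string
	d.Register(func(cmd string) { got = append(got, cmd) })
	fmt.Println(got)
}
```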
JSON config file fits in memory and contains reasonable number of message types (likely < 1000 entries)
If this fails: Code generator crashes with OOM if config file is huge - no streaming parser or size limits implemented
pkg/streaming/util/message/codegen/main.go:MessageReflectInfoTable
Command line arguments are well-formed strings without special characters, and file system supports creating files in current directory
If this fails: Tool crashes if args contain null bytes or current directory is read-only - no validation of arguments or output path permissions
cmd/tools/config/main.go:os.Args
Input JSON config exactly matches JSONConfig struct fields with correct types - no extra fields or missing required fields
If this fails: Code generation produces invalid Go code if JSON contains unexpected fields or wrong types - compilation fails downstream with unclear errors
pkg/streaming/util/message/codegen/main.go:JSONConfig
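Strict decoding pushes that failure to the earliest possible point: `json.Decoder.DisallowUnknownFields` rejects a drifted config with a clear error, instead of generating Go code that only fails at compile time downstream. The `jsonConfig` fields below are a hypothetical stand-in for the generator's actual `JSONConfig`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// jsonConfig is a hypothetical stand-in for the generator's JSONConfig.
type jsonConfig struct {
	Messages []string `json:"messages"`
}

// strictDecode rejects unexpected fields up front, so config drift
// surfaces here rather than as invalid generated code.
func strictDecode(data []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

func main() {
	var cfg jsonConfig
	err := strictDecode([]byte(`{"messages":["Insert"],"typo_field":1}`), &cfg)
	fmt.Println(err != nil) // unknown field is caught early
}
```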
System Behavior
How the system operates at runtime — where data accumulates, what loops, what waits, and what controls what.
Data Pools
Distributed metadata store containing collection schemas, segment metadata, node assignments, and cluster configuration state
Persistent storage for sealed vector segments, index files, and logs using S3-compatible backends like MinIO
Event streaming system (Pulsar/Kafka) for coordinating data flow between components and ensuring ordered message delivery
In-memory segment and index cache for fast ANN search, with LRU eviction when memory pressure occurs
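The LRU behavior of that segment cache can be sketched with the standard library's `container/list`. This toy version evicts by segment count; the real cache evicts under memory pressure, and `segmentCache` is an illustrative name, not a Milvus type:

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	id   int64
	data []byte
}

// segmentCache is a minimal LRU: front of the list is the most
// recently used segment, the back is evicted first.
type segmentCache struct {
	cap   int
	order *list.List
	items map[int64]*list.Element
}

func newSegmentCache(cap int) *segmentCache {
	return &segmentCache{cap: cap, order: list.New(), items: make(map[int64]*list.Element)}
}

func (c *segmentCache) Get(id int64) ([]byte, bool) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el) // mark as recently used
		return el.Value.(*entry).data, true
	}
	return nil, false
}

func (c *segmentCache) Put(id int64, data []byte) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).data = data
		return
	}
	if c.order.Len() >= c.cap { // evict least recently used
		back := c.order.Back()
		delete(c.items, back.Value.(*entry).id)
		c.order.Remove(back)
	}
	c.items[id] = c.order.PushFront(&entry{id: id, data: data})
}

func main() {
	c := newSegmentCache(2)
	c.Put(1, []byte("seg1"))
	c.Put(2, []byte("seg2"))
	c.Get(1)                 // touch 1, so 2 becomes LRU
	c.Put(3, []byte("seg3")) // evicts segment 2
	_, ok := c.Get(2)
	fmt.Println(ok)
}
```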
Feedback Loops
- Compaction Loop (auto-scale, balancing) — Trigger: DataCoord monitors segment sizes and row counts against configured thresholds. Action: Merges small segments or splits large ones, rebuilds indexes, and updates metadata in etcd. Exit: Segment sizes stabilize within optimal range for query performance.
- Load Balancing Loop (auto-scale, balancing) — Trigger: QueryCoordV2 detects uneven query load distribution across QueryNodes via metrics. Action: Migrates segment replicas between nodes, updates routing tables, and adjusts resource allocation. Exit: Query latency and throughput reach acceptable levels across all nodes.
- Telemetry Heartbeat (polling, reinforcing) — Trigger: Client telemetry timer expires based on configured heartbeat interval. Action: Aggregates operation metrics, pushes to server, and processes any received commands like configuration updates. Exit: Continuous loop until client disconnection.
Delays
- Index Building Delay (batch-window, ~varies by segment size and index type) — New data unavailable for search until indexing completes, affecting real-time search freshness
- Segment Loading Delay (warmup, ~proportional to segment size and network bandwidth) — QueryNode cannot serve requests for segments until indexes are loaded from object storage into memory
- etcd Consensus Delay (eventual-consistency, ~typically milliseconds for cluster agreement) — Metadata updates may not be immediately visible across all components, requiring retry logic
Control Points
- Segment Size Threshold (threshold) — Controls: When growing segments are sealed and indexed, affecting ingestion throughput vs search latency tradeoff. Default: configurable
- Index Type Selection (architecture-switch) — Controls: ANN algorithm used (HNSW, IVF_FLAT, IVF_PQ) determining search accuracy vs speed tradeoff. Default: per-collection configuration
- Telemetry Sampling Rate (sampling-strategy) — Controls: Percentage of operations that generate telemetry metrics, balancing observability with performance overhead. Default: 1.0 (100%)
Technology Stack
Primary language for distributed system logic, gRPC services, and client SDK implementation
Inter-service communication protocol providing type-safe, high-performance RPC between Milvus components
Distributed key-value store for cluster metadata, service discovery, and configuration management with strong consistency
Message queue for coordinating data flow between components with guaranteed message ordering and durability
Object storage backend for persisting vector segments, index files, and write-ahead logs with high availability
Vector similarity search library providing optimized ANN algorithms (HNSW, IVF) with GPU acceleration support
Key Components
- MilvusClient (gateway) — Primary client interface that manages gRPC connections to Milvus, provides high-level APIs for collection management and vector operations, and handles request serialization/response parsing (client/milvusclient/)
- Proxy (gateway) — Request router that validates client requests, enforces authentication/authorization, coordinates with metadata services, and distributes operations to appropriate worker nodes (internal/proxy/)
- RootCoord (orchestrator) — Cluster metadata manager that maintains collection schemas, handles DDL operations, manages global timestamps, and coordinates with other coordinators for schema changes (internal/rootcoord/)
- DataCoord (orchestrator) — Data ingestion coordinator that manages segment allocation, triggers compaction when segments reach size limits, coordinates index building, and maintains data node assignments (internal/datacoord/)
- QueryCoordV2 (scheduler) — Query execution coordinator that manages QueryNode resource allocation, distributes segment replicas for load balancing, handles failover, and optimizes query routing (internal/querycoordv2/)
- DataNode (processor) — Vector data ingestion worker that receives vector batches, builds growing segments, constructs indexes using FAISS and other libraries, and persists sealed segments to object storage (internal/datanode/)
- QueryNodeV2 (executor) — Search execution engine that loads segment indexes into memory, performs parallel ANN search across assigned segments, and returns top-K results with scores (internal/querynodev2/)
- StreamingNode (processor) — Real-time data flow manager that handles streaming ingestion, maintains write-ahead logs, coordinates with message queues, and ensures data consistency during ingestion (internal/streamingnode/)
Package Structure
Go client SDK for Milvus, providing high-level APIs for vector database operations including collections, indexes, search, and bulk operations.
Demonstration programs showing Milvus telemetry features including client metrics collection, command push, and multi-database scenarios.
End-to-end test for Milvus telemetry system, verifying command push and client metrics flow.
Shared package containing common utilities, configuration, streaming, message queues, storage adapters, and telemetry infrastructure used across Milvus components.
Test suite for the Go client SDK, providing test utilities and comprehensive validation of client operations.
Frequently Asked Questions
What is milvus used for?
milvus-io/milvus distributes vector similarity search across multiple nodes with real-time indexing. It is an 8-component repository written in Go; data flows through 6 distinct pipeline stages, and the codebase contains 4309 files.
How is milvus architected?
milvus is organized into 5 architecture layers: Client Layer, Proxy Layer, Coordinator Layer, Worker Node Layer, and 1 more. Data flows through 6 distinct pipeline stages. This layered structure keeps concerns separated and modules independent.
How does data flow through milvus?
Data moves through 6 stages: Client Request Validation → Vector Data Ingestion → Segment Sealing and Indexing → Query Planning and Distribution → Parallel ANN Search Execution → .... Vector data flows from client through proxy validation to DataNode for ingestion and index building, gets stored as segments in object storage, then loaded by QueryNodes for ANN search execution. Search requests follow the reverse path - proxy routes to QueryCoord for node assignment, QueryNodes execute parallel search on their segments, and results are merged before returning to client. This pipeline design reflects a complex multi-stage processing system.
What technologies does milvus use?
The core stack includes Go (Primary language for distributed system logic, gRPC services, and client SDK implementation), gRPC/Protocol Buffers (Inter-service communication protocol providing type-safe, high-performance RPC between Milvus components), etcd (Distributed key-value store for cluster metadata, service discovery, and configuration management with strong consistency), Apache Pulsar/Kafka (Message queue for coordinating data flow between components with guaranteed message ordering and durability), MinIO/S3 (Object storage backend for persisting vector segments, index files, and write-ahead logs with high availability), FAISS (Vector similarity search library providing optimized ANN algorithms (HNSW, IVF) with GPU acceleration support). A focused set of dependencies that keeps the build manageable.
What system dynamics does milvus have?
milvus exhibits 4 data pools (etcd, Object Storage), 3 feedback loops, 3 control points, and 3 delays. The feedback loops handle auto-scaling/balancing and polling. These runtime behaviors shape how the system responds to load, failures, and configuration changes.
What design patterns does milvus use?
4 design patterns detected: Coordinator-Worker Pattern, Segment-Based Storage, Pluggable Backend Architecture, Streaming Data Pipeline.
Analyzed on April 20, 2026 by CodeSea. Written by Karolina Sarna.