Noureddine RAMDI / Milvus: a distributed vector database with Go orchestration and C++ search engine

Created Sat, 09 May 2026 11:42:26 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

milvus-io/milvus

Milvus tackles a real challenge in AI development: efficiently storing and searching high-dimensional vector data at scale. What sets Milvus apart is its architectural split: a Go-based system layer manages distributed orchestration and API access, while a C++ core search engine implements a wide range of index types, including HNSW, IVF, DiskANN, SCANN, and CAGRA. This separation lets Milvus run as a fully distributed, Kubernetes-native deployment in which query nodes and data nodes scale independently, a key advantage for large-scale AI applications.

Architecture and core functionality of Milvus

Milvus is an open-source vector database designed specifically for AI workloads that rely on nearest neighbor search in high-dimensional spaces. Its core strength is handling dense vector embeddings generated by models like transformers, alongside sparse vectors used in hybrid semantic/BM25 search.
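To make "nearest neighbor search" concrete, here is a minimal pure-Python sketch of the exact, brute-force version of the problem: score every stored vector against the query and keep the best k. It is not how Milvus searches internally; it is the baseline that approximate indexes are designed to beat. The toy vectors and the names `brute_force_search` and `toy_vectors` are made up for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k):
    """Return the ids of the k most similar vectors, best first."""
    scored = ((cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items())
    return [vec_id for _, vec_id in heapq.nlargest(k, scored)]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top_two = brute_force_search([1.0, 0.05, 0.0], toy_vectors, k=2)
```

Every query here touches every stored vector, so cost grows linearly with collection size. That linear scan is what index structures such as HNSW or IVF avoid, at the price of approximate results.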

Under the hood, Milvus uses a hybrid technology stack: the system management layer is written in Go, while the core vector search engine is implemented in C++. This split is not accidental; it allows high-performance execution of complex indexing algorithms in C++, while the Go layer handles distributed coordination, metadata management, and exposes APIs.

The architecture separates compute from storage. Query nodes handle search requests and computations, while data nodes manage storage and persistence. Both can therefore scale horizontally and independently depending on the workload: a read-heavy deployment can add query nodes without adding data nodes, and an ingestion-heavy one can do the reverse. This pattern is common in distributed databases but less common among vector search engines.

Milvus supports multiple vector index types, including HNSW (Hierarchical Navigable Small World), IVF (Inverted File), DiskANN (disk-based approximate nearest neighbor), SCANN (based on Google's ScaNN, Scalable Nearest Neighbors), and CAGRA for GPU acceleration. Covering both CPU-oriented indexes and a GPU index lets Milvus perform well across different hardware setups.

Enterprise features include multi-tenancy, role-based access control (RBAC), TLS encryption for secure communication, and hot/cold storage tiering to optimize data lifecycle and cost management. These features make Milvus suitable for production environments where security and compliance matter.

Milvus also integrates with popular AI and machine learning frameworks such as LangChain, LlamaIndex, OpenAI, and HuggingFace, which helps developers build advanced semantic search, recommendation, and question-answering systems with vector search capabilities.

Technical strengths and design tradeoffs

What distinguishes Milvus is the clear-cut separation between the orchestration and search engine components. The Go system layer handles distributed cluster management, API routing, and scaling orchestration using Kubernetes-native patterns. This makes Milvus compatible with cloud-native deployment and container orchestration tools.

On the other hand, the C++ core focuses solely on the computationally intensive vector search algorithms. This division keeps the code modular and allows performance optimizations in C++ without compromising system-level control and flexibility.

This design comes with tradeoffs. Cross-language communication between Go and C++ introduces complexity and potential overhead. Debugging distributed systems with a split codebase is also more challenging than single-language implementations.

Despite these challenges, the codebase keeps clear modular boundaries and uses hardware acceleration where possible. The range of supported index types means users can pick an index suited to their data characteristics and latency requirements. For example, HNSW offers fast approximate search for in-memory data, IVF reduces search cost on large datasets by scanning only a few clusters, and DiskANN keeps most of the index on disk for datasets too large for memory.
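The IVF idea mentioned above can be sketched in a few lines of pure Python: vectors are grouped into inverted lists by nearest centroid, and a query scans only the `nprobe` closest lists. This is a simplified illustration, not Milvus's C++ implementation. The centroids are hard-coded here on the assumption that a prior k-means step produced them, and the toy data and function names are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vec_id for c in probe_order[:nprobe] for vec_id in lists[c]]
    candidates.sort(key=lambda vec_id: l2(query, vectors[vec_id]))
    return candidates[:k]


# Hypothetical toy data: two obvious clusters in 2-D.
points = {
    "a": [0.0, 0.0], "b": [0.1, 0.2], "c": [0.3, 0.1],
    "d": [5.0, 5.0], "e": [5.2, 4.9], "f": [4.8, 5.1],
}
centroids = [[0.1, 0.1], [5.0, 5.0]]  # assumed to come from a prior k-means step
inverted_lists = build_ivf(points, centroids)

# nprobe=1 scans one cluster (3 vectors) instead of all 6.
fast_hits = ivf_search([0.2, 0.2], points, centroids, inverted_lists, k=2, nprobe=1)
# nprobe=len(centroids) scans every list, which degenerates to exact search.
exact_hits = ivf_search([0.2, 0.2], points, centroids, inverted_lists, k=2, nprobe=2)
```

Raising `nprobe` trades speed for recall: more lists are scanned, so fewer true neighbors are missed when a query falls near a cluster boundary.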

Hybrid search capabilities allow combining dense vector search with traditional sparse BM25 text search, which is valuable for many real-world semantic search applications where keyword relevance and semantic similarity must be balanced.
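One widely used way to balance the two signals is reciprocal rank fusion (RRF), which merges ranked lists using only each hit's position, so dense similarity scores and BM25 scores never need to be put on the same scale. The sketch below is a generic pure-Python version of the technique, not a claim about how Milvus implements it internally. The document ids and the smoothing constant `k=60` (a conventional default) are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), then by id for deterministic ties.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_2", "doc_1", "doc_3"]  # hypothetical dense vector search result
bm25_hits = ["doc_1", "doc_4", "doc_2"]   # hypothetical BM25 keyword search result
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Documents that rank well in both lists (`doc_1` and `doc_2` here) rise to the top, while documents found by only one retriever are kept but ranked lower.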

On the performance side, GPU acceleration through CAGRA lets Milvus use specialized hardware for workloads that benefit from parallelism, such as large-scale real-time search.

One limitation to note is that deploying and managing Milvus requires familiarity with distributed system concepts and Kubernetes. While the system is Kubernetes-native, this might be a barrier for smaller teams or less complex use cases.

Quickstart with Python SDK

Milvus provides a Python SDK called pymilvus that offers a straightforward way to interact with the database. Installing the client is simple:

$ pip install -U pymilvus

You can instantiate a client and connect to a Milvus server or use Milvus Lite for local testing:

from pymilvus import MilvusClient

# Connect to a local Milvus Lite database
client = MilvusClient("milvus_demo.db")

# Or connect to a deployed Milvus server or Zilliz Cloud
client = MilvusClient(
    uri="<endpoint_of_self_hosted_milvus_or_zilliz_cloud>",
    token="<username_and_password_or_zilliz_cloud_api_key>"
)

Creating a collection is straightforward, specifying the vector dimension:

client.create_collection(
    collection_name="demo_collection",
    dimension=768,  # typical embedding size
)

You can then insert data and perform vector searches:

# Insert data ('data' is a list of dicts, one per entity, with keys such as
# "id", "vector", "text", and "subject")
res = client.insert(collection_name="demo_collection", data=data)

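# Search using query vectors
# 'embedding_fn' is assumed to be defined earlier, for example
# model.DefaultEmbeddingFunction() from the optional pymilvus[model] extra;
# its output dimension must match the collection's dimension.
query_vectors = embedding_fn.encode_queries(["Who is Alan Turing?", "What is AI?"])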
res = client.search(
    collection_name="demo_collection",
    data=query_vectors,
    limit=2,  # top 2 results
    output_fields=["vector", "text", "subject"]
)
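The snippets above leave `data` undefined. As a rough sketch, the rows passed to `client.insert` can be built as a list of dictionaries, one per entity. The example below assumes the quick-setup collection created earlier, whose default field names are `id` and `vector`, and uses random placeholder vectors instead of real embeddings so that it runs without a model. The `text` and `subject` values are made-up sample content.

```python
import random

DIMENSION = 768  # must match the dimension given to create_collection

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
]

random.seed(0)  # reproducible placeholder vectors
data = [
    {
        "id": i,
        # Placeholder: a real application would use an embedding model here.
        "vector": [random.uniform(-1.0, 1.0) for _ in range(DIMENSION)],
        "text": doc,
        "subject": "history",
    }
    for i, doc in enumerate(docs)
]
```

With random vectors the search results are meaningless; swapping in real embeddings of the same dimension is what makes the `text` hits semantically relevant.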

This quickstart focuses on Python, but Milvus also supports other languages and deployment modes, including self-hosted servers and managed cloud services.

Verdict

Milvus is a mature, production-ready vector database built for large-scale AI applications. Its architecture, splitting Go-based system orchestration and C++-based vector search, is a practical design that balances performance, scalability, and feature richness.

It is particularly well-suited for teams operating in Kubernetes environments who need independent scaling of query and data nodes. The support for multiple index types and hybrid search makes it flexible for a variety of vector search use cases.

That said, the complexity of distributed deployment and the cross-language codebase might be overkill for smaller projects or those new to vector search.

If your application involves large-scale semantic search, recommendation, or AI-driven retrieval in production, Milvus warrants a serious look. The Python SDK and integrations with popular AI frameworks also ease adoption.

For simpler use cases or local prototyping, Milvus Lite offers an easy way to experiment with vector search without the overhead of distributed infrastructure.

In short, Milvus is a solid choice for teams who need a scalable, feature-rich vector database and are comfortable with Kubernetes and distributed systems.


→ GitHub Repo: milvus-io/milvus ⭐ 44,116 · Go