Vector Database Showdown: Qdrant vs. pgvector
Okay, we just got buy-in for our RAG project. Step one: we need to store 10 million document embeddings. The architect says 'vector database,' but when I Google that, I get 15 different options. Do we really need another database?
Ah, the classic trap! Here's the thing: not all 'vector databases' are created equal. Some are purpose-built race cars (fast, optimized, but you better know how to drive). Others are your trusty pickup truck with a souped-up engine (familiar, practical, but it has limits). And then there's the luxury sedan with autopilot: convenient, but you're paying for it.
Translation: There's no 'best' option, just trade-offs. Great. So what ARE we actually choosing between?
Let's focus on the two that actually matter for most enterprises: Qdrant (the race car) and pgvector (the souped-up pickup). Qdrant is purpose-built for vector search: think billions of vectors, sub-100ms queries, all the bells and whistles. It's open-source, Rust-based, and launched a Hybrid Cloud offering in 2024, so you can run it anywhere while keeping it managed. pgvector, on the other hand, is PostgreSQL with vector superpowers. If you already run Postgres, it's stupid simple to add. Just install an extension, create a VECTOR column, and boom, you're doing similarity search.
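To make the "install an extension, create a VECTOR column" part concrete, here is a minimal sketch of the SQL involved, held as Python strings. The table name `documents`, its columns, and the helper names are hypothetical; the `vector` extension, the `vector(1536)` column type, and the `<=>` cosine distance operator are pgvector's own.

```python
# Sketch only: the table and column names below are made up for illustration.
SETUP_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    "CREATE TABLE IF NOT EXISTS documents ("
    "id bigserial PRIMARY KEY, content text, embedding vector(1536))",
]

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = (
    "SELECT id, content FROM documents "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(values):
    """Format numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_vector, top_k=10):
    """Build the parameter tuple that goes with SEARCH_SQL."""
    return (to_vector_literal(query_vector), top_k)
```

With a driver that uses `%s` placeholders (psycopg is one), the query would presumably be run as `cursor.execute(SEARCH_SQL, search_params(query_vector))`; the connection setup is left out here on purpose.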
Okay, but PostgreSQL wasn't built for this. How is it competitive with a purpose-built solution?
Fair skepticism. Historically, it wasn't. But as of 2025, pgvector has reported speedups of up to 150x over roughly a year of releases. It added HNSW indexing (the same algorithm Qdrant uses), adaptive index management, and an 'iterative scan' mode for better recall when you filter results. At <5 million vectors, the performance difference is negligible, with p95 latency under 200ms. Plus, pgvector can use 2-10x less storage than specialized vector DBs because PostgreSQL is just damn efficient at data management.
So when do I actually need Qdrant over pgvector?
Scale and specialization. If you're pushing 10 million+ vectors, need sub-50ms latency at high query throughput, or want advanced features like Qdrant's BM42 hybrid search algorithm (combines dense semantic vectors with sparse keyword vectors in one query), Qdrant wins. It's also better if your team has no PostgreSQL expertise: Qdrant Cloud is stupid easy to spin up, and their Terraform/Kubernetes integrations are top-notch. But if you're a Postgres shop with <5M vectors and want to keep everything in one database (transactional data + vectors + full-text search), pgvector is the pragmatic choice.
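If "hybrid search" feels abstract, the sketch below shows one common way to merge a dense (semantic) ranking with a sparse (keyword) ranking: reciprocal rank fusion. To be clear, this is a swapped-in illustration, not BM42 itself and not a claim about how Qdrant fuses results internally. The document IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic results
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Here `doc_b` ends up first because both searches rank it highly, even though neither the dense nor the sparse list alone would tell you that.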
What about MongoDB Atlas? I keep seeing ads for their vector search.
MongoDB Atlas is the luxury sedan. It's fully managed, has enterprise compliance baked in (SOC2, GDPR-ready), and you get hybrid search out of the box: vector similarity, full-text, geospatial, all in one query. If you're already on MongoDB and want zero DevOps overhead, it's solid. But you're paying for that convenience: it's proprietary (not open-source), cloud-locked (no real on-prem option), and less flexible than Qdrant for experimentation. Think of it as the 'safe corporate choice': it won't blow anyone's mind, but it won't blow up in production either.
The "Million Dollar" Question
"Can't I just use my existing database and call it done?"
Technical Reality Check
Why Your Existing Database Probably Won't Work
1. Vectors aren't normal data. Storing a 1536-dimensional embedding isn't like storing JSON or a string. At 4 bytes per float32 value, it's ~6KB per vector. At 10 million vectors, that's roughly 60GB of raw vector data before indexing. Traditional databases don't optimize for this.
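The arithmetic behind those numbers is worth checking yourself. A minimal sketch, assuming each dimension is stored as a 4-byte float32 (the function name is made up for illustration):

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for the vectors alone, ignoring indexes and metadata."""
    return num_vectors * dimensions * bytes_per_value


per_vector = raw_vector_bytes(1, 1536)       # 6144 bytes, about 6KB
total = raw_vector_bytes(10_000_000, 1536)   # 61_440_000_000 bytes
total_gb = total / 1e9                       # a little over 61 GB
```

Swap in your own dimension count and you can see quickly how much a smaller embedding model would save.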
2. Similarity search ≠ SQL queries. Finding 'the 10 most similar vectors' requires specialized indexing algorithms like HNSW or IVFFlat. A standard B-tree index does nothing for you here. Without the right index, you're doing a full table scan comparing every vector to your query vector. That's computational suicide at scale.
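To see what "comparing every vector to your query vector" means in practice, here is a small brute-force sketch in plain Python. It is the slow baseline that HNSW and IVFFlat exist to avoid, not how either database works. The tiny corpus is made up, and the code assumes no zero-length vectors.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, vectors, top_k=10):
    """Score every stored vector against the query: O(n * d) work per search."""
    scored = (
        (cosine_similarity(query, vec), vec_id)
        for vec_id, vec in vectors.items()
    )
    best = heapq.nlargest(top_k, scored)
    return [(vec_id, score) for score, vec_id in best]


corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
results = brute_force_search([1.0, 0.0], corpus, top_k=2)
```

With three vectors this is instant. With 10 million 1536-dimensional vectors, every single query touches all of them, which is why an approximate index matters.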
3. The hidden cost: Indexing. HNSW indexes for 10M vectors can take 10-50GB of RAM. If your database wasn't designed for this, it'll choke. Qdrant and pgvector preallocate and manage this memory intelligently. Your standard MySQL or MongoDB (without vector extensions)? Good luck.
Alright, I'm convinced we need a real solution. Give me the decision framework.
Here's the cheat sheet:
Choose Qdrant if:
- You have >10M vectors or plan to scale there
- Latency <50ms is critical
- You need advanced features (hybrid search, multimodal embeddings)
- Your team is comfortable with microservices/Kubernetes
- Open-source + vendor flexibility matters
Choose pgvector if:
- You already run PostgreSQL
- You're under 5M vectors
- You want one database for everything (transactional + vector + full-text)
- Your team knows SQL better than vector DB APIs
- You need atomic transactions between relational data and vectors
Choose MongoDB Atlas if:
- You're already on MongoDB
- You want zero ops overhead (fully managed)
- Compliance/security out-of-the-box is worth the premium
- You value stability over cutting-edge performance
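The cheat sheet above can be roughed out as a function. Treat it as a conversation starter, not a rule engine: the parameter names are hypothetical, the order of the checks is my assumption, and the fallback to Qdrant simply mirrors the "greenfield" advice in this article.

```python
def recommend_vector_store(
    num_vectors,
    runs_postgres=False,
    on_mongodb=False,
    needs_sub_50ms=False,
    wants_fully_managed=False,
):
    """Map the cheat-sheet thresholds to a suggestion (a sketch only)."""
    # Scale or strict latency requirements point at the purpose-built option.
    if num_vectors > 10_000_000 or needs_sub_50ms:
        return "Qdrant"
    # An existing Postgres shop under 5M vectors keeps everything in one database.
    if runs_postgres and num_vectors < 5_000_000:
        return "pgvector"
    # Already on MongoDB and wanting zero ops overhead.
    if on_mongodb and wants_fully_managed:
        return "MongoDB Atlas"
    # Otherwise treat it as a greenfield project.
    return "Qdrant"
```

For example, `recommend_vector_store(2_000_000, runs_postgres=True)` returns `"pgvector"`, while the same team at 20 million vectors gets `"Qdrant"`.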
What if I choose wrong?
You won't be stuck forever. Migrating vectors is annoying but not catastrophic. You're basically re-embedding and re-indexing. The bigger lock-in is your query API. If you build your entire RAG pipeline around Qdrant's Python SDK, switching to pgvector means rewriting those calls as SQL queries. That's why I recommend wrapping your vector DB calls behind an abstraction layer from day one: a simple interface that says 'search(query_vector, top_k=10)', so you can swap the backend later.
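Here is a minimal sketch of what that abstraction layer might look like. `VectorStore`, `InMemoryStore`, and `retrieve_context` are hypothetical names, and the in-memory backend is only a stand-in: a real Qdrant or pgvector adapter would implement the same `search` method against its own client.

```python
from typing import List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """The only surface the RAG pipeline is allowed to depend on."""

    def search(
        self, query_vector: Sequence[float], top_k: int = 10
    ) -> List[Tuple[str, float]]:
        ...


class InMemoryStore:
    """Stand-in backend for tests; ranks by dot product."""

    def __init__(self):
        self._vectors = {}

    def add(self, doc_id, vector):
        self._vectors[doc_id] = list(vector)

    def search(self, query_vector, top_k=10):
        scored = [
            (doc_id, sum(q * v for q, v in zip(query_vector, vec)))
            for doc_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


def retrieve_context(store: VectorStore, query_vector, top_k=10):
    """Pipeline code sees only the interface, never a vendor SDK."""
    return [doc_id for doc_id, _ in store.search(query_vector, top_k=top_k)]
```

The payoff is that swapping backends means writing one new adapter class, while everything that calls `retrieve_context` stays untouched.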
Final question: What do YOU use?
For clients with <2M vectors and existing Postgres infrastructure? pgvector, no question. For greenfield projects or scale-critical apps? Qdrant. For enterprises that demand 'proven, managed, compliant, boring'? MongoDB Atlas. There's no universal winner, just the right tool for your constraints.
Technical Reality Check
The Things Nobody Tells You
1. Vector databases don't solve data quality. If your embeddings suck (bad chunking, wrong model, noisy data), even Qdrant at 10ms latency won't save you. Garbage in, garbage out, just faster.
2. Operational complexity is real. Qdrant self-hosted means you're managing Kubernetes, monitoring Prometheus metrics, tuning shard configurations. pgvector means you're scaling Postgres (vacuuming, reindexing, backups). MongoDB Atlas means you're paying MongoDB to handle it, but you lose visibility.
3. The hidden cost: Embedding generation. Everyone focuses on vector storage. But generating those embeddings? That's where your money goes. OpenAI charges on the order of $0.10 per 1M tokens, depending on the model. Embedding 10M documents can cost thousands, depending on how long the documents are. And you'll re-embed when you upgrade models or change chunking strategies. Budget for it.
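A back-of-the-envelope estimate helps here. The sketch below uses the ~$0.10 per 1M tokens figure from above; the 2,000 tokens per document is purely an assumption, so plug in your own average.

```python
def embedding_cost_usd(num_documents, avg_tokens_per_doc, usd_per_million_tokens=0.10):
    """Rough one-time cost of embedding a corpus (ignores retries and re-embeds)."""
    total_tokens = num_documents * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumed average of 2,000 tokens per document: about $2,000 per full pass.
cost = embedding_cost_usd(10_000_000, 2_000)
```

Multiply that by every model upgrade or chunking change that forces a full re-embed, and it becomes a recurring line item rather than a one-off.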
Bottom line: Pick based on your team's strengths, not the vendor's marketing. If your team rocks at Postgres, use pgvector. If you have a dedicated platform team that loves new tech, try Qdrant. If you have budget and need sleep at night, pay for Atlas.