How Vector Databases Are Changing Search in 2026
Last November, I rebuilt the search engine behind BirJob. Our PostgreSQL full-text search worked fine for exact matches — type "Python developer" and you'd get Python developer listings. But type "backend engineer who knows scripting" and you'd get nothing. Zero results. The semantic gap between what people search for and what job postings contain was costing us real engagement.
So I started exploring vector databases. Not as a hype-driven experiment, but as a pragmatic solution to a real search problem. What I found was an ecosystem that has matured dramatically since the early GPT-3 days — and a set of trade-offs that nobody seems to talk about honestly.
This article is my attempt at that honest conversation. We'll cover the major players (Pinecone, Weaviate, Chroma, pgvector), examine real use cases beyond the obvious chatbot retrieval pattern, and discuss when you should absolutely stick with SQL.
What Vector Databases Actually Do (And Don't Do)
Before we compare products, let's get the fundamentals right. A vector database stores high-dimensional vectors — numerical representations of data — and enables fast similarity search across them. When you embed a piece of text using a model like OpenAI's text-embedding-3-large or Cohere's embed-v3, you get a vector of 3072 or 1024 dimensions, respectively. The database then uses algorithms like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) to find the closest vectors to a query vector.
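To make "closest vectors" concrete, here is a brute-force nearest-neighbour search over toy 3-dimensional vectors — the exact computation that HNSW and IVF approximate at scale. The corpus and its values are invented for illustration; real embedding models emit 1024+ dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact (brute-force) k-nearest-neighbour search -- what HNSW/IVF approximate."""
    ranked = sorted(corpus, key=lambda doc: cosine_similarity(query, corpus[doc]), reverse=True)
    return ranked[:k]

# Toy "embeddings": the two engineering listings point in a similar direction.
corpus = {
    "python-dev":  [0.9, 0.1, 0.0],
    "backend-eng": [0.8, 0.3, 0.1],
    "chef":        [0.0, 0.1, 0.9],
}
top = nearest([0.85, 0.2, 0.05], corpus)  # the engineering listings rank ahead of "chef"
```

Brute force is O(n) per query, which is why it stops being viable beyond a few hundred thousand vectors and approximate indexes take over.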
This is fundamentally different from traditional search. SQL databases use B-trees and inverted indexes for exact matching. Full-text search engines like Elasticsearch use TF-IDF and BM25 for keyword relevance. Vector search operates on semantic similarity — meaning two pieces of text can match even if they share zero keywords.
According to DB-Engines' Vector DBMS ranking, the category has seen a 340% increase in search interest since January 2024. But interest doesn't equal need. Let's talk about when you actually need one.
You Need a Vector Database When:
- Semantic search is a core product feature (job matching, document retrieval, recommendation engines)
- You're doing RAG (Retrieval-Augmented Generation) with more than 10,000 documents
- Your data requires multi-modal search (text + images + audio)
- You need real-time similarity matching at scale (fraud detection, content deduplication)
You Don't Need One When:
- Your search is primarily keyword-based and users know what they're looking for
- You have fewer than 50,000 vectors (pgvector on Postgres handles this trivially)
- Your bottleneck is data quality, not search algorithm quality
- You're adding AI to check a box rather than solve a user problem
The Big Four: Pinecone, Weaviate, Chroma, and pgvector
Let's get into the specifics. I've used all four in production or in serious prototyping, and each has a personality.
Pinecone
Pinecone is the managed, fully-hosted option. You don't run infrastructure — you get an API endpoint and you push vectors to it. Their serverless tier (launched in early 2024 and now mature) changed the pricing game significantly. According to Pinecone's pricing page, you can store up to 2 billion vectors on the serverless plan with pay-per-query pricing starting at $0.04 per million reads.
Strengths: Zero operational overhead, excellent documentation, hybrid search (sparse + dense vectors), namespaces for multi-tenancy. The cost model is transparent once you understand read/write units.
Weaknesses: Vendor lock-in is real. You can't export your index topology. Cold starts on serverless can add 200-500ms latency for infrequently accessed namespaces. And if Pinecone goes down, you go down — there's no local fallback.
Weaviate
Weaviate is the open-source heavyweight. Written in Go, it supports multiple vectorization modules (OpenAI, Cohere, HuggingFace, custom), has a GraphQL API, and can run on-premise or via their managed cloud. As documented in their official docs, Weaviate supports both HNSW and flat indexes, with built-in BM25 for hybrid search.
Strengths: Open source (BSD-3), multi-modal support (text, images, video), sophisticated filtering with vector search, built-in reranking. The schema system is opinionated but makes data modeling much cleaner than key-value approaches.
Weaknesses: Memory consumption is significant — HNSW indexes live in memory. Running a 10M vector index with 1536 dimensions requires roughly 60GB of RAM. The Go codebase means fewer community contributions compared to Python-native options. Upgrades between versions have historically been painful.
Chroma
Chroma positions itself as the "AI-native open-source embedding database." It's Python-first, developer-friendly, and designed for rapid prototyping. Their documentation emphasizes simplicity — you can get a working prototype in under 10 lines of code.
Strengths: Lowest barrier to entry, runs in-process (no separate server needed for development), excellent Python SDK, built-in embedding functions. Perfect for RAG prototypes and hackathons.
Weaknesses: Production readiness has been questioned. Their distributed mode (Chroma Cloud) is still relatively new as of early 2026. Performance degrades noticeably beyond 1M vectors in single-node mode. Metadata filtering is less sophisticated than Weaviate or Pinecone.
pgvector
pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. If you're already running Postgres (and statistically, you probably are), pgvector means you don't need a separate database at all. The GitHub repository now has over 12,000 stars and the extension ships with most managed Postgres providers (Supabase, Neon, RDS, Cloud SQL).
Strengths: No new infrastructure. ACID transactions across your relational data AND vectors. Familiar SQL syntax. Excellent for hybrid queries like "find semantically similar jobs posted in the last 7 days in Baku." The ivfflat and hnsw index types cover most use cases.
Weaknesses: Performance ceiling is lower than purpose-built vector databases. At 10M+ vectors, query latency can exceed 100ms even with well-tuned HNSW indexes. No built-in embedding generation — you need to compute embeddings externally. Scaling beyond a single node requires pgvector on Citus or similar distributed Postgres, which adds complexity.
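The hybrid query mentioned in the strengths list looks roughly like the following. The schema (`jobs` table with `city`, `posted_at`, and an `embedding vector(1536)` column) is hypothetical; `<=>` is pgvector's cosine-distance operator, and the `ORDER BY ... LIMIT` pattern is what lets an HNSW index on the embedding column kick in:

```python
# Hypothetical schema: jobs(id, title, city, posted_at, embedding vector(1536)).
SIMILAR_RECENT_JOBS = """
    SELECT id, title
    FROM jobs
    WHERE posted_at > now() - interval '7 days'
      AND city = %(city)s
    ORDER BY embedding <=> %(query_embedding)s
    LIMIT 10;
"""

def build_params(city: str, query_embedding: list[float]) -> dict:
    """Parameters for psycopg-style execution; pgvector accepts the
    vector as a bracketed literal string like '[0.1, 0.2, 0.3]'."""
    return {"city": city, "query_embedding": str(query_embedding)}

params = build_params("Baku", [0.1, 0.2, 0.3])
```

Note that the relational predicates and the similarity ranking live in one statement and one transaction — the property no external vector store can give you.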
Comparison Table
| Feature | Pinecone | Weaviate | Chroma | pgvector |
|---|---|---|---|---|
| Hosting | Managed only | Self-hosted or managed | Self-hosted or managed | Anywhere Postgres runs |
| License | Proprietary | BSD-3 | Apache 2.0 | PostgreSQL License |
| Max Dimensions | 20,000 | 65,535 | No hard limit | 2,000 (indexed) / 16,000 (unindexed) |
| Hybrid Search | Yes (sparse+dense) | Yes (BM25+vector) | Limited | Manual (tsvector + vector) |
| Multi-tenancy | Namespaces | Multi-tenancy module | Collections | Schemas/Row-level security |
| Metadata Filtering | Excellent | Excellent | Good | Excellent (it's SQL) |
| Best For | Production SaaS, zero-ops | Complex multi-modal apps | Prototyping, small apps | Existing Postgres stacks |
| Approximate Cost (1M vectors) | ~$70/mo serverless | ~$150/mo (managed) or self-host | Free (self-host) | Free (existing Postgres) |
Real Use Cases Beyond RAG
The AI hype cycle has locked vector databases into a narrow narrative: they exist for RAG pipelines. While RAG is the most common use case, the technology is far more versatile.
1. Semantic Job Matching
At BirJob, we embed both job descriptions and candidate search queries into the same vector space. A query like "mühasibatlıq sahəsində iş" (accounting work) matches job postings that say "maliyyə mütəxəssisi" (financial specialist) — even though the keywords are completely different. This is particularly powerful in the Azerbaijani job market where job titles are inconsistent across companies.
2. Duplicate Detection at Scale
E-commerce platforms use vector similarity to detect duplicate product listings. Shopify Engineering has written about using embedding similarity to flag duplicate products across their marketplace. The threshold-based approach (cosine similarity > 0.95 = likely duplicate) is simpler and more robust than fuzzy string matching.
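The threshold-based approach is a few lines of code. This sketch compares every pair, which is O(n²) and only viable for small batches — at marketplace scale you'd query the vector index for each listing's neighbours instead. The listings and vectors are invented for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def find_duplicates(listings: dict[str, list[float]], threshold: float = 0.95) -> list[tuple[str, str]]:
    """Flag listing pairs whose embedding similarity exceeds the threshold."""
    ids = list(listings)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(listings[a], listings[b]) > threshold:
                pairs.append((a, b))
    return pairs

listings = {
    "sku-1": [1.0, 0.0, 0.0],
    "sku-2": [0.99, 0.05, 0.0],  # near-identical wording, different listing
    "sku-3": [0.0, 1.0, 0.0],
}
duplicates = find_duplicates(listings)  # flags the sku-1 / sku-2 pair
```

The 0.95 cutoff is a starting point, not a constant of nature — tune it against a labelled sample of known duplicates for your catalogue.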
3. Anomaly Detection in Security
Security teams embed network request patterns and use vector distance to flag anomalies. If a user's behavior vector suddenly shifts far from their historical cluster, that's a signal worth investigating. CrowdStrike and similar vendors have integrated vector-based anomaly detection into their threat detection pipelines.
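A minimal version of "far from their historical cluster" is distance from the centroid of past behaviour vectors. The fixed 3× multiplier below is a placeholder heuristic — real systems calibrate thresholds per user and per signal:

```python
import math

def centroid(vectors: list[list[float]]) -> list[float]:
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_anomalous(history: list[list[float]], new_vector: list[float], factor: float = 3.0) -> bool:
    """Flag a behaviour vector far outside the user's historical cluster.

    'Far' = more than `factor` times the mean historical distance from the
    centroid. A simple illustrative heuristic, not a production detector.
    """
    c = centroid(history)
    mean_dist = sum(euclidean(v, c) for v in history) / len(history)
    return euclidean(new_vector, c) > factor * max(mean_dist, 1e-9)

history = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]  # a tight historical cluster
```

A sudden jump to, say, `[5.0, 5.0]` trips the check, while small drift inside the cluster does not.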
4. Content Recommendation Without User History
Cold-start recommendation is one of the hardest problems in ML. Vector databases offer a clean solution: embed the content, embed user preferences from onboarding, and find the nearest neighbors. No collaborative filtering needed. Spotify's engineering blog has documented their use of embedding-based discovery for new users.
5. Legal and Compliance Document Search
Law firms and compliance teams deal with massive document corpora. Vector search lets attorneys find relevant precedents based on case descriptions rather than keyword combinations. Harvey AI and similar legal-tech startups have built their core products on this pattern.
When SQL Is Still Better: An Opinionated Take
Here's where I'll be direct, because the vector database marketing machine won't be: for the majority of applications in 2026, PostgreSQL with pgvector is the right answer.
I've watched teams spend months integrating Pinecone or Weaviate when their actual dataset was 50,000 records. At that scale, pgvector handles similarity search in under 10ms. You don't need a separate database. You don't need a separate deployment pipeline. You don't need to worry about embedding synchronization between your primary database and your vector store.
The operational tax of a separate vector database is non-trivial:
- Data synchronization: Every insert, update, or delete in your primary database needs to be reflected in your vector database. This is a consistency problem that grows with scale.
- Embedding pipeline: You need a reliable system to generate embeddings for new data. If your embedding API goes down, your vector database falls behind.
- Cost monitoring: Vector database pricing is often per-query or per-vector, which can spiral unexpectedly. One poorly-optimized batch job can generate a $500 bill overnight.
- Debugging complexity: When search results are wrong, you now have three potential failure points: the embedding model, the vector database configuration, and the query construction. With SQL, it's just the query.
The rule I follow: start with pgvector. Move to a dedicated vector database only when you can articulate specific performance or feature requirements that pgvector can't meet. For most applications — including BirJob — pgvector is more than sufficient.
Performance Benchmarks: What the Marketing Pages Won't Show You
Every vector database vendor publishes benchmarks showing their product winning. Here's a more nuanced view based on the ANN Benchmarks project and my own testing:
| Scenario | Pinecone (p2) | Weaviate (HNSW) | pgvector (HNSW) |
|---|---|---|---|
| 100K vectors, 1536d, p95 latency | 8ms | 5ms | 12ms |
| 1M vectors, 1536d, p95 latency | 12ms | 15ms | 45ms |
| 10M vectors, 1536d, p95 latency | 18ms | 25ms | 120ms+ |
| Recall@10 (1M vectors) | 0.95 | 0.97 | 0.92 |
| Insert throughput (vectors/sec) | ~5,000 | ~3,000 | ~8,000 |
The key insight: at 1M vectors and below, the latency differences are negligible for most applications. Your network round trip to the database is likely 5-20ms anyway. The differences only become meaningful at scale, and "at scale" means tens of millions of vectors with sub-20ms latency requirements.
Building a Vector Search Stack: A Practical Action Plan
If you're convinced that vector search is right for your use case, here's how to implement it without over-engineering:
Phase 1: Prototype (Week 1-2)
- Choose an embedding model. Start with OpenAI's text-embedding-3-small (1536 dimensions, $0.02 per 1M tokens). If cost or privacy is a concern, use an open-source model like BGE-large or Nomic Embed.
- Use pgvector. Add the extension to your existing Postgres database. Create a vector column, insert a few thousand embeddings, and test search quality.
- Build a simple evaluation set. Create 50-100 query-result pairs that define "good" search results for your domain. Measure recall and precision against this set.
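The evaluation step above needs only a handful of lines. A sketch of recall@k over a query-result evaluation set, where `search_fn` stands in for whatever search backend you are testing:

```python
def recall_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(results[:k]) & relevant)
    return hits / len(relevant)

def evaluate(eval_set: list[tuple[str, set[str]]], search_fn, k: int = 10) -> float:
    """Mean recall@k over (query, relevant_doc_ids) pairs.

    `search_fn(query)` is assumed to return a ranked list of document ids
    from whichever backend you are evaluating.
    """
    scores = [recall_at_k(search_fn(query), relevant, k) for query, relevant in eval_set]
    return sum(scores) / len(scores)
```

Run this same harness against pgvector now and against any candidate replacement later — it turns the migration decision into a number instead of a hunch.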
Phase 2: Optimize (Week 3-4)
- Add HNSW indexes. The default sequential scan won't scale. Create an HNSW index with appropriate `m` and `ef_construction` parameters.
- Implement hybrid search. Combine vector similarity with keyword matching (BM25 or PostgreSQL full-text search). A weighted combination of both usually outperforms either alone.
- Add metadata filtering. Use SQL WHERE clauses to pre-filter before vector search. This is where pgvector shines — you get the full power of SQL predicates.
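One well-known way to combine keyword and vector rankings is reciprocal rank fusion (RRF), which needs only the ranked id lists, not comparable scores. A sketch, with invented job ids:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists (e.g. BM25 and vector search) with RRF.

    Each document scores sum(1 / (k + rank)) across the lists it appears in.
    k=60 is the constant from the original RRF paper; it damps the influence
    of any single list's top ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["job-1", "job-7", "job-3"]   # from full-text search
vector_hits  = ["job-7", "job-2", "job-1"]   # from similarity search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that both retrievers agree on float to the top, which is exactly the behaviour you want from hybrid search.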
Phase 3: Scale (When Needed)
- Monitor latency at p95 and p99. If queries consistently exceed your latency budget, consider a dedicated vector database.
- Evaluate Pinecone or Weaviate based on your specific needs: Pinecone for zero-ops, Weaviate for multi-modal or on-premise requirements.
- Build an abstraction layer. Use a library like LangChain or write a thin adapter so you can swap backends without rewriting application code.
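The "thin adapter" above can be as small as an interface with two methods. A sketch with a brute-force in-memory backend (useful in tests); a pgvector or Pinecone adapter would implement the same interface:

```python
from abc import ABC, abstractmethod

class VectorStore(ABC):
    """Minimal backend-agnostic interface; swap pgvector/Pinecone behind it."""

    @abstractmethod
    def upsert(self, doc_id: str, vector: list[float]) -> None: ...

    @abstractmethod
    def query(self, vector: list[float], k: int) -> list[str]: ...

class InMemoryStore(VectorStore):
    """Brute-force reference backend, handy for unit tests."""

    def __init__(self) -> None:
        self._docs: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float]) -> None:
        self._docs[doc_id] = vector

    def query(self, vector: list[float], k: int) -> list[str]:
        def sq_dist(doc_id: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(vector, self._docs[doc_id]))
        return sorted(self._docs, key=sq_dist)[:k]
```

Because application code only sees `VectorStore`, the pgvector-first strategy stays cheap to revisit: a backend swap is one new class, not a rewrite.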
The Embedding Model Matters More Than the Database
Here's the uncomfortable truth that vector database companies don't emphasize: the quality of your embeddings matters far more than your choice of database.
I've seen teams spend weeks evaluating Pinecone vs Weaviate while using default OpenAI embeddings without any fine-tuning. The database comparison is a rounding error compared to the impact of embedding quality. A mediocre embedding model on Pinecone will produce worse results than a fine-tuned model on pgvector.
According to the MTEB (Massive Text Embedding Benchmark) leaderboard, the gap between the best and median embedding models is over 15 percentage points on retrieval tasks. That's an enormous difference — far larger than the ~5% recall difference between vector databases.
Invest your time in:
- Choosing the right embedding model for your domain and language (multilingual models for non-English content)
- Fine-tuning embeddings on your specific data if off-the-shelf models underperform
- Chunking strategy for long documents — chunk size and overlap have a massive impact on retrieval quality
- Query expansion and reranking to improve precision after the initial vector retrieval
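To illustrate the chunking point: the sketch below splits on character windows with overlap, so sentences that straddle a boundary remain retrievable from both sides. Production systems usually chunk on token counts and respect sentence boundaries; this version just shows the shape:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows before embedding.

    Character-based for simplicity; real pipelines typically count tokens
    and avoid splitting mid-sentence.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 500, chunk_size=200, overlap=50)  # 3 overlapping chunks
```

Both knobs are worth sweeping against your evaluation set: too-small chunks lose context, too-large chunks dilute the embedding with unrelated content.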
What's Coming Next
The vector database space is evolving rapidly. A few trends worth watching:
Convergence with relational databases: PostgreSQL, MySQL, and SQLite are all adding native vector support. Oracle announced vector search in 23ai. The "separate database" argument weakens every quarter as relational databases improve their vector capabilities.
Hardware acceleration: Custom silicon for vector operations is being developed by multiple startups. NVIDIA's GPU-accelerated vector search (via RAPIDS) is already production-ready. As hardware catches up, the software optimization advantages of purpose-built databases may diminish.
Multimodal by default: The next generation of embedding models (like the successors to CLIP and ImageBind) will natively handle text, images, audio, and video in a single embedding space. Databases that support multi-modal search out of the box will have an advantage.
Edge vector search: Running vector similarity on edge devices (phones, IoT) is becoming feasible with quantized models and lightweight databases like LanceDB. This opens up offline-first AI applications that don't need a cloud database.
Sources
- DB-Engines Vector DBMS Ranking
- Pinecone Pricing
- Weaviate Documentation
- Chroma Documentation
- pgvector GitHub Repository
- ANN Benchmarks
- MTEB Leaderboard
- LangChain
- LanceDB
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator scraping 80+ sources daily.
