How to Build a Hybrid Search System with pgvector
Why Hybrid Search
Pure vector search is great at finding semantically similar content, but it sometimes misses exact keyword matches that matter. Pure keyword search finds exact matches but misses conceptual relevance. Hybrid search combines both approaches, giving you the best of both worlds.
I use hybrid search in several AI applications, from RAG (retrieval-augmented generation) systems to content recommendation engines. The entire system runs in PostgreSQL with pgvector, which means no additional vector database service to manage.
Setting Up pgvector
If you are using Supabase, pgvector is already available. Just enable the extension:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create an index for fast vector search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
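With the table and index in place, you can sanity-check pure vector search directly in SQL before adding the keyword side. The query below is a sketch (the vector literal is truncated for readability; in practice you would paste a full 1536-dimension embedding). The `ivfflat.probes` setting controls how many index lists are scanned, trading recall for speed:

```sql
-- Scan more lists for better recall (the default is 1)
SET ivfflat.probes = 10;

-- Nearest neighbors by cosine distance; <=> is the cosine distance
-- operator when the index was built with vector_cosine_ops
SELECT id, title,
       1 - (embedding <=> '[0.01, -0.02, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.01, -0.02, ...]'::vector
LIMIT 5;
```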
Generating Embeddings
I use OpenAI's embedding model for generating vectors. The embeddings are stored alongside the text content:
```python
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Assumes an async Supabase client has been initialized elsewhere as `supabase`
async def store_document(title: str, content: str):
    embedding = get_embedding(content)
    await supabase.table('documents').insert({
        'title': title,
        'content': content,
        'embedding': embedding
    }).execute()
```
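Embedding one document per API call gets slow when backfilling a large table. The OpenAI embeddings endpoint accepts a list of inputs, so batching cuts round trips substantially. Here is a sketch with the provider call injected as a callable so the batching logic stays provider-agnostic (the batch size of 100 is a conservative assumption, not a documented limit):

```python
from typing import Callable

def chunk(items: list, size: int) -> list[list]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def get_embeddings_batch(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,
) -> list[list[float]]:
    """Embed many texts with one API round trip per batch instead of per text.

    `embed_batch` wraps the provider call, e.g. for OpenAI:
        lambda batch: [d.embedding for d in client.embeddings.create(
            model="text-embedding-3-small", input=batch).data]
    """
    embeddings: list[list[float]] = []
    for batch in chunk(texts, batch_size):
        # Results come back in input order, so extending preserves alignment
        embeddings.extend(embed_batch(batch))
    return embeddings
```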
The Hybrid Search Query
The core of hybrid search is a SQL query that combines vector similarity with full-text search. I use a weighted combination where you can tune the balance between semantic and keyword relevance:
```sql
CREATE OR REPLACE FUNCTION hybrid_search(
    query_text TEXT,
    query_embedding vector(1536),
    match_count INT DEFAULT 10,
    keyword_weight FLOAT DEFAULT 0.3,
    semantic_weight FLOAT DEFAULT 0.7
)
RETURNS TABLE (
    id UUID,
    title TEXT,
    content TEXT,
    similarity FLOAT,
    keyword_rank FLOAT,
    combined_score FLOAT
)
LANGUAGE plpgsql AS $$
BEGIN
    RETURN QUERY
    WITH semantic AS (
        SELECT
            d.id,
            d.title,
            d.content,
            1 - (d.embedding <=> query_embedding) AS sim
        FROM documents d
        ORDER BY d.embedding <=> query_embedding
        LIMIT match_count * 2
    ),
    keyword AS (
        SELECT
            d.id,
            ts_rank(
                to_tsvector('english', d.content),
                plainto_tsquery('english', query_text)
            ) AS rank
        FROM documents d
        WHERE to_tsvector('english', d.content) @@
              plainto_tsquery('english', query_text)
    )
    SELECT
        s.id,
        s.title,
        s.content,
        s.sim::FLOAT,
        COALESCE(k.rank, 0)::FLOAT,
        (s.sim * semantic_weight +
         COALESCE(k.rank, 0) * keyword_weight)::FLOAT
    FROM semantic s
    LEFT JOIN keyword k ON s.id = k.id
    -- Order by the full expression: a bare "combined_score" here would be
    -- ambiguous between the SELECT alias and the function's OUT column
    ORDER BY (s.sim * semantic_weight +
              COALESCE(k.rank, 0) * keyword_weight) DESC
    LIMIT match_count;
END;
$$;
```
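You can smoke-test the function straight from psql before wiring it into application code. The vector literal below is truncated for readability; in practice you would paste a full 1536-dimension embedding:

```sql
SELECT id, title, combined_score
FROM hybrid_search(
    'postgres index types',              -- query_text
    '[0.01, -0.02, ...]'::vector(1536),  -- query_embedding (truncated here)
    5                                    -- match_count; weights use defaults
);
```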
Calling Hybrid Search from Python
```python
async def search(query: str, limit: int = 10) -> list[dict]:
    # Generate embedding for the query
    query_embedding = get_embedding(query)

    # Call the hybrid search function
    result = await supabase.rpc('hybrid_search', {
        'query_text': query,
        'query_embedding': query_embedding,
        'match_count': limit,
        'keyword_weight': 0.3,
        'semantic_weight': 0.7
    }).execute()

    return result.data
```
Tuning the Weights
The balance between keyword and semantic weight depends on your use case:
- Technical documentation: Weight keywords more heavily than the 0.3/0.7 default, e.g. equal weights (0.5/0.5). Users often search for specific function names, error codes, or technical terms
- General content: Keep the semantic-heavy default (0.3/0.7). Users describe what they want conceptually
- Product search: Somewhere in between (0.4/0.6). Product names need exact matching, but descriptions benefit from semantic understanding
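One caveat when tuning: `ts_rank` is unnormalized and typically much smaller than 1, so the keyword weight punches below its nominal value unless you normalize (e.g. `ts_rank(..., 32)` maps rank to rank/(rank+1)). The scoring formula itself is easy to replicate offline, which lets you re-rank one set of retrieved candidates under different weights without re-querying. A minimal sketch, with made-up illustrative scores:

```python
def combined_score(similarity: float, keyword_rank: float,
                   semantic_weight: float = 0.7,
                   keyword_weight: float = 0.3) -> float:
    """Mirror of the scoring expression inside the hybrid_search SQL function."""
    return similarity * semantic_weight + keyword_rank * keyword_weight

def rerank(docs: list[dict], semantic_weight: float,
           keyword_weight: float) -> list[dict]:
    """Re-order candidates under a different weight setting, no new queries needed."""
    return sorted(
        docs,
        key=lambda d: combined_score(d['similarity'], d['keyword_rank'],
                                     semantic_weight, keyword_weight),
        reverse=True,
    )

# Illustrative candidates: one keyword hit, one purely semantic match
docs = [
    {'id': 'a', 'similarity': 0.55, 'keyword_rank': 0.08},
    {'id': 'b', 'similarity': 0.80, 'keyword_rank': 0.00},
]
# Semantic-heavy weights (0.7/0.3) rank b first; keyword-only weights rank a first
```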
Performance Optimization
Hybrid search hits the database harder than either approach alone. Here are the optimizations I use:
- IVFFlat index: Speeds up vector similarity search dramatically. pgvector's guidance is lists ≈ rows / 1000 for up to about a million rows, and sqrt(rows) beyond that
- GIN index for full-text search: Essential for keyword performance on large tables
- Limit the vector search pool: Search the top N*2 vector results, then apply keyword scoring. This avoids full table scans
- Caching: Cache embeddings for common queries to avoid repeated API calls
```sql
-- GIN index on the same expression the search function uses
CREATE INDEX idx_documents_fts
ON documents
USING gin (to_tsvector('english', content));
```
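For the caching point above, even a small in-process cache helps when the same queries recur. A sketch using `functools.lru_cache`, written as a wrapper so it works with any embedding function (for multi-process deployments you would reach for a shared cache like Redis instead):

```python
from functools import lru_cache
from typing import Callable

def make_cached_embedder(
    embed_fn: Callable[[str], list[float]],
    maxsize: int = 1024,
) -> Callable[[str], tuple[float, ...]]:
    """Wrap an embedding function with an in-process LRU cache.

    Returns tuples (hashable) rather than lists; call list(...) on the
    result if downstream code needs a list.
    """
    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> tuple[float, ...]:
        return tuple(embed_fn(text))
    return cached
```

Usage would look like `cached_get_embedding = make_cached_embedder(get_embedding)`, after which repeated searches for the same query string skip the embedding API call entirely.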
Using Hybrid Search in RAG
The primary use case for hybrid search in my applications is RAG. When a user asks a question, I search for relevant documents using hybrid search, then pass those documents as context to the LLM:
```python
from anthropic import AsyncAnthropic

claude = AsyncAnthropic()

async def answer_with_context(question: str) -> str:
    # Retrieve relevant documents
    docs = await search(question, limit=5)

    # Build context from retrieved documents
    context = "\n\n".join([d['content'] for d in docs])

    # Generate answer with context
    response = await claude.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required by the Messages API
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text
```
Hybrid search is the secret weapon of production RAG systems. Pure vector search misses important keyword matches. Pure keyword search misses semantic relevance. Combine them and your retrieval quality jumps significantly.
Getting Started
If you are already using Supabase, you are five minutes away from hybrid search. Enable pgvector, add an embedding column to your table, create the search function, and start experimenting. The improvement in retrieval quality over pure vector search is immediately noticeable.