3 min read

How to Build a Hybrid Search System with pgvector

Tags: pgvector, hybrid search, PostgreSQL, vector search, RAG

Why Hybrid Search

Pure vector search is great at finding semantically similar content, but it sometimes misses exact keyword matches that matter. Pure keyword search finds exact matches but misses conceptual relevance. Hybrid search combines both approaches, giving you the best of both worlds.

I use hybrid search in several AI applications, from RAG (retrieval-augmented generation) systems to content recommendation engines. The entire system runs in PostgreSQL with pgvector, which means no additional vector database service to manage.

Setting Up pgvector

If you are using Supabase, pgvector is already available. Just enable the extension:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create an index for fast vector search
CREATE INDEX ON documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
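The lists = 100 setting above follows the common heuristic of roughly the square root of your row count, so it suits a table on the order of 10,000 rows. A quick sketch of that rule of thumb (the function name is illustrative):

```python
import math

def ivfflat_lists(row_count: int) -> int:
    """Heuristic: lists ~ sqrt(row_count), with a floor of 1."""
    return max(1, round(math.sqrt(row_count)))

# lists = 100 in the DDL above corresponds to roughly 10,000 rows
```

Recompute and rebuild the index as your table grows; an IVFFlat index tuned for a small table degrades in recall as row counts climb.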

Generating Embeddings

I use OpenAI's embedding model for generating vectors. The embeddings are stored alongside the text content:

from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

async def store_document(title: str, content: str):
    # Assumes `supabase` is an already-configured async Supabase client
    embedding = get_embedding(content)
    await supabase.table('documents').insert({
        'title': title,
        'content': content,
        'embedding': embedding
    }).execute()

The Hybrid Search Query

The core of hybrid search is a SQL query that combines vector similarity with full-text search. I use a weighted combination where you can tune the balance between semantic and keyword relevance:

CREATE OR REPLACE FUNCTION hybrid_search(
  query_text TEXT,
  query_embedding vector(1536),
  match_count INT DEFAULT 10,
  keyword_weight FLOAT DEFAULT 0.3,
  semantic_weight FLOAT DEFAULT 0.7
)
RETURNS TABLE (
  id UUID,
  title TEXT,
  content TEXT,
  similarity FLOAT,
  keyword_rank FLOAT,
  combined_score FLOAT
)
LANGUAGE plpgsql AS $$
-- Without this option, output columns like combined_score clash with
-- the identically named aliases inside the query and PostgreSQL raises
-- an "ambiguous column reference" error
#variable_conflict use_column
BEGIN
  RETURN QUERY
  WITH semantic AS (
    SELECT 
      d.id,
      d.title,
      d.content,
      1 - (d.embedding <=> query_embedding) AS similarity
    FROM documents d
    ORDER BY d.embedding <=> query_embedding
    LIMIT match_count * 2
  ),
  keyword AS (
    SELECT 
      d.id,
      ts_rank(
        to_tsvector('english', d.content),
        plainto_tsquery('english', query_text)
      ) AS rank
    FROM documents d
    WHERE to_tsvector('english', d.content) @@ 
          plainto_tsquery('english', query_text)
  )
  SELECT 
    s.id,
    s.title,
    s.content,
    s.similarity,
    COALESCE(k.rank, 0) AS keyword_rank,
    (s.similarity * semantic_weight + 
     COALESCE(k.rank, 0) * keyword_weight) AS combined_score
  FROM semantic s
  LEFT JOIN keyword k ON s.id = k.id
  ORDER BY combined_score DESC
  LIMIT match_count;
END;
$$;

Calling Hybrid Search from Python

async def search(query: str, limit: int = 10) -> list[dict]:
    # Generate embedding for the query
    query_embedding = get_embedding(query)
    
    # Call the hybrid search function
    result = await supabase.rpc('hybrid_search', {
        'query_text': query,
        'query_embedding': query_embedding,
        'match_count': limit,
        'keyword_weight': 0.3,
        'semantic_weight': 0.7
    }).execute()
    
    return result.data

Tuning the Weights

The balance between keyword and semantic weight depends on your use case:

  • Technical documentation: Higher keyword weight (0.5/0.5). Users often search for specific function names, error codes, or technical terms
  • General content: Higher semantic weight (0.3/0.7). Users describe what they want conceptually
  • Product search: Balanced (0.4/0.6). Product names need exact matching, but descriptions benefit from semantic understanding
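These weights plug into the same scoring expression the SQL function computes. A small Python mirror of that math, with the presets above collected into a lookup table (the names are illustrative):

```python
def combined_score(similarity: float, keyword_rank: float,
                   keyword_weight: float = 0.3,
                   semantic_weight: float = 0.7) -> float:
    """Mirror of the scoring expression in hybrid_search()."""
    return similarity * semantic_weight + keyword_rank * keyword_weight

# (keyword_weight, semantic_weight) presets from the list above
WEIGHT_PRESETS = {
    "technical_docs": (0.5, 0.5),
    "general_content": (0.3, 0.7),
    "product_search": (0.4, 0.6),
}
```

Keeping the formula in application code makes it easy to unit-test weight changes before updating the SQL function.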

Performance Optimization

Hybrid search hits the database harder than either approach alone. Here are the optimizations I use:

  • IVFFlat index: Speeds up vector similarity search dramatically. Set the lists parameter to approximately the square root of your row count
  • GIN index for full-text search: Essential for keyword performance on large tables
  • Limit the vector search pool: Search the top N*2 vector results, then apply keyword scoring. This avoids full table scans
  • Caching: Cache embeddings for common queries to avoid repeated API calls

-- GIN index for full-text search
CREATE INDEX idx_documents_fts 
ON documents 
USING gin(to_tsvector('english', content));
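For the caching point above, one minimal approach is memoizing the embedding call with lru_cache. This is a sketch; the wrapper name is mine, and it returns tuples because lru_cache needs hashable values:

```python
from functools import lru_cache
from typing import Callable

def cached_embedder(embed: Callable[[str], list[float]],
                    maxsize: int = 1024) -> Callable[[str], tuple[float, ...]]:
    """Wrap an embedding function so repeated queries skip the API call."""
    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> tuple[float, ...]:
        return tuple(embed(text))
    return cached

# usage: cached_get_embedding = cached_embedder(get_embedding)
```

An in-process cache only helps a single worker; for multiple workers, the same idea moves to a shared store like Redis keyed on a hash of the query text.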

Using Hybrid Search in RAG

The primary use case for hybrid search in my applications is RAG. When a user asks a question, I search for relevant documents using hybrid search, then pass those documents as context to the LLM:

async def answer_with_context(question: str) -> str:
    # Retrieve relevant documents
    docs = await search(question, limit=5)
    
    # Build context from retrieved documents
    context = "\n\n".join([d['content'] for d in docs])
    
    # Generate answer with context
    # Generate answer with context (assumes `claude` is an AsyncAnthropic client;
    # max_tokens is required by the Messages API)
    response = await claude.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

Hybrid search is the secret weapon of production RAG systems. Pure vector search misses important keyword matches. Pure keyword search misses semantic relevance. Combine them and your retrieval quality jumps significantly.

Getting Started

If you are already using Supabase, you are five minutes away from hybrid search. Enable pgvector, add an embedding column to your table, create the search function, and start experimenting. The improvement in retrieval quality over pure vector search is immediately noticeable.