PostgreSQL Performance Tuning for Vector Search Workloads
Vector Search Performance Matters
When you add vector search to your AI application with pgvector, query performance becomes critical. A naive setup works fine with a few thousand vectors, but as your dataset grows to hundreds of thousands or millions, you need to tune PostgreSQL specifically for vector workloads. Here is what I learned optimizing my production setup.
Understanding pgvector Index Types
pgvector supports two index types, and choosing the right one is the single biggest performance decision:
IVFFlat
IVFFlat (Inverted File with Flat compression) divides vectors into clusters and searches only the most relevant clusters. It is faster to build but less accurate:
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

The lists parameter controls the number of clusters. pgvector's documentation recommends lists = rows/1000 for up to 1 million rows and sqrt(rows) beyond that. For my dataset of 50,000 vectors I settled on lists = 224 (roughly sqrt of 50,000).
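That guideline can be wrapped in a tiny helper (the function name is mine, and the result is only a starting point; validate with recall measurements on your own data):

```python
import math

def recommended_lists(row_count: int) -> int:
    """Starting point for IVFFlat's `lists` parameter, following
    pgvector's guideline: rows/1000 up to 1M rows, sqrt(rows) beyond."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return round(math.sqrt(row_count))

print(recommended_lists(50_000))     # 50
print(recommended_lists(4_000_000))  # 2000
```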
HNSW
HNSW (Hierarchical Navigable Small World) builds a graph structure for approximate nearest neighbour search. It is slower to build but provides better recall:
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Parameters:
m: maximum number of connections per layer (default 16; higher values improve recall but use more memory)
ef_construction: size of the dynamic candidate list during construction (default 64; higher values improve recall but slow down index building)
For most AI applications, HNSW provides better query performance. I switched from IVFFlat to HNSW and saw 40% faster queries with better result quality. Both index types also have a query-time accuracy knob: SET ivfflat.probes (default 1) controls how many clusters are searched, and SET hnsw.ef_search (default 40) controls the candidate list size during search; raising either improves recall at the cost of speed.
PostgreSQL Memory Configuration
Vector operations are memory-intensive. Default PostgreSQL settings are far too conservative:
# postgresql.conf
# Shared buffers: 25% of available RAM
shared_buffers = 2GB
# Work memory: increase for vector operations
work_mem = 256MB
# Maintenance work memory: for index building
maintenance_work_mem = 1GB
# Effective cache size: 75% of available RAM
effective_cache_size = 6GB

The work_mem setting is especially important for vector queries. Each vector search operation sorts and compares potentially thousands of high-dimensional vectors. The default 4 MB work_mem forces PostgreSQL to spill to disk, which devastates performance.
A Warning About work_mem
The work_mem setting is per-operation, not per-connection. A complex query with multiple sorts can use several multiples of this value. On a server handling many concurrent connections, setting it too high can exhaust memory. I set it to 256 MB for my dedicated AI database server, but you might need less on shared infrastructure.
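Before raising work_mem, a pessimistic back-of-the-envelope check helps (the helper and numbers are illustrative, not a PostgreSQL API):

```python
def worst_case_memory_gb(shared_buffers_gb: float,
                         work_mem_mb: int,
                         max_connections: int,
                         sorts_per_query: int = 2) -> float:
    """Pessimistic estimate: every connection runs a query that holds
    `sorts_per_query` work_mem-sized sort/hash operations at once."""
    work = max_connections * sorts_per_query * work_mem_mb / 1024
    return shared_buffers_gb + work

# 2 GB shared_buffers, 256 MB work_mem, 50 connections, 2 sorts each:
print(worst_case_memory_gb(2, 256, 50))  # 27.0 -> far too much for an 8 GB box
```

If the worst case exceeds your RAM, either lower work_mem or cap concurrent connections with a pooler.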
Query Optimization
Limit Your Results
Always use LIMIT with vector searches. Without it, PostgreSQL sorts the entire table by distance:
-- Good: returns quickly with index
SELECT content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
-- Bad: scans entire table
SELECT content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1;

Pre-filter Before Vector Search
If you have metadata filters, apply them before the vector search:
-- Create a partial index for a common filter
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WHERE category = 'technical';
-- Query uses the filtered index
SELECT content
FROM documents
WHERE category = 'technical'
ORDER BY embedding <=> $1
LIMIT 10;

Partial indexes are dramatically faster when you always filter by a specific column. Instead of searching all vectors, PostgreSQL only searches the subset matching the filter.
Use EXPLAIN ANALYZE
Always check your query plans:
EXPLAIN ANALYZE
SELECT content
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

Look for index scans rather than sequential scans. If PostgreSQL is not using your index, check that the operator matches the index type (cosine distance requires vector_cosine_ops).
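To guard against plan regressions in CI, even a naive string check over EXPLAIN output catches the common failure (the helper name and matching logic are mine, and deliberately simplistic):

```python
def plan_uses_vector_index(plan_text: str) -> bool:
    """Naive check: an index-assisted search shows 'Index Scan using ...'
    in EXPLAIN output; a 'Seq Scan' on the table means the index was skipped."""
    return "Index Scan" in plan_text and "Seq Scan" not in plan_text

good = "Limit -> Index Scan using documents_embedding_idx on documents"
bad = "Limit -> Sort -> Seq Scan on documents"
print(plan_uses_vector_index(good), plan_uses_vector_index(bad))  # True False
```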
Dimension Considerations
OpenAI's text-embedding-3-small produces 1536-dimensional vectors. Each vector takes about 6 KB of storage. At 100,000 documents, that is 600 MB just for the embedding column, plus index overhead.
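The arithmetic is easy to sanity-check (the 8-byte per-value header below is an approximation; exact overhead depends on the PostgreSQL and pgvector versions):

```python
def embedding_storage_mb(rows: int, dims: int) -> float:
    """pgvector stores ~4 bytes per float32 dimension, plus a small
    per-value header (approximated as 8 bytes here)."""
    return rows * (dims * 4 + 8) / 1_000_000

print(round(embedding_storage_mb(100_000, 1536)))  # 615 (MB), before index overhead
```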
If you can use lower-dimensional embeddings, the performance gains are significant. OpenAI's API supports dimension reduction:
# Request 512-dimensional embeddings instead of 1536
response = client.embeddings.create(
model="text-embedding-3-small",
input="Your text here",
dimensions=512
)

512 dimensions use one-third the storage and compute of 1536 dimensions, with modest accuracy reduction for most use cases.
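If you already have full-size vectors stored, the text-embedding-3 models are reported to tolerate client-side truncation as long as you renormalize afterwards (a Matryoshka-style property; verify recall on your own data before relying on it). A minimal sketch:

```python
import math

def truncate_and_normalize(embedding: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length,
    so cosine distance still behaves as expected."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

v = truncate_and_normalize([3.0, 4.0, 0.1, -0.2], 2)
print(v)  # [0.6, 0.8]
```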
Connection Pooling
Vector queries can be slow (10-100ms each). If you have many concurrent users, connection pooling prevents exhausting PostgreSQL connections:
# PgBouncer configuration
[databases]
ai_db = host=127.0.0.1 port=5432 dbname=ai_pipeline
[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20

Transaction-mode pooling is ideal for web applications where connections are short-lived. One caveat: in transaction mode, plain session-level SET commands (such as SET hnsw.ef_search) do not reliably persist across transactions; use SET LOCAL inside the transaction that runs the query instead.
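For sizing the pool, Little's law gives a rough starting point: concurrent connections needed ≈ arrival rate × service time. The helper and numbers below are illustrative:

```python
import math

def pool_size_needed(peak_qps: float, avg_query_ms: float,
                     headroom: float = 1.5) -> int:
    """Little's law: concurrency ~= arrival rate x service time.
    `headroom` pads for bursts; all numbers are illustrative."""
    return math.ceil(peak_qps * avg_query_ms / 1000 * headroom)

# 150 queries/sec at ~80 ms each:
print(pool_size_needed(150, 80))  # 18 -> default_pool_size = 20 is a sane fit
```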
Monitoring Queries
Enable the pg_stat_statements extension to track query performance over time:
-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf (and a restart)
CREATE EXTENSION pg_stat_statements;
-- Find the slowest queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

This has helped me identify poorly performing vector queries that needed index adjustments or query rewrites. Performance tuning is an ongoing process, not a one-time setup.