Running pgvector in production on Amazon Aurora PostgreSQL
Database Blog
This article provides comprehensive operational guidance for running pgvector in production on Amazon Aurora PostgreSQL, covering index selection, scaling strategies, memory management, and observability practices for retrieval-augmented generation (RAG) workloads.
- Choose HNSW indexes for most production RAG workloads; skip indexing for small datasets, or for partitioned schemas that require 100% recall, since queries without an approximate index perform exact nearest-neighbor search
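- Use cosine distance (<=>) for text embeddings, or negative inner product (<#>) for unit-normalized vectors such as Amazon Titan embeddings; on unit vectors both produce the same ranking, and inner product avoids the norm computation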
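- Enable iterative index scans (hnsw.iterative_scan = relaxed_order) to fix overfiltering, where a selective WHERE clause discards most of the candidates returned by the vector index and the query returns fewer rows than its LIMIT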
- Implement two-stage retrieval: use binary quantization with Hamming distance for coarse candidate selection, then re-rank those candidates by cosine distance on the full-precision vectors
- Manage HNSW index churn through scheduled REINDEX CONCURRENTLY, partition-based rebuilds, or append-only patterns with compaction
- Size Aurora instances with memory-optimized r-series classes so that HNSW graphs stay resident in memory; graph traversals that must fetch index pages from storage increase query latency
- Monitor BufferCacheHitRatio, query-level statistics via aurora_stat_statements, and custom recall/latency metrics to detect index drift early
- Use Amazon RDS Proxy to pool connections and cap database concurrency, so that per-operation work_mem allocations across many concurrent vector queries cannot exhaust instance memory
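The cosine-versus-inner-product guidance can be checked numerically. The following is a minimal sketch in plain Python. The helper names are hypothetical, and they mirror what pgvector's <=> operator (1 minus cosine similarity) and <#> operator (negative inner product) compute. For unit-length vectors, cosine distance equals 1 plus the negative inner product, so both operators rank rows identically.

```python
import math


def cosine_distance(a, b):
    """Mirrors pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def neg_inner_product(a, b):
    """Mirrors pgvector's <#> operator, which returns the NEGATIVE inner product."""
    return -sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (hypothetical helper)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


# Hypothetical 3-dimensional embeddings, normalized to unit length.
query = normalize([0.2, 0.9, 0.1])
docs = {
    "a": normalize([0.1, 0.8, 0.3]),
    "b": normalize([0.9, 0.1, 0.2]),
    "c": normalize([0.3, 0.7, 0.0]),
}

# Ascending distance, as ORDER BY embedding <op> query would return rows.
by_cosine = sorted(docs, key=lambda k: cosine_distance(query, docs[k]))
by_inner = sorted(docs, key=lambda k: neg_inner_product(query, docs[k]))
print(by_cosine, by_inner)
```

The two orderings match only because the inputs are normalized. For embeddings that are not unit length, the two operators can disagree, so cosine distance remains the safer default.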
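Overfiltering is easiest to see in a simulation. The sketch below is hypothetical: candidates arrive in exact distance order, whereas real HNSW order is approximate, and it ignores the slight reordering that relaxed_order permits. A single pass keeps only the first ef_search candidates (pgvector's hnsw.ef_search defaults to 40) and then applies the WHERE filter. An iterative scan keeps fetching batches until LIMIT is met or a scan budget is exhausted; the budget plays the role of hnsw.max_scan_tuples.

```python
def single_pass(candidates, predicate, limit, ef_search):
    """Without iterative scans: filter is applied after one ef_search-sized batch."""
    first_batch = candidates[:ef_search]
    return [row for row in first_batch if predicate(row)][:limit]


def iterative_scan(candidates, predicate, limit, ef_search, max_scan_tuples):
    """With iterative scans: keep fetching batches until LIMIT or the budget is hit."""
    results = []
    scanned = 0
    budget = min(len(candidates), max_scan_tuples)
    while len(results) < limit and scanned < budget:
        batch = candidates[scanned:min(scanned + ef_search, budget)]
        scanned += len(batch)
        results.extend(row for row in batch if predicate(row))
    return results[:limit]


def matches_filter(row_id):
    """Hypothetical selective WHERE clause that matches 5% of rows."""
    return row_id % 20 == 0


# Row ids, assumed to be already ordered by distance to the query vector.
candidates = list(range(10_000))

print(single_pass(candidates, matches_filter, limit=10, ef_search=40))
# → [0, 20]
print(iterative_scan(candidates, matches_filter, limit=10, ef_search=40,
                     max_scan_tuples=20_000))
# → [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
```

In SQL, the corresponding switch is `SET hnsw.iterative_scan = relaxed_order;`. Because relaxed ordering can return rows slightly out of distance order, the pgvector documentation suggests re-sorting in an outer query when strict ordering matters.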
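Two-stage retrieval can be sketched in plain Python, assuming random synthetic vectors and hypothetical helper names. The quantizer keeps one bit per dimension (1 when the element is positive), mirroring pgvector's binary_quantize() function. Stage one ranks every row by Hamming distance, the cheap comparison that pgvector exposes as the <~> operator on bit values. Stage two re-ranks only the shortlist by cosine distance on the full vectors.

```python
import math
import random


def binary_quantize(v):
    """One bit per dimension: 1 if the element is positive, else 0."""
    return tuple(1 if x > 0 else 0 for x in v)


def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, as pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def two_stage_search(query, corpus, codes, k, num_candidates):
    q_code = binary_quantize(query)
    # Stage 1: coarse ranking of every row using only the bit codes.
    coarse = sorted(range(len(corpus)), key=lambda i: hamming(q_code, codes[i]))[:num_candidates]
    # Stage 2: exact cosine re-ranking of the shortlisted rows only.
    return sorted(coarse, key=lambda i: cosine_distance(query, corpus[i]))[:k]


# Hypothetical corpus: 200 random 16-dimensional vectors.
rng = random.Random(42)
corpus = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
# Codes are computed once up front, analogous to indexing binary_quantize(embedding).
codes = [binary_quantize(v) for v in corpus]

top = two_stage_search(corpus[7], corpus, codes, k=5, num_candidates=20)
print(top)
```

The num_candidates value controls the trade-off. A larger shortlist recovers more of the exact top-k but costs more full-precision distance computations. When the shortlist covers the whole corpus, the result equals exact search.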
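A first step in memory sizing is the raw vector payload. pgvector documents that each vector value takes 4 × dimensions + 8 bytes of storage. The following sketch computes that floor for a hypothetical corpus of 1 million 1,024-dimensional embeddings. The HNSW index also stores neighbor lists for every node, so the memory needed to keep the graph resident is higher than this figure.

```python
def vector_bytes(num_vectors, dimensions):
    """Raw pgvector storage: 4 bytes per float32 dimension plus an 8-byte header."""
    return num_vectors * (4 * dimensions + 8)


def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Hypothetical corpus size and embedding dimension.
floor = vector_bytes(1_000_000, 1024)
print(f"{to_gib(floor):.2f} GiB")  # → 3.82 GiB
```

Comparing this floor, plus graph overhead and the rest of the working set, against the instance's buffer cache is one way to judge whether an r-series class is large enough.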
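A custom recall metric can be computed by comparing approximate results against exact results for a sample of queries. One way to obtain the exact list is to run the same query with index scans disabled, for example with `SET LOCAL enable_indexscan = off` inside a transaction. The following sketch works on already-fetched row ids. The sample data and the RECALL_TARGET threshold are hypothetical placeholders.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate query returned."""
    truth = set(exact_ids[:k])
    return len(truth.intersection(approx_ids[:k])) / k


def mean_recall(samples, k):
    """samples: list of (approx_ids, exact_ids) pairs from sampled queries."""
    return sum(recall_at_k(a, e, k) for a, e in samples) / len(samples)


RECALL_TARGET = 0.95  # hypothetical; derive from your own recall requirements

# Hypothetical results for two sampled queries: (HNSW ids, exact ids).
samples = [
    ([4, 1, 9, 8, 3], [4, 9, 1, 7, 3]),
    ([2, 5, 6, 0, 11], [2, 5, 6, 0, 11]),
]
score = mean_recall(samples, k=5)
if score < RECALL_TARGET:
    print(f"recall@5 dropped to {score:.2f}; investigate index drift")
```

Tracking this score over time, alongside latency, makes gradual degradation visible before users notice it, for example as an HNSW index accumulates churn.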
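Connection pooling helps because work_mem is a per-operation limit rather than a per-server one: each sort or hash step in each active query may use up to that amount. The following rough upper-bound calculation uses hypothetical numbers. It is optimistic in one respect, because hash operations may use a multiple of work_mem (see hash_mem_multiplier).

```python
def work_mem_upper_bound_gib(connections, work_mem_mib, memory_ops_per_query):
    """Rough worst case: every connection runs a query whose memory-using
    operations each allocate a full work_mem at the same time."""
    return connections * memory_ops_per_query * work_mem_mib / 1024


# Hypothetical: 500 direct client connections vs. a pool capped at 100.
direct = work_mem_upper_bound_gib(connections=500, work_mem_mib=64, memory_ops_per_query=2)
pooled = work_mem_upper_bound_gib(connections=100, work_mem_mib=64, memory_ops_per_query=2)
print(direct, pooled)  # → 62.5 12.5
```

Capping the number of database connections that RDS Proxy opens bounds this product, whereas letting every client connect directly scales it with client count.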
Successful pgvector production deployments require deliberate upfront decisions on index strategy, parameter tuning, capacity planning, and observability aligned with corpus size, write patterns, and recall targets.