What a vector database actually is, the five operations every vendor exposes, why metadata is the hidden 80% of the job, how to pick between Pinecone, Weaviate, Qdrant, pgvector and Chroma, and the six failure modes that bite first-time users.
A vector database is a database optimized for one specific operation: given a query vector, return the K stored vectors most similar to it — fast, at scale, and ideally with filters.
That sentence is the whole topic. Everything else in this course is detail on how to do that well.
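To make that one operation concrete, here is a minimal sketch of it done the slow, exact way: score every stored vector against the query and keep the best K. The function names (`cosine_similarity`, `top_k`) and the tiny two-dimensional vectors are illustrative assumptions, not any vendor's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero-length vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          stored: dict[str, Sequence[float]],
          k: int) -> list[tuple[str, float]]:
    """Exhaustive search: score every stored vector, return the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data; real embeddings have hundreds or thousands of dimensions.
stored = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [0.9, 0.1],
}
results = top_k([1.0, 0.0], stored, k=2)
print([doc_id for doc_id, _ in results])  # ['doc-a', 'doc-c']
```

A vector database returns the same kind of answer (IDs plus scores), but uses an index so it does not have to score every stored vector.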
Postgres can store vectors. With the pgvector extension you can already write `SELECT * FROM docs ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10`. So why a separate category of database?
The answer is the search. Without a specialized index, that query scans every row and computes a distance for each one: fine at 10,000 vectors, painful at 1 million, impractical at 100 million. Vector databases use approximate nearest neighbor (ANN) indexes (HNSW, IVF, ScaNN; covered in the Intermediate course) to answer the same query in single-digit milliseconds at million-vector scale, trading a small amount of recall for that speed.
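A rough back-of-the-envelope calculation shows why the exhaustive scan stops scaling. The sketch below assumes 1536-dimensional float32 embeddings; that width is an assumption chosen for illustration, not a figure from this course. It counts one multiply-add per dimension per stored vector.

```python
DIMS = 1536          # assumed embedding width (illustrative only)
BYTES_PER_FLOAT = 4  # float32


def scan_cost(num_vectors: int, dims: int = DIMS) -> tuple[int, int]:
    """Return (multiply-adds per query, bytes of raw vector data) for a full scan."""
    multiply_adds = num_vectors * dims
    raw_bytes = num_vectors * dims * BYTES_PER_FLOAT
    return multiply_adds, raw_bytes


for n in (10_000, 1_000_000, 100_000_000):
    ops, size = scan_cost(n)
    print(f"{n:>11,} vectors: {ops:>15,} multiply-adds/query, {size / 1e9:,.2f} GB of vectors")
```

Under these assumptions, the 100-million case needs about 153.6 billion multiply-adds per query over roughly 614 GB of raw vectors, which is the work an ANN index is designed to avoid.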
The split:
| | Traditional DB (Postgres, MySQL) | Vector DB |
|---|---|---|
| Primary operation | Filter and join exact values | Find similar vectors |
| Match type | Equality, range, pattern | Distance (cosine, Euclidean) |
| Scale of "fast" | Millions of rows with the right indexes | Millions of vectors with an ANN index |
| Storage primitive | Rows in tables | Vectors with metadata |
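The "vectors with metadata" storage primitive in the last row can be pictured as a record holding an ID, a vector, and a small dictionary of filterable fields. The sketch below is a hypothetical in-memory model, not any vendor's schema: `Record`, `filtered_search`, and the `lang`/`source` fields are names invented for illustration. It applies the metadata filter first and then ranks the survivors by distance; real engines may combine filtering and index traversal differently.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def squared_euclidean(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance; ranks identically to true Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(query: list[float], records: list[Record],
                    k: int, where: dict[str, str]) -> list[Record]:
    """Keep records whose metadata matches every key in `where`, then return the k nearest."""
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda r: squared_euclidean(query, r.vector))
    return candidates[:k]


records = [
    Record("faq-1", [0.1, 0.9], {"lang": "en", "source": "faq"}),
    Record("faq-2", [0.2, 0.8], {"lang": "de", "source": "faq"}),
    Record("blog-1", [0.8, 0.2], {"lang": "en", "source": "blog"}),
]

# faq-2 is the closest vector overall, but the filter excludes it.
hits = filtered_search([0.2, 0.8], records, k=1, where={"lang": "en"})
print([r.id for r in hits])  # ['faq-1']
```

Note that the metadata here is only what the filter needs; the full document, its author, and its permissions would still live in the primary database.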
Three honest disclaimers worth hearing on Day 1:
Not a replacement for your primary database. You still need Postgres or its equivalent for users, billing, audit logs, anything transactional. The vector DB holds embeddings and just enough metadata to filter them — your application data stays where it is.
Not magic. A vector DB stores numbers and finds similar numbers. The "understanding" lives upstream, in the embedding model that turned text or images into those numbers. A great vector DB with a bad embedding model returns garbage; a mediocre vector DB with a great embedding model can still produce useful results.
Not always the right answer for "I want search". Vector search is great when meaning matters more than exact words. For "find docs mentioning error code E_AUTH_4012," keyword search (BM25, Elasticsearch) wins because the user knows exactly what they're looking for. Production systems often run both — covered in the Intermediate course as "hybrid search."
The honest list of when reaching for a vector DB makes sense:
The signal that you don't need one yet: under 10,000 vectors, a brute-force scan in application memory is typically faster than a database round-trip. Don't over-engineer.