Your First Vector Database

What Is a Vector Database (and What Isn't)

A vector database is a database optimized for one specific operation: given a query vector, return the K stored vectors most similar to it — fast, at scale, and ideally with filters.

That sentence is the whole topic. Everything else in this course is detail on how to do that well.
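To make that operation concrete, here is a minimal brute-force sketch in plain Python. It is not how a real vector database works internally (those use ANN indexes rather than scanning everything); the `top_k` function, the record layout, and the `where` predicate are hypothetical names chosen for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query, records, k, where=None):
    """Return the k (score, id) pairs most similar to query, best first.

    records: iterable of (id, vector, metadata) tuples.
    where:   optional predicate on metadata; non-matching records are skipped.
    """
    candidates = (
        (cosine_similarity(query, vector), record_id)
        for record_id, vector, metadata in records
        if where is None or where(metadata)
    )
    return heapq.nlargest(k, candidates)


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
records = [
    ("doc-1", [1.0, 0.0], {"lang": "en"}),
    ("doc-2", [0.9, 0.1], {"lang": "de"}),
    ("doc-3", [0.0, 1.0], {"lang": "en"}),
    ("doc-4", [0.7, 0.7], {"lang": "en"}),
]

results = top_k([1.0, 0.0], records, k=2, where=lambda m: m["lang"] == "en")
```

Here `doc-2` is very close to the query but is excluded by the filter, so the two results are `doc-1` and `doc-4`. Query vector in, K nearest stored vectors out, with an optional filter: that is the shape of the operation.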

What Makes It Different from Postgres

Postgres can store vectors. The pgvector extension lets you write SELECT * FROM docs ORDER BY embedding <=> '[0.1, 0.2, ...]' LIMIT 10 today. So why a separate category of database?

The answer is the search. Without an ANN index, that query scans every row and computes a distance for each one: fine at 10,000 vectors, painful at 1 million, impractical at 100 million. Vector databases use approximate nearest neighbor (ANN) indexes (HNSW, IVF, ScaNN; all covered in the Intermediate course) to answer the same query in single-digit milliseconds at million-scale, trading a small amount of recall for that speed.
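To give a feel for why an index helps, here is a toy sketch of the idea behind IVF: group vectors into buckets around centroids, then scan only the few buckets nearest the query. This is a simplification under stated assumptions. Real IVF implementations train centroids with k-means, while this sketch just samples them from the data, and the function names `build_ivf` and `ivf_search` are made up for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance (enough for ranking, skips the sqrt)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, n_lists, seed=0):
    """Assign every vector to the bucket of its nearest centroid.

    Assumption for brevity: centroids are sampled from the data, not trained.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    buckets = [[] for _ in range(n_lists)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: squared_distance(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def ivf_search(query, vectors, centroids, buckets, k, n_probe):
    """Scan only the n_probe buckets whose centroids are closest to the query.

    Returns (ids of the best k candidates, number of vectors actually scanned).
    """
    order = sorted(range(len(centroids)), key=lambda c: squared_distance(query, centroids[c]))
    candidates = [idx for c in order[:n_probe] for idx in buckets[c]]
    candidates.sort(key=lambda idx: squared_distance(query, vectors[idx]))
    return candidates[:k], len(candidates)


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [rng.random() for _ in range(8)]

centroids, buckets = build_ivf(vectors, n_lists=20)

# Approximate: probe 3 of 20 buckets, so only a fraction of the data is scanned.
approx_ids, scanned = ivf_search(query, vectors, centroids, buckets, k=5, n_probe=3)

# Probing every bucket degenerates into the exact brute-force scan.
exact_ids, scanned_all = ivf_search(query, vectors, centroids, buckets, k=5, n_probe=20)
```

The trade-off is visible in `n_probe`: fewer buckets means fewer distance computations but a higher chance of missing a true neighbor that landed in an unprobed bucket. That is the "approximate" in ANN.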

The split:

	Traditional DB (Postgres, MySQL)	Vector DB
Primary operation	Filter and join exact values	Find similar vectors
Match type	Equality, range, pattern	Distance (cosine, Euclidean)
Scale of "fast"	Millions of rows with right indexes	Millions of vectors with ANN index
Storage primitive	Rows in tables	Vectors with metadata
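The "Match type" row mentions two distance functions. As a small illustration of how they can disagree, the sketch below computes both by hand; the vectors and variable names are made up for the example.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: sensitive to direction only (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


short = [1.0, 1.0]
long_same_direction = [10.0, 10.0]   # same direction as `short`, 10x the length
different_direction = [1.0, -1.0]    # same length as `short`, perpendicular to it

# Cosine distance treats the long vector as a near-perfect match (about 0.0)
# and the perpendicular one as unrelated (1.0).
cos_long = cosine_distance(short, long_same_direction)
cos_diff = cosine_distance(short, different_direction)

# Euclidean distance ranks them the other way round: 2.0 for the perpendicular
# vector versus roughly 12.7 for the long one.
euc_long = euclidean_distance(short, long_same_direction)
euc_diff = euclidean_distance(short, different_direction)
```

Which metric is appropriate depends on how the embedding model was trained, so it is worth checking the model's documentation rather than guessing. In the pgvector query shown earlier, `<=>` is the cosine distance operator.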

What It Isn't

Three honest disclaimers worth hearing on Day 1:

Not a replacement for your primary database. You still need Postgres or its equivalent for users, billing, audit logs, anything transactional. The vector DB holds embeddings and just enough metadata to filter them — your application data stays where it is.
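One way to picture that split is sketched below. The vector store keeps only ids, embeddings, and filterable metadata; the full records live in the primary database and are fetched by id after the search. Everything here is a stand-in: `sqlite3` plays the role of Postgres so the example runs anywhere, the vector store is a plain list, and `search_ids` and `fetch_articles` are hypothetical helpers.

```python
import sqlite3

# Stand-in for the primary database (Postgres in the text).
primary = sqlite3.connect(":memory:")
primary.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT, author_id INTEGER)"
)
primary.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [
        (1, "Resetting your password", "Full article text lives here", 7),
        (2, "Billing cycles explained", "Full article text lives here", 9),
    ],
)

# Stand-in for the vector DB: id, embedding, and only the metadata needed to filter.
vector_store = [
    {"id": 1, "embedding": [0.9, 0.1], "metadata": {"category": "account"}},
    {"id": 2, "embedding": [0.1, 0.9], "metadata": {"category": "billing"}},
]


def search_ids(query, category, k):
    """Filter on metadata, rank by dot product, return matching ids best first."""
    matches = [r for r in vector_store if r["metadata"]["category"] == category]
    matches.sort(
        key=lambda r: sum(q * e for q, e in zip(query, r["embedding"])),
        reverse=True,
    )
    return [r["id"] for r in matches[:k]]


def fetch_articles(ids):
    """Load full rows from the primary database, preserving the ranked order."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    rows = primary.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


hits = fetch_articles(search_ids([1.0, 0.0], category="account", k=5))
```

The id is the only thing the two systems share, which keeps the vector store small and leaves the primary database as the source of truth.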

Not magic. A vector DB stores numbers and finds similar numbers. The "understanding" lives upstream, in the embedding model that turned text or images into those numbers. A great vector DB with a bad embedding model returns garbage; a mediocre vector DB with a great embedding model can still produce useful results.

Not always the right answer for "I want search". Vector search is great when meaning matters more than exact words. For "find docs mentioning error code E_AUTH_4012," keyword search (BM25, Elasticsearch) wins because the user knows exactly what they're looking for. Production systems often run both — covered in the Intermediate course as "hybrid search."
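The error-code case can be sketched with a tiny inverted index. This is deliberately not BM25 or Elasticsearch, just exact token lookup, and the documents and function names are invented for the example. The point is that an exact identifier either appears in a document or does not, and no embedding is needed to find out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase, then split into runs of letters, digits, and underscores."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


docs = {
    "kb-1": "Login fails with error code E_AUTH_4012 after password reset",
    "kb-2": "Troubleshooting sign-in problems and authentication errors",
    "kb-3": "E_AUTH_4021 means the session token expired",
}

index = build_inverted_index(docs)

# Exact lookup: only kb-1 contains this code. kb-3 has a similar-looking code
# and kb-2 is about the same topic, but neither is what the user asked for.
matches = index.get("e_auth_4012", set())
```

A semantic search over the same three documents could plausibly rank `kb-2` or `kb-3` highly because they are about related things, which is exactly the behavior you do not want for this kind of query.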

When You Actually Need One

The honest list of when reaching for a vector DB makes sense:

Semantic search — finding docs that mean the same thing as the query, not just contain the same words
Recommendations — "users who liked X also liked Y," built on similarity between user/item vectors
RAG (Retrieval-Augmented Generation) — finding relevant chunks to give an LLM as context (Day 5 of this course)
Deduplication and clustering — finding near-duplicates across millions of docs
Image and audio search — "find images that look like this one" using vision/audio embeddings
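As one illustration from the list above, near-duplicate detection reduces to "which pairs of vectors are closer than some threshold". The sketch below does this with a pairwise scan over toy embeddings. The threshold of 0.95 and the `near_duplicates` name are assumptions for the example, and the right cutoff depends on the embedding model.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def near_duplicates(embeddings, threshold=0.95):
    """Return id pairs whose similarity is at least threshold.

    This compares every pair, which is O(n^2) and only workable for small n.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Toy embeddings: "a" and "b" point in nearly the same direction, "c" does not.
embeddings = {
    "a": [0.90, 0.10, 0.00],
    "b": [0.88, 0.12, 0.01],
    "c": [0.00, 0.20, 0.95],
}

duplicate_pairs = near_duplicates(embeddings)
```

At millions of documents the pairwise loop is out of reach, which is where a vector database comes in: instead of comparing every pair, you ask it for the nearest neighbors of each document and keep the ones above the threshold.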

The signal that you don't need one yet: under 10,000 vectors, a brute-force scan in your application's memory is typically faster than a round-trip to a database. Don't over-engineer.

Key Takeaways

A vector database does one specific thing well: find the K most similar vectors to a query, fast
It complements your primary database — you still need Postgres for transactional application data
It's not always the right answer — keyword search still wins when the user knows exactly what they're looking for
