Vector Databases — Advanced
Distributed vector search, embedding fine-tuning, multi-vector retrieval, production evaluation, and a capstone production design.
- 1
Distributed Vector Search at Scale
What you build when one machine isn't enough. Sharding strategies, cross-shard top-K merging (and the silent recall bug that ships in every junior implementation), replication, consistency models, and how Pinecone, Weaviate, Qdrant, and Milvus actually compose these primitives.
48 min · Sharding · Replication · Consistency
- 2
Embedding Fine-Tuning: From Off-the-Shelf to Domain-Specialized
When off-the-shelf embedders leave double-digit recall on the table for your domain. Contrastive learning fundamentals, hard negative mining (the most important step), LLM-generated training data at scale, and the deployment discipline that prevents catastrophic forgetting.
50 min · Contrastive Learning · Hard Negatives · Domain Adaptation
- 3
Multi-Vector Retrieval: ColBERT and Late Interaction
One vector per document is a compromise. Late interaction instead gives each token in your document its own vector, and the MaxSim operator scores a document by summing, for each query token, its best match among the document's token vectors. ColBERT, PLAID compression, production deployment patterns, and ColPali's extension to document images.
50 min · ColBERT · Late Interaction · PLAID
- 4
Production Evaluation Systems
The operational scaffolding that keeps production retrieval quality from degrading over years. Continuous evaluation infrastructure, per-segment gold sets, online metric pipelines, A/B testing with shadow traffic, and drift detection.
50 min · Continuous Evaluation · Shadow Traffic · Drift Detection
- 5
Capstone: Production System Design
The synthesis. Two worked case studies (a small customer-support bot versus a large legal research platform) walk the five-step design process end to end. Capacity planning, cost modeling, launch playbook. The final day of the Vector DB course family.
55 min · System Design · Capacity Planning · Launch Playbook
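
As a taste of module 3, here is a minimal sketch of MaxSim scoring for late interaction. It assumes each query and document has already been encoded into a list of per-token vectors. The function names (`dot`, `maxsim_score`) and the toy 2-dimensional vectors are made up for illustration and are not taken from the course or from any library. Real systems would use learned embeddings and batched tensor operations.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token, take its best dot product against any
    document token, then sum those maxima over the query tokens."""
    if not doc_tokens:
        return 0.0
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy per-token vectors (hypothetical values, not real embeddings).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens well
doc_b = [[1.0, 0.0], [0.5, 0.5]]              # matches only the first one well

scores = {
    "doc_a": maxsim_score(query, doc_a),
    "doc_b": maxsim_score(query, doc_b),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Because every query token looks for its own best match, a document is rewarded for covering each part of the query somewhere in its text, which a single pooled vector cannot express.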