Similarity Search in SQL

The three pgvector distance operators and what each one actually computes, how to pick the metric that matches your embedding model, writing KNN queries with ORDER BY ... LIMIT k, combining similarity search with WHERE filters, and returning real similarity scores your application can threshold on.

Day 3 Progress0%

The Three Distance Operators

pgvector adds three distance operators to Postgres, one per similarity metric. They look cryptic the first time you see them, but each is just a symbol for a math formula. Learn the three and every similarity query becomes readable.

What Each Operator Computes

OperatorNameReturnsSmaller means
<->L2 / Euclidean distanceThe straight-line distance between two vectorsMore similar
<#>Negative inner productThe dot product, negatedMore similar
<=>Cosine distance1 - cosine_similarityMore similar

The single most important fact about all three: smaller is more similar. That is deliberate. It means you always sort the same way — ORDER BY embedding <op> query ASC (and ASC is the SQL default). The nearest neighbor is always the first row.

Why <#> Is Negated

The plain inner (dot) product gets larger as two vectors point more in the same direction — the opposite of a distance, where smaller is closer. To keep "smaller = closer" consistent across all three operators, pgvector returns the negative inner product. So a strong match might come back as -0.93, and a weak one as -0.10. Sorting ascending still puts the best match first, because -0.93 < -0.10.

This trips up everyone once: the raw <#> value is not a "score" you can show a user. To recover the actual inner product, negate it back: (embedding <#> query) * -1.

A Worked Comparison

Imagine three stored vectors and one query, all 3-dimensional, and look at what each operator returns:

query = [1, 0, 0]
  a = [1, 0, 0]      identical direction & magnitude
  b = [2, 0, 0]      same direction, twice the magnitude
  c = [0, 1, 0]      orthogonal (90 degrees away)
  • L2 <->: a → 0, b → 1, c → 1.41. b is "far" because magnitude differs, even though the direction is identical.
  • Cosine <=>: a → 0, b → 0, c → 1. a and b tie at 0 because cosine ignores magnitude and only cares about angle.
  • Negative inner product <#>: a → -1, b → -2, c → 0. b wins (most negative) because the dot product rewards larger magnitude.

Same data, three different "nearest neighbors." That is why the operator you choose is a real decision, not a stylistic one — covered next.

The Symbols Are Just Operators

Because they are real SQL operators, you can use them anywhere an expression is allowed: in ORDER BY, in the SELECT list (to show the distance), in a WHERE clause (to threshold), and even in computed columns. They take a vector on each side and return a double precision.

Key Takeaways
  • pgvector has three operators: <-> (L2 / Euclidean), <#> (negative inner product), <=> (cosine distance)
  • For all three, smaller = more similar, so you always ORDER BY ... ASC and the nearest neighbor is the first row
  • <#> returns the NEGATED inner product to keep 'smaller is closer' consistent — negate it back before showing a score

AI Learning Assistant

Powered by advanced LLM

Get personalized help with concepts, code examples, and explanations tailored to your learning pace.

Course Stats

Estimated Time
55 min
Lessons
5 sections