Ardan Ultimate AI #02 — LLM-generated embeddings

Field notes from working through example 02 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

An embedding model takes text and returns a fixed-length vector (commonly 768 or 1024 dimensions). Text with similar meaning produces vectors that point in similar directions, so the cosine similarity between two vectors serves as a proxy for the semantic similarity between the texts.

What it looks like

emb1, _ := embed.Generate(ctx, "the cat sat on the mat")
emb2, _ := embed.Generate(ctx, "a feline rested on the rug")
emb3, _ := embed.Generate(ctx, "stock prices rose 3%")

sim12 := cosineSim(emb1, emb2)  // ~0.85 — similar meaning
sim13 := cosineSim(emb1, emb3)  // ~0.20 — different meaning

What I learned

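Embedding model choice matters more than you’d think. Different models produce different similarity scores for the same pairs, and their vectors live in unrelated spaces even when the dimensions match. Pick one and stick with it; comparing vectors from different models yields scores that look plausible but mean nothing, which is worse than not embedding at all.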

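Embedding rotation is a silent break. If the team upgrades the embedding model, all existing vectors in pgvector become stale (they’re in the old model’s space; queries are in the new). When the new model happens to use the same dimension count, nothing errors: queries still return rows, just not the relevant ones. Plan the re-embedding step before the upgrade.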

Production connection

Nearly every RAG pipeline in production uses embeddings. The model choice (OpenAI text-embedding-3-small vs Voyage vs Cohere vs local) is a quarterly conversation; the API call is one line. Most of the work is downstream: storing, indexing, and re-embedding the vectors.

Credit & reference. This post is field notes on example 02 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example02-embeddings/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.