Field notes from working through example 29 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you, Bill and Florin, for teaching this material. The patterns in this post are derived from the course; the production reflections at the end are mine.
What the example teaches
Whisper (or equivalent) transcribes a video file. The transcript is chunked by timestamp ranges. Each chunk is embedded and stored in pgvector. Chat queries retrieve the relevant chunks and the LLM cites timestamps in its answers.
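To make the storage and retrieval steps concrete, here is a minimal sketch of what the pgvector side could look like. The table name, column names, and the 768-dimension embedding size are my assumptions for illustration, not the schema the course example uses; only the vector type, the <=> cosine-distance operator, and the bracketed text format come from pgvector itself.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Table and column names are hypothetical. The vector dimension (768 here)
// is an assumption and must match whatever embedding model is used.
const createSQL = `
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS video_chunks (
    id        BIGSERIAL PRIMARY KEY,
    video_id  TEXT NOT NULL,
    start_sec DOUBLE PRECISION NOT NULL,
    end_sec   DOUBLE PRECISION NOT NULL,
    content   TEXT NOT NULL,
    embedding vector(768) NOT NULL
);`

// <=> is pgvector's cosine-distance operator; smaller means closer.
const nearestSQL = `
SELECT start_sec, end_sec, content
FROM video_chunks
WHERE video_id = $1
ORDER BY embedding <=> $2::vector
LIMIT $3;`

// vectorLiteral renders an embedding in pgvector's text format, e.g. [0.1,0.2],
// so it can be passed as the $2 parameter above.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	fmt.Println(createSQL)
	fmt.Println(nearestSQL)
	fmt.Println(vectorLiteral([]float32{0.1, -0.25, 3}))
}
```

Storing start_sec and end_sec next to the embedding is the part that matters: every row that comes back from the nearest-neighbour query already carries the range the LLM will cite.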
What it looks like
// Ingest: transcribe, chunk into ~30-second windows, embed, and store.
segments := whisper.Transcribe(videoPath)
for _, s := range chunkByDuration(segments, 30*time.Second) {
	emb := embed.Generate(s.Text)
	db.InsertChunk(videoID, s.StartSec, s.EndSec, s.Text, emb)
}

// Query: embed the question and fetch the 5 nearest chunks.
hits := db.NearestNeighbours(embed.Generate(question), 5)
answer := llm.Answer(question, hits) // prompt instructs LLM to cite timestamps
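The snippet above calls chunkByDuration without showing it. Here is one way it could be written, as a sketch rather than the course's implementation. The Segment type is hypothetical; its fields mirror the s.StartSec, s.EndSec, and s.Text accesses in the snippet. This version never splits a transcript segment, so a chunk can run slightly past the window when one segment is long.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Segment is a hypothetical transcript segment; the field names mirror the
// s.StartSec / s.EndSec / s.Text accesses in the snippet above.
type Segment struct {
	StartSec float64
	EndSec   float64
	Text     string
}

// chunkByDuration merges consecutive segments into chunks that span at most
// roughly `window` of audio. It never splits a segment.
func chunkByDuration(segments []Segment, window time.Duration) []Segment {
	var chunks []Segment
	var texts []string
	var cur Segment
	open := false

	// flush appends the chunk being built, if any, and resets the state.
	flush := func() {
		if open {
			cur.Text = strings.Join(texts, " ")
			chunks = append(chunks, cur)
			texts, open = nil, false
		}
	}

	for _, s := range segments {
		// Close the current chunk if adding this segment would exceed the window.
		if open && s.EndSec-cur.StartSec > window.Seconds() {
			flush()
		}
		if !open {
			cur = Segment{StartSec: s.StartSec}
			open = true
		}
		cur.EndSec = s.EndSec
		texts = append(texts, strings.TrimSpace(s.Text))
	}
	flush()
	return chunks
}

func main() {
	segments := []Segment{
		{0, 12, "Welcome everyone."},
		{12, 28, "Let's start with revenue."},
		{28, 41, "Q3 was up year over year."},
		{41, 65, "Now, on to hiring."},
	}
	for _, c := range chunkByDuration(segments, 30*time.Second) {
		fmt.Printf("%.0f-%.0f: %s\n", c.StartSec, c.EndSec, c.Text)
	}
}
```

Each returned chunk keeps the start of its first segment and the end of its last, which is exactly what gets written to the database alongside the embedding.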
What I learned
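Timestamp citations are the killer feature. “The CEO mentioned the layoffs at 14:23” is far more useful than “the CEO mentioned the layoffs,” because users can jump to that moment and verify the claim. This only works because every chunk is stored with its start and end time, so each retrieved hit carries a range the model can cite.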
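A sketch of how the retrieved chunks could be turned into a prompt that makes citing easy. The Hit type, the function names, and the prompt wording are all hypothetical; the course example may phrase its prompt differently. The idea is simply to label each excerpt with a human-readable time range so the model has something concrete to quote.

```go
package main

import (
	"fmt"
	"strings"
)

// Hit is a hypothetical retrieved chunk with its time range in seconds.
type Hit struct {
	StartSec float64
	EndSec   float64
	Text     string
}

// formatTimestamp renders seconds as MM:SS, or H:MM:SS past the hour.
func formatTimestamp(sec float64) string {
	total := int(sec)
	h, m, s := total/3600, (total%3600)/60, total%60
	if h > 0 {
		return fmt.Sprintf("%d:%02d:%02d", h, m, s)
	}
	return fmt.Sprintf("%02d:%02d", m, s)
}

// buildPrompt prefixes each chunk with its time range and asks the model
// to cite those ranges. The instruction wording is an assumption.
func buildPrompt(question string, hits []Hit) string {
	var b strings.Builder
	b.WriteString("Answer using only the transcript excerpts below. ")
	b.WriteString("Cite the timestamp in brackets for every claim.\n\n")
	for _, h := range hits {
		fmt.Fprintf(&b, "[%s-%s] %s\n",
			formatTimestamp(h.StartSec), formatTimestamp(h.EndSec), h.Text)
	}
	fmt.Fprintf(&b, "\nQuestion: %s\n", question)
	return b.String()
}

func main() {
	hits := []Hit{
		{StartSec: 863, EndSec: 893, Text: "We are reducing headcount this quarter."},
	}
	fmt.Print(buildPrompt("When were the layoffs mentioned?", hits))
}
```

Formatting the range before it reaches the model means the model only has to copy a label, not convert seconds itself.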
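Chunk size matters more here than in text RAG. Too short and you lose context: a thought that spans 90 seconds gets split across three 30-second chunks, and no single chunk carries the whole idea. Too long and retrieval gets fuzzy, since one embedding has to represent several topics and the cited time range becomes too wide to be useful. 30-60 seconds is the sweet spot for conversational content; for lectures, 2-3 minutes works better.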
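The arithmetic behind the 90-second example can be checked with a small sketch. Everything here is hypothetical: the ContentKind labels, the midpoint values picked from the ranges quoted above, and the simplification that windows are fixed-size and start at zero (a segment-aware chunker will land its boundaries elsewhere).

```go
package main

import (
	"fmt"
	"time"
)

// ContentKind is a hypothetical label for the type of recording.
type ContentKind int

const (
	Conversational ContentKind = iota
	Lecture
)

// windowFor picks a chunk window inside the ranges quoted above. The exact
// midpoints are an assumption, not values from the course.
func windowFor(kind ContentKind) time.Duration {
	if kind == Lecture {
		return 150 * time.Second // middle of 2-3 minutes
	}
	return 45 * time.Second // middle of 30-60 seconds
}

// chunksSpanned counts how many fixed windows of size `window` a thought
// running from start (inclusive) to end (exclusive) touches, assuming the
// first window begins at zero.
func chunksSpanned(start, end, window time.Duration) int {
	if end <= start || window <= 0 {
		return 0
	}
	first := start / window
	last := (end - 1) / window // end is exclusive
	return int(last-first) + 1
}

func main() {
	thoughtStart, thoughtEnd := 0*time.Second, 90*time.Second
	windows := []time.Duration{
		30 * time.Second,
		windowFor(Conversational),
		windowFor(Lecture),
	}
	for _, w := range windows {
		fmt.Printf("window %v: thought split across %d chunk(s)\n",
			w, chunksSpanned(thoughtStart, thoughtEnd, w))
	}
}
```

A quick check like this against a few known long passages in a transcript is a cheap way to sanity-test a window size before re-embedding an entire library.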
Production connection
Kinetic India voice work touched adjacent ground — transcribing rider voice queries and dispatching to dialog handlers. Not exactly RAG, but the chunking + embedding decisions transferred. If I were to build “ask questions about your ride history” for a fleet operator, this example is the starting template.
Credit & reference. This post is field notes on example 29 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example29-video-transcription-rag/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.