Ardan Labs Go RAG Ingestion pgvector

Field notes from working through example 07 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

Ingestion is the unglamorous prep work for RAG. It happens in four steps:

  1. Parse the source (Markdown sections, code blocks, etc.).
  2. Chunk it into pieces of roughly 500-1000 tokens, with overlap.
  3. Generate embeddings for each chunk.
  4. Insert into pgvector with metadata (source path, section, page).

What it looks like

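// Needs imports: context, fmt, os. parseMarkdown, splitWithOverlap,
// chunkTexts, embed, and db are defined elsewhere.
func ingest(ctx context.Context, path string) error {
    raw, err := os.ReadFile(path)
    if err != nil {
        return fmt.Errorf("read %s: %w", path, err)
    }

    // Chunk per section so no chunk straddles a heading:
    // windows of 800 with an overlap of 100 (12.5%).
    sections := parseMarkdown(raw)
    var chunks []Chunk
    for _, s := range sections {
        chunks = append(chunks, splitWithOverlap(s, 800, 100)...)
    }

    // One batched call; embs[i] is the embedding for chunks[i].
    embs, err := embed.BatchGenerate(chunkTexts(chunks))
    if err != nil {
        return fmt.Errorf("embed %s: %w", path, err)
    }
    if len(embs) != len(chunks) {
        return fmt.Errorf("embed %s: got %d embeddings for %d chunks", path, len(embs), len(chunks))
    }

    for i, c := range chunks {
        _, err := db.Exec(ctx,
            `INSERT INTO chunks (source, section, content, embedding)
             VALUES ($1, $2, $3, $4)`,
            path, c.Section, c.Text, embs[i])
        if err != nil {
            return fmt.Errorf("insert chunk %d of %s: %w", i, path, err)
        }
    }
    return nil
}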
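
The INSERT above hands embs[i] straight to the driver, and whether that works depends on how the driver maps Go slices to pgvector's vector type. One driver-independent option, sketched here as an assumption and not as what the course code does, is to send the embedding in pgvector's text input format and cast it in SQL. The helper name vectorLiteral and the $4::vector query shape are mine.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vectorLiteral renders an embedding in pgvector's text input format,
// for example [0.25,-1,3.5]. The name is hypothetical.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		// bitSize 32 yields the shortest decimal that round-trips as a float32.
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	// Assumed query shape: the ::vector cast asks Postgres to parse the text.
	const q = `INSERT INTO chunks (source, section, content, embedding)
VALUES ($1, $2, $3, $4::vector)`
	fmt.Println(q)
	fmt.Println(vectorLiteral([]float32{0.25, -1, 3.5}))
}
```

With a helper like this, the last argument to db.Exec would become vectorLiteral(embs[i]). Client libraries that register a native vector type make the string round-trip unnecessary.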

What I learned

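Chunk overlap is non-negotiable for prose. A 1000-token chunk that ends mid-thought strands the second half of that thought in the next chunk, so neither chunk holds the whole idea. With 10-15% overlap, each chunk starts by repeating the tail of the previous one, so a thought that straddles a boundary still appears intact in at least one chunk. The 800/100 split in the code above works out to 12.5%.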
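
To make the overlap mechanics concrete, here is a minimal sketch of a sliding-window splitter. It is not the course's splitWithOverlap: the signature is simplified to plain strings, and whitespace-separated words stand in for model tokens, which is an assumption made only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWithOverlap cuts text into windows of at most size words, where each
// window begins with the last overlap words of the previous one.
// Words are a stand-in for tokens (an assumption for this sketch).
func splitWithOverlap(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid window: the stride below would not advance
	}
	words := strings.Fields(text)
	stride := size - overlap

	var chunks []string
	for start := 0; start < len(words); start += stride {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the final window already reached the end of the text
		}
	}
	return chunks
}

func main() {
	for i, c := range splitWithOverlap("a b c d e f g h i j", 4, 1) {
		fmt.Printf("chunk %d: %s\n", i, c)
	}
}
```

The window advances by size minus overlap each step, so the ten words above yield three chunks: "a b c d", "d e f g", and "g h i j". The shared boundary words are what keep a sentence that crosses a cut readable in at least one chunk.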

Metadata is what makes citations meaningful. Just storing chunks gets you “the answer is X.” Storing chunks + section + page lets you cite “the answer is X (from section 3.4, page 47).” The latter is what users trust.
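
As a small illustration of that point, the sketch below turns stored metadata into a citation string. The Chunk fields and the cite helper are hypothetical; in particular, Page assumes a paginated source and is simply omitted when unknown.

```go
package main

import "fmt"

// Chunk mirrors the metadata discussed above. Page is a hypothetical field
// for paginated sources; 0 means the page is unknown.
type Chunk struct {
	Source  string
	Section string
	Page    int
	Text    string
}

// cite builds a human-readable reference from whatever metadata is present.
func cite(c Chunk) string {
	ref := c.Source
	if c.Section != "" {
		ref += ", section " + c.Section
	}
	if c.Page > 0 {
		ref += fmt.Sprintf(", page %d", c.Page)
	}
	return "(from " + ref + ")"
}

func main() {
	c := Chunk{Source: "handbook.md", Section: "3.4", Page: 47}
	fmt.Println("the answer is X", cite(c))
}
```

The retrieval step only has to return these columns alongside the chunk text; the citation itself is plain string formatting.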

Production connection

Genie’s document ingestion pipeline is structurally identical. The chunking heuristic is the same — token-aware, with overlap, section-anchored. The Ardan example is the cleanest implementation in idiomatic Go I’ve found.


Credit & reference. This post is field notes on example 07 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example07-ingestion/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.
