Ardan Labs Go RAG PDF Docling

Field notes from working through example 30 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

Docling extracts structured content from PDFs — text, tables, figures, headings — preserving layout. The example uses it as the parser stage of a RAG pipeline, then feeds the structured output through an LLM for summarisation / QA.

What it looks like

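// Docling runs as a separate Python service; the Go client calls its HTTP API.
// This fragment sits inside a function that returns an error.
client := docling.New("http://localhost:8081")
doc, err := client.Extract(ctx, "annual-report.pdf")
if err != nil {
    return fmt.Errorf("extract: %w", err)
}

// Summarise each extracted table and store the summary as a chunk.
for _, table := range doc.Tables {
    summary, err := llm.Summarise(ctx, table.Markdown)
    if err != nil {
        return fmt.Errorf("summarise table on page %v: %w", table.PageNumber, err)
    }
    storeChunk(doc.ID, table.PageNumber, summary)
}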
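
One way to exercise that loop without a running Docling service or a live model is to put both stages behind small interfaces and plug in fakes. The sketch below is mine, not the course's code: `Extractor`, `Summariser`, `Chunk`, `IndexTables`, and the two fakes are hypothetical names, and the `Document` and `Table` types only mirror the fields used in the snippet above, so the real client types may well differ.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Table and Document mirror only the fields the snippet uses (assumed shapes).
type Table struct {
	PageNumber int
	Markdown   string
}

type Document struct {
	ID     string
	Tables []Table
}

// Extractor is a hypothetical interface for the parser stage (Docling in the post).
type Extractor interface {
	Extract(ctx context.Context, path string) (Document, error)
}

// Summariser is a hypothetical interface for the LLM stage.
type Summariser interface {
	Summarise(ctx context.Context, text string) (string, error)
}

// Chunk is what would be handed to the vector store.
type Chunk struct {
	DocID   string
	Page    int
	Summary string
}

// IndexTables runs parser first, then the LLM over each extracted table.
func IndexTables(ctx context.Context, ex Extractor, sum Summariser, path string) ([]Chunk, error) {
	doc, err := ex.Extract(ctx, path)
	if err != nil {
		return nil, fmt.Errorf("extract %s: %w", path, err)
	}
	chunks := make([]Chunk, 0, len(doc.Tables))
	for _, t := range doc.Tables {
		s, err := sum.Summarise(ctx, t.Markdown)
		if err != nil {
			return nil, fmt.Errorf("summarise table on page %d: %w", t.PageNumber, err)
		}
		chunks = append(chunks, Chunk{DocID: doc.ID, Page: t.PageNumber, Summary: s})
	}
	return chunks, nil
}

// fakeExtractor returns one canned table instead of calling a real service.
type fakeExtractor struct{}

func (fakeExtractor) Extract(_ context.Context, path string) (Document, error) {
	md := "| a | b |\n|---|---|\n| 1 | 2 |"
	return Document{ID: path, Tables: []Table{{PageNumber: 3, Markdown: md}}}, nil
}

// fakeSummariser stands in for the model and just counts lines.
type fakeSummariser struct{}

func (fakeSummariser) Summarise(_ context.Context, text string) (string, error) {
	return fmt.Sprintf("table with %d lines", strings.Count(text, "\n")+1), nil
}

// runDemo wires the fakes together so the pipeline can be run offline.
func runDemo() ([]Chunk, error) {
	return IndexTables(context.Background(), fakeExtractor{}, fakeSummariser{}, "annual-report.pdf")
}

func main() {
	chunks, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range chunks {
		fmt.Printf("%s p%d: %s\n", c.DocID, c.Page, c.Summary)
	}
}
```

Keeping the two stages behind interfaces also makes it cheap to swap the parser or the model later without touching the loop.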

What I learned

Tables are where most PDF pipelines fail. A bank statement, an invoice, a regulatory filing — the information density is in the tables. Naive text extraction loses the rows-and-columns relationship; LLM-based extraction is slow and unreliable. Docling preserves the structure.
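
To make "preserves the structure" concrete, here is a small sketch of what a Markdown table gives you that flattened text does not: every cell can be tied back to its column header. It assumes the parser hands back simple pipe-delimited Markdown tables like the `table.Markdown` field in the snippet above. `ParseMarkdownTable` and its helpers are hypothetical names of mine, the sample figures are made up for illustration, and escaped pipes or multi-line cells are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRow splits one Markdown table line into trimmed cell values.
func splitRow(line string) []string {
	line = strings.TrimSpace(line)
	line = strings.TrimPrefix(line, "|")
	line = strings.TrimSuffix(line, "|")
	cells := strings.Split(line, "|")
	for i := range cells {
		cells[i] = strings.TrimSpace(cells[i])
	}
	return cells
}

// isSeparator reports whether every cell looks like a header separator (---, :--, :-:).
func isSeparator(cells []string) bool {
	for _, c := range cells {
		if c == "" || strings.Trim(c, "-:") != "" {
			return false
		}
	}
	return len(cells) > 0
}

// ParseMarkdownTable turns a simple Markdown table into one map per data row,
// keyed by column header, so the row/column relationship stays explicit.
func ParseMarkdownTable(md string) []map[string]string {
	var header []string
	var rows []map[string]string
	seenSeparator := false
	for _, line := range strings.Split(md, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		cells := splitRow(line)
		switch {
		case header == nil:
			header = cells
		case !seenSeparator && len(rows) == 0 && isSeparator(cells):
			seenSeparator = true
		default:
			row := make(map[string]string, len(header))
			for i, h := range header {
				if i < len(cells) {
					row[h] = cells[i]
				}
			}
			rows = append(rows, row)
		}
	}
	return rows
}

func main() {
	// Made-up sample data, for illustration only.
	table := "| Quarter | Revenue |\n|---|---|\n| Q1 | 120 |\n| Q2 | 135 |"
	for _, row := range ParseMarkdownTable(table) {
		fmt.Printf("%s revenue: %s\n", row["Quarter"], row["Revenue"])
	}
}
```

With naive text extraction the same table tends to arrive as a run of words and numbers, and nothing says which number belongs to which quarter.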

Hybrid (parser + LLM) beats pure LLM for documents. Sending the raw PDF to a vision-capable LLM technically works, but it costs 10× more in tokens. Letting Docling handle symbolic extraction and the LLM handle summarisation is the right split.

Production connection

The Genie XLSX and scanned-PDF loaders (pkg/loader/) follow the same pattern: symbolic extraction first, with the LLM as the narration layer. Tata client work hit this too. A quarterly report pipeline that switched from “send the PDF to Gemini” to “Docling first, summarise the structured output” cut token cost by 12× while improving table accuracy.


Credit & reference. This post is field notes on example 30 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example30-pdf-docling/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.
