Ardan Ultimate AI #32 — Embedded React chat over RAG (Go backend + bundled UI)

Field notes from working through example 32 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you Bill and Florin for teaching this material — the patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

End-to-end chat app: React UI + Go HTTP server + RAG pipeline + local LLM. The React assets are embedded via embed.FS and served from the same Go binary. One process; one deployment artefact; full-stack demo.

What it looks like

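import (
    "embed"
    "io/fs"
    "log"
    "net/http"
)

//go:embed all:web/dist
var webFS embed.FS

func main() {
    // Embedded paths keep their web/dist prefix; strip it so index.html is served at "/".
    dist, err := fs.Sub(webFS, "web/dist")
    if err != nil {
        log.Fatal(err)
    }
    mux := http.NewServeMux()
    mux.HandleFunc("POST /chat", handleChat)        // RAG endpoint (handler not shown); method patterns need Go 1.22+
    mux.Handle("/", http.FileServer(http.FS(dist))) // bundled SPA
    log.Fatal(http.ListenAndServe(":8080", mux))
}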

What I learned

embed.FS for the whole frontend is underrated. The deploy story for “Go API + React frontend” usually involves S3, CloudFront, and CORS configuration. Embedding the SPA into the Go binary collapses all of that to one container image, one health check, one log stream. Because the UI and the API share an origin, the CORS configuration disappears as well.

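SSE for chat beats WebSockets when you can use it. Token streaming only flows from server to client, and SSE is a plain HTTP response: no connection upgrade dance and far fewer reverse proxy quirks. The main thing to watch is response buffering in a proxy or CDN, which can hold tokens back until the stream ends.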

Production connection

The Genie UI uses exactly this pattern: React assets in pkg/web/handlers/ui/, served from the same Go process that exposes the API. The single-binary deploy is one of the reasons Genie’s Docker image is under 30 MB. I saw it work in Ardan’s example before I knew what to call it.

Credit & reference. This post is field notes on example 32 from Ardan Labs’ Ultimate AI by Bill Kennedy and Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example32-chat-web-service/. My fork with notes: PratikDhanave/ai-training. I highly recommend the course for anyone building AI applications in Go: the material is rigorous, and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.