Field notes from working through example 04 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you, Bill and Florin, for teaching this material. The patterns in this post are derived from the course; the production reflections at the end are mine.
What the example teaches
A non-streaming chat handler returns the entire response when the LLM finishes. The user waits in silence for 10-30 seconds.
A streaming handler flushes each token as it arrives. The user sees the response forming in real time. Same total latency; dramatically better perceived experience.
What it looks like
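func chatStream(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	// llm and prompt are set up outside this snippet.
	stream, err := llm.GenerateStream(r.Context(), prompt)
	if err != nil {
		http.Error(w, "generation failed", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for chunk := range stream {
		// One SSE frame per token, pushed to the client immediately.
		fmt.Fprintf(w, "data: %s\n\n", chunk.Token)
		flusher.Flush()
	}
}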
What I learned
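http.Flusher is the load-bearing part. Without flusher.Flush(), net/http holds the writes in its internal buffer until the buffer fills or the handler returns, so the user gets most of the response at the end anyway. It is easy to forget, and the chat feels broken when it happens. One related trap: a single-value w.(http.Flusher) assertion panics when middleware wraps the ResponseWriter in a type without a Flush method, so prefer the two-value form.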
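SSE beats WebSockets for this use case. SSE is an ordinary long-lived HTTP response, so it passes through most CDNs and proxies without special configuration, although a proxy that buffers responses has to have that buffering turned off for the stream. WebSockets need infrastructure that understands the protocol upgrade. Pick SSE unless you need bidirectional communication.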
Production connection
Genie’s /v1/ask/stream is this pattern, plus named SSE events (ai_disclosure, agent.handle, report) so the UI can render different stream sections differently. The base streaming handler came straight from this example.
Credit & reference. This post is field notes on example 04 from Ardan Labs’ Ultimate AI by Bill Kennedy and Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example04-chat-streaming/. My fork with notes: PratikDhanave/ai-training. I highly recommend the course for anyone building AI applications in Go: the material is rigorous, and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.