
Field notes from working through example 04 of Ardan Labs’ Ultimate AI course by Bill Kennedy and Florin Pățan (Apache 2.0). My fork: PratikDhanave/ai-training. Thank you, Bill and Florin, for teaching this material. The patterns in this post are derived from the course; the production reflections at the end are mine.

What the example teaches

A non-streaming chat handler returns the entire response only once the LLM has finished generating. The user waits in silence, often for 10 to 30 seconds.

A streaming handler flushes each token as it arrives. The user sees the response forming in real time. Same total latency; dramatically better perceived experience.

What it looks like

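func chatStream(w http.ResponseWriter, r *http.Request) {
    // Fail early if the ResponseWriter cannot flush (e.g. wrapped by middleware).
    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "streaming unsupported", http.StatusInternalServerError)
        return
    }
    // llm and prompt are assumed to be defined elsewhere.
    stream, err := llm.GenerateStream(r.Context(), prompt)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")

    for chunk := range stream {
        // One SSE event per token; assumes the token contains no raw newline.
        fmt.Fprintf(w, "data: %s\n\n", chunk.Token)
        flusher.Flush() // push the event to the client now
    }
}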
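
One thing the handler above glosses over: in SSE a blank line ends the event, so a token that contains a newline would cut the frame short. Here is a minimal sketch of how I would guard against that, assuming payloads use "\n" as the only line break. The names writeSSEData and frame are mine, not from the course.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// writeSSEData writes one SSE event whose payload may span several lines.
// Each payload line gets its own "data:" field; the final blank line ends the event.
func writeSSEData(w io.Writer, payload string) error {
	for _, line := range strings.Split(payload, "\n") {
		if _, err := fmt.Fprintf(w, "data: %s\n", line); err != nil {
			return err
		}
	}
	_, err := io.WriteString(w, "\n")
	return err
}

// frame is a hypothetical convenience wrapper that returns the event as a string.
func frame(payload string) string {
	var b strings.Builder
	_ = writeSSEData(&b, payload) // writes to a strings.Builder do not fail
	return b.String()
}

func main() {
	// A payload with an embedded newline becomes two data fields in one event.
	fmt.Fprint(os.Stdout, frame("first line\nsecond line"))
}
```

On the receiving side, an SSE client joins consecutive data fields of one event with a newline, so the original text comes back intact.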

What I learned

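http.Flusher is the load-bearing part. Without flusher.Flush(), net/http buffers the writes, so a short response reaches the user all at once when the handler returns, and a long one arrives in large bursts. It is easy to forget, and the chat feels broken when it happens. Check the w.(http.Flusher) assertion too: a middleware that wraps the ResponseWriter can hide Flush, and an unchecked assertion then panics.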
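
To convince myself of both points, I find a small in-memory check useful. The sketch below is not from the course: streamTokens, noFlush, and demo are hypothetical names, and a slice of strings stands in for the LLM stream. It uses httptest.ResponseRecorder, which records whether Flush was called, and a wrapper type that hides Flush the way a careless middleware would.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// streamTokens writes each token as an SSE event and flushes after every write.
// It returns an error instead of panicking when the writer cannot flush.
func streamTokens(w http.ResponseWriter, tokens []string) error {
	flusher, ok := w.(http.Flusher)
	if !ok {
		return fmt.Errorf("response writer does not support flushing")
	}
	w.Header().Set("Content-Type", "text/event-stream")
	for _, tok := range tokens {
		if _, err := fmt.Fprintf(w, "data: %s\n\n", tok); err != nil {
			return err
		}
		flusher.Flush()
	}
	return nil
}

// noFlush embeds only the http.ResponseWriter interface, so the Flush
// method of the wrapped writer is no longer visible to a type assertion.
type noFlush struct{ http.ResponseWriter }

// demo runs streamTokens against a plain recorder and a wrapped one.
func demo() (body string, flushed bool, wrappedErr error) {
	rec := httptest.NewRecorder()
	if err := streamTokens(rec, []string{"Hel", "lo"}); err != nil {
		return "", false, err
	}
	wrappedErr = streamTokens(noFlush{httptest.NewRecorder()}, []string{"x"})
	return rec.Body.String(), rec.Flushed, wrappedErr
}

func main() {
	body, flushed, err := demo()
	fmt.Printf("body=%q flushed=%v wrapped error=%v\n", body, flushed, err)
}
```

The recorder reports that a flush happened, and the wrapped writer yields an error rather than a panic, which is the behavior I want from a production handler.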

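SSE beats WebSockets for this use case. SSE is an ordinary long-lived HTTP response, so it passes through most CDNs and proxies as-is; the usual exception is a proxy that buffers responses, where buffering has to be turned off for the stream. WebSockets need upgrade-aware infrastructure. Pick SSE unless you need bidirectional communication.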
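
A quick way to see the "it is just HTTP" point is to read a stream with a plain http.Get. This is a rough sketch under my own assumptions: readData and fetchDemo are hypothetical names, the server is a local httptest server rather than a real LLM backend, and the parser only handles single-line data fields.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// readData collects the payload of every "data: " line in an SSE stream.
// It skips event names and comments and does not join multi-line payloads.
func readData(r io.Reader) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "data: ") {
			out = append(out, strings.TrimPrefix(line, "data: "))
		}
	}
	return out, sc.Err()
}

// fetchDemo serves two events from a local test server and reads them back
// with an ordinary GET request: no upgrade handshake is involved.
func fetchDemo() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		for _, tok := range []string{"Hel", "lo"} {
			fmt.Fprintf(w, "data: %s\n\n", tok)
			flusher.Flush()
		}
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return readData(resp.Body)
}

func main() {
	tokens, err := fetchDemo()
	fmt.Println(tokens, err)
}
```

Nothing here is specific to SSE on the transport level, which is why any HTTP-aware hop can carry it.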

Production connection

Genie’s /v1/ask/stream is this pattern, plus named SSE events (ai_disclosure, agent.handle, report) so the UI can render different stream sections differently. The base streaming handler came straight from this example.
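
For readers who have not used named events: the wire format just adds an event field before the data field. The sketch below is illustrative only and is not Genie's actual code. The event names are the ones mentioned above, while formatEvent and the payload shapes are invented for the example. Encoding the payload as JSON is my assumption; it has the side benefit that newlines are escaped, so the payload always fits on one data line.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// formatEvent builds a named SSE event with a JSON payload.
// json.Marshal escapes newlines, so the payload stays on a single data line.
func formatEvent(name string, payload any) (string, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", name, body), nil
}

func main() {
	// Payload shapes are hypothetical; only the event names come from the post.
	events := []struct {
		name    string
		payload map[string]string
	}{
		{"ai_disclosure", map[string]string{"text": "AI-generated answer"}},
		{"report", map[string]string{"markdown": "# Title\nbody"}},
	}
	for _, e := range events {
		frame, err := formatEvent(e.name, e.payload)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		fmt.Fprint(os.Stdout, frame)
	}
}
```

In the browser, EventSource delivers a named event to listeners registered for that name with addEventListener, which is what lets the UI route each section to its own renderer.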


Credit & reference. This post is field notes on example 04 from Ardan Labs’ Ultimate AI by Bill Kennedy + Florin Pățan, licensed Apache 2.0. The original example: cmd/examples/example04-chat-streaming/. My fork with notes: PratikDhanave/ai-training. Highly recommend the course for anyone building AI applications in Go — the material is rigorous and the Kronk + yzma + llama.cpp pipeline gives you hardware-accelerated local inference end-to-end. Thank you, Bill and Florin.
