Go errgroup Concurrency

The shape

A coordinator agent needs to fan out to N specialist agents in parallel. If any fails fatally, cancel the rest. If they all succeed, collect their results. Standard fan-out / fan-in shape.

golang.org/x/sync/errgroup handles this in ~15 lines.

The basic pattern

import (
    "context"

    "golang.org/x/sync/errgroup"
)

func dispatchParallel(ctx context.Context, query Query) ([]Result, error) {
    g, ctx := errgroup.WithContext(ctx)
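    results := make([]Result, len(specialists)) // one slot per specialist

    for i, agent := range specialists {
        i, agent := i, agent // copy loop vars (unnecessary since Go 1.22)
        g.Go(func() error {
            resp, err := agent.Run(ctx, query)
            if err != nil {
                return err // first error cancels ctx for the others
            }
            results[i] = resp // each goroutine writes only its own index: no lock needed
            return nil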
        })
    }

    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}

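errgroup.WithContext returns a derived context that is cancelled the first time a function passed to g.Go returns a non-nil error, or when g.Wait returns, whichever happens first. g.Wait then returns that first error. Cancellation is cooperative: it reaches the other in-flight goroutines only if they pass ctx into their blocking calls or check ctx.Done().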

The bounded variant

Without a limit, fan-out spawns N goroutines. For large N, that’s wasteful or hostile to downstream services. Bound concurrency with g.SetLimit:

g, ctx := errgroup.WithContext(ctx)
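g.SetLimit(8) // at most 8 active goroutines; further g.Go calls block

for _, item := range items {
    item := item // copy loop var (unnecessary since Go 1.22)
    g.Go(func() error {
        return process(ctx, item)
    })
}
if err := g.Wait(); err != nil {
    return err
}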

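SetLimit was added to errgroup later than the original API. Once the limit is reached, g.Go blocks the caller until a running goroutine returns, so the loop itself is throttled. It's the right default for any fan-out where N isn't small.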

The collect-some-errors variant

With errgroup.WithContext, the first error cancels the shared context and g.Wait reports only that error. Sometimes you want to collect all the errors and decide afterwards:

g, ctx := errgroup.WithContext(ctx)
var (
    errs []error
    mu   sync.Mutex
)

for _, agent := range specialists {
    agent := agent // copy loop var (unnecessary since Go 1.22)
    g.Go(func() error {
        if _, err := agent.Run(ctx, query); err != nil {
            mu.Lock()
            errs = append(errs, err)
            mu.Unlock()
        }
        return nil // don't propagate; collect instead
    })
}
g.Wait() // always nil here: no worker returns an error

if len(errs) > 0 {
    return nil, fmt.Errorf("%d agents failed: %w", len(errs), errors.Join(errs...))
}

errors.Join (Go 1.20+) wraps multiple errors into one. Useful for the “some agents failed; tell me about all of them” reporting case.

The pitfall: shared state

The classic mistake — appending to a shared slice without a lock:

// WRONG: concurrent append, data race
var results []Result
g.Go(func() error {
    r, err := agent.Run(ctx, query)
    if err != nil {
        return err
    }
    results = append(results, r) // RACE: goroutines write the slice header at once
    return nil
})

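Fix: pre-allocate the slice and have each goroutine write only its own index (first pattern), or guard the shared slice with a mutex (collect pattern). The race detector (go test -race) catches this, but only if your tests actually exercise the parallel path.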

The pitfall: derived context not propagated

If your workers use the parent context instead of the derived one, an early error stops nothing: the other goroutines never see the cancellation, keep running to completion, and g.Wait blocks until they finish. Always pass the derived context to your worker goroutines (the pattern above does) and make sure they honour it by passing it into their blocking calls; don't use the parent context inside the worker.

What Genie’s coordinator does

agents/financial_supervisor fans out to specialists (analyzer, forecaster, anomaly_detector, recommender) in parallel with the bounded variant. Limit of 6. First error from any specialist cancels the rest — the supervisor has nothing useful to do without all specialists’ input.

The pattern shows up everywhere a multi-agent system parallelises. Worth knowing exactly; worth not over-engineering past errgroup + SetLimit for most cases.

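For truly large fan-outs (thousands of items), errgroup's limit isn't ideal. SetLimit caps how many goroutines run at once, but g.Go still starts a fresh goroutine for every item; a worker-pool pattern, where a fixed set of goroutines pulls from a job channel, avoids that per-item churn and is more efficient. The rough threshold is around 1000 goroutines; below that, errgroup wins on readability.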
