Saga Distributed Systems Workflow Go

The shape of the problem

A saga is a sequence of steps where each step has a compensating action. If any step fails, you run the compensations for the steps that already succeeded — in reverse order.

For a payment disbursement:

1. Reserve funds in source account     →  Compensation: release reservation
2. Create transaction record           →  Compensation: mark void
3. Call payment gateway                →  Compensation: refund (if it went through)
4. Update borrower's balance           →  Compensation: revert balance
5. Send notification                   →  Compensation: send "cancelled" notification

Happy path: 1→2→3→4→5. Failure at step 4: compensate 3, compensate 2, compensate 1 (in that order).
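
To make the unwinding rule concrete, here is a minimal sketch that computes which steps get compensated, and in what order, for a failure at a given position. The names disbursementSteps and compensationOrder are made up for this illustration; they are not part of any library.

```go
package main

import "fmt"

// disbursementSteps lists the forward steps from the list above, in order.
// (Hypothetical variable, used only for this illustration.)
var disbursementSteps = []string{
	"reserve funds",
	"create transaction record",
	"call payment gateway",
	"update borrower balance",
	"send notification",
}

// compensationOrder returns the steps to compensate, in the order they
// should be compensated, when the step at index failedAt fails.
// Every step before failedAt succeeded; the failed step is not included.
func compensationOrder(steps []string, failedAt int) []string {
	if failedAt < 0 {
		failedAt = 0
	}
	if failedAt > len(steps) {
		failedAt = len(steps)
	}
	order := make([]string, 0, failedAt)
	for i := failedAt - 1; i >= 0; i-- {
		order = append(order, steps[i])
	}
	return order
}

func main() {
	// Step 4 has index 3. Steps 3, 2, 1 are compensated, in that order.
	for _, name := range compensationOrder(disbursementSteps, 3) {
		fmt.Println("compensate:", name)
	}
}
```

A failure at the first step yields an empty list: nothing succeeded, so there is nothing to undo.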

The order of compensations matters

If steps 1, 2, and 3 succeeded in that order, their compensations run 3, 2, 1. Any other order can leave the system in an inconsistent state.

Why: compensating step 1 (release reservation) before step 3 (refund) might trigger a balance check that fails. Compensating step 2 before step 3 might leave the gateway call orphaned without a transaction record to associate the refund with.

The Go implementation

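import (
    "context"
    "fmt"
    "log/slog"
)

// State is the application-defined data shared by every step (not shown here).

// Step pairs a forward action with the compensation that undoes it.
type Step struct {
    Name       string
    Forward    func(ctx context.Context, state *State) error
    Compensate func(ctx context.Context, state *State) error
}

type Saga struct {
    Steps []Step
}

// Run executes the steps in order. If one fails, the deferred function
// compensates the steps that already succeeded, last to first.
func (s *Saga) Run(ctx context.Context, state *State) error {
    var completed []int // indexes of steps whose Forward succeeded
    defer func() {
        if len(completed) == len(s.Steps) {
            return // happy path
        }
        // A cancelled ctx would abort the compensations too, so detach
        // from cancellation while keeping its values (Go 1.21+).
        cctx := context.WithoutCancel(ctx)
        // Compensate in reverse order
        for i := len(completed) - 1; i >= 0; i-- {
            step := s.Steps[completed[i]]
            if err := step.Compensate(cctx, state); err != nil {
                slog.Error("compensation failed", "step", step.Name, "err", err)
                // Record to operator queue; don't propagate (we're in defer)
            }
        }
    }()
    for i, step := range s.Steps {
        if err := step.Forward(ctx, state); err != nil {
            return fmt.Errorf("step %s failed: %w", step.Name, err)
        }
        completed = append(completed, i)
    }
    return nil
}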

The defer does the compensation. The completed slice tracks which steps succeeded, so a step whose Forward failed is never compensated. The reverse iteration ensures the right order.
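
To see the unwinding in action, the following self-contained sketch repeats the Step, Saga, and Run definitions from above (minus logging) and runs a five-step saga whose fourth step fails. The State shape, the tracingStep helper, and the short step names are assumptions made for this example only.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// State is a hypothetical shared state; here it only records a trace.
type State struct {
	Trace []string
}

type Step struct {
	Name       string
	Forward    func(ctx context.Context, state *State) error
	Compensate func(ctx context.Context, state *State) error
}

type Saga struct {
	Steps []Step
}

// Run mirrors the Run method shown above, with logging replaced by a trace entry.
func (s *Saga) Run(ctx context.Context, state *State) error {
	var completed []int
	defer func() {
		if len(completed) == len(s.Steps) {
			return // happy path
		}
		for i := len(completed) - 1; i >= 0; i-- {
			step := s.Steps[completed[i]]
			if err := step.Compensate(ctx, state); err != nil {
				state.Trace = append(state.Trace, "compensation failed: "+step.Name)
			}
		}
	}()
	for i, step := range s.Steps {
		if err := step.Forward(ctx, state); err != nil {
			return fmt.Errorf("step %s failed: %w", step.Name, err)
		}
		completed = append(completed, i)
	}
	return nil
}

// tracingStep builds a step whose actions only append to state.Trace.
// If fail is non-nil, Forward returns it without doing anything.
func tracingStep(name string, fail error) Step {
	return Step{
		Name: name,
		Forward: func(ctx context.Context, state *State) error {
			if fail != nil {
				return fail
			}
			state.Trace = append(state.Trace, "do "+name)
			return nil
		},
		Compensate: func(ctx context.Context, state *State) error {
			state.Trace = append(state.Trace, "undo "+name)
			return nil
		},
	}
}

// runDemo runs the disbursement saga with a failing fourth step.
func runDemo() (*State, error) {
	saga := &Saga{Steps: []Step{
		tracingStep("reserve", nil),
		tracingStep("record", nil),
		tracingStep("gateway", nil),
		tracingStep("balance", errors.New("balance service unavailable")),
		tracingStep("notify", nil),
	}}
	state := &State{}
	err := saga.Run(context.Background(), state)
	return state, err
}

func main() {
	state, err := runDemo()
	fmt.Println("error:", err)
	for _, line := range state.Trace {
		fmt.Println(line)
	}
}
```

The trace contains three "do" entries followed by "undo gateway", "undo record", "undo reserve". The notify step never runs, and the failed balance step is not compensated.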

Compensation failures

A compensation can fail. The reservation-release fails because the source account was frozen. The refund fails because the gateway is down.

The pattern for handling this:

  1. Log the compensation failure to a dedicated table. Don’t propagate it (you’re already on the error path).
  2. Surface it for operator review: a daily report, plus an on-call alert if failures are frequent.
  3. Make compensations idempotent. When the operator retries, the compensation has to be safe to run again.
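
The three points above can be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: OperatorQueue is an in-memory stand-in for the dedicated table, and FailedCompensation, Reservations, and ErrFrozen are hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// FailedCompensation is a hypothetical row in the dedicated failure table.
type FailedCompensation struct {
	SagaID string
	Step   string
	Err    string
}

// OperatorQueue is an in-memory stand-in for that table.
type OperatorQueue struct {
	Items []FailedCompensation
}

// Record stores the failure for later operator review instead of propagating it.
func (q *OperatorQueue) Record(sagaID, step string, err error) {
	q.Items = append(q.Items, FailedCompensation{SagaID: sagaID, Step: step, Err: err.Error()})
}

// ErrFrozen is a hypothetical error for a frozen source account.
var ErrFrozen = errors.New("source account frozen")

// Reservations is a toy reservation store keyed by saga ID.
type Reservations struct {
	held   map[string]int
	frozen bool
}

// Release is idempotent: if the reservation is already gone it succeeds
// without doing anything, so an operator retry is always safe.
func (r *Reservations) Release(sagaID string) error {
	if _, ok := r.held[sagaID]; !ok {
		return nil // already released
	}
	if r.frozen {
		return ErrFrozen
	}
	delete(r.held, sagaID)
	return nil
}

// demo fails one compensation, records it, then retries it twice.
func demo() (*OperatorQueue, *Reservations, []error) {
	q := &OperatorQueue{}
	r := &Reservations{held: map[string]int{"saga-1": 500}, frozen: true}

	// First attempt fails while the account is frozen: record, don't propagate.
	if err := r.Release("saga-1"); err != nil {
		q.Record("saga-1", "reserve funds", err)
	}

	// The operator unfreezes the account and retries, here twice on purpose.
	r.frozen = false
	retries := []error{r.Release("saga-1"), r.Release("saga-1")}
	return q, r, retries
}

func main() {
	q, r, retries := demo()
	fmt.Println("queued failures:", len(q.Items))
	fmt.Println("retry errors:", retries)
	fmt.Println("reservations still held:", len(r.held))
}
```

The second retry is a no-op that still reports success, which is the property that lets an operator re-run a queued compensation without first checking whether an earlier retry went through.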

What I’ve seen go wrong

For Genie’s pkg/workflow saga implementation, these patterns are baked into the helper. The application defines the steps; the framework handles the reverse-order compensation and the failure-of-compensation queue.

The unhappy path is where the engineering shows. The happy path writes itself; the saga compensations are what you actually need to think about.
