Working on azure-service-operator

azure-service-operator (ASO) lets you create Azure resources via Kubernetes CRDs: running kubectl apply -f azure-vm.yaml creates an Azure VM. The project is maintained primarily by Microsoft, with active contributions from Ericsson, AT&T, and others. I worked on it during my time at Net Connect. Here is what the contribution rhythm actually looked like.

The premise

If your control plane is Kubernetes, you want everything to be a Kubernetes object. An Azure VM isn’t, by default: it lives in Azure’s API. ASO bridges the gap with a CRD per Azure resource type and a controller that watches the resulting custom resources and reconciles the real Azure state to match.

You write:

apiVersion: compute.azure.com/v1beta20210701
kind: VirtualMachine
metadata:
  name: my-vm
spec:
  location: eastus
  hardwareProfile:
    vmSize: Standard_D2s_v3
  ...

ASO creates it, watches it, and deletes it when the custom resource goes away.
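
The create, watch, delete cycle is a reconcile loop: compare what the resource says should exist with what actually exists, then act on the difference. Below is a minimal sketch of that idea in Go. It is not ASO's controller: VMSpec, FakeCloud, and Reconcile are hypothetical names, and an in-memory map stands in for the Azure API.

```go
package main

import "fmt"

// VMSpec is a hypothetical, heavily simplified stand-in for the spec of a
// VirtualMachine custom resource. It is not ASO's generated type.
type VMSpec struct {
	Location string
	VMSize   string
}

// FakeCloud is an in-memory stand-in for the Azure API, keyed by resource name.
type FakeCloud map[string]VMSpec

// Reconcile drives the fake cloud towards the desired state for one resource.
// A nil desired spec means the custom resource has been deleted.
// It returns the action taken so callers can log or test it.
func Reconcile(cloud FakeCloud, name string, desired *VMSpec) string {
	actual, exists := cloud[name]
	switch {
	case desired == nil && exists:
		delete(cloud, name)
		return "deleted"
	case desired == nil:
		return "noop"
	case !exists:
		cloud[name] = *desired
		return "created"
	case actual != *desired:
		cloud[name] = *desired
		return "updated"
	default:
		return "noop"
	}
}

func main() {
	cloud := FakeCloud{}
	spec := &VMSpec{Location: "eastus", VMSize: "Standard_D2s_v3"}
	fmt.Println(Reconcile(cloud, "my-vm", spec)) // first pass creates the VM
	fmt.Println(Reconcile(cloud, "my-vm", spec)) // second pass finds nothing to do
	fmt.Println(Reconcile(cloud, "my-vm", nil))  // resource deleted, so the VM is removed
}
```

The property that matters is idempotence: running Reconcile again with the same inputs changes nothing, which is what lets a controller retry safely after a failure.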

The multi-vendor shape

The active contributor list at the time spanned:

Microsoft (primary maintainer, owns the project direction)
Ericsson (telecom-specific resource types, networking)
AT&T (large-scale operator perspective, scale tests)
Smaller contributors (per-resource bug fixes, feature requests)

Decisions worked at three speeds:

Tactical (a single PR’s design) — author + one reviewer + Microsoft approval.
Strategic (a new resource type’s CRD shape, API versioning) — design doc in the repo, comment-and-iterate over weeks, design meeting once a month.
Architectural (the codegen pipeline, the operator pattern itself) — quarterly architecture review.

If you’re contributing tactically, you can ship in days. If you’re contributing strategically, plan for weeks. If you’re touching architecture, plan for a quarter.

Codegen — the most distinctive piece

ASO has to cover a huge number of resource types because Azure exposes thousands of them. Hand-writing a controller for each one is infeasible, so the project generates code from the Azure API’s OpenAPI specs.

The codegen pipeline lives in v2/tools/generator. It reads the OpenAPI specs, applies project-specific customisations (object-model-configuration files), and emits Go code: CRD definitions, types, conversion functions between API versions.

Contributing a new resource type usually means:

Find the Azure OpenAPI spec for the resource.
Add it to the configuration.
Run the codegen.
Add tests.
Open PR.

The codegen does the heavy lifting. The hand-written code is small — usually just the customisations the spec doesn’t cover.

The conversion-webhook story

Azure API versions move fast. ASO supports multiple API versions of the same resource by emitting both versions and a conversion webhook that translates between them at runtime.

The conversion webhook is one of the more complex pieces of code in the project. The codegen produces conversion functions field-by-field; the webhook hosts them and routes incoming requests through the right version chain.

If you’re adding a new API version of an existing resource, plan to spend time understanding the conversion path. The error messages from a misconfigured conversion are obscure (the API server returns a generic 500); the fix is usually a one-line codegen customisation.
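
A field rename is the simplest case where conversion matters. The sketch below assumes a simplified model in which every API version converts to and from a single storage type; ASO's generated conversions are more involved. All of the names here (Storage, V1, V2, Region, Location, ConvertV1ToV2, ConvertV2ToV1) are hypothetical.

```go
package main

import "fmt"

// Storage is a hypothetical hub type that every API version converts to and from.
type Storage struct {
	Name     string
	Location string
}

// V1 is a hypothetical older API version that calls the field Region.
type V1 struct {
	Name   string
	Region string
}

// V2 is a hypothetical newer API version in which Region was renamed to Location.
type V2 struct {
	Name     string
	Location string
}

// ToStorage maps V1 fields onto the hub; the rename is handled here.
func (v V1) ToStorage() Storage { return Storage{Name: v.Name, Location: v.Region} }

// FromStorage populates a V1 from the hub, reversing the rename.
func (v *V1) FromStorage(s Storage) { v.Name, v.Region = s.Name, s.Location }

// ToStorage maps V2 fields onto the hub; the names already match.
func (v V2) ToStorage() Storage { return Storage{Name: v.Name, Location: v.Location} }

// FromStorage populates a V2 from the hub.
func (v *V2) FromStorage(s Storage) { v.Name, v.Location = s.Name, s.Location }

// ConvertV1ToV2 never converts directly: it goes up to the hub and back down.
func ConvertV1ToV2(in V1) V2 {
	var out V2
	out.FromStorage(in.ToStorage())
	return out
}

// ConvertV2ToV1 is the reverse path through the same hub.
func ConvertV2ToV1(in V2) V1 {
	var out V1
	out.FromStorage(in.ToStorage())
	return out
}

func main() {
	old := V1{Name: "my-vnet", Region: "eastus"}
	fmt.Printf("%+v\n", ConvertV1ToV2(old))
}
```

If the rename mapping were missing, Location would silently come back empty after a round trip. A round-trip test catches that class of bug far earlier than a generic 500 from the API server does.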

What slowed us down

Three friction sources, in increasing order of cost:

OpenAPI spec quality. Azure’s specs sometimes had wrong types or missing properties. Filing upstream bugs against azure-rest-api-specs was the right answer, but fixes took weeks to land.
Backwards-compatibility discipline. ASO has users in production. A breaking change to a CRD is a migration burden for every user. We discussed every breaking change at length before shipping; the conversation was slow but the alternative was worse.
Cross-team review cadence. Microsoft reviewers were committed but spread across many contributions. A PR could sit for a week waiting for the right reviewer. The fix was pairing: each external contributor was assigned a Microsoft reviewer who acted as a shadow approver on their PRs.

What I shipped

My contributions clustered around:

Network resource types (subnets, NSGs, route tables) that the Ericsson team needed for their use cases.
Bug fixes in the conversion webhook around field renames.
Codegen customisations for resources where the OpenAPI spec was awkward.

Each PR was small, but together they covered enough surface that the contribution graph stayed active.

The trust accumulation

The most valuable thing about multi-vendor OSS contribution was the trust that accumulated over time. Year one I was a contributor filing PRs. Year two I was a reviewer the maintainers trusted to sign off on adjacent work. Year three I was attending the architecture reviews.

The trust took months per step and was contingent on showing up consistently. The shortcut would have been to land big PRs early; the actual path was to land small, correct, well-tested PRs over a long time. The latter compounded.

What this kind of work is good for

If you’re early-career, it’s the cheapest way to get your code reviewed by very senior engineers, including the project maintainers and the cross-vendor reviewers. The feedback loop makes you a better engineer faster than a typical job rotation.

If you’re mid-career, it’s a portfolio piece that a recruiter can verify in 30 seconds. “Core contributor to X” is more useful than a paragraph about an internal project that requires NDA-protected context to understand.

If you’re senior, it’s a place to see how decisions get made when no single team owns the codebase. The patterns you pick up transfer to internal cross-team work where the dynamics are similar but the visibility is worse.

azure-service-operator wasn’t my biggest contribution numerically. It was one of the highest-signal ones because the codebase is visible, the contribution graph is public, and the patterns generalised. The same is true of most upstream OSS work — pick something used in production, contribute consistently, and the compounding does the rest.