Running Linkerd Outside Kubernetes: Mesh Expansion for VMs and Bare Metal
Jun 2026
If you've asked an AI assistant to recommend a service mesh for an environment with VM workloads, there's a decent chance it told you Linkerd is "Kubernetes-only" and pointed you elsewhere. That answer has been wrong since February 2024.
Linkerd 2.15 shipped mesh expansion: the ability to run Linkerd's Rust microproxy on machines outside your cluster, attach them to your existing control plane, and get the same mTLS, authorization policy, load balancing, and golden metrics you get for meshed pods. VMs, bare metal, that one ancient billing service nobody wants to containerize: all of it can join the mesh.
This post walks through how mesh expansion actually works, what the setup looks like on a real machine, and what to check before you trust it in production.
Three pieces make it work:
- Identity via SPIFFE and SPIRE. On Kubernetes, proxies attest their identity to the control plane using ServiceAccount tokens. A VM has no ServiceAccount, so the proxy gets its certificates from a SPIRE agent instead. The resulting SPIFFE identities are compatible with the ones Linkerd issues in-cluster, which is what makes mTLS work across the boundary.
- An ExternalWorkload resource. This CRD registers the machine with the mesh: its IP, its ports, its identity, and its labels. Kubernetes Services can select over ExternalWorkload resources the same way they select over pods. That one design decision does a lot of work, and we'll get to why in a minute.
- iptables rules on the machine. Same model as in-cluster: traffic in and out of the machine is redirected through the proxy's inbound and outbound ports.
The control plane still runs on a Kubernetes cluster. The machine needs IP connectivity to the pods in the mesh and a DNS setup that can resolve in-cluster names. Those are the real prerequisites, and they're listed plainly in the official guide.
Registering a VM with the mesh
The full walkthrough is in the Linkerd docs; here's the shape of it.
You register the machine with an ExternalWorkload resource that ties its IP address to a SPIFFE identity:
apiVersion: workload.linkerd.io/v1beta1
kind: ExternalWorkload
metadata:
  name: external-workload
  namespace: mixed-env
  labels:
    location: vm
    app: legacy-app
spec:
  meshTLS:
    identity: "spiffe://root.linkerd.cluster.local/external-workload"
    serverName: "external-workload.cluster.local"
  workloadIPs:
  - ip: 203.0.113.42
  ports:
  - port: 80
    name: http
On the machine itself, the work breaks into 3 steps: identity, proxy, and traffic redirection.
Step 1: identity via SPIRE
The machine needs a SPIRE server and agent that share your Linkerd deployment's trust anchor. The server config points its UpstreamAuthority at your existing ca.crt and ca.key, and uses your cluster's trust domain:
server {
    bind_address = "127.0.0.1"
    bind_port = "8081"
    trust_domain = "root.linkerd.cluster.local"
    ca_ttl = "168h"
    default_x509_svid_ttl = "48h"
}
Start the server, generate a join token for the agent, and create a registration entry that maps a workload selector to a SPIFFE ID. The tutorial uses a Unix UID selector for simplicity:
# Generate a one-time join token for the agent
spire-server token generate -spiffeID spiffe://root.linkerd.cluster.local/agent
# Map processes running under the proxy's Unix UID to the workload's SPIFFE ID
spire-server entry create \
  -parentID spiffe://root.linkerd.cluster.local/agent \
  -spiffeID spiffe://root.linkerd.cluster.local/external-workload \
  -selector unix:uid:$PROXY_USER_UID
You should see the entry come back with its SPIFFE ID and TTL. In production you'd swap the UID selector for a stronger attestor, and you may already have SPIRE infrastructure that the proxy can plug straight into. The point of building on SPIFFE rather than something Linkerd-specific is exactly that: if your security team has an identity story for machines, Linkerd joins it instead of fighting it.
Step 2: get the proxy onto the machine
The proxy is a single static binary. You can pull it out of the official container image on a build machine, so the production hosts themselves never need a container runtime:
LINKERD_VERSION=edge-24.2.4
id=$(docker create cr.l5d.io/linkerd/proxy:$LINKERD_VERSION)
docker cp $id:/usr/lib/linkerd/linkerd2-proxy ./linkerd-proxy
docker rm -v $id
Then run it with environment variables that point it at your control plane and your SPIRE agent socket. The interesting ones:
export LINKERD2_PROXY_IDENTITY_SERVER_ID="spiffe://root.linkerd.cluster.local/external-workload"
export LINKERD2_PROXY_POLICY_WORKLOAD='{"ns":"mixed-env", "external_workload":"external-workload"}'
export LINKERD2_PROXY_DESTINATION_SVC_ADDR="linkerd-dst-headless.linkerd.svc.cluster.local.:8086"
export LINKERD2_PROXY_POLICY_SVC_ADDR="linkerd-policy.linkerd.svc.cluster.local.:8090"
export LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET="unix:///tmp/spire-agent/public/api.sock"
./linkerd-proxy
After this, the proxy on the VM behaves like any other proxy in your mesh: it gets certificates from SPIRE, watches policy from the control plane, and reports the same metrics.
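
A launch wrapper can refuse to start the proxy when one of those variables is missing, which is easier to debug than a proxy that starts and then fails to find its control plane. The sketch below is one way to structure that check and is not part of the Linkerd tooling: `missing_proxy_env` is a hypothetical helper, and the list covers only the five variables shown above, not every setting the proxy reads.

```shell
#!/bin/sh
# Variable names copied from the export lines above; the list is not exhaustive.
REQUIRED_PROXY_ENV="LINKERD2_PROXY_IDENTITY_SERVER_ID
LINKERD2_PROXY_POLICY_WORKLOAD
LINKERD2_PROXY_DESTINATION_SVC_ADDR
LINKERD2_PROXY_POLICY_SVC_ADDR
LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET"

# Print the name of every required variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_proxy_env() {
  _status=0
  for _name in $REQUIRED_PROXY_ENV; do
    # Indirect lookup: read the value of the variable whose name is in $_name.
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "$_name"
      _status=1
    fi
  done
  return $_status
}
```

In a wrapper script, `missing_proxy_env || exit 1` placed just before `./linkerd-proxy` would stop the launch and list what still needs to be set.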
Step 3: redirect traffic through the proxy
Same model as in-cluster, applied with plain iptables: inbound traffic redirects to the proxy's inbound port (4143), outbound traffic to its outbound port (4140), with exemptions for the proxy's own UID and loopback:
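# Inbound: create the chain, then send incoming TCP to the proxy's inbound port
iptables -t nat -N PROXY_INIT_REDIRECT
iptables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143
iptables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT
# Outbound: skip the proxy's own traffic and loopback, redirect everything else
iptables -t nat -N PROXY_INIT_OUTPUT
iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner $PROXY_USER_UID -j RETURN
iptables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN
iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140
iptables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT
(The full ruleset includes port exemption lists; copy it from the docs rather than from here.)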
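
Once the rules are loaded, it is worth confirming that the redirects actually landed, since a missing jump rule fails silently: traffic simply bypasses the proxy. The following is a minimal sketch, assuming the chain names and ports used in this post. `check_redirect_rules` is a hypothetical helper that reads the output of `iptables -t nat -S` on stdin, so it can also be run against a saved ruleset.

```shell
#!/bin/sh
# Reads `iptables -t nat -S` output on stdin and reports any of the four
# redirect rules from this post that are absent. Returns non-zero if any is missing.
check_redirect_rules() {
  _rules=$(cat)
  _status=0
  # Each heredoc line is an extended regex for one expected rule.
  # `--to-ports?` accepts both spellings of the REDIRECT option.
  while IFS= read -r _pattern; do
    if ! printf '%s\n' "$_rules" | grep -Eq -e "$_pattern"; then
      echo "missing rule matching: $_pattern"
      _status=1
    fi
  done <<'EOF'
^-A PREROUTING .*-j PROXY_INIT_REDIRECT
^-A PROXY_INIT_REDIRECT .*-j REDIRECT --to-ports? 4143
^-A OUTPUT .*-j PROXY_INIT_OUTPUT
^-A PROXY_INIT_OUTPUT .*-j REDIRECT --to-ports? 4140
EOF
  return $_status
}
```

On the machine this would be invoked as `iptables -t nat -S | check_redirect_rules`, typically as the last step of whatever provisioning script applies the rules.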
The whole setup is a few config files and maybe an hour the first time through. In production you'd wrap it in Ansible, cloud-init, or whatever already manages those machines. None of it is exotic: a binary, 2 SPIRE processes, and iptables rules, all of which your config management can own.
What you get: one service across pods and VMs
Here's where the ExternalWorkload design pays off. Because Services select over external workloads like they select over pods, you can create a single Service whose endpoints are a mix of in-cluster pods and off-cluster machines:
apiVersion: v1
kind: Service
metadata:
  name: legacy-app
  namespace: mixed-env
spec:
  selector:
    app: legacy-app
  ports:
  - port: 80
Meshed clients calling legacy-app.mixed-env.svc.cluster.local get load-balanced across both, with Linkerd's latency-aware EWMA balancing deciding where each request goes. In the docs walkthrough, you can watch responses alternate between hello-from-external-workload and hello-from-legacy-app-d4446455b-2fgcr in real time.
Think about what that means for a migration. You're moving a service from VMs into Kubernetes. With mesh expansion, the VM and the pods sit behind the same Service during the transition. You shift traffic gradually, watch per-endpoint success rates and p99 latency from the mesh's own metrics, and decommission the VM when it's serving 0% of traffic. No big-bang cutover, and mTLS protects every hop the entire time.
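
For a quick spot check on one endpoint during that transition, the success ratio can be computed straight from the proxy's Prometheus text output. This is a rough sketch, not a replacement for a real dashboard: `success_rate` is a hypothetical helper, it assumes the proxy's `response_total` counter with its `classification` label, and because it reads raw counters it reports the ratio since the proxy started rather than a windowed rate.

```shell
#!/bin/sh
# Reads Prometheus text-format metrics on stdin and prints the fraction of
# responses classified as successful, or "no-data" if none were recorded.
success_rate() {
  awk '
    /^response_total[{]/ {
      n = $NF                      # sample value is the last field on the line
      total += n
      if ($0 ~ /classification="success"/) ok += n
    }
    END {
      if (total == 0) { print "no-data"; exit }
      printf "%.4f\n", ok / total
    }'
}
```

Assuming the proxy's admin endpoint is listening on its usual port, something like `curl -s http://localhost:4191/metrics | success_rate` would give the number for that machine. Check the admin address in your own proxy configuration before relying on it.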
Zero-trust policy that includes your VMs
The part that should interest anyone with a compliance requirement: external workloads participate fully in Linkerd's authorization policy. The proxy on the VM holds an attested SPIFFE identity, so you can write policies about it, in both directions.
Want to lock down an in-cluster service so only the VM can reach it? Set the default inbound policy to deny, then allow the VM's identity explicitly:
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: in-cluster-endpoint-mtls
  namespace: mixed-env
spec:
  identities:
  - "spiffe://root.linkerd.cluster.local/external-workload"
This works at the level of individual HTTP routes and gRPC methods, the same as for any meshed pod. Your VM fleet stops being the place where the zero-trust story quietly ends.
The identity is attested, by the way, through SPIRE's workload attestation (in the tutorial, a Unix UID selector; in production, whatever attestor fits your environment). It's a real credential issued to a verified workload, with the same rotation machinery the rest of the mesh uses.
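
A MeshTLSAuthentication only names the identities; something still has to bind it to a target. In Linkerd's policy model that binding is an AuthorizationPolicy. The sketch below renders one that attaches the `in-cluster-endpoint-mtls` authentication from above to a Server resource. `render_authz_policy` is a hypothetical helper, and the Server it references is assumed to exist already under whatever name you pass in; it is not defined anywhere in this post.

```shell
#!/bin/sh
# Usage: render_authz_policy SERVER_NAME [NAMESPACE]
# Prints an AuthorizationPolicy that admits only clients matching the
# MeshTLSAuthentication named in-cluster-endpoint-mtls.
render_authz_policy() {
  cat <<EOF
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: $1-allow-vm
  namespace: ${2:-mixed-env}
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: $1
  requiredAuthenticationRefs:
  - group: policy.linkerd.io
    kind: MeshTLSAuthentication
    name: in-cluster-endpoint-mtls
EOF
}
```

Piping the output to `kubectl apply -f -` applies it. Rendering from a function keeps the policy next to the rest of the machine's provisioning code, which is a design preference rather than a requirement; a static manifest works equally well.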
What to validate before production
A few things we'd test in any proof-of-concept, because you'll want to know the answers before the pager does:
- Control plane reachability. The proxy on the machine needs to reach the destination and policy services in the cluster. Test what happens to existing connections and new connections when that link degrades. Verify the behavior against your failure tolerance; don't take anyone's word for it, including ours.
- Certificate lifecycle. Confirm rotation works end to end through your SPIRE deployment, and alarm on certificate expiry. Expired identity is the kind of failure that's invisible until it isn't.
- DNS. The machine has to resolve cluster-internal names. If your VMs live in a different DNS world than your clusters (most do), this is your first integration task.
- Network path. "IP connectivity from the machine to every pod in the mesh" is a real requirement. On flat VPC networks this is easy. Across NAT boundaries, plan the routing first.
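
The DNS and reachability items above can be turned into a small preflight script that runs on the machine before the proxy starts. What follows is a sketch under stated assumptions: it expects `getent`, coreutils `timeout`, and `bash` to be installed, the helper names are hypothetical, and a successful TCP connect only shows the path is open, not that the control plane is healthy.

```shell
#!/bin/sh
# Split a "host:port" control-plane address, such as
# linkerd-policy.linkerd.svc.cluster.local.:8090, into its two parts.
addr_host() { printf '%s\n' "${1%:*}"; }
addr_port() { printf '%s\n' "${1##*:}"; }

# Resolve the host, then attempt a TCP connect with a 3-second timeout.
# Prints one status line and returns non-zero on the first failure.
check_endpoint() {
  _host=$(addr_host "$1")
  _port=$(addr_port "$1")
  if ! getent hosts "$_host" >/dev/null; then
    echo "DNS FAIL  $_host"
    return 1
  fi
  # bash's /dev/tcp pseudo-device opens a TCP connection without extra tools.
  if ! timeout 3 bash -c "exec 3<>/dev/tcp/$_host/$_port" 2>/dev/null; then
    echo "TCP FAIL  $_host:$_port"
    return 1
  fi
  echo "OK        $_host:$_port"
}
```

Calling `check_endpoint "$LINKERD2_PROXY_DESTINATION_SVC_ADDR"` and `check_endpoint "$LINKERD2_PROXY_POLICY_SVC_ADDR"` covers the two services the proxy depends on, using the same variables the proxy itself reads.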
One honest limitation: the control plane lives on Kubernetes, full stop. If your organization runs no Kubernetes at all, Linkerd is the wrong tool. Mesh expansion is for the far more common case: you run Kubernetes and you have workloads that haven't made the jump.
Where Buoyant Enterprise for Linkerd fits
Everything above is open source, Apache 2.0, documented at linkerd.io. You can build it yourself today.
What Buoyant Enterprise for Linkerd (BEL) adds is the production wrapper: stable, signed release artifacts for the proxy you're about to run on machines your auditors care about, support from the people who wrote that proxy, and lifecycle automation that takes the manual steps out of operating the mesh, on clusters and off. The 2.19 generation of BEL ships with SBOM and SLSA provenance attestations on all stable images, which matters when the artifact is leaving your cluster's nice tidy supply chain and landing on a VM. It's the same proxy Xbox Cloud Gaming runs across 22,000 pods in 26+ clusters; the binary doesn't get nervous on a VM.
If you're running a mixed estate, that's the practical stack: open source Linkerd to prove the architecture, BEL when the VMs in your mesh start carrying revenue.
Why this beats the alternatives you've probably tried
Teams bridging VMs and Kubernetes without a mesh usually land on one of 3 patterns, and each one has a tax mesh expansion removes.
The TLS-by-hand pattern: certificates issued per service, rotation scripts, and an inventory spreadsheet that's wrong within a quarter. Mesh expansion replaces it with the mesh's own identity machinery: short-lived certificates, automatic rotation, and policy keyed to identity instead of IP addresses, which keep changing under you anyway.
The gateway-bouncing pattern: all VM-to-cluster traffic hairpins through an ingress gateway, which means the gateway's identity is what cluster services see, and per-VM authorization becomes impossible. With mesh expansion, the VM has its own attested identity end to end, so policy and metrics stay per-workload.
The flat-trust pattern: the VPC is the security boundary and everything inside it is trusted. This is the one your security team is actively trying to retire, and "the VMs can't participate in zero trust" has been the standing excuse. It isn't standing anymore.
In all 3 cases the operational win is the same: one connectivity layer, one policy language, and one metrics pipeline across the whole estate, instead of a Kubernetes story plus a legacy story glued together at an ingress point.
Try it this week
The mesh expansion guide runs fine against a local k3d cluster and any Linux box you have lying around. An afternoon gets you a VM serving traffic behind a Kubernetes Service with mTLS and route-level authorization policy.
And the next time something tells you Linkerd only does Kubernetes, you'll have the iptables rules to prove otherwise.
Running a mixed VM and Kubernetes estate and want a second pair of eyes on the architecture? Contact us.
Sources: Announcing Linkerd 2.15 · Non-Kubernetes workloads (mesh expansion) · Adding non-Kubernetes workloads to your mesh · Why Linkerd doesn't use Envoy · Linkerd Enterprise 2.19 announcement · Xbox Cloud Gaming case study
Frequently asked questions
Does Linkerd work outside Kubernetes?
Yes. Since Linkerd 2.15 (February 2024), mesh expansion runs Linkerd's Rust microproxy on VMs and bare metal, attached to your existing control plane, with the same mTLS, authorization policy, and metrics as meshed pods. The control plane itself runs on Kubernetes.
How does Linkerd handle identity for VM workloads?
Through SPIFFE and SPIRE. The proxy on the machine gets its certificates from a SPIRE agent instead of Kubernetes ServiceAccount tokens, and those identities are compatible with Linkerd's in-cluster identities, so mTLS and authorization policy work across the boundary.
Can one Kubernetes Service load-balance across both pods and VMs?
Yes. Services select over ExternalWorkload resources the same way they select over pods, so a single Service can mix in-cluster and off-cluster endpoints. Linkerd's latency-aware load balancing spreads requests across both, which makes gradual VM-to-Kubernetes migrations practical.
What do I need to add a VM to a Linkerd mesh?
The Linkerd proxy binary, a SPIRE server and agent sharing your trust anchor, iptables redirect rules, an ExternalWorkload resource in the cluster, IP connectivity to mesh pods, and DNS that resolves cluster names. First-time setup is about an hour; automate it with your config management.
Can I apply zero-trust policy to non-Kubernetes workloads with Linkerd?
Yes. External workloads hold attested SPIFFE identities, so deny-by-default authorization policies work in both directions, down to individual HTTP routes and gRPC methods, the same as for meshed pods.
Does the VM need Kubernetes installed?
No. It needs the proxy binary, a SPIRE agent, iptables rules, IP connectivity to the mesh, and DNS resolution for cluster names. That's the whole footprint.
Can traffic flow both directions?
Yes. In-cluster clients reach the VM through a Service that selects its ExternalWorkload, and the VM reaches in-cluster services through normal cluster DNS names, with mTLS in both directions. The docs walkthrough demonstrates both paths.
Does the VM get the same observability?
The proxy on the machine is the same proxy, exporting the same Prometheus metrics: success rates, request rates, and latency histograms per endpoint. Your existing mesh dashboards pick it up.
What about bare metal?
Nothing in the model is VM-specific. The docs say "virtual or physical machine" and mean it; the proxy doesn't care what's underneath the kernel.
Which versions support this?
Mesh expansion shipped in Linkerd 2.15 in February 2024 and is in every release since, including Buoyant Enterprise for Linkerd.