Is HAZL available in open-source Linkerd?

No. Open-source Linkerd uses Kubernetes-native Topology Aware Routing. HAZL is a feature of Buoyant Enterprise for Linkerd (BEL). Learn more at https://docs.buoyant.io/buoyant-enterprise-linkerd/latest/features/hazl/

How much will HAZL save me?

Savings depend on your cloud provider, traffic volume, and zone topology. Model it against your own cluster at https://buoyant.io/hazl-calculator and compare against your BEL cost.

What does HAZL do?

HAZL balances HTTP and gRPC requests to keep traffic in-zone for cost savings, while sending it cross-zone when reliability requires it. Learn more at https://docs.buoyant.io/buoyant-enterprise-linkerd/latest/features/hazl/

How is HAZL different from Topology Aware Routing?

Topology Aware Routing allocates zones statically, ignores live load, latency, and health, and is binary (all-or-nothing). HAZL balances on real load with in-band health checks and spills cross-zone only when needed. Learn more at https://docs.buoyant.io/buoyant-enterprise-linkerd/latest/features/hazl/

OBSERVABILITY · RELIABILITY · KUBERNETES

Observability and Reliability for Kubernetes

Get success rate, request volume, and latency for every meshed service the moment its pods roll with the proxy injected. No instrumentation. Then turn the operational work that quietly causes outages, like certificate rotation, into a non-event.

Why your cross-zone bill is so high

Cross-zone data transfer rates by provider: AWS $0.02/GB, GCP $0.01/GB.

The savings calculator takes three inputs: the number of Kubernetes clusters, the availability zones per cluster, and the cross-zone traffic volume. HAZL needs just 2 AZs; competitors require 3+.

After you select a provider, it reports the annual cost without HAZL, the annual cost with HAZL (60% less), and the annual savings.
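The calculator's arithmetic can be approximated by hand. The sketch below is a rough estimate, not Buoyant's published model: it uses the per-GB rates and the "60% less" figure quoted above, and it assumes that traffic is entered in GB per cluster per month and that every cluster carries the same volume. The function name `estimate_annual_costs` and the sample inputs are hypothetical, and the AZ count is left out because the page does not say how it enters the calculation.

```python
# Per-GB cross-zone transfer rates quoted by the calculator on this page.
RATES_USD_PER_GB = {"AWS": 0.02, "GCP": 0.01}

# The calculator labels the "With HAZL" figure as "60% less".
HAZL_REDUCTION = 0.60


def estimate_annual_costs(provider, clusters, cross_zone_gb_per_month):
    """Return (without_hazl, with_hazl, savings) in USD per year.

    Assumptions (not stated on the page): traffic volume is per cluster
    per month, and every cluster carries the same volume.
    """
    rate = RATES_USD_PER_GB[provider]
    without_hazl = clusters * cross_zone_gb_per_month * 12 * rate
    with_hazl = without_hazl * (1 - HAZL_REDUCTION)
    return without_hazl, with_hazl, without_hazl - with_hazl


if __name__ == "__main__":
    # Hypothetical input: 2 clusters on AWS, 50,000 GB cross-zone per month each.
    without, with_hazl, savings = estimate_annual_costs("AWS", 2, 50_000)
    print(f"Without HAZL: ${without:,.0f} per year")
    print(f"With HAZL:    ${with_hazl:,.0f} per year")
    print(f"Savings:      ${savings:,.0f} per year")
```

Treat the result as an order-of-magnitude check. The real reduction depends on your traffic pattern and zone topology, so measure against your own bill.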


0 lines of instrumentation code required

85% reduction in control-plane memory at scale (2.20)

24 h automatic proxy cert rotation cycle

Why uniform metrics are hard in a polyglot cluster

Every service emits metrics differently, or not at all, and instrumenting each one by hand across languages and frameworks is a project that never finishes. Meanwhile the operational work that keeps the mesh healthy, certificate rotation above all, is the kind of toil that causes a cluster-wide outage when a step is missed.

A service mesh reads traffic at the proxy, so you get one consistent set of golden metrics across every service with no code change, and the riskiest operational steps get automated.

WITHOUT A SERVICE MESH

"A project that never finishes."

Every service emits metrics differently, or not at all. No consistent signal across the fleet.

Certificate rotation: the step most likely to cause a cluster-wide outage when missed.

WITH LINKERD

0 lines of instrumentation code required

24h automatic cert rotation, every proxy

85% less control-plane memory at scale (2.20)


What you get

From the moment the proxy is injected, you get consistent golden signals across every meshed service — no instrumentation, no code change. The operational work that quietly causes outages gets automated too.

Golden metrics, no instrumentation

Success rate, RPS, and latency percentiles for every meshed service, the moment the proxy is injected.

Cert rotation as a non-event

Linkerd auto-rotates proxy certs every 24 hours; Buoyant Enterprise for Linkerd (BEL) 2.20 automates the riskiest step, trust-anchor rotation.

Up to 85% less control-plane memory

Delivered by the destination-controller refactor in 2.20, on large, high-churn clusters.

Hundreds of failed deploys caught

loveholidays built SLOs on Linkerd metrics and caught hundreds of failed deploys before they became outages.

How does Linkerd observability work?

Linkerd's proxy reads every meshed connection and records golden metrics (success rate, requests per second, latency percentiles) for HTTP, HTTP/2, and gRPC, with no code change. Those metrics are scraped into Prometheus and surfaced on the dashboard, ready for SLOs and alerts. The control plane issues and rotates identities, and the BEL trust-anchor rotation operator handles the one cert step most likely to cause an outage.
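To make the three golden metrics concrete, here is a minimal sketch of how one window of observed requests could be reduced to success rate, requests per second, and latency percentiles. It is an illustration only, not Linkerd's implementation: the `Request` record, the `golden_metrics` function, the nearest-rank percentile method, and the rule that any 5xx status counts as a failure are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code of the response
    latency_ms: float  # latency observed for this request


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def golden_metrics(requests, window_seconds):
    """Summarise one time window into success rate, RPS, and latency."""
    if not requests:
        return {"success_rate": None, "rps": 0.0,
                "p50_ms": None, "p95_ms": None, "p99_ms": None}
    # Assumption for this sketch: any 5xx response counts as a failure.
    successes = sum(1 for r in requests if r.status < 500)
    latencies = sorted(r.latency_ms for r in requests)
    return {
        "success_rate": successes / len(requests),
        "rps": len(requests) / window_seconds,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


if __name__ == "__main__":
    # Hypothetical sample: eight successes and two server errors in 5 seconds.
    sample = [Request(200, 10.0 * i) for i in range(1, 9)]
    sample += [Request(503, 90.0), Request(500, 100.0)]
    print(golden_metrics(sample, window_seconds=5))
```

Because the proxy sees every request, the same summary applies to every meshed service regardless of the language it is written in, which is what makes the metrics uniform.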

Stop paying for traffic you don't need to

Most teams turn HAZL on with no tuning and watch steady-state cross-zone traffic drop while reliability holds. Run the demo, then point it at a real cluster and measure against your own bill.

Topology Aware Routing

What it doesn't

✗ Drops to 0% in-zone under failure

✗ Requires ≥3 balanced pods per zone

✗ No health-based spill logic

✗ Struggles with autoscaling

✓ Free, no license needed

✓ Built into Kubernetes

Buoyant Enterprise for Linkerd

What it covers

✓ Never sacrifices reliability

✓ Works with <3 pods per zone

✓ In-band health checking (HTTP/gRPC)

✓ Reads HTTP 429 as a spill signal

✓ ~1 min recovery after overload

✓ No tuning required in most cases

Why HAZL

1. Cost

2. Reliability

3. Simplicity

Cuts cost and protects reliability

HAZL is a "request-level load balancer in Buoyant Enterprise for Linkerd that balances HTTP and gRPC traffic in environments with multiple availability zones," and, unlike Topology Aware Routing, it "never sacrifices reliability to achieve this cost reduction."

Read the docs ↗

It reacts to real load

HAZL balances on outstanding requests per endpoint and prefers local endpoints, adding cross-zone endpoints only when local load climbs. It uses in-band health checking and reads rate-limit responses: an in-zone endpoint returning HTTP 429 is treated as a reason to spill, not as a fast success (a BEL feature). In the same failure that dropped Topology Aware Routing to a 0% success rate, HAZL held near 100%.
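The behaviour described above can be pictured with a toy model. This is a sketch under stated assumptions, not Buoyant's algorithm: the `Endpoint` record, the `MAX_LOCAL_LOAD` threshold, and the rule of averaging in-flight requests across local endpoints are all hypothetical, chosen only to show local preference, load-based spill, and HTTP 429 as a spill signal.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    zone: str
    outstanding: int = 0        # requests currently in flight to this endpoint
    healthy: bool = True        # outcome of in-band health checking
    rate_limited: bool = False  # last response was HTTP 429


# Hypothetical threshold: average in-flight requests per local endpoint above
# which cross-zone endpoints are added. The page gives no real values.
MAX_LOCAL_LOAD = 10


def usable(ep):
    """An endpoint is usable if it is healthy and not returning 429s."""
    return ep.healthy and not ep.rate_limited


def candidate_endpoints(endpoints, local_zone):
    """Prefer local endpoints; add cross-zone ones only when needed."""
    local = [ep for ep in endpoints if ep.zone == local_zone and usable(ep)]
    remote = [ep for ep in endpoints if ep.zone != local_zone and usable(ep)]
    if not local:
        return remote  # nothing usable in-zone: spill entirely
    avg_load = sum(ep.outstanding for ep in local) / len(local)
    if avg_load > MAX_LOCAL_LOAD:
        return local + remote  # local load is climbing: widen the pool
    return local


def pick_endpoint(endpoints, local_zone):
    """Choose the candidate with the fewest outstanding requests."""
    candidates = candidate_endpoints(endpoints, local_zone)
    if not candidates:
        return None
    return min(candidates, key=lambda ep: ep.outstanding)


if __name__ == "__main__":
    pool = [Endpoint("a1", "zone-a", outstanding=25),
            Endpoint("b1", "zone-b", outstanding=3)]
    print(pick_endpoint(pool, "zone-a"))
```

In this model a rate-limited or unhealthy local endpoint drops out of the candidate set, so traffic spills instead of counting the quick 429 as a success.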

Read the docs ↗

It works where TAR struggles

HAZL handles fewer than 3 pods per zone, imbalanced traffic, and autoscaling, and it "requires no tuning or configuration" in most cases. It also preserves zone affinity across cluster boundaries.

Read the docs ↗


Show me the evidence

Every claim is backed by a reproducible demo, a published cost model, and a CNCF track record.

DEMO

Reproducible demo

Run it on a 3-zone local cluster and watch HAZL hold success rate where Topology Aware Routing drops it to 0%.

See the demo ↗

BLOG

Published cost model

The cost figures above come from Buoyant's published AWS model, which is open for review.

Read the analysis ↗

COMMUNITY BACKED

CNCF-graduated

Buoyant created Linkerd and coined the term "service mesh." Linkerd reached CNCF graduation on July 28, 2021.

Frequently asked questions

What does HAZL do?

It balances HTTP and gRPC requests to keep traffic in-zone for cost savings, while sending it cross-zone when reliability requires it.

How is it different from Topology Aware Routing?

TAR allocates zones statically, ignores live load, latency, and health, and is binary (all-or-nothing). HAZL balances on real load with in-band health checks and spills only when needed.

Is HAZL in open-source Linkerd?

No. OSS uses Kubernetes-native TAR; HAZL is a BEL feature.

How much will it save me?

It depends on cloud provider, traffic volume, and zone topology. Model it against your own cluster and your BEL cost.
