
How Linkerd is Adapting for Stateful AI Workloads


We recently sat down with Oliver Gould, CTO and co-founder of Buoyant and creator of Linkerd, to dive deep into the world of service mesh in the AI era. During this AI Kubernetes Show episode, we discussed the tough architectural puzzles that pop up when running AI workloads in production, what it takes to handle new protocols like MCP, and where he sees the service mesh heading in the future.

The role of reliability for AI inference workloads

This blog post was generated by AI from the interview transcript, with some editing.

As AI moves into production, new architectural patterns and challenges are emerging, particularly concerning the cost and reliability of inference workloads. The traditional concerns of the microservice boom are amplified in this new environment.

One major pain point in the AI inference world is the high cost of failure. If an inference request fails, you often have to recompute a lot of work across many different places. Rerunning those failed requests becomes a tremendous cost lever for organizations due to compute resources or API usage.

This is where the network layer shines as a solution. Tools like Linkerd, which operate at this layer, are well positioned to handle these issues. Linkerd's existing feature set (fault tolerance, intelligent load balancing, retries, timeouts, and correct traffic routing) becomes even more vital for keeping AI inference workloads stable.

Historically, Linkerd tackled reliability by creating what some call a layer five (or layer seven) load balancer. By operating at the protocol layer, specifically on the HTTP or gRPC request layer, the system can use protocol-level information, like HTTP verbs to determine whether a request is retriable, or status codes to understand the nature of a failure. The main objective here is simple but powerful: pull a significant chunk of the application's reliability concerns down into the infrastructure itself.
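To make that concrete, here is a minimal sketch (in Go, with a hypothetical `isRetriable` helper; this is an illustration of the idea, not Linkerd's actual implementation) of the kind of protocol-level decision a layer-7 proxy can make: only retry when the verb is idempotent and the failure looks transient.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetriable sketches the protocol-level decision a layer-7 proxy can make:
// idempotent verbs paired with transient failure codes are safe to retry.
func isRetriable(method string, status int) bool {
	// Idempotent verbs can be replayed without changing the outcome.
	idempotent := map[string]bool{
		http.MethodGet:    true,
		http.MethodHead:   true,
		http.MethodPut:    true,
		http.MethodDelete: true,
	}
	// Transient upstream failures that a retry may resolve.
	transient := map[int]bool{
		http.StatusBadGateway:         true, // 502
		http.StatusServiceUnavailable: true, // 503
		http.StatusGatewayTimeout:     true, // 504
	}
	return idempotent[method] && transient[status]
}

func main() {
	fmt.Println(isRetriable("GET", 503))  // true: idempotent verb, transient failure
	fmt.Println(isRetriable("POST", 503)) // false: a POST may not be safe to replay
}
```

Pulling this decision into the proxy means every application behind the mesh gets the same retry semantics without writing any of this logic itself.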

Addressing the challenges of new protocols like MCP

The emergence of new protocols, such as MCP (Model Context Protocol), introduces new complexity for platform engineers and existing network tooling. 

MCP introduces new challenges because it is a stateful streaming protocol: the connection maintains state about the ongoing conversation, which complicates common practices like connection multiplexing. Furthermore, unlike the quick, unary requests typical of classic HTTP and gRPC load balancing, MCP involves lengthy transactions. This shift calls for different, more stable load-balancing techniques.
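One simple form of "more stable" balancing is session affinity: hash a conversation's identifier so every message in it reaches the same backend, rather than spreading requests per-call as round-robin would. The sketch below (hypothetical names; real meshes use richer schemes such as consistent hashing with bounded load) shows the core idea.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend pins a long-lived session to one backend by hashing its
// session ID, so the stateful conversation is never split across pods.
func pickBackend(sessionID string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(sessionID))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"pod-a", "pod-b", "pod-c"}
	// Every message in the same conversation lands on the same pod.
	fmt.Println(pickBackend("session-42", backends))
	fmt.Println(pickBackend("session-42", backends))
}
```

The trade-off is exactly the hotspot problem discussed below: affinity keeps state coherent but can concentrate heavy sessions on one backend.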

Another major hurdle for existing network tooling is that critical information needed for routing and checking success is often buried inside the payload, not in the easily accessible header. This means you can't just check a status code to know if a response was successful; instead, you have to dig into the body, parse the JSON, and extract the relevant data. This makes significant parts of the traffic opaque to most of the network tooling engineers rely on today.
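As an illustration of what "digging into the body" looks like: MCP messages use JSON-RPC 2.0 framing, where failure is signaled by an `error` object in the payload rather than by the transport status. The sketch below (a hypothetical `bodySucceeded` helper, not a real Linkerd API) shows why an HTTP 200 alone tells the network layer nothing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcEnvelope captures just enough of a JSON-RPC 2.0 response (the framing
// MCP uses) to tell success from failure.
type rpcEnvelope struct {
	Error *struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// bodySucceeded reports whether the payload carries an application-level
// error object. An HTTP 200 can still carry a failure in the body.
func bodySucceeded(body []byte) (bool, error) {
	var env rpcEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return false, err
	}
	return env.Error == nil, nil
}

func main() {
	ok, _ := bodySucceeded([]byte(`{"jsonrpc":"2.0","id":1,"result":{}}`))
	fmt.Println(ok) // true: no error object present
	ok, _ = bodySucceeded([]byte(`{"jsonrpc":"2.0","id":1,"error":{"code":-32600,"message":"invalid request"}}`))
	fmt.Println(ok) // false: error object in the payload
}
```

Doing this at line rate, for every message, is what makes MCP traffic so much harder for network tooling than checking a status code.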

Evolving infrastructure for stateful workloads

The stateful nature of MCP interactions clashes with the prevalent stateless design of microservice ecosystems and Kubernetes deployments.

Kubernetes is often associated with microservices and stateless applications. However, agentic interactions or MCP are much more like a long-lived stream, creating a contextual loop. This shift toward stateful operations means organizations need to rethink their fundamental processes for infrastructure and planning. If you're managing stateful workloads that need to scale and auto-heal, you have to focus on how you manage things like hotspots and load distribution. It's also important to nail down the auto-provisioning of resources, especially for things like GPUs during peak hours. Ultimately, auto-scaling and load balancing aren't separate concerns; they are fundamentally connected problems that need to be addressed in concert.
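As a rough sketch of what "addressing auto-scaling in concert with load balancing" can look like in practice, here is a hedged Kubernetes HorizontalPodAutoscaler fragment (all names hypothetical; GPU-aware scaling would additionally require a custom-metrics pipeline, which is omitted here). The long scale-down stabilization window is the point: it keeps the autoscaler from tearing down pods that still hold long-lived streams.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-server        # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # avoid thrashing long-lived streams
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```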

Linkerd's MCP support

Buoyant recently announced support for MCP in Linkerd, a capability designed to extend the service mesh's battle-tested reliability capabilities to this new class of traffic.

Linkerd’s engineering team is focused on optimizing the load balancer for streams, because that is the most powerful lever available for improving latency at scale. For extending protocol support, the plan is to build upon existing policy APIs, such as those for Gateway API’s HTTPRoute and GRPCRoute, by introducing a new configuration primitive to handle extended protocol configuration. (This has the working name of MCPRoute.)
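Since MCPRoute has no published schema yet, the shape it would presumably follow is that of the existing Gateway API route kinds it builds on. For reference, a minimal HTTPRoute looks like this (resource names are hypothetical):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: inference-route         # hypothetical
spec:
  parentRefs:
    - name: my-gateway          # hypothetical Gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat
      backendRefs:
        - name: inference-server
          port: 8080
```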

Natively integrating these features with established systems and open standards like Gateway API is an important part of the plan. This ensures the load balancing is fully configurable, allowing users to maintain control over the associated costs, trade-offs, and overall performance.

The impact of AI tooling on developer productivity

Gould shared his personal experience with AI tooling, highlighting its effectiveness in augmenting developer workflow. AI is proving to be great for automating those tasks nobody wants to write, like GitHub Actions YAML and Kubernetes YAML. This automation is a huge time-saver.

Gould shared an example where AI tooling like Copilot and Cortex accelerated diagnosis in a tough situation. Diagnosing a really narrow race condition between some Go synchronization primitives would have taken well over a week. Instead, it took about 10 minutes, and even led to a prototype that uses about half as much memory.

In short: we should use humans for what they do best and robots for what they do best. AI excels at diving through an immense amount of logs and data to find things like a memory leak in the logs or even in the code itself. This capability allows the developer to focus on high-value problem-solving and design.

Stay in Touch with Oliver

You can connect with Oliver Gould and Buoyant online on Bluesky at @olix0r.net, and see what the team is up to on buoyant.io and linkerd.io.
