Comcast's Platform Engineering: Guardrails and Scale in the Age of AI


In this episode of the AI Kubernetes show, we talked with Curtis Cook, a platform DevOps engineer at Comcast and Deep Roots member (part of the CNCF Merge Forward community group), about how the increasing velocity of code development due to AI tooling is changing the world of platform engineering.

Responding to increased code velocity

This blog post was generated by AI from the interview transcript, with some editing.

The accelerating pace of code development driven by AI means platform engineers have to completely rethink how they manage code quality and security.

The massive amount of AI-generated code presents a significant scaling problem. Think of AI-generated code as bringing on thousands of junior developers. This definitely makes things faster, but it demands serious management to keep things from getting out of hand. Someone still needs to step in and make sure all that code is secure, compliant, and up to standard.

Comcast's shift to platform-level controls

At Comcast, the focus is on platform-level controls instead of policing individual commits. This approach comes straight from the lessons learned with Kubernetes: you simply can't scale by throwing more people at the problem. The only way to scale is by automating the mundane tasks and enforcing standards right at the platform level. Ultimately, the goal is to build a system that supports both development velocity and reliability, with the clear understanding that chasing speed for the sake of speed isn't the point.
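One concrete way to think about "enforcing standards at the platform level" is an automated pre-merge check that scans workload manifests for policy violations, so no human has to police individual commits. The sketch below is purely illustrative (the specific checks and names are hypothetical, not Comcast's actual tooling):

```python
# Illustrative sketch: a platform-level guardrail that scans Kubernetes
# manifests for common policy violations before code reaches a cluster.
# The specific checks here are hypothetical examples.

def check_manifest(manifest: dict) -> list[str]:
    """Return a list of guardrail violations for one workload manifest."""
    violations = []
    containers = (
        manifest.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("containers", [])
    )
    for c in containers:
        name = c.get("name", "<unnamed>")
        sec = c.get("securityContext", {})
        if sec.get("privileged"):
            violations.append(f"{name}: privileged containers are not allowed")
        if c.get("image", "").endswith(":latest"):
            violations.append(f"{name}: pin images to a version, not :latest")
        if "resources" not in c:
            violations.append(f"{name}: CPU/memory limits are required")
    return violations

deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "example/web:latest",
         "securityContext": {"privileged": True}},
    ]}}},
}

for v in check_manifest(deployment):
    print(v)
```

A check like this runs identically whether the manifest came from a senior engineer or an AI assistant, which is exactly why it scales where manual review does not.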

Security in the age of AI

The introduction of generative AI and other AI tools brings a new security threat surface and increases the "blast radius." This is reminiscent of past infrastructure changes, particularly lessons learned from Kubernetes. For example, a single misconfigured RBAC policy in Kubernetes could expose an entire cluster.
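To make the RBAC example concrete, the kind of misconfiguration that widens the blast radius is usually a rule that is broader than any workload actually needs. A minimal, illustrative audit (not a full RBAC analyzer) might flag rules like this:

```python
# Sketch: flag overly broad RBAC rules of the kind that widen the
# blast radius of a compromise. Illustrative checks only.

def audit_rbac_rules(rules: list[dict]) -> list[str]:
    findings = []
    for r in rules:
        if "*" in r.get("verbs", []) and "*" in r.get("resources", []):
            findings.append("rule grants all verbs on all resources")
        if "secrets" in r.get("resources", []) and "list" in r.get("verbs", []):
            findings.append("rule can enumerate Secrets")
    return findings

# A ClusterRole like this exposes the entire cluster if bound too widely:
risky = [{"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]}]
print(audit_rbac_rules(risky))
```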

The difference with AI is that the concern moves beyond just infrastructure, as AI is actively making decisions. This makes securing AI workloads absolutely crucial. Comcast approaches securing these systems by treating AI like any other critical system, adopting a "zero trust by default" mindset. They enforce the use of only approved tools and ensure "guardrails" are "baked in" right from the start of the software development lifecycle. Additionally, they enforce strict isolation and make sure that privileges are always set to the "least needed."
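"Zero trust by default" plus "only approved tools" translates naturally into a default-deny allowlist: an agent can invoke nothing unless it was explicitly approved. The tool names below are hypothetical; this is a sketch of the pattern, not any particular product's API:

```python
# Sketch of "zero trust by default" for AI tooling: a default-deny
# allowlist, so an agent may only invoke explicitly approved tools.
# Tool names are hypothetical.

APPROVED_TOOLS = {"read_runbook", "query_metrics"}

def invoke(tool: str, approved: frozenset = frozenset(APPROVED_TOOLS)) -> str:
    if tool not in approved:
        # Anything not on the list is denied -- no implicit trust.
        raise PermissionError(f"tool {tool!r} is not on the approved list")
    return f"invoked {tool}"

print(invoke("query_metrics"))
# invoke("delete_namespace") raises PermissionError: the default is deny.
```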

Mitigating hallucinations and context bloat

Applying a least-privilege mindset to AI tools can also mitigate risks like hallucinations. By narrowly defining the set of tools an AI agent has access to, engineers can combat "context bloat."

Narrowing the scope is a good move because it significantly improves your chances of avoiding issues like hallucinations. When the model has less variety to pull from, the results naturally become more reliable.
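One way to picture the scoping idea: instead of handing the model the entire tool catalog on every request, select only the tool descriptions relevant to the task at hand, so the context stays small and focused. The catalog and tag-matching below are a hypothetical sketch:

```python
# Sketch: combat "context bloat" by giving the model only the tool
# descriptions relevant to the current task, not the full catalog.
# The catalog and tags here are hypothetical.

CATALOG = {
    "query_metrics": {"tags": {"observability"}, "desc": "Query metrics."},
    "read_runbook":  {"tags": {"incident"},      "desc": "Fetch a runbook."},
    "scale_deploy":  {"tags": {"capacity"},      "desc": "Scale a Deployment."},
    "rotate_certs":  {"tags": {"security"},      "desc": "Rotate TLS certs."},
}

def tools_for(task_tags: set[str]) -> dict[str, str]:
    """Return only the tool descriptions the task actually needs."""
    return {name: t["desc"] for name, t in CATALOG.items()
            if t["tags"] & task_tags}

scoped = tools_for({"incident", "observability"})
print(sorted(scoped))
```

With two of four tools in scope, the prompt carries half the tool context, and the model has less irrelevant material to pull a wrong answer from.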

The non-deterministic world of AI

The shift from deterministic systems (like functions, where input equals predictable output) to non-deterministic, goal-based systems powered by AI may be one of the most profound shifts since migrating from bare metal to the cloud.

Non-deterministic outputs introduce a new set of challenges. On the security side, threats like prompt injection, model drift, and data poisoning are all real concerns. For testing and validation, the traditional "if this, then that" logic of unit tests is insufficient: the correctness of a non-deterministic output is hard to gauge, which calls for newer methods like statistical validation and confidence scoring to mitigate hallucinations.
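A minimal sketch of what statistical validation with a confidence score can look like: sample the same prompt several times and take the majority answer, with its vote share as the confidence. `ask_model` below is a stand-in for a real LLM call, not any actual API:

```python
# Sketch: statistical validation for a non-deterministic system.
# Sample the same prompt N times, majority-vote the answer, and
# report a confidence score. `ask_model` is a hypothetical stand-in.
import random
from collections import Counter

random.seed(7)  # repeatable demo only

def ask_model(prompt: str) -> str:
    # Stand-in: a noisy "model" that is usually, but not always, right.
    return random.choice(["443", "443", "443", "8080"])

def validated_answer(prompt: str, samples: int = 25) -> tuple[str, float]:
    votes = Counter(ask_model(prompt) for _ in range(samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / samples  # confidence = vote share

answer, confidence = validated_answer("Default HTTPS port?")
print(answer, confidence)
# A real pipeline would accept the answer only above a threshold
# (say, confidence >= 0.6) and escalate to a human otherwise.
```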

Core benefits of AI tooling 

Despite the challenges, this AI approach offers significant upsides. The biggest benefit is speed, which means developers get solutions faster, dramatically increasing productivity. Also, much like Kubernetes, non-deterministic systems can be set up for resiliency. By having multiple paths to a solution, a system can recover more gracefully.
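"Multiple paths to a solution" can be sketched as a fallback chain: if the primary (AI-assisted) path fails, the system tries cheaper deterministic paths instead of failing outright. All three strategies below are hypothetical placeholders:

```python
# Sketch: resiliency through multiple paths to a solution. If the
# primary path fails, fall back to deterministic alternatives.
# All strategies are hypothetical placeholders.

def ai_suggestion(task: str) -> str:
    raise TimeoutError("model unavailable")  # simulate an outage

def cached_answer(task: str) -> str:
    cache = {"restart policy": "Always"}
    return cache[task]  # raises KeyError on a cache miss

def safe_default(task: str) -> str:
    return "escalate to on-call"

def resolve(task, strategies=(ai_suggestion, cached_answer, safe_default)):
    for strategy in strategies:
        try:
            return strategy(task)
        except Exception:
            continue  # this path failed; try the next one
    raise RuntimeError("all paths failed")

print(resolve("restart policy"))  # AI path is down, so the cache answers
print(resolve("unknown task"))    # cache misses too, so the safe default
```

This mirrors how Kubernetes itself recovers: when one path to the desired state fails, the system keeps trying others rather than giving up.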

Cultural and technical shifts

The new mindset centers on establishing "guardrails, not gates." Think of it this way: Kubernetes gave us a declarative infrastructure, and now AI is offering a declarative intent. The future of software isn't about controlling every step. It’s about clearly defining the destination and building platforms that can safely get us there.

Community engagement and the path to platform engineering

Getting involved with the Kubernetes and CNCF community is an experience that underscores the power of open source and a truly welcoming environment. Initial engagement often hinges on the community's readiness to assist, especially through platforms like Slack and Stack Overflow. The toughest initial climb when transitioning into a Kubernetes engineering role is mastering the fundamentals: a working knowledge of Linux and solid networking basics. To make the community even more inviting, it is important to be open to people from different backgrounds, a point exemplified by the variety of CNCF communities that exist, such as Deep Roots (a BIPOC group) and others serving people who are deaf or hard of hearing, LGBTQ, or neurodivergent.

If you haven't found a community yet, AI can still help you get started; newcomers can use it to build custom tooling. One good example is a context-aware LLM chatbot trained on CNCF site data that acts as a "journey selector." An engineer could ask, "I created this software and I want to run it in a certain way; what CNCF tools would you recommend?" and get a step-by-step path.
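The chatbot described above would retrieve CNCF landscape data and let an LLM map a question to a recommended path. As a toy stand-in for that idea, the sketch below skips the LLM entirely and does simple keyword matching over a tiny, hand-written slice of the landscape (illustrative only; the real landscape has hundreds of projects):

```python
# Toy stand-in for the "journey selector" idea: keyword-match a
# question against a tiny, hand-written slice of the CNCF landscape.
# A real version would use retrieval over landscape data plus an LLM.

LANDSCAPE = [
    ({"containers", "orchestration"},         ["Kubernetes"]),
    ({"mesh", "mtls", "traffic"},             ["Linkerd"]),
    ({"metrics", "alerting", "monitoring"},   ["Prometheus"]),
    ({"deploy", "gitops"},                    ["Argo CD", "Helm"]),
]

def recommend(question: str) -> list[str]:
    words = set(question.lower().split())
    hits = [tool for keywords, tools in LANDSCAPE
            if keywords & words for tool in tools]
    return hits or ["no match; ask in CNCF Slack"]

print(recommend("I built a service and want metrics and alerting"))
```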
