

Buoyant Blog

A Kubernetes engineer’s guide to migrating to Amazon EKS

Flynn

March 10, 2026

Buoyant Enterprise for Linkerd

Welcome to EKS! If you’re familiar with Kubernetes but new to the world of AWS and specifically Amazon’s managed Kubernetes service, this is the guide for you.

The good news is that EKS is an ideal choice if you want serious reliability in the enterprise. It’s also a modern implementation: though AWS was the first hyperscale cloud provider, EKS itself didn’t arrive until 2018, making it the most recent of the hyperscaler-managed Kubernetes offerings out there.

The less good news is that you can’t jump into EKS blindly. EKS leans heavily on Amazon’s earlier EC2 engineering efforts, so it can feel quite different from some other managed Kubernetes offerings. This means it has some quirks and idiosyncrasies that you will need to learn if you want to avoid unpleasant surprises later. 

In this guide we’ll walk you through the most important things we think you’ll need to know before you start to plan out your EKS-based platform.

Availability Zones, Regions, and Production-Grade Kubernetes

The first thing to understand is how AWS distributes its underlying compute hardware. On the surface, this can look similar to other cloud providers. But under the hood, the details are different.

Like most cloud providers, AWS splits its infrastructure into distinct regions, representing geographic locations of the underlying hardware, and availability zones (AZs), representing a subset of the hardware in that region. 

Unlike other cloud providers, where zones can feel like an irrelevant detail, AZs are very important in EKS (and the rest of AWS). Within a zone, traffic latency is minimal—but since power failures or network outages can (and do!) take down entire AZs, a single AZ is a single point of failure. To avoid that scenario, EKS generally requires you to create clusters across at least two AZs, ensuring that your cluster can keep running even if an entire AZ goes dark. A common setup is to have an EKS cluster span three AZs so that it can survive a simultaneous failure of two zones.

This idea that EKS clusters will almost always span AZs has some ramifications. The first of these is that you need to think about how your subnets are set up when you create a cluster. Rather than creating overlay networks within Kubernetes to abstract away the underlying physical network, in EKS you’ll define the network’s configuration yourself. As such, you’ll need to set up subnets that match your desired AZ configuration, network security groups that match the subnet configuration, etc.
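As a sketch of what that subnet-per-AZ setup can look like in Terraform (all CIDR ranges, AZ names, and tags here are illustrative placeholders, not prescriptive values):

```hcl
# Illustrative only: region, AZ names, and CIDR ranges are placeholders.
resource "aws_vpc" "eks" {
  cidr_block = "10.0.0.0/16"
}

# One subnet per AZ; EKS requires subnets in at least two AZs.
resource "aws_subnet" "eks" {
  for_each = {
    "us-east-1a" = "10.0.0.0/19"
    "us-east-1b" = "10.0.32.0/19"
    "us-east-1c" = "10.0.64.0/19"
  }

  vpc_id            = aws_vpc.eks.id
  availability_zone = each.key
  cidr_block        = each.value

  tags = {
    # Tag so the AWS Load Balancer Controller can discover these
    # subnets for public load balancers.
    "kubernetes.io/role/elb" = "1"
  }
}
```

Security groups and route tables would then be defined against these same subnets, keeping the network configuration and the AZ layout in one place.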

The bigger ramification of AZs with EKS, though, is that Amazon charges two cents per GB for cross-zone traffic—even within the same cluster. Two cents per GB (1¢ for ingress and 1¢ for egress) doesn’t sound like much, but for high-traffic clusters, it can mount up quickly. Some EKS users are surprised to find that their ultra-reliable, multi-zone EKS cluster is incurring 6 or even 7 figures of annual spend, just from zone ingress and egress charges!

Happily, there are good ways to avoid this expense. See the bottom of this guide for ways to cut cross-zone traffic charges in EKS without sacrificing reliability along the way.

ServiceAccounts and IAM

In EKS, you’ll also need to define the relationship between identity in the Amazon world (IAM roles and policies) and in the Kubernetes world (ServiceAccounts and the like). For new clusters, EKS Pod Identity is the simplest way to manage this. (Older clusters may still be using the IRSA mechanism, which is trickier to manage across multiple clusters.)
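As an illustration, with the Terraform AWS provider a Pod Identity mapping is a single resource; the cluster, namespace, ServiceAccount, and role names below are hypothetical:

```hcl
# Hypothetical names throughout; assumes the cluster is running the
# eks-pod-identity-agent add-on and that aws_iam_role.s3_reader is
# defined elsewhere in the plan.
resource "aws_eks_pod_identity_association" "s3_reader" {
  cluster_name    = "my-cluster"
  namespace       = "payments"
  service_account = "payments-api"
  role_arn        = aws_iam_role.s3_reader.arn
}
```

Any pod using the `payments-api` ServiceAccount then receives that IAM role’s credentials automatically, with no per-pod annotation juggling.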

While it’s possible to do all this from the AWS console, you’ll really want to look into HashiCorp’s Terraform, the de facto standard for infrastructure-as-code EKS deployment. A good place to start is https://github.com/BuoyantIO/aws-linkerd-better-together, where you can see an example Terraform plan that deploys an EKS cluster.
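For a sense of scale, the heart of such a plan is a single cluster resource. This is a hedged sketch rather than that repo’s actual plan; the name, version, and referenced role and subnets are placeholders:

```hcl
# Minimal sketch of an EKS cluster in Terraform. The IAM role and
# subnet resources are assumed to be defined elsewhere in the plan.
resource "aws_eks_cluster" "main" {
  name     = "my-cluster"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.33"

  vpc_config {
    # Subnets in at least two AZs, per the requirements above.
    subnet_ids = [aws_subnet.a.id, aws_subnet.b.id, aws_subnet.c.id]
  }
}
```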

Kubernetes Version Management

EKS is relatively aggressive about keeping you on the latest version of Kubernetes. Once a new version of Kubernetes has been released upstream, it is often available on EKS within a month. And once available on EKS, it receives standard support for 14 months; after that, it enters extended support (at an additional cost) for another 12 months. You’ll need to plan for this continual upgrade cadence and decide whether you’re willing to incur the cost of falling into the extended support window.

Upgrades follow the usual Kubernetes technique of rolling out new nodes, making sure all is well, and then shutting down the old ones. EKS won’t let you skip minor versions, so you can’t go from (say) 1.33 to 1.35 in one step, and once you start an upgrade, you can’t pause or stop it. In some cases, this might make it simpler to use ephemeral clusters: spin up the new Kubernetes version in an entirely new cluster and use multicluster communication to migrate workloads to it.
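Under Terraform, an in-place control-plane upgrade amounts to bumping the cluster’s `version` attribute one minor version at a time and applying once per step (the resource name and version numbers below are illustrative):

```hcl
resource "aws_eks_cluster" "main" {
  # Upgrading from 1.33: bump one minor version at a time. Apply this
  # (1.33 -> 1.34), verify the cluster, then repeat for 1.34 -> 1.35.
  version = "1.34"

  # ...name, role_arn, and vpc_config unchanged from the original plan...
}
```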

One final note is that EKS requires at least five available IP addresses from the subnets you provide during cluster creation to manage upgrades. Additionally, new nodes might not always be created in the same subnets as your old ones during a rollout. You have to be meticulous with your security group and CIDR block configurations. This is another reason to look at Terraform early: it’s definitely the safer way to manage your Amazon infrastructure.

Making EKS "Complete" with Add-ons

Rather than try to bundle everything into a one-size-fits-all cluster, EKS tends to rely on add-ons to allow tailoring the cluster to meet your needs. Some of these are bridges to other Amazon services, such as:

  • Amazon Elastic Container Registry (ECR): Managed Docker container registry to keep needed images within Amazon’s cloud
  • Amazon CloudWatch: Metrics and log collector, visualization engine, and alerting system
  • Amazon Managed Grafana: Managed Grafana service for dashboards and visualization of metrics, logs, and traces from many sources
  • Amazon Managed Service for Prometheus: Managed Prometheus-compatible service for collecting, storing, querying, and alerting on metrics
  • AWS Private Certificate Authority: Managed service to safely create and issue X.509 certificates
  • AWS Load Balancer Controller: Bridges ALBs and NLBs into Kubernetes clusters
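Cluster-level EKS add-ons themselves (things like the EBS CSI driver and the Pod Identity agent) can also be managed declaratively. A minimal Terraform sketch, assuming a cluster resource named `aws_eks_cluster.main`:

```hcl
# Sketch: managing first-party EKS add-ons via Terraform. The add-on
# names are real EKS add-on identifiers; the cluster is assumed to be
# defined elsewhere in the plan.
resource "aws_eks_addon" "ebs_csi" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "aws-ebs-csi-driver"
}

resource "aws_eks_addon" "pod_identity" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "eks-pod-identity-agent"
}
```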

Others are open-source projects that simply provide useful additional functionality for EKS clusters:

  • ExternalDNS for Route 53 synchronization
  • Karpenter for autoscaling
  • cert-manager for certificate management (which works great with the AWS Private Certificate Authority)
  • Linkerd to provide security, reliability, and observability throughout your platform, even across clusters
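These open-source components are typically installed as Helm charts, which Terraform can also drive. A sketch for cert-manager, assuming a configured `helm` provider pointed at the cluster (the chart values are illustrative):

```hcl
# Sketch: installing cert-manager with the hashicorp/helm provider.
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  set {
    # Have the chart install cert-manager's CRDs as well.
    name  = "installCRDs"
    value = "true"
  }
}
```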

If you’re already accustomed to the Amazon ecosystem, many of these will likely be familiar. If not, it’s definitely worth spending some time looking into them.

Multicluster EKS

Rather than relying on a single production cluster, the modern pattern for Kubernetes platforms is to deploy across many smaller clusters. This strategy can help with reliability, isolation, and scale. This tenet applies just as much in AWS. For example, while multi-AZ clusters can improve your tolerance for zone failure, they won’t protect against an entire region failing... and AWS regions have been known to fail! To be resilient to regional outages, you’ll need clusters deployed in multiple regions.

If you can add a way to consistently control the communication between clusters, then multicluster becomes even more powerful:

  • You can do “atomic Kubernetes upgrades” by spinning up a new cluster with the new Kubernetes version, deploying your services there, and migrating your traffic to this new cluster only once it’s ready.
  • You can use this same technique to migrate from one cloud provider to another.
  • You can apply a “cattle, not pets” strategy to cluster management.

Unfortunately, neither EKS nor Kubernetes itself provides a particularly seamless way to handle cross-cluster traffic, but there are solutions. Read on!

Building a modern EKS-based platform with Linkerd

Many EKS adopters have found that Linkerd solves many of the challenges introduced by Amazon EKS. Linkerd improves the security, reliability, and cost-efficiency of EKS by:

  • Cutting cross-AZ traffic costs and latency: Linkerd’s High Availability Zonal Load Balancing (HAZL) feature cuts cross-AZ charges by keeping traffic local to its AZ, only allowing it to cross zone boundaries when necessary to preserve overall system reliability. 
  • Adding zero-trust security: Linkerd adds mutual TLS as well as fine-grained authorization policies between services for a comprehensive approach to encryption, authentication, and authorization, all without requiring any changes to application code.
  • Multicluster communication: Linkerd makes communication between EKS clusters seamless, allowing you to dynamically shift traffic between clusters, even down to individual HTTP routes or gRPC methods, all without code changes.

The level of control that EKS gives you over networking and identity makes it handle Linkerd multicluster setups very effectively. To see how much you can cut your cross-zone costs and improve deployment reliability, start your free trial of Buoyant Enterprise for Linkerd or get it directly from the AWS Marketplace.

Frequently Asked Questions (FAQ)

Why is a single EKS cluster required to span multiple Availability Zones (AZs)?

EKS is designed to provide production-grade reliability by eliminating a single point of failure. A single AZ can go down due to a power failure or network outage. By requiring a cluster to span at least two AZs, EKS ensures that your workloads can continue running even if an entire zone goes dark.

What is EKS's Kubernetes version support commitment?

Amazon EKS provides standard support for a Kubernetes version for 14 months after its release on EKS. This is followed by an optional extended support period of 12 months (at an additional cost), resulting in a total support commitment of 26 months.

How can I manage cross-AZ traffic cost and latency in EKS?

Whenever you use resources in multiple AZs, you incur cross-AZ traffic costs and latency. We recommend using a service mesh like Linkerd, which can intelligently control this traffic for you, dramatically cutting costs and preserving performance without requiring changes to your application code.