Running a Production-Ready Service Mesh with Linkerd on Amazon EKS

Ivan Porta

April 15, 2026

Buoyant Enterprise for Linkerd

Building a production-grade service mesh requires more than just installing an open source project. It demands a robust, scalable, and secure infrastructure. This blog post provides a comprehensive overview of a proven, production-ready AWS architecture for deploying Buoyant Enterprise for Linkerd (BEL) on Amazon Elastic Kubernetes Service (EKS). We dive into how managed AWS services, including ECR, AWS Private CA, Amazon Managed Service for Prometheus, and Amazon Managed Grafana, are used to simplify operations, improve security, and deliver enterprise-grade observability. You'll learn about the architecture and the Terraform code that provisions all required services, empowering your team to run BEL with confidence.

*AWS architecture for deploying BEL on EKS*

Terraform deployment structure

To simplify deployment and ensure a consistent, repeatable setup, we have created a ready-to-go Terraform-based deployment. This code provisions all required AWS services and configurations for a production-ready environment. You can access the entire project on the Service Mesh Academy GitHub.

The Terraform code is structured into clear, single-responsibility modules, where each directory manages a specific component:

The aws/ module provisions the core AWS infrastructure: the EKS cluster, VPC, IAM roles, ECR registry, CloudWatch log groups, Amazon Managed Prometheus, Amazon Managed Grafana, and the AWS Private CA hierarchy.
The cert_manager/ module installs cert-manager and the AWS PCA issuer plugin, then creates the AWSPCAClusterIssuer and Certificate resources that drive automatic certificate issuance.
The trust_manager/ module takes the issued CA bundle and distributes it across namespaces so workloads can trust it.
The linkerd/ module is split into two parts:
- the certificates/ module that creates the Linkerd namespace and the cert-manager Certificate objects, and
- the components/ module that installs Linkerd Enterprise CRDs and the control plane via Helm.
The grafana/ module, similarly to Linkerd, also has two parts:
- the grafana/alloy/ module deploys Grafana Alloy as the in-cluster telemetry agent, shipping metrics to Amazon Managed Prometheus and logs to CloudWatch, and
- the grafana/dashboard/ module provisions four pre-built Linkerd dashboards into Amazon Managed Grafana using an API key.
The emojivoto/ module deploys a demo application to validate that the mesh is working end-to-end.

More of a video person? You can also watch our Service Mesh Academy workshop on this topic: Running Linkerd on Amazon EKS. We also link to the relevant part of the recording at the end of each section of this blog post.

Managed AWS Services: The production architecture

The production-ready architecture for running BEL on EKS uses various purpose-built, managed AWS services. These components are important for simplifying operations, improving your security posture, and getting enterprise-grade observability. These are the AWS services used in our setup:

AWS Elastic Container Registry

A common production requirement is to avoid pulling critical infrastructure images from public registries (such as GitHub Container Registry). Instead, you can use Amazon ECR to mirror images into a private registry, where you can allow only specific users or services to pull or push specific images, scan images for known security risks and report vulnerabilities, and, most importantly, eliminate dependence on external registries, giving the team greater control over their software supply chain.

The mirroring process involves pushing several core Linkerd images, like controller, proxy, and proxy-init (and in this case, the Emojivoto demo application images), directly into AWS ECR. During the Linkerd deployment, the configuration is modified to use the specific ECR URIs instead of the default paths, ensuring the cluster pulls exclusively from the verified internal mirror.

Terraform automates this step in ecr.tf. Under the hood, it retrieves an ECR login password, logs crane in to the registry, and copies the required images from GHCR into your private ECR repositories. The equivalent manual commands are:

ECR_PASSWORD=$(aws ecr get-login-password --region ap-northeast-2 --profile buoyant)

echo "$ECR_PASSWORD" | crane auth login 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com --username AWS --password-stdin

crane copy ghcr.io/buoyantio/controller:enterprise-2.19.4 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/sma/linkerd-controller:enterprise-2.19.4

crane copy ghcr.io/buoyantio/proxy:enterprise-2.19.4 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/sma/linkerd-proxy:enterprise-2.19.4

crane copy ghcr.io/buoyantio/proxy-init:enterprise-2.19.4 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/sma/linkerd-proxy-init:enterprise-2.19.4
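The three copy commands above follow one pattern, so they can be folded into a loop. The following is a minimal sketch, not the contents of ecr.tf: the function names ecr_target and mirror_images and the DRY_RUN switch are hypothetical, and the registry, prefix, and version simply repeat the example values used above. By default it only prints the crane commands it would run.

```shell
#!/bin/sh
# Sketch: mirror the Linkerd images from GHCR into a private ECR registry.
# REGISTRY, PREFIX, and VERSION repeat the example values from this post.
# DRY_RUN is a hypothetical switch added for this example.
set -eu

REGISTRY="${REGISTRY:-123456789012.dkr.ecr.ap-northeast-2.amazonaws.com}"
PREFIX="${PREFIX:-sma}"
VERSION="${VERSION:-enterprise-2.19.4}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = run crane

# Build the ECR destination reference for one upstream image name.
ecr_target() {
  printf '%s/%s/linkerd-%s:%s\n' "$REGISTRY" "$PREFIX" "$1" "$VERSION"
}

# Copy (or print the copy command for) each core Linkerd image.
mirror_images() {
  for image in controller proxy proxy-init; do
    src="ghcr.io/buoyantio/${image}:${VERSION}"
    dst="$(ecr_target "$image")"
    if [ "$DRY_RUN" = "1" ]; then
      echo "crane copy ${src} ${dst}"
    else
      crane copy "$src" "$dst"
    fi
  done
}

mirror_images
```

Running it with DRY_RUN=0 after the crane auth login step performs the actual copies. The Emojivoto images could be handled with a second loop that uses their own upstream paths.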

Watch the demo

This section of the workshop provides an overview of the key features offered by AWS ECR, including private image hosting, access control, and vulnerability scanning. A second section walks you through the ECR console, showing the Linkerd images (controller, proxy, and proxy-init) and the Emojivoto image already stored in private repositories. It then confirms via kubectl that the cluster is pulling exclusively from ECR and demonstrates ECR's built-in security scanning to surface known vulnerabilities in the Emojivoto image.

Amazon EKS: Deploying BEL

Amazon Elastic Kubernetes Service (EKS) reduces the operational burden of running Kubernetes by providing a managed Kubernetes control plane and streamlined lifecycle operations such as version upgrades. It integrates with AWS IAM to make it easy to define precise permissions for cluster and Kubernetes operations, and it provides native integration with other AWS services.

Buoyant Enterprise for Linkerd (BEL) is a natural complement for EKS, providing security, reliability, and observability above and beyond what EKS supplies. Deploying BEL on EKS is straightforward. Terraform’s helm.tf file uses the helm_release resource to install the two core BEL Helm charts:

CRDs chart (installs the BEL Custom Resource Definitions)
Control plane chart (installs the BEL control plane components)

Conceptually, the Terraform configuration is equivalent to running the following Helm commands:

helm install linkerd-enterprise-crds \
  --repo https://helm.buoyant.cloud \
  --version 2.19.4 \
  --namespace linkerd \
  linkerd-enterprise-crds

helm install linkerd-enterprise-control-plane \
  --repo https://helm.buoyant.cloud \
  --version 2.19.4 \
  --namespace linkerd \
  --set license="<your-license-key>" \
  --set identity.externalCA=true \
  --set identity.issuer.scheme="kubernetes.io/tls" \
  --set controllerImage="123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/sma/linkerd-controller" \
  --set proxy.image.name="123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/sma/linkerd-proxy" \
  --set proxyInit.image.name="123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/sma/linkerd-proxy-init" \
  --set-string identityTrustAnchorsPEM="$(cat root-ca.pem)" \
  linkerd-enterprise-control-plane

You can find more detail on installation options and production settings in the official Buoyant guide.
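After the install, one way to confirm that the control plane pulls only from the mirror is to list the images used by the pods in the linkerd namespace and flag any that do not start with the ECR registry host. The sketch below assumes the example registry used in this post. The helper name non_mirrored_images is hypothetical, and the sample input stands in for real kubectl output.

```shell
#!/bin/sh
# Sketch: flag container images that do not come from the private ECR mirror.
set -eu

REGISTRY="${REGISTRY:-123456789012.dkr.ecr.ap-northeast-2.amazonaws.com}"

# Reads one image reference per line on stdin and prints every reference
# that does not start with "<registry>/". Empty output means all mirrored.
non_mirrored_images() {
  awk -v prefix="$1/" 'NF && index($0, prefix) != 1'
}

# Against a live cluster (not run here), feed it the pod images:
#   kubectl get pods -n linkerd \
#     -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
#     | non_mirrored_images "$REGISTRY"

# Sample input: the second image still points at GHCR and is reported.
printf '%s\n' \
  "${REGISTRY}/sma/linkerd-controller:enterprise-2.19.4" \
  "ghcr.io/buoyantio/proxy:enterprise-2.19.4" \
  | non_mirrored_images "$REGISTRY"
```

The same filter can be applied to .spec.initContainers to cover the proxy-init image.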

Watch the demo

Watch this section for the key features of Amazon EKS and a walk-through of the Terraform plan used to provision the cluster and all supporting services. It ends by verifying that the Linkerd control plane is running with images pulled from ECR.

AWS Private Certificate Authority

AWS Private Certificate Authority is a key building block for production-grade service mesh security. It provides a managed, highly available certificate authority for issuing and storing private certificates, and it offers auditing capabilities that make it easier for compliance teams to track certificate lifecycles and permissions. Most importantly, AWS PCA securely manages certificate private keys, eliminating the operational risk of losing keys or exposing them to unauthorized actors.

Since Linkerd’s security model is built on mutual TLS (mTLS) between all meshed workloads, Linkerd relies heavily on X.509 certificates. In particular, two key certificates are critical for allowing Linkerd to function correctly:

Root Certificate (Trust Anchor): The top-level certificate that establishes the identity of the entire mesh.
Subordinate Certificate (Identity Issuer): A certificate issued by the root that Linkerd uses to sign individual certificates for each proxy in the mesh.

To bring these certificates into Kubernetes, you can use cert-manager with the AWS PCA Issuer plugin. cert-manager requests the root and issuer certificates from AWS PCA via AWS APIs and stores them as Kubernetes Secret objects in the format Linkerd expects. Linkerd then reads those Secrets at startup and uses them for mesh identity and mTLS. To instruct Helm not to expect these certificates (since they will be managed externally), we need to set the following values in the Linkerd Control Plane Helm chart:

identity.externalCA: true
identity.issuer.scheme: "kubernetes.io/tls"

If you want a deeper dive into Linkerd’s certificate lifecycle and the recommended cert-manager configuration, this doc is the best reference.

Watch the demo

In this section of the workshop, I explain how AWS Private Certificate Authority, cert-manager with the AWS PCA issuer plugin, and trust-manager work together to provision and distribute the root and identity issuer certificates that Linkerd needs for mTLS.

And in this section, I walk through the AWS PCA console, showing the root and identity issuer certificates, then verify in the cluster that the cert-manager resources, Kubernetes secrets, and trust-manager bundle all contain the correct certificates pulled from AWS PCA.

AWS CloudWatch

AWS CloudWatch is a service that provides a central repository for log management, monitoring, and proactive alerting. It includes an anomaly detector that can automatically notify teams via third-party integrations like PagerDuty when unexpected behavior occurs. While it shares some dashboarding capabilities with Grafana, its primary strength lies in its ability to aggregate data from diverse sources and provide actionable insights into cluster health. Like other AWS services, it integrates natively with Amazon Managed Grafana.

In this demonstration, we use CloudWatch as storage for logs natively shipped by EKS and for logs collected by the Grafana Alloy agent running in the cluster. Alloy sends the logs to the /aws/sma/emojivoto and /aws/sma/linkerd log groups for later processing.
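To look at those log groups from a terminal, the AWS CLI v2 `aws logs tail` command can be used. The sketch below only prints the commands for both log groups rather than running them. The function name tail_commands, the 15m window, and the region are assumptions for this example.

```shell
#!/bin/sh
# Sketch: print the AWS CLI commands that tail the two log groups used here.
set -eu

LOG_GROUPS="/aws/sma/linkerd /aws/sma/emojivoto"

# $1 = how far back to read (for example 15m), $2 = AWS region.
tail_commands() {
  for group in $LOG_GROUPS; do
    printf 'aws logs tail %s --since %s --region %s\n' "$group" "$1" "$2"
  done
}

tail_commands 15m ap-northeast-2
```

Adding --follow to a printed command keeps the stream open for live tailing.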

Watch the demo

Here you'll learn about the key features of AWS CloudWatch, including log aggregation, dashboards, alarms, and its anomaly detector for proactive alerting via third-party integrations. This section demonstrates live log tailing and log management in CloudWatch for the Linkerd and Emojivoto log groups, then shows the same logs in Amazon Managed Grafana via the CloudWatch data source, where they can be cross-referenced with metrics during troubleshooting.

Amazon Managed Prometheus: Scalable metrics storage

While deploying Prometheus in a cluster is relatively simple, maintaining it (especially scaling storage and ensuring high availability for large volumes of metrics) can present significant operational challenges. This is where Amazon Managed Service for Prometheus shines: AWS handles the maintenance and scaling of the Prometheus service and provides SLAs that guarantee availability. This allows teams to focus on analyzing data rather than operating the database.

Amazon Managed Service for Prometheus is designed to work as a native data source for Amazon Managed Grafana. In this setup, administrators don’t need to manually configure complex URLs; they can simply select the Prometheus workspace from a dropdown menu within the Grafana interface to begin building dashboards.

In our setup, we use Grafana Alloy to scrape metrics from various sources, including the Linkerd control plane, the service mesh proxies, and application containers, and to remote write them to the Amazon Managed Service for Prometheus workspace, just as it would to any other Prometheus instance. This allows us to collect a rich set of metrics, including what Linkerd calls the “golden metrics”: request rate, success (or error) rate, and latency. It also captures details on load balancing and endpoint performance without any application changes, since these metrics come “for free” from the Linkerd proxy.

Amazon Managed Grafana: Unified observability with native AWS data sources

Like Amazon Managed Service for Prometheus, Amazon Managed Grafana removes the operational burden of provisioning, configuring, and maintaining Grafana servers by shifting that work to AWS, and it comes with an SLA for availability. Additionally, it provides a native connection to other AWS services such as Amazon Managed Service for Prometheus and CloudWatch.

The service supports multiple authentication methods, including AWS IAM Identity Center (formerly AWS Single Sign-On) and SAML. To showcase that flexibility, this demo integrates Microsoft Entra ID (formerly Azure AD) to manage administrative access, allowing users to log in with their corporate credentials to manage dashboards and permissions.

We also enabled both Amazon Managed Service for Prometheus and CloudWatch Logs as data sources, providing a complete overview of what’s happening in the cluster.

Watch the demo

This section covers the benefits of Amazon Managed Grafana and Amazon Managed Service for Prometheus, including SLAs, remote write configuration, and the available authentication methods. A second section walks through configuring SAML authentication with Microsoft Entra ID for Amazon Managed Grafana, adding Amazon Managed Service for Prometheus and CloudWatch as data sources, and viewing Linkerd metrics dashboards, including success rate, request rate, and latency, for the Emojivoto application.

Kubernetes add-ons and enterprise components

To complete the production-ready architecture, we'll discuss the self-managed tools deployed within the EKS cluster alongside the core AWS services. These add-ons provide the in-cluster mechanisms needed for security, specifically automating the Linkerd identity system, managing certificate trust distribution, and acting as the dedicated agent for collecting all service mesh telemetry.

Cert-Manager

Cert-manager is a Kubernetes add-on that automates the full lifecycle of certificates (issuance, renewal, and storage) using Kubernetes-native APIs. It does this by introducing CRDs such as Issuer / ClusterIssuer and Certificate, which let you declare what certificates you need and who should issue them.

In this architecture, cert-manager integrates with AWS Private Certificate Authority using the AWS PCA issuer plugin. That plugin enables cert-manager to call AWS APIs to request certificates from AWS PCA and then publish the resulting signed certificate into Kubernetes Secret objects. Linkerd can then consume those Secrets to establish its mTLS identity system without generating CA material inside the cluster.
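As an illustration of how those resources fit together, the sketch below renders an AWSPCAClusterIssuer and a Certificate for the Linkerd identity issuer. It is not the manifest shipped in the Terraform project: the issuer name, the CA ARN, the region, and the duration values are placeholders, and render_identity_issuer is a hypothetical helper. The secret name linkerd-identity-issuer is the one Linkerd reads for its issuer credentials.

```shell
#!/bin/sh
# Sketch: render cert-manager resources that request the Linkerd identity
# issuer certificate from AWS Private CA. PCA_ARN and AWS_REGION are
# placeholders; the issuer name and durations are illustrative.
set -eu

PCA_ARN="${PCA_ARN:-arn:aws:acm-pca:ap-northeast-2:123456789012:certificate-authority/EXAMPLE}"
AWS_REGION="${AWS_REGION:-ap-northeast-2}"

render_identity_issuer() {
  cat <<EOF
apiVersion: awspca.cert-manager.io/v1beta1
kind: AWSPCAClusterIssuer
metadata:
  name: linkerd-pca-issuer
spec:
  arn: ${PCA_ARN}
  region: ${AWS_REGION}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  isCA: true
  commonName: identity.linkerd.cluster.local
  duration: 48h
  renewBefore: 25h
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth
  issuerRef:
    group: awspca.cert-manager.io
    kind: AWSPCAClusterIssuer
    name: linkerd-pca-issuer
EOF
}

# Print the manifests; pipe to "kubectl apply -f -" to apply them.
render_identity_issuer
```

The issuerRef group and kind are what route the request to the AWS PCA issuer plugin instead of one of cert-manager's built-in issuers.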

Trust-Manager

While cert-manager is responsible for issuing certificates and storing them as Kubernetes Secrets, trust-manager focuses on distributing bundles of trusted X.509 certificates across the cluster.

This matters for Linkerd because Linkerd expects the trust anchor to be available as a ConfigMap, whereas cert-manager outputs certificates as Secrets. Trust-manager bridges that gap: using its Bundle CRD, it can publish a bundle of trust anchors into one or more ConfigMaps across the namespaces Linkerd cares about.

Trust-manager is also especially useful during trust anchor rotation. By publishing a bundle that includes both the old and new trust anchors during the transition period, it enables a zero-downtime rotation; workloads can continue validating identities while the mesh gradually shifts from the old root of trust to the new one.
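A Bundle for this purpose might look like the following sketch, which publishes two trust anchors (current and previous) into a ConfigMap in the Linkerd control plane namespace. The source Secret names and keys are assumptions for illustration and must match whatever Secrets hold the anchors in trust-manager's trust namespace. render_trust_bundle is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: render a trust-manager Bundle that publishes the old and new trust
# anchors together. The Secret names and keys under "sources" are assumptions.
set -eu

render_trust_bundle() {
  cat <<'EOF'
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: linkerd-identity-trust-roots
spec:
  sources:
    - secret:
        name: linkerd-trust-anchor
        key: tls.crt
    - secret:
        name: linkerd-previous-anchor
        key: tls.crt
  target:
    configMap:
      key: ca-bundle.crt
    namespaceSelector:
      matchLabels:
        linkerd.io/is-control-plane: "true"
EOF
}

# Print the manifest; pipe to "kubectl apply -f -" to apply it.
render_trust_bundle
```

trust-manager names the target ConfigMap after the Bundle, so the resulting ConfigMap is linkerd-identity-trust-roots. Once every workload has moved to the new root, the previous anchor can be dropped from the sources list.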

Check out this page for more detail on how Linkerd works with cert-manager and trust-manager (including certificate rotation).

Grafana Alloy

Grafana Alloy is the agent responsible for scraping metrics and logs from the Linkerd control plane, Linkerd proxies, and application containers. It uses remote write to send metrics to Amazon Managed Service for Prometheus and forwards logs to AWS CloudWatch. Depending on your needs, the agent can be configured to drop or filter specific metrics, allowing teams to optimize the amount of data sent to managed services and reduce costs and noise.

Linkerd proxies and the control plane emit a rich set of metrics that can provide a clear overview of mesh health, endpoints, latency, and more. By default, each proxy exposes its own metrics on port 4191 at the /metrics endpoint. Learn more about scraping Linkerd metrics.
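To get a feel for what that endpoint returns, the sketch below computes a success rate from the proxy's response_total counters, using the classification label to separate successes from failures. The helper name success_rate is hypothetical, and the two sample lines are simplified stand-ins for real proxy output, which carries many more labels.

```shell
#!/bin/sh
# Sketch: compute a success rate from Linkerd proxy metrics read on stdin.
set -eu

# Sums response_total samples and reports the share with
# classification="success" as a percentage.
success_rate() {
  awk '
    index($0, "response_total{") == 1 {
      n = $NF
      if ($0 ~ /classification="success"/) ok += n
      total += n
    }
    END {
      if (total == 0) { print "no responses recorded"; exit }
      printf "%.2f%%\n", 100 * ok / total
    }'
}

# Against a live pod (not run here; the deployment name is an example):
#   kubectl -n emojivoto port-forward deploy/web 4191:4191 &
#   curl -s http://localhost:4191/metrics | success_rate

# Simplified sample input: 95 successes and 5 failures.
printf '%s\n' \
  'response_total{direction="inbound",status_code="200",classification="success"} 95' \
  'response_total{direction="inbound",status_code="500",classification="failure"} 5' \
  | success_rate
```

For the sample input this prints 95.00%. In the architecture described here, Alloy scrapes the same endpoint and the equivalent calculation happens in PromQL inside Grafana.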

Buoyant Enterprise for Linkerd

Buoyant Enterprise for Linkerd is the enterprise-ready distribution of the open-source Linkerd project. It’s designed to meet the rigorous security and reliability requirements of modern production environments and, beyond Linkerd’s standard capabilities, includes advanced features such as FIPS compliance, HAZL, Buoyant Cloud, dedicated support, and more.

In this demo, we installed the latest version available at the time of writing (2.19.4) and then meshed the emojivoto application, showcasing not only logs but also metric visualizations that help SREs understand the flow and health of applications running in the cluster. This is just one of the many benefits your applications can get from using BEL. To learn more about BEL features, check out our docs. You can also try it out for free. Get started today!

Achieving production-grade Linkerd on AWS

In this blog post, we dove into how to architect a comprehensive, production-ready deployment of Buoyant Enterprise for Linkerd (BEL) on Amazon EKS. By taking advantage of purpose-built, managed AWS services (including AWS Private CA for mesh mTLS security, ECR for secure image distribution, and the centralized observability stack of Amazon Managed Prometheus, Amazon Managed Grafana, and CloudWatch), we offload a significant share of the operational burden to AWS. When combined with Kubernetes add-ons like cert-manager, trust-manager, and Grafana Alloy, this deployment provides a scalable, secure, and fully observable service mesh foundation for any enterprise application.

FAQ

Why should I use AWS Elastic Container Registry when running a production-ready service mesh on Amazon EKS?

AWS ECR mirrors critical service mesh and application images into a private registry, which eliminates dependence on public sources, provides security scanning, and ensures control over the software supply chain.

How does AWS Private Certificate Authority secure a production-ready service mesh running on Amazon EKS?

AWS PCA provides a managed, highly available certificate authority for issuing and storing the private certificates that Linkerd uses to provide mTLS between all meshed workloads. The CA private keys never leave AWS PCA and are fully managed by AWS, eliminating the operational burden and security risk of handling that sensitive cryptographic material yourself.

What managed AWS services are recommended for achieving enterprise-grade observability for a service mesh deployed on Amazon EKS?

The recommended services are Amazon Managed Prometheus (scalable metrics storage), AWS CloudWatch (centralized log management and alerting), and Amazon Managed Grafana (unified visualization). They eliminate provisioning, scaling, and patching overhead, letting teams focus on analysis and incident response.

What is the simplest way to deploy a production-ready service mesh on Amazon EKS using all the required AWS managed services?

While it varies based on preferences, IaC options like AWS CloudFormation, Terraform, or OpenTofu allow teams to define all the required services in code that can be reviewed, tested, and reapplied consistently across environments, significantly reducing the risk of manual misconfiguration.
