By Tim Macfarlane on September 06, 2019 • 19 min read
This is a pretty technical post! My goal here is to document some of the lesser known aspects of setting up EKS. It’s by no means an introduction to Kubernetes or AWS concepts so I’m presuming some prior experience and knowledge.
We recently built a system that deployed a brand new, isolated environment for each and every pull request submitted to one of our larger project’s repositories. Heroku calls these Review Apps. I talk about the advantages of deploying review apps here. The TL;DR is that before we had review apps we were releasing every week or two; after we had review apps we were releasing every day or two. It’s all about where you do your manual exploratory testing. Do you do it on the pull request? Or once your code is in master? It can make a big difference to how often you can release.
Since we were already running our production and test environments in AWS, and since Kubernetes seemed to be a good environment to deploy and run an arbitrary number of different applications, EKS made a lot of sense. EKS is AWS’s managed Kubernetes cluster product.
Like many applications, ours is made up of several microservices and third-party dependencies such as databases. This is primarily a manual test environment, so we throw in a few extra diagnostic tools, such as redis-commander to look inside our Redis DBs and maildev to see what emails are being sent by the system and to whom. Because we have several services, it’s not possible to use Heroku’s Review Apps, which really only work for single-service applications, and, more to the point, with our production system deployed on AWS, it was a good idea to stay on AWS for our test environments.
Each review app has a domain name that includes the PR number and the name of the service. We use a convention such as $SERVICE-pr-$PR_NUMBER.example.com, so we’d have domain names such as web-pr-1234.example.com and api-pr-1234.example.com for the “web” and “api” services respectively, on the pull request with number 1234.
We have a wildcard DNS entry, *.example.com, pointing at a single load balancer (ELB), which also has a wildcard certificate for the same domain, *.example.com. This means all traffic going to those different domains goes to the same load balancer. Inside the cluster, we use what is known as an ingress controller to route the requests to the corresponding service running on the cluster. An ingress controller is usually a load balancer like Nginx that reads the configuration about which host maps to which app from ingress resources you place in the cluster.
Here’s a more or less full breakdown of what happens when you visit web-pr-1234.example.com:
1. The browser resolves web-pr-1234.example.com using the wildcard DNS entry (*.example.com), which points at the ELB.
2. The ELB presents its wildcard certificate (*.example.com), which matches the domain entered in the address bar (web-pr-1234.example.com).
3. The ELB forwards the request into the cluster, where the ingress controller reads the Host header and routes it to the “web” service for PR 1234.

It would have been nicer to have a domain like web.pr-1234.example.com, but unfortunately you can only have wildcard domains and certificates at one level, such as *.example.com, not *.*.example.com. It’s possible of course that we could have automated the creation of new certificates and DNS entries for each of our pull requests, but this seemed a bit excessive!
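For reference, the wildcard DNS record itself is just a record pointing at the ELB. Here’s a rough sketch with the AWS CLI; the hosted zone ID and the ELB DNS name below are placeholders to substitute with your own values:
aws route53 change-resource-record-sets \
  --hosted-zone-id ZXXXXXXXXXXXXX \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "*.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "your-elb-name.us-west-2.elb.amazonaws.com"}]
      }
    }]
  }'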
With this basic idea in mind we got to work putting it all together. This article shows some of the working out we did while making the routing and load balancing work, as well as a problem we bumped into as more and more applications were deployed onto the system: the famous max pod limit problem. We’ll show the commands we used so you can follow along yourself.
There are three-ish different ways EKS can interact with load balancers on AWS. They differ largely in what or who creates and configures the load balancers: Kubernetes or you (i.e. with CloudFormation). I’ll go through each option briefly here.
(Dan Maas has written up the first two of these already in his article on Setting up Amazon EKS)
A LoadBalancer service
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  type: LoadBalancer
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 80
Simply creating a Kubernetes service of type LoadBalancer will create a new ELB in the same VPC. The advantage is that it’s a pretty simple way of getting a public address for an app inside the Kubernetes cluster. The disadvantage is that you don’t get much control over the routing: you can’t, for example, have one ELB routing to multiple backends based on hostname or path.
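Once the service is up, Kubernetes records the ELB’s DNS name in the service’s status, so you can fetch it with something like this (using the app-service name from the manifest above):
kubectl get service app-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'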
Another disadvantage here is that while EKS will create the ELB automatically for you, you will likely want to set up a DNS entry for it, HTTPS certificates and perhaps even CloudFront, at which point you’ll be back to CloudFormation or manual processes.
You’ll also need to make sure your subnets have the correct tags for this to work, see: https://docs.aws.amazon.com/eks/latest/userguide/load-balancing.html.
The ALB Ingress Controller
As people are increasingly using the ingress features of Kubernetes over the LoadBalancer service, a native ingress controller for AWS seems to make a lot of sense. Documentation for this can be found here: https://github.com/kubernetes-sigs/aws-alb-ingress-controller.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: app
  namespace: app
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=test
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: app-service
              servicePort: 80
The idea here is that a single ingress resource will create an AWS ALB (application load balancer) to give you a public entrypoint, allowing you to do host-based or path-based routing. The slight issue is that if there are two ingress resources, two ALBs will be created! This is slightly different to the way other ingress controllers work, in that there is often just one entrypoint, configured by multiple ingress resources. In short, if you want to manage just one load balancer (with corresponding DNS, certs, etc) for all incoming traffic, your applications will be forced to merge their configuration into a single ingress resource, which is far from ideal.
Again, like the LoadBalancer service above, the ALB ingress controller will create the ALB automatically for you. It can also set up DNS in Route 53 for you; see here. Setup of other more advanced things like certificates and CloudFront seems to be out of scope, so you’ll likely be doing this manually or with CloudFormation templates.
Like the LoadBalancer service above, make sure your subnets have the correct tags for this to work, see: https://docs.aws.amazon.com/eks/latest/userguide/load-balancing.html.
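For example, tagging a public subnet for the ReviewApps cluster might look roughly like this (the subnet ID is a placeholder; the exact tags required are described in the AWS docs linked above):
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/ReviewApps,Value=shared \
         Key=kubernetes.io/role/elb,Value=1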
Regular ELB + NodePort service + Ingress
The third option is a much more traditional AWS setup. We have an ELB that points at a number of nodes in the Kubernetes cluster on a particular port (we’re going to use 30080, in the default allowed NodePort range of 30000-32767). We then deploy a regular ingress-nginx setup onto the cluster, but with a service of type NodePort. All traffic will be directed to the ingress-nginx pods, which are configured by the one or more ingress resources required by your deployments.
The advantage of this approach is that you have a single ELB that you can set up with CloudFormation alongside other elements such as HTTPS certificates, DNS, CloudFront, etc. You could even have two or more ELBs pointing to the same cluster, depending on DNS and certificate requirements.
We’re going to show how you can setup your cluster routing with this last approach.
As hinted at above, there are indeed plenty of ways you can create an EKS cluster and its surrounding infrastructure. We’re going to use eksctl to demonstrate in this article, but for production we opted for a single CloudFormation template that set up all the infrastructure (ELBs, HTTPS certificates, CloudFront, DNS, etc) and meant that we could execute the template on each master deployment to keep the infrastructure in line with its configuration.
eksctl is a good tool, but it’s quite imperative, whereas for GitOps you really want something that’s declarative. Imperative means you issue individual commands to change the cluster; declarative means you simply change the configuration, apply it, and the system determines what needs to be changed or updated. CloudFormation and indeed Kubernetes are both great examples of declarative systems. eksctl has some GitOps support in the works though, something to look out for.
As we’re using eksctl, we’re going to create our cluster in a few steps:
1. Create the cluster itself (without a node group) using eksctl.
2. Create the ELB, target group and security groups with a CloudFormation template.
3. Create the node group with eksctl, attaching it to the ELB’s target group and the node security group.

(Using CloudFormation you could put all of this into a single CloudFormation template, but that’s probably for another time.)
Anyway, enough chat. Let’s create a cluster:
eksctl create cluster \
--name ReviewApps \
--region us-west-2 \
--without-nodegroup
Behind the scenes eksctl will create a CloudFormation stack (called eksctl-ReviewApps-cluster) with the elements required to run the EKS cluster: the VPC, private and public subnets, internet gateways, etc. We don’t create a node group yet because, as we’ll see, we need to connect the EC2 instances in the node group to the ELB infrastructure.
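If you’re curious about what was created, you can inspect that stack’s outputs directly, for example:
aws cloudformation describe-stacks \
  --stack-name eksctl-ReviewApps-cluster \
  --query 'Stacks[0].Outputs' \
  --output table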
Next we’ll create our ELB. Here’s the CloudFormation template:
AWSTemplateFormatVersion: '2010-09-09'
Description: Review Apps Infrastructure
Parameters:
  Vpc:
    Type: String
    Description: The VPC to place this ELB into.
  PublicSubnets:
    Type: CommaDelimitedList
    Description: A comma-delimited list of public subnets in which to run the ELBs
Resources:
  WorkerNodeTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckPath: /healthz
      HealthCheckPort: "30080"
      Protocol: HTTP
      Port: 30080
      Name: !Sub "WorkerTargetGroup"
      VpcId: !Ref Vpc
  LoadBalancerListenerHttp:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn:
            Ref: WorkerNodeTargetGroup
      LoadBalancerArn:
        Ref: LoadBalancer
      Port: 80
      Protocol: HTTP
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      SecurityGroups:
        - !Ref ElbSecurityGroup
      Subnets: !Ref PublicSubnets
  ElbSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Incoming Access To HTTP
      VpcId: !Ref Vpc
      Tags:
        - Key: Name
          Value:
            Fn::Join:
              - ''
              - - Ref: AWS::StackName
                - "ElbSecurityGroup"
  ElbSecurityGroupEgress:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId:
        Ref: ElbSecurityGroup
      IpProtocol: "-1"
      FromPort: 0
      ToPort: 65535
      CidrIp: 0.0.0.0/0
  ElbSecurityGroupIngressHttp:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId:
        Ref: ElbSecurityGroup
      IpProtocol: tcp
      FromPort: 80
      ToPort: 80
      CidrIp: 0.0.0.0/0
  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Incoming HTTP for Kubelet
      VpcId: !Ref Vpc
      Tags:
        - Key: Name
          Value:
            Fn::Join:
              - ''
              - - Ref: AWS::StackName
                - "NodeSecurityGroup"
  NodeSecurityGroupEgress:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId:
        Ref: NodeSecurityGroup
      IpProtocol: "-1"
      FromPort: 0
      ToPort: 65535
      CidrIp: 0.0.0.0/0
  NodeSecurityGroupIngressHttp:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId:
        Ref: NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 30080
      ToPort: 30080
      SourceSecurityGroupId: !Ref ElbSecurityGroup
Outputs:
  NodeSecurityGroup:
    Description: Node Security Group
    Value: !Ref NodeSecurityGroup
  ElbDomainName:
    Description: Load Balancer Domain Name
    Value: !GetAtt LoadBalancer.DNSName
  WorkerNodeTargetGroup:
    Description: Worker Node Target Group
    Value: !Ref WorkerNodeTargetGroup
We’re going to use a tool called cfn, which you can download using Node/NPM: npm i -g cfn.
We need to get some outputs from the eksctl-ReviewApps-cluster stack, namely SubnetsPublic and VPC.
cfn deploy ReviewAppsLoadBalancer elb.yaml \
--Vpc $(cfn output eksctl-ReviewApps-cluster VPC) \
--PublicSubnets $(cfn output eksctl-ReviewApps-cluster SubnetsPublic)
You can get the outputs of the stack with this:
eks $ cfn outputs ReviewAppsLoadBalancer | json
{
"ElbDomainName": "Revie-LoadB-1FBHYHW4171RT-141325971.us-west-2.elb.amazonaws.com",
"NodeSecurityGroup": "sg-0e131391e2ef1f980",
"WorkerNodeTargetGroup": "arn:aws:elasticloadbalancing:us-west-2:596351669136:targetgroup/WorkerTargetGroup/34920be01bd9d6e7"
}
From these we get the target group ARN and the node security group ID and enter them into this file:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ReviewApps
  region: us-west-2
nodeGroups:
  - name: ng-1
    instanceType: t3.medium
    desiredCapacity: 3
    securityGroups:
      withShared: true
      withLocal: true
      attachIDs: ['...'] # node security group here
    targetGroupARNs: ['...'] # target group ARN here
    maxPodsPerNode: 110
That last parameter, maxPodsPerNode, we’ll describe in more detail below.
And run
eksctl create nodegroup -f cluster.yaml
You should see some active nodes in the cluster now:
eks $ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-14-252.us-west-2.compute.internal Ready <none> 47s v1.13.8-eks-cd3eb0
ip-192-168-57-164.us-west-2.compute.internal Ready <none> 49s v1.13.8-eks-cd3eb0
ip-192-168-94-132.us-west-2.compute.internal Ready <none> 51s v1.13.8-eks-cd3eb0
Next we’ll install Ingress-Nginx. We take their “mandatory” resource set from this page: https://kubernetes.github.io/ingress-nginx/deploy/
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/mandatory.yaml
Then we’ll make a NodePort service for it so it’s listening on port 30080 on each of our nodes in the cluster:
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
spec:
  type: NodePort
  ports:
    - name: http
      port: 80
      targetPort: 80
      nodePort: 30080
      protocol: TCP
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
kubectl apply -f ingress-nginx-node-port.yaml
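It’s worth confirming that the service really is exposed on the NodePort we asked for:
kubectl -n ingress-nginx get svc ingress-nginx
The PORT(S) column should show something like 80:30080/TCP.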
The ELB can take a little while to mark the instances as healthy again (by default 5 successful checks at 30s each = something like 3 minutes.)
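While you’re waiting, you can watch the target group directly, using the ARN from the stack outputs (here via the cfn tool from earlier); each node should eventually report a state of healthy:
aws elbv2 describe-target-health \
  --target-group-arn $(cfn output ReviewAppsLoadBalancer WorkerNodeTargetGroup)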
Once they’ve started up, you should be able to check that it’s working. (We get the domain name of the ELB from the CloudFormation outputs above.)
eks $ curl Revie-LoadB-1FBHYHW4171RT-141325971.us-west-2.elb.amazonaws.com/healthz -v
* Trying 52.43.72.37...
* TCP_NODELAY set
* Connected to Revie-LoadB-1FBHYHW4171RT-141325971.us-west-2.elb.amazonaws.com (52.43.72.37) port 80 (#0)
> GET /healthz HTTP/1.1
> Host: Revie-LoadB-1FBHYHW4171RT-141325971.us-west-2.elb.amazonaws.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 09 Sep 2019 13:08:45 GMT
< Content-Type: text/html
< Content-Length: 0
< Connection: keep-alive
< Server: openresty/1.15.8.1
If that’s not working, here are some things to check when getting an ELB to talk to nodes in a Kubernetes cluster:
- The node security group must allow traffic from the ELB’s security group on port 30080 (the NodeSecurityGroupIngressHttp rule above).
- The nodes must be registered with the ELB’s target group (the targetGroupARNs setting on the node group).
- The target group’s health check must be GET /healthz on port 30080, which ingress-nginx serves.
- The ingress-nginx NodePort service must be listening on 30080.
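A quick way to narrow things down is to hit the NodePort directly from somewhere inside the VPC (the node IP below is a placeholder); a 200 means ingress-nginx is reachable and the problem is on the ELB side:
curl -s -o /dev/null -w '%{http_code}\n' http://NODE_PRIVATE_IP:30080/healthz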
Now we can start demonstrating some apps running behind our ingress load balancer. Put this into backend.yaml:
:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: refractalize/proxy-backend
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: backend
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: backend
              servicePort: 80
Then:
kubectl apply -f backend.yaml
You should now be seeing this:
eks $ curl Revie-LoadB-1FBHYHW4171RT-141325971.us-west-2.elb.amazonaws.com -H 'Host: app.example.com'
{
"host": "backend-5f99c56b5d-cjx26",
"url": "/",
"method": "GET",
"headers": {
"host": "app.example.com",
"x-request-id": "2d5143f6b46c0334447ac6e26d9c8f42",
"x-real-ip": "10.46.0.0",
"x-forwarded-for": "10.46.0.0",
"x-forwarded-host": "app.example.com",
"x-forwarded-port": "80",
"x-forwarded-proto": "http",
"x-original-uri": "/",
"x-scheme": "http",
"x-original-forwarded-for": "80.245.25.53",
"x-amzn-trace-id": "Root=1-5d76509a-19d0f1249e754dbf9d05751c",
"user-agent": "curl/7.54.0",
"accept": "*/*"
}
}
Notice how we’re passing -H "Host: app.example.com" to curl? The routing we set up for the Ingress looks at the Host header on the request and directs it to the correct Kubernetes service and pods. Remember, when you type “app.example.com” into your browser, the browser does two or three different things with that domain name:
1. It resolves the domain name with DNS to find an IP address to connect to (in our case, the ELB’s).
2. It sends the HTTP request with a Host: app.example.com header, which is what the ingress controller routes on.
3. If you’re using HTTPS, it also checks that the certificate presented by the server matches the domain.

Ingress controllers can route based on the path and protocol too.
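You can see the Host-based routing at work by sending a Host header that doesn’t match any Ingress rule; ingress-nginx should respond with a 404 from its default backend:
curl Revie-LoadB-1FBHYHW4171RT-141325971.us-west-2.elb.amazonaws.com -H 'Host: unknown.example.com'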
You will likely find that you can only run a limited number of pods on each node in the cluster… This, believe it or not, is a feature.
Let’s scale our backend to some high number, like 70. Across 3 medium nodes this should be quite reasonable.
kubectl scale --replicas 70 deploy/backend
You’ll find that some of our pods aren’t starting; they’re stuck in one of two states, Pending or ContainerCreating. You can see the error in more detail by running kubectl describe pod/podname:

- Pods stuck in Pending will have an error message like: 0/3 nodes are available: 3 Insufficient pods, 3 node(s) were unschedulable (this implies that we’re hitting the kubelet’s max pod limit).
- Pods stuck in ContainerCreating will have an error message like: NetworkPlugin cni failed to set up pod "podname" network: add cmd: failed to assign an IP address to container (this implies that we’re running out of available IP addresses on the node, or in the subnet).

This is where the max pods setting comes in. Each kubelet (each node, essentially) has what is known as a pod limit. By default this is 110 pods. In EKS it’s much lower because of the default AWS CNI installed on each node. A CNI (or network plugin) tells the kubelet how pods should interact with the network interfaces on the node, and handles things like allocating new IP addresses for your pods.
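You can see the pod capacity each kubelet is advertising with something like this (handy to run before and after the fix below):
kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.pods}{"\n"}{end}'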
The difference with the AWS CNI plugin is that it allocates real IP addresses on the subnet. This makes your pods accessible from outside the cluster, from anywhere inside your VPC. That might sound useful, but since there are no corresponding DNS entries, and since your pods are likely to be recycled with the usual ebb and flow of deployments, it’s unlikely many people will want to connect directly to your pods; far more likely you’ll be using some form of ingress controller to do this.
Also far more likely is that people will wonder why they can only run a limited number of pods on each node. The limit exists because there’s only a small number of network interfaces (and IP addresses per interface) available on each EC2 instance. This number depends on the instance type; a full list of these limits can be found here.
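To make that concrete, AWS’s default limit comes from a simple formula (this is the calculation behind the eni-max-pods list linked above):
# max pods = ENIs x (IPv4 addresses per ENI - 1) + 2
# e.g. a t3.medium has 3 ENIs with 6 IPv4 addresses each: 3 x (6 - 1) + 2 = 17 pods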
To fix all of this, we need to do two things:
1. Replace the AWS CNI with another CNI that doesn’t allocate a VPC IP address per pod (we’ll use weave-net).
2. Raise the kubelet’s max pod limit back up to its default.

We’ll start by replacing the CNI. First, delete the AWS CNI daemonset:
kubectl delete ds aws-node -n kube-system
And install weave-net:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
We’ve already used the maxPodsPerNode setting above on the node group with eksctl. If you’re using CloudFormation to create your cluster and nodes, you can add the following user-data to your LaunchConfiguration or LaunchTemplate; it should look something like this:
WorkerNodeLaunchConfig:
  Type: AWS::AutoScaling::LaunchConfiguration
  Properties:
    ...
    UserData:
      Fn::Base64:
        !Sub |
          #!/bin/bash
          set -o xtrace
          /etc/eks/bootstrap.sh --use-max-pods false ${EksCluster}
          /opt/aws/bin/cfn-signal --exit-code $? \
            --stack ${AWS::StackName} \
            --resource NodeGroup \
            --region ${AWS::Region}
Note the --use-max-pods false argument to bootstrap.sh. By default the bootstrap will start the kubelet with the limit for the instance type; by passing --use-max-pods false we ignore this limit and use the kubelet’s default limit, which is normally 110 pods per node. More information on how bootstrap.sh works here, and what cfn-signal is here.
You’ll very likely have to recycle all the nodes in the cluster to make both of these changes a reality. The eksctl approach for this would be simply to scale the node group to zero and back up to the original number.
eksctl scale nodegroup --name ng-1 --cluster ReviewApps --nodes 0
eksctl scale nodegroup --name ng-1 --cluster ReviewApps --nodes 3
Now you’re much more likely to be able to use your EC2 instances to their full capacity and save a few dollars in the process.
In summary, there are different ways to set up an EKS cluster; we’ve used eksctl here, but CloudFormation is also a great option. We’ve set up a more traditional ELB configuration using a NodePort service for ingress-nginx. Finally, we’ve removed the limit on the number of pods per node imposed by AWS’s default CNI.
It’s always a good idea to get some decent logging and monitoring on your infrastructure too. I won’t go into that here, but we used DataDog on our project and made sure that each of our pods was annotated in such a way that we could easily identify logs coming from a single instance of a system (with multiple services), a single service (with multiple pods) or indeed just a single pod.
The rest of our work in getting our review apps working involved some automation of build servers, deploying builds on new pushes, and finally cleaning up review apps when the pull requests were closed.
In all, our review apps (and their infrastructure) have been very successful. Our review apps are used by members of our team, of course for manual testing, but also by many of the people and teams we interact with, whether that’s to discuss how features work, for acceptance testing with clients and stakeholders, or just to see what other team members are up to.