Kubernetes Best Practices - Part 4 - Setting Resource Requests and Limits in Kubernetes

When Kubernetes schedules a pod, it's important that its containers have enough resources to actually run. If you schedule a large app on a node with limited resources, the node can run out of memory or CPU, and things stop working.

Let's take a look at how you can solve these problems using resource requests and limits.

Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory.

Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.

Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it's restricted.

Let's see how these work.

Resource types

There are two types of resources: CPU and memory.

A resource type has a base unit. CPU represents compute processing and is specified in units of Kubernetes CPUs. Memory is specified in units of bytes. For Linux workloads, you can specify huge page resources. Huge pages are a Linux-specific feature where the node kernel allocates blocks of memory that are much larger than the default page size.
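For illustration, a pod requesting huge pages might be sketched like this (a hypothetical example assuming the node has 2 MiB huge pages pre-allocated; for huge pages, requests must equal limits):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepage-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    resources:
      requests:
        hugepages-2Mi: "100Mi"   # 50 huge pages of 2 MiB each
        memory: "100Mi"
      limits:
        hugepages-2Mi: "100Mi"   # huge page requests and limits must match
        memory: "100Mi"
```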

The Kubernetes scheduler uses these to figure out where to run your pods.

Container resources example

A typical spec for a pod and its resources might look something like this:

   containers:
   - name: container1
     image: busybox
     resources:
       requests:
         memory: "32Mi"
         cpu: "200m"
       limits:
         memory: "64Mi"
         cpu: "250m"

Each container in the pod can set its own requests and limits, and these are all additive. CPU resources are defined in millicores. If your container needs two full cores to run, you'd put the value 2000m. If your container only needs a quarter of a core, you'd put the value 250m.
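To see how per-container values add up, here is a hedged sketch of a two-container pod (the names are hypothetical); the scheduler sums the requests, so this pod needs a node with at least 500m CPU and 96Mi of memory free:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-container-demo   # hypothetical name
spec:
  containers:
  - name: web                # requests 250m CPU, 64Mi memory
    image: nginx
    resources:
      requests:
        cpu: "250m"
        memory: "64Mi"
  - name: sidecar            # requests 250m CPU, 32Mi memory
    image: busybox
    resources:
      requests:
        cpu: "250m"
        memory: "32Mi"
# Pod total as seen by the scheduler: 500m CPU, 96Mi memory
```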

One thing to keep in mind: if you request a value larger than the core count of your biggest node, your pod will never be scheduled.

Let's say you have a pod that needs four cores, but your Kubernetes cluster consists only of two-core VMs. In this case, your pod will never be scheduled.

So unless your app is specifically designed to take advantage of multiple cores (things like scientific computing and some databases), it's usually a best practice to keep the CPU request at one or below, and then run more replicas to scale it out. This gives the system more flexibility and reliability.

When it comes to CPU limits, things get interesting. CPU is considered a compressible resource. If your app starts hitting its CPU limit, Kubernetes will start to throttle your container. This means your CPU will be artificially restricted, giving your app potentially worse performance. However, it won't be terminated or evicted.

Memory resources are defined in bytes. Normally, you give a mebibyte value for memory, but you can give it anything from bytes to petabytes. Just like CPU, if you put in a memory request that's larger than the amount of memory on your nodes, the pod will never be scheduled.
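As a sketch of the unit syntax, both binary suffixes (Ki, Mi, Gi, Ti) and decimal suffixes (k, M, G, T) are accepted for memory quantities:

```yaml
resources:
  requests:
    memory: "64Mi"    # 64 mebibytes (64 * 1024 * 1024 bytes)
  limits:
    memory: "128M"    # 128 megabytes (128 * 1000 * 1000 bytes)
```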

Kubernetes pod lifecycle

At the end of the day, these resource requests are used by the Kubernetes scheduler to run your workloads. It's important to understand how this works so you can tune your containers correctly.

So let's say you want to run some pods on your cluster. Assuming the pod specifications are valid, the Kubernetes scheduler will pick a node to run your workload.

Kubernetes will check whether the node has enough resources to fulfill the requests of the pod's containers. If it doesn't, the scheduler moves on to the next node. If none of the nodes in the system have resources left to fill the requests, the pod goes into a pending state.

By using Google Kubernetes Engine (GKE) features such as the node autoscaler, GKE can automatically detect this state and create more nodes. And if there's excess node capacity, the autoscaler can scale down and remove nodes to save you money.

So Kubernetes schedules these pods based on the requests. But a limit can be higher than the request, right?

This means that in some scenarios a node can actually run out of resources. We call this an overcommitted state. When it comes to CPU, as we said before, Kubernetes will start to throttle the pods. Each pod will get as much as it requested, but it might not be able to go up to its limit; instead, it gets throttled down.

But when it comes to memory, Kubernetes has to decide which pods to kill and which pods to keep in order to free up system resources. Otherwise, the whole system would crash.
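One way to influence which pods survive memory pressure is to set requests equal to limits for every container; Kubernetes then assigns the pod the Guaranteed QoS class, which is the last to be killed. A minimal sketch (the pod name is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo      # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "64Mi"       # requests == limits for every container,
        cpu: "250m"          # so the pod gets the Guaranteed QoS class
```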
