Kubernetes Pod Priority & Preemption

Apr 26, 2022

Pod priority is a Kubernetes scheduling feature that lets the scheduler make scheduling decisions by comparing pods based on their priority number.

To assign a pod a certain priority, you need a priority class.

You define a priority with a PriorityClass object (a non-namespaced, cluster-wide resource) that carries a value, and then reference that class from the Pod.

The value determines the priority. For user-defined classes it can be at most 1,000,000,000 (one billion); higher values are reserved for the built-in system classes. The larger the number, the higher the priority.
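To make the value rules concrete, here is a minimal Go sketch. The `PriorityClass` struct, `validateUserValue`, `higherPriority`, and the class names are hypothetical stand-ins that only mirror the fields discussed here (they are not the client-go types), and the check assumes the one-billion ceiling applies to user-defined classes.

```go
package main

import "fmt"

// maxUserPriority is the highest value a user-defined PriorityClass may use;
// larger values are reserved for the built-in system classes.
const maxUserPriority int32 = 1_000_000_000

// PriorityClass is a hypothetical, trimmed-down mirror of the Kubernetes
// object. It is cluster-scoped, so it carries no namespace field.
type PriorityClass struct {
	Name        string
	Value       int32 // larger value = higher priority
	Description string
}

// validateUserValue rejects values above the user-defined ceiling.
func validateUserValue(pc PriorityClass) error {
	if pc.Value > maxUserPriority {
		return fmt.Errorf("priority class %q: value %d exceeds %d", pc.Name, pc.Value, maxUserPriority)
	}
	return nil
}

// higherPriority reports whether class a outranks class b.
func higherPriority(a, b PriorityClass) bool {
	return a.Value > b.Value
}

func main() {
	high := PriorityClass{Name: "high-priority", Value: 1_000_000, Description: "customer-facing services"}
	low := PriorityClass{Name: "low-priority", Value: 1_000, Description: "batch jobs"}
	tooHigh := PriorityClass{Name: "too-high", Value: 1_500_000_000}

	fmt.Println(validateUserValue(high) == nil)    // true
	fmt.Println(validateUserValue(tooHigh) != nil) // true
	fmt.Println(higherPriority(high, low))         // true
}
```

Only the relative order of values matters to the scheduler, so leaving gaps between classes (1,000 vs 1,000,000 here) makes it easy to slot new classes in later.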

Kubernetes also ships with two built-in high-priority classes:

system-node-critical: This class has a value of 2000001000. Control plane pods such as etcd, kube-apiserver, and kube-controller-manager use this priority class.
system-cluster-critical: This class has a value of 2000000000. Add-on pods such as CoreDNS, the Calico controller, and metrics-server use this priority class.
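A quick Go sketch of the arithmetic behind these two classes. The `builtinClasses` map simply restates the values listed above, and `maxUserPriority` assumes the one-billion ceiling for user-defined classes; both names are illustrative only.

```go
package main

import "fmt"

// builtinClasses restates the two system PriorityClasses and their values.
var builtinClasses = map[string]int32{
	"system-node-critical":    2_000_001_000,
	"system-cluster-critical": 2_000_000_000,
}

// maxUserPriority is the ceiling for user-defined PriorityClasses.
const maxUserPriority int32 = 1_000_000_000

func main() {
	node := builtinClasses["system-node-critical"]
	cluster := builtinClasses["system-cluster-critical"]

	// Node-critical pods outrank cluster-critical add-ons by 1000...
	fmt.Println(node - cluster) // 1000
	// ...and both outrank anything a user-defined class can express.
	fmt.Println(cluster > maxUserPriority) // true
}
```

Because both system values sit roughly a billion above the user ceiling, no workload class can ever be placed ahead of these system pods in the queue.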

Here is how pod priority works:

If a pod is deployed with a priorityClassName, the Priority admission controller looks up that PriorityClass and sets the pod's priority to its value.
When several pods are waiting in the scheduling queue, the scheduler orders them by priority, placing higher priority pods ahead of lower priority ones.
If no node has enough free resources to accommodate a higher priority pod, the preemption logic kicks in.
The scheduler preempts (evicts) lower priority pods from a node so that the higher priority pod can be scheduled there. Each evicted pod gets its graceful termination period, which defaults to 30 seconds. If a pod sets terminationGracePeriodSeconds (for example, to give a preStop lifecycle hook enough time to finish), that value overrides the default 30 seconds.
However, if the higher priority pod's scheduling requirements cannot be met, the scheduler moves on and tries to schedule lower priority pods.
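The steps above can be sketched in Go as a toy model. Everything here is hypothetical: the `Pod` type, the single made-up CPU resource, the class names, and the helpers `resolvePriority`, `orderQueue`, `selectVictims`, and `gracePeriod`. In particular, `selectVictims` is a simplified greedy rule (evict the lowest-priority pods first on one node); the real scheduler evaluates multiple nodes, considers PodDisruptionBudgets, and tries to minimize the number of victims.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultGracePeriod is the Kubernetes default for terminationGracePeriodSeconds.
const defaultGracePeriod int64 = 30

// Pod is a hypothetical, trimmed-down pod with only the fields this sketch uses.
type Pod struct {
	Name              string
	PriorityClassName string
	Priority          int32  // resolved value, like spec.priority
	CPU               int    // requested millicores (a single made-up resource)
	GracePeriod       *int64 // like terminationGracePeriodSeconds; nil means default
}

// resolvePriority mimics the Priority admission controller: look up the class
// by name and copy its value onto the pod. An empty name maps to 0, which
// assumes no globalDefault PriorityClass exists in the cluster.
func resolvePriority(p *Pod, classes map[string]int32) error {
	if p.PriorityClassName == "" {
		p.Priority = 0
		return nil
	}
	v, ok := classes[p.PriorityClassName]
	if !ok {
		return fmt.Errorf("priority class %q not found", p.PriorityClassName)
	}
	p.Priority = v
	return nil
}

// orderQueue sorts pending pods so higher-priority pods are tried first.
func orderQueue(queue []*Pod) {
	sort.SliceStable(queue, func(i, j int) bool { return queue[i].Priority > queue[j].Priority })
}

// selectVictims is a greedy simplification of preemption on one node: evict the
// lowest-priority pods ranked below the pending pod until enough CPU is free.
// ok=false means even evicting all of them would not make room.
func selectVictims(pending *Pod, running []*Pod, freeCPU int) (victims []*Pod, ok bool) {
	if freeCPU >= pending.CPU {
		return nil, true // fits without preempting anything
	}
	var candidates []*Pod
	for _, p := range running {
		if p.Priority < pending.Priority {
			candidates = append(candidates, p)
		}
	}
	sort.SliceStable(candidates, func(i, j int) bool { return candidates[i].Priority < candidates[j].Priority })
	for _, p := range candidates {
		victims = append(victims, p)
		freeCPU += p.CPU
		if freeCPU >= pending.CPU {
			return victims, true
		}
	}
	return nil, false
}

// gracePeriod returns how long an evicted pod gets to shut down.
func gracePeriod(p *Pod) int64 {
	if p.GracePeriod != nil {
		return *p.GracePeriod
	}
	return defaultGracePeriod
}

func main() {
	classes := map[string]int32{"high-priority": 1_000_000, "low-priority": 1_000}
	sixty := int64(60)
	running := []*Pod{
		{Name: "batch-a", PriorityClassName: "low-priority", CPU: 500},
		{Name: "batch-b", PriorityClassName: "low-priority", CPU: 500, GracePeriod: &sixty},
	}
	queue := []*Pod{
		{Name: "report", PriorityClassName: "low-priority", CPU: 200},
		{Name: "web", PriorityClassName: "high-priority", CPU: 800},
	}
	for _, p := range append(append([]*Pod{}, running...), queue...) {
		if err := resolvePriority(p, classes); err != nil {
			panic(err)
		}
	}
	orderQueue(queue)
	fmt.Println("next to schedule:", queue[0].Name) // web

	victims, ok := selectVictims(queue[0], running, 200)
	fmt.Println("preemption possible:", ok) // true
	for _, v := range victims {
		fmt.Printf("evict %s (grace %ds)\n", v.Name, gracePeriod(v))
	}
}
```

With 200 millicores free, the 800-millicore "web" pod needs both batch pods evicted, so the program prints `evict batch-a (grace 30s)` followed by `evict batch-b (grace 60s)`, showing the default grace period next to an overridden one.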

Note: This is an excerpt from an article published on devopscube.com. For a detailed example please visit → Pod PriorityClass Explained

DevOpsCube Bytes