KEP-1421: Make individual NodeFit predicates configurable #1421
Labels
kind/feature
Categorizes issue or PR as related to a new feature.
lifecycle/frozen
Indicates that an issue or PR should not be auto-closed due to staleness.
Is your feature request related to a problem? Please describe.
The NodeFit predicate was introduced to let the descheduler make better eviction decisions and avoid cases where there is no feasible node for re-scheduling after a pod gets evicted. To enable the predicate, the DefaultEvictor plugin provides an optional nodeFit option that each plugin can utilize. The list of existing checks has been extended over time, in good faith, to improve eviction decisions. The NodeFit predicate currently consists of several checks.
Some plugins adopted the NodeFit predicate natively by invoking the additional PodFitsAnyOtherNode, PodFitsAnyNode and PodFitsCurrentNode predicates, which are built on top of NodeFit. Nevertheless, there are cases where it is preferable to run only a subset of the existing checks or to disable the checks completely. This is problematic for plugins where it is impossible to fully disable the checks.
User stories
Plugins such as RemovePodsViolatingNodeAffinity or RemovePodsViolatingNodeTaints have a subset of NodeFit checks enabled natively. These checks cannot be disabled without disabling the corresponding plugin. Instead, as an administrator, I'd like to disable specific checks such as "a pod fits resource requests" to get as close as possible to disabling all NodeFit checks, so that I can evict and detect pending pods and allow cluster autoscalers or other tools to reconcile the situation.
I'd like the PodLifetime plugin to check that there are nodes with sufficient resources to accept any evicted pod, even when the pod's node selector does not match any node. Then, when there are too many pending pods due to a node label mismatch, my automation can label existing nodes and allocate more resources, or have the multi-cluster scheduler reschedule my workload to a different cluster.
I'd like the RemovePodsViolatingInterPodAntiAffinity plugin to evict pods even though there is currently no node with sufficient resources, while still respecting node affinities and taints, so that the cluster autoscaler can scale up new nodes when too many Pending pods are observed.
When a non-default scheduler is in use, I might need to disable some of the existing NodeFit checks that are no longer valid or that collide with how that scheduler works.
I'd like to extend the NodeFit predicates with GPU-oriented checks and enable them only for specific (custom) plugins or workloads.
I'd like to specify the NodeFit checks that need to be disabled: checks that either produce suboptimal evictions or are re-implemented by a given plugin.
Describe the solution you'd like
Allow enabling and disabling the individual checks that the NodeFit predicate consists of.
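To make the request more concrete, here is a minimal sketch of what per-check toggles could look like. It is not the descheduler's actual code or API: the Node and Pod types are simplified stand-ins for the Kubernetes objects, and the check names, NodeFitConfig, DisabledChecks and FitsAnyNode are all hypothetical names chosen for illustration. The actual design is still to be decided through a proposal.

```go
package main

import "fmt"

// Node and Pod are simplified stand-ins for the Kubernetes API types
// (assumption: only the fields needed for this illustration).
type Node struct {
	Name          string
	Labels        map[string]string
	FreeCPUMilli  int64
	Unschedulable bool
}

type Pod struct {
	Name            string
	NodeSelector    map[string]string
	RequestCPUMilli int64
}

// Check is one named, individually switchable fit check (hypothetical).
type Check struct {
	Name string
	Fits func(p Pod, n Node) bool
}

// allChecks is an illustrative registry; the names are made up for this sketch.
var allChecks = []Check{
	{Name: "NodeSelector", Fits: func(p Pod, n Node) bool {
		for k, want := range p.NodeSelector {
			if got, ok := n.Labels[k]; !ok || got != want {
				return false
			}
		}
		return true
	}},
	{Name: "ResourceRequests", Fits: func(p Pod, n Node) bool {
		return p.RequestCPUMilli <= n.FreeCPUMilli
	}},
	{Name: "Schedulable", Fits: func(p Pod, n Node) bool {
		return !n.Unschedulable
	}},
}

// NodeFitConfig lists the checks to skip; an empty config runs every check.
type NodeFitConfig struct {
	DisabledChecks map[string]bool
}

// FitsNode reports whether the pod passes every enabled check on the node.
func (c NodeFitConfig) FitsNode(p Pod, n Node) bool {
	for _, chk := range allChecks {
		if c.DisabledChecks[chk.Name] {
			continue // this check was switched off by configuration
		}
		if !chk.Fits(p, n) {
			return false
		}
	}
	return true
}

// FitsAnyNode reports whether at least one node passes the enabled checks.
func (c NodeFitConfig) FitsAnyNode(p Pod, nodes []Node) bool {
	for _, n := range nodes {
		if c.FitsNode(p, n) {
			return true
		}
	}
	return false
}

func main() {
	// A pod whose node selector matches no node, although resources suffice.
	pod := Pod{Name: "web", NodeSelector: map[string]string{"pool": "gpu"}, RequestCPUMilli: 500}
	nodes := []Node{{Name: "node-a", Labels: map[string]string{"pool": "general"}, FreeCPUMilli: 2000}}

	strict := NodeFitConfig{}
	relaxed := NodeFitConfig{DisabledChecks: map[string]bool{"NodeSelector": true}}

	fmt.Println("all checks enabled:", strict.FitsAnyNode(pod, nodes))
	fmt.Println("NodeSelector disabled:", relaxed.FitsAnyNode(pod, nodes))
}
```

With every check enabled, the pod in this example fits no node because of the label mismatch. With the hypothetical NodeSelector check disabled, only resources and schedulability are considered, which corresponds to the PodLifetime user story above.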
Describe alternatives you've considered
TBD through a proposal.
What version of descheduler are you using?
descheduler version: 0.30.z
Additional context