EKS cost at mid-market scale is dominated by four levers: node-group sizing, pod-resource right-sizing, idle workload elimination, and spot/savings-plan mix. Most savings claims you read about Kubernetes cost optimization come from one of these four; the order in which you address them determines how much work pays back. We cover each with the specific signals to act on.
Why EKS cost is its own problem
Kubernetes cost is a layered problem in a way that most other cloud spend is not. The bill is for nodes (EC2 instances). The workloads are pods. The pods request resources from nodes. The mismatch between what pods request, what they use, and what nodes provide is the source of most over-spend.
A typical mid-market EKS environment we see for the first time:
- Pods request 3-5x more CPU and memory than they use
- Node groups are over-provisioned to absorb the inflated requests
- Cluster Autoscaler keeps nodes alive that have a few small pods scattered across them, preventing scale-down
- Some workloads are running 24/7 that only need to run during business hours
- Compute commitment (Reserved Instances, Savings Plans) covers a small portion of the steady-state node base
The combination produces a bill that is often 40-60% larger than necessary. Closing that gap is the work.
This is mid-market context. Vendors like Spot.io (now part of NetApp) and CAST AI cover this space well at enterprise scale. We use them; we will say below where they are appropriate. The advice here is for the substantial population of companies running EKS at $20k-$200k/month who do not have a Kubernetes-specialist on staff.
The four levers
In order of typical ROI on a previously-untuned cluster:
| Lever | Typical first-pass savings | Implementation effort | Ongoing cost |
|---|---|---|---|
| Pod resource right-sizing | 20-40% of node spend | 1-2 weeks | 1 hr/week |
| Idle workload elimination | 10-25% of node spend | 3-5 days | 1 hr/week |
| Node-group sizing + bin-packing | 10-20% of node spend | 1 week | Set-and-forget after tuning |
| Spot + Savings Plan mix | 30-60% of compute spend (on the covered portion) | 1 week setup | 1 hr/month |
The percentages are not additive: each lever shrinks the base that the next one acts on. The combined savings from running all four are usually 35-55% of the total EKS bill, not the sum.
Lever 1: Pod resource right-sizing
The single most common Kubernetes cost issue we see: pods request 2-4x more CPU and memory than they use, and node provisioning follows the requests, not the usage.
The mechanism: when a pod is scheduled, the Kubernetes scheduler reserves the requested resources on a node. If a pod requests 4 CPU and uses 0.5 CPU, the other 3.5 CPU on that node is reserved-but-idle. The Cluster Autoscaler may add another node because the requested capacity is exhausted, even though the actual capacity is mostly free.
The fix: make pod resource requests reflect actual usage.
The standard tool: Vertical Pod Autoscaler (VPA) in Off (recommendation-only) mode. VPA observes pod usage over a window and recommends new request values. You apply them manually after review.
Setup:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # recommendation only
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        controlledResources: ["cpu", "memory"]
```

After 7+ days of observation, run `kubectl describe vpa my-app` and read the Target recommendations. Compare them to the current `requests:` in the Deployment. The gap is the rightsizing opportunity.
Two important nuances:
Don't auto-apply VPA recommendations to production. VPA in Auto mode evicts pods to apply new requests; this causes restarts that interrupt user-facing workloads. Use Off mode and apply changes through your normal deployment process.
Right-sizing is workload-shape-aware. A workload with steady CPU usage rightsizes cleanly. A workload with bursty usage (background jobs, batch processing, ML inference with traffic spikes) needs request headroom for the bursts. Don't strip the headroom; size for P95 usage with a small buffer, not P50.
A heuristic that holds for most web-service workloads we have rightsized: request = (P95 usage over 7 days) x 1.3 for both CPU and memory. The 30% buffer absorbs short-term spikes. For burst-heavy workloads, increase the buffer or use the P99.
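As a worked instance of the heuristic, suppose (hypothetically) that a container's 7-day P95 is 380m CPU and 610Mi memory. Multiplying by 1.3 gives 494m and 793Mi, which round up to the requests below. The workload name, image, replica count, and usage figures are illustrative assumptions, not measurements from a real cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0  # placeholder image
          resources:
            requests:
              cpu: 500m      # P95 380m x 1.3 = 494m, rounded up
              memory: 800Mi  # P95 610Mi x 1.3 = 793Mi, rounded up
```

Limits are left out because the heuristic covers requests only; whether to set limits is a separate decision.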
A typical first-pass rightsizing on an untuned cluster reduces requested capacity by 35-50%. Node count drops proportionally as Cluster Autoscaler scales down.
Lever 2: Idle workload elimination
The second-largest savings come from workloads that should not be running at all.
The categories:
- Dev/test/staging environments running 24/7 that only need to run during business hours
- Forgotten test deployments from initiatives that ended
- Cron jobs that fire but do nothing (a service they call has been retired)
- Ingress controllers, monitoring agents, and service meshes deployed by default but not used by the workloads on this cluster
For dev/test environments, the standard pattern is scheduled scale-down. A CronJob (or KEDA cron-trigger) scales deployments to 0 outside business hours and back to normal during the day:
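```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-staging
  namespace: staging
spec:
  schedule: "0 19 * * 1-5"  # 7pm weekdays (controller time zone unless spec.timeZone is set)
  jobTemplate:
    spec:
      template:
        spec:
          # Assumed name: a service account allowed to scale deployments in staging
          serviceAccountName: staging-scaler
          restartPolicy: OnFailure  # Job pods must set OnFailure or Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - --replicas=0
                - --namespace=staging
```

A separate CronJob scales them back at 8 AM. For non-prod environments that drop from 168 hrs/week to roughly 50-55 hrs/week, the savings are about 67-70% of that environment's compute hours. On a typical staging cluster, this is one of the highest-ROI changes you can make in a day.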
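The scale-down job needs permission to scale deployments, and the 8 AM counterpart needs a replica count to restore. A minimal sketch of both follows. The service account name `staging-scaler`, the uniform replica count of 2, and the time zone are assumptions for illustration; substitute your own values:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: staging-scaler  # hypothetical name
  namespace: staging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
  namespace: staging
rules:
  # List deployments so that `--all` can enumerate them
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]
  # Change replica counts through the scale subresource
  - apiGroups: ["apps"]
    resources: ["deployments/scale"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: staging-scaler
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: staging-scaler
    namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-scaler
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-staging
  namespace: staging
spec:
  schedule: "0 8 * * 1-5"       # 8am weekdays
  timeZone: "America/New_York"  # assumption; stable since Kubernetes 1.27
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: staging-scaler
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - --replicas=2  # assumed uniform replica count
                - --namespace=staging
```

A single `--replicas` value applies to every deployment in the namespace. If replica counts differ, one option is to scale each deployment by name in separate commands.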
Forgotten workloads need a manual audit. The signal we look for: deployments with no recent activity (no pod restarts, no log volume, no CPU usage above noise floor) for 14+ days, owned by a team that has not been actively iterating on them. Each one becomes a Slack ping to the owner: "Is this still in use? If we don't hear back in 5 business days we'll scale to 0; if there's no impact in 14 days after that we'll delete."
This sounds aggressive. In our engagement experience, the actual response rate to "is this still in use" is about 20% — most of the workloads have no current owner because the original team has moved on. The default-to-decommission process is what makes lifecycle work.
Lever 3: Node-group sizing and bin-packing
Once pod requests are right-sized and idle workloads are gone, the remaining cost question is whether the node-group configuration matches the workload mix.
The signals to act on:
- Average node utilization below 50%. Cluster Autoscaler scales up but doesn't scale down efficiently because pods are scattered. Solutions: review Pod Disruption Budgets that block node drains, deploy Karpenter (which consolidates more aggressively than Cluster Autoscaler), or simply right-size the node-group instance type.
- A single oversized node-group serving heterogeneous workloads. A node-group of `m5.4xlarge` instances running a mix of small services and one large service is likely over-provisioning. Split into multiple node-groups by workload size class.
- Node-group instance types far from workload requirements. If your pods request 1 CPU / 2 GB and your nodes are `m5.4xlarge` (16 CPU / 64 GB), each node fits at most 16 pods (fewer once system reservations are counted) and the bin-packing is inefficient at the edges. At that request shape, the pods also use at most 32 GB of the node's 64 GB, so half the memory is paid for but never requested. Smaller instance types waste less.
Karpenter (the AWS-native autoscaler that replaces Cluster Autoscaler for many use cases) handles this better than Cluster Autoscaler in most environments we have migrated. The instance-type selection is more flexible, the consolidation is more aggressive, and the configuration is simpler. For mid-market clusters, the migration to Karpenter is usually a one-week project that produces 10-20% additional savings on top of right-sizing.
We do not recommend Karpenter blindly. There are environments where Cluster Autoscaler's stability is the right trade-off (very large clusters, complex pod-affinity rules, regulated environments where the more-mature CA codebase is preferred). For typical mid-market EKS, Karpenter wins.
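For readers who do go the Karpenter route, the consolidation and instance-type flexibility described above are configured on a NodePool. This is a minimal sketch, assuming the Karpenter v1 API (`karpenter.sh/v1`) and an existing EC2NodeClass named `default`. The instance categories, consolidation delay, and CPU limit are illustrative values, not recommendations for a specific cluster:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default  # assumed to exist already
      requirements:
        # Let Karpenter choose among several instance families and sizes
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    # Repack pods onto fewer or cheaper nodes when nodes are underused
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
  limits:
    cpu: "200"  # illustrative cap on total provisioned vCPU
```

Leaving the instance requirements broad is the design choice that matters here: the wider the set, the more room the provisioner has to fit nodes to the actual pod requests.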
Lever 4: Spot + Savings Plan mix
The cheapest compute on AWS is Spot. The most discounted committed compute is Savings Plans / Reserved Instances. The right mix for EKS is workload-dependent.
A defensible default for mid-market EKS:
- Production stateless workloads (web servers, API services): 60-80% on Spot (with the appropriate disruption handling), 20-40% on On-Demand or covered by a Compute Savings Plan
- Production stateful workloads (databases, queues, caches): 100% On-Demand, covered by Reserved Instances or Savings Plans
- Batch workloads (CI runners, data processing, ML training): 90-100% Spot with retry logic
- Dev/test: 100% Spot (these can tolerate interruption)
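At the workload level, one way to express "stateless services prefer Spot" is a soft node affinity. The sketch below assumes EKS managed node groups, which label nodes with `eks.amazonaws.com/capacityType`; Karpenter-provisioned nodes use `karpenter.sh/capacity-type` with the values `spot` and `on-demand` instead. The deployment name and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      affinity:
        nodeAffinity:
          # Soft preference: schedule on Spot when capacity exists,
          # fall back to On-Demand nodes otherwise
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: eks.amazonaws.com/capacityType
                    operator: In
                    values: ["SPOT"]
      containers:
        - name: my-api
          image: registry.example.com/my-api:1.0.0  # placeholder image
```

A preference like this does not enforce the 60-80% split by itself; the actual ratio depends on how much Spot and On-Demand capacity the node groups provide.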
Spot disruption handling on Kubernetes is well-trodden:
- AWS Node Termination Handler (DaemonSet) listens for Spot interruption notices and gracefully drains the node before termination
- Pod Disruption Budgets prevent too many pods from going down at once
- Multiple instance types and AZs in the Spot pool reduce the probability of large-scale termination events
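Of the three mechanisms above, the Pod Disruption Budget is the one defined per workload. A minimal sketch, assuming a Deployment whose pods carry the label `app: my-app` (a placeholder) and that losing one pod at a time is acceptable:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app
spec:
  maxUnavailable: 1  # evict at most one pod at a time during a drain
  selector:
    matchLabels:
      app: my-app
```

The budget applies to evictions made through the eviction API, which is how node drains remove pods. It cannot delay the Spot reclaim itself, so the replica count still needs to tolerate an abrupt node loss.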
The transition from "all On-Demand" to a Spot-heavy mix is a 1-2 week project for a typical mid-market cluster. The savings on the Spot-covered portion are 60-90% off On-Demand pricing.
For the On-Demand portion that remains, a Compute Savings Plan covering the steady-state baseline produces another 17-28% discount. We covered Savings Plans broadly in our pillar piece on mid-market FinOps; the EKS-specific consideration is to size the Compute Savings Plan to the steady-state On-Demand portion only, not to the full EKS spend (the Spot portion is not eligible for Savings Plans).
When to bring in specialist tooling
The native EKS toolset (Cluster Autoscaler or Karpenter, VPA, AWS Node Termination Handler) is sufficient for most mid-market clusters. There are points where specialist tooling pays back:
- CAST AI automates rightsizing and bin-packing across multiple node groups, with a managed control plane. Worth it when EKS spend exceeds ~$50k/month and you do not want to staff someone on the optimization work.
- Spot.io (NetApp) handles Spot-mix optimization across multiple instance types and AZs more aggressively than the native tooling. Similar threshold.
- Kubecost (now part of IBM) provides cost attribution at the namespace and label level. Useful when you need to attribute EKS cost back to teams or products with more granularity than AWS Cost Explorer alone provides.
We have implemented all three across engagements. They are good products. They become economically appropriate at the spend levels above; below them, the savings don't pay back the tool cost plus the engineering time to integrate.
Implementation order
If you are starting from an untuned EKS cluster:
- Week 1: Deploy VPA in recommendation mode. Schedule scale-down for non-prod environments. Quick lifecycle audit for obviously-idle workloads.
- Week 2: Apply VPA recommendations for the top 20 deployments by spend. Measure node count change.
- Week 3: Migrate from Cluster Autoscaler to Karpenter (if appropriate). Right-size node-group instance types.
- Week 4: Move appropriate production workloads to Spot. Set up AWS Node Termination Handler. Apply Compute Savings Plan to the remaining On-Demand baseline.
After this 4-week pass, the cluster runs ongoing with the practices in our mid-market FinOps framework — monthly lifecycle audits, quarterly rightsizing reviews, ongoing Spot-mix tuning.
Typical first-pass result on a previously-untuned cluster: 35-55% EKS cost reduction. Sustained savings; not a one-time hit.
Where this advice doesn't fit
- Stateful workloads at scale. Databases, message queues, and other stateful workloads benefit from rightsizing but not from Spot. The framework still applies; the levers shift.
- GPU workloads. GPU instance types have different Spot dynamics, different rightsizing concerns, and different bin-packing constraints. We will write more on this separately.
- Very small clusters. Below ~$5k/month EKS spend, the optimization work doesn't pay back. Run it as it is; revisit when scale justifies.
- Highly-regulated environments. The agility this framework assumes (deploying VPA, switching to Karpenter, mixing Spot) requires change-management throughput that some regulated environments do not have. Adapt the timeline.
FAQ
Q: Will VPA recommendations conflict with Horizontal Pod Autoscaler (HPA)?
Yes, if you use both for the same metric (CPU or memory). The standard pattern is HPA on a custom metric (queue depth, request rate) and VPA on resource requests, or HPA on CPU/memory and VPA in recommendation-only mode that you apply manually. Don't run both in Auto mode on the same workload.
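The first pattern (HPA on a custom metric, VPA on resource requests) might look like the sketch below. It assumes a metrics adapter already exposes a per-pod metric to the custom metrics API; the metric name `http_requests_per_second`, the target value, and the replica bounds are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Scale on request rate rather than CPU or memory, so the HPA
    # and VPA are not reacting to the same signal
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second  # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"
```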
Q: How do we handle services that need to be on-demand for compliance reasons but the framework recommends Spot?
Don't put them on Spot. The framework is a default; deviation for specific compliance, latency, or stability requirements is fine. Document why.
Q: Should we use AWS Karpenter or a different autoscaler on EKS?
Karpenter for most cases. Cluster Autoscaler if you have specific reasons for stability or compatibility. Spot.io if you want managed multi-AZ Spot optimization without engineering investment.
Q: How does this differ for GKE or AKS?
The four levers apply. The specific tooling differs (GKE uses Cluster Autoscaler, GCP has Spot VMs; AKS uses Cluster Autoscaler, Azure has Spot VMs). The patterns are portable; substitute the cloud-specific implementations.
Q: Does this work for EKS Fargate?
Partially. Fargate eliminates node-group sizing and Spot decisions (they don't exist). Pod rightsizing still matters because Fargate bills per pod resources requested. Lifecycle still matters. The framework is half-applicable.
*For broader mid-market FinOps practice this fits into, see our pillar on mid-market FinOps. For an engagement-level look at where this has worked, see our case study on a 31% AWS bill reduction.*

