What Kubernetes Cost Optimization Actually Involves
Kubernetes cost optimization is the practice of aligning the compute resources provisioned in a cluster with the actual demand from running workloads. It involves four distinct levers: resource right-sizing, bin-packing efficiency, procurement model selection, and autoscaling strategy.
Most published content covers these levers individually. The gap in mid-market Kubernetes cost conversations is sequencing — which levers to pull first, in what order, and how to measure progress without a dedicated FinOps team.
This post describes a systematic approach for engineering teams running AWS EKS or GCP GKE at the 50-500 person company stage.
Why Kubernetes Bills Surprise Mid-Market Teams
Kubernetes abstracts the relationship between your application and the compute it runs on. A team deploys services by setting CPU and memory requests; the cluster scheduler figures out where to run them. This abstraction is powerful, but it means the link between "how much does this service actually use" and "how much are we paying for it" is invisible without deliberate tooling.
The result: mid-market teams typically discover they are running clusters at 25-40% average CPU utilization. In other words, they are paying for roughly 2.5 to 4 times the CPU their workloads actually use, and the signal has been sitting in their cluster metrics for months, unread.
The Four-Lever Kubernetes Cost Model
Lever 1: Right-Sizing Resource Requests
Kubernetes allows containers to specify CPU and memory requests (the minimum guaranteed resources the scheduler reserves) and limits (the maximum allowed). Most production clusters at mid-market companies have resource requests set significantly higher than actual utilization — set once during initial deployment and never revisited.
Right-sizing requires three inputs: actual CPU and memory usage from Prometheus or cloud-native metrics, current request and limit settings from your configuration repository, and p95 usage rather than averages. Sizing to average usage leaves no headroom, so traffic spikes push containers into CPU throttling and OOM kills.
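As a minimal sketch of the percentile-versus-average point, the Python below sizes a request from raw usage samples. The nearest-rank percentile method and the 1.25 headroom multiplier are assumptions chosen for illustration, not values taken from VPA or any other tool, and the sample data is hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to keep the arithmetic exact for whole percentages.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples: list[float], pct: float = 95.0, headroom: float = 1.25) -> float:
    """Request = usage percentile x headroom. The 1.25 default is a hypothetical safety margin."""
    return percentile(samples, pct) * headroom


# Hypothetical memory usage samples (MiB): steady at 400 with two short spikes.
samples = [400.0] * 18 + [700.0, 900.0]
average = sum(samples) / len(samples)
print(f"average: {average} MiB")                     # average: 440.0 MiB
print(f"p95:     {percentile(samples, 95)} MiB")     # p95:     700.0 MiB
print(f"request: {recommend_request(samples)} MiB")  # request: 875.0 MiB
```

A request sized to the 440 MiB average sits below every spike in this sample; the p95-based value covers all but the largest one, at the cost of reserving more memory.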
The Vertical Pod Autoscaler (VPA) in recommendation mode (updateMode "Off") is the most practical tool for generating right-sizing suggestions without applying changes automatically. Run it in recommendation mode for 7-14 days before acting on its output: VPA recommendations on workloads with irregular traffic can be misleading in the first week.
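A VPA object in recommendation mode might look like the following, sketched as a Python dict serialized to JSON (which `kubectl apply -f` accepts). It assumes the VPA components are installed in the cluster and uses the `autoscaling.k8s.io/v1` API; the deployment and namespace names are hypothetical.

```python
import json


def vpa_recommendation_only(deployment: str, namespace: str) -> dict:
    """Build a VerticalPodAutoscaler that publishes recommendations without changing pods."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            # "Off": the recommender still computes targets, but pods are never evicted
            # or resized, so nothing changes until you edit the requests yourself.
            "updatePolicy": {"updateMode": "Off"},
        },
    }


if __name__ == "__main__":
    # "checkout" and "shop" are hypothetical names; pipe the output to `kubectl apply -f -`.
    print(json.dumps(vpa_recommendation_only("checkout", "shop"), indent=2))
```

After the observation window, `kubectl describe vpa checkout-vpa -n shop` shows the per-container `target`, `lowerBound`, and `upperBound` recommendations in the object's status.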
A consistent finding across our right-sizing engagements: memory requests are frequently set 3-5x actual p99 usage. A 10-20% reduction in cluster node count is achievable from right-sizing alone before touching anything else.
Lever 2: Bin Packing and Node Utilization
Bin packing efficiency is how fully scheduled pods fill provisioned node capacity. Because the scheduler places pods by their requests rather than their actual usage, a cluster can look full to the scheduler while its nodes sit mostly idle. A cluster with 30% average CPU utilization is paying for resources that do no work. At $80,000-$200,000 in monthly cloud spend, this idle capacity is typically the largest single line item in the waste analysis.
Karpenter on AWS EKS: Karpenter provisions nodes in response to pending pods, consolidates underused nodes, and terminates nodes when workloads scale down. Unlike the Cluster Autoscaler, which scales predefined node groups, Karpenter selects instance types that best match pending pod requirements, enabling significantly better bin packing. For EKS clusters on Kubernetes 1.24 or later, Karpenter is the current recommended approach over the Cluster Autoscaler.
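Node Auto Provisioning and Autopilot on GCP GKE: In GKE Standard clusters, node auto-provisioning lets the cluster autoscaler create node pools sized to the requirements of pending pods. GKE Autopilot mode goes further, shifting node management entirely to Google and billing by pod resource requests. That removes the bin packing problem (though not the right-sizing problem) for teams that do not need fine-grained node configuration control. For new GKE workloads without custom node requirements, Autopilot is worth evaluating.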
Target cluster CPU utilization: 60-70% average. Consistently below 50% means right-sizing has not been completed or autoscaling is tuned to add capacity eagerly and release it slowly.
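A rough way to tell those two causes apart is to compare requested CPU and used CPU against allocatable capacity: high requests with low usage points to oversized requests, while low requests point to nodes that are not being filled or scaled in. The sketch below assumes per-node figures have already been exported (for example, from Prometheus); the `NodeStats` structure and the triage thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    allocatable_cpu: float  # cores available to pods on the node
    requested_cpu: float    # sum of CPU requests of pods scheduled on the node
    used_cpu: float         # observed average CPU usage on the node


def cluster_ratios(nodes: list[NodeStats]) -> tuple[float, float]:
    """Return (packing, utilization): requested / allocatable and used / allocatable."""
    capacity = sum(n.allocatable_cpu for n in nodes)
    packing = sum(n.requested_cpu for n in nodes) / capacity
    utilization = sum(n.used_cpu for n in nodes) / capacity
    return packing, utilization


def diagnose(packing: float, utilization: float, target: float = 0.60) -> str:
    """Rough triage against a utilization target (0.60 is the low end of the 60-70% range)."""
    if utilization >= target:
        return "at or above target"
    if packing >= target:
        # Nodes are full of requests that pods do not actually use.
        return "requests oversized: finish right-sizing"
    # Nodes are not even full by request: capacity is not being consolidated or scaled in.
    return "nodes underfilled: improve bin packing and scale-in"


# Hypothetical two-node cluster of 8-core nodes.
nodes = [NodeStats(8, 7, 2), NodeStats(8, 6, 3)]
packing, utilization = cluster_ratios(nodes)
print(packing, utilization)            # 0.8125 0.3125
print(diagnose(packing, utilization))  # requests oversized: finish right-sizing
```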
Lever 3: Spot and Preemptible Instances
Spot instances on AWS and preemptible VMs on GCP offer 60-80% discounts compared to on-demand pricing in exchange for interruption risk — 2-minute notice on AWS Spot, 30-second notice on GCP preemptible.
Appropriate for spot/preemptible:
- Batch processing and data pipelines with checkpointing
- CI/CD runner nodes
- Stateless microservices with graceful shutdown handling and fast restart
- All development and staging workloads
Not appropriate for spot/preemptible:
- Stateful workloads without interruption-aware handling
- Any service where an interruption within the notice window (2 minutes on AWS, 30 seconds on GCP) causes data inconsistency
- Services with startup times longer than 90 seconds
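The criteria above can be encoded as a first-pass screening function when inventorying workloads. This is a hypothetical sketch: the `Workload` fields are assumptions about what an inventory would record, and the output is a starting point for review, not a decision.

```python
from dataclasses import dataclass

MAX_STARTUP_SECONDS = 90  # threshold from the "not appropriate" list


@dataclass
class Workload:
    name: str
    environment: str          # e.g. "production", "staging", "development"
    stateful: bool
    interruption_aware: bool  # handles SIGTERM gracefully and/or checkpoints progress
    startup_seconds: float


def spot_verdict(w: Workload) -> tuple[bool, str]:
    """Return (eligible, reason) following the appropriate / not-appropriate lists."""
    if w.environment in ("development", "staging"):
        return True, "non-production workload"
    if not w.interruption_aware:
        kind = "stateful" if w.stateful else "stateless"
        return False, f"{kind} workload without interruption handling"
    if w.startup_seconds > MAX_STARTUP_SECONDS:
        return False, "startup too slow to recover from an interruption"
    return True, "interruption-tolerant"


# Hypothetical inventory entries.
inventory = [
    Workload("nightly-etl", "production", stateful=False, interruption_aware=True, startup_seconds=20),
    Workload("orders-db-sync", "production", stateful=True, interruption_aware=False, startup_seconds=30),
    Workload("search-api", "production", stateful=False, interruption_aware=True, startup_seconds=150),
]
for w in inventory:
    print(w.name, spot_verdict(w))
```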
A practical split for mid-market EKS clusters: run 70-80% of the capacity for batch and non-critical services on spot, and keep system workloads and critical-path services 100% on-demand. This typically yields a 30-40% compute cost reduction on the migrated workload subset.
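The arithmetic behind a split like this is a blended rate: only the share of capacity that actually lands on spot earns the discount. The sketch below models that, with a hypothetical `fallback_rate` for the fraction of intended spot capacity that ends up on on-demand when spot is unavailable. All input values in the example are illustrative, not measurements.

```python
def blended_savings(spot_share: float, spot_discount: float, fallback_rate: float = 0.0) -> float:
    """Fraction of the on-demand bill saved for a workload subset.

    spot_share:    fraction of the subset's capacity targeted at spot (e.g. 0.75)
    spot_discount: spot discount versus on-demand (e.g. 0.60 for 60% off)
    fallback_rate: hypothetical fraction of the spot target that falls back to on-demand
    """
    for name, value in (("spot_share", spot_share), ("spot_discount", spot_discount),
                        ("fallback_rate", fallback_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    effective_spot_share = spot_share * (1.0 - fallback_rate)
    # On-demand capacity costs 1.0 per unit; spot capacity costs (1 - discount) per unit.
    blended_cost = (1.0 - effective_spot_share) + effective_spot_share * (1.0 - spot_discount)
    return 1.0 - blended_cost


# Illustrative inputs: 75% targeted at spot with a 60% discount.
print(round(blended_savings(0.75, 0.60), 3))        # 0.45 (no fallback)
print(round(blended_savings(0.75, 0.60, 0.20), 3))  # 0.36 (20% of the spot target falls back)
```

Because savings scale with the share of capacity that actually runs on spot, tracking how often workloads fall back to on-demand matters as much as the nominal discount.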
Lever 4: Autoscaling to Match Demand
The Horizontal Pod Autoscaler (HPA) scales pod replicas based on metrics. The most common HPA configuration mistake at mid-market companies is using CPU utilization as the scaling metric for I/O-bound services where CPU is never the bottleneck. For services that wait on database queries or external API calls, CPU stays low regardless of request volume, so the service does not scale when it should.
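The failure mode follows from the HPA's core scaling rule, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), which the control plane skips when the ratio is within a tolerance (0.1 by default) of 1.0. The Python below is a simplified model of that rule that ignores stabilization windows, scaling policies, and pod readiness; the load scenario and metric values are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int = 1, max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale replicas by the ratio of observed to target metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical I/O-bound service under 3x normal load, running 4 replicas.
# CPU stays at 25% against a 70% target, so a CPU-based HPA scales *down*.
print(hpa_desired_replicas(4, current_metric=25, target_metric=70, min_replicas=2, max_replicas=20))    # 2
# Scaling on requests per second per pod (300 observed vs 100 target) tracks the load instead.
print(hpa_desired_replicas(4, current_metric=300, target_metric=100, min_replicas=2, max_replicas=20))  # 12
```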
For I/O-bound services, scale on request rate (from your load balancer), queue depth (from your message broker), or custom application metrics exposed through the Prometheus adapter. KEDA enables event-driven autoscaling based on Kafka consumer lag, SQS queue depth, or other external signals. For any workload that processes queued messages, KEDA produces better scaling behavior than CPU-based HPA and enables scale-to-zero when idle, eliminating the cost of pods sitting at minimum replica count during off-hours.
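A simplified model of queue-driven scaling with scale-to-zero is sketched below: one replica per `messages_per_replica` messages of backlog, clamped to a maximum, and zero replicas when the queue is empty and the minimum is zero. It only approximates the behavior; KEDA itself waits out a cooldown period before scaling to zero and hands scaling between one and N replicas to an HPA, neither of which is modeled here. `messages_per_replica` is a hypothetical per-workload tuning value.

```python
import math


def queue_based_replicas(queue_length: int, messages_per_replica: int,
                         min_replicas: int = 0, max_replicas: int = 20) -> int:
    """Replica count for a queue consumer, allowing scale-to-zero when min_replicas is 0."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    if queue_length == 0:
        return min_replicas  # idle: with min_replicas=0 no pods run at all
    desired = math.ceil(queue_length / messages_per_replica)
    # Any backlog needs at least one consumer, even when min_replicas is 0.
    return max(max(min_replicas, 1), min(max_replicas, desired))


for backlog in (0, 120, 5000):
    print(backlog, queue_based_replicas(backlog, messages_per_replica=50))
# 0 0
# 120 3
# 5000 20
```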
Implementation Sequencing
Days 1-30: Run VPA in recommendation mode. Do not apply changes automatically. After 14 days, review recommendations for your top 10 highest-cost workloads. Apply updated resource requests in a staging environment first, then roll out to production. Measure the reduction in node count after one week.
Days 30-60: Enable Karpenter or equivalent bin packing tooling. Consolidate fragmented node pools. Target: cluster CPU utilization above 55%.
Days 60-90: Migrate batch and non-critical workloads to spot node groups. Run 14 days in parallel with on-demand before directing traffic exclusively to spot. Configure Karpenter to maintain a fallback to on-demand if spot capacity is unavailable.
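Allowing both capacity types in one Karpenter NodePool is one way to get that fallback: when a NodePool permits spot and on-demand, Karpenter prefers spot and launches on-demand when spot capacity is unavailable. The sketch below builds such a NodePool as a Python dict and prints JSON for `kubectl apply -f -`. It assumes Karpenter's v1 API (`karpenter.sh/v1`) and an existing `EC2NodeClass` named `default`; the NodePool name and taint are hypothetical, and field names differ in older beta releases, so check against the installed version.

```python
import json


def interruptible_nodepool(name: str = "interruptible") -> dict:
    """Karpenter v1 NodePool that prefers spot and can fall back to on-demand."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        {
                            # Listing both values lets Karpenter fall back to on-demand
                            # when spot capacity for the requested shapes is unavailable.
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["spot", "on-demand"],
                        }
                    ],
                    # Hypothetical taint: only workloads with a matching toleration land here.
                    "taints": [
                        {"key": "workload-class", "value": "interruptible", "effect": "NoSchedule"}
                    ],
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": "default",  # assumes this EC2NodeClass already exists
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(interruptible_nodepool(), indent=2))
```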
Where This Framework Breaks
Right-sizing breaks for workloads with extreme traffic variance. A service that idles at 0.01 CPU and spikes to 4 CPU in burst mode needs headroom, not reduction. VPA recommendations for these workloads will underestimate peak requirements.
Spot migration is a bad fit for any stateful workload that has not been built for interruption. We have seen companies migrate database-adjacent workloads to spot to reduce costs and create data corruption incidents within the first week.
Karpenter requires EKS 1.24 or later. Teams on older cluster versions need to upgrade before adopting it. Cluster version upgrades on production environments with significant workloads are non-trivial operations that deserve their own planning.
Frequently Asked Questions
What is a realistic cost reduction from Kubernetes right-sizing? In our engagements with mid-market EKS and GKE clusters, right-sizing alone yields 15-25% compute cost reduction. Combining all four levers — right-sizing, improved bin packing, spot migration for eligible workloads, and demand-responsive autoscaling — typically yields 35-55% total reduction compared to unoptimized baselines.
When should we migrate from Cluster Autoscaler to Karpenter on EKS? For any new EKS cluster, use Karpenter from the start. For existing clusters, the migration is worthwhile when you have a platform engineer available for 3-4 days of focused work and your cluster runs at least 20 nodes — below that scale, the bin packing improvements are real but the operational investment is harder to justify.
How do we attribute Kubernetes costs across multiple engineering teams? Namespace-based cost attribution is the standard approach. Kubecost and OpenCost both provide breakdowns by namespace with minimal configuration; OpenCost is fully open-source, and Kubecost, which is built on it, offers a free tier. Both are significantly cheaper than enterprise Kubernetes cost management platforms at mid-market scale.
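The core idea of namespace attribution can be shown with a simplified model: split a node's cost across namespaces in proportion to their CPU requests, and report the unrequested remainder as idle. Kubecost and OpenCost use more detailed models (for example, pricing CPU and memory separately); the function, namespaces, and figures below are hypothetical.

```python
def allocate_node_cost(node_cost: float, allocatable_cpu: float,
                       cpu_requests_by_namespace: dict[str, float]) -> dict[str, float]:
    """Split one node's cost by CPU requests; whatever is unrequested is reported as idle."""
    total_requested = sum(cpu_requests_by_namespace.values())
    if total_requested > allocatable_cpu:
        raise ValueError("requests exceed allocatable CPU")
    allocation = {
        namespace: node_cost * cpu / allocatable_cpu
        for namespace, cpu in cpu_requests_by_namespace.items()
    }
    allocation["idle"] = node_cost - sum(allocation.values())
    return allocation


# Hypothetical 16-core node costing $1,000 per month.
print(allocate_node_cost(1000.0, 16.0, {"payments": 6.0, "search": 4.0}))
# {'payments': 375.0, 'search': 250.0, 'idle': 375.0}
```

Reporting idle cost separately keeps teams from being charged for capacity nobody requested, and makes the bin packing gap visible as its own number.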
Is it safe to run all workloads on spot instances? No. Stateful workloads, services with long warm-up periods, and any service where a 2-minute interruption causes customer-facing issues should run on on-demand or reserved nodes. The economics of spot are compelling for the right workloads; the operational risk is unacceptable for the wrong ones.
If you want a structured review of your EKS or GKE cost baseline and a prioritized optimization plan, contact us for a free Kubernetes cost assessment.

