The Kubernetes Cost Problem
Kubernetes makes it easy to deploy applications. It also makes it easy to waste money. Developers request "generous" CPU and memory for their pods because under-provisioning causes crashes, while over-provisioning has no visible consequence -- until the cloud bill arrives.
Industry surveys consistently report that Kubernetes clusters run at 20-40% average utilization -- meaning as much as 60-80% of your compute spend buys capacity that sits idle. This guide covers practical techniques to close that gap.
Understanding Kubernetes Resource Model
Requests vs Limits
Requests: The guaranteed resources Kubernetes reserves for your pod. This is what the scheduler uses to place pods on nodes. Over-requesting means nodes fill up quickly, forcing you to add more nodes than needed.
Limits: The maximum resources your pod can use. Setting limits too low causes CPU throttling and OOM kills. Setting them too high (or not setting them) allows runaway processes to starve their neighbors on the node.
The gap between requests and actual usage is your optimization opportunity.
The Real Cost Driver
Your cloud bill is based on node count and size, not pod resource requests. But pod requests determine how many pods fit on each node, which determines how many nodes you need.
If your pods request 2 CPU / 4 GB but actually use 0.5 CPU / 1 GB, you are paying for 4x the compute you need.
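To make the arithmetic concrete, here is a rough sketch of how requests translate into node count (the 16 vCPU / 64 GB node shape and the pod numbers are illustrative assumptions):

```python
import math

# Illustrative node shape: 16 vCPU / 64 GB (a typical general-purpose instance).
NODE_CPU, NODE_MEM_GB = 16, 64

def nodes_needed(pod_count, cpu_request, mem_request_gb):
    """Nodes required when the scheduler packs pods by their *requests*."""
    by_cpu = math.ceil(pod_count * cpu_request / NODE_CPU)
    by_mem = math.ceil(pod_count * mem_request_gb / NODE_MEM_GB)
    return max(by_cpu, by_mem)

# 100 pods requesting 2 CPU / 4 GB, versus the same pods rightsized
# to their actual 0.5 CPU / 1 GB footprint:
print(nodes_needed(100, 2.0, 4.0))   # nodes you pay for at the inflated requests
print(nodes_needed(100, 0.5, 1.0))   # nodes you would need at rightsized requests
```

The gap between the two numbers is the bill you can recover without touching a single application.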
Pod Rightsizing
Step 1: Measure Actual Usage
Before changing anything, collect utilization data:
- Deploy Prometheus with kube-state-metrics and node-exporter
- Collect CPU and memory usage per pod over at least 14 days (to capture weekly patterns)
- Track P50, P95, and P99 usage -- not just averages
- Note any periodic spikes (batch jobs, deployments, traffic peaks)
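Once the samples are collected, the percentile tracking can be sketched as follows (the sample values are made up; in practice you would pull them from Prometheus):

```python
# Compute P50/P95/P99 from per-pod CPU samples using a nearest-rank
# percentile. The sample values below are illustrative, not real data.
def percentile(samples, p):
    """Nearest-rank percentile for p in [0, 100]."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[rank]

cpu_samples = [0.18, 0.22, 0.25, 0.31, 0.29, 0.40, 0.95, 0.27, 0.24, 0.33]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(cpu_samples, p):.2f} cores")
```

Note how a single spike (0.95 cores) dominates the P95 while barely moving the average -- which is exactly why averages alone are misleading for sizing.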
Step 2: Identify Over-Provisioned Pods
Look for pods where:
- CPU request is more than 2x the P95 CPU usage
- Memory request is more than 1.5x the P95 memory usage
- Limits are set to values the pod never approaches
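These thresholds can be expressed as a simple screening function (the example pod's numbers are hypothetical):

```python
# Flag pods whose requests exceed the rule-of-thumb multiples of observed P95.
def over_provisioned(cpu_request, cpu_p95, mem_request, mem_p95):
    reasons = []
    if cpu_request > 2 * cpu_p95:
        reasons.append("cpu request > 2x P95 usage")
    if mem_request > 1.5 * mem_p95:
        reasons.append("memory request > 1.5x P95 usage")
    return reasons

# A pod requesting 2 CPU / 4 GiB with an observed P95 of 0.5 CPU / 1 GiB:
print(over_provisioned(2.0, 0.5, 4.0, 1.0))
```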
Step 3: Set Optimal Requests
Recommended formula:
- CPU request: P95 usage + 20% buffer
- Memory request: P95 usage + 25% buffer (memory spikes can cause OOM kills, so be more conservative)
- CPU limit: 2-3x the request (or no limit if your cluster has resource quotas)
- Memory limit: 1.5x the request (hard cap to prevent OOM at the node level)
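The formula above can be sketched as follows (the 2x CPU limit picks the low end of the 2-3x range; the input values are illustrative):

```python
def recommend(cpu_p95, mem_p95_mib):
    """Turn observed P95 usage into requests/limits per the buffers above."""
    cpu_request = cpu_p95 * 1.20         # P95 + 20% buffer
    mem_request = mem_p95_mib * 1.25     # P95 + 25% buffer (OOM kills hurt more than throttling)
    return {
        "cpu_request": round(cpu_request, 3),
        "cpu_limit": round(cpu_request * 2, 3),     # 2x request (low end of the 2-3x range)
        "mem_request_mib": round(mem_request),
        "mem_limit_mib": round(mem_request * 1.5),  # 1.5x request
    }

# P95 of 0.5 cores and 1 GiB observed over the measurement window:
print(recommend(cpu_p95=0.5, mem_p95_mib=1024))
```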
Step 4: Use VPA for Automation
The Vertical Pod Autoscaler (VPA) automates rightsizing:
- Recommendation mode: VPA suggests optimal requests based on observed usage
- Auto mode: VPA automatically adjusts pod requests (requires a pod restart)
- Start with recommendation mode, review suggestions, then enable auto mode for stable workloads
Node Optimization
Right Node Types
Choose instance types that match your workload profile:
- Compute-optimized (c-series): For CPU-intensive workloads (API servers, data processing)
- Memory-optimized (r-series): For memory-heavy workloads (databases, caches, JVM applications)
- General-purpose (m-series): For mixed workloads -- the safest default
- ARM-based (Graviton, Ampere): 20-40% cheaper for compatible workloads
Cluster Autoscaler Configuration
The Cluster Autoscaler adds and removes nodes based on demand. Optimize its configuration:
- Set the scale-down delay to 10 minutes (avoid thrashing)
- Set the scale-down utilization threshold to 50% (remove nodes below 50% usage)
- Use multiple node groups with different instance types for workload diversity
- Set appropriate minimum and maximum node counts per group
Spot/Preemptible Nodes
For fault-tolerant workloads, spot instances offer 60-90% savings:
- Run stateless application pods on spot nodes
- Keep stateful workloads (databases, message queues) on on-demand nodes
- Use pod topology spread constraints to distribute across spot and on-demand nodes
- Configure pod disruption budgets to handle spot interruptions gracefully
Namespace-Level Governance
Resource Quotas
Set resource quotas per namespace to prevent any team from consuming unbounded resources:
- Total CPU and memory requests per namespace
- Maximum number of pods per namespace
- Storage request limits per namespace
Limit Ranges
Set default requests and limits for pods that do not specify them:
- Default CPU request: 100m
- Default memory request: 128Mi
- Maximum CPU per pod: 4 cores
- Maximum memory per pod: 8Gi
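A LimitRange manifest expressing these defaults might look like the sketch below, emitted as JSON (which `kubectl apply` accepts just like YAML). The namespace name and the default *limit* values (500m / 512Mi) are assumptions, since only default requests and maximums are specified above:

```python
import json

# LimitRange covering the defaults above. "defaultRequest" applies when a
# pod omits requests; "default" sets the default limits (assumed values);
# "max" caps any single container. Namespace name is illustrative.
limit_range = {
    "apiVersion": "v1",
    "kind": "LimitRange",
    "metadata": {"name": "defaults", "namespace": "team-a"},
    "spec": {
        "limits": [{
            "type": "Container",
            "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
            "default": {"cpu": "500m", "memory": "512Mi"},
            "max": {"cpu": "4", "memory": "8Gi"},
        }],
    },
}
print(json.dumps(limit_range, indent=2))
```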
This prevents both under-provisioned pods (no requests) and over-provisioned pods (requesting 64 GB for a simple API).
Cost Visibility Tools
Kubecost
Kubecost (built on the open-source OpenCost engine) provides Kubernetes cost monitoring:
- Per-namespace, per-deployment, per-pod cost allocation
- Efficiency metrics (CPU and memory utilization vs. requests)
- Savings recommendations based on actual usage patterns
- Integration with cloud billing for accurate cost attribution
Cloud Provider Tools
- AWS: EKS cost monitoring in Cost Explorer with split cost allocation
- Azure: AKS cost analysis in Azure Cost Management
- GCP: GKE usage metering with BigQuery integration
Quick Wins Checklist
- Remove idle workloads: Delete deployments with zero traffic for 30+ days
- Shut down dev/staging at night: Scale non-production namespaces to zero outside business hours
- Right-size the top 10 pods: Focus on the largest resource consumers first
- Enable Cluster Autoscaler: Ensure nodes are removed when no longer needed
- Add spot nodes: Move stateless workloads to spot instances
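The dev/staging shutdown item above reduces to a simple schedule rule. A minimal sketch, assuming 08:00-19:00 weekday business hours (adjust to your own), which a CronJob or small controller could use to pick the replica count:

```python
from datetime import datetime, time

# Decide the replica count for a non-production workload based on local
# business hours: 08:00-19:00, Monday-Friday (assumed hours, adjust freely).
def desired_replicas(now: datetime, normal_replicas: int) -> int:
    in_hours = now.weekday() < 5 and time(8) <= now.time() < time(19)
    return normal_replicas if in_hours else 0

print(desired_replicas(datetime(2024, 6, 12, 14, 30), 3))  # a Wednesday afternoon
print(desired_replicas(datetime(2024, 6, 12, 23, 0), 3))   # the same Wednesday, late night
```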
Advanced Rightsizing Strategies for Multi-Cluster Environments
Enterprises operating across multiple Kubernetes clusters -- spanning dev, staging, production, and disaster recovery -- face a compounded cost challenge. What works for a single cluster often falls apart when platform teams manage dozens of clusters across AWS EKS, Azure AKS, or GCP GKE.
Federated Cost Policies
Rather than configuring rightsizing rules per cluster, adopt a federated policy model:
- Define organization-wide resource quota templates that enforce baseline efficiency standards
- Use policy-as-code tools like OPA/Gatekeeper or Kyverno to reject pod specs that request more than 4x their historical P95 usage
- Create tiered policies: strict limits for production (where waste is most expensive), relaxed limits for development (where developer velocity matters more)
- Automate policy distribution across clusters through GitOps pipelines, ensuring every new cluster inherits your cost governance from day one
This approach integrates naturally with a broader GitOps workflow using ArgoCD, where cluster configurations and policies are version-controlled and auditable.
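The 4x-P95 rejection rule mentioned above can be illustrated as a plain decision function. In a real cluster this logic would live in a Kyverno or OPA/Gatekeeper policy, so treat this as a sketch of the rule only (the inputs are hypothetical):

```python
# Admission-style check mirroring the 4x-P95 policy: reject pod specs that
# request more than max_multiple times their historical P95 usage.
def admit(requested_cpu: float, historical_p95_cpu: float, max_multiple: float = 4.0) -> bool:
    if historical_p95_cpu <= 0:   # no usage history yet: admit and let VPA observe first
        return True
    return requested_cpu <= max_multiple * historical_p95_cpu

print(admit(requested_cpu=2.0, historical_p95_cpu=0.5))  # exactly 4x: allowed
print(admit(requested_cpu=4.0, historical_p95_cpu=0.5))  # 8x: rejected
```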
Workload-Aware Bin Packing
Default Kubernetes scheduling optimizes for spreading pods across nodes, which improves availability but hurts cost efficiency. For non-critical workloads, consider bin-packing strategies:
- Use pod priority classes to distinguish between latency-sensitive services and background jobs
- Configure the scheduler to prefer filling existing nodes before provisioning new ones
- Run batch and cron workloads on a dedicated node pool with aggressive scale-down (5-minute idle timeout)
- Use pod topology spread constraints selectively -- spread production pods but allow dev pods to pack tightly
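The "prefer filling existing nodes" point maps to the kube-scheduler's `NodeResourcesFit` plugin with `MostAllocated` scoring. A sketch of that configuration follows, built as a plain dictionary and printed as JSON; the profile name is illustrative, and you should verify the apiVersion against your scheduler release:

```python
import json

# kube-scheduler configuration preferring bin packing: score nodes by how
# allocated they already are (MostAllocated) instead of the default spread.
scheduler_config = {
    "apiVersion": "kubescheduler.config.k8s.io/v1",
    "kind": "KubeSchedulerConfiguration",
    "profiles": [{
        "schedulerName": "bin-packing",  # illustrative profile name
        "pluginConfig": [{
            "name": "NodeResourcesFit",
            "args": {
                "scoringStrategy": {
                    "type": "MostAllocated",
                    "resources": [
                        {"name": "cpu", "weight": 1},
                        {"name": "memory", "weight": 1},
                    ],
                },
            },
        }],
    }],
}
print(json.dumps(scheduler_config, indent=2))
```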
Cost Allocation by Team and Business Unit
Rightsizing savings are only meaningful if teams can see the impact. Implement a tagging and labeling strategy that enables per-team cost attribution:
- Enforce mandatory Kubernetes labels: team, environment, cost-center, and application
- Configure Kubecost or your cloud provider's cost tools to aggregate by these labels
- Publish weekly cost-per-team dashboards showing both absolute spend and efficiency ratios
- Set per-team efficiency targets (e.g., minimum 60% CPU utilization across team namespaces)
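Per-team aggregation is straightforward once labels are enforced. A minimal sketch with made-up pod costs, grouping spend and computing a CPU efficiency ratio per `team` label:

```python
from collections import defaultdict

# Aggregate per-pod costs by the mandatory "team" label and compute an
# efficiency ratio (usage / request). Pods and costs here are made up.
pods = [
    {"labels": {"team": "payments"}, "cost": 410.0, "cpu_used": 3.1, "cpu_requested": 8.0},
    {"labels": {"team": "payments"}, "cost": 180.0, "cpu_used": 1.2, "cpu_requested": 2.0},
    {"labels": {"team": "search"},   "cost": 650.0, "cpu_used": 9.0, "cpu_requested": 12.0},
]

spend = defaultdict(float)
used = defaultdict(float)
requested = defaultdict(float)
for pod in pods:
    team = pod["labels"].get("team", "unlabeled")  # surface untagged spend too
    spend[team] += pod["cost"]
    used[team] += pod["cpu_used"]
    requested[team] += pod["cpu_requested"]

for team in spend:
    print(f"{team}: ${spend[team]:.2f}, CPU efficiency {used[team] / requested[team]:.0%}")
```

Publishing both numbers matters: absolute spend tells teams how big their slice is, while the efficiency ratio tells them how much of it is waste.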
Organizations in Europe and the Middle East often run hybrid Kubernetes environments spanning on-premises and cloud. In these cases, normalizing cost metrics across environments is critical -- use a common unit like "cost per request" or "cost per transaction" rather than raw infrastructure spend.
Kubernetes FinOps for Regulated Industries
Enterprises in financial services, healthcare, and government face additional constraints that impact Kubernetes cost optimization decisions.
Compliance Considerations
- Data residency: Pods processing regulated data must run in specific regions, limiting your ability to chase the cheapest compute. Factor this into rightsizing -- you may need to optimize more aggressively within constrained regions
- Audit trails: Every change to resource requests and limits should be tracked. Use admission webhooks to log all pod spec modifications with timestamps and approvers
- Separation of duties: The team recommending rightsizing changes should not be the same team approving them in production. Build approval gates into your CI/CD pipeline
Building a Sustainable Rightsizing Practice
One-time rightsizing delivers short-lived savings. Application resource profiles change with every deployment, and without ongoing discipline, waste creeps back within months. Build sustainability into your practice:
- Schedule quarterly rightsizing reviews aligned with your FinOps review cadence
- Assign a cost champion within each platform team who owns efficiency metrics
- Integrate VPA recommendations into pull request workflows so developers see cost impact before merging
- Track month-over-month efficiency trends, not just point-in-time savings
- Celebrate wins -- teams that achieve meaningful efficiency improvements should be recognized
The most successful enterprises treat Kubernetes cost optimization not as a project but as a continuous engineering discipline embedded into how teams build and operate services.
Building a Kubernetes Cost Optimization Practice
Kubernetes cost optimization is not a one-time exercise. Build it into your engineering culture through these practices:
Monthly rightsizing reviews: Schedule monthly reviews of resource utilization data with each team. Make it a collaborative exercise, not a top-down audit. Share dashboards showing request-to-usage ratios and help teams adjust their resource specifications. Over time, teams internalize the habit of right-sizing their resources proactively.
Cost visibility per namespace: Use tools like Kubecost or OpenCost to break down cluster costs by namespace and label. Share these costs with engineering teams so they understand the financial impact of their architectural decisions. When developers see that their over-provisioned staging environment costs as much as production, they are motivated to optimize.
Automated recommendations: Deploy admission webhooks or policy engines that recommend optimal resource requests based on historical usage patterns. These recommendations surface during pull request reviews, catching over-provisioned resource specifications before they reach production. Combine this with consistent tagging to attribute costs accurately and drive accountability across teams.
At Optivulnix, Kubernetes cost optimization is a specialty within our FinOps practice. We typically find 30-50% savings in Kubernetes infrastructure costs. Contact us for a free cluster cost assessment.

