
Cloud Cost Optimization for Kubernetes: Rightsizing Pods and Nodes

Mohakdeep Singh|November 3, 2025|9 min read

The Kubernetes Cost Problem

Kubernetes makes it easy to deploy applications. It also makes it easy to waste money. Developers request "generous" CPU and memory for their pods because under-provisioning causes crashes, while over-provisioning has no visible consequence -- until the cloud bill arrives.

Studies consistently show that Kubernetes clusters run at 20-40% average utilization. That means 60-80% of your compute spend is waste. This guide covers practical techniques to close that gap.

Understanding the Kubernetes Resource Model

Requests vs Limits

Requests: The guaranteed resources Kubernetes reserves for your pod. This is what the scheduler uses to place pods on nodes. Over-requesting means nodes fill up quickly, forcing you to add more nodes than needed.

Limits: The maximum resources your pod can use. Setting limits too low causes CPU throttling and OOM kills. Setting them too high (or not setting them at all) allows runaway processes to degrade other workloads on the same node.

The gap between requests and actual usage is your optimization opportunity.
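As a concrete sketch, requests and limits are declared per container in the pod spec (the names, image, and values here are illustrative, not recommendations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server           # hypothetical workload
spec:
  containers:
    - name: app
      image: example/api:1.0 # placeholder image
      resources:
        requests:            # what the scheduler reserves on a node
          cpu: 500m
          memory: 1Gi
        limits:              # hard ceiling; exceeding memory triggers an OOM kill
          cpu: "1"
          memory: 1536Mi
```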

The Real Cost Driver

Your cloud bill is based on node count and size, not pod resource requests. But pod requests determine how many pods fit on each node, which determines how many nodes you need.

If your pods request 2 CPU / 4 GB but actually use 0.5 CPU / 1 GB, you are paying for 4x the compute you need.
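The waste in that example can be quantified directly; a minimal sketch of the arithmetic:

```python
# Requested vs. actually used resources from the example above.
requested_cpu, used_cpu = 2.0, 0.5    # cores
requested_mem, used_mem = 4.0, 1.0    # GB

cpu_waste_factor = requested_cpu / used_cpu  # 4.0: paying for 4x the CPU needed
mem_waste_factor = requested_mem / used_mem  # 4.0: same story for memory

# Fraction of reserved CPU that sits idle.
idle_cpu = 1 - used_cpu / requested_cpu      # 0.75

print(cpu_waste_factor, mem_waste_factor, idle_cpu)
```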

Pod Rightsizing

Step 1: Measure Actual Usage

Before changing anything, collect utilization data:

  • Deploy Prometheus with kube-state-metrics and node-exporter
  • Collect CPU and memory usage per pod over at least 14 days (to capture weekly patterns)
  • Track P50, P95, and P99 usage -- not just averages
  • Note any periodic spikes (batch jobs, deployments, traffic peaks)
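With Prometheus in place, per-pod usage percentiles can be pulled with queries along these lines (metric names are the cAdvisor defaults; window sizes are illustrative):

```promql
# P95 CPU usage per pod over 14 days (cores)
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{container!=""}[5m])[14d:5m])

# P95 working-set memory per pod over 14 days (bytes)
quantile_over_time(0.95,
  container_memory_working_set_bytes{container!=""}[14d:5m])
```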

Step 2: Identify Over-Provisioned Pods

Look for pods where:

  • CPU request is more than 2x the P95 CPU usage
  • Memory request is more than 1.5x the P95 memory usage
  • Limits are set to values the pod never approaches
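The request-vs-usage checks above can be sketched as a small helper (the thresholds come from this article; the function and parameter names are hypothetical):

```python
def is_overprovisioned(cpu_request, cpu_p95, mem_request, mem_p95):
    """Flag a pod whose requests far exceed its observed P95 usage.

    cpu_* in cores, mem_* in GB; thresholds: 2x for CPU, 1.5x for memory.
    """
    return cpu_request > 2.0 * cpu_p95 or mem_request > 1.5 * mem_p95

# Example: pod requests 2 cores / 4 GB but P95 usage is 0.5 cores / 1 GB.
print(is_overprovisioned(2.0, 0.5, 4.0, 1.0))  # → True
```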

Step 3: Set Optimal Requests

Recommended formula:

  • CPU request: P95 usage + 20% buffer
  • Memory request: P95 usage + 25% buffer (memory spikes can cause OOM kills, so be more conservative)
  • CPU limit: 2-3x the request (or no limit if your cluster has resource quotas)
  • Memory limit: 1.5x the request (a hard cap to prevent OOM at the node level)
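Those rules of thumb translate into a short sizing function; a sketch using the buffers stated above (the 2x CPU-limit multiplier picks the low end of the 2-3x range):

```python
def recommend(cpu_p95, mem_p95):
    """Return (cpu_request, cpu_limit, mem_request, mem_limit) from P95 usage."""
    cpu_request = cpu_p95 * 1.20   # P95 + 20% buffer
    mem_request = mem_p95 * 1.25   # P95 + 25% buffer (conservative for memory)
    cpu_limit = cpu_request * 2.0  # 2-3x the request; 2x chosen here
    mem_limit = mem_request * 1.5  # hard cap against node-level OOM
    return cpu_request, cpu_limit, mem_request, mem_limit

# Pod with P95 usage of 0.5 cores and 1 GB:
print(recommend(0.5, 1.0))  # → (0.6, 1.2, 1.25, 1.875)
```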

Step 4: Use VPA for Automation

The Vertical Pod Autoscaler (VPA) automates rightsizing:

  • Recommendation mode: VPA suggests optimal requests based on observed usage
  • Auto mode: VPA automatically adjusts pod requests (requires a pod restart)
  • Start with recommendation mode, review the suggestions, then enable auto mode for stable workloads
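A minimal VPA object in recommendation mode might look like this (the target deployment name is hypothetical; `updateMode: "Off"` emits recommendations without restarting pods):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server      # hypothetical deployment
  updatePolicy:
    updateMode: "Off"     # recommendation mode; switch to "Auto" once reviewed
```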

Node Optimization

Right Node Types

Choose instance types that match your workload profile:

  • Compute-optimized (c-series): For CPU-intensive workloads (API servers, data processing)
  • Memory-optimized (r-series): For memory-heavy workloads (databases, caches, JVM applications)
  • General-purpose (m-series): For mixed workloads -- the safest default
  • ARM-based (Graviton, Ampere): 20-40% cheaper for compatible workloads

Cluster Autoscaler Configuration

The Cluster Autoscaler adds and removes nodes based on demand. Optimize its configuration:

  • Set the scale-down delay to 10 minutes (avoid thrashing)
  • Configure the scale-down utilization threshold to 50% (remove nodes below 50% usage)
  • Use multiple node groups with different instance types for workload diversity
  • Set appropriate minimum and maximum node counts per group
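On the Cluster Autoscaler deployment, those settings map to container arguments along these lines (flag names are from the upstream autoscaler; node-group names and counts are illustrative):

```yaml
# Container args on the cluster-autoscaler deployment (sketch)
- --scale-down-delay-after-add=10m
- --scale-down-unneeded-time=10m
- --scale-down-utilization-threshold=0.5
- --balance-similar-node-groups
- --nodes=2:20:general-node-group   # min:max:name, hypothetical group
```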

Spot/Preemptible Nodes

For fault-tolerant workloads, spot instances offer 60-90% savings:

  • Run stateless application pods on spot nodes
  • Keep stateful workloads (databases, message queues) on on-demand nodes
  • Use pod topology spread constraints to distribute across spot and on-demand nodes
  • Configure pod disruption budgets to handle spot interruptions gracefully
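As a sketch, a stateless deployment can tolerate a spot taint and carry a PodDisruptionBudget so reclaims drain gracefully (the taint key, names, and labels are provider-specific placeholders):

```yaml
# Pod template fragment: allow scheduling onto tainted spot nodes
tolerations:
  - key: "node-lifecycle"   # placeholder; use your provider's spot taint key
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
---
# Keep a floor of ready replicas while spot nodes are reclaimed
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb             # hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web              # hypothetical label
```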

Namespace-Level Governance

Resource Quotas

Set resource quotas per namespace to prevent any team from consuming unbounded resources:

  • Total CPU and memory requests per namespace
  • Maximum number of pods per namespace
  • Storage request limits per namespace
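A ResourceQuota covering those three dimensions might look like this (namespace and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a        # hypothetical namespace
spec:
  hard:
    requests.cpu: "40"     # total CPU requests across the namespace
    requests.memory: 80Gi  # total memory requests
    pods: "100"            # maximum pod count
    requests.storage: 500Gi
```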

Limit Ranges

Set default requests and limits for pods that do not specify them:

  • Default CPU request: 100m
  • Default memory request: 128Mi
  • Maximum CPU per pod: 4 cores
  • Maximum memory per pod: 8Gi

This prevents both under-provisioned pods (no requests) and over-provisioned pods (requesting 64 GB for a simple API).
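A LimitRange expressing those defaults and caps might look like this (namespace is illustrative; note that LimitRange defaults and maxima apply per container):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: pod-defaults
  namespace: team-a     # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a pod omits requests
        cpu: 100m
        memory: 128Mi
      max:              # rejects anything larger at admission time
        cpu: "4"
        memory: 8Gi
```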

Cost Visibility Tools

Kubecost

Open-source Kubernetes cost monitoring:

  • Per-namespace, per-deployment, per-pod cost allocation
  • Efficiency metrics (CPU and memory utilization vs. requests)
  • Savings recommendations based on actual usage patterns
  • Integration with cloud billing for accurate cost attribution

Cloud Provider Tools

  • AWS: EKS cost monitoring in Cost Explorer with split cost allocation
  • Azure: AKS cost analysis in Azure Cost Management
  • GCP: GKE usage metering with BigQuery integration

Quick Wins Checklist

  1. Remove idle workloads: Delete deployments with zero traffic for 30+ days
  2. Shut down dev/staging at night: Scale non-production namespaces to zero outside business hours
  3. Right-size the top 10 pods: Focus on the largest resource consumers first
  4. Enable Cluster Autoscaler: Ensure nodes are removed when no longer needed
  5. Add spot nodes: Move stateless workloads to spot instances

At Optivulnix, Kubernetes cost optimization is a specialty within our FinOps practice. We typically find 30-50% savings in Kubernetes infrastructure costs. Contact us for a free cluster cost assessment.
