The Kubernetes Cost Problem
Kubernetes makes it easy to deploy applications. It also makes it easy to waste money. Developers request "generous" CPU and memory for their pods because under-provisioning causes crashes, while over-provisioning has no visible consequence -- until the cloud bill arrives.
Industry studies consistently report that Kubernetes clusters run at 20-40% average utilization, meaning 60-80% of the capacity you pay for sits idle. This guide covers practical techniques to close that gap.
Understanding Kubernetes Resource Model
Requests vs Limits
Requests: The guaranteed resources Kubernetes reserves for your pod. This is what the scheduler uses to place pods on nodes. Over-requesting means nodes fill up quickly, forcing you to add more nodes than needed.
Limits: The maximum resources your pod can use. Setting limits too low causes CPU throttling and OOM kills; setting them too high (or not setting them at all) lets a runaway process starve every other pod on the node.
The gap between requests and actual usage is your optimization opportunity.
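As a concrete illustration, both settings live in the container spec. The workload name, image, and values below are placeholders for a typical web service:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server           # hypothetical workload name
spec:
  containers:
  - name: api
    image: example/api:1.0   # placeholder image
    resources:
      requests:              # guaranteed; the scheduler uses these for placement
        cpu: 500m
        memory: 1Gi
      limits:                # hard ceiling; CPU is throttled, excess memory is OOM-killed
        cpu: "1"
        memory: 1536Mi
```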
The Real Cost Driver
Your cloud bill is based on node count and size, not pod resource requests. But pod requests determine how many pods fit on each node, which determines how many nodes you need.
If your pods request 2 CPU / 4 GB but actually use 0.5 CPU / 1 GB, you are paying for 4x the compute you need.
Pod Rightsizing
Step 1: Measure Actual Usage
Before changing anything, collect utilization data:
- Deploy Prometheus with kube-state-metrics and node-exporter
- Collect CPU and memory usage per pod over at least 14 days (to capture weekly patterns)
- Track P50, P95, and P99 usage -- not just averages
- Note any periodic spikes (batch jobs, deployments, traffic peaks)
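With Prometheus in place, P95 usage can be pulled with queries along these lines (the `namespace` selector and windows are illustrative; adjust to your setup):

```promql
# P95 memory working set per container over 14 days
quantile_over_time(0.95,
  container_memory_working_set_bytes{namespace="prod", container!=""}[14d])

# P95 of the 5m CPU usage rate over 14 days (subquery at 1h resolution)
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{namespace="prod", container!=""}[5m])[14d:1h])
```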
Step 2: Identify Over-Provisioned Pods
Look for pods where:
- CPU request is more than 2x the P95 CPU usage
- Memory request is more than 1.5x the P95 memory usage
- Limits are set to values the pod never approaches
Step 3: Set Optimal Requests
Recommended formula:
- CPU request: P95 usage + 20% buffer
- Memory request: P95 usage + 25% buffer (memory spikes cause OOM kills, so be more conservative)
- CPU limit: 2-3x the request (or no limit if your cluster enforces resource quotas)
- Memory limit: 1.5x the request (a hard cap; a pod exceeding it is OOM-killed before it can destabilize the node)
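Applying the formula to a hypothetical pod whose measured P95 is 400m CPU and 800Mi memory gives roughly:

```yaml
resources:
  requests:
    cpu: 480m        # 400m P95 + 20% buffer
    memory: 1000Mi   # 800Mi P95 + 25% buffer
  limits:
    cpu: "1"         # ~2x the request
    memory: 1500Mi   # 1.5x the request
```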
Step 4: Use VPA for Automation
The Vertical Pod Autoscaler (VPA) automates rightsizing:
- Recommendation mode: VPA suggests optimal requests based on observed usage
- Auto mode: VPA automatically adjusts pod requests (requires pod restart)
- Start with recommendation mode, review suggestions, then enable auto mode for stable workloads
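A minimal VPA object in recommendation-only mode might look like this (the target Deployment name is a placeholder):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server       # hypothetical target workload
  updatePolicy:
    updateMode: "Off"      # recommendation mode: suggest, never evict
```

Switching `updateMode` to `"Auto"` lets VPA evict pods and recreate them with the recommended requests.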
Node Optimization
Right Node Types
Choose instance types that match your workload profile:
- Compute-optimized (c-series): For CPU-intensive workloads (API servers, data processing)
- Memory-optimized (r-series): For memory-heavy workloads (databases, caches, JVM applications)
- General-purpose (m-series): For mixed workloads -- the safest default
- ARM-based (Graviton, Ampere): 20-40% cheaper for compatible workloads
Cluster Autoscaler Configuration
The Cluster Autoscaler adds and removes nodes based on demand. Optimize its configuration:
- Set scale-down delay to 10 minutes (avoid thrashing)
- Configure the scale-down utilization threshold to 50% (remove nodes below 50% usage)
- Use multiple node groups with different instance types for workload diversity
- Set appropriate minimum and maximum node counts per group
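These settings map onto upstream Cluster Autoscaler flags roughly as follows (node-group names and counts are illustrative; verify flag support against your provider and version):

```shell
cluster-autoscaler \
  --scale-down-delay-after-add=10m \
  --scale-down-unneeded-time=10m \
  --scale-down-utilization-threshold=0.5 \
  --nodes=2:20:spot-node-group \
  --nodes=3:10:on-demand-node-group
```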
Spot/Preemptible Nodes
For fault-tolerant workloads, spot instances offer 60-90% savings:
- Run stateless application pods on spot nodes
- Keep stateful workloads (databases, message queues) on on-demand nodes
- Use pod topology spread constraints to distribute across spot and on-demand nodes
- Configure pod disruption budgets to handle spot interruptions gracefully
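A PodDisruptionBudget for a spot-hosted stateless service could look like this (the app label and replica count are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2        # keep at least 2 replicas up through interruptions
  selector:
    matchLabels:
      app: api-server    # hypothetical app label
```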
Namespace-Level Governance
Resource Quotas
Set resource quotas per namespace to prevent any team from consuming unbounded resources:
- Total CPU and memory requests per namespace
- Maximum number of pods per namespace
- Storage request limits per namespace
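A sketch of such a quota, with a hypothetical namespace and limits chosen for illustration:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a        # hypothetical namespace
spec:
  hard:
    requests.cpu: "40"     # total CPU requests across the namespace
    requests.memory: 80Gi  # total memory requests
    pods: "100"            # pod count cap
    requests.storage: 500Gi
```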
Limit Ranges
Set default requests and limits for pods that do not specify them:
- Default CPU request: 100m
- Default memory request: 128Mi
- Maximum CPU per pod: 4 cores
- Maximum memory per pod: 8Gi
This prevents both under-provisioned pods (no requests) and over-provisioned pods (requesting 64 GB for a simple API).
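A LimitRange expressing these defaults might look like this (namespace and the default limit values are illustrative; note this example caps per container rather than per pod):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a        # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:        # applied when a container omits requests
      cpu: 100m
      memory: 128Mi
    default:               # applied when a container omits limits
      cpu: 200m
      memory: 256Mi
    max:                   # hard ceiling per container
      cpu: "4"
      memory: 8Gi
```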
Cost Visibility Tools
Kubecost
Open-source Kubernetes cost monitoring:
- Per-namespace, per-deployment, per-pod cost allocation
- Efficiency metrics (CPU and memory utilization vs. requests)
- Savings recommendations based on actual usage patterns
- Integration with cloud billing for accurate cost attribution
Cloud Provider Tools
- AWS: EKS cost monitoring in Cost Explorer with split cost allocation
- Azure: AKS cost analysis in Azure Cost Management
- GCP: GKE usage metering with BigQuery integration
Quick Wins Checklist
- Remove idle workloads: Delete deployments with zero traffic for 30+ days
- Shut down dev/staging at night: Scale non-production namespaces to zero outside business hours
- Right-size the top 10 pods: Focus on the largest resource consumers first
- Enable Cluster Autoscaler: Ensure nodes are removed when no longer needed
- Add spot nodes: Move stateless workloads to spot instances
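The night-time shutdown item can start as a pair of scheduled commands (namespace name is illustrative; restoring requires tracking the original replica counts separately):

```shell
# Evenings: scale every Deployment in staging to zero
kubectl scale deployment --all --replicas=0 -n staging

# Mornings: restore (a single count only works if all Deployments share it)
kubectl scale deployment --all --replicas=2 -n staging
```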
At Optivulnix, Kubernetes cost optimization is a specialty within our FinOps practice. We typically find 30-50% savings in Kubernetes infrastructure costs. Contact us for a free cluster cost assessment.
