NAT Gateway Cost Optimization for High-Traffic Mid-Market AWS Workloads

NAT Gateway is the silent line item that shows up on the third page of the bill and quietly costs more than the EC2 fleet it serves. For data-heavy mid-market workloads — think 50-500 person companies running event pipelines, ETL jobs, container clusters with chatty sidecars, or analytics platforms — the combination of $0.045 per hour per gateway plus $0.045 per GB processed turns into five to six figures a month with nothing to show for it. The fix is rarely "buy something." It is almost always re-routing traffic that should never have touched NAT in the first place. This piece walks through when NAT Gateway costs become disproportionate, how Gateway VPC Endpoints and Interface VPC Endpoints change the math, when NAT instances are still defensible, and the hub-and-spoke patterns that decide your bill at multi-region scale.

Why NAT Gateway becomes the silent killer

NAT Gateway is one of those AWS services that gets provisioned during the initial VPC build, never gets reviewed, and compounds quietly. Three things conspire to make it expensive for mid-market teams.

First, the per-GB processing fee applies to every byte that flows through it — inbound and outbound. A common misconception is that NAT only charges egress. It does not. Traffic from a private subnet to a public service hits the gateway in both directions, and both directions are billed.

Second, most VPCs route all outbound traffic from private subnets through NAT by default. That includes S3 reads, DynamoDB queries, SQS polling, ECR image pulls, CloudWatch log shipping, and Secrets Manager fetches. These are same-region AWS API calls. None of them need NAT. All of them are billed through it.

Third, mid-market workloads tend to be data-heavy in ways that are invisible to the team that wrote the application. A Kafka consumer that reads from MSK (inside the VPC) and writes enriched events to S3 can push 2-3 TB per day of S3 writes through NAT without a single engineer noticing. A nightly Spark job hitting S3 can spike the bill by 30% in a week.

We have walked into engagements where the NAT Gateway line item was the single largest networking cost on the bill: larger than data transfer out, larger than load balancers, larger than VPN. In one case, $38,000 per month was going to NAT for traffic that was 90% S3 reads from the same region.

The cost math at three traffic tiers

The hourly fee is fixed and minor. The per-GB processing fee is what hurts. Here is the bill at three common mid-market traffic profiles, assuming a typical 3-AZ deployment with three NAT Gateways, a 30-day (720-hour) month, and the standard us-east-1 list rate of $0.045/hour + $0.045/GB processed. Verify in your region: rates differ in Mumbai, Frankfurt, and São Paulo.

Daily traffic through NAT (both directions)	Monthly hourly fees (3 NAT GWs)	Monthly processing cost	Monthly total
100 GB/day	~$97	~$135	~$232
1 TB/day	~$97	~$1,350	~$1,447
10 TB/day	~$97	~$13,500	~$13,597
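The table is simple arithmetic, and reproducing it makes it easy to plug in your own traffic and regional rates. A minimal sketch, assuming the us-east-1 list rates above, a 30-day month (which is what the ~$97 hourly figure implies), and 1 TB = 1,000 GB:

```python
"""Monthly NAT Gateway bill at the list rates used in the table above."""

NAT_HOURLY_RATE = 0.045   # USD per gateway-hour, us-east-1 list rate (assumption elsewhere)
NAT_PER_GB_RATE = 0.045   # USD per GB processed, counting both directions


def nat_monthly_cost(gb_per_day: float, gateways: int = 3, days: int = 30) -> dict:
    """Split a month of NAT spend into hourly fees and processing fees."""
    hourly = gateways * NAT_HOURLY_RATE * days * 24
    processing = gb_per_day * days * NAT_PER_GB_RATE
    return {
        "hourly": round(hourly, 2),
        "processing": round(processing, 2),
        "total": round(hourly + processing, 2),
    }


if __name__ == "__main__":
    # Same tiers as the table: 100 GB, 1 TB, and 10 TB per day.
    for gb_per_day in (100, 1_000, 10_000):
        c = nat_monthly_cost(gb_per_day)
        print(f"{gb_per_day:>6,} GB/day  hourly ${c['hourly']:,.2f}  "
              f"processing ${c['processing']:,.2f}  total ${c['total']:,.2f}")
```

Run as-is, it reports totals of $232.20, $1,447.20, and $13,597.20 for the three tiers, matching the table after rounding.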

At 100 GB/day, NAT is a rounding error and not worth optimizing. At 1 TB/day, it is real money and the VPC Endpoint conversation pays for itself in two to four weeks. At 10 TB/day, it is the most important conversation your platform team can have this quarter.

Most mid-market teams we see fall between 500 GB and 5 TB per day across all workloads, which at the list rates above is roughly $675-$6,750 per month in processing fees alone, before the hourly fees for every gateway in every VPC. The savings from re-routing same-region AWS traffic to endpoints typically land at 60-85% of that figure within six weeks.

When NAT Gateway cost feels disproportionate

A few patterns tell us the bill is bigger than it should be. We look for these in the first cost review of every engagement.

The NAT processing line on Cost Explorer is rising faster than EC2 or RDS spend. That almost always means the application is shipping more data to S3, DynamoDB, or a third-party API and the traffic is going the wrong way.

The ratio of NAT processing GB to EC2 data-out GB is greater than 2:1. EC2 data-out is what your users see. NAT processing is mostly internal AWS API traffic. If internal exceeds external by that margin, you have a routing problem, not a traffic problem.

CloudWatch Logs cost is climbing alongside NAT cost. That is the giveaway that log shipping from private subnets is routing through NAT instead of an Interface Endpoint.

Kubernetes clusters with no VPC Endpoints configured. EKS and self-managed K8s pull container images, ship logs, fetch secrets, and write metrics — all of which go through NAT by default and add up to surprising numbers at scale.

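ECR pulls correlated with deployment activity. Pods that land on a node without the image cached (new nodes, new tags, or imagePullPolicy: Always) pull it again. The registry calls go through NAT unless you have Interface Endpoints for ECR, and the image layers, which ECR serves from S3, go through NAT unless you have the S3 Gateway Endpoint.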

VPC Endpoints: the lever that does most of the work

VPC Endpoints let traffic from your private subnets reach AWS services without traversing the public internet — and without traversing NAT. There are two flavors, and the distinction matters.

Gateway Endpoints (free)

Gateway VPC Endpoints exist for two services: S3 and DynamoDB. They cost nothing — no hourly fee, no per-GB fee. You attach them to a route table and traffic to those services routes through the endpoint instead of the NAT Gateway.

If you take one thing away from this article: every VPC that talks to S3 or DynamoDB from private subnets should have a Gateway Endpoint. There is no scenario where it costs more than not having one. Teams that skip this step are leaving money on the table for no engineering reason. It is a small Terraform change: one aws_vpc_endpoint resource per service, associated with the private route tables.
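For teams that script infrastructure rather than use Terraform, the same change goes through the EC2 CreateVpcEndpoint API. The sketch below is illustrative: the helper names are hypothetical, the IDs are placeholders, and the boto3 call only runs if you invoke create_gateway_endpoints yourself.

```python
"""Sketch: CreateVpcEndpoint requests for the free S3 and DynamoDB Gateway Endpoints."""


def gateway_endpoint_requests(vpc_id: str, route_table_ids: list, region: str,
                              services: tuple = ("s3", "dynamodb")) -> list:
    """Build one create_vpc_endpoint kwargs dict per service (no AWS calls here)."""
    return [
        {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            # Gateway endpoint service names follow com.amazonaws.<region>.<service>.
            "ServiceName": f"com.amazonaws.{region}.{service}",
            # Associating a route table adds the service's prefix-list route to it,
            # so list the route table of every private subnet.
            "RouteTableIds": list(route_table_ids),
        }
        for service in services
    ]


def create_gateway_endpoints(requests: list, region: str) -> list:
    """Submit the requests with boto3 (needs credentials); returns the new endpoint IDs."""
    import boto3  # imported lazily so the builder above works without the AWS SDK

    ec2 = boto3.client("ec2", region_name=region)
    return [ec2.create_vpc_endpoint(**kw)["VpcEndpoint"]["VpcEndpointId"] for kw in requests]


if __name__ == "__main__":
    # Placeholder IDs: substitute your VPC and its private route tables.
    for kw in gateway_endpoint_requests("vpc-0123456789abcdef0",
                                        ["rtb-0aaaaaaaaaaaaaaaa", "rtb-0bbbbbbbbbbbbbbbb"],
                                        "us-east-1"):
        print(kw["ServiceName"], kw["RouteTableIds"])
```

Printing the requests before submitting them is a cheap review step before anything touches routing.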

We have seen this single change cut NAT Gateway processing costs by 40-70% within 24 hours for analytics-heavy workloads.

Interface Endpoints ($0.01/hour + $0.01/GB)

Interface VPC Endpoints use AWS PrivateLink to expose AWS service APIs as ENIs inside your VPC. They are available for most AWS services — SQS, SNS, Secrets Manager, CloudWatch Logs, ECR, KMS, STS, SSM, and dozens more.

The pricing is $0.01 per hour per AZ plus $0.01 per GB processed. That sounds cheap until you remember that a 3-AZ deployment with ten Interface Endpoints costs $0.30/hour just in hourly fees, about $220/month before any traffic. The per-GB fee is 4.5 times lower than NAT processing ($0.01 versus $0.045), which is where the savings come from.

The break-even math is straightforward. An Interface Endpoint pays for itself when the traffic to that service exceeds roughly 200 GB/month per AZ. Below that threshold, you are paying hourly fees for traffic NAT would have handled for less. Above it, every GB saves you $0.035 versus NAT.
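The break-even comes from two numbers: one endpoint's monthly hourly fee per AZ and the per-GB gap between NAT and Interface Endpoint processing. A minimal sketch, assuming the list rates above and a 720-hour month, which puts the threshold at about 206 GB/month per AZ:

```python
"""Break-even traffic for one Interface Endpoint, at the list rates quoted above."""

HOURS_PER_MONTH = 720     # assumption: 30-day month
NAT_PER_GB = 0.045        # USD per GB through NAT Gateway
ENDPOINT_HOURLY = 0.01    # USD per endpoint per AZ per hour
ENDPOINT_PER_GB = 0.01    # USD per GB through the Interface Endpoint


def endpoint_fixed_cost(endpoints: int, azs: int) -> float:
    """Monthly hourly fees for a fleet of Interface Endpoints, before any traffic."""
    return endpoints * azs * ENDPOINT_HOURLY * HOURS_PER_MONTH


def breakeven_gb_per_az() -> float:
    """GB/month per AZ above which the endpoint beats sending the same bytes via NAT."""
    monthly_fee_per_az = ENDPOINT_HOURLY * HOURS_PER_MONTH   # $7.20
    saving_per_gb = NAT_PER_GB - ENDPOINT_PER_GB             # $0.035
    return monthly_fee_per_az / saving_per_gb


def monthly_saving(gb_per_az: float, azs: int = 3) -> float:
    """Net monthly saving for one service's endpoint; negative means it costs more."""
    per_az = gb_per_az * (NAT_PER_GB - ENDPOINT_PER_GB) - ENDPOINT_HOURLY * HOURS_PER_MONTH
    return per_az * azs


if __name__ == "__main__":
    print(f"10 endpoints x 3 AZs: ${endpoint_fixed_cost(10, 3):,.2f}/month fixed")
    print(f"break-even: {breakeven_gb_per_az():.0f} GB/month per AZ")
    print(f"1,000 GB/month per AZ saves ${monthly_saving(1_000):,.2f}/month")
```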

The endpoints worth deploying for most mid-market workloads, in roughly the order they pay back:

S3 Gateway Endpoint: free, deploy unconditionally.
DynamoDB Gateway Endpoint: free, deploy unconditionally if you use DynamoDB.
ECR (api and dkr) Interface Endpoints: pay back fast if you run EKS or ECS. Image layers are served from S3, so pulls also depend on the S3 Gateway Endpoint.
CloudWatch Logs Interface Endpoint: pays back fast for any workload shipping logs from private subnets.
STS Interface Endpoint: low traffic, but improves reliability and removes a NAT dependency for IAM role assumption.
Secrets Manager and SSM Interface Endpoints: modest savings, meaningful reliability improvement.
SQS and SNS Interface Endpoints: pay back fast for message-heavy workloads.

The endpoints worth skipping unless traffic is high: KMS (usually low GB volume), EC2, and Lambda. For low-traffic patterns these are often cheaper to leave on NAT.
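The list and the skip rule reduce to a small decision procedure. The sketch below is a hypothetical planning helper, assuming you have monthly GB per service per AZ from a traffic measurement and using the break-even arithmetic from the pricing above. The service-name suffixes are the standard ones for these services, but confirm availability in your region. Keeping STS regardless of volume reflects the reliability argument in the list.

```python
"""Sketch: pick VPC Endpoints to deploy from measured monthly traffic per AZ."""

# key: (endpoint type, service-name suffixes under com.amazonaws.<region>.)
CATALOG = {
    "s3": ("Gateway", ["s3"]),
    "dynamodb": ("Gateway", ["dynamodb"]),
    "ecr": ("Interface", ["ecr.api", "ecr.dkr"]),   # image pulls use both
    "logs": ("Interface", ["logs"]),
    "sts": ("Interface", ["sts"]),
    "secretsmanager": ("Interface", ["secretsmanager"]),
    "ssm": ("Interface", ["ssm"]),
    "sqs": ("Interface", ["sqs"]),
    "sns": ("Interface", ["sns"]),
    "kms": ("Interface", ["kms"]),
}

# GB/month per AZ at which one Interface Endpoint's hourly fee is covered
# ($0.01/h x 720 h, divided by the $0.035/GB gap versus NAT): about 206 GB.
BREAKEVEN_GB_PER_AZ = 0.01 * 720 / (0.045 - 0.01)


def plan_endpoints(region: str, gb_per_az: dict, always=frozenset({"sts"})) -> list:
    """Return endpoint service names worth deploying.

    gb_per_az maps a CATALOG key to measured GB/month per AZ for that service.
    `always` lists Interface services kept for reliability regardless of volume.
    """
    chosen = []
    for key, (kind, suffixes) in CATALOG.items():
        gb = gb_per_az.get(key, 0.0)
        if kind == "Gateway":
            keep = key == "s3" or gb > 0          # S3 unconditionally, DynamoDB if used
        else:
            # Each endpoint carries its own hourly fee, so the bar scales with the count.
            keep = key in always or gb >= BREAKEVEN_GB_PER_AZ * len(suffixes)
        if keep:
            chosen.extend(f"com.amazonaws.{region}.{s}" for s in suffixes)
    return chosen


if __name__ == "__main__":
    measured = {"ecr": 900, "logs": 300, "kms": 20}   # hypothetical measurements
    for name in plan_endpoints("us-east-1", measured):
        print(name)
```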

When NAT instances are still defensible

AWS recommends NAT Gateway over NAT instances for everyone, and for good reason — managed service, no ops burden, predictable failover. But the recommendation flattens a real tradeoff that matters for small-scale deployments.

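A NAT instance is an EC2 box running iptables. A t4g.small costs $12-15 per month; its network bandwidth is burstable (up to 5 Gbps) with a much lower sustained baseline, so size the instance against steady-state traffic rather than the burst figure. Compared to a NAT Gateway's $33/month hourly fee plus per-GB processing, the savings are clear. Because the instance pays no processing fee, the gap actually widens as traffic grows; what keeps us to workloads under roughly 500 GB/month of NAT traffic is throughput and operational risk, not cost.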
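A minimal cost comparison makes the shape of the tradeoff visible. It assumes one NAT Gateway at the list rates above and one NAT instance at a flat monthly price; the $13.50 default is simply the midpoint of the $12-15 range and should be replaced with current instance plus EBS pricing. Standard data transfer charges apply to both options and are left out.

```python
"""Sketch: one NAT Gateway versus one NAT instance, monthly, at a given traffic level."""

HOURS_PER_MONTH = 720     # assumption: 30-day month
NAT_GW_HOURLY = 0.045     # USD per gateway-hour
NAT_GW_PER_GB = 0.045     # USD per GB processed


def nat_gateway_monthly(gb_per_month: float, gateways: int = 1) -> float:
    """Hourly fees plus processing fees for managed NAT."""
    return gateways * NAT_GW_HOURLY * HOURS_PER_MONTH + gb_per_month * NAT_GW_PER_GB


def nat_instance_monthly(instances: int = 1, flat_monthly: float = 13.50) -> float:
    """Flat instance cost; there is no per-GB processing fee, so traffic does not matter."""
    return instances * flat_monthly


if __name__ == "__main__":
    for gb in (50, 500, 2_000):
        gw, inst = nat_gateway_monthly(gb), nat_instance_monthly()
        print(f"{gb:>5,} GB/month  gateway ${gw:,.2f}  instance ${inst:,.2f}  gap ${gw - inst:,.2f}")
```

At 500 GB/month the sketch gives $54.90 for the gateway against $13.50 for the instance. Nothing in the cost model caps the instance; the limits are bandwidth and who gets paged when it fails.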

We see NAT instances make sense in three scenarios:

Single-AZ dev/staging environments where uptime is not critical. Two NAT Gateways for dev and staging at $66/month base is more than the EC2 spend for the entire environment in many cases. A single t4g.small NAT instance does the job for under $15 and a five-minute restart if it fails.

Edge accounts in landing zone architectures. Sandbox accounts, training accounts, and per-developer accounts in AWS Organizations rarely justify managed NAT. A NAT instance, either shared from the network account or one per account, covers them for a fraction of the cost.

Cost-sensitive workloads with predictable, low traffic. A small SaaS company running a single production environment with no compliance requirement for managed services can run NAT instances reliably with monitoring and an Auto Scaling group of one.

NAT instances stop being defensible at production scale, in regulated environments where the audit trail of a managed service matters, and anywhere the operational overhead would land on a platform team already stretched thin. We do not recommend NAT instances for any client over 100 engineers, ever.

Transit Gateway and the hub-and-spoke math

Once you cross into multi-account or multi-region territory, NAT economics change. Transit Gateway becomes the routing fabric, and the question shifts from "how do I avoid NAT?" to "where should my NAT sit and who pays for the traffic?"

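The pattern that consistently works at mid-market scale: centralized egress. One or two network accounts host the NAT Gateways. All workload accounts route internet-bound traffic through Transit Gateway to the central NAT. Interface Endpoints sit either in the central network VPC (reached over Transit Gateway, with Route 53 private hosted zones associated with the workload VPCs) or in each workload VPC, depending on traffic volume. Gateway Endpoints for S3 and DynamoDB stay in each workload VPC, because they cannot be reached through Transit Gateway.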

The savings come from two places. First, you consolidate three NAT Gateways per workload VPC into three NAT Gateways total across the organization. A 15-account environment with per-VPC NAT goes from 45 gateways at ~$1,460/month in hourly fees alone to three gateways at ~$97/month. Second, centralized monitoring makes traffic anomalies visible across the whole org instead of buried per-account.

The cost you take on: Transit Gateway attachment fees ($0.05/hour per VPC attachment plus $0.02/GB data processed) and the cross-account complexity. The break-even for centralized egress versus per-VPC NAT is roughly 5-8 accounts with non-trivial traffic. Below that, the simplicity of per-VPC NAT usually wins. Above it, centralized egress wins clearly.
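The break-even also moves with traffic: centralization saves gateway-hours, but every internet-bound GB now pays Transit Gateway processing on top of NAT processing. A rough model, assuming three NAT Gateways per VPC in the per-VPC case; three central gateways plus one extra attachment for the egress VPC in the centralized case; a 720-hour month; and the list rates above. It ignores attachments your workload VPCs may already pay for other reasons, and cross-AZ charges.

```python
"""Sketch: per-VPC NAT versus centralized egress through Transit Gateway."""

HOURS = 720                                   # assumption: 30-day month
NAT_HOURLY, NAT_PER_GB = 0.045, 0.045         # NAT Gateway list rates
TGW_ATTACH_HOURLY, TGW_PER_GB = 0.05, 0.02    # Transit Gateway list rates


def per_vpc_nat(vpcs: int, gb: float, gws_per_vpc: int = 3) -> float:
    """Every workload VPC runs its own NAT Gateways; gb is total monthly NAT traffic."""
    return vpcs * gws_per_vpc * NAT_HOURLY * HOURS + gb * NAT_PER_GB


def centralized_egress(vpcs: int, gb: float, central_gws: int = 3) -> float:
    """One egress VPC; workload VPCs plus the egress VPC each attach to Transit Gateway."""
    nat = central_gws * NAT_HOURLY * HOURS + gb * NAT_PER_GB
    tgw = (vpcs + 1) * TGW_ATTACH_HOURLY * HOURS + gb * TGW_PER_GB
    return nat + tgw


def breakeven_gb(vpcs: int) -> float:
    """Monthly GB below which centralized egress is cheaper (0 means it never is)."""
    fixed_gap = per_vpc_nat(vpcs, 0) - centralized_egress(vpcs, 0)
    return max(fixed_gap, 0.0) / TGW_PER_GB


if __name__ == "__main__":
    for n in (2, 5, 8, 15):
        print(f"{n:>2} VPCs: centralized wins below {breakeven_gb(n):,.0f} GB/month")
```

Under these assumptions, five workload VPCs favor centralized egress only while combined NAT traffic stays under about 8,640 GB/month, and two VPCs never do. That is one more reason to move AWS-bound traffic onto endpoints before centralizing.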

For workloads spanning multiple regions (which we cover in our multi-region deployment guide for India), the conversation gets harder. Traffic over Transit Gateway peering is billed per GB as inter-region data transfer, on top of Transit Gateway processing where it applies, and even cross-AZ traffic within a region is not free. The rule we use: keep NAT egress to the internet in each region (do not peer for internet egress), and use Transit Gateway peering only for application traffic that has to cross regions.

A six-week implementation sequence

The optimization plays out in a predictable order. The first three steps recover most of the savings; the last three lock in the new normal.

Week 1: Measure. Pull the CloudWatch metrics BytesOutToDestination and BytesInFromDestination for each NAT Gateway. Enable VPC Flow Logs if they are not already on, and query the top destination prefixes by traffic volume. The goal is a one-page summary: where the traffic is going, how much of it is AWS API traffic, and what the monthly cost trajectory looks like. Cross-reference with Cost Explorer at hourly granularity. Cost Explorer keeps hourly data for only the last 14 days, and hourly granularity must be enabled in its settings first, so do this early.
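For the CloudWatch half of this step, both metrics live in the AWS/NATGateway namespace under a NatGatewayId dimension. The sketch below pulls daily sums with boto3's get_metric_statistics and turns them into GB/day and a projected processing cost. Treating BytesOutToDestination plus BytesInFromDestination as the billed volume is our approximation, the gateway ID you pass is yours, and the fetch only runs when you call it.

```python
"""Sketch: turn NAT Gateway CloudWatch byte counts into a processing-cost trajectory."""
from datetime import datetime, timedelta, timezone

METRICS = ("BytesOutToDestination", "BytesInFromDestination")
NAT_PER_GB = 0.045  # us-east-1 list rate; an assumption for other regions


def fetch_daily_bytes(nat_gateway_id: str, days: int = 14, region: str = "us-east-1") -> dict:
    """Return {metric: [daily Sum values]} for one gateway (needs boto3 and credentials)."""
    import boto3  # lazy import so summarize() below runs without the SDK

    cw = boto3.client("cloudwatch", region_name=region)
    end = datetime.now(timezone.utc)
    out = {}
    for metric in METRICS:
        resp = cw.get_metric_statistics(
            Namespace="AWS/NATGateway",
            MetricName=metric,
            Dimensions=[{"Name": "NatGatewayId", "Value": nat_gateway_id}],
            StartTime=end - timedelta(days=days),
            EndTime=end,
            Period=86400,            # one datapoint per day
            Statistics=["Sum"],
        )
        out[metric] = [dp["Sum"] for dp in resp["Datapoints"]]
    return out


def summarize(daily_bytes: dict) -> dict:
    """Average GB/day across both directions and the implied 30-day processing cost."""
    days = max((len(v) for v in daily_bytes.values()), default=0) or 1
    total_gb = sum(sum(v) for v in daily_bytes.values()) / 1e9   # decimal GB
    gb_per_day = total_gb / days
    return {"gb_per_day": gb_per_day, "monthly_processing": gb_per_day * 30 * NAT_PER_GB}
```

Calling summarize(fetch_daily_bytes(...)) per gateway and comparing the results week over week gives the cost trajectory for the one-page summary.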

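Week 2: Deploy Gateway Endpoints. S3 and DynamoDB endpoints in every VPC that talks to those services. No cost and immediate savings, with little risk: before switching, check for bucket policies that allow access by public source IP (aws:SourceIp), because requests through the endpoint no longer come from the NAT's public address, and expect in-flight connections to S3 to be interrupted when the routes change. Associate the endpoints with the route table of every private subnet. Verify the traffic shift with VPC Flow Logs the next day.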

Week 3: Deploy high-volume Interface Endpoints. ECR, CloudWatch Logs, and SSM are the consistent winners. Deploy them in every AZ for every VPC that hits those services materially. The hourly fee is the cost of admission; the per-GB savings pay back within weeks.

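Week 4: Address EKS and container egress specifically. Kubernetes clusters need extra attention because images, logs, and secrets all flow through NAT by default. With private DNS enabled on the ECR api and dkr endpoints, containerd and other CRIs resolve the standard ECR hostnames to the endpoints' private IPs, so no runtime configuration change is usually needed; verify the resolution from a node. Confirm that Fluent Bit, or whatever ships your logs, is routing through the CloudWatch Logs Endpoint, not NAT.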
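One way to run that check from a node or a debug pod: if a service hostname resolves to private addresses, the Interface Endpoint's private DNS is in effect; if it resolves to public IPs, that traffic is still leaving through NAT. A standard-library sketch; the account ID and region you pass are yours, and the hostname patterns are the usual ECR and CloudWatch Logs ones.

```python
"""Sketch: check that AWS service hostnames resolve to private (endpoint) addresses."""
import ipaddress
import socket


def resolve(hostname: str) -> list:
    """All IPv4 addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list) -> bool:
    """True only if there is at least one address and every one is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


def endpoint_hostnames(region: str, account_id: str) -> list:
    return [
        f"api.ecr.{region}.amazonaws.com",
        f"{account_id}.dkr.ecr.{region}.amazonaws.com",
        f"logs.{region}.amazonaws.com",
    ]


def check(region: str, account_id: str) -> dict:
    """Resolve each hostname and report whether it points at endpoint IPs."""
    report = {}
    for host in endpoint_hostnames(region, account_id):
        try:
            ips = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            report[host] = f"lookup failed: {exc}"
            continue
        report[host] = "endpoint" if all_private(ips) else f"public, via NAT: {ips}"
    return report
```

Running check("us-east-1", "<your account ID>") inside the cluster should report "endpoint" for every hostname once the endpoints are in place.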

Week 5: Audit remaining NAT traffic. Re-run the Week 1 analysis. Whatever is left in NAT is genuine internet egress (third-party APIs, public package mirrors, OS updates). At this point the bill should be down 60-80% from where it started. Decide whether to consolidate to centralized egress via Transit Gateway based on the account count and remaining traffic volume.

Week 6: Lock in the new normal. Add a detective control, such as an AWS Config custom rule or a scheduled check, plus tag-based budget alerts, to catch new VPCs deployed without Gateway Endpoints; Service Control Policies can deny API actions but cannot detect a missing resource. Add the endpoint deployment to the standard VPC Terraform module so future accounts inherit the configuration. Document the routing decisions in a one-page architecture note for the next engineer who joins.
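A scheduled script is one simple form of that detective control. The sketch below works on the response shapes of the EC2 DescribeVpcs and DescribeVpcEndpoints calls; the function names, and running it from Lambda, CI, or cron, are our assumptions, and the same logic could sit inside an AWS Config custom rule.

```python
"""Sketch: flag VPCs that lack a Gateway Endpoint for S3 (and optionally DynamoDB)."""


def missing_gateway_endpoints(vpcs: list, endpoints: list, region: str,
                              services: tuple = ("s3",)) -> dict:
    """Map VpcId to the required services that have no available Gateway Endpoint.

    `vpcs` and `endpoints` have the shape of the 'Vpcs' and 'VpcEndpoints' lists
    returned by EC2 DescribeVpcs and DescribeVpcEndpoints.
    """
    present = {
        (ep["VpcId"], ep["ServiceName"])
        for ep in endpoints
        if ep.get("VpcEndpointType") == "Gateway"
        and str(ep.get("State", "")).lower() == "available"
    }
    gaps = {}
    for vpc in vpcs:
        missing = [s for s in services
                   if (vpc["VpcId"], f"com.amazonaws.{region}.{s}") not in present]
        if missing:
            gaps[vpc["VpcId"]] = missing
    return gaps


def fetch(region: str) -> tuple:
    """Pull both lists with boto3 paginators (needs credentials)."""
    import boto3  # lazy import so the pure check above runs without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    vpcs = [v for page in ec2.get_paginator("describe_vpcs").paginate() for v in page["Vpcs"]]
    eps = [e for page in ec2.get_paginator("describe_vpc_endpoints").paginate()
           for e in page["VpcEndpoints"]]
    return vpcs, eps
```

A non-empty result from missing_gateway_endpoints(*fetch(region), region) is the signal to alert on.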

For the FinOps program that surrounds this work, see our FinOps savings guide and our FinOps framework for 50-500 person companies, both of which cover the operational cadence that keeps these savings from quietly disappearing six months later.

Honest tradeoffs

A few things this approach gives up that are worth naming.

Operational complexity increases. Every Interface Endpoint is another resource to monitor, another dependency in failure modes, another line in your Terraform. For a 5-VPC environment with 10 endpoints each, you are now managing 50 Interface Endpoints across the org. The savings are real, but so is the surface area.

Endpoint policies need attention. Interface Endpoints support policies that restrict which principals and resources can use them. Defaulting to wide-open policies is normal at first; tightening them under DPDPA or SOC 2 audit is a project in itself.
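A common first tightening step is to admit only principals from your own AWS Organization through the endpoint. The sketch below generates such a policy document; the organization ID is a placeholder, aws:PrincipalOrgID is a standard global condition key, and whether this fits your audit scope is your call. Test before rollout: unauthenticated requests carry no organization ID and would be refused.

```python
"""Sketch: an endpoint policy that only admits principals from one AWS Organization."""
import json


def org_only_endpoint_policy(org_id: str, actions=("*",), resources=("*",)) -> str:
    """Return the policy as JSON, ready to pass as PolicyDocument to the endpoint."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOnlyOurOrganization",
                "Effect": "Allow",
                "Principal": "*",
                "Action": list(actions),
                "Resource": list(resources),
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(org_only_endpoint_policy("o-exampleorgid"))  # placeholder organization ID
```

Narrowing `actions` and `resources` per endpoint is the next step once the org-level policy has run cleanly for a while.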

Centralized egress concentrates risk. A misconfigured central NAT account can take down internet access for every workload. The blast radius is wider than per-VPC NAT. This is solvable with multi-region active-active central egress, but the architecture is harder.

Some services do not have Interface Endpoints. Third-party SaaS APIs, public package mirrors, and OS update endpoints will always route through NAT. After all the optimization, you will still need NAT Gateway for the residual traffic.

Where this framework breaks

The math we used assumes us-east-1 list pricing. It does not survive contact with certain edge cases.

Heavy egress to the public internet (not AWS services) is unaffected by VPC Endpoints. If your workload streams video to consumers or pushes large files to third-party APIs, NAT processing cost is the price of doing business and the optimization is at the application layer, not the network layer.

Workloads with strict latency SLAs may not tolerate the small additional hop through an Interface Endpoint. We have not seen this matter in practice for mid-market workloads, but for sub-millisecond trading systems it is worth measuring.

Organizations with a strong "no managed services" culture sometimes resist Interface Endpoints for the same reason they resist NAT Gateway. The argument we use: PrivateLink is the lower-risk option here. The data plane stays inside AWS, and the audit story is materially better than internet egress through NAT.

Greenfield deployments where the entire VPC architecture is being designed from scratch should put endpoints into the base Terraform module from day one. For everyone else the retrofit is straightforward: the best time to deploy a Gateway Endpoint was day one, and the second-best time is now.

FAQ

How much can a typical mid-market AWS team save on NAT Gateway costs?

Across the FinOps engagements we have run for 50-500 person companies, we see NAT Gateway cost reductions of 50-80% within six weeks. The dollar value ranges from $1,000-$5,000 per month for smaller workloads to $20,000-$50,000 per month for data-heavy SaaS and analytics platforms. The variance is driven by how much of the traffic is same-region AWS API traffic versus genuine internet egress.

Are Gateway VPC Endpoints really free, with no catch?

Yes, for S3 and DynamoDB: no hourly fee and no per-GB processing fee. The only operational cost is the route table change. The catches, if you want to call them that, are that Gateway Endpoints exist only for those two services and only serve same-region traffic from inside the VPC; they cannot be reached from peered VPCs, Transit Gateway, or on-premises networks. Everything else uses Interface Endpoints with the standard $0.01/hour + $0.01/GB pricing.

When should we still use NAT instances instead of NAT Gateway?

NAT instances make sense for single-AZ dev and staging environments under roughly 500 GB/month of NAT traffic, for edge accounts in landing zones, and for very cost-sensitive small deployments where the team is willing to operate them. They stop making sense at production scale, in regulated industries, or for any team without spare bandwidth to monitor and maintain an EC2-based network device.

Does Transit Gateway always save money over per-VPC NAT Gateway?

No. Transit Gateway adds attachment fees ($0.05/hour per VPC) and data processing fees ($0.02/GB). The break-even for centralized egress via Transit Gateway is roughly 5-8 workload accounts with non-trivial traffic. Below that scale, per-VPC NAT with VPC Endpoints is simpler and often cheaper. Above it, centralized egress wins on both cost and operability.

How do we measure NAT Gateway traffic by source and destination?

VPC Flow Logs are the primary tool. Enable them at the VPC or subnet level, ship to S3 or CloudWatch Logs, and query with Athena. The fields you care about are srcaddr, dstaddr, and bytes. Group by destination prefix to identify AWS services. Cross-reference destination IPs against AWS IP ranges (the published JSON) to separate AWS API traffic from genuine internet egress. Most of the savings sit in the first category.
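The Athena query boils down to the grouping below. A standard-library sketch, assuming flow log records in the default version 2 format (space-separated, with srcaddr, dstaddr, and bytes in fields 4, 5, and 10) and a local copy of the published ip-ranges.json. It attributes each destination to the most specific matching AWS prefix, preferring a named service over the AMAZON catch-all, and counts everything else as internet egress. The linear prefix scan is fine for a sample; use Athena for full volumes.

```python
"""Sketch: split flow-log bytes into AWS-service traffic versus genuine internet egress."""
import ipaddress
import json
from collections import Counter


def load_aws_prefixes(ip_ranges_json: str) -> list:
    """Parse ip-ranges.json into (network, service, region) tuples, most specific first."""
    data = json.loads(ip_ranges_json)
    nets = [(ipaddress.ip_network(p["ip_prefix"]), p["service"], p["region"])
            for p in data["prefixes"]]
    # Longest prefix first; at equal length, named services sort before AMAZON.
    nets.sort(key=lambda t: (-t[0].prefixlen, t[1] == "AMAZON"))
    return nets


def classify(dst: str, nets: list) -> str:
    ip = ipaddress.ip_address(dst)
    if ip.is_private:
        return "private (in-VPC)"
    for net, service, region in nets:
        if ip in net:
            return f"{service} {region}"
    return "internet"


def bytes_by_destination(flow_log_lines, nets: list) -> Counter:
    """Sum the bytes field per destination class, skipping headers and NODATA/SKIPDATA."""
    totals = Counter()
    for line in flow_log_lines:
        fields = line.split()
        if len(fields) < 14 or fields[4] == "-" or not fields[9].isdigit():
            continue
        totals[classify(fields[4], nets)] += int(fields[9])
    return totals
```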

What is the right NAT Gateway architecture for a 50-200 person company?

For most teams in this range: per-VPC NAT Gateways (three for HA across AZs), Gateway Endpoints for S3 and DynamoDB unconditionally, Interface Endpoints for ECR, CloudWatch Logs, and SSM, and NAT instances only for non-production sandboxes. Move to centralized egress through Transit Gateway when you hit 5-8 workload accounts. Re-evaluate annually as the bill grows.

If you want help running this analysis on your environment — pulling the Flow Log data, building the savings model, and sequencing the implementation — our FinOps practice does this routinely for mid-market AWS teams. A typical engagement returns the fees within the first month from NAT Gateway and adjacent network egress optimizations alone.