Cloud Cost Optimization Audit (representative engagement, anonymized)

AWS cost audit: Series B AI startup, $38.4K → $22.4K monthly

Monthly AWS spend fell 42% — from $38.4K to $22.4K — for a net $192K annualized reduction on a $750 engagement.

Duration: 2-week engagement, $750 flat fee
Monthly spend before: $38.4K
Monthly spend after: $22.4K
Annualized savings: $192K
  • AWS
  • Cost Optimization
  • FinOps
  • GPU
  • RAG

The Setup

Where the work started

A Series B AI startup was burning $38,400 a month on AWS and could not articulate where the money was going. Finance saw a line item climbing. Engineering saw a platform that felt appropriately sized. Neither side had the numbers to close the loop.

The team did not want a managed-service relationship or a six-week engagement. They wanted a fast, bounded read — what is actually being spent, what is waste, and what would a finance-defensible next move look like — before their next board update.

What had to be true

  • Produce a prioritized waste map of monthly AWS spend inside two weeks, flat $750 engagement fee.
  • Tie every recommendation to real line items in the bill, not generic best practices.
  • Leave the team with remediation owners and a savings runway they could execute without further consulting.

What I Did

The architecture

I read Cost Explorer, CUR data, and the live account together rather than sequentially, so that every anomaly could be traced from the invoice down to the resource and back up to the team that owned it. No agentic rewrites, no landing-zone side quests; just the bill, the workloads, and a short list of moves ranked by blast radius per dollar saved.
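For illustration, the first pass over the bill can be a single Cost Explorer query grouped by service and ranked by spend. A minimal boto3 sketch, not the engagement's actual query; the dates and the top-10 cutoff are placeholders.

```python
# Sketch: first-pass spend map from Cost Explorer, grouped by service.
# Assumes credentials with ce:GetCostAndUsage; dates are illustrative.
import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer is served from us-east-1

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Rank services by spend so the biggest line items get audited first.
rows = []
for group in resp["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    rows.append((amount, service))

for amount, service in sorted(rows, reverse=True)[:10]:
    print(f"${amount:>10,.2f}  {service}")
```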

  1. GPU right-sizing on the RAG inference path

     The inference tier was pinned to on-demand p4d instances sized for a peak load that never arrived. Downshifted to g5 for steady state with burst headroom, moved non-latency-sensitive batch embeddings to Spot, and locked in savings-plan commits on the baseline (sketch 1 below). Single largest line-item move.

  2. EKS idle-node and autoscaler audit

     Node groups were sized for the 99th-percentile traffic shape, not the median. Tightened HPA and Cluster Autoscaler thresholds, killed two standing node groups that were 30% utilized (sketch 2 below), and collapsed two overlapping clusters into one.

  3. S3 storage class and lifecycle cleanup

     Training artifact buckets were sitting on S3 Standard with no lifecycle policy. Pushed cold artifacts to Intelligent-Tiering and Glacier Instant Retrieval with a 30-day transition rule (sketch 3 below). Deleted 11 TB of abandoned intermediate artifacts after confirming ownership.

  4. OpenSearch domain consolidation

     Three dev/staging domains were running production-class instance sizes. Consolidated into a single right-sized domain (sketch 4 below) with dedicated masters only where actually needed, and moved log retention to a cheaper tier behind a clear query SLA.

  5. Bedrock and data-transfer quick wins

     Caught two cross-region traffic patterns inflating egress charges (sketch 5 below) and routed them to the primary region. Swapped a heavy, rarely needed model on a low-volume path for a cheaper default with the expensive model kept behind an explicit opt-in.
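Sketch 1: for the savings-plan commit in step 1, Cost Explorer can propose a baseline directly. A minimal sketch; the one-year, no-upfront term and 30-day lookback here are illustrative assumptions, not the client's actual commit.

```python
# Sketch 1: ask Cost Explorer for a Savings Plans baseline recommendation.
# Term, payment option, and lookback are assumptions for illustration.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

rec = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType="COMPUTE_SP",       # covers EC2 (incl. GPU), Fargate, Lambda
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="THIRTY_DAYS",  # commit against observed baseline, not peak
)

# Summary may be empty if there is nothing worth committing to.
summary = rec["SavingsPlansPurchaseRecommendation"].get(
    "SavingsPlansPurchaseRecommendationSummary", {})
print("Hourly commit:", summary.get("HourlyCommitmentToPurchase"))
print("Est. monthly savings:", summary.get("EstimatedMonthlySavingsAmount"))
```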
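Sketch 2: the idle-node check in step 2 reduces to comparing summed pod CPU requests against each node's allocatable capacity. A rough heuristic using the official kubernetes Python client; the 30% cutoff mirrors the utilization figure in step 2, and everything else is illustrative.

```python
# Sketch 2: flag nodes whose requested CPU is well below allocatable.
# Heuristic only; requests, not live usage, and the 30% cutoff is assumed.
from collections import defaultdict
from kubernetes import client, config

def cpu_millicores(q: str) -> int:
    # Parse common CPU quantity forms: "500m" -> 500, "2" -> 2000, "0.5" -> 500.
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

config.load_kube_config()
v1 = client.CoreV1Api()

requested = defaultdict(int)
for pod in v1.list_pod_for_all_namespaces().items:
    if pod.spec.node_name and pod.status.phase == "Running":
        for c in pod.spec.containers:
            reqs = c.resources.requests if c.resources else None
            cpu = (reqs or {}).get("cpu")
            if cpu:
                requested[pod.spec.node_name] += cpu_millicores(cpu)

for node in v1.list_node().items:
    alloc = cpu_millicores(node.status.allocatable["cpu"])
    util = requested[node.metadata.name] / alloc
    if util < 0.3:  # candidate for consolidation / autoscaler scale-down
        print(f"{node.metadata.name}: {util:.0%} of allocatable CPU requested")
```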
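Sketch 3: the lifecycle rule behind step 3, roughly as it would be expressed in boto3. Bucket name, prefix, and the seven-day multipart cleanup are placeholders; buckets with unpredictable access patterns got Intelligent-Tiering instead of the Glacier IR transition shown here.

```python
# Sketch 3: 30-day transition of cold training artifacts to Glacier IR.
# Bucket and prefix are placeholders, not the client's actual names.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-training-artifacts",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-artifacts-to-glacier-ir",
                "Status": "Enabled",
                "Filter": {"Prefix": "artifacts/"},
                "Transitions": [
                    # Untouched for 30 days -> Glacier Instant Retrieval.
                    {"Days": 30, "StorageClass": "GLACIER_IR"}
                ],
                # Housekeeping: stop paying for stalled multipart uploads.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```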
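Sketch 4: the consolidation in step 4 ends as a single configuration update on the surviving domain. Instance type and count here are illustrative, not the client's actual sizing.

```python
# Sketch 4: right-size the consolidated non-prod OpenSearch domain.
# Domain name, instance type, and count are placeholders.
import boto3

aos = boto3.client("opensearch")

aos.update_domain_config(
    DomainName="example-dev-search",
    ClusterConfig={
        "InstanceType": "r6g.large.search",
        "InstanceCount": 2,
        "DedicatedMasterEnabled": False,  # not needed at dev/staging scale
    },
)
```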
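Sketch 5: the cross-region egress in step 5 surfaces in Cost Explorer as region-pair usage types. The substring match and the $100 floor below are heuristics, not an exact filter.

```python
# Sketch 5: surface the data-transfer usage types behind egress spend.
# Usage-type naming varies; substring matching is a rough first pass.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    # Cross-region transfer shows up as region-pair usage types,
    # e.g. "USE1-USW2-AWS-Out-Bytes".
    if "AWS-Out-Bytes" in usage_type and amount > 100:
        print(f"${amount:,.2f}  {usage_type}")
```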

Outcome

What actually happened

Monthly AWS spend dropped from $38,400 to $22,400 — a 42% reduction — for $192,000 in annualized savings on a $750 engagement. A 256× ROI before any ongoing optimization.

Monthly spend before: $38.4K
Monthly spend after: $22.4K
Annualized savings: $192K
ROI on engagement: 256×
  • Every remediation landed with a named engineering owner and a one-page written runbook.
  • Savings-plan posture moved from reactive to intentional, with a baseline/burst split documented per workload.
  • Storage footprint fell 34% in the first 30 days post-engagement, without any change to model quality or training throughput.
  • Finance got a per-workload cost breakdown usable directly in the next board deck.

Why it matters

The parts another team can take

  • Read the bill and the cluster together, not sequentially. Every anomaly in Cost Explorer should be traceable to a live resource inside the same hour.
  • Size for median load with burst headroom, not for the 99th percentile. Spot and savings plans do the rest.
  • Storage and egress are almost always underinspected. Lifecycle policies pay back faster than any compute change.

Stack

  • AWS
  • EKS
  • EC2 GPU (p4d / g5)
  • OpenSearch
  • S3 Intelligent-Tiering
  • Bedrock
  • Cost Explorer / CUR
  • Savings Plans

Next step

Want a similar read on your stack?

Start with a $249 Architecture Review, or book a 30-min discovery call for larger scope.

Public summary. Client-confidential specifics are not published. Figures reflect the engagement outcome as delivered.