Cloud Cost Optimization Audit (representative engagement, anonymized)

AWS cost audit: Series B AI startup, $38.4K → $22.4K monthly

Monthly AWS spend fell 42% — from $38.4K to $22.4K — for a net $192K annualized reduction on a $750 engagement.

Duration: 2-week engagement, $750 flat fee
Monthly spend before: $38.4K
Monthly spend after: $22.4K
Annualized savings: $192K
  • AWS
  • Cost Optimization
  • FinOps
  • GPU
  • RAG

The Setup

Where the work started

A Series B AI startup was burning $38,400 a month on AWS and could not articulate where the money was going. Finance saw a line item climbing. Engineering saw a platform that felt appropriately sized. Neither side had the numbers to close the loop.

The team did not want a managed-service relationship or a six-week engagement. They wanted a fast, bounded read — what is actually being spent, what is waste, and what would a finance-defensible next move look like — before their next board update.

What had to be true

  • Produce a prioritized waste map of monthly AWS spend inside two weeks, flat $750 engagement fee.
  • Tie every recommendation to real line items in the bill, not generic best practices.
  • Leave the team with remediation owners and a savings runway they could execute without further consulting.

What I Did

The architecture

I read Cost Explorer, CUR data, and the live account together rather than sequentially, so that every anomaly could be traced from the invoice down to the resource and back up to the team that owned it. No agentic rewrites, no landing-zone side quests; just the bill, the workloads, and a short list of moves ranked by blast radius per dollar saved.
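For illustration, the first pass over the bill can be a single Cost Explorer query grouped by service and ranked by spend. A minimal boto3 sketch, not the engagement's actual query; the dates and the top-10 cutoff are placeholders.

```python
# Sketch: first-pass spend map from Cost Explorer, grouped by service.
# Assumes credentials with ce:GetCostAndUsage; dates are illustrative.
import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer is served from us-east-1

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Rank services by spend so the biggest line items get audited first.
rows = []
for group in resp["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    rows.append((amount, service))

for amount, service in sorted(rows, reverse=True)[:10]:
    print(f"${amount:>10,.2f}  {service}")
```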

  1. GPU right-sizing on the RAG inference path

     The inference tier was pinned to on-demand p4d instances sized for a peak load that never arrived. Downshifted to g5 for steady state with burst headroom, moved non-latency-sensitive batch embeddings to Spot, and locked in savings-plan commits on the baseline (sketch 1 below). Single largest line-item move.

  2. EKS idle-node and autoscaler audit

     Node groups were sized for the 99th-percentile traffic shape, not the median. Tightened HPA and Cluster Autoscaler thresholds, killed two standing node groups that were 30% utilized (sketch 2 below), and collapsed two overlapping clusters into one.

  3. S3 storage class and lifecycle cleanup

     Training artifact buckets were sitting on S3 Standard with no lifecycle policy. Pushed cold artifacts to Intelligent-Tiering and Glacier Instant Retrieval with a 30-day transition rule (sketch 3 below). Deleted 11 TB of abandoned intermediate artifacts after confirming ownership.

  4. OpenSearch domain consolidation

     Three dev/staging domains were running production-class instance sizes. Consolidated into a single right-sized domain (sketch 4 below) with dedicated masters only where actually needed, and moved log retention to a cheaper tier behind a clear query SLA.

  5. Bedrock and data-transfer quick wins

     Caught two cross-region traffic patterns inflating egress charges (sketch 5 below) and routed them to the primary region. Swapped a heavy, rarely needed model on a low-volume path for a cheaper default with the expensive model kept behind an explicit opt-in.
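Sketch 1: for the savings-plan commit in step 1, Cost Explorer can propose a baseline directly. A minimal sketch; the one-year, no-upfront term and 30-day lookback here are illustrative assumptions, not the client's actual commit.

```python
# Sketch 1: ask Cost Explorer for a Savings Plans baseline recommendation.
# Term, payment option, and lookback are assumptions for illustration.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

rec = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType="COMPUTE_SP",       # covers EC2 (incl. GPU), Fargate, Lambda
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="THIRTY_DAYS",  # commit against observed baseline, not peak
)

# Summary may be empty if there is nothing worth committing to.
summary = rec["SavingsPlansPurchaseRecommendation"].get(
    "SavingsPlansPurchaseRecommendationSummary", {})
print("Hourly commit:", summary.get("HourlyCommitmentToPurchase"))
print("Est. monthly savings:", summary.get("EstimatedMonthlySavingsAmount"))
```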
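Sketch 2: the idle-node check in step 2 reduces to comparing summed pod CPU requests against each node's allocatable capacity. A rough heuristic using the official kubernetes Python client; the 30% cutoff mirrors the utilization figure in step 2, and everything else is illustrative.

```python
# Sketch 2: flag nodes whose requested CPU is well below allocatable.
# Heuristic only; requests, not live usage, and the 30% cutoff is assumed.
from collections import defaultdict
from kubernetes import client, config

def cpu_millicores(q: str) -> int:
    # Parse common CPU quantity forms: "500m" -> 500, "2" -> 2000, "0.5" -> 500.
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

config.load_kube_config()
v1 = client.CoreV1Api()

requested = defaultdict(int)
for pod in v1.list_pod_for_all_namespaces().items:
    if pod.spec.node_name and pod.status.phase == "Running":
        for c in pod.spec.containers:
            reqs = c.resources.requests if c.resources else None
            cpu = (reqs or {}).get("cpu")
            if cpu:
                requested[pod.spec.node_name] += cpu_millicores(cpu)

for node in v1.list_node().items:
    alloc = cpu_millicores(node.status.allocatable["cpu"])
    util = requested[node.metadata.name] / alloc
    if util < 0.3:  # candidate for consolidation / autoscaler scale-down
        print(f"{node.metadata.name}: {util:.0%} of allocatable CPU requested")
```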
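Sketch 3: the lifecycle rule behind step 3, roughly as it would be expressed in boto3. Bucket name, prefix, and the seven-day multipart cleanup are placeholders; buckets with unpredictable access patterns got Intelligent-Tiering instead of the Glacier IR transition shown here.

```python
# Sketch 3: 30-day transition of cold training artifacts to Glacier IR.
# Bucket and prefix are placeholders, not the client's actual names.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-training-artifacts",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-artifacts-to-glacier-ir",
                "Status": "Enabled",
                "Filter": {"Prefix": "artifacts/"},
                "Transitions": [
                    # Untouched for 30 days -> Glacier Instant Retrieval.
                    {"Days": 30, "StorageClass": "GLACIER_IR"}
                ],
                # Housekeeping: stop paying for stalled multipart uploads.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```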
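Sketch 4: the consolidation in step 4 ends as a single configuration update on the surviving domain. Instance type and count here are illustrative, not the client's actual sizing.

```python
# Sketch 4: right-size the consolidated non-prod OpenSearch domain.
# Domain name, instance type, and count are placeholders.
import boto3

aos = boto3.client("opensearch")

aos.update_domain_config(
    DomainName="example-dev-search",
    ClusterConfig={
        "InstanceType": "r6g.large.search",
        "InstanceCount": 2,
        "DedicatedMasterEnabled": False,  # not needed at dev/staging scale
    },
)
```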
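Sketch 5: the cross-region egress in step 5 surfaces in Cost Explorer as region-pair usage types. The substring match and the $100 floor below are heuristics, not an exact filter.

```python
# Sketch 5: surface the data-transfer usage types behind egress spend.
# Usage-type naming varies; substring matching is a rough first pass.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    usage_type = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    # Cross-region transfer shows up as region-pair usage types,
    # e.g. "USE1-USW2-AWS-Out-Bytes".
    if "AWS-Out-Bytes" in usage_type and amount > 100:
        print(f"${amount:,.2f}  {usage_type}")
```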

Outcome

What actually happened

Monthly AWS spend dropped from $38,400 to $22,400 — a 42% reduction — for $192,000 in annualized savings on a $750 engagement. A 256× ROI before any ongoing optimization.

Monthly spend before: $38.4K
Monthly spend after: $22.4K
Annualized savings: $192K
ROI on engagement: 256×
  • Every remediation landed with a named engineering owner and a one-page written runbook.
  • Savings-plan posture moved from reactive to intentional, with a baseline/burst split documented per workload.
  • Storage footprint fell 34% in the first 30 days post-engagement, without any change to model quality or training throughput.
  • Finance got a per-workload cost breakdown usable directly in the next board deck.

Why it matters

The parts another team can take

  • Read the bill and the cluster together, not sequentially. Every anomaly in Cost Explorer should be traceable to a live resource inside the same hour.
  • Size for median load with burst headroom, not for the 99th percentile. Spot and savings plans do the rest.
  • Storage and egress are almost always underinspected. Lifecycle policies pay back faster than any compute change.

Stack

  • AWS
  • EKS
  • EC2 GPU (p4d / g5)
  • OpenSearch
  • S3 Intelligent-Tiering
  • Bedrock
  • Cost Explorer / CUR
  • Savings Plans

Next step

Want a similar read on your stack?

Start with a $249 Architecture Review, or book a 30-min discovery call for larger scope.

Public summary. Client-confidential specifics are not published. Figures reflect the engagement outcome as delivered.