Case Studies
Selected work, sanitized.
Public summaries of prior engagements, written in STAR format. Outcomes are quantified. Client-confidential specifics are not published.
At a glance
- Industry
- Healthcare — regional urgent-care network
- Timeframe
- 90-day design, build, and phased rollout
- Environment
- Azure, HIPAA-regulated, EMR-integrated
Situation
Where the work started
A regional urgent-care network was losing patients at the front door. Peak-hour intake lines backed up at multiple clinics, and the average patient wait time had crept to 94 minutes. Every minute beyond the 30-minute mark correlated with higher abandonment and lower clinical-satisfaction scores.
The network could not meaningfully expand headcount in the near term. Any solution had to run entirely inside a HIPAA posture — PHI encrypted in transit and at rest, Business Associate Agreements honored at every boundary, and auditable logs retained for the statutory window.
Task
What had to be true at the end
- Cut the average patient wait time by more than half without adding intake staff.
- Integrate with the existing EMR so the IVR could authenticate patients, look up upcoming appointments, and route callbacks back into the scheduling system of record.
- Ship to production inside 90 days and pass a full HIPAA audit before the first patient call hit the new system.
Action
How we delivered it
Delivered as a three-tier flow (triage → intake routing → callback queue) on Azure, scoped tightly to the intake-call domain. No general-purpose assistant, no clever edge cases — just the top intents that covered the overwhelming majority of call volume.
01
Narrow NLU, built for intake
Trained an intent model focused on the top intake paths (appointment, prescription, triage escalation, billing, transfer). Refused to automate long-tail intents on day one — staff kept the long tail, the model kept its accuracy.
02
HIPAA-scoped architecture
VNet-isolated inference endpoints behind Private Link, Key Vault-managed PHI encryption keys, audit logs written to immutable storage with 7-year retention, and BAA boundaries verified at every service edge.
03
EMR integration via private scheduling API
Patient authentication and appointment lookup ran over a Private Link path to the EMR's scheduling API. Callback routing wrote back into the same system of record so staff never saw a second queue to reconcile.
04
Graceful human handoff
A break-glass escalation path was built in from day one. If confidence dropped or the patient asked for a person, the call routed to the existing intake team with full conversational context attached — no repeat, no friction.
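The escalation rule above can be sketched in a few lines. This is illustrative only: the intent set, the 0.75 confidence floor, and every name here are assumptions for the sketch, not the production values.

```python
from dataclasses import dataclass, field

# Illustrative break-glass routing rule: escalate to a human whenever model
# confidence drops below a floor, the caller asks for a person, or the intent
# falls outside the supported intake set. Threshold and names are assumptions.

CONFIDENCE_FLOOR = 0.75
SUPPORTED_INTENTS = {"appointment", "prescription", "triage", "billing", "transfer"}

@dataclass
class CallState:
    transcript: list = field(default_factory=list)  # running conversational context
    intent: str = "unknown"
    confidence: float = 0.0
    asked_for_human: bool = False

def route(call: CallState) -> str:
    """Return 'automate' or 'handoff'. A handoff carries the full transcript
    so the intake team never asks the patient to repeat themselves."""
    if call.asked_for_human:
        return "handoff"
    if call.confidence < CONFIDENCE_FLOOR:
        return "handoff"
    if call.intent not in SUPPORTED_INTENTS:
        return "handoff"  # long-tail intents stay with humans
    return "automate"
```

The key design point is that every branch except the last one routes to a person: the system defaults to the human path and has to earn automation.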
05
Phased rollout with real load
Load-tested at 3× projected peak before go-live. Rolled out to two pilot clinics first, tightened the intent catalogue on real traffic, then extended to the full network.
Result
What actually happened
Average wait time dropped from 94 minutes to 22 minutes — a 77% reduction — without any change to staffing levels.
- 77%
- Wait time reduction
- 18% → 4%
- Abandonment drop
- 0
- Audit findings on launch
- 90 days
- Delivery window
- Call abandonment rate fell from 18% to 4% within the first 90 days of full rollout.
- Intake staff hours reallocated from phone queue triage to higher-touch patient-facing work.
- HIPAA audit passed on first review with no findings on the new system boundary.
- 99.95% system availability across the first six months post-launch.
What's portable
The parts another team can take
- Start with the 20% of intents that cover 80% of calls. Long-tail intents belong to humans until the model earns them.
- Scope PHI inside the inference boundary from the first line of code. Retrofitting compliance is where schedule and budget go to die.
- Build the break-glass handoff before the first patient call. Trust in the system grows faster than trust in any single model.
Stack
- Azure
- AI / NLP
- IVR
- HIPAA
- EMR integration
- Private Link
- Key Vault
Public summary. Client-confidential specifics are not published. Figures reflect the engagement outcome as delivered.
At a glance
- Industry
- Homebuilding, construction, media, enterprise software
- Timeframe
- Multi-year delivery across dozens of accounts
- Environment
- AWS, Well-Architected Framework, multi-account landing zones
Situation
Where the work started
An AWS partner ecosystem needed scalable architecture review delivery across mid-market and enterprise accounts. Generic reviews — checklists, screenshots, thirty-page PDFs — were not moving the needle. Findings were landing as shelfware.
Partners needed technical architecture support credible enough to close deals and retain accounts, with cost and risk findings that executive buyers could actually act on.
Task
What had to be true at the end
- Deliver Well-Architected reviews across dozens of customer accounts with consistent depth and quality.
- Translate every finding into business-language impact — dollars saved, risk avoided, delivery unblocked.
- Build reusable playbooks so partner teams could carry remediation after the review, not stall once the formal engagement closed.
Action
How we delivered it
Applied the AWS Well-Architected Framework's six pillars — operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability — to every engagement. Treated each finding as a delivery contract with the customer's engineering team, not a slide in a deck.
01
Pillar-based review with evidence
Each pillar review produced concrete evidence: IAM policy extracts, VPC topology diagrams, cost attribution reports, incident history. No finding left the engagement without an artifact behind it.
02
Prioritization by blast-radius × effort
Findings were ranked by likely blast radius (security, reliability, cost) weighed against remediation effort. High-impact, low-effort items ran first. Shelfware died early.
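One way to operationalize that ranking is a simple score of blast radius over remediation effort, so high-impact, low-effort findings sort first. The scale values below are placeholder assumptions for the sketch, not the engagement's actual rubric.

```python
# Illustrative prioritization sketch: score = blast radius / remediation effort.
# Scale weights are made-up placeholders, not the real scoring rubric.

BLAST_RADIUS = {"low": 1, "medium": 3, "high": 9}  # security/reliability/cost impact
EFFORT = {"low": 1, "medium": 2, "high": 4}        # remediation effort

def priority(finding: dict) -> float:
    return BLAST_RADIUS[finding["blast_radius"]] / EFFORT[finding["effort"]]

def rank(findings: list[dict]) -> list[dict]:
    """Highest score first: high impact and low effort float to the top."""
    return sorted(findings, key=priority, reverse=True)
```

For example, a high-impact, low-effort item (a public S3 bucket) outranks a high-impact, high-effort one (a VPC redesign), which in turn outranks low-impact hygiene work.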
03
Business-language executive summaries
Every review produced a separate executive summary written for the business sponsor — no jargon, no screenshots. Dollar impact, risk exposure, and a three-decision recommendation. Technical detail lived in a paired engineering document.
04
Reusable remediation playbooks
Built a library of playbooks across common remediation patterns: multi-account landing zones, IAM segmentation, VPC rearchitecture, cost-attribution tags, data-lake governance. Partner teams used them to finish what the review started.
05
Named remediation owners per finding
Each prioritized finding left the engagement with a named owner on the customer side. No anonymous handoffs. Remediation completion rates moved accordingly.
Result
What actually happened
$60M+ in partner contracts closed, $200M+ in customer cloud spend influenced, and cost reductions ranging from 20% to 70% delivered across engagements.
- $60M+
- Partner contracts closed
- $200M+
- Customer cloud spend influenced
- 20–70%
- Cost reduction range
- ~75% @ 6 mo
- Remediation completion
- Typical engagement produced 30–45% cloud cost reduction within six months of remediation.
- Average review-to-remediation cycle shortened to roughly six weeks.
- Remediation completion rate held above 75% inside six months post-engagement — well above the industry norm.
- Review templates and playbooks became standard delivery artifacts across the partner organization.
What's portable
The parts another team can take
- Tie every technical finding to a dollar or risk number. Otherwise it will not get prioritized, no matter how correct it is.
- Write two documents — executive and engineering — not one hybrid. Different readers, different decisions, different failure modes.
- Leave every prioritized finding with a named remediation owner on the customer side before the engagement closes.
Stack
- AWS
- Well-Architected Framework
- IAM
- VPC design
- Cost allocation tags
- Landing zones
At a glance
- Industry
- Professional sports — enterprise AI
- Timeframe
- Phased production rollout on Azure
- Environment
- Azure ML, AKS, 8× NVIDIA A100, PyTorch DDP/FSDP, vLLM
Situation
Where the work started
A professional sports organization needed production-ready GenAI infrastructure. The existing ML platform was not built for LLM-scale training or inference, and cost attribution was opaque — leadership had no way to evaluate whether a given LLM-powered feature was carrying its weight.
The environment was Azure-first. Any platform built here had to respect existing enterprise agreements, security postures, and identity boundaries.
Task
What had to be true at the end
- Stand up an LLMOps platform capable of distributed training, production inference with bounded latency, and end-to-end observability.
- Expose per-request cost attribution so feature teams could be charged back accurately and leadership could evaluate ROI at the individual feature level.
- Keep the platform defensible from a security standpoint: private networking, scoped identity, auditable model storage, and reproducible training runs.
Action
How we delivered it
Architected and deployed an LLMOps platform on Azure with a clean separation between the training tier, inference tier, and the observability and cost-attribution layer that spans both. Every design choice was tested against a single question: does this make per-request cost visible?
01
Distributed training on Azure ML
PyTorch DDP for smaller models and FSDP for models above the single-GPU memory ceiling, running across 8× NVIDIA A100s on Azure ML compute. Checkpoint strategy tuned so failure-restart economics did not dominate wall-clock cost.
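The DDP-versus-FSDP decision point can be estimated with back-of-envelope arithmetic: DDP replicates full training state on every GPU, so once weights, gradients, and optimizer state exceed single-GPU memory, sharding with FSDP is required. The fp32 byte counts and Adam-style optimizer-state multiplier below are standard rules of thumb, not measurements from this engagement.

```python
# Rough sketch of the DDP-vs-FSDP memory decision. Assumes fp32 training
# (4 bytes per value) and an Adam-style optimizer (2x parameter count for
# momentum + variance). These are textbook estimates, not measured numbers.

def estimated_train_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    optim = 2 * n_params * bytes_per_param  # Adam: momentum + variance state
    return weights + grads + optim

def choose_strategy(n_params: int, gpu_mem_bytes: int, headroom: float = 0.7) -> str:
    """Pick 'DDP' when full training state fits on one GPU (with headroom left
    for activations), else 'FSDP' to shard state across the process group."""
    budget = gpu_mem_bytes * headroom
    return "DDP" if estimated_train_bytes(n_params) <= budget else "FSDP"
```

On an 80 GB A100 this heuristic keeps a 1B-parameter model on DDP (about 16 GB of state) but pushes a 7B-parameter model (about 112 GB of state) to FSDP, which matches the split described above.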
02
vLLM inference on AKS
vLLM with paged attention deployed on Azure Kubernetes Service, with separate request classes for latency-sensitive traffic and batch workloads. Autoscaling driven by request-class-aware metrics, not raw CPU.
03
Per-request cost attribution
Custom middleware tagged every inference with tenant, feature, model, and a cost envelope. A daily pipeline aggregated usage into feature-level chargeback reports — the first time leadership could ask ROI questions at that granularity.
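The tag-and-aggregate pattern might look like this minimal sketch. The per-1K-token rates, model names, and field schema are placeholders, not the platform's real pricing or data model.

```python
from collections import defaultdict

# Minimal sketch of per-request cost attribution: tag each inference event
# with tenant, feature, and model plus a computed cost, then roll usage up
# into feature-level chargeback totals. Rates and names are placeholders.

RATE_PER_1K_TOKENS = {"small-model": 0.002, "large-model": 0.06}  # assumed $/1K tokens

class CostLedger:
    def __init__(self):
        self.events = []

    def record(self, tenant: str, feature: str, model: str, tokens: int) -> float:
        """Tag one inference with its dimensions and cost envelope."""
        cost = tokens / 1000 * RATE_PER_1K_TOKENS[model]
        self.events.append({"tenant": tenant, "feature": feature,
                            "model": model, "tokens": tokens, "cost": cost})
        return cost

    def chargeback_by_feature(self) -> dict:
        """Daily-style rollup: total cost per feature for chargeback reports."""
        totals = defaultdict(float)
        for e in self.events:
            totals[e["feature"]] += e["cost"]
        return dict(totals)
```

The point of the sketch is the granularity: because every event carries tenant, feature, and model, the same ledger can answer chargeback questions along any of those dimensions.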
04
End-to-end observability
MLflow for experiment and model-lineage tracking, Azure Monitor and Prometheus for infra and request metrics, and structured logs traceable from request ID to GPU-second. Drift and quality gates ran on a held-out evaluation set on a fixed cadence.
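The fixed-cadence quality gate reduces to a regression check against the last accepted baseline. The tolerance below is an assumed value for illustration, not the gate's real threshold.

```python
# Illustrative quality gate for the fixed-cadence evaluation runs: a candidate
# passes only if its held-out score does not regress more than a tolerance
# below the last accepted baseline. The 0.02 tolerance is an assumption.

def quality_gate(baseline_score: float, current_score: float,
                 max_regression: float = 0.02) -> bool:
    """True if the candidate passes the gate; False blocks promotion."""
    return current_score >= baseline_score - max_regression
```

Run as a gate on every promotion rather than quarterly, this is what keeps quality regressions from compounding silently.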
05
Security posture by default
Private Endpoints on model and artifact storage, Key Vault-backed secrets, RBAC-scoped compute, and reproducible training runs anchored to specific commits, datasets, and compute SKUs.
Result
What actually happened
Production GenAI deployed on Azure with full cost observability, including per-request cost attribution that enabled ROI evaluation at the individual feature level.
- Per request
- Cost attribution granularity
- 8× A100
- Training hardware
- vLLM on AKS
- Inference runtime
- MLflow-tracked
- Experiment lineage
- Monthly chargeback reports became a standard input to feature prioritization — features were evaluated on ROI, not enthusiasm.
- Inference latency SLAs held at target p95 under production load across request classes.
- Training wall-clock time dropped meaningfully after FSDP + activation-checkpointing tuning.
- Full reproducibility — every model in production traced to a specific commit, dataset version, and compute configuration.
What's portable
The parts another team can take
- Bake per-request cost into the inference path from day one. Retrofitting cost attribution onto an LLM system that was not designed for it is an entire second project.
- FSDP pays off once models cross the single-GPU memory ceiling. Below that line, DDP is simpler and usually faster.
- Treat LLM evaluation as a CI gate, not a quarterly exercise. Quality regressions compound silently otherwise.
Stack
- Azure ML
- Azure Kubernetes Service
- PyTorch DDP / FSDP
- vLLM
- NVIDIA A100
- MLflow
- Azure Monitor
- Private Endpoints
Next step
Work like this, applied to your situation.
Start with a $249 Architecture Review, or book a 15-minute fit check for larger scope.
