What if every AI question you asked ran through Microsoft’s cash register?
When Microsoft announced it would host Grok on Azure AI Foundry, it wasn’t merely another cloud partnership - it was a declaration of intent. In FY 25 Q3, Microsoft Cloud revenue surged past $42 billion (up 22% in constant currency), with Azure and other cloud services growing 33% (35% in constant currency). Fully 16 percentage points of that 33% came from AI services - nearly half of Azure’s growth, and proof that inference is now the driver of cloud economics. Meanwhile, xAI walked away from a reported $10 billion Oracle training deal, yet Microsoft doubled down on inference, onboarding Grok alongside OpenAI’s GPT models, Gemini and DeepSeek’s R1 in mere days - ensuring every model call rings its cash register.
Why Grok? Capturing Inference $$$ at Scale
The training-versus-inference distinction isn’t academic. Training new LLMs demands massive capital outlays - GPU fleets, power, cooling - often with uneven utilization. Inference, by contrast, powers every user interaction: chat prompts, code completions, document summaries. Satya Nadella made the stakes explicit: “Cloud and AI are the essential inputs for every business to expand output, reduce costs, and accelerate growth.”
In FY 25 Q3, Microsoft reported its cost per token has more than halved, while AI performance across its blended fleet rose 30% at constant power (iso-power). By turning Grok into a plug-and-play inference endpoint, Azure locks in those high-margin API calls. xAI’s Oracle detour underlines the choice every AI lab faces: chase training capacity or monetize billions of monthly inference requests. Microsoft made its bet - on predictable, recurring inference revenue.
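Here is what “plug-and-play” looks like in practice - a minimal sketch using the azure-ai-inference Python SDK, where the endpoint URL and the "grok-3" deployment name are placeholders you would swap for your own Foundry resource:

import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Endpoint and deployment name are illustrative placeholders.
client = ChatCompletionsClient(
    endpoint="https://YOUR-RESOURCE.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
)

response = client.complete(
    model="grok-3",  # the Foundry deployment name for the Grok model
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Why does inference drive cloud revenue?"),
    ],
)
print(response.choices[0].message.content)

Every one of those complete() calls lands on an Azure meter.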
Azure’s Neutral‑Ground Strategy: A Universal Inference Marketplace
Rather than tether itself to one AI lab, Azure positions itself as the neutral host for any model that drives usage. Azure AI Foundry processed 100 trillion tokens this quarter (5× YoY) across more than 70,000 enterprises - a pipeline broad enough that every LLM, from OpenAI’s GPT to Google’s Gemini to xAI’s Grok, flows through Microsoft’s infrastructure. This “model-agnostic” play isn’t about loyalty; it’s about owning the platform where every interaction occurs - and every dollar is billed.
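The model-agnostic pitch is visible in the SDK itself. Reusing the client from the sketch above, switching labs is just a different model argument (the deployment names here are again illustrative):

# Same client, different lab - deployment names are illustrative.
for model_name in ("grok-3", "gpt-4o", "DeepSeek-R1"):
    reply = client.complete(
        model=model_name,
        messages=[UserMessage(content="One sentence on cloud lock-in.")],
    )
    print(f"{model_name}: {reply.choices[0].message.content}")

Whichever name you pass, the token meter - and the invoice - is Microsoft’s.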
Proof in the Numbers: Inflection Points in the AI Economy
Microsoft’s FY 25 Q3 earnings deck lays it bare:
Microsoft Cloud: $42 B revenue, +22% CCY
Azure & other cloud: +33% (35% CCY), with 16 pp from AI services
GitHub Copilot: 15 million users (4× YoY)
Microsoft 365 Copilot: seats tripled, with deal sizes and expansions accelerating
Copilot Studio: 230,000+ organizations building agents; 1 million custom agents created this quarter (up 130% QoQ)
Power BI, Fabric & Data Services: Fabric paid customers up 80% YoY (21,000+), PostgreSQL usage growing across 60% of the Fortune 500, Cosmos DB accelerating
Commercial bookings: +18% (17% CCY), fueling a $315 B remaining performance obligation (up 34%)
Operating margins: improved to 46%, with operating expenses up just 2% even while AI capex ramped
Every one of those usage events - every code suggestion, financial report, slide deck or chatbot interaction - flows through Azure’s billing meters. AI is the new revenue engine.
Locking In Workflows from Startups to Enterprises
Embedding AI into workflows - from rapid‑prototype startups to global enterprises automating supply chains - is now table stakes. Yet when the critical services powering those workflows all run on Azure, the switching costs multiply:
Data gravity: Migrating petabytes from Azure Storage to another cloud can take months and cost millions (see the back-of-envelope sketch below).
Pipeline dependency: Rewriting CI/CD pipelines built on Azure DevOps for AWS CodePipeline means retraining teams and absorbing operational risk.
Compliance recertification: Every security accreditation - SOC 2, ISO 27001 - must be redone if you move platforms.
That’s why incumbents have the unfair advantage: they built the rails, optimized them for AI, and charge you each time you ride.
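To get a rough sense of the data-gravity bill, here is a back-of-envelope sketch. The per-GB rate is an assumption for illustration (internet egress is tiered, roughly in the $0.05–0.09/GB range), and it excludes engineering time, recertification and dual-running costs, which usually dominate:

# Back-of-envelope egress cost for pulling data out of Azure.
# The $/GB rate is an illustrative assumption; real tiers vary.
PB_IN_GB = 1_000_000                 # 1 PB ~= one million GB (decimal)
ASSUMED_EGRESS_USD_PER_GB = 0.08     # assumed blended rate

def egress_cost_usd(petabytes: float) -> float:
    return petabytes * PB_IN_GB * ASSUMED_EGRESS_USD_PER_GB

for pb in (1, 5, 20):
    print(f"{pb:>2} PB -> ${egress_cost_usd(pb):,.0f} in egress fees alone")

At tens of petabytes, bandwidth alone approaches seven figures - before a single pipeline is rewritten or a single audit rerun.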
Platform Neutrality: A Veneer Over a Unified Revenue Funnel
Microsoft’s marketing touts “bring any LLM, plug it in, go” - an appealing promise of choice. In practice, the act of plugging in locks you into one value loop:
Storage: Your data sits in Azure Blob or Data Lake.
Compute: Inference runs on Azure Machine Learning and AI Foundry.
Pipelines: You orchestrate with Azure DevOps and Azure Pipelines.
Automation: Citizen developers build flows in Power Platform.
Billing: Every API call, every compute cycle, every automation run is charged through Azure’s metering system.
True neutrality requires seamless portability - shifting data, billing and workflows between clouds with near-zero friction. Azure offers branded neutrality: an illusion masking a single, unified revenue funnel under Microsoft’s control.
The Road Ahead: Breaking (or Embracing) the Lock‑In
The fight to become the backend OS for AI agents is just beginning. Emerging multi-cloud inference fabrics (e.g., Fly.io’s edge LLM serving, ThirdBrain.ai’s abstraction layers) promise portability; open-source orchestrators (like Endo.ai, Krusty) aim to let you swap LLM providers on the fly. But the depth of integration that platforms like Azure deliver - security, compliance, performance optimizations and pricing efficiencies - presents a formidable barrier.
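The orchestrator idea reduces to a familiar pattern: keep application code against a provider-neutral interface and confine the cloud choice to a single line. A minimal sketch - the provider classes here are hypothetical stubs, not any named project’s API:

from typing import Protocol

class LLMProvider(Protocol):
    """The only surface the application is allowed to see."""
    def complete(self, prompt: str) -> str: ...

class AzureFoundryProvider:
    def complete(self, prompt: str) -> str:
        # A real system would call the Azure AI Foundry endpoint
        # shown earlier; stubbed here to keep the sketch runnable.
        return f"[azure] {prompt}"

class SelfHostedProvider:
    def complete(self, prompt: str) -> str:
        # e.g. an open-weights model served on your own GPUs
        return f"[self-hosted] {prompt}"

def answer(provider: LLMProvider, prompt: str) -> str:
    # Call sites depend only on the interface, so swapping clouds
    # is a one-line change at composition time.
    return provider.complete(prompt)

print(answer(AzureFoundryProvider(), "Who bills this call?"))
print(answer(SelfHostedProvider(), "Who bills this call?"))

The hard part isn’t the interface - it’s reproducing the compliance, performance and pricing advantages behind it. That is exactly Azure’s moat.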
For founders, engineers and CIOs, the warning is clear: every inference call you bake into your stack is a vote of confidence - and a recurring invoice - to Microsoft. As AI scales into every business function, the platforms hosting it will dictate who has access, who pays, and ultimately, who wins.
Are you ready to pay the tab?