ROCH Labs

The QCOM Setup: Why Edge Inference is the key, and what makes QCOM so special

Rohit Chauhan — Tue, 26 May 2026 14:44:17 GMT

Qualcomm is priced as a smartphone chipmaker at a 46% discount to the semiconductor sector median. The revenue mix says it stopped being primarily a smartphone business a while ago. That gap between the label and the numbers is the trade.

Why I’m writing this now

On May 23, 2026, QCOM moved +11.6% in a single session. Closed at $238.16. The 39-analyst consensus average target before that move was $177-$181. The stock blew clean through the entire analyst target band in one day.

Here’s the part that matters: none of those 39 desks have updated their models yet.

When Cristiano Amon provides any directional FY2027 framework at the June 24 Investor Day, every one of those analysts has to update simultaneously. That is not a catalyst. It is a mechanical consequence of where the price is relative to where the models are.

And then there’s the Ming-Chi Kuo report. Kuo (TF International Securities, 233K followers, the highest-credibility supply chain analyst in consumer electronics) published a survey in late April reporting that Qualcomm and MediaTek are co-development partners with OpenAI on a custom smartphone processor designed for an AI Agent-focused phone. 924K impressions. 179 quote-tweets. Mass production target 2028. Revenue per chip: Kuo estimates one AI chip project is worth 30-40 standard mobile processors in revenue. I’ll come back to what this means and what the limits of it are.

Three demand vectors are building simultaneously. The market is pricing one.

The three demand vectors

1. Automotive: the cleanest moat in the story

Automotive is a genuinely different business from mobile. ADAS inference cannot go to the cloud. Sub-10ms latency requirements for safety-critical workloads make cloud inference physically impossible. Not economically suboptimal. Physically impossible. The round-trip latency on any cloud connection is 40-80ms under ideal conditions. A collision avoidance model cannot wait 40ms. Every automotive ADAS chip sold is a permanent, structurally non-negotiable edge inference win.
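To put the latency gap in physical terms, here is a minimal sketch of how far a car travels while a single inference round trip is in flight. The latency figures are the ones quoted above; the 90 mph speed and the function name are my own illustrative choices, not anything from Qualcomm.

```python
MPH_TO_MPS = 0.44704  # exact conversion: 1609.344 m per mile / 3600 s per hour

def metres_travelled(speed_mph: float, latency_ms: float) -> float:
    """Distance the car covers while one inference round trip is in flight."""
    return speed_mph * MPH_TO_MPS * latency_ms / 1000.0

# Latency figures come from the text; 90 mph is an illustrative highway speed.
cases = [("on-device budget", 10), ("cloud, low end", 40), ("cloud, high end", 80)]
for label, latency_ms in cases:
    print(f"{label:>16}: {latency_ms:>2} ms -> {metres_travelled(90, latency_ms):.2f} m travelled")
```

At highway speed the cloud round trip alone costs somewhere between one and a half and three-plus metres of travel before the model has even answered.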

The numbers, per Q2 FY2026 earnings and Q3 guidance:

Automotive revenue: $1.33B in Q2 (+38% YoY, record quarter)
Annualized run rate: crossed $5B for the first time
Q3 FY2026 guide: ~50% YoY automotive growth
FY2026 exit run rate target (management): >$6B annualized
Design-win pipeline: $45B (from $3B in 2017)
In production now: BMW iX3 Ride Pilot since October 2025
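
The run-rate claim is simple annualization, and the pipeline growth can be expressed as a compound rate. A minimal sketch, assuming the $45B pipeline is a 2026 figure (the list gives the 2017 starting point but not the date of the current number); variable names are mine.

```python
q2_auto_revenue_bn = 1.33               # Q2 FY2026 automotive revenue, $B (from the list above)
run_rate_bn = q2_auto_revenue_bn * 4    # simple annualization: one quarter x 4

pipeline_2017_bn = 3.0
pipeline_now_bn = 45.0
years = 2026 - 2017                     # ASSUMPTION: the $45B figure is dated 2026

# Compound annual growth rate of the design-win pipeline
pipeline_cagr = (pipeline_now_bn / pipeline_2017_bn) ** (1 / years) - 1

print(f"Annualized automotive run rate: ${run_rate_bn:.2f}B")
print(f"Pipeline CAGR over {years} years: {pipeline_cagr:.1%}")
```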

The $45B pipeline converts across FY2027-2030. This is not a “future promise” number. It is already in quarterly revenue, growing faster than any other segment, with 3-5 year platform refresh cycles baking in the next few years of revenue before the contracts even need renewing.

Snapdragon Ride Flex is the only shipping chip that unifies digital cockpit and ADAS on a single platform. NVIDIA Orin and Thor are high-performance AV, a different tier entirely. No direct competitor in the unified cockpit+ADAS space.

This is the demand vector the market is most under-pricing because it doesn’t look like “AI” in the way NVDA’s data center revenue looks like AI. It is. It just runs at 90mph on a highway in Munich instead of in a hyperscaler rack.

2. Data center: the binary on June 24

Amon confirmed on the Q2 FY2026 earnings call that Qualcomm will ship custom data center processors to a hyperscaler customer before the end of 2026. The customer has not been disclosed. The AI200 targets commercial shipments in 2026. The AI250 ships in 2027 and carries up to 768GB of onboard memory, which matters because inference at scale is memory-bandwidth-bound, not compute-bound.

Here is the critical point: as of May 2026, there are zero FY2027 revenue estimates for AI200/AI250 on any bank desk. Zero. Not a single analyst has published a quantified FY2027 data center contribution. There is nothing to model yet because Amon has not given a number. When he gives even a directional range on June 24, every analyst is forced to build a model from scratch for a business line currently sitting at zero in their spreadsheet.

The stock closed at $238 against a consensus average target of $177-$181. Whatever the June 24 number implies, the revision cascade follows mechanically regardless of whether the fundamental thesis is right.

The bear argument here deserves fair treatment: Google TPU-v6, Amazon Inferentia-3, and Meta Artemis all already exist. Hyperscalers have structural incentives to keep inference in-house and the capability to do it. The unnamed hyperscaler shipment could be a 5,000-unit evaluation run that never converts to commercial volume. One cancelled NRE agreement wipes the narrative. If Amon delivers no FY2027 framework on June 24, the data center thesis stays speculative and the multiple ceiling reverts to 20-22x on handset/auto fundamentals alone.

3. AI Agent phone: a 2028 story, not a 2026 trade

The Kuo report is worth understanding precisely because it is being misread in both directions.

What it says: Qualcomm and MediaTek are co-development partners with OpenAI on a custom smartphone processor for an AI Agent phone. The chip handles on-device context awareness and small-model inference continuously. Heavy compute offloads to cloud. Mass production target 2028. Specs and supplier selection expected finalized late 2026 or Q1 2027.

What it does not say: that QCOM is exclusive. MediaTek is named in the same sentence. Anyone running a thesis that relies on QCOM being the only partner here is building on a factually incorrect premise.

The revenue math is why this landed with 924K impressions: Kuo frames one AI chip design win as equivalent to 30-40 standard mobile processors in revenue potential. If that holds and QCOM secures meaningful volume for the 2028 ramp, the EPS curve changes materially. At current QCOM automotive ASP dynamics as a reference, that is not an unreasonable order-of-magnitude estimate.

This is a 2028 story. It has no bearing on June 24. The reason it matters now is signal quality: the best supply chain analyst in consumer electronics hardware is naming QCOM as a co-development partner on the most anticipated AI hardware product of the decade. That does not happen to a company the market should be pricing as a smartphone cyclical at 20x.

The CT indicator is also worth noting. Jukan05 (124K followers, tech equity account, explicitly bearish on QCOM as recently as January 2026) quote-tweeted the Kuo report with: “Should I put together a bull thesis on Qualcomm? $QCOM.” 60 replies. 363 bookmarks on a single question post. He has not published the full thesis yet. When he does, it will circulate widely. Narrative formation at 124K-follower accounts with 363 bookmarks per post is a leading indicator of broader CT uptake.

The valuation math

At $238, QCOM’s forward P/E is approximately 20x against FY2026 consensus EPS of ~$12.10. The semiconductor sector median forward P/E is 37x. That is a 46% discount.

For comparison:

NVDA: ~47x forward P/E
AMD: ~54x
ARM: ~115x
AVGO (most comparable on diversified AI infrastructure revenue mix): ~37x
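
The discount arithmetic is easy to reproduce. A short sketch using only the price, EPS, and peer multiples quoted above; the helper names are mine.

```python
def forward_pe(price: float, forward_eps: float) -> float:
    return price / forward_eps

def discount_to(own_multiple: float, reference_multiple: float) -> float:
    """Fractional discount of one multiple relative to another."""
    return 1 - own_multiple / reference_multiple

qcom_pe = forward_pe(238.0, 12.10)  # price and FY2026 consensus EPS from the text
comps = {"Sector median": 37, "NVDA": 47, "AMD": 54, "ARM": 115, "AVGO": 37}

print(f"QCOM forward P/E: {qcom_pe:.1f}x")
for name, multiple in comps.items():
    print(f"  vs {name:<13} {multiple:>3}x -> {discount_to(qcom_pe, multiple):.0%} discount")
```

Using the unrounded ~19.7x rather than 20x nudges the discount to the median from 46% to about 47%. The conclusion does not change.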

The FCF yield case is the other side of this. At ~9% FCF yield, the market is pricing QCOM the way you price a business in structural decline. The sector median FCF yield is ~2%. A company returning $3.7B to shareholders in a single quarter (Q2 FY2026: $2.8B buybacks + $945M dividends) plus a freshly authorized $20B buyback program does not print 9% FCF yield from a position of weakness.

The historical context matters here. QCOM traded at ~30x P/E during the 2020-2021 smartphone cycle peak. No AI premium required. Just re-rating the automotive and IoT mix shift to something closer to prior-cycle highs would imply $260-280 on EPS alone. The stock is not expensive relative to its own history. It is trading near mid-cycle valuation while the revenue composition has shifted materially underneath it.

Re-rating to the sector median (37x) on FY2027 EPS of ~$14.00 implies ~$518. I am not underwriting that. It requires a fully validated data center business with named customers and recurring revenue, which does not exist yet.

The scenario I am actually underwriting:

Partial re-rating to 28x (roughly halfway between the current 20x and AVGO’s 37x) on FY2027 EPS of ~$13.20 implies ~$370, approximately +55% from $238. This does not require QCOM to become NVDA. It requires the Street to stop pricing it as a phone chip company once data center revenue becomes visible.
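
Both re-rating scenarios are just multiple times EPS. A minimal sketch so the inputs can be swapped; note the two scenarios use different FY2027 EPS figures, exactly as stated in the text. Function names are mine.

```python
def implied_price(multiple: float, eps: float) -> float:
    return multiple * eps

def upside(target: float, spot: float) -> float:
    return target / spot - 1

SPOT = 238.0  # reference price used throughout this note

scenarios = {
    "Sector median, 37x on ~$14.00": (37, 14.00),
    "Partial re-rating, 28x on ~$13.20": (28, 13.20),
}
for name, (multiple, eps) in scenarios.items():
    target = implied_price(multiple, eps)
    print(f"{name}: ${target:,.0f} ({upside(target, SPOT):+.0%} vs ${SPOT:.0f})")
```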

The honest bear case

The arithmetic runs first.

QTL generated $1.38B in Q2 FY2026 at 72% EBT margin. That is approximately $1B per quarter in high-margin profit, representing ~30% of non-GAAP operating income. The Apple licence is terminable. If it expires unrenewed, nothing in automotive or IoT replaces that margin on a two-year timeline. This risk sits entirely outside the edge inference and data center thesis and does not get better if both theses deliver.
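
The QTL exposure in numbers, as a rough sketch. Revenue, margin, and the ~30% share come from the paragraph above; the implied total is my back-of-envelope derivation and treats segment EBT and non-GAAP operating income as comparable, which is a simplification.

```python
qtl_revenue_bn = 1.38        # Q2 FY2026 QTL revenue, $B
qtl_ebt_margin = 0.72        # EBT margin
qtl_share_of_op_income = 0.30  # "~30% of non-GAAP operating income"

qtl_ebt_bn = qtl_revenue_bn * qtl_ebt_margin
# ASSUMPTION: back out total operating income from the stated ~30% share
implied_op_income_bn = qtl_ebt_bn / qtl_share_of_op_income

print(f"QTL EBT per quarter: ${qtl_ebt_bn:.2f}B")
print(f"Implied total non-GAAP operating income: ~${implied_op_income_bn:.1f}B per quarter")
```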

Against the edge inference case: AI Hub has 1,800 registered companies and 175+ pre-optimized models. It also has zero published DAU, MAU, or deployment volume figures anywhere. “1,800 companies” is a registration count, not an active usage figure. The platform is real. The adoption depth is unverifiable from public data. MediaTek’s Dimensity 9500 demonstrated a 20B-parameter LLM running on-device. The premium positioning QCOM relies on in Android is compressing faster than the bull case assumes. Thermal throttling is also real: sustained on-device inference degrades significantly from burst-mode TOPS claims under thermal limits. QCOM’s marketing figures are burst-mode. The sustained benchmark that matters more as usage patterns evolve is lower.

Against the data center bet: the specific architecture of the AI Agent phone (small model on-device, heavy compute offloads to cloud) means QCOM is not a play on AI data center scale. The OpenAI phone validates edge inference, not cloud inference. The AI200/AI250 is a separate product line in a different category. The phone validates the thesis that QCOM knows how to build inference silicon. The data center line has to stand on its own commercial merits, which are unproven.

The historical reversion pattern is worth naming explicitly. QCOM has run a diversification re-rating story four times: RF front-end (2019), IoT (2021), automotive (2023), and now AI/data center (2026). Each prior attempt saw the multiple expand and then contract when a handset guidance cycle disappointed. The automotive revenue evidence in the current cycle is stronger than in any of the prior three attempts. But the pattern exists, and Q3 FY2026 handset guidance (management guided ~$4.9B in handsets for Q3, down from $6B in Q2) means the near-term handset print could reassert the cyclical narrative if it misses.

The trade

I am not positioned yet. This is a pre-event setup note.

Entry: $199-207. The post-earnings support base from May 15-21 (five consecutive daily closes) and the June-July 2024 structural zone converge in that range, which works out to approximately 16.5-17x forward P/E on the ~$12.10 FY2026 consensus EPS.

Stop: $189 daily close. Below the May 20 swing low of $191.02. Risk at entry: 5-9%.

TP1: $247.9. Reclaim of May 12 intraday high. Pure mechanics, no thesis confirmation required.

TP2: $300. No overhead supply on weekly chart between $248 and $300.

TP3: $350-370. Full thesis, conditional on June 24 delivering a quantified FY2027 data center framework.

R/R to TP1 from mid-entry (~$203) against the $189 stop: approximately 3.2:1. Catalyst window: June 24 Investor Day.
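
A quick sketch of the risk and reward arithmetic from the stated levels, so the ratios can be rechecked if the entry or stop moves. All prices come from the trade plan above; taking the low end of the TP3 band and the helper names are my choices.

```python
ENTRY_LOW, ENTRY_HIGH = 199.0, 207.0
STOP = 189.0
TARGETS = {"TP1": 247.9, "TP2": 300.0, "TP3 (low end)": 350.0}

def risk_pct(entry: float, stop: float) -> float:
    """Loss from entry to stop, as a fraction of entry."""
    return (entry - stop) / entry

def reward_to_risk(entry: float, stop: float, target: float) -> float:
    """Distance to target divided by distance to stop."""
    return (target - entry) / (entry - stop)

mid_entry = (ENTRY_LOW + ENTRY_HIGH) / 2

for entry in (ENTRY_LOW, mid_entry, ENTRY_HIGH):
    print(f"entry ${entry:.0f}: risk to stop {risk_pct(entry, STOP):.1%}")
for name, target in TARGETS.items():
    print(f"{name} from ${mid_entry:.0f}: R/R {reward_to_risk(mid_entry, STOP, target):.1f}:1")
```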

Where I land

One thing the Kuo report changes is the narrative ceiling. Before it, the QCOM re-rating case was a valuation argument against a still-skeptical market. After it, the highest-credibility supply chain analyst in the world is naming QCOM as a co-development partner on an OpenAI AI Agent phone with 30-40x the revenue potential of a standard mobile processor.

MediaTek is in the room too. The exclusive framing is wrong and I want to be precise about that.

But here is the thing about the setup as a whole. Automotive ADAS cannot go to the cloud on physics grounds. OpenAI is designing a phone around on-device inference. The unnamed hyperscaler is receiving custom data center processors before year end. These three demand vectors are not coincidental. They are converging on the same underlying capability: Qualcomm’s edge inference silicon is good enough to be chosen for the most latency-sensitive workload in consumer electronics (automotive), the most anticipated AI product of the decade (OpenAI phone), and a hyperscaler’s production data center.

The market is pricing a smartphone chip company at 20x.

The upgrade cascade is not a question of whether. It is a question of when. June 24 is the most likely trigger.

Watching closely.

*Conviction: MEDIUM. Not positioned. Upgrades to HIGH on two conditions: Amon delivers a quantified FY2027 data center framework on June 24, and Q3 automotive holds the guided ~50% YoY growth. One without the other is not a full re-rating.*

*Not investment advice.*

Sources

1. [QCOM Q2 FY2026 Earnings Press Release — Qualcomm Investor Relations](https://investor.qualcomm.com/news-releases/news-release-details/qualcomm-announces-second-fiscal-quarter-2026-results) — Q2 FY2026 automotive revenue $1.33B, +38% YoY; annualized run rate; $45B design-win pipeline; $2.8B buybacks + $945M dividends; $20B buyback authorization.

2. [QCOM Q2 FY2026 Earnings Call Transcript — Seeking Alpha](https://seekingalpha.com/article/4786000-qualcomm-qcom-q2-2026-earnings-call-transcript) — Cristiano Amon confirms custom data center processor shipment to named hyperscaler before end of 2026; AI200 commercial shipment timeline; Q3 FY2026 guidance including ~50% YoY automotive growth and ~$4.9B handset segment.

3. [Qualcomm Investor Day 2026 — June 24 — Qualcomm Investor Relations](https://investor.qualcomm.com/events-presentations) — Scheduled June 24, 2026. Amon to provide strategic framework; data center business update expected.

4. Ming-Chi Kuo X Post, OpenAI AI Agent Phone: Qualcomm and MediaTek as Co-Development Partners. TF International Securities supply chain survey. QCOM and MediaTek co-development partners for OpenAI custom smartphone processor. Mass production target 2028. Revenue per AI chip estimated at 30-40x standard mobile processor. 924,241 impressions.

5. Jukan05 X Post, QCOM Bull Thesis Framing. Quote-tweet of Kuo report. 124K follower tech equity account previously bearish on QCOM as of Jan 2026 flags intent to build bull thesis. 363 bookmarks.

6. [Snapdragon Ride Flex — Qualcomm Product Page](https://www.qualcomm.com/products/automotive/snapdragon-ride) — Unified digital cockpit and ADAS on single platform. Only shipping chip in this category. BMW iX3 Ride Pilot launched October 2025.

7. [MarketBeat QCOM Analyst Price Targets — Consensus Data](https://www.marketbeat.com/stocks/NASDAQ/QCOM/price-target/) — 39-analyst consensus average price target $177-$181 pre-May 23 move. Baird $300 target raised May 1, 2026. 38 desks below current price $238.

8. [S&P Global Market Intelligence — Semiconductor Sector Forward P/E Consensus](https://www.spglobal.com/marketintelligence/en/) — Semiconductor sector median forward P/E approximately 37x; NVDA ~47x; AMD ~54x; ARM ~115x; AVGO ~37x as of May 2026.

9. [QCOM Form 10-Q Q2 FY2026 — SEC EDGAR](https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=QCOM&type=10-Q) — QTL segment Q2 FY2026 revenue $1.38B at 72% EBT margin. Full financial statements.

10. [Qualcomm AI Hub — Developer Platform](https://aihub.qualcomm.com/) — 1,800+ registered companies, 175+ pre-optimized models as of May 2026. On-device AI model deployment platform.

11. [QCOM Historical Price Data — May 23, 2026 Session](https://finance.yahoo.com/quote/QCOM/history/) — Closing price $238.16, single-session move +11.6%. May 20 swing low $191.02.

12. [MediaTek Dimensity 9500 — On-Device LLM Benchmark](https://www.mediatek.com/products/smartphones/dimensity-9500) — 20B-parameter LLM on-device capability. Direct competition for premium Android AI processing against Snapdragon Elite.

13. [Qualcomm AI200/AI250 Data Center Processor Announcement](https://www.qualcomm.com/news/releases/2025/10/qualcomm-announces-ai-inference-processors-for-data-centers) — AI200 commercial shipments targeted 2026; AI250 ships 2027, up to 768GB onboard memory.

The Onchain Inference Trade

Rohit Chauhan — Fri, 22 May 2026 14:56:22 GMT

TL;DR

- OpenRouter processed over 12 trillion tokens per week by early 2026, up 12x year on year (per OpenRouter / a16z State of AI report, Dec 2025). Falling inference costs expand the market via Jevons’ Paradox, which is what makes the trade compelling.

- Decentralized inference providers have real ARR (~$12-14M Venice, ~$6M Chutes) and four structural moats Web2 cannot replicate: privacy, uncensored AI, a 70-85% pricing advantage, and verifiable inference.

- Most revenue is still token-subsidized (Chutes ~90% subsidy ratio is worse than the Helium 78% DePIN benchmark). Organic economics at scale are not yet proven.

- Galaxy Digital’s formal institutional report is in production and unpublished. The window for early positioning is open.

- Core: $VVV, $AKT. Higher beta: $TAO, $POD. Farm: Ritual, Inference Labs.

Conviction Tier: MEDIUM CONVICTION

Upgrades to HIGH CONVICTION on enterprise-scale production deployment confirmation from at least one major protocol, OR organic revenue covering more than 50% of total protocol revenue without token subsidies.

PMF is confirmed (Venice ARR, Chutes OpenRouter #1 ranking, DIEM perpetual credit adoption). Tokenomics innovation is real. Pricing moat is structural. Risks are specific and named: subsidy dependency is the primary one, TAO halving pressure is real, Venice centralization debate is unresolved, enterprise scale remains unproven. Size these positions relative to those risks.

Introduction

OpenRouter grew from ~10 trillion tokens processed annually to over 100 trillion as of mid-2025. By early 2026, the platform was handling over 12 trillion tokens per week, a 12x increase YoY (per OpenRouter / a16z State of AI report, Dec 2025). The standard read is that inference is becoming a commodity, costs are collapsing, and that is structurally bad for any provider operating in the space. That read has it backwards, and completely misses the point.

When a resource gets 10x cheaper and usage grows 30x in response, you are not watching margin compression. You are watching Jevons’ Paradox operate at the scale of global infrastructure. The inference market is not contracting as costs fall. It is expanding faster than any model projected, and the protocols positioned in the correct structural slots are compounding into something that has not been formally covered by a single major research institution.
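
The Jevons arithmetic in one line: total spend is price times volume, so what matters is whether usage grows faster than price falls. A minimal sketch using the 10x and 30x figures from the paragraph above; the function name is mine.

```python
def spend_multiple(price_drop_factor: float, usage_growth_factor: float) -> float:
    """Change in total spend when unit price falls by one factor and usage grows by another."""
    return usage_growth_factor / price_drop_factor

# 10x cheaper, 30x more usage (figures from the text)
print(f"Total spend changes by {spend_multiple(10, 30):.1f}x")

# Counter-case: usage growing slower than price falls shrinks the market
print(f"Total spend changes by {spend_multiple(10, 5):.1f}x")
```

Anything above 1.0 means the market in dollars is expanding even as unit prices collapse.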

That institutional gap is where this piece sits. Galaxy Digital’s Research Associate Lucas Tcheyan publicly signalled a formal research pivot to decentralized inference in May 2026, soliciting project names for inclusion in a forthcoming report. That report is not yet published. What follows is a synthesis of where the category stands, who the leaders are, and what the real risks look like before that coverage arrives.

The Structural Shift

The history of AI access can be told in three phases.

The first was closed AI: expensive, proprietary, censored. Built by OpenAI, Anthropic, and Google on centralised infrastructure, accessible through APIs with content restrictions and data logging baked into the architecture. The pricing reflected the monopoly on both intelligence and distribution.
The second was open AI (not the company, ironically): the release of Llama, Mistral, and DeepSeek created commoditised base intelligence. Open-weight models collapsed the cost of raw capability. The question that emerged from this phase was not who could build AI, but who would serve it at scale, to whom, and under what terms.
The third phase, which is where we are now, is decentralized AI: token emissions fund research teams without dilutive venture rounds. Idle consumer GPUs yield inference revenue. Distributed compute networks source capacity at ~70-85% below AWS SageMaker pricing and pass that discount to developers routing inference requests at machine speed through agent harnesses running billions of API calls per session.

The transition from the second phase to the third is not hypothetical. Tools built on top of open-weight models were burning billions of inference tokens daily before major closed labs moved to restrict automated harness usage. That restriction, widely read as bearish for inference demand, is structurally bullish for the decentralized layer. When closed labs restrict access, developers migrate to open-weight models on decentralized rails that carry no content policy. The demand does not disappear. It relocates. This is the thesis, as well as the premise of this piece.

The Business Model

The unit economics of a decentralized inference provider are worth understanding precisely, because they are the structural source of the advantage.

Cutting away the jargon, the whole workflow can be written and understood in a few simple steps:

1. Source compute from a decentralized cloud marketplace at ~70-85% below AWS SageMaker pricing.
2. Host open-weight and private AI models at that cost basis.
3. Charge developers via API subscriptions or usage fees that are still materially cheaper than OpenAI or Anthropic.
4. Capture the spread. Then layer in token mechanics that create switching costs no centralised provider can replicate.

The two clearest executions of this model are Venice AI and Chutes.

Venice AI ($VVV)

Venice AI is running ~$12-14M in annual recurring revenue on 50-80B tokens per day, with a single-day peak of 80B tokens, per platform analytics at venicestats.com.

The tokenomics innovation that matters specifically for Venice is DIEM. Here’s how it works in practice: stake $VVV, earn approximately 18% APY, and mint DIEM tokens. Each DIEM token generates one dollar per day in Venice inference credits, indefinitely.

This is not a governance token or a speculative instrument in the traditional sense. It is a perpetual inference credit: DeFi yield mechanics applied to AI compute. Capital staked into Venice cannot be easily withdrawn without unwinding a position generating daily yield. That lock-in is by design. The Pendle and Penpie analogy from DeFi explains Venice’s choice precisely: PT/YT leverage markets and governance arbitrage on inference credit bribe markets will follow as the ecosystem matures.
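
A rough sketch of the DIEM credit mechanics described above. The one-dollar-per-day rate is from the text; the holding size and the per-DIEM acquisition cost are purely hypothetical placeholders, since the mint rate and market price are not given here.

```python
CREDIT_PER_DIEM_PER_DAY_USD = 1.0  # from the text: one dollar of inference credit per DIEM per day

def credits_usd(diem_held: float, days: int) -> float:
    """Inference credits generated by a DIEM holding over a number of days."""
    return diem_held * CREDIT_PER_DIEM_PER_DAY_USD * days

def breakeven_days(cost_per_diem_usd: float) -> float:
    """Days of credits needed to match what one DIEM cost to acquire."""
    return cost_per_diem_usd / CREDIT_PER_DIEM_PER_DAY_USD

# HYPOTHETICAL inputs for illustration only
holding = 10
assumed_cost_per_diem = 500.0

print(f"{holding} DIEM -> ${credits_usd(holding, 365):,.0f} of credits per year")
print(f"Break-even at ${assumed_cost_per_diem:.0f} per DIEM: {breakeven_days(assumed_cost_per_diem):.0f} days")
```

The point of the sketch is the shape, not the numbers: a fixed daily credit with no expiry means the holder's effective cost of compute keeps falling the longer the position stays open, which is where the lock-in comes from.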

Chutes (Subnet 64, Bittensor Ecosystem Project)

Chutes, built on Bittensor Subnet 64, is running ~$6M ARR on comparable token volume, ranked number one on OpenRouter for decentralised inference as of May 2026.

The distinction from Venice is structural: Chutes is fully decentralized, so the centralised-infrastructure critique does not apply to it. Its subsidy ratio, however, sits at an estimated ~90%. In plain terms, nine in every ten dollars of revenue is token-subsidised, not organically generated. This exceeds even Helium’s 78% benchmark at a comparable stage of the DePIN cycle. That matters significantly for the bear case, which I address directly below.
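
What the subsidy ratio means in dollars, as a minimal sketch. The ARR and subsidy figures are from this piece, and the 50% threshold is the HIGH CONVICTION upgrade condition stated near the top; the function names are mine.

```python
def organic_revenue(total_revenue: float, subsidy_ratio: float) -> float:
    """Revenue left once the token-subsidised share is stripped out."""
    return total_revenue * (1 - subsidy_ratio)

def meets_upgrade_trigger(subsidy_ratio: float, threshold: float = 0.50) -> bool:
    """True once organic revenue exceeds the threshold share of total revenue."""
    return (1 - subsidy_ratio) > threshold

chutes_organic_musd = organic_revenue(6.0, 0.90)  # ~$6M ARR at ~90% subsidy

print(f"Chutes organic ARR: ~${chutes_organic_musd:.1f}M of $6.0M")
print(f"Chutes meets the 50% organic trigger: {meets_upgrade_trigger(0.90)}")
print(f"Helium-style 78% ratio meets it: {meets_upgrade_trigger(0.78)}")
```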

Per ETH Denver benchmarks from February 2026, the decentralized inference segment was already at ~$20-30M ARR across Bittensor subnets, against a subnet market cap of ~$212M. For reference, the broader DePIN sector was generating $72M in revenue against $10B in market cap over the same timeframe. The decentralized inference sub-sector is generating roughly a third of DePIN’s revenue at about 2% of its market cap.
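
The relative-valuation gap is easiest to see as market cap per dollar of revenue. A short sketch using only the figures quoted above (all in $M); the helper name is mine.

```python
def cap_to_revenue(market_cap_musd: float, revenue_musd: float) -> float:
    """Market cap divided by annual revenue."""
    return market_cap_musd / revenue_musd

inference_cap_musd = 212.0
inference_arr_low_musd, inference_arr_high_musd = 20.0, 30.0
depin_cap_musd, depin_revenue_musd = 10_000.0, 72.0

low = cap_to_revenue(inference_cap_musd, inference_arr_high_musd)
high = cap_to_revenue(inference_cap_musd, inference_arr_low_musd)
depin = cap_to_revenue(depin_cap_musd, depin_revenue_musd)

print(f"Decentralized inference: {low:.1f}x to {high:.1f}x revenue")
print(f"Broader DePIN: {depin:.0f}x revenue")
```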

That should tell you how early we are in the narrative cycle, despite the ~8x pump recorded by the VVV token in the past few weeks.

The Stack

Five distinct layers form the decentralized AI inference supply chain. Each carries different maturity, different tokenomics design, and different risk profiles.

The Compute layer

Akash Network ($AKT) is the primary decentralized GPU marketplace, the infrastructure backbone from which most inference providers source capacity. Burn-and-mint emissions tokenomics are live for $AKT. Render Network ($RNDR) and io.net ($IO) serve adjacent compute demand across the same structural thesis.

This is the pick-and-shovel layer: every inference provider that scales through decentralized compute builds on top of it.

The Coordination layer

This is where Bittensor ($TAO) sits on the whole map. Its 128 subnets run in continuous Darwinian competition for resources, talent, and inference share. Grayscale has taken stakes in TAO and is building ETF products around the asset.

The framing that has taken hold institutionally is “TAO is the BTC of AI,” a macro-levered coordination layer providing diversified exposure to the entire decentralized AI stack. At ~$20-30M in ecosystem ARR, it is the most levered expression of the category thesis.

The Inference layer

This is where primary value accrual is happening now. Venice and Chutes lead the current cycle. Lium on Bittensor Subnet 51 is generating ~$600K in monthly burn volume, the largest of any Bittensor subnet. Dolphin AI ($POD) is executing Venice’s tokenomics playbook on consumer GPU infrastructure, with its v2 peer-to-pool network imminent.

The Verifiable Inference layer (a sub-segment for now, but a full segment eventually) is the one the market is not pricing. I return to this in the moats section below.

The Application layer

This covers AI agents, DeFAI, and tooling infrastructure. This is where end-user value accrual eventually concentrates, but where most token fundamentals remain weak relative to the infrastructure layers above them.

The Four Moats

Four structural reasons explain why decentralized inference holds territory that centralised labs cannot enter regardless of capital or compute resources. Below I break down exactly what makes each one special, and defensible against large centralized incumbents:

Privacy:
1. Per Cisco’s 2024 Consumer Privacy Survey, 84% of GenAI users are concerned about data entered in tools going public. For enterprise customers operating under GDPR, CCPA, HIPAA, and financial data regulations, this is not a preference item. It is a liability constraint.
2. OpenAI and Anthropic are structurally compelled to log user interactions for model improvement, abuse detection, and compliance auditing. They cannot offer genuine data separation without dismantling their core model improvement pipelines.
3. Venice AI, Dolphin AI, and the TEE-stack protocols own this market by structural default. The demand is not niche: enterprise legal, healthcare, and financial services clients all face mandates that make inference-with-logging a material compliance exposure.
Uncensored AI:
1. Centralised labs face advertiser sensitivity, regulatory exposure, and reputational risk from uncensored model outputs. Companion AI, penetration testing, red-teaming, political speech in jurisdictions with content restrictions, and enterprise internal tooling with sensitive subject matter are categories the closed labs are structurally blocked from serving at scale.
2. Venice and Dolphin AI sit permanently on the open-weight side of this divide. As model liability frameworks mature and the market split between compliant-censored and open-weight uncensored AI sharpens, this moat deepens rather than narrows.
Pricing:
1. AkashML inference costs ~$2-4 per million tokens. OpenAI’s API runs ~$15 per million tokens on GPT-class model output; 2026 flagship models range from $2.50 to $30 per million tokens depending on model and token direction.
2. The 70-85% structural discount to AWS is not promotional: it derives from idle consumer GPU supply at near-zero marginal cost of capital, token emission subsidies covering infrastructure overhead, and no centralised data centre CAPEX on the balance sheet.
3. The margin structure improves as the network scales. This is the directional opposite of a traditional cloud provider’s cost curve, and it is the structural basis for the lock-in mechanics described in the business model section above.
Verifiable Inference:
1. This is the fourth moat and the least understood. TEE-backed inference with cryptographic attestations produces a proof that a specific model generated a specific output, without revealing the model’s weights or the user’s input data.
2. For autonomous agents executing financial transactions on-chain, the inference output is the instruction. Without cryptographic proof, that instruction is trusted. With it, it is verified.
3. Ritual, backed by Archetype with ~$25M raised, is building Infernet: smart contracts that natively call ML models with cryptographic proofs. Inference Labs shipped Sertn’s Proof Inspector in May 2026, generating the highest-engagement technical post in the category that month. Neither has a token.

The narrative has not produced a breakout yet, and that’s understandable: there is no breakout success, no proof it can work at scale, and no specific enterprise deals tied to it yet. Despite that, verifiable inference is arguably the most defensible structural moat in the stack and is currently trading at the lowest awareness premium of the four.
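
On the pricing moat specifically, the per-token figures quoted above can be sanity-checked against the headline discount. A minimal sketch; note the baseline here is the ~$15 OpenAI API figure, while the 70-85% claim is stated against AWS SageMaker, so the two ranges are related but not the same measurement.

```python
def discount(cheaper_price: float, reference_price: float) -> float:
    """Fractional discount of one price relative to a reference price."""
    return 1 - cheaper_price / reference_price

akash_low_usd, akash_high_usd = 2.0, 4.0  # AkashML, $ per million tokens (from the text)
openai_reference_usd = 15.0               # ~$15 per million output tokens (from the text)

best_case = discount(akash_low_usd, openai_reference_usd)
worst_case = discount(akash_high_usd, openai_reference_usd)

print(f"Discount vs the $15 reference: {worst_case:.0%} to {best_case:.0%}")
```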

The Bear Case

The honest version of this thesis requires stating four risks that could break the case for decentralized inference and leave the whole thesis stale. Below I break down each of them:

Dependency Risk:
1. Chutes is running an estimated ~90% subsidy ratio which means nine in every ten dollars of revenue is token-subsidised, not organically generated. This exceeds even Helium’s 78% benchmark from the DePIN cycle, which was itself considered alarming at the time.
2. Most decentralized inference revenue is still token-subsidised rather than organically generated. If token prices fall, compute subsidies shrink, the pricing advantage narrows, and users with no structural switching costs have limited reasons to remain.
3. No protocol in this stack has proven sustainable organic unit economics at scale. That is not a minor caveat. It is the foundational question the thesis must answer to convert from speculative to durable.
The TAO halving risk:
1. This one is specifically for Bittensor-native projects. Daily emissions were cut from 7,200 to 3,600 TAO in December 2025. Subnet miners whose incentives fall below compute cost break-even will exit.
2. The Pareto distribution of subnet quality means a significant portion face meaningful pressure. Inference subnets with real revenue are better insulated than meme subnets, but the halving risk is present across the ecosystem and the 60-day churn data post-halving is the primary risk variable to monitor.
Venice’s centralisation critique:
1. The argument that Venice runs on centralised infrastructure with a decentralised token wrapper is not a fringe position.
2. If regulatory pressure tightens around centralised AI infrastructure operating with decentralised tokenomics, Venice faces a structural exposure that Chutes, by architecture, does not.
Lopsided demand curve:
1. Demand is still scarce relative to supply. Significant capital has been deployed into decentralized AI infrastructure. Institutions are running pilots, not production workloads.
2. No enterprise-scale production deployment is confirmed for any protocol in this stack. The category’s revenue is real. The scale at which it needs to operate to justify current market caps requires enterprise adoption that has not arrived.
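
For the halving risk, the mechanism is worth making concrete: if a miner's share of emissions and its compute bill stay fixed, halving emissions doubles the TAO price needed to break even. A deliberately simplified sketch; the emission figures are from the text, while the miner's cost and share are hypothetical, and the real split of emissions across subnets, validators, and miners is more involved than a single share parameter.

```python
EMISSION_BEFORE_TAO = 7_200  # daily network emissions before December 2025
EMISSION_AFTER_TAO = 3_600   # daily network emissions after the halving

def breakeven_tao_price(daily_cost_usd: float, daily_emission_tao: float, miner_share: float) -> float:
    """TAO price at which a miner's emission income just covers its daily compute cost."""
    return daily_cost_usd / (daily_emission_tao * miner_share)

# HYPOTHETICAL miner: $500/day compute bill, earning 0.1% of network emissions
cost_usd, share = 500.0, 0.001

before = breakeven_tao_price(cost_usd, EMISSION_BEFORE_TAO, share)
after = breakeven_tao_price(cost_usd, EMISSION_AFTER_TAO, share)

print(f"Break-even TAO price before halving: ${before:.2f}")
print(f"Break-even TAO price after halving:  ${after:.2f}")
```

Subnets with real inference revenue have a second income line that this sketch ignores, which is exactly why they are better insulated than subnets living on emissions alone.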

What to Watch

Four developments will determine whether this thesis upgrades from its current state to full institutional validation.

The Galaxy Digital decentralized inference report is the most important near-term catalyst. When it publishes, it will be the first institutional-grade research document formally scoped to this category. The historical pattern in crypto is consistent: when the first serious institutional report drops on a category with real fundamentals, narrative velocity accelerates regardless of price action at the time of publication. The pre-publication window is where asymmetric positioning has historically been available.
The Dolphin AI v2 launch is the nearest product catalyst. If the peer-to-pool consumer GPU inference network ships with an early ARR trajectory comparable to Venice’s first months, it validates the hypothesis that the Venice tokenomics model is replicable at scale on distributed consumer hardware, not just enterprise GPU pools sourced from Akash. That would expand the addressable inference supply significantly and harden the pricing moat thesis.
The verifiable inference narrative is the sleeper catalyst. Ritual and Inference Labs have no tokens, and no major account has published a dedicated analysis. When the first significant thread or research note covers cryptographic proof of inference at scale, the protocols in this layer will attract pre-token positioning attention. The farming window for those positions exists now, before that awareness arrives.
The TAO halving subnet survival data is the ongoing risk monitor. If churn is manageable and inference subnets with real revenue hold their miner bases over the next 60 days, the halving becomes a supply shock narrative for TAO. If churn is severe, it partially invalidates the Bittensor coordination layer thesis at current prices.

In Conclusion

The 12x expansion in OpenRouter token consumption in a single year (per the OpenRouter / a16z State of AI report, Dec 2025) is not just a growth rate. It is evidence of a structural transition in how the world processes intelligence. The protocols serving that demand from the positions the large labs cannot occupy (private compute, uncensored inference, cutthroat pricing, cryptographic verification) are already generating real revenue, and the suits don’t even know yet, because institutional research coverage has not arrived.

That combination rarely persists. The window between confirmed product-market-fit and institutional narrative is historically short once the first formal report drops. The question is not whether decentralized inference is a real sector: the revenue data answers that. The question is whether the subsidy dependency resolves before the institutional wave fully prices the category.

That is the risk. And it is worth taking seriously before dismissing either the thesis or the bear case that sits underneath it.

Sources

Fact-checked as of 22-May-2026

- Ritual $25M raise — Archetype lead (https://cointelegraph.com/news/ai-infrastructure-startup-ritual-raises-25-m-gaps-crypto)

- Lucas Tcheyan, Research Associate, Galaxy Digital (https://theorg.com/org/galaxy-digital/org-chart/lucas-tcheyan)

- Cisco 2024 Consumer Privacy Survey (https://www.cisco.com/c/dam/en_us/about/doing_business/trust-center/docs/cisco-consumer-privacy-report-2024.pdf)

- OpenAI API Pricing 2026 (https://openai.com/api/pricing/)

- Akash Network GPU pricing (https://akash.network/pricing/gpus/)

- Bittensor halving December 12, 2025 (https://blog.mexc.com/news/bittensors-historic-first-halving-starts-december-12-will-tao-rally-to-1000-as-daily-emissions-drop-50/)

- Grayscale TAO ETF filing (https://phemex.com/news/article/grayscale-files-for-bittensor-etf-stabilizing-tao-price-51072)

- VVV Robinhood listing May 19, 2026 (https://www.cryptotimes.io/2026/05/20/venice-token-vvv-rockets-24-as-robinhood-listing-ignites-rally/)

- Venice AI: Programmatic Buy & Burns (https://venice.ai/blog/programmatic-vvv-buy-and-burn)

- Bittensor subnet ARR: Unsupervised Capital (https://www.unsupervised.capital/writing/bittensors-ai-compute-subnets-collectively-reach-20m-arr)

- OpenRouter / a16z State of AI report (Dec 2025): token consumption data (https://openrouter.ai/state-of-ai)

- OpenRouter weekly token data context - Trending Topics (https://www.trendingtopics.eu/chinese-ai-models-overtake-us-rivals-in-global-token-consumption/)

How I Structure Every Research Note (And Why Most Investment Research Is Theater)

Rohit Chauhan — Fri, 15 May 2026 13:17:44 GMT

Most investment research is written for engagement, not accountability.

A chart with arrows. A ten-tweet thread. Numbers specific enough to sound credible. Then the market moves against it and the post disappears. No follow-up. No post-mortem.

This is not research. It’s a prayer disguised as research.

The tells are always the same. No stated conviction level. No explicit kill conditions. No position disclosures (or worse, an advertisement in the garb of research). No mechanism for closing the loop when the thesis fails. The author reserves the right to be right in hindsight and wrong in silence. The reader has no way to distinguish a high-conviction bet from a speculative guess because every post reads the same.

It is structurally dishonest. So I built a different system.

Conviction Is Not a Feeling but a Label

Every research note I write opens with one of three words: HIGH, MEDIUM, or SPECULATIVE. That label is the first line. Not buried in paragraph four. Not implied by tone. Written before the thesis begins.

HIGH conviction means the thesis is tested. Multiple confirming signals exist. I am positioned accordingly. If I cannot name three specific scenarios that would invalidate the call, I do not have HIGH conviction. I have wishful thinking.

MEDIUM means I am building a position. One or two signals have confirmed. I am staged in and watching for the third. The risk/reward justifies owning it, not fully sizing it.

SPECULATIVE means the thesis is forming. I may have a starter position or nothing at all. The purpose of a SPECULATIVE note is to document the logic early and state exactly what would upgrade it. If I cannot say what moves it to MEDIUM, I have not thought clearly enough to publish.

The practical difference matters. A HIGH conviction note on NVDA would require five data-anchored thesis points, a specific catalyst with a date, and three named kill conditions. A SPECULATIVE note on ARM would state explicitly: it will not upgrade until royalty revenue beats by more than 10% and v9 architecture penetration guidance moves above 35%. Same sector, different epistemic states. Same format, different labels.

Most research platforms publish both types identically. The audience cannot tell which is which until the outcome reveals it. That is not analysis. That is a lottery where the ticket looks the same regardless of odds.

A Thesis Is Not a Prediction

A prediction says: I think this goes up.

A thesis says: here is the mechanism, here is what would break it, and here is what I own.

The kill section is what separates them.

Every HIGH conviction note I write includes it. The format is three specific scenarios that, if they materialize, mean the thesis is wrong and the position is closed. Not vague scenarios. “Hyperscaler CapEx guidance cut more than 15% in any Q2 earnings call” is a kill condition. “Macro deteriorates” is not.

This is not hedging. Hedging is writing “risks include macro uncertainty” and leaving the reader to interpret it. A kill condition is a commitment device. It states in advance what would change my mind. When that event occurs, I am not permitted to rationalize around it. The condition was pre-defined. The discipline is mechanical.
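
The conviction labels and kill conditions described above amount to a small schema, so a minimal sketch of how they could be checked mechanically may help. The class, field, and method names (`ResearchNote`, `upgrade_trigger`, `problems`) are hypothetical and do not describe any actual ROCH Labs tooling. The only rules encoded are the two stated in the text: a HIGH note needs three specific kill conditions, and a SPECULATIVE note must say what would upgrade it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Conviction(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    SPECULATIVE = "SPECULATIVE"


@dataclass
class ResearchNote:
    """Hypothetical container for one research note (illustrative only)."""
    ticker: str
    conviction: Conviction
    kill_conditions: List[str] = field(default_factory=list)
    upgrade_trigger: Optional[str] = None  # what would move SPECULATIVE to MEDIUM

    def problems(self) -> List[str]:
        """Return the reasons the note is not publishable; empty means it passes."""
        issues = []
        if self.conviction is Conviction.HIGH and len(self.kill_conditions) < 3:
            issues.append("HIGH conviction requires three specific kill conditions")
        if self.conviction is Conviction.SPECULATIVE and not self.upgrade_trigger:
            issues.append("SPECULATIVE note must state what would upgrade it")
        return issues


# A HIGH note with only one kill condition fails the check.
draft = ResearchNote(
    ticker="NVDA",
    conviction=Conviction.HIGH,
    kill_conditions=[
        "Hyperscaler CapEx guidance cut more than 15% in any Q2 earnings call",
    ],
)
print(draft.problems())
```

The check is deliberately mechanical: it cannot judge whether a kill condition is specific enough, only whether the required ones were written down before publishing.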

Losses Should Be Visible

When a thesis is killed, I publish a post-mortem in the thesis graveyard. Every one. No exceptions.

The post-mortem answers three questions: what did I call, what actually happened, and what was the root cause of the miss. The root cause is assigned to one of six categories: narrative risk, timing error, data error, regime change, execution failure, or unknowable at the time of entry.

Over time, the distribution of categories tells you something specific about your analytical process. If most kills are timing errors, your entry framework is too early. If most are narrative risk, you are overweighting fundamentals relative to market structure. The graveyard is a diagnostic tool, not a confession box.

The “unknowable” category is narrow. Most misses are explainable in retrospect. My ai16z position lost 95% before I exited. The thesis was right; agentic AI infrastructure matters. The bet was wrong. I sized into a team, not a thesis. The infighting and internal breakdown that killed the project were not in my kill section because I was too attached to the position to write honest kill conditions. That is execution failure compounded by attachment bias. The post-mortem said so. Attachment to a position is itself a kill condition. I know that now.

Most research platforms never close this loop. Kills are silent. The track record is constructed from hits. The audience ends up with a biased sample and no ability to assess actual edge.

The Framework in Practice

Every note I publish carries a conviction label. Every HIGH conviction note has a kill section. Every closed thesis gets a post-mortem. Every data point is a number. Every position is disclosed at entry, not after the outcome.

The live data infrastructure is at ai-tracker-sigma.vercel.app. It covers the AI supply chain across equities and tokens. The tracker is the data layer behind every note.

Over time this expands. More dashboards. More tracked theses. Full transparency on every position I hold and every one I exit. The track record builds in public, not in hindsight.

The research Discord built on this framework launches soon. It will house every thesis, call, post-mortem, and position update in one place. A living record, not a highlight reel.

If you’re curious to join, follow along with ROCH Labs or subscribe to be the first to know.

Tether AI: The Genius of Low-Rank Adaptation or LoRA

Rohit Chauhan — Tue, 17 Mar 2026 18:30:59 GMT

Earlier today Tether’s CEO, Paolo Ardoino, announced Tether’s QVAC Fabric breakthrough.
QVAC Fabric LLM is a framework from Tether Data (the team behind the stablecoin $USDT) to decentralize AI fine-tuning and enable private AI.
Consider this: if you’re a MacBook Air M1 user, this framework is particularly relevant to you, because the M1 chip’s Unified Memory Architecture is exactly the kind of “ordinary device” this framework is designed to exploit.
The QVAC Fabric is a free, open source software framework (think “toolkit” or “engine”) which lets anyone run and personalize powerful AI models directly on their own device.

Usually, if you want a model to know your specific writing style (like your Twitter voice) or your specific niche (mine is financial data analysis), you have to upload that data to a cloud provider like Microsoft Azure, which powers OpenAI’s infrastructure.

To receive personalized responses from an LLM provider like OpenAI, your data must be sent to and stored on their remote servers.

QVAC changes this: the framework lets your data stay on your device and bypass the cloud entirely, because the computation happens locally on your hardware.

For most things, it probably doesn’t matter that the cloud has access to your data. But for sensitive material, like financial audits or proprietary investment research, a local framework like QVAC lets you retain 100% custody.

A pre-trained LLM like Llama or Qwen is like a super-smart but generic university professor who knows everything about the world in general. If you want this professor to become your personal doctor AI (that understands your specific health history, diet, symptoms, and speaks in your style) you need to teach them your private medical notes, blood tests, Whoop data, etc.

teaching = fine-tuning = updating the model’s knowledge with your data.

In the old way, we would retrain the entire professor from head to toe and change every single connection in their brain. For a 13B parameter model, that’s billions of numbers to update. This needs huge servers, costs a fortune in electricity and GPUs, takes days or weeks, and your private data has to leave your phone.

This is where LoRA (Low-Rank Adaptation) is genius. LoRA says: don’t touch the professor’s core brain at all. Instead, give them a tiny set of clip-on cheat sheets or post-it notes that they can consult every time they answer.

These post-its only contain a very small number of new instructions (often just 0.1% to 1% of the original model’s size - up to 99% fewer parameters to actually train.)

The original professor stays frozen (unchanged, safe, no risk of “forgetting” general knowledge.)

When the professor speaks, they combine: original brain (frozen) + tiny LoRA post-its (your personal tweaks).

The answer feels fully personalized to you, but almost no extra compute/memory is needed.

Mathematically it’s clever. The update to any big weight matrix is approximated as the product of two much smaller matrices (low rank = skinny rectangles instead of huge squares). So instead of changing 10,000 x 10,000 = 100 million numbers, you only train something like 100 x 10,000 + 10,000 x 100 = ~2 million numbers. Huge savings.
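
To make the parameter arithmetic concrete, here is a minimal sketch that counts trainable numbers for a full update versus a low-rank one. The function name `lora_param_counts` is made up for illustration, and the rank-8 case is an added assumption to show how the adapter shrinks as the rank drops. Nothing here reflects how QVAC Fabric itself is implemented.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Count trainable parameters for a full update vs. a low-rank (LoRA-style) update.

    Full fine-tuning updates every entry of a d_out x d_in weight matrix.
    A low-rank update trains two skinny matrices instead:
    one of shape d_out x rank and one of shape rank x d_in.
    """
    full = d_out * d_in
    adapter = d_out * rank + rank * d_in
    return full, adapter


# The example from the text: a 10,000 x 10,000 matrix with rank 100.
full, adapter = lora_param_counts(10_000, 10_000, 100)
print(full, adapter, adapter / full)  # 100000000 2000000 0.02

# Illustrative assumption: a much smaller rank of 8 on the same matrix.
_, small_adapter = lora_param_counts(10_000, 10_000, 8)
print(small_adapter, small_adapter / full)  # 160000 0.0016
```

At rank 100 the adapter is 2% of the full matrix. At rank 8 it is 0.16%, which is the range the "0.1% to 1%" figure above points to.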

BitNet already makes the base model tiny and fast on phones (weights are just -1/0/+1 instead of full floating point numbers which already saves ~70-90% of memory & compute).

But fine-tuning BitNet normally was still hard/impossible on phones because even “updates” were too heavy.

LoRA slashes the update size so dramatically that now the whole personalization process fits in a phone’s limited memory.

Think of your phone’s AI as a very compressed zip file of a genius librarian (BitNet makes the zip super small so it fits on phone.)

LoRA = keep the zipped library untouched, just add a tiny bookmark file with your personal notes/index. Your phone can handle adding/using the bookmark easily, and when you ask questions, the librarian reads the main zip + your bookmark instantly.

It turns AI from “a rented cloud service that sees all your data” into “your own private brain extension that lives on your device and learns only from you.”

No more paying $20/month to OpenAI, no data sent anywhere, no censorship, works offline.

Tether/QVAC solved making LoRA + BitNet work cross-platform (AMD/Intel/Nvidia/Apple/Mobile GPUs) for the first time.

LoRA is the unlock button that lets billion parameter personalization escape data centers and live in your pocket. That’s why Paolo and the team are calling it the start of “Stable intelligence”. If this clicks now, the rest of the announcement will feel much more exciting and logical.

Now that you understand LoRA, the post should no longer feel like a “cool tech jargon + big numbers” shill: impressive but abstract.

Now that you grasp LoRA as the “tiny post-it notes” trick that freezes the huge core model and only trains a minuscule add-on layer (up to 99% fewer parameters to update), the whole thing snaps into place and becomes genuinely revolutionary.

The “billion-parameter training on a phone” claim stops sounding impossible.

Fine-tuning a 13B model normally means updating billions of numbers which needs massive VRAM and power that no phone has.

But with LoRA, you’re only training maybe millions (or fewer) of tiny adapter parameters. Add BitNet’s extreme compression (weights as simple as -1/0/+1), and suddenly that tiny update process fits in a phone’s limited RAM/battery.

The iPhone demo (13B fine-tune) and the Samsung/Pixel 3.8B ones aren’t hype; they’re the direct result of LoRA slashing the workload by orders of magnitude.

The “biggest unlock” line hits hard. Paolo calls heterogeneous GPU fine-tuning the biggest unlock because LoRA + BitNet now works everywhere (not just Nvidia CUDA). Before, even if BitNet existed, personalizing it required expensive Nvidia rigs or cloud. Now your AMD laptop, Intel PC, Apple Silicon Mac, or flagship phone can do it.

LoRA is the key that democratizes that personalization step across hardware - no more “sorry, only works on $10K GPU”.

Privacy and “serve the people” vision feels tangible, not fluffy.

Without LoRA, “local private AI” would mean running a generic frozen model (no real personalization) or shipping your data to the cloud for fine-tuning.

LoRA lets the model learn deeply from your emails/docs/health data entirely on-device, with almost no extra cost. Your AI becomes truly “yours” - customized to your life, offline, private forever.

That’s why Paolo frames it as the “first real-world signal of a local private AI that can truly serve the people.”

“Era of Stable Intelligence” lands as a real shift.

Tether AI is the stablecoin moment for decentralized artificial intelligence

“Stable” here echoes stablecoins: reliable, decentralized, user-controlled, and no central gatekeepers. LoRA + BitNet + cross-platform Fabric = the technical foundation to escape cloud dependency. What used to be locked in data centers (personalized frontier AI) now “escapes” to your pocket.

It’s not just faster and cheaper; it’s a philosophical unlock toward AI sovereignty.

In short, LoRA isn’t just a technique, it’s the unlock button that turns theoretical edge AI into practical reality today.

The announcement isn’t about raw model size anymore. It is about who controls the intelligence (you, on your device) instead of renting it from Big Tech.

With this lens, the post reads like a declaration of independence for personal AI, and the demos prove it’s not vaporware.

Super exciting if you’re into decentralization, privacy, or just hating monthly API bills.

The Wattage Waterfall of Compute

Rohit Chauhan — Thu, 05 Mar 2026 05:56:05 GMT

In Part 1 we established that every logic operation costs energy and generates heat, a constraint thermodynamics imposes permanently.

Today we follow that heat all the way up the stack; from a single chip to a full-scale data center running AI workloads around the clock.

The Wattage Waterfall

A single logic operation at the level of a chip generates heat that demands cooling. A cluster of ten thousand chips demands a power grid. A training run demands that the grid run without interruption for weeks, and once a model deploys, inference demand runs the grid forever.

The Chip: The micro-unit of compute

A single H100 draws 700W of power at peak. One H100 server (8 GPUs) draws 10.2kW of total system power. A single GB200 NVL72 rack, Nvidia’s current Blackwell configuration, draws ~120kW. A standard office building draws roughly 50-100 kW, which means one AI training rack consumes more power than an entire office building. (1), (2), (3)

The Cluster: Where scale meets physics

A frontier model training cluster runs 10k to 25k+ GPUs simultaneously. At 700W per H100, a 10k GPU cluster draws 7MW continuously. xAI’s Colossus cluster in Memphis runs ~100,000 H100s drawing ~150-200 MW continuously. (1), (4), (5)
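
The cluster figures follow from simple multiplication, sketched below. The per-GPU and per-server numbers come from the text. The function names are illustrative. Spreading the 10.2kW server figure across 8 GPUs to get a "system-level" estimate is an assumption made here, not a method taken from the cited sources.

```python
H100_WATTS = 700           # peak draw per GPU, from the text
DGX_H100_SYSTEM_KW = 10.2  # 8-GPU server total system power, from the text


def gpu_only_mw(n_gpus: int, watts_per_gpu: float = H100_WATTS) -> float:
    """Power drawn by the GPUs alone, in megawatts."""
    return n_gpus * watts_per_gpu / 1e6


def system_level_mw(n_gpus: int, system_kw: float = DGX_H100_SYSTEM_KW,
                    gpus_per_system: int = 8) -> float:
    """Power including the rest of each server (CPUs, memory, fans), in megawatts."""
    return n_gpus / gpus_per_system * system_kw / 1e3


print(gpu_only_mw(10_000))       # the 10k-GPU cluster in the text: 7 MW of GPU draw
print(gpu_only_mw(100_000))      # GPU-only draw for a 100,000-GPU cluster
print(system_level_mw(100_000))  # roughly 127.5 MW once server overhead is included
```

The server-level estimate for 100,000 GPUs lands below the ~150-200 MW quoted for Colossus. A plausible reading is that the remainder is facility overhead such as cooling and networking, though that is an inference and not something the sources above state.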

The training run level: The wattage it takes to bring GPT-4 to life

GPT-4 is estimated to have consumed ~50 GWh of energy in training, equivalent to powering ~4,800 average US homes for a full year. For xAI’s Grok 4, estimates show ~310 GWh consumed in training, roughly a 6x increase over GPT-4. As models become bigger, training compute will continue to expand: Epoch AI estimates that training compute scales ~4x per year, and by extension the energy cost of training scales at roughly the same rate. (5), (6), (7), (8)
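
A short sketch of the arithmetic behind these comparisons, using only figures quoted in the text and in the citations (10,500 kWh per US household per year). The helper names are hypothetical, and the three-year compounding at the end is an illustrative assumption, not a forecast.

```python
GPT4_TRAINING_GWH = 50
GROK4_TRAINING_GWH = 310
US_HOME_KWH_PER_YEAR = 10_500  # EIA figure used in the citations


def homes_powered_for_a_year(gwh: float) -> float:
    """Convert a training run's energy into US household-years of electricity."""
    return gwh * 1e6 / US_HOME_KWH_PER_YEAR  # 1 GWh = 1,000,000 kWh


def compute_multiple(annual_growth: float, years: int) -> float:
    """Compounded growth in training compute after a number of years."""
    return annual_growth ** years


print(round(homes_powered_for_a_year(GPT4_TRAINING_GWH)))  # about 4,762 homes
print(GROK4_TRAINING_GWH / GPT4_TRAINING_GWH)              # ratio of Grok 4 to GPT-4
print(compute_multiple(4, 3))                              # 4x per year for 3 years = 64x
```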

The inference level: Where energy demand becomes permanent

OpenAI CEO Sam Altman disclosed in a June 2025 blog post that a single ChatGPT query uses ~0.34 watt-hours. Data compiled by Epoch AI independently corroborates Altman’s estimate, calculating ~0.3 Wh per query. In his blog, Altman equated this to running an oven for a little over one second, or a high-efficiency lightbulb for a couple of minutes. (6), (9), (10)

The 2.5B daily queries (or prompts) reported by TechCrunch in July 2025, citing OpenAI, let us infer the daily energy demand from inference alone: 0.34 Wh x 2.5B queries = ~850 MWh per day. That works out to an annual run rate of 850 MWh x 365 = ~310 GWh per year. (11)

As noted above, GPT-4 model training consumed ~50 GWh of electricity. A simple division shows that inference consumes the equivalent of the entire training run in ~59 days (50,000 MWh ÷ 850 MWh/day = ~59 days). MIT Technology Review further states that inference’s share of total AI lifecycle energy is ~80-90%, and reasoning models (o1/o3-class models) only escalate this consumption: data from Epoch AI shows reasoning models consume 2.5x more tokens than regular models. (6), (10)
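
The inference math above can be reproduced in a few lines. All inputs are the figures quoted in the text. The variable names are illustrative, and treating every query as costing the same 0.34 Wh is the same simplifying assumption the text makes.

```python
WH_PER_QUERY = 0.34         # Altman's per-query estimate
QUERIES_PER_DAY = 2.5e9     # daily prompts reported in July 2025
GPT4_TRAINING_MWH = 50_000  # ~50 GWh training estimate

# Daily inference energy: watt-hours converted to megawatt-hours.
daily_mwh = WH_PER_QUERY * QUERIES_PER_DAY / 1e6

# Annual run rate in gigawatt-hours.
annual_gwh = daily_mwh * 365 / 1e3

# Days of inference needed to equal the whole GPT-4 training run.
days_to_match_training = GPT4_TRAINING_MWH / daily_mwh

print(f"{daily_mwh:.0f} MWh/day")                              # 850 MWh/day
print(f"{annual_gwh:.0f} GWh/year")                            # 310 GWh/year
print(f"{days_to_match_training:.0f} days to match training")  # 59 days
```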

The Data Center Level: Where it all aggregates

The IEA estimates that global data center electricity consumption will more than double in its base case, from ~415 TWh (c.1.5% of global electricity) in 2024 to ~945 TWh in 2030. For context, this is equivalent to the entire current annual electricity consumption of Japan. A Goldman Sachs report adds that data centers will command 8% of total US power consumption by 2030, against ~3% in 2022. This massive demand for power explains Goldman Sachs’ estimate of approx. $720B in grid spending by 2030, and the estimated ~$736B in capex announced by the top 5 hyperscalers for 2025-26 alone, as outlined in Part 1. (12), (13)
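
For a sense of pace, the IEA endpoints can be turned into an implied annual growth rate. This is a back-of-the-envelope sketch: the helper name `implied_cagr` is made up here, and the smooth compounding path is an assumption, since the IEA figures quoted above give only the 2024 and 2030 endpoints.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoints."""
    return (end / start) ** (1 / years) - 1


START_TWH_2024 = 415
END_TWH_2030 = 945

growth = implied_cagr(START_TWH_2024, END_TWH_2030, 2030 - 2024)
print(f"Total multiple: {END_TWH_2030 / START_TWH_2024:.2f}x")
print(f"Implied growth: {growth:.1%} per year")
```

Under that assumption, data center electricity demand compounds at a little under 15% a year for six straight years.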

Conclusion

The wattage waterfall makes the constraints of physics (thermodynamics) structural requirements. This brings us to the three primary conclusions relevant for all investors trying to build a foundational understanding of the AI trade.

First, power must be generated. Second, that power must be effectively transmitted across the racks, and third, these AI workloads will generate significant heat which must be removed from the environment.

These three requirements not only mandate large upfront investments in power, grid, and cooling infrastructure; they also make that build-out structural, not cyclical, irrespective of who wins the model race.

Now that we’ve explained the constraints of thermodynamics in Part 1, and the current energy & cooling requirements for SOTA chips powering massive AI workloads, the next question is who builds it, who powers it, and who profits from it.

In Part 3, I'll map the exact companies whose business models exist to solve the physical constraints outlined above. The investment case for each one is anchored in physical laws that don’t change regardless of who wins the model race. Stay tuned!

If you found this useful, subscribe. Part 3 drops next week.

Citations

1. NVIDIA Corporation, "NVIDIA H100 GPU - Product Specifications," accessed February 2026.
2. NVIDIA Corporation, "NVIDIA DGX H100 Datasheet," March 2022. Document number A4-2146027-R3.
3. NVIDIA Corporation, "DGX GB200 NVL72 Datasheet," NVIDIA Documentation, accessed March 2026.
4. Pilz, K., Rahman, R., Sanders, J. and Heim, L., "AI Training Cluster Sizes Increased by More Than 20x Since 2016," Epoch AI, October 23, 2024.
5. James Sanders, Luke Emberson and Yafah Edelman, "What did it take to train Grok 4?" Epoch AI, September 12, 2025.
6. James O'Donnell and Casey Crownhart, "We did the math on AI's energy footprint," MIT Technology Review, May 20, 2025.
7. U.S. Energy Information Administration, "Electricity Use in Homes," updated December 18, 2023. Derived estimate: 50 GWh ÷ 10,500 kWh/household/year = ~4,762 households. Rounded to ~4,800 in text.
8. Sevilla, J. and Roldán, E., "Training Compute of Frontier AI Models Grows by 4-5x Per Year," Epoch AI, May 28, 2024.
9. Sam Altman, "The Gentle Singularity," blog.samaltman.com, June 11, 2025.
10. You, J., "How much energy does ChatGPT use?" Epoch AI, February 7, 2025.
11. Dastin, J., "ChatGPT users send 2.5 billion prompts a day," TechCrunch, July 21, 2025.
12. International Energy Agency, "Energy and AI - Energy Demand from AI," IEA, April 10, 2025. https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai
13. Goldman Sachs Research, "AI to drive 165% increase in data center power demand by 2030," Feb 4, 2025.

The Physics Behind the AI Trade

Rohit Chauhan — Fri, 27 Feb 2026 15:56:01 GMT

A physics-first lens for playing the AI trade

Physics is the baseline for the natural world, and also the absolute ceiling for any problem we solve (including artificial intelligence). Everything above the ‘physics layer’ is a human choice that can theoretically be changed. Confusing human choices with physical laws is what most get wrong.

The analysis that follows is grounded in the constraints and opportunities physics imposes. Hard science, not media narratives, will decide how to find the winning opportunities in the AI trade.

The physics that follows exists to serve the investment thesis, not academic completeness. The concepts most applicable are chosen to highlight the constraints, build the analytical foundation, and establish a new lens to evaluate the AI trade.

This framework also serves as a filter to fact-check every tall claim about AI disruption. Physics is the reality check that prevents costly mistakes born of following the crowd rather than grounding analysis in reality.

The Physics behind the AI trade

At its core, the entire AI lifecycle from inception to end is governed by five primary physical fundamentals. Each manifests at various layers of the AI assembly line.

The law of Thermodynamics (Landauer’s Limit and Heat Dissipation)
The law of speed of light
The principles of Quantum Mechanics
The laws of information theory (Shannon’s theorem)
The law of Wave physics (Diffraction Limit)

Thermodynamics is already the dominant cost driver in AI infrastructure. The energy and cooling buildout it demands is leading capital allocation decisions across every stakeholder in the race to dominate AI. (1)

Data compiled by Goldman Sachs shows the five largest hyperscalers in the US will deploy a cumulative $736B in Capex for the AI buildout in 2025 and 2026 alone. Going further, McKinsey has projected a total investment of ~$5.2 trillion for AI data centres by 2030. (2)

Of that $5.2 trillion, McKinsey estimates $1.3 trillion flows directly into the energisers; the utilities, cooling system manufacturers, and electrical infrastructure providers whose entire business exists to solve the heat and power problem thermodynamics creates. This is the thermodynamics trade expressed in dollars.

The IEA projects the total demand for energy from data centres to double from ~415 TWh in 2024 to ~945 TWh by 2030. The 2030 total is roughly equivalent to the entire 2024 electricity consumption of Japan, reached in just six years. These figures establish the scale of what thermodynamics demands in capital terms, and the physics below explains why this demand is permanent. (3, 4)

Thermodynamics imposes physical energy constraints on chip compute capabilities

If you strip away AI to the precise point where physics shows up, it boils down to a single factor: transistors switching states. Billions of times per second, per chip. Those switching events are what execute the matrix multiplications underlying every model, and every switching event costs energy and generates heat.

Every compute operation is subject to the constraints imposed by the second law of thermodynamics. A constraint neither engineering nor software can solve. This is nature imposing its will on every transistor switch on every chip at every data centre ever built by humanity.

The physical law creates two distinct constraints on compute, one theoretical, and one practical. Understanding both builds the foundation for our thesis that the energy infrastructure trade isn’t cyclical capex but a permanent structural requirement.

Constraint 1: Landauer’s Limit - The Theoretical Floor of Compute

First introduced in 1961, Landauer’s limit sets the minimum quantity of heat dissipated into the environment when a single bit of information is erased. At room temperature (~300K), that minimum is k·T·ln 2, or ~2.9 × 10⁻²¹ joules per bit erased. This is thermodynamics enforcing an absolute limit that no material, architecture, optimization or engineering can go beyond. This is the theoretical floor of all compute, including AI.
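
The room-temperature figure can be checked directly from the standard form of Landauer's bound, E = k·T·ln 2, where k is the Boltzmann constant. This is a minimal sketch: the function name is illustrative, and 300 K is the room-temperature assumption used in the text.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum heat dissipated when one bit is erased at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


room_temp_limit = landauer_limit_joules(300.0)
print(f"{room_temp_limit:.2e} J per bit erased")  # 2.87e-21 J per bit erased
```

The result is close to the figure quoted above. Because the bound scales linearly with temperature, running colder lowers the floor only modestly in everyday operating ranges.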

Peer-reviewed literature on this topic argues that modern chips dissipate roughly a million times more energy per logic operation than Landauer’s limit, exceeding it by many orders of magnitude. (5)

While Landauer’s limit sets the physics-enforced ceiling on compute per watt, a more useful lens comes from research published by Ho, Erdil & Besiroglu (2023), which sets the Landauer floor aside and instead tackles the more immediate question: “how much more efficient can silicon transistors realistically get before hitting their own engineering limits?” (6)

Current state-of-the-art CMOS architectures, the transistor-based chip designs running today’s AI GPUs, operate ~207x below the maximum efficiency Ho et al.’s research outlines. In other words, there is room to improve chip efficiency by roughly 207x before we hit the practical limit on how little energy per logic operation these chips can consume.

While this may sound like a lot, current AI compute demand is scaling 4-5x per year, and all the efficiency gains are consumed by demand growth before they can actually reduce energy infrastructure requirements. This explains the massive grid capex and hyperscaler expansion efforts in line with the estimates and data compiled by McKinsey and Goldman Sachs outlined above. (7)
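
A rough sketch of why ~207x of headroom does not go far against 4-5x annual demand growth. The efficiency figures are the ones given in citation 6. The function name is illustrative. The scenario, in which the full headroom is realized and demand keeps compounding at the cited rate, is a deliberately simplified assumption and not a forecast.

```python
import math

H100_FLOP_PER_JOULE = 1.4e12          # H100 FP16 efficiency, per citation 6
CMOS_CEILING_FLOP_PER_JOULE = 2.9e14  # estimated CMOS ceiling, per citation 6

headroom = CMOS_CEILING_FLOP_PER_JOULE / H100_FLOP_PER_JOULE


def years_to_absorb(efficiency_multiple: float, annual_demand_growth: float) -> float:
    """Years of compounding demand growth needed to offset an efficiency gain."""
    return math.log(efficiency_multiple) / math.log(annual_demand_growth)


print(f"Headroom: {headroom:.0f}x")
for growth in (4, 5):
    years = years_to_absorb(headroom, growth)
    print(f"At {growth}x/year demand growth: absorbed in {years:.1f} years")
```

Under those assumptions, even the entire remaining CMOS efficiency headroom is offset by less than four years of demand growth.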

Constraint 2: Heat Dissipation - The practical constraint of compute

Landauer’s principle is evidence of physics imposing its will on compute. Every switch event is generating heat as a consequence of executing logic operations within the constraints of physics.

Take the Nvidia H100, for instance, which draws ~700W per chip. The next-gen GB200 NVL72 rack draws nearly 120kW for 72 Blackwell GPUs; in comparison, a single H100 rack consumes 10-30kW. This is a 4-12x increase in rack power density in a single generation of GPU offerings. This increase isn’t a one-time thing either. Mapping the power draw across four generations of chips, starting with V100 (~300W in 2017), A100 (~400W in 2020), H100 (~700W in 2022), and the latest generation of GB200 (~1,000W in 2024), shows how each generation demands meaningfully more power and cooling than the last. (8)

Note: Citations 9-12 link to the original source material for the per-generation power figures above.

At these rates of power consumption, liquid cooling isn’t optional infrastructure. It is mandatory, and scales permanently with every watt deployed.
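
The density ratios above can be reproduced with a few divisions. The wattages and rack figures are the ones quoted in the text. The variable names and the dictionary layout are illustrative choices.

```python
GB200_NVL72_RACK_KW = 120
H100_RACK_KW_RANGE = (10, 30)

# Per-GPU power by generation, as quoted in the text (watts).
GPU_WATTS = {
    "V100 (2017)": 300,
    "A100 (2020)": 400,
    "H100 (2022)": 700,
    "GB200 (2024)": 1000,
}

# Rack density increase: compare against the high and low ends of the H100 range.
low_multiple = GB200_NVL72_RACK_KW / H100_RACK_KW_RANGE[1]
high_multiple = GB200_NVL72_RACK_KW / H100_RACK_KW_RANGE[0]
print(f"Rack density increase: {low_multiple:.0f}x to {high_multiple:.0f}x")

# Generation-over-generation growth in per-GPU power.
names = list(GPU_WATTS)
step_multiples = []
for prev, curr in zip(names, names[1:]):
    step = GPU_WATTS[curr] / GPU_WATTS[prev]
    step_multiples.append(step)
    print(f"{prev} -> {curr}: {step:.2f}x")
```

Every generational step is above 1.3x, and the cumulative move from V100 to GB200 is a little over 3x in per-GPU power.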

To sum up, Landauer’s limit establishes the physical cost of compute which forces generation and consumption of electricity, and heat dissipation mandates cooling infrastructure as heat scales permanently with compute. These aren’t cyclical capex driven by a temporary infrastructure gap. They are structural requirements imposed by physics.

Thermodynamics has already decided what the AI infrastructure stack must look like. The next question is who builds it, who powers it, and who profits from it. In Part 2, I'll map the exact companies whose business models exist to solve the physical constraints outlined above. The investment case for each one is anchored in physical laws that don’t change regardless of who wins the model race. Stay tuned!

If you found this useful, subscribe below. Part 2 drops next week.

Citations

1. Goldman Sachs Research, "How AI Is Transforming Data Centers and Ramping Up Power Demand," July 16, 2024.
2. Jesse Noffsinger et al., "The cost of compute: A $7 trillion race to scale data centers," McKinsey Quarterly, April 2025.
3. International Energy Agency, "Energy and AI - Energy Demand from AI," April 2025.
4. Enerdata via Statista, "Electricity consumption worldwide in 2024, by leading country," July 2025.
5. Chattopadhyay, P., Misra, A., Pandit, T., and Paul, G., "Landauer Principle and Thermodynamics of Computation," June 2025.
6. Ho, A., Erdil, E., and Besiroglu, T., "Limits to the Energy Efficiency of CMOS Microprocessors," 2023 IEEE International Conference on Rebooting Computing, December 2023. H100 today achieves ~1.4×10¹² FP16 FLOP/J. The estimated maximum CMOS efficiency ceiling is ~2.9×10¹⁴ FP16 FLOP/J. That gap is approximately 207x.
7. Sevilla, J. and Roldán, E., "Training Compute of Frontier AI Models Grows by 4-5x Per Year," Epoch AI, May 28, 2024.
8. NVIDIA Corporation, "DGX GB200 User Guide - Hardware," NVIDIA Documentation, August 2025.
9. NVIDIA Corporation, "NVIDIA V100 Tensor Core GPU Datasheet," January 2020.
10. NVIDIA Corporation, "NVIDIA A100 Tensor Core GPU Datasheet," May 2022.
11. NVIDIA Corporation, "NVIDIA H100 GPU - Product Specifications," accessed February 2026.
12. Lenovo, "ThinkSystem NVIDIA HGX B200 180GB 1000W GPU Product Guide," Lenovo Press, LP2226, January 13, 2026. https://lenovopress.lenovo.com/LP2226 (citing NVIDIA GPU specifications, Table 2).