YouTube Research · April 19, 2026
Jensen Huang × Dwarkesh Patel — Will Nvidia's Moat Persist?
TL;DR
- Jensen frames Nvidia as "the thing in the middle" of turning electrons into tokens — a job he insists is harder to commoditize than the software people see from the outside.
- The moat Dwarkesh prods at is the supply chain: ~$100B of explicit purchase commitments and another ~$150B of implicit ones locking up scarce logic, HBM, packaging, and photonics capacity.
- On TPUs he is blunt — Anthropic is a one-off, not a trend; without Anthropic there's "no TPU growth at all, no Trainium growth at all."
- He rejects the Nvidia-as-hyperscaler idea on philosophy — "do as much as necessary, as little as possible" — and openly concedes that missing the early Anthropic investment was his biggest mistake.
- The China section is the episode's longest and sharpest fight: Jensen argues export controls only push Chinese AI onto a non-American stack while leaving their compute (energy + 7nm + algorithms) perfectly adequate.
- Parting frame: Nvidia without AI would still be very large — accelerated computing scales general-purpose computing past the end of Moore's law, AI is just the biggest domain that reached it first.
Electrons to tokens [00:00]
Dwarkesh opens with the bear case: Nvidia designs, TSMC manufactures, OEMs assemble — so if software is getting commoditized by AI, why won't the same fate catch Nvidia? Jensen's response is the line the rest of the interview keeps circling back to.
He argues the work of making a token cheaper, better, and more valuable — "the artistry, engineering, science, invention that goes into making that token valuable" — is "far from deeply understood." His mental model of the company follows: input electrons, output tokens, and the rule is "do as much as necessary, as little as possible".
The visible moat, then, is financial and relational: ~$100B of explicit purchase commitments disclosed in the last 10-Q, with SemiAnalysis estimating another ~$150B of implicit ones.2 Jensen's claim is that upstream partners make these bets for Nvidia specifically because "they know I have the capacity to buy it and sell it through my downstream." Bottlenecks (CoWoS, HBM, silicon photonics) get "swarmed" for a year or two and then fall. "None of the bottlenecks last longer than 2 or 3 years. None of them."
Why TPUs don't break the moat [16:25]
Dwarkesh's most pointed challenge: two of the top three models (Claude, Gemini) are trained on TPUs, not Nvidia. What does that say about the moat? Jensen's answer routes around the TPU question entirely: Nvidia didn't build a tensor processing unit — it built accelerated computing, which also happens to accelerate AI. Molecular dynamics, fluid dynamics, structured data, image generation — all run on CUDA. "Our market reach is far greater than any TPU, any ASIC can possibly have."
The deeper argument is algorithmic, not about benchmarks. Moore's law gives you ~25% a year; Nvidia's Hopper→Blackwell claim is 30–50× energy efficiency on real inference workloads3 — credited to the rack-scale NVL72 fabric that ties 72 GPUs into a single NVLink domain13 — and you don't get that from transistors alone. You get it by rewriting the algorithm every generation — new attention variants, hybrid SSMs, diffusion+AR fusions — and only a programmable architecture with CUDA lets you do that end-to-end across the processor, the fabric, and the libraries.
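The arithmetic behind that contrast is easy to check. A quick sketch (the 25%/yr and 30–50× figures come from the interview; the two-year window is my assumption for one generation gap): steady transistor-rate compounding gets you barely 1.6×, so almost the entire claimed jump has to come from co-design above the transistor level.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Cumulative improvement factor from a steady annual rate."""
    return (1 + rate_per_year) ** years

# Transistor-driven, Moore's-law-style gain over a two-year generation gap
moores = compound(0.25, 2)  # ~1.56x

# Hopper -> Blackwell energy-efficiency claim quoted in the interview
claimed_low, claimed_high = 30, 50

print(f"Moore's law alone over 2 years: {moores:.2f}x")
print(f"Claimed generational gain: {claimed_low}-{claimed_high}x")
print(f"Residual attributed to algorithm/fabric co-design: >{claimed_low / moores:.0f}x")
```

Even at the low end of the claim, roughly 19× of the gain is unexplained by process improvement alone, which is the whole argument for a programmable, co-designed stack.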
When Dwarkesh points at Anthropic's just-announced multi-gigawatt TPU deal with Google and Broadcom1, Jensen is direct: "Anthropic is a unique instance and not a trend. Without Anthropic, why would there be any TPU growth at all? It's 100% Anthropic. Without Anthropic, why would there be any Trainium growth at all? It's 100% Anthropic." OpenAI, for all its AMD and in-house Titan dabbling9, remains "vastly Nvidia." (AWS's Project Rainier — the ~500k-Trainium2 cluster14 that trains Claude — is the concrete reason Anthropic is the TPU/Trainium outlier.)
Why Nvidia doesn't become a hyperscaler [41:06]
Dwarkesh rewinds: Nvidia had the cash years earlier — why not become a foundation lab, or a cloud? Jensen's answer is a philosophy statement — "do as much as necessary, as little as possible"6 — and an unusual concession: "My mistake is I didn't deeply internalize that they really had no other options — that a VC would never put in $5–10B of investment into an AI lab with the hopes of it turning out to be Anthropic."
Instead, Nvidia backstops the neo-cloud ecosystem that wouldn't otherwise exist — CoreWeave has a reported ~$6.3B revenue backstop with ~$2B invested5, Nscale and NBS are similarly seeded — and he is now committing up to $100B to OpenAI progressively as it deploys 10 GW of Nvidia systems4. The scarcity allocation rule, for once stated flatly: not highest bidder but first-in, first-out by PO, with data-center readiness as the tiebreaker.
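The allocation rule as stated is simple enough to write down. A minimal sketch, purely illustrative (the class, field names, and rack quantities are mine, not anything Nvidia publishes): sort by PO date, break same-day ties by data-center readiness, fill in order until supply runs out.

```python
from dataclasses import dataclass

@dataclass
class PurchaseOrder:
    customer: str
    po_day: int      # day the PO landed (the FIFO key)
    dc_ready: bool   # data-center readiness: power, cooling, space
    racks: int       # racks requested

def allocate(orders: list[PurchaseOrder], supply: int) -> dict[str, int]:
    """First-in, first-out by PO date; readiness breaks same-day ties."""
    filled: dict[str, int] = {}
    # `not dc_ready` sorts ready buyers (False) ahead of unready ones (True)
    for po in sorted(orders, key=lambda o: (o.po_day, not o.dc_ready)):
        take = min(po.racks, supply)
        if take:
            filled[po.customer] = filled.get(po.customer, 0) + take
            supply -= take
    return filled

queue = [
    PurchaseOrder("early-but-unready", 1, False, 4),
    PurchaseOrder("early-and-ready", 1, True, 4),
    PurchaseOrder("late", 2, True, 4),
]
# With 6 racks of supply: the ready same-day buyer fills first,
# the unready one takes the remainder, the latecomer gets nothing.
print(allocate(queue, 6))
```

The point of the rule, on Jensen's telling, is that it rewards buyers who have done the hard downstream work (sites, power) rather than those who simply bid highest.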
The TSMC relationship punctuates the chapter: 30 years, no legal contract.7 "Sometimes I got a better deal, sometimes I got a worse deal" — but you can bet a hundred-billion-dollar AI factory on Nvidia's yearly cadence, and on exactly one foundry. "Your token cost will decrease by an order of magnitude every single year. I can count on it like I can count on the clock."
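An order of magnitude per year compounds brutally fast, which is the whole bet behind the yearly cadence. A two-line illustration (the $10-per-million-tokens starting price is an assumption for the arithmetic, not a quoted figure):

```python
def token_cost(cost_today: float, years: int, annual_factor: float = 10.0) -> float:
    """Cost per token after `years` of a steady annual_factor-x decline."""
    return cost_today / annual_factor ** years

# Assumed $10 per million tokens today, falling 10x per year
for y in range(4):
    print(f"year {y}: ${token_cost(10.0, y):.4f} per million tokens")
```

At a steady 10×/yr, the assumed $10 per million tokens falls to one cent per million within three years — the kind of curve a hundred-billion-dollar AI factory has to be able to count on.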
Should we sell AI chips to China? [57:36]
The longest and most adversarial segment. Dwarkesh opens by channeling Dario Amodei's "country of geniuses in a data center"8 and the recent Mythos model that Anthropic claims found cyber vulnerabilities in every major OS (including OpenBSD). Jensen's reply reframes the premise: "Mythos was trained on a fairly mundane amount of fairly mundane capacity — abundantly available in China."
His core claim is that export controls haven't taken Chinese compute off the table — they've just shifted the bottleneck from logic to energy, and China has energy. "They've got plenty of chips. They've got most of the AI researchers. If you're worried about them, what is the best way to create a safe world? Victimizing them, turning them into an enemy — likely isn't the best answer."
The sharp exchange is about what Dwarkesh calls the "critical years" thesis — that the next few years of capability gap matter most. Jensen inverts it: "Why are you causing one layer of the AI industry to lose an entire market so that you could benefit another layer?" DeepSeek11 is the concrete fear: the day it ships optimized for Huawei, not Nvidia, is "a horrible outcome for our nation." His preferred future has every AI developer in the world — including in China — building on the American tech stack, because that is what preserves US centrality, not what threatens it.
One architecture, and what Nvidia would be without AI [1:35:06]
Two bow-tie questions. First: why not run Cerebras-style wafer-scale, Dojo-style huge-package, and a no-CUDA architecture all in parallel to hedge? Jensen's answer is empirical rather than ideological: they do simulate it all. "It's just that we don't have a better idea. They're in our simulator provably worse." The one recent exception is folding Grok into the CUDA ecosystem as a higher-ASP, lower-throughput inference segment10 (this is of a piece with the $5B Intel stake announced in late 202515 — accelerators are welcome if they address a real workload Nvidia itself can't), because "the value of tokens has gone up so high" that disaggregating the inference market finally pencils.
Second: what would Nvidia be without the deep-learning revolution? Still enormous, he says — accelerated computing is a structural bet that general-purpose computing has "largely run its course." CUDA plus a GPU speeds up specific algorithms 100×; AI was the biggest such domain, but molecular dynamics, seismic processing, computational lithography, structured-data processing and quantum chemistry are each their own CUDA-X library. "Tensors is not the only way that you compute."
Annotations & Sources
- 1 Anthropic announced on 6 April 2026 a ~3.5 GW expansion of Google TPU capacity (built with Broadcom), coming online through 2027 and sited primarily in the US. Anthropic's run-rate had surpassed $30B at announcement, up from ~$9B at the end of 2025. source →
- 2 Nvidia's fiscal Q2 2026 10-Q discloses long-term supply, capacity and inventory purchase obligations with foundry, memory and packaging suppliers. SemiAnalysis has argued the true forward-looking commitment — including informal capacity reservations — is closer to $250B. source →
- 3 Nvidia's March 2024 Blackwell announcement claims GB200 NVL72 delivers up to 25× lower TCO/energy and 30× faster real-time LLM inference vs H100. Gains come from FP4 Transformer Engine, HBM3e, and NVLink-5 rack-scale fabric. source →
- 4 Nvidia's 22 September 2025 press release commits up to $100B to OpenAI progressively as OpenAI deploys 10 GW of Nvidia systems (first GW on Vera Rubin in 2H 2026). Nvidia's CFO later noted the deal remained a letter of intent without a definitive agreement. source →
- 5 Per CoreWeave's October 2025 SEC filings, Nvidia is obligated to buy any unused CoreWeave capacity up to $6.3B through April 2032 — a revenue backstop that lets CoreWeave raise cheaper debt while capping Nvidia's exposure. source →
- 6 At GTC 2025 Nvidia unveiled Rubin NVL144 (3.6 EFLOPS dense FP4, HBM4 at 13 TB/s) for 2026 and Rubin Ultra NVL576 (15 EFLOPS FP4) for 2H 2027, followed by a Feynman architecture in 2028 introducing co-packaged optics. source →
- 7 Jensen has publicly described the Nvidia–TSMC partnership as built on decades of trust rather than legal agreements, with no signed contract despite cumulative business likely reaching hundreds of billions of dollars. In 2013 TSMC founder Morris Chang offered Huang the TSMC CEO role. source →
- 8 Published October 2024, Dario Amodei's essay "Machines of Loving Grace" summarizes advanced AI as "a country of geniuses in a datacenter" — millions of parallel superhuman instances operating at accelerated speed — and projects cluster sizes sufficient to run those populations by ~2027. source →
- 9 OpenAI's Triton 1.0 announcement introduced a Python-embedded DSL where researchers with no CUDA experience can write kernels that approach hand-tuned expert performance. It is now the kernel language behind PyTorch's torch.compile, with ports to AMD and other accelerators. source →
- 10 SemiAnalysis's InferenceMAX runs nightly inference benchmarks across Nvidia GB200, AMD MI355X and other accelerators, tracking real-world LLM serving performance and cost per token as software stacks evolve. Nvidia's TCO advantage here is what Jensen keeps inviting TPU and Trainium teams to show up and contest. source →
- 11 DeepSeek-R1 (20 Jan 2025) showed that RL without SFT (R1-Zero) produces emergent reasoning, and the fully open weights release plus six distilled dense models challenged assumptions that frontier reasoning required Western compute budgets. source →
- 12 Huawei's Ascend 910C reportedly delivers ~800 TFLOPS FP16 and 3.2 TB/s memory bandwidth vs H200's 4.8 TB/s. The CloudMatrix 384 rack (384 910C dies) reaches ~300 PFLOPS BF16 — roughly 1.7× a GB200 NVL72 — by trading efficiency for scale and an all-optical mesh. source →
- 13 The liquid-cooled GB200 NVL72 rack (120 kW) pairs 36 Grace CPUs with 72 Blackwell GPUs over a 130 TB/s NVLink fabric with 30 TB unified memory, delivering up to 1.4 EFLOPS FP4. source →
- 14 Brought online in late October 2025, AWS's Project Rainier spans multiple US data centers using Trainium2 UltraServers (16 chips each, 4-server groups) with petabit-scale EFA networking — giving Anthropic more than 5× its prior training compute and scaling toward 1M+ Trainium2 chips. source →
- 15 Announced 18 September 2025 and closed 29 December 2025, Nvidia's $5B common-stock stake (~4%) in Intel is tied to Intel building Nvidia-custom x86 data-center CPUs linked via NVLink, plus consumer x86 SoCs integrating RTX GPU chiplets. source →
Full transcript
00:00 We've seen the valuations of a bunch of
00:02 software companies crash because people
00:04 are expecting AI to commoditize
00:05 software. And there's a potentially
00:08 naive way of thinking about things which
00:09 is like look Nvidia sends a GDSII file to
00:13 TSMC. TSMC builds the logic dies. It
00:16 builds the switches. Um then it packages
00:18 them with the HBM that SK Hynix and
00:20 Micron and Samsung make. Then it sends
00:22 it to an ODM in Taiwan where they
00:24 assemble the racks. And so Nvidia is
00:26 fundamentally making software that other
00:27 people are manufacturing. And if
00:28 software gets commoditized, does Nvidia
00:30 get commoditized?
00:32 >> Well, in the end, something has to
00:33 transform electrons to tokens.
00:38 That transformation
00:40 um there's no the transformation of
00:43 electrons to tokens
00:45 uh and making those tokens more valuable
00:48 over time. I I I don't I think that that
00:54 that's hard to hard to um completely
00:57 commoditize
00:59 the transformation from electrons to
01:00 tokens is such an such an incredible
01:03 journey and and making that token. You
01:07 know, it's like making a one molecule
01:09 more valuable than another molecule,
01:11 making one token more valuable than
01:13 another. the amount of artistry,
01:15 engineering, science, invention that
01:17 goes into making that token valuable.
01:21 Obviously, we're we're watching it
01:22 happening in real time. And so, so the
01:26 the the the transformation, the
01:28 manufacturing, um all of the science
01:30 that goes in there is far from deeply
01:33 understood and it's far from the journey
01:35 is far from far from over. And so, so I
01:38 I doubt that it will happen. Um we're
01:41 going to make it more efficient, of
01:42 course. I mean the whole the whole thing
01:44 about Nvidia
01:45 in fact the way that you frame the
01:47 question is is my mental model of our
01:49 company
01:50 the input is electron the output is
01:53 tokens
01:54 that is in the middle Nvidia and our job
01:58 is to to do as much as necessary as
02:02 little as possible to enable that
02:04 transformation to be done at incredible
02:06 capabilities and and what I mean by as
02:09 little as possible whatever I don't need
02:11 to
02:13 I partner with somebody and I make it
02:14 part of my ecosystem to do. And if you
02:16 look at Nvidia today, we probably have
02:18 the largest ecosystem of partners both
02:20 in supply chain upstream, supply chain
02:22 downstream. all of the computers,
02:25 computer companies and all the
02:26 application developers and all the model
02:28 makers and all the you know AI is a five
02:31 five layer cake if you will and and we
02:34 have ecosystems across the entire five
02:36 layers and and so we try to do as little
02:39 as possible but the part that we have to
02:42 do as it turns out is insanely hard and
02:45 and um
02:46 >> I I don't think that that gets
02:47 commoditized in fact in fact um
02:50 >> uh I also don't think that the the
02:52 enterprise software companies uh
02:55 the tools makers you know most of the
02:57 software companies today are tools
02:59 makers um some of them are not um but
03:02 are some of them are workflow
03:05 um codification
03:07 you know systems um but for a lot of
03:10 companies they're tool makers for
03:11 example you know Excel is a tool
03:13 powerpoint's a tool uh Cadence makes
03:15 tools Synopsys makes tools
03:18 I I actually see the opposite of what
03:21 people see I think the number of agents
03:24 are going to grow exponentially. The
03:27 number of tool users are going to grow
03:28 exponentially and it's very likely that
03:32 the number of instances of
03:36 all these tools are going to skyrocket.
03:39 It is very likely the number of
03:41 instances of Synopsys Design Compiler is
03:45 going to skyrocket and the number of
03:48 number of agents that are going to be
03:49 using the floor planners and all of our
03:52 layout tools and our design design rule
03:54 checkers. The number of agents that are
03:58 today we're limited by the number of
03:59 engineers. Tomorrow those engineers are
04:01 going to be supported by a bunch of
04:02 agents. We're going to be exploring out
04:04 the design space like you've never seen
04:05 explore before and want to use the tools
04:07 that we use today. And so, so I think I
04:10 think tool use is going to cause cause
04:12 these software companies to skyrocket.
04:14 The reason why it hasn't happened yet is
04:16 because the agents aren't good enough at
04:18 using their tools yet. And so either
04:20 these companies are going to build the
04:21 agents themselves or agents are going to
04:24 get good enough to be able to use those
04:25 tools. And I think it's going to be a
04:27 combination of both. Um I think in your
04:30 latest filings it was you had almost
04:32 hundred billion dollars in purchase
04:33 commitments with people foundries
04:36 memory packaging and then uh semi
04:39 analysis has reported that you will have
04:42 $250 billion of these kinds of purchase
04:44 commitments and so one interpretation is
04:45 Nvidia's moat is really that you've
04:47 locked up many years of these scarce
04:50 components that are uh you know somebody
04:52 else might have an accelerator but can
04:54 they actually get the memory to build
04:55 it? Can they actually get the logic to
04:56 build it? And this is really Nvidia's
04:59 big moat for the next few years.
05:00 >> Well, it it's one it's one of the things
05:02 that we can do that is hard for someone
05:04 else to do. The reason why we could we
05:07 we've made enormous commitments
05:09 upstream. Um some of it is explicit.
05:12 These commitments that you mentioned,
05:14 some of it is implicit. Um, for example,
05:17 a lot of the investments that are
05:19 upstream are made by our our supply
05:22 chain because I said to the CEOs, "Let
05:25 me tell you how big this industry is
05:27 going to be and let me explain to you
05:28 why and let me reason through it with
05:30 you and let me show you what I see." And
05:33 so as a result of that that process of
05:36 of uh informing inspiring um aligning
05:41 with CEOs of all different industries
05:45 upstream they're willing to make the
05:47 investments. Now why are they willing to
05:49 make the investments for me and not
05:50 someone else and the reason for that is
05:51 because they know that I have the
05:54 capacity to buy it buy their supply and
05:58 sell it through my downstream. the fact
06:00 that Nvidia's downstream supply chain
06:03 and our downstream demand is so large,
06:07 they're willing to make the investment
06:09 upstream. And so if you look at GTC
06:13 um and and uh you know, people are
06:15 marveled by the scale of GTC and the
06:17 people that go, it's a 360° that the
06:20 entire universe of AI all in one place
06:24 and they they're all in one place
06:26 because they need to see each other. I
06:28 bring them together so that the the
06:29 downstream could see the upstream. The
06:31 upstream could see the downstream and
06:33 all of them could see all the advances
06:35 in AI and very importantly they can all
06:38 meet the AI natives and all the AI
06:39 startups that are all you know being
06:41 being built and all the amazing things
06:43 that are happening so that they could
06:45 see firsthand all the things that I tell
06:46 them. And so I spend a lot of my time
06:49 informing directly or indirectly um our
06:53 supply chain and our partners and our
06:55 ecosystem about the opportunity that's
06:57 that's in front of us. You know, most of
06:59 my keynotes, you know, some some people
07:02 always say, you know, Jensen
07:05 in most keynotes, it's like one
07:07 announcement after another announcement
07:08 after another announcement after another
07:10 announcement.
07:12 our keynotes are there's always a part
07:15 of it that's a little torturous in the
07:17 sense that it's almost comes across like
07:19 an ed like education and and in in fact
07:22 that's exactly on my mind. I need to
07:25 make sure that the entire supply chain
07:27 upstream and downstream the ecosystem
07:30 understands
07:32 what is coming at us, why it's coming,
07:35 when it's coming, how big is it going to
07:37 be, and be able to reason about it
07:39 systematically just like I reason about
07:41 it. and and so so I think the the the
07:45 the moat as you describe it we're
07:48 able to of course um build for a future
07:52 uh if our next next several years is a
07:55 trillion dollars in in scale we have the
07:57 supply chain to do it without our reach
08:02 the velocity of our business you know
08:05 just as there's cash flow there's supply
08:07 chain flow there turns uh nobody's going
08:10 to build a supply chain for an
08:12 architecture if the architecture's
08:14 business turns are low. And so our
08:17 ability to sustain the scale is only
08:20 because our downstream demand is so
08:22 great and they see it and they all hear
08:24 about it. They they see it all coming.
08:26 And so that's it allows us to do the
08:29 things that we're able to do at the
08:30 scale we're able to do.
08:32 >> I do want to understand more concretely
08:33 whether the upstream can keep up. Um for
08:37 many years now you guys have been 2xing
08:40 revenue year-over-year. You guys have
08:41 been more than tripling the amount of
08:43 flops you're providing to the world year
08:44 over year
08:44 >> and 2xing at the scale now is really
08:46 incredible.
08:47 >> Exactly.
08:47 >> So then you look at logic say you're the
08:51 biggest customer on TSMC's N3 node and
08:55 um you're one of the biggest on uh AI as
08:58 a whole this year is going to be 60% of
08:59 N3. It's going to be 86% next year
09:01 according to some analysis. How how do
09:03 you 2x if you're the majority? Um and
09:07 how do you do that year-over-year? So
09:09 are we are we in a regime now where the
09:11 growth rate in AI compute has to slow
09:13 because of upstream? Do you see a way to
09:15 get around these you know you how do we
09:18 build 2x more fabs year-over-year
09:20 ultimately?
09:21 >> Yeah, at some at some level um the the
09:26 instantaneous demand
09:28 uh is greater than the supply upstream
09:32 and downstream uh in the world. And and
09:37 it could be at any instant any instance
09:41 we could be limited by the number of
09:42 plumbers.
09:43 >> Mhm.
09:44 >> Which which actually happens.
09:46 >> The plumbers are invited to next year's
09:47 GTC.
09:48 >> Yeah. You know, by the way, great idea.
09:51 >> But that's a good condition. You you
09:53 want you want you want a market you want
09:56 an industry where the instantaneous
09:58 demand is greater than the total supply
10:01 of the industry. Um the opposite is
10:03 obviously less good. If we're too far
10:06 apart, uh if one particular item, one
10:09 particular component is too far too far
10:11 away, um obviously obviously the
10:14 industry swarms it. So for example,
10:17 notice people aren't talking very much
10:18 about CoWoS anymore.
10:20 >> Yeah.
10:20 >> And the reason for that is because for
10:22 two years we swarmed the living daylights
10:23 out of it and we double double double on
10:26 on several doubles and and now I think
10:28 we're in a fairly good shape. And TSMC
10:31 now knows that CoWoS supply has to keep
10:34 up with the rest of the logic demand and
10:36 the memory demand and and so so they're
10:38 scaling CoWoS um and they're scaling uh
10:42 you know future packaging technologies
10:44 at the same level as a scale logic which
10:46 is terrific because for a long time
10:48 CoWoS was rather specialty and um uh
10:52 HBM was rather specialty but they're not
10:55 specialties anymore people now realize
10:56 they're mainstream computing technology
10:59 Um and and then and of course uh we're
11:02 now much more able to influence a larger
11:07 scope of our supply chain. In the past
11:10 in the past um you know in the beginning
11:13 of the AI revolution all the things that
11:15 I say now I was saying 5 years ago and
11:18 some people believed in it and invested
11:20 in it. for example, uh, Sanjay and and
11:23 the Micron team. I still remember the
11:25 meeting really well where where I I was
11:28 clear about exactly what's going to
11:29 happen and why it's going to happen and
11:31 and the predictions the predictions that
11:33 that um of today and they they really
11:37 doubled down on it and we partnered with
11:39 them and uh across LPDDR across you know
11:42 HBM memories uh they really invested in
11:44 it and and it it it obviously has been
11:47 tremendous for the company. uh some some
11:50 people came a little bit later and uh
11:52 but they now they're all here and so I I
11:54 think the each one of these generation
11:56 each one of these bottlenecks
11:59 gets a great deal of attention um and
12:02 now we're we're prefetching the
12:04 bottlenecks uh years in advance. So for
12:06 example uh the the the investments that
12:09 we've done uh with uh with Lumentum and
12:12 Coherent and um all of the silicon
12:15 photonics ecosystem uh the last several
12:18 years we really reshaped the ecosystem
12:20 and the supply chain silicon photonics.
12:23 We we u built up an entire supply chain
12:25 around TSMC. We partnered with them on
12:28 COUPE uh invented a whole bunch of
12:30 technology. We licensed uh those patents
12:33 to the supply chain. Keep it nice and
12:34 open. Um, and so we're preparing the
12:37 supply chain through invention of new
12:39 technologies, new workflows, uh, new
12:42 test, new testing equipment,
12:43 double-sided probing, um, investing in
12:46 companies, helping them scale up their
12:48 capacity. Um, and so, so you could see
12:51 that we're trying to shape the ecosystem
12:53 so that it's ready, the supply chain so
12:55 that it's ready to support the scale. It
12:57 seems like some bottlenecks are easier
12:58 than others. And so scaling up CoWoS
13:01 versus scaling up
13:02 >> I went to the hardest one by the way
13:04 >> which is
13:05 >> plumbers.
13:07 >> Yeah,
13:07 >> it's true.
13:08 >> Yeah. Yeah. I actually went to the
13:09 hardest one. Yeah.
13:10 >> Yeah. Plumbers and electricians. And the
13:12 reason for that is because
13:13 >> because and this is one of the concerns
13:14 that I have about of all the doom the
13:16 doomers um describing the end of end of
13:20 work and killing of jobs. And you know,
13:23 one of the things that that that um if
13:26 we discourage people from being software
13:28 engineers, we're going to run out of
13:30 software engineers. And and uh the same
13:33 prediction 10 years ago, some of the
13:35 some of the doomers were were uh uh
13:38 saying that we're telling people
13:40 whatever you do, don't be a radiologist.
13:42 And you might hear some of those some of
13:44 those videos are still on the web. You
13:46 know, radiology is is going to be the
13:48 first career to go. Nobody's the world's
13:50 not going to need any more radiologists.
13:51 Guess what? But we're short of
13:52 radiologists.
13:54 >> Oh, but okay. So, going back to this
13:55 point about well some things you scale
13:58 other things like how do you actually
14:00 get how do you actually manufacture 2x
14:02 the amount of logic a year? Ultimately
14:03 that's bottlenecked by memory and logic
14:05 are bottlenecked by EUV. How do you get to
14:07 2x as many EUV machines a year?
14:09 >> Yeah.
14:10 >> Year over year.
14:10 >> None of that none of that's impossible
14:12 to scale quickly. You just need to you
14:15 you could do all of that is easy to do
14:17 within two or three years.
14:19 You just need a demand signal that it's
14:21 not it once you once you can build one
14:24 you can build 10 and once you can build
14:25 build 10 you can build a million and so
14:28 these things are not not hard to
14:29 replicate. How far down the supply chain
14:31 do you go where you do you go to ASML
14:34 and say hey if I look out three years
14:35 from now for me to for Nvidia to be
14:38 generating two trillion in a year in
14:40 revenue we need way more EUV machines
14:41 and
14:42 >> some of them I have to directly uh some
14:44 of them are indirectly and some of them
14:46 um if I can convince TSMC, ASML will
14:49 be convinced and so that's that you know
14:51 we have to think about the critical
14:53 critical pinch points and uh but if TSMC
14:56 is convinced uh you'll have plenty of EUV
15:00 machines in a few years. And so none of
15:03 that my point is that none of the
15:05 bottlenecks last longer than a couple 2
15:07 three years. None of them. And meanwhile
15:10 meanwhile we're uh improving computing
15:13 efficiency by 10x 20x in the case of
15:16 Hopper to Blackwell some 30 50x um we're
15:20 coming up with new algorithms because
15:22 CUDA is so flexible. Uh we're we're
15:25 developing all kinds of new techniques
15:26 so that we drive efficiency. uh in
15:29 addition to increasing capacity. Yeah.
15:31 And so so there those those are those
15:33 are things that that none of that worry
15:35 me.
15:36 >> It's the stuff that's downstream from
15:38 us. Um energy policies that prevent
15:41 energy from from you know you can't grow
15:44 you can't create you can't create an
15:46 industry without energy. You can't
15:47 create a whole new manufacturing
15:49 industry without energy. Uh we want to
15:51 re-industrialize the United States. We
15:53 want to bring back uh chip manufacturing
15:55 and computer manufacturing and packaging
15:57 and we want to build new things like EVs
15:59 and robots and we want to build AI
16:01 factories and you you can't build any of
16:03 these things without energy and those
16:06 things take a long time but more chip
16:09 capacity that's a two-, three-year problem
16:11 more CoWoS capacity a two-, three-year problem
16:13 >> interesting I I feel like I have guests
16:15 tell me the exact opposite thing
16:16 sometimes and I don't in this case I
16:18 just don't have the technical knowledge
16:19 to adjudicate but
16:20 >> well the beautiful thing is you're
16:21 talking to the expert Yeah, true, true.
16:25 Um, okay. I want to ask about um your
16:27 competitors.
16:28 >> Yeah.
16:28 >> So, if you look at TPU,
16:31 >> arguably two out of the top three models
16:33 in the world, Claude and Gemini, were
16:36 trained on TPU,
16:39 what does that mean for Nvidia going
16:40 forward?
16:41 >> Um, well, we have we have a very
16:43 different we built a very different
16:44 thing. Um, you know, what what Nvidia
16:48 built is accelerated computing.
16:51 not a tensor processing unit.
16:55 And uh accelerated computing is used for
16:57 all kinds of things. You know, molecular
16:58 dynamics and quantum chromodynamics and
17:02 it's used for data processing,
17:05 data frames, structured data,
17:07 unstructured data. It's used for um
17:11 fluid dynamics, particle physics, you
17:13 know, and in addition, we use it for AI.
17:17 And so accelerated computing is is um
17:20 much more diverse and and although AI is
17:23 the conversation today is obviously very
17:25 important and impactful uh computing is
17:29 much broader than that and what Nvidia
17:32 has done is reinvent the
17:34 way computing is done from general
17:35 purpose computing to accelerated
17:37 computing. Our market reach is
17:41 far greater than any TPU, any ASIC
17:46 can possibly have. And so if you look at
17:47 our position,
17:49 uh we're the only company that that
17:52 accelerates applications of all kinds.
17:54 We have a gigantic ecosystem and so all
17:57 kinds of frameworks and algorithms all
17:59 run on Nvidia. And because our computers
18:04 are designed to be operated by other
18:07 people, anyone who's an operator could
18:10 buy our systems.
18:13 Most of these homebuilt systems you have
18:16 to be your own operator because it was
18:18 never designed to be flexible enough for
18:20 other people to operate. And so as a
18:22 result of the fact that anybody can
18:24 operate our systems, we're in every
18:26 cloud including Google and Amazon and
18:29 you know Azure and OCI and right and so
18:32 whether you want to operate it to rent
18:35 or operate it if you want to operate to
18:37 rent you better have large ecosystem of
18:39 customers in many industries that be the
18:42 offtakers. if you're operating it if you
18:46 if you want to operate it for yourself
18:48 um we you know we obviously have the
18:50 ability to help you operate yourself
18:52 like for example for Elon with XAI and
18:55 uh because we could we could enable
18:57 operators uh in any any company in any
19:02 industry you could use it uh to build a
19:04 supercomput for uh scientific research
19:07 and drug discovery at Lily and so we can
19:11 help them operate their own
19:12 supercomputer and and use it for the
19:14 entire diversity of drug discovery and
19:17 biological sciences um that that we
19:19 accelerate
19:20 >> and so so there there just you know a
19:23 whole bunch of applications that we can
19:25 address that you can't do so with TPUs
19:28 because Nvidia's built CUDA as a
19:31 fantastic tensor processing unit as well
19:34 but it does you know it does every every
19:36 life cycle of data processing and
19:38 computing and AI and so on so forth and
19:41 so I our our market opportunity is just
19:43 a lot larger. Our reach is a lot greater
19:47 and because we have such a large um we
19:51 basically support every application in
19:53 the world now you could build Nvidia
19:55 systems anywhere and know that there
19:56 will be customers for it
19:58 >> and so it's a very different thing. Uh
[20:00] Dwarkesh: This is going to be sort of a long question. You have spectacular revenue, and you're not making $60 billion a quarter from pharma and quantum. You're making it because AI is an unprecedented technology growing unprecedentedly fast. So the question is what is best for AI specifically. I'm not in the details, but I talk to my AI researcher friends, and they say: when I use a TPU, it's this big systolic array that's perfect for doing matrix multiplies, whereas a GPU is very flexible. It's great when you have lots of branching, when you have irregular memory access. But isn't AI just these very predictable matrix multiplies, again and again and again, where you don't have to give up any die area for warp schedulers, for switches between threads and memory banks? So the TPU is really optimized for the bulk of this growth in revenue and compute use that's coming online right now. I wonder how you react to that.

[21:01] Jensen: Matrix multiplies are an important part of AI, but not the only part. If you want to come up with a new attention mechanism, if you want to disaggregate in a different way, if you want to come up with a whole new type of architecture altogether, a hybrid SSM for example, or a model that somehow fuses diffusion and autoregression, you want an architecture that's just generally programmable. And we run everything you can imagine. That's the advantage. It allows for the invention of new algorithms a lot more easily.
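Jensen's examples (hybrid SSMs, diffusion plus autoregression) involve recurrence, gating, and elementwise operations that don't reduce to one large matrix multiply. As a toy illustration of the kind of mixed workload he's pointing at, here is a minimal NumPy sketch of a gated linear-attention step; it is illustrative only, not any production architecture, and all shapes and names are made up:

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """Toy recurrent 'linear attention' step: matmul-style readouts mixed
    with elementwise gating and a sequential running state. Illustrative
    only; not any production architecture."""
    T, d = q.shape
    state = np.zeros((d, d))                         # running key-value summary
    out = np.empty_like(v)
    for t in range(T):                               # sequential recurrence, not one big GEMM
        state = g[t] * state + np.outer(k[t], v[t])  # gated state update
        out[t] = q[t] @ state                        # query reads the state
    return out

rng = np.random.default_rng(0)
T, d = 4, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
g = rng.uniform(0.8, 1.0, size=T)                    # per-step decay gates
y = gated_linear_attention(q, k, v, g)
print(y.shape)  # (4, 8)
```

The inner loop mixes a GEMM-like readout with a sequential, gated state update, which is exactly the sort of irregular pattern a fixed matmul pipeline doesn't natively express.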
[21:48] Jensen: Because it's a programmable system, and the ability to invent new algorithms is really what makes AI advance so quickly. TPUs, like anything else, are bounded by Moore's law, and we know that Moore's law is delivering about 25% per year. So the only way to really get 10x leaps, 100x leaps, is to fundamentally change the algorithm and how it's computed, every single year.
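As a quick sanity check on the arithmetic here: at roughly 25% per year from process scaling alone, a 10x gain takes about a decade and a 100x gain about two, which is why Jensen argues the leaps have to come from algorithm and system changes instead:

```python
import math

# Jensen's figure: ~25% per year from process scaling alone.
rate = 1.25
years_10x = math.log(10) / math.log(rate)    # ~10.3 years for a 10x
years_100x = math.log(100) / math.log(rate)  # ~20.6 years for a 100x
print(round(years_10x, 1), round(years_100x, 1))  # 10.3 20.6
```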
[22:22] Jensen: And that's Nvidia's fundamental advantage. The only reason we were able to make Blackwell 50 times Hopper: when I first announced it, I said Blackwell was going to be 35 times more energy efficient than Hopper. Nobody believed it. And then Dylan wrote an article saying that, in fact, I had sandbagged it; it's actually 50 times. You can't reasonably do that with just Moore's law. The way we solve that problem is new models, parallelized and disaggregated and distributed across a computing system, and without the ability to really get down and come up with new kernels with CUDA, that's really hard to do. So it's the combination of the programmability of our architecture and the fact that Nvidia is an extreme co-design company, where we can even offload some of the computation into the fabric itself, into NVLink for example, or into the network, Spectrum-X, and where we can effect change across the processors, the system, the fabric, the libraries, the algorithms. All of that was done simultaneously. Without CUDA to do that, I wouldn't even know where to start.
[23:53] Dwarkesh: My sponsor Crusoe was among the first clouds to offer Nvidia's Blackwell and Blackwell Ultra platforms, and they just announced their Nvidia Vera Rubin deployment scheduled for later this year. But access to state-of-the-art hardware is only part of the story. For example, most inference engines already do KV caching for a single user's forward passes, but Crusoe does it across users and GPUs. So if a thousand agents are running on the same system prompt, Crusoe only has to compute the KV cache once for it to become available to every single GPU in the cluster. This is especially important as systems get more agentic and require much longer prefixes in order to use tools and access files. In a recent benchmark, Crusoe was able to deliver up to 10 times faster time to first token and up to 5 times better throughput than vLLM. This is just one among many reasons to run your inference workload with Crusoe. And if you need GPUs for training, you don't need to switch clouds; Crusoe's got you covered there, too. Go to crusoe.ai/dwarkesh to learn more.
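The mechanism described, computing the KV cache for a shared system prompt once and reusing it across requests, can be sketched as prefix-keyed caching. This is a toy illustration of the idea only, not Crusoe's implementation; the class and method names are hypothetical:

```python
import hashlib

class PrefixKVCache:
    """Toy cross-request KV cache keyed by a shared prompt prefix.
    Real engines do this at the KV-block level on GPU; this just
    shows the bookkeeping. All names are hypothetical."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prefix: str) -> str:
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get_or_compute(self, prefix: str, compute_kv):
        k = self._key(prefix)
        if k in self._cache:
            self.hits += 1                       # prefill skipped entirely
        else:
            self.misses += 1
            self._cache[k] = compute_kv(prefix)  # expensive prefill, done once
        return self._cache[k]

cache = PrefixKVCache()
system_prompt = "You are a helpful agent with tool access."
fake_prefill = lambda p: [len(p)]                # stand-in for real KV tensors
for _ in range(1000):                            # a thousand agents, one shared prompt
    cache.get_or_compute(system_prompt, fake_prefill)
print(cache.misses, cache.hits)  # 1 999
```

In a distributed setting the cached entry would be shared across GPUs rather than recomputed per device, which is the behavior the ad describes.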
[24:47] Dwarkesh: So this gets at an interesting question about Nvidia's clientele, where 60% of your revenue is coming from these big five hyperscalers. In a different era, with different customers, say professors running experiments, they're helped a bunch by CUDA and they need it: they can't use another accelerator, they need to just run PyTorch with CUDA and have everything optimized. But these hyperscalers have the resources to write their own kernels. In fact, they have to, to get that last 5% for their specific architecture. Anthropic and Google are mostly running their own accelerators, TPUs and Trainium, and even OpenAI, using GPUs, has Triton, because they need their own kernels. Instead of using cuBLAS and NCCL and everything down to CUDA C++, they've got their own stack, which compiles to other accelerators as well. So if most of your customers can, and do, make replacements for CUDA, to what extent is CUDA really the thing that's going to make frontier AI happen on Nvidia?
[25:59] Jensen: CUDA is a rich ecosystem, so if you want to build on any computer, building on CUDA first is incredibly smart, because the ecosystem is so rich. We support every framework. If you want to create custom kernels: we contribute enormously to Triton, for example, so the back end of Triton contains huge amounts of NVIDIA technology. We're delighted to help every framework become as great as it can be, and there are lots and lots of frameworks. There's Triton, there's vLLM, there's SGLang, and more. Now there's a whole bunch of new reinforcement learning frameworks coming out, veRL, NeMo RL, a whole bunch, and with post-training and reinforcement learning that entire area is just exploding. So if you want to build on an architecture, building on CUDA makes the most sense, because you know the ecosystem is great. You know that if something happens, it's more likely in your code and not in the mountain of code underneath you. Don't forget the amount of code you're dealing with when you're building these systems. When something doesn't work, was it you or was it the computer? You would like it always to be you, and to be able to trust the computer. Obviously we still have lots and lots of bugs ourselves, but our system is so well wrung out that you can at least build on top of the foundation. So that's number one: the richness of the ecosystem, the programmability of it, the capability of it.

The second thing is, if you're a developer building anything at all, the single most important thing you want is install base. You want the software you write to run on a whole bunch of other computers. You're not building software just for yourself; you're building it for your fleet, or for everybody else's fleet, because you're a framework builder. And Nvidia's CUDA ecosystem is ultimately its great treasure. We are now, I don't know how many, several hundred million GPUs. Every cloud has them, going back to the A10, A100, H100, H200, the L series, the P series. There's a whole bunch of them, in all kinds of sizes and shapes. And if you're a robotics company, you want that CUDA stack to actually run in the robot itself. We're literally everywhere. So the install base means that once you develop the software, once you develop the model, it's going to be useful everywhere. That install base is just incredibly valuable.

And then lastly, the fact that we're in every single cloud makes us genuinely unique, because if you're an AI company, an AI developer, you're not exactly sure which CSP you're going to partner with and where you'd like to run. We run everywhere, including on-prem for you if you like. So the richness of the ecosystem, the expansiveness of the install base, and the versatility of where we are: that combination makes CUDA invaluable.
[29:16] Dwarkesh: That makes a lot of sense. I guess the thing I'm curious about is whether those advantages matter a lot to your main customers, the kind of customer who can actually build their own software stack and who makes up most of your revenue. Especially if we get to a world where AI is getting especially good at things with tight verification loops that you can RL on. "Write the most efficient attention or MLP kernel across a scale-up domain" is a very verifiable feedback loop, so all the hyperscalers can write these custom kernels for themselves. Nvidia still has great price-performance, so they might still prefer Nvidia. But then does it just become a question of who offers the best specs, the best flops and memory and memory bandwidth for a given dollar? Historically Nvidia has had, and still has, the best margins in all of AI across hardware and software, 70% plus, because of this CUDA moat. Can you sustain those margins if most of your customers can actually afford to build around the CUDA moat?

[30:33] Jensen: The number of engineers we have assigned to these AI labs is insane.
We work with them, optimizing their stack. And the reason is that nobody knows our architecture better than we do, and these architectures are not as general-purpose as a CPU. A CPU is kind of like a Cadillac: it's a nice cruiser, it never goes too fast, everybody drives it pretty well, it's got cruise control, everything is easy. But in a lot of ways Nvidia's GPUs, our accelerators, are kind of like F1 racers. I can imagine everybody being able to drive one at 100 miles an hour, but it takes quite a bit of expertise to push it to the limit. We use a ton of AI to create the kernels that we have, and I'm pretty sure we're going to still be needed for quite some time. Our expertise helps our AI lab partners get another 2x out of their stack, easily. It's not unusual that by the time we're done optimizing their stack, or optimizing a particular kernel, their model has sped up by 50%, 2x, 3x. That's a huge number, especially when you're talking about the installed fleet of all the Hoppers and Blackwells they have. When you increase it by a factor of two, that doubles the revenues. It directly translates to revenues. Nvidia's computing stack is the best performance per TCO in the world, bar none.
Nobody can demonstrate to me that any single platform in the world today has a better performance-per-TCO ratio. Not one company. And in fact the benchmarks are out there: Dylan's InferenceMAX is sitting out there for everybody to use, and not one TPU will show up. Trainium won't show up. I encourage them to use InferenceMAX and demonstrate their incredible inference cost. It's really, really hard. Nobody wants to show up. MLPerf: I would welcome Trainium to demonstrate that 40% advantage they claim all the time. I would love to hear them demonstrate the cost advantage of TPUs. It makes no sense in my mind. It makes absolutely zero sense on first principles. So I think the reason we're so successful is simply that our TCO is so great.

Second, you say 60% of our customers are the top five, but most of that business is external. For example, most of Nvidia in AWS is for external customers, not internal use. Most of our customers at Azure are external, and all of our customers at OCI are external, not internal use. The reason they favor us is that our reach is so great: we can bring them all the great customers in the world, and they're all built on Nvidia. And the reason all these companies are built on Nvidia is that our reach and our versatility are so great. So I think the flywheel is really the install base, the programmability of our architecture, the richness of our ecosystem, and the fact that there are so many AI companies in the world. There are tens of thousands of them now.
And if you were one of those AI startups, what architecture would you choose? You'd choose the architecture that's most abundant in the world, the one with the largest installed base, and one that has a rich ecosystem. So that's the flywheel, and the reason is the combination of these: one, our perf per dollar is so great that they get the lowest-cost tokens. Second, our perf per watt is the highest in the world, and if one of our partners builds a 1 gigawatt data center, that 1 gigawatt data center had better deliver the maximum revenue. The number of tokens directly translates to revenue: you want to generate as many tokens as possible to maximize the revenue of that data center, and we have the highest tokens-per-watt architecture in the world. And lastly, if your goal is to rent the infrastructure, we have the most customers in the world. That's why the flywheel works.
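Jensen's tokens-per-watt point can be made concrete with a back-of-envelope calculation. Every number below is a placeholder, not an Nvidia or market figure; the point is only that revenue from a fixed power budget scales linearly with tokens per joule:

```python
# Back-of-envelope: every number here is a made-up placeholder.
power_watts = 1e9                 # a 1 gigawatt data center
tokens_per_joule = 1.0            # hypothetical fleet-level efficiency
price_per_million_tokens = 1.00   # hypothetical $ per million tokens
seconds_per_year = 365 * 24 * 3600

tokens_per_year = power_watts * tokens_per_joule * seconds_per_year
revenue = tokens_per_year / 1e6 * price_per_million_tokens
print(f"${revenue:,.0f} per year")

# Revenue is linear in tokens_per_joule: on a fixed power budget,
# doubling tokens per watt doubles the revenue of the data center.
```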
[35:25] Dwarkesh: Interesting. I guess the question comes down to the actual market structure here. There could have been a world with tens of thousands of AI companies holding roughly equal shares of compute. But even through these five hyperscalers, the people actually using the compute on Amazon are Anthropic and OpenAI, the big foundation labs, who can themselves afford, and have the ability, to make different accelerators work.

[35:54] Jensen: No, I think your premise is wrong.

[35:58] Dwarkesh: Maybe. Let me ask a slightly different question.

[36:01] Jensen: Come back and make me correct your premise.

[36:04] Dwarkesh: Okay, let me just ask a different question.

[36:08] Jensen: But still make sure you make me come back and fix it, because that premise is just too important to AI, too important to the future of science, too important to the future of the industry.

[36:20] Dwarkesh: Let me just finish the question and then we can address it together. If all these things about price performance and performance per watt are true, why do you think it is that, say, Anthropic just announced a couple of days ago a multi-gigawatt deal with Broadcom and with Google for TPUs, and the majority of their compute, obviously, for Google, is TPU? If I look at these big AI companies, there was some point where it was all Nvidia, and now it's not. So I'm curious how to square that: if these things are true on paper, why are they going with other accelerators?
[37:03] Jensen: Anthropic is a unique instance, not a trend. Without Anthropic, why would there be any TPU growth at all? It's 100% Anthropic. Without Anthropic, why would there be any Trainium growth at all? It's 100% Anthropic. And I think that's fairly well known and well understood. It's not that there's an abundance of ASIC opportunities. There's only one Anthropic.

[37:31] Dwarkesh: But OpenAI deals with AMD. They're building their own Titan accelerator.

[37:35] Jensen: Yeah, but we can all acknowledge they're vastly Nvidia, and we're going to still do a lot of work together. And I'm not offended by other people using something else and trying things. If they don't try these other things, how would they know how good ours is? Sometimes you have to be reminded of it, and we have to continuously earn the position that we're in. There are always claims, and look at the number of ASICs that have been cancelled. Just because you're going to build an ASIC, you still have to build something better than Nvidia. And it's not that easy building something better than Nvidia. It's not sensible, actually; Nvidia would have to be missing something, seriously, because of our scale and our velocity. We're the only company in the world that's cranking it out every single year. Big leaps every single year.
[38:32] Dwarkesh: I guess their logic is that it doesn't need to be better. It just needs to be not more than 70% worse, because they're paying you 70% margins.

[38:39] Jensen: No, no, no. Don't forget, even an ASIC vendor's margin is really quite high. Nvidia's margin is 70%, let's say, but an ASIC margin is 65%. What are you really saving?

[38:51] Dwarkesh: Oh, you mean from Broadcom or something?

[38:52] Jensen: Yeah, sure. You've got to pay somebody. So I think the ASIC margins are incredibly good from what I can tell, and they believe it too. They're quite proud of their incredible ASIC margins. And so you ask the question why.
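The margin figures in this exchange can be translated into buyer-side prices. Using the gross margins quoted here (Nvidia roughly 70%, ASIC vendor roughly 65%) and a placeholder $1 of underlying cost, a gross margin m implies a price of cost/(1-m), so switching saves only about 14% at equal hardware cost:

```python
# Gross-margin arithmetic behind the exchange. The margins are the ones
# quoted in the conversation; the $1 BOM cost is a placeholder.
def price(bom_cost, gross_margin):
    # gross margin = (price - cost) / price  =>  price = cost / (1 - margin)
    return bom_cost / (1 - gross_margin)

bom = 1.0
nvidia_price = price(bom, 0.70)  # ~3.33
asic_price = price(bom, 0.65)    # ~2.86
savings = 1 - asic_price / nvidia_price
print(f"{savings:.0%}")  # 14%
```

So even granting the ASIC equal performance per dollar of silicon, the margin gap alone buys only a modest discount, which is Jensen's "what are you really saving" point.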
[39:12] Jensen: A long time ago, we just didn't have the ability to do it. At the time, I didn't deeply internalize how difficult it would be to build a foundation AI lab like OpenAI or Anthropic, and the fact that they needed huge investments from the suppliers themselves. We just weren't in a position to make the multi-billion-dollar investment into Anthropic so that they could use our compute, but Google and AWS were, and they put in huge investments in the beginning so that Anthropic, in return, would use their compute. We just weren't in a position to do so at the time. And my mistake is that I didn't deeply internalize that they really had no other options, that a VC would never put $5 or $10 billion of investment into an AI lab with the hopes of it turning out to be Anthropic. That was my miss. But even if I had understood it, I don't think we would have been in a position to do it at the time. I'm not going to make that same mistake again. I'm delighted to invest in OpenAI, and I'm delighted to help them scale; I believe it's essential to do so. And when Anthropic came to us, I was delighted to be an investor, delighted to help them scale. We just weren't able to do so at the time.
[40:57] Jensen: If I could rewind everything, if Nvidia could have been as big back then as we are now, I would have been more than happy to do it.

[41:05] Dwarkesh: This is actually quite interesting. For many years, Nvidia has been the company in AI making money, lots of money, and now you're investing it. It's been reported that you've done up to $30 billion in OpenAI and $10 billion in Anthropic. But now their valuations have increased, and I'm sure they'll continue to increase. Over all these years, you were giving them the compute, you saw where AI was headed, and they were worth one-tenth of what they are now a couple of years ago, or even a year ago in some cases, and you had all this cash. There's a world where either Nvidia itself becomes a foundation lab and makes the huge investment to make that possible, or makes the deals you've made now, at current valuations, much earlier on. You had the cash to do it. So I'm curious why you didn't do it earlier.
[42:02] Jensen: We did it as soon as we could, and if I could have, I would have done it even earlier. At the time that Anthropic needed us to do it, we just weren't in a position to do it. It wasn't in our sensibility to do so.

[42:21] Dwarkesh: Was that a cash thing, or just...

[42:23] Jensen: Yeah, the level of investment. We never invested outside the company at the time, and not that much. And we didn't realize we needed to. I always thought they could just go raise from VCs, for God's sake, like all companies do. But what they were trying to do couldn't have been done through VCs. What OpenAI wanted to do couldn't have been done through VCs. And I recognize that now; I didn't know it then. But that's their genius. That's why they're smart. They realized then that they had to do something like that, and I'm delighted that they did. And even though we caused Anthropic to have to go to somebody else, I'm still happy that it happened. Anthropic's existence is great for the world. I'm delighted for it.
[43:21] Dwarkesh: I guess you still are making a ton of money, and way more money quarter after quarter.

[43:25] Jensen: It's still okay to have regrets.

[43:26] Dwarkesh: So then the question still arises: now that we're here, and you have all this money that you keep making, what should Nvidia be doing with it? One answer says: look, there's this whole middleman ecosystem that has popped up for converting capex into opex for these labs, so that they can rent compute. The chips are really expensive, and they make a lot of money over their lifetime, because the models are getting better and the value their tokens generate is increasing, but they're expensive to set up, and Nvidia has the money to do the capex. And in fact you are doing some of this: it's been reported you're backstopping CoreWeave for up to $6.3 billion and have invested $2 billion. So why doesn't Nvidia become a cloud itself? Why not become a hyperscaler and run this compute yourself? You have all the cash to do it.
[44:15] Jensen: This is a philosophy of the company, and I think it's wise: we should do as much as needed, as little as possible. What that means is that the work we do building our computing platform, if we don't do it, I genuinely believe it doesn't get done. If we didn't take the risks we take, if we didn't build NVLink the way we built it, if we didn't build the whole stack, if we didn't create the ecosystem the way we did, if we didn't dedicate ourselves to 20 years of CUDA while losing money most of that time, nobody else would have done it. If we didn't create all the domain-specific CUDA-X libraries: a decade and a half ago, we pushed into domain-specific libraries because we realized that if we didn't create them, whether for ray tracing or image generation or even the early works of AI, for data processing, structured data processing, or vector data processing, nobody would. And I am completely certain of that. We created a library for computational lithography called cuLitho. If we didn't create it, nobody would have. So accelerated computing wouldn't have advanced the way it has if we didn't do what we did, and we should do that. We should dedicate our company, all of our might, wholeheartedly, to doing that. However, the world has lots of clouds. If I didn't do it, somebody would show up. So the philosophy of doing as much as needed, but as little as possible, exists in our company today, and everything I do, I do with that lens.
In the case of clouds, if we didn't support CoreWeave to exist, these neoclouds, these AI clouds, wouldn't exist. If we didn't help CoreWeave exist, they would not exist. If we didn't support Nscale, they wouldn't be where they are today. If we didn't support Nebius, they wouldn't be where they are today. Now they're doing fantastically. Is that a business model for us? No: we should do as much as needed, as little as possible. We invest in our ecosystem because I want our ecosystem to thrive, and I want the architecture, and I want AI, to be able to connect with as many industries as possible, as many countries as possible, and to make it possible for the planet to be built on AI, and to be built on the American tech stack. That vision is exactly what we're pursuing.

Now, one of the things you mentioned: there are so many great, amazing foundation model companies, and we try to invest in all of them. This is another thing we do. We don't pick winners. We need to support everyone, and it's part of our joy in doing so. It's an imperative to our business, but we also go out of our way not to pick winners. So when I invest in one of them, I invest in all of them.

[47:27] Dwarkesh: Why do you go out of your way not to pick winners?
>> Because it's not our job to, number one. Number two, when Nvidia first started, there were 60 graphics companies, 60 3D graphics companies. We are the only one that survived. If you had taken those 60 graphics companies and asked yourself which one was going to make it, Nvidia would have been at the top of the list not to make it. This is long before you, but Nvidia's graphics architecture was precisely wrong. Not a little bit wrong; we created an architecture that was precisely wrong, and it was an impossible thing for developers to support. It was never going to make it. We reasoned about it from good first principles, but we ended up at the wrong solution. Everybody would have counted us out, and here we are. And so I have enough humility to recognize that: don't pick winners.
>> Yeah.
>> Either let them all take care of themselves, or take care of all of them.
>> One thing I didn't understand: you said, "Look, we're not prioritizing these neoclouds just because they are new clouds and we want to prop them up." But you also listed a bunch of neoclouds and said they wouldn't exist if it wasn't for Nvidia.
>> Yeah.
>> So how are those two things compatible?
>> First of all, they need to want to exist, and they come to ask us for help. When they want to exist, and they have a business plan, and they have expertise, and they have the passion for it, they obviously have to have some capabilities themselves. But if, at the end of the day, they need some investment to get off the ground, we will be there for them. The sooner they get their flywheel going, the better. Your question was: do we want to be in the financing business? The answer is no.
>> Yeah.
>> We don't want to be, because there are people in the financing business, and we would rather work with all of the people who are in the financing business than be a financier ourselves. So our goal is to focus on what we do, keep our business model as simple as possible, and support our ecosystem. When someone like OpenAI needs an investment at the $30 billion scale, because it's still before their IPO, and we deeply believe in them; I deeply believe that, well, they're an extraordinary company already today, and they're going to be an incredible company. The world needs them to exist. The world wants them to exist. I want them to exist, and they have the wind at their back. Let's support them and let them scale. So those investments we will do, because they need us to do it. But we're not trying to do as much as possible. We're trying to do as little as possible.
[51:13] >> This may be sort of an obvious question, but we've lived many years in this situation where there's a shortage of GPUs, and it's grown now because models are getting better.
>> We have a shortage of GPUs.
>> Yes. And Nvidia is known for dividing up the scarce allocation not just based on highest bidder, but rather on, "Hey, we want to make sure that these neoclouds exist. Let's give some to CoreWeave, let's give some to Crusoe, let's give some to Lambda." Why is that good for Nvidia? First of all, would you agree with this characterization of fracturing the market?
>> No. Your premise is just wrong.
>> Yeah.
>> We're sufficiently mindful about these things. Very mindful. First of all, if you don't place a PO, all the talking in the world won't make a difference. Until we get a PO, what are we going to do? So the first thing is, we work really hard with everybody to get a forecast done, because these things take a long time to build, and the data centers take a long time to build, and so we align demand and supply through forecasting. That's job number one. Number two, we've tried to forecast with as many people as possible, but in the final analysis you still have to place an order, and if, for whatever reason, you didn't place your order, what can I do? So at some point it's first in, first out. Beyond that, if you're not ready, because your data center is not ready or certain components aren't ready for you to stand up a data center, we might decide to serve another customer first. That's just maximizing the throughput of our own factory. So we might do some adjustments there. Aside from that, the prioritization is first in, first out.
You've got to place a PO. If you don't place a PO... Now, of course, there are stories about that. For example, all of this kind of started from an article about Larry and Elon having dinner with me, where they supposedly begged for GPUs. That never happened. We absolutely had dinner, and it was a wonderful dinner. At no time did they beg for GPUs. They just had to place an order, and once they place an order, we do our best to get the capacity to them. We're not complicated.
>> Okay. So it sounds like there's a queue, and then based on whether your data center is ready and when you place a purchase order, you get them at a certain time. But it still doesn't sound like the highest bidder just gets it. Is there a reason you don't do that?
>> We never do that.
>> Okay.
>> We never do.
>> Why not just do highest bidder?
>> Because it's a bad business practice. You set your price, and then people decide to buy it or not. I understand that others in the chip industry change their prices when demand is higher, but we just don't. That's just never been a practice of ours. You can count on us. I prefer to be dependable, to be the foundation of the industry. You don't need to second-guess: if I quoted you a price, that's it. And if demand goes through the roof, so be it.
>> And on the other end, that's why you have a productive relationship with TSMC, right?
>> Yeah. Nvidia has been doing business with them for, I guess, coming up on 30 years, and Nvidia and TSMC don't have a legal contract. There is always some rough justice: sometimes I'm right, sometimes I'm wrong; sometimes I got a better deal, sometimes a worse deal. But on the whole, the relationship is incredible. I can completely trust them, and I completely depend on them. And one of the things you can count on with Nvidia is that this year Vera Rubin is going to be incredible. Next year Vera Rubin Ultra will come. The year after that, Feynman will come, and the year after that, I haven't introduced the name yet. Every single year, you can count on us.

You're going to have to go find another ASIC team in the world. Pick your ASIC team where you can say: I can bet my entire business that you will be here for me every single year, and your token cost will decrease by an order of magnitude every single year; I can count on it like I can count on the clock. Well, I just said something about TSMC. No other foundry in history could you possibly say that about. You can say that about Nvidia today. You can count on us every single year. If you would like to buy a billion dollars' worth of AI factory compute, no problem. If you'd like to buy $100 million, no problem. $10 million, or just one rack, not a problem. Or just one graphics card, no problem. And if you would like to place an order for a hundred-billion-dollar AI factory, no problem. We're the only company in the world you can say that about today. I can say that about TSMC as well: I want to buy $1 billion worth, no problem; we just have to go through the process of planning for it, and all the things that mature people do. And so this ability for Nvidia to be the foundation of the world's AI industry is a position that has taken us a couple of decades to arrive at. Enormous commitment, enormous dedication. The stability of our company, the consistency of our company, is really, really important.
[57:37] >> Okay, I want to ask about China.
>> Yep.
>> I actually don't know what I think about whether it's good to sell chips to China or not, but I like to play devil's advocate against my guest. So when Dario was on, who supports export controls, I asked him: why can't America and China both have a country of geniuses in a data center? Since you're on the opposite side, I'll ask you the opposite way. One way to think about it: Anthropic actually announced a couple of days ago this model, Mythos, which they're not even releasing publicly, because they say it has such cyber-offensive capabilities that they don't think the world is ready until they make sure these zero-days are patched up. But they say it found thousands of high-severity vulnerabilities across every major operating system, every browser. It even found one in OpenBSD, an operating system that has been specifically designed not to have zero-days for the 27 years it has existed. And so if Chinese companies, Chinese labs, and the Chinese government had access to the AI chips to train a model like Claude Mythos, with these cyber-offensive capabilities, and run millions of instances of it with more compute, the question is: is that a threat to American companies, to American national security?
>> First of all, Mythos was trained on fairly mundane capacity, and a fairly mundane amount of it, by an extraordinary company. The amount of capacity and the type of compute it was trained on is abundantly available in China. So you first have to realize that chips exist in China. They manufacture 60% of the world's mainstream chips, maybe more. It's a very large industry for them. They have some of the world's greatest computer scientists. As you know, most of the AI researchers in all of these AI labs are Chinese. They have 50% of the world's AI researchers.
So the question is: if you're concerned about them, what is the best way forward, considering all the assets they already have? They have an abundance of energy. They have plenty of chips. They have most of the AI researchers. If you're worried about them, what is the best way to create a safe world? Well, victimizing them, turning them into an enemy, likely isn't the best answer. They are an adversary, and we want the United States to win. But I think having a dialogue, having a research dialogue, is probably the safest thing to do. This is an area that is glaringly missing because of our current attitude about China as an adversary. It is essential that our AI researchers and their AI researchers are actually talking. It is essential that we try to agree on what not to use the AI for.
With respect to finding bugs in software: of course, that's what AI is supposed to do. Is it going to find bugs in a lot of software? Of course. There are lots and lots of bugs; there are lots of bugs in AI software itself. That's what AI is supposed to do, and I'm delighted that AI has reached a level where it can help us be so much more productive. One of the things that is underemphasized is the richness of the ecosystem around cybersecurity and AI: AI security, AI privacy, AI safety. That whole ecosystem of AI startups is trying to create a future where you have one incredible AI agent surrounded by thousands of AI agents keeping it safe, keeping it secure. That future surely is going to happen. The idea that you're going to have an AI agent running around with nobody watching after it is kind of insane. So we know very well that this ecosystem needs to thrive. It turns out this ecosystem needs open source. It needs open models. It needs open stacks, so that all of these AI researchers and all these great computer scientists can go build AI systems that are just as formidable and can keep AI safe. One of the things we need to make sure we do is keep the open-source ecosystem vibrant. That can't be ignored, and a lot of it is coming out of China. We have to not suffocate that.

With respect to China: of course we want the United States to have as much computing as possible. We're limited by energy, but we've got a lot of people working on that, and we have to not make energy a bottleneck for our country.
But what we also want is to make sure that all the AI developers in the world are developing on the American tech stack, making the contributions, the advancements of AI, especially when it's open source, available to the American ecosystem. It would be extremely foolish to create two ecosystems: an open-source ecosystem that runs only on a foreign tech stack, and a closed ecosystem that runs on the American tech stack. I think that would be a horrible outcome for the United States.
>> Since there are a lot of things there, let me just triage the response. I think the concern, going back to the flop difference and the hacking, is: yes, they have compute, but there are estimates that because they're at 7 nanometer, because they don't have EUV due to chip-making export controls, they have about one-tenth the flops the US has. So could they eventually train a model like Mythos? Yes. But because we have more flops, American labs are able to get to these capability levels first. And because Anthropic got to it first, they can say: okay, we're going to hold on to it for a month, give American companies access so they patch up all their vulnerabilities, and then release it further. Even if China trains a model like this, there's the ability to deploy it at scale: a cyber hacker is much more dangerous if you have a million of them versus a thousand of them, so that inference compute really matters a lot. And in fact, the fact that their researchers are so good is the thing that makes it so scary, because what makes engineers and researchers more productive is compute. If you talk to any lab in America, they say the thing bottlenecking them is compute, and there are quotes from the DeepSeek founder, or Qwen leadership, or whoever, saying the thing they're bottlenecked on is compute. So then the question is: isn't it better that American companies, because they have more compute, get to Claude Mythos-level capabilities first and prepare our society for it, before China can get to it with less compute?
>> We should always be first, and we should always have more.
But in order for the outcome you described to be true, you have to take it to the extreme: they would have to have no compute. And if they have some compute, the question is how much is needed. The amount of compute they have in China is enormous. I mean, you're talking about the second-largest computing market in the world. If they want to aggregate their compute for deployment, they have plenty of compute to aggregate.
>> But is that true? People do these estimates, and they say, well, SMIC is actually behind on process nodes, so...
>> I'm about to tell you.
>> Okay.
>> The amount of energy they have is incredible, isn't that right? AI is a parallel computing problem, isn't it? Why can't they just put four or ten times as many chips together? Because energy is free. They have so much energy. They have data centers that are sitting completely empty, fully powered. They have ghost cities. They have ghost data centers. They have so much infrastructure capacity.
If they wanted to, they could just gang up more chips, even if they're 7 nanometer. And their capacity for building chips is one of the largest in the world. The semiconductor industry knows that they monopolize mainstream chips. They have overcapacity; too much capacity. So the idea that China won't be able to have AI chips is complete nonsense. Now, of course, if you ask me, would the United States be further ahead if the entire world had no compute at all? Sure, but that's just not an outcome; that's not a scenario that's true. They have plenty of compute already. Whatever threshold they would need for the concern you're worried about, they've already reached it and gone beyond. And so I think you misunderstand: AI is a five-layer cake, and the lowest layer is energy. When you have an abundance of energy, it makes up for chips. When you have an abundance of chips, it makes up for energy. For example, the United States is scarce on energy, which is the reason Nvidia has to keep advancing our architecture and doing this extreme co-design: with the few chips that we ship, because the amount of energy is so limited, our throughput per watt is off the charts. But if your watts are completely abundant, if energy is free, what do you care about performance per watt? You have plenty.
So 7 nanometer chips are essentially Hopper-class. And as for Hopper, I've got to tell you, today's models are largely trained on the Hopper generation. So Hopper-class 7 nanometer chips are plenty good. The abundance of energy is their advantage.
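Jensen's energy-for-chips trade reduces to simple arithmetic: aggregate throughput is chips × perf-per-watt × watts-per-chip, so a part with half the efficiency matches a leading-edge fleet by doubling the chip count, if the extra watts are essentially free. The numbers below are hypothetical, purely for illustration:

```python
def fleet_throughput(n_chips: int, perf_per_watt: float, watts_per_chip: float) -> float:
    """Aggregate FLOP/s of a homogeneous fleet: chips * (FLOP/s per watt) * watts."""
    return n_chips * perf_per_watt * watts_per_chip

# Hypothetical: a trailing-node part at half the perf/watt of a leading-edge part.
leading = fleet_throughput(100_000, perf_per_watt=4.0, watts_per_chip=1_000)
trailing = fleet_throughput(200_000, perf_per_watt=2.0, watts_per_chip=1_000)

# Twice the chips (and twice the energy bill) closes the throughput gap entirely,
# which is exactly why abundant energy neutralizes a perf/watt disadvantage.
assert trailing == leading
```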
>> But then there's the question of, okay, can they actually manufacture enough chips, given their...
>> But they do. What's the evidence? Huawei just had the largest single year in the history of their company.
>> How many chips did they ship?
>> A ton. Millions. Millions is way more than Anthropic has.
>> So there's a question of how much logic SMIC can ship, and there's a question of how much memory.
>> I'm telling you what it is: they have plenty of logic, and they have plenty of HBM2 memory.
>> Right. But as you know, the bottleneck in training and in doing inference on these models is often memory bandwidth. I don't know the numbers offhand, but HBM2 versus the newest thing you have can be almost an order of magnitude difference in memory bandwidth.
>> Huawei is a networking company.
>> But that doesn't change the fact that you need EUV for the most advanced HBM.
>> Not true. Not at all true. You could gang them together, just like we gang them together with NVLink 72. They've already demonstrated silicon photonics connecting all of this compute into one giant supercomputer. Your premise is just wrong. The fact of the matter is, their AI development is going just fine. And the best AI researchers in the world, because they are limited in compute, also come up with extremely smart algorithms. Remember what I just said: Moore's law is advancing about 25% per year. However, through great computer science, we could still improve algorithm performance by 10x. What I'm saying is that great computer science is where the lever is. There is no question about their capacity for invention. There's no question all the incredible attention mechanisms reduce the amount of compute.
We have to acknowledge that most of the advances in AI came out of algorithmic advances, not just raw hardware. Now, if most advances came from algorithms and computer science and programming, tell me that their army of AI researchers is not their fundamental advantage. And we see it: DeepSeek is not an inconsequential advance. And the day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation.
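The 25%-per-year versus 10x framing above can be made concrete with a quick calculation. The rates are Jensen's round numbers from the interview; the arithmetic is ours:

```python
import math

# Hardware (Moore's-law-style) progress compounds at ~25% per year, while a
# single algorithmic breakthrough can be a one-off ~10x gain. How many years
# of hardware compounding does one 10x algorithm win equal? Solve
# 1.25 ** years = 10 for years.
hw_growth = 1.25
years_to_match_10x = math.log(10) / math.log(hw_growth)

# Roughly a decade of hardware scaling per algorithmic win, which is the
# sense in which "great computer science is where the lever is."
assert 10 < years_to_match_10x < 11
```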
>> Why is that? Because currently you can have a model like DeepSeek that can run on any accelerator if it's open source. Why would that stop being the case in the future?
>> Well, suppose it doesn't. Suppose it's optimized for Huawei. Suppose it's optimized for their architecture. It would put us at a disadvantage. You described a situation that I perceived to be good news: a company developed an AI model, and it runs best on the American tech stack. I saw that as good news. You set it up as a premise that it was bad news. I'll give you the bad news: AI models around the world get developed, and they run best on non-American hardware. That is bad news for us.
>> I guess I just don't see the evidence that there are these huge disparities that would prevent you from switching accelerators. American labs are running their models across all the clouds, across all...
>> The evidence: you take a model that's optimized for Nvidia and you try to run it on something else...
>> But American labs do that.
>> ...and it doesn't run better. Nvidia's success is perfect evidence. AI models created on our stack run best on our stack. How is that hard to understand?
>> I'm just looking at... look, Anthropic's models are run on GPUs, they're run on Trainium, they're run on TPUs.
>> A lot of work has to go into changing that. But go to the global south, go to the Middle East, out of the box: if all of the AI models run best on somebody else's tech stack, you've got to be arguing some ridiculous claim right now that that's a good thing for the United States.
>> But I guess I don't understand the argument. If, say, Chinese companies get to the next Mythos first, they find all the vulnerabilities before American software gets patched, but they do it on Nvidia hardware and they ship it to the global south on Nvidia hardware. How is that good? I mean, okay, it runs on your hardware.
>> It's not good.
>> Right?
>> It's not good. So let's not let it happen.
>> Why do you think it's perfectly fungible, that if you didn't ship them compute, it would be exactly replaced by Huawei? They are behind, right? They have worse chips than you.
>> There's evidence right now: their chip industry is gigantic.
>> You can just look at the flop or bandwidth or memory comparisons between the H200 and the Huawei 910C. It's roughly half.
>> They use more of it. They use twice as many.
>> I guess it seems like your argument is: they have all this energy that's ready to go, and they need to fill it with chips...
>> And they're good at manufacturing.
>> ...and I'm sure eventually they would be able to just out-manufacture everybody, but there are these few critical years.
>> What are the critical years you're talking about?
>> These next few years, we've got these models that are going to do all the cyber attacks.
>> If the next few years are critical, then we have to make sure that all of the world's AI models are built on the American tech stack in these critical years.
>> Okay, if they're built on the American tech stack, how would that prevent them, if they have more advanced capabilities, from launching the Mythos-equivalent cyber attacks?
>> There's no guarantee either way.
>> But if you have it earlier, we can prepare for it.
>> Listen, why are you causing one layer of the AI industry to lose an entire market so that you can benefit another layer of the AI industry? There are five layers, and every single layer has to succeed. The layer that has to succeed most is actually the AI applications. Why are you so fixated on that AI model layer, that one company? For what reason?
>> Because those models make possible these incredible offensive capabilities, and you need the compute, the energy, the chips, and the ecosystem of AI researchers to make them possible.
01:14:21 >> A few months ago, Jane Street spent
01:14:23 about 20,000 GPU hours trading back
01:14:25 doors into three different language
01:14:26 models. Then they challenged my audience
01:14:28 to find the trigger phrases. I just
01:14:29 caught up with Rickson who designed the
01:14:31 puzzle about some of the solutions that
01:14:32 Jane Street received. If you think the
01:14:35 the base model was here and the back
01:14:36 door model was here, you can kind of
01:14:38 linearly interpolate the weights to like
01:14:40 adjust the strength of the back door,
01:14:42 but you can also extrapolate it to make
01:14:43 the back door even stronger. And in some
01:14:45 cases, if you make it strong enough, the
01:14:47 model will just regurgitate what the
01:14:50 response phrase was supposed to be. So,
01:14:51 if you keep amplifying the difference
01:14:52 between the base version and the back
01:14:54 door version, eventually it should spit
01:14:56 out the trigger phrase. But this
01:14:58 technique only worked on two out of the
01:14:59 three models. Even Ricken isn't sure why
01:15:01 it didn't work on the other. Being able
01:15:02 to verify that a model only does what
01:15:04 you think it does is one of the most
01:15:05 important open questions in AI security.
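The interpolate-then-extrapolate trick described above can be sketched in a few lines. This is a hypothetical toy sketch, not Jane Street's actual method: the `amplify_backdoor` helper and the tiny stand-in weight dicts are invented for illustration; real state dicts would hold full LLM tensors.

```python
# Toy sketch: treat the backdoored model as base + delta, then scale the
# delta past 1.0 (extrapolation) to amplify the backdoor until the model
# starts regurgitating its trigger behavior.
def amplify_backdoor(w_base, w_backdoor, alpha):
    """alpha in (0, 1) interpolates; alpha > 1 extrapolates past the backdoor."""
    return {
        name: [b + alpha * (d - b) for b, d in zip(w_base[name], w_backdoor[name])]
        for name in w_base
    }

# Two-tensor toy "models" standing in for full model state dicts.
w_base = {"layer0": [0.0] * 4, "layer1": [1.0] * 4}
w_bd = {"layer0": [0.1] * 4, "layer1": [1.2] * 4}

# alpha=3.0 triples the backdoor delta relative to the base weights.
w_strong = amplify_backdoor(w_base, w_bd, alpha=3.0)
print(w_strong["layer0"])  # each coordinate is roughly 0.3 (3x the 0.1 delta)
```

As the episode notes, this amplification recovered the trigger on only two of the three backdoored models, so it is a heuristic, not a guaranteed extraction attack.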
01:15:07 If this is the kind of problem that
01:15:08 excites you, Jane Street is hiring
01:15:10 researchers and engineers. Go to
01:15:12 janestreet.com/dwarkesh
01:15:14 to learn more. Okay, stepping back, it
01:15:16 has to be the case that China is able to
01:15:19 build enough 7 nanometer capacity. And
01:15:21 remember, they're still stuck on 7
01:15:22 nanometer while you will move on to 3
01:15:23 nanometer and then 2 nanometer or 1.6 nanometer
01:15:26 with Feynman. So while you're on 1.6
01:15:28 nanometer they're still going to be on 7
01:15:29 nanometer and they have to produce enough of
01:15:31 it to make up for the shortfall and they
01:15:34 have so much energy that the more chips
01:15:35 you give them the more compute they'd
01:15:37 have, right? So it
01:15:41 comes down to the question of whether they
01:15:42 are ultimately getting more compute as an input to
01:15:44 training and inference.
01:15:45 >> I just think you speak in
01:15:46 absolutes. I think the United States
01:15:49 ought to be ahead. The amount of compute
01:15:51 in the United States is 100 times more than
01:15:55 anywhere else in the world. The United
01:15:58 States ought to be ahead. Okay, the
01:16:00 United States is ahead. Nvidia builds
01:16:03 the most advanced technologies. We make
01:16:04 sure that the US labs are the first to
01:16:07 hear about it and the first chance to
01:16:08 buy it. And if they don't have enough
01:16:10 money, we even invest in them.
01:16:13 The United States ought to be ahead. We
01:16:16 want to do everything we can to make
01:16:17 sure the United States is ahead.
01:16:20 Number one point. Do you agree? And
01:16:22 we're doing everything we can to do
01:16:24 that.
01:16:24 >> But how is shipping chips to China
01:16:26 keeping the US ahead?
01:16:31 >> We have Vera Rubin for the United States.
01:16:33 Now, United States. Am I in United
01:16:35 States? Do you consider me part of the
01:16:37 United States?
01:16:38 >> Yes.
01:16:38 >> Nvidia, you consider Nvidia a United
01:16:41 States company? Okay. Number one,
01:16:45 why is it that we don't come up with a
01:16:48 regulation that's more balanced so that
01:16:50 Nvidia can win around the world instead
01:16:54 of giving up the world? Why would you
01:16:57 want United States to give up the world?
01:17:00 The chip industry is part of the
01:17:01 American ecosystem. It's part of
01:17:04 American technology leadership. It's
01:17:06 part of the AI ecosystem. It's part of
01:17:08 AI leadership. Why? Why is it that your
01:17:12 policy, your philosophy leads to United
01:17:16 States giving up a vast part of the
01:17:19 world's market?
01:17:20 >> The claim here is Dario Amodei had
01:17:23 this quote where he said it's like
01:17:25 Boeing bragging that we're selling North
01:17:26 Korea nukes but the missile casings are
01:17:28 made by Boeing and that's somehow
01:17:30 enabling the US technology stack. Like
01:17:32 fundamentally you're giving them this
01:17:33 capability
01:17:34 >> comparing AI to anything that you just
01:17:36 mentioned is lunacy
01:17:37 >> But AI is similar to enriched uranium, right?
01:17:39 It can have positive uses, it
01:17:41 can have negative uses. We still don't
01:17:43 want to send enriched uranium to other
01:17:45 countries.
01:17:46 >> Who's sending enriched
01:17:48 >> the analogy is enriched uranium
01:17:50 >> Because it's a lousy
01:17:52 analogy,
01:17:53 it's an illogical analogy.
01:17:56 >> But if that computer can run a model that can
01:17:58 do zero day exploits against all
01:18:00 American software, how is that not a
01:18:03 weapon?
01:18:04 >> First of all, the way to solve
01:18:06 that problem is to have dialogues with
01:18:07 the researchers and dialogues with China
01:18:09 and dialogues with other countries to
01:18:11 make sure that people don't use
01:18:12 technology in that way. That's a
01:18:14 dialogue that has to happen. Okay.
01:18:16 Number one. Number two, we
01:18:20 also need to make sure that the United
01:18:22 States is ahead. Everything, Rubin,
01:18:25 Vera Rubin, Blackwell, is available in the
01:18:28 United States in abundance.
01:18:30 Mounds of it. Obviously our
01:18:32 results would show it. Tons of it.
01:18:34 Tons of it. The amount of
01:18:36 computing we have is great. We have
01:18:38 amazing AI resources here. It's great.
01:18:40 We have to stay ahead. However, we also
01:18:44 have to recognize that AI is not just a
01:18:46 model. That AI is a five layer cake.
01:18:50 That AI industry matters across every
01:18:53 single layer. And we want United States
01:18:55 to win at every single layer, including
01:18:57 the chip layer. And conceding the entire
01:19:00 market is not going to allow United
01:19:03 States to win the technology race
01:19:05 long-term in the chip layer in the
01:19:07 computing stack. That is just a fact.
01:19:10 >> I guess then the crux comes down to: how
01:19:12 does selling them chips now help us win
01:19:15 in the long term. Like Tesla sold
01:19:18 extremely good electric vehicles to
01:19:19 China for a long time. iPhones are sold
01:19:21 in China, extremely good. They didn't
01:19:23 cause some lock-in. China will still make
01:19:26 their version of EVs and they're
01:19:28 dominating, or smartphones, dominating.
01:19:29 >> When we started the conversation today,
01:19:30 you acknowledged
01:19:32 that Nvidia's position is
01:19:35 very different.
01:19:38 You use words like moat. The single most
01:19:40 important thing to our company is our
01:19:42 richness of our ecosystem which is about
01:19:44 developers.
01:19:46 50% of the AI developers are in China.
01:19:49 We shouldn't, the United
01:19:51 States should not give that up.
01:19:53 >> But we have a lot of Nvidia developers in the
01:19:55 US and that doesn't prevent American
01:19:56 labs from also being able to use other
01:19:58 accelerators in the future. In fact,
01:20:00 right now they're using other
01:20:00 accelerators as well which is fine and
01:20:02 great. I don't I don't see why that
01:20:04 wouldn't be the case in China as well if
01:20:05 you sell them Nvidia chips just the same
01:20:06 way that Google can use TPUs and Nvidia.
01:20:09 >> We have to keep innovating, and as
01:20:11 you probably know, our share is
01:20:14 growing, not decreasing. The premise is that
01:20:18 even if we competed in China, we're
01:20:20 going to lose that market anyway.
01:20:25 You're not talking to somebody
01:20:27 who woke up a loser. And that loser
01:20:30 attitude, that loser premise makes no
01:20:33 sense to me. We're not a car.
01:20:37 The fact that I
01:20:41 can buy this car brand one day
01:20:43 and use another car brand another day?
01:20:46 Easy. Computing is not like that.
01:20:49 There's a reason why the x86 still
01:20:51 exists. There's a reason why ARM is so
01:20:52 sticky. These ecosystems, these
01:20:55 ecosystems are hard to replace. It costs
01:20:58 an enormous amount of time and energy
01:20:59 and most people don't want to do it. And
01:21:01 so it's it's our job to continue to
01:21:04 nurture that ecosystem to keep advancing
01:21:07 the technology so that we could compete
01:21:09 in the marketplace. Conceding a
01:21:11 marketplace based on the premise you
01:21:13 described, I simply can't acknowledge
01:21:15 that. It makes no sense because I don't
01:21:18 think the United States is a loser, or
01:21:21 that our industry is now a loser. And
01:21:24 that losing proposition, that losing
01:21:26 mindset makes no sense to me.
01:21:28 >> Okay, I'll move on. I just want
01:21:30 to make sure
01:21:30 >> you don't have to move on. I'm enjoying
01:21:32 it.
01:21:32 >> Okay, great. Then I appreciate
01:21:36 that.
01:21:37 >> But I think, and thanks for walking
01:21:39 around in circles
01:21:41 with me because I think it helps
01:21:42 bring out what the crux here is.
01:21:43 >> The crux is you're going to extremes.
01:21:45 Your argument starts from extremes that
01:21:48 if we give them any compute at all in
01:21:51 this narrow moment, we will lose
01:21:54 everything.
01:21:54 >> No, I think what my argument is
01:21:56 >> Those extremes, they're childish.
01:22:00 Yeah.
01:22:00 >> The idea is not that there is some key
01:22:04 threshold of compute; it's that any
01:22:06 marginal compute is helpful, right? So
01:22:08 if you have more compute, you can train
01:22:10 a better model.
01:22:10 >> And I just want you to acknowledge that
01:22:12 any marginal sale for the American
01:22:14 technology industry is
01:22:16 beneficial.
01:22:17 >> I actually don't. I mean, if the AI models
01:22:20 that run on those chips
01:22:21 >> Yeah.
01:22:21 >> are capable of cyber offensive
01:22:22 capabilities, or of training models
01:22:24 capable of cyber offense by running more
01:22:26 models at those instances. It is not a
01:22:28 nuclear weapon, but it
01:22:30 enables a weapon of a kind.
01:22:31 >> The logic that you use, you
01:22:32 might as well apply to microprocessors
01:22:34 and DRAMs. You might as well apply to
01:22:36 electricity.
01:22:37 >> But in fact, we do have export controls
01:22:39 on the technology that is relevant to
01:22:40 making the most advanced DRAM, right? We
01:22:42 have all kinds of export controls on
01:22:43 China for all kinds of shipping.
01:22:45 >> We sell a lot of DRAM and CPUs into
01:22:47 China. And I think it's right.
01:22:50 >> I guess this goes back to the
01:22:52 fundamental question of is AI different,
01:22:54 right? If you have the kind of
01:22:55 technology that can find these zero days
01:22:57 in software, is that something where we
01:23:01 want to minimize China's ability to get
01:23:03 there first, for us to be ahead?
01:23:07 >> We can control that.
01:23:08 >> How do we control that if the chips are
01:23:09 already there and they're using that to
01:23:10 train that model?
01:23:11 >> We have tons of compute. We have tons of
01:23:13 AI researchers. We're racing as fast as
01:23:15 we can.
01:23:16 >> Again, we have more nuclear weapons than
01:23:18 anybody else, but we don't want to send
01:23:19 enriched uranium anywhere.
01:23:20 >> We're not enriched uranium.
01:23:23 It's a chip and it's a chip that they
01:23:26 can make themselves.
01:23:28 >> But there's a reason they're buying it
01:23:29 from you, right? And we have quotes from
01:23:31 the founders of Chinese companies that
01:23:32 say the bottleneck is that technology.
01:23:33 >> because our chips are better. On
01:23:35 balance, our chips are better. There's
01:23:36 just no question about it. In the
01:23:38 absence of
01:23:40 our chip, can you acknowledge that
01:23:41 Huawei had a record year? Can you
01:23:42 acknowledge that a whole bunch of chip
01:23:43 companies have gone public? Can you
01:23:45 acknowledge that?
01:23:46 Can you also
01:23:48 acknowledge the fact that we
01:23:50 used to have a very large share in that
01:23:51 market and we no longer have the large
01:23:53 share in that market. We can also
01:23:55 acknowledge that China is about 40% of
01:23:58 the world's technology industry. To
01:24:00 leave that market, to
01:24:03 concede that market, for the United States
01:24:04 technology industry is a disservice to
01:24:07 our country. It is a disservice to our
01:24:09 national security. It is a disservice
01:24:11 to our technology leadership. All
01:24:13 for the benefit of
01:24:15 one company. It makes no sense to me.
01:24:17 >> I guess I'm confused; it feels like
01:24:18 you're making two different statements.
01:24:19 One is that we're going to win this
01:24:21 competition with Huawei because our
01:24:22 chips are going to be way better if
01:24:23 we're allowed to compete. And another is
01:24:25 that they would be doing the same exact
01:24:26 thing without us anyways. Right? How can
01:24:28 those two things both be true at the
01:24:29 same time?
01:24:30 >> It's obviously true. In the absence of a
01:24:34 better choice, you'll take the only
01:24:35 choice you have. How is that illogical?
01:24:38 It's so logical.
01:24:39 >> The reason they want Nvidia chips is
01:24:40 they're better. Better is more compute.
01:24:42 More compute means you can train a better
01:24:43 model.
01:24:44 >> It's better because it's
01:24:45 easier to program. We have a
01:24:47 better ecosystem. Whatever the better
01:24:49 is. And of
01:24:52 course we're going to send them compute.
01:24:53 So what? The fact of the matter
01:24:57 is, we get the benefit. Don't forget, we
01:25:00 get the benefit of American technology
01:25:02 leadership. We get the benefit of
01:25:04 developers working on the American tech
01:25:06 stack. We get the benefit as those AI
01:25:08 models diffuse out into the rest of the
01:25:11 world. The American tech stack is
01:25:13 therefore the best fit for it. We can
01:25:15 continue to advance and diffuse American
01:25:17 technology. That, I believe, is a positive.
01:25:21 It's a very important part of American
01:25:23 technology leadership. Now the policy
01:25:26 that you're advocating resulted in the
01:25:28 American telecommunication industry
01:25:30 being pushed out of basically the
01:25:33 whole world, to the point where we don't
01:25:35 control our own telecommunications
01:25:36 anymore. I don't see that as smart.
01:25:40 It's a little narrow-minded, and it led
01:25:42 to unintended consequences that I'm
01:25:44 describing to you right now, that you
01:25:46 seem to have a very hard time
01:25:47 understanding.
01:25:48 >> Okay, let's just step back. It
01:25:51 seems like the crux here is
01:25:52 >> there's a potential benefit and there's
01:25:54 a potential cost, and we're
01:25:56 trying to figure out: is the benefit
01:25:57 worth the cost? I guess I'm trying to
01:25:59 get you to acknowledge the potential
01:26:01 cost that compute is an input to
01:26:03 training powerful models. Powerful
01:26:05 models do have powerful
01:26:07 offensive capabilities, like cyber
01:26:09 attacks. It is a good thing that
01:26:10 American companies got to Mythos
01:26:12 level capabilities first, and that now
01:26:14 they're going to hold off on those
01:26:15 capabilities so that American
01:26:16 companies and the American government can
01:26:18 make their software more protected
01:26:20 before this level of capability is announced.
01:26:23 If China had had more compute, if
01:26:24 they could have made a Mythos level model
01:26:26 earlier and deployed
01:26:28 it widely, that would have been very bad.
01:26:31 One of the reasons that hasn't happened
01:26:32 is that we have more compute thanks to
01:26:34 companies like Nvidia in America. Um
01:26:36 that is a potential cost of sending chips to China. And
01:26:40 so let's leave the benefit aside for a
01:26:42 second. Do you acknowledge that this is
01:26:43 a potential cost?
01:26:45 >> I will also tell you: the potential cost
01:26:48 is we allow one of the most important
01:26:51 layers of the AI stack, the chip layer,
01:26:55 to concede an entire market, the
01:26:58 second largest market in the
01:27:00 world, so that they could develop scale,
01:27:03 so that they could develop their own
01:27:04 ecosystem so that future AI models are
01:27:08 optimized in a very different way than
01:27:11 the American tech stack. As AI diffuses
01:27:14 out into the rest of the world,
01:27:17 their standards, their tech stack will
01:27:21 become superior to ours because their
01:27:23 models are open.
01:27:24 >> I guess I just believe enough in
01:27:26 Nvidia's kernel engineers and CUDA
01:27:28 engineers to think that they could
01:27:29 optimize.
01:27:29 >> AI is more than kernel optimization as
01:27:31 you know,
01:27:31 >> of course, but there's so many things
01:27:33 you can do from distilling to a model
01:27:35 that's well fit for your chips.
01:27:36 >> We're going to do our best.
01:27:37 >> You have all this software. It's just hard
01:27:39 to imagine that there's a long-term lock-in
01:27:40 to the Chinese ecosystem. They have this
01:27:42 like slightly better open source model
01:27:43 for a while.
01:27:44 >> China is the largest contributor to open
01:27:46 source software in the world. Fact,
01:27:51 right? China is the largest contributor
01:27:54 to open models in the world. Fact.
01:27:57 Today it's built on the American tech
01:27:59 stack.
01:28:01 Fact. All five layers of the tech stack
01:28:05 for AI are important. The United States ought
01:28:07 to go win all five of them. They're all
01:28:10 important.
01:28:12 The one that is the most important of
01:28:14 course is the AI application layer. The
01:28:18 layer that diffuses into society, the
01:28:21 one that uses it most will benefit from
01:28:23 this industrial revolution most.
01:28:27 But my point is that every layer
01:28:29 has to succeed.
01:28:31 If we scare this country into
01:28:34 thinking that AI is
01:28:37 somehow a nuclear bomb
01:28:40 so that everybody hates AI and
01:28:43 everybody's afraid of AI,
01:28:45 I don't know how you're helping the
01:28:48 United States, you're doing a
01:28:49 disservice. If we scare everybody out of
01:28:52 doing software engineering jobs because
01:28:54 it's going to kill every software
01:28:55 engineering job and we don't have any
01:28:57 software engineers as a result of that,
01:28:59 we're doing a disservice to United
01:29:00 States.
01:29:01 If we scare everybody out of radiology,
01:29:03 so nobody wants to be a radiologist
01:29:05 because computer vision is completely
01:29:06 free and AI is going to do no worse a
01:29:09 job than a radiologist, and we
01:29:11 misunderstand the difference between a
01:29:13 job and a task: the job of a
01:29:15 radiologist is patient care, the task to read a
01:29:18 scan. If we misunderstand that so
01:29:20 profoundly and we scare everybody out of
01:29:24 going to radiology school, we're not
01:29:26 going to have enough radiologists and
01:29:27 good enough healthcare. And so I
01:29:31 I'm making the case
01:29:34 that when you make a premise
01:29:38 that is so extreme, everything goes to
01:29:41 zero or infinity.
01:29:44 We end up scaring people in a way that's
01:29:47 just not true. Life is not like that.
01:29:50 Do we want the United States to be
01:29:52 first? Of course we do.
01:29:54 Do we need to be a
01:29:58 leader in every layer of that stack?
01:30:01 Of course we do. Of course we do.
01:30:05 Today you're talking about Mythos
01:30:07 because Mythos is important. Sure.
01:30:09 That's fantastic. But in a few years
01:30:11 time, I'm making you the prediction that
01:30:14 when we want the American tech stack,
01:30:16 when we want American technology to be
01:30:18 diffused around the world, out to India,
01:30:21 out to the Middle East, out to
01:30:23 Africa, out to Southeast Asia, when our
01:30:27 country would like to export because we
01:30:29 would like to export our technology, we
01:30:32 would like to export our standards. On
01:30:34 that day, I want you and I to have that
01:30:36 same conversation again. And I will tell
01:30:39 you exactly about today's conversation
01:30:41 about how your policy and how what you
01:30:43 imagined
01:30:45 literally caused the United States to
01:30:46 concede the second largest market in the
01:30:48 world for no good reason at all. We
01:30:52 shouldn't concede it. If we lose it, we
01:30:55 lose it. But why do we concede it? Now,
01:30:58 nobody is
01:31:00 advocating an all or nothing,
01:31:03 meaning we
01:31:05 ship everything to China at all times.
01:31:07 Nobody's advocating that. We should
01:31:10 always have the best technology here. We
01:31:12 should always have the most technology
01:31:14 here, and the first.
01:31:16 But we should also try to compete and
01:31:20 win around the world. Both of those
01:31:23 things can simultaneously happen. It
01:31:26 requires some amount of nuance, some
01:31:28 amount of maturity instead of absolutes.
01:31:32 The world is just not absolutes.
01:31:34 >> Okay, the argument hinges on: they've
01:31:37 built models that are
01:31:39 specialized for their architecture, the
01:31:41 best chips that they make in a few years,
01:31:42 and those chips get exported around the
01:31:44 world, and that sets a standard. Because of
01:31:47 EUV
01:31:48 export controls, as we said, you're
01:31:50 going to move on to 1.6 nanometer and
01:31:52 they're still going to be on 7 nanometer
01:31:53 even a few years from now. And it
01:31:55 might make sense that domestically they
01:31:56 would prefer: hey, we've got so much energy,
01:31:58 we can manufacture at scale, we'll
01:31:59 still keep using 7 nanometer. But for the
01:32:01 exporting thing, their 7 nanometer chips
01:32:04 have to be competitive against your 1.6
01:32:07 nanometer chips, and their models have to be
01:32:10 so far optimized for the 7 nanometer that it's
01:32:11 better to run their models on 7
01:32:12 nanometer than to run their models on
01:32:15 your 1.6 nanometer.
01:32:16 >> Can we just look at the facts
01:32:18 then? Okay. Is Blackwell's lithography 50 times
01:32:23 more advanced than Hopper's? Is it
01:32:26 50 times?
01:32:28 Not even close.
01:32:30 I just kept saying it over and over
01:32:32 again: Moore's law is dead. Between
01:32:34 Hopper and Blackwell, from the
01:32:36 transistors themselves, call it 75%. It
01:32:40 was 3 years apart.
01:32:43 75%.
01:32:45 Blackwell is 50 times
01:32:48 Hopper.
01:32:49 My point is architecture matters.
01:32:54 Computer science matters. Semiconductor
01:32:56 physics matter as well. But computer
01:32:59 science matters.
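The arithmetic behind this claim can be made explicit. The 75% and 50x figures are Jensen's from this exchange; the decomposition into a residual "non-process factor" is an illustrative back-of-envelope, not Nvidia's own accounting:

```python
# Hopper -> Blackwell, per Jensen's figures in this exchange.
transistor_gain = 1.75   # "call it 75%" from the transistors themselves
total_gain = 50.0        # "Blackwell is 50 times Hopper"

# Whatever isn't explained by process scaling must come from architecture,
# software, and system design (networking, numerics, packaging, and so on).
arch_gain = total_gain / transistor_gain
print(f"non-process factor: {arch_gain:.1f}x")  # roughly 28.6x
```

The point of the decomposition is Jensen's argument in miniature: the process node contributed less than 2x, so a 7-nanometer deficit does not cap a system's performance the way the raw node numbers suggest.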
01:33:00 The impact of AI largely comes from
01:33:05 the computing stack, which is the reason
01:33:07 why CUDA is so effective, which is the
01:33:08 reason why CUDA is so beloved.
01:33:12 It's an ecosystem, a computing
01:33:14 architecture that allows for so much
01:33:16 flexibility that if you wanted to change
01:33:18 an architecture completely, create
01:33:20 something like
01:33:23 diffusion, create something
01:33:26 that's disaggregated, you could
01:33:27 do so. It's easy to do. And so the
01:33:31 fact of the matter is AI is about the
01:33:33 stack above as much as it is about the
01:33:36 architecture below. To the extent
01:33:38 that we have architectures and software
01:33:41 stacks optimized for our stack, for
01:33:43 our ecosystem. It is obviously good
01:33:46 because we started the conversation
01:33:48 today about how Nvidia's ecosystem is so
01:33:50 rich, why people always love programming
01:33:52 on CUDA first. They do. They do and so
01:33:56 do the researchers in China. But if we
01:33:59 are forced to leave China, if we're
01:34:01 forced to leave China, well,
01:34:04 first of all,
01:34:06 it's a policy mistake. It obviously has
01:34:08 backlash.
01:34:12 Obviously, it has backfired; it has
01:34:16 turned out badly for the United
01:34:18 States. It accelerated their
01:34:21 chip industry. It forced all of their AI
01:34:24 ecosystem to focus on their internal
01:34:26 architectures. It's not too late, but
01:34:29 nonetheless,
01:34:30 it has already happened.
01:34:33 You're going to see in the future
01:34:35 they're not stuck at 7 nanometer.
01:34:37 Obviously they're good at manufacturing.
01:34:39 They will continue to advance from seven
01:34:42 and beyond. Now
01:34:45 Is there a 10x difference between 5 nanometer
01:34:50 and 7 nanometer? The answer is no.
01:34:53 Architecture matters. Networking
01:34:55 matters. That's why Nvidia bought
01:34:57 Mellanox. Networking matters. Energy
01:34:59 matters. And so all that stuff matters.
01:35:01 It's not simplistic like the
01:35:04 way you're trying to distill it.
01:35:06 >> We can move on from China, but that
01:35:07 actually raises an interesting question
01:35:09 about what we were discussing earlier,
01:35:11 these bottlenecks at TSMC and memory and
01:35:14 so forth. And so if we're in this world
01:35:17 where, you know, you're already the
01:35:18 majority of N3, at some point you'll be on
01:35:21 N2, you'll be a majority of that. Do you
01:35:24 see that you could go back to N7, the
01:35:27 spare capacity at an older process node,
01:35:28 and say, hey, the demand for AI is so
01:35:31 great and our capacity to expand the
01:35:33 leading edge is not meeting it, so we're
01:35:36 going to make a Hopper or Ampere with
01:35:38 everything we know about numerics today
01:35:40 and all the other improvements you
01:35:41 described? Do you see that world
01:35:42 happening before 2030?
01:35:45 >> It's not necessary to, and the reason for
01:35:47 that is because with every
01:35:50 generation the architecture
01:35:58 is more than just the transistor scale.
01:36:02 You're also doing so much engineering
01:36:04 in packaging and stacking and the
01:36:07 numerics and, you know, the system
01:36:09 architecture.
01:36:13 When you run out of capacity,
01:36:16 to easily go back to another node, that's
01:36:18 a level of R&D that no one
01:36:22 could afford. You know, we could
01:36:23 afford to lean forward. I don't think we
01:36:25 could afford to go back. Now, if the
01:36:27 world simply says, on that day, let's
01:36:31 do the thought experiment, on that day we go, listen,
01:36:33 we're just never going to have more
01:36:34 capacity ever again, would I go back and
01:36:37 use seven? In a heartbeat.
01:36:39 Yeah, of course I would.
01:36:40 >> Um,
01:36:42 one question somebody I was talking to
01:36:43 had is why Nvidia doesn't run multiple
01:36:46 different chip projects at the same time
01:36:48 with totally different architectures. So
01:36:50 you could do, like, a Cerebras-style
01:36:52 wafer scale. You could do a Dojo-style
01:36:54 huge package. You could do one without
01:36:55 CUDA, you know. You have the
01:36:57 resources and the engineering talent
01:36:59 to do all these in parallel. So why put
01:37:02 all the eggs in one basket, given who
01:37:03 knows where AI might go and where
01:37:04 architectures might go?
01:37:06 >> Oh, we could. It's just that we
01:37:08 don't have a better idea.
01:37:10 >> Yeah. Yeah, we could do all of those
01:37:12 things.
01:37:14 It's just not better. And we simulate it
01:37:17 all; they're in our simulator provably
01:37:19 worse,
01:37:21 and so we wouldn't do it.
01:37:23 Yeah, we're working on
01:37:26 exactly the projects that we want to
01:37:27 work on. And
01:37:32 if the workload were to change
01:37:34 dramatically,
01:37:36 and I don't mean the
01:37:37 algorithms, I actually mean the
01:37:39 workload, and that depends
01:37:42 on the shape of the market,
01:37:46 we may decide to add other
01:37:48 accelerators. Like, for example, recently
01:37:50 we added Groq, and we're going to
01:37:53 fold Groq into our CUDA ecosystem.
01:37:56 And we're doing
01:38:00 that now because the value of tokens
01:38:04 has gone up so high that you
01:38:07 could have different pricing of tokens.
01:38:09 Back in the old days, you know,
01:38:10 just a couple years ago, tokens were
01:38:12 either free or barely
01:38:14 expensive, right? But now you
01:38:16 can have different customers, and those
01:38:18 customers want different answers. And
01:38:20 because the customers make so much
01:38:22 money, like, for example, our software
01:38:24 engineers, if I can give them much more
01:38:28 responsive tokens so that they're
01:38:31 even more productive than they are
01:38:32 today, I would pay for it.
01:38:35 >> But that market has only recently
01:38:36 emerged. And so I think that we now have
01:38:40 the ability to have the same
01:38:42 model, based on the response time, serve
01:38:45 different segments, and that's the reason
01:38:47 why we decided to expand the Pareto
01:38:50 frontier and create a segment of
01:38:54 inference that has faster response time
01:38:57 even though it's lower throughput.
01:38:59 Until now, higher throughput was
01:39:02 always better. We think that
01:39:04 there could be a world where there could
01:39:06 be very high ASP tokens, and even
01:39:11 though the throughput is
01:39:12 lower in the factory, the ASPs make up
01:39:15 for it.
01:39:16 >> Yeah. That's the reason why we did it.
01:39:17 But otherwise, from an architecture
01:39:19 perspective, I think, with Nvidia's
01:39:21 architecture, if I
01:39:23 have more money, I'd put more
01:39:26 behind the architecture.
01:39:28 >> I think this idea of extremely premium tokens
01:39:30 and the disaggregation of the
01:39:32 inference market is very interesting.
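The factory economics Jensen sketches can be illustrated with made-up numbers. Everything below is hypothetical, purely to show how a lower-throughput segment can still win on revenue when its ASP (average selling price per token) is higher:

```python
# Two hypothetical inference segments of the same "AI factory":
# a bulk high-throughput tier and a premium low-latency tier.
bulk = {"tokens_per_sec": 100_000, "usd_per_million_tokens": 0.50}
premium = {"tokens_per_sec": 20_000, "usd_per_million_tokens": 4.00}

def revenue_per_hour(segment):
    """Tokens produced in an hour, in millions, times the segment's ASP."""
    return segment["tokens_per_sec"] * 3600 / 1e6 * segment["usd_per_million_tokens"]

print(revenue_per_hour(bulk))     # 180.0 USD/hour
print(revenue_per_hour(premium))  # 288.0 USD/hour, despite 5x fewer tokens
```

With these invented prices, the premium tier out-earns the bulk tier even at a fifth of the throughput, which is the logic behind expanding the Pareto frontier toward faster response time.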
01:39:34 >> The segmentation, yeah.
01:39:39 >> Final question: suppose the deep learning
01:39:40 revolution didn't happen. What would Nvidia
01:39:44 be doing? Obviously games, but given
01:39:48 >> Accelerated computing.
01:39:50 >> Accelerated computing, the same thing
01:39:52 we've been doing all along. The
01:39:55 premise of our company is that Moore's law
01:39:57 is ending, that general
01:39:59 purpose computing is good for a lot of
01:40:01 things but for a lot of computation is
01:40:03 not ideal. And so we combined an
01:40:07 architecture called a GPU, CUDA, with a CPU
01:40:11 so that we can accelerate the workload
01:40:13 of the CPU, and so different
01:40:16 kernels of code or algorithms could be
01:40:18 offloaded onto our GPU, and as a result
01:40:21 you speed up an application by,
01:40:23 you know 100x 200x and where can you use
01:40:26 that? Well, obviously engineering and
01:40:28 science and physics and so on,
01:40:30 data processing, computer
01:40:34 graphics, image generation, all
01:40:36 kinds of things. Even if AI didn't exist,
01:40:38 today Nvidia would be very, very large.
01:40:40 Yeah. And so I think the reason
01:40:43 for that is fairly fundamental, which
01:40:45 is the ability for general
01:40:47 purpose computing to continue to scale
01:40:50 has largely run its course, and the way,
01:40:53 not the only way, but the way
01:40:54 to do that is through domain-specific
01:40:57 acceleration. And one of the domains that
01:41:00 we started with was computer graphics,
01:41:03 but there are many, many other
01:41:05 domains. I mean, there's all kinds of
01:41:07 scientific computing, particle physics
01:41:10 and fluids, and
01:41:13 structured data processing, all kinds of
01:41:14 different types of algorithms that
01:41:16 benefit from CUDA. And so our mission
01:41:20 was really to bring accelerated
01:41:23 computing to the world and advance the
01:41:25 type of applications that general
01:41:27 purpose computing can't do, and scale to
01:41:29 the level of capability that helps
01:41:32 break through certain fields of science.
01:41:35 And so some of the early applications were molecular dynamics, seismic processing for energy discovery, image processing of course, all of those kinds of fields where general-purpose computing is just simply too inefficient. And so yeah, if there was no AI, I would be very sad. But because of the advances that we made in computing, we democratized deep learning. We made it possible for any researcher, any scientist anywhere, any student, to be able to access a PC or, you know, a GeForce add-in card, and do amazing science. And that fundamental promise hasn't changed, not even a little bit.
01:42:25 If you watch GTC, there's the whole beginning part of it, and none of it is AI. That whole part of it, with computational lithography, or our quantum chemistry work, or all of that data processing work, all of that is unrelated to AI, and it's still very important. I mean, I know that AI is very interesting and quite exciting, but there are a lot of people doing a lot of very important work that's not AI-related, and tensors are not the only way that you compute. And we want to help everybody.
01:43:06 >> Thank you so much.
01:43:08 >> You're welcome. I enjoyed it.
01:43:10 >> Me too. Sweet.