YouTube Research · April 19, 2026

Jensen Huang × Dwarkesh Patel — Will Nvidia's Moat Persist?

Dwarkesh Patel · with Jensen Huang (Nvidia CEO) · April 15, 2026 · 1:43:12 · Watch on YouTube →

TL;DR

Electrons to tokens [00:00]

Dwarkesh opens with the bear case: Nvidia designs, TSMC manufactures, OEMs assemble — so if software is getting commoditized by AI, why won't the same fate catch Nvidia? Jensen's response is the line the rest of the interview keeps circling back to.

Jensen Huang at 00:00:35
"In the end, something has to transform electrons to tokens." Jensen · 00:33

He argues the work of making a token cheaper, better, and more valuable — "the artistry, engineering, science, invention that goes into making that token valuable" — is "far from deeply understood." His mental model of the company follows: input electrons, output tokens, and the rule is "do as much as necessary, as little as possible."

Jensen Huang at 00:02:35
"AI is a five-layer cake." He brings 360° of that stack — silicon, systems, clouds, models, applications — into one room at GTC so the upstream and downstream can see each other. Jensen · 02:31

The visible moat, then, is financial and relational: ~$100B of explicit purchase commitments disclosed in the last 10-Q, with SemiAnalysis estimating another ~$150B of implicit ones.2 Jensen's claim is that upstream partners make these bets for Nvidia specifically because "they know I have the capacity to buy it and sell it through my downstream." Bottlenecks (CoWoS, HBM, silicon photonics) get "swarmed" for a year or two and then fall. "None of the bottlenecks last longer than 2 or 3 years. None of them."

Why TPUs don't break the moat [16:25]

Dwarkesh's most pointed challenge: two of the top three models (Claude, Gemini) are trained on TPUs, not Nvidia. What does that say about the moat? Jensen's answer routes around the TPU question entirely: Nvidia didn't build a tensor processing unit — it built accelerated computing, which also happens to accelerate AI. Molecular dynamics, fluid dynamics, structured data, image generation — all run on CUDA. "Our market reach is far greater than any TPU, any ASIC can possibly have."

The deeper argument is algorithmic, not about benchmarks. Moore's law gives you ~25% a year; Nvidia's Hopper→Blackwell claim is 30–50× energy efficiency on real inference workloads3 — credited to the rack-scale NVL72 fabric that ties 72 GPUs into a single NVLink domain13, and you don't get that from transistors alone. You get it by rewriting the algorithm every generation — new attention variants, hybrid SSMs, diffusion+AR fusions — and only a programmable architecture with CUDA lets you do that end-to-end across the processor, the fabric, and the libraries.
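To put that gap in perspective, here is a hedged bit of arithmetic (the ~25%/year rate and the roughly 30–50× generational jump are the figures quoted around this point; the code itself is purely illustrative): how many years of transistor-only compounding a single generational jump is worth.

```python
import math

# Illustrative arithmetic only: the ~25%/year rate and the 30-50x
# generational jump are the figures quoted in the text above.
MOORES_RATE = 1.25  # ~25% improvement per year

def years_to_match(factor: float, rate: float = MOORES_RATE) -> float:
    """Years of compounding at `rate` needed to reach `factor`."""
    return math.log(factor) / math.log(rate)

print(f"30x jump ~= {years_to_match(30):.1f} years at 25%/yr")  # ~15.2
print(f"50x jump ~= {years_to_match(50):.1f} years at 25%/yr")  # ~17.5
```

In other words, a single 30–50× generational gain is worth roughly 15 to 18 years of pure process scaling at 25% a year, which is the quantitative content of the "you don't get that from transistors alone" claim.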

Jensen Huang at 00:31:14
"CPU is kind of like a Cadillac. Nvidia's GPUs are kind of like F1 racers." Everybody can drive one at 100 mph; it takes expertise to push it to the limit — which is why Nvidia embeds engineers with its biggest AI customers to squeeze 2× out of their stack. Jensen · 31:14

When Dwarkesh points at Anthropic's just-announced multi-gigawatt TPU deal with Google and Broadcom1, Jensen is direct: "Anthropic is a unique instance and not a trend. Without Anthropic, why would there be any TPU growth at all? It's 100% Anthropic. Without Anthropic, why would there be any Trainium growth at all? It's 100% Anthropic." OpenAI, for all its AMD and in-house Titan dabbling9, remains "vastly Nvidia." (AWS's Project Rainier — the ~500k-Trainium2 cluster14 that trains Claude — is the concrete reason Anthropic is the TPU/Trainium outlier.)

Jensen Huang at 00:38:22
"Nvidia's got to be missing something. Seriously." — his answer to the idea that an ASIC just has to be "not 70% worse." He notes even a Broadcom ASIC carries a ~65% margin, so the savings are not what they look like. Jensen · 38:20

Why Nvidia doesn't become a hyperscaler [41:06]

Dwarkesh rewinds: Nvidia had the cash years earlier — why not become a foundation lab, or a cloud? Jensen's answer is a philosophy statement — "do as much as needed, as little as possible"6 — and an unusual concession: "My mistake is I didn't deeply internalize that they really had no other options — that a VC would never put in $5–10B of investment into an AI lab with the hopes of it turning out to be Anthropic."

Jensen Huang at 00:45:48
"Do as much as needed, as little as possible." The same principle that explains why Nvidia built CUDA itself — "if we didn't do it, nobody would have" — explains why it refuses to become a cloud or pick foundation-model winners. Jensen · 45:48

Instead, Nvidia backstops the neo-cloud ecosystem that wouldn't otherwise exist — CoreWeave has a reported ~$6.3B revenue backstop with ~$2B invested5, Nscale and NBS are similarly seeded — and Nvidia is now committing up to $100B to OpenAI progressively as it deploys 10 GW of Nvidia systems4. The scarcity-allocation rule, for once stated flatly: not highest bidder, but first-in, first-out by purchase order, with data-center readiness as the tiebreaker.
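The allocation rule as stated is simple enough to sketch. A minimal, purely hypothetical model (the customer names and fields below are invented for illustration; nothing here comes from Nvidia):

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch of the allocation rule as stated in the interview:
# first-in-first-out by purchase order, with data-center readiness
# breaking ties. All names and fields are illustrative only.

@dataclass
class Order:
    customer: str
    po_date: date      # when the purchase order landed
    site_ready: bool   # is the data center ready to take delivery?

def allocation_order(orders: list[Order]) -> list[Order]:
    # Earlier PO wins; among same-day POs, ready sites ship first
    # (False sorts before True, so we negate site_ready).
    return sorted(orders, key=lambda o: (o.po_date, not o.site_ready))

queue = allocation_order([
    Order("CloudB", date(2026, 3, 2), site_ready=True),
    Order("CloudA", date(2026, 3, 1), site_ready=False),
    Order("CloudC", date(2026, 3, 2), site_ready=False),
])
print([o.customer for o in queue])  # ['CloudA', 'CloudB', 'CloudC']
```

CloudA's earlier PO beats both same-day orders; between the March 2 orders, the ready site (CloudB) ships first.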

The TSMC relationship punctuates the chapter: 30 years, no legal contract.7 "Sometimes I got a better deal, sometimes I got a worse deal" — but you can bet a hundred-billion-dollar AI factory on Nvidia's yearly cadence, and on exactly one foundry. "Your token cost will decrease by an order of magnitude every single year. I can count on it like I can count on the clock."

Should we sell AI chips to China? [57:36]

The longest and most adversarial segment. Dwarkesh opens by channeling Dario Amodei's "country of geniuses in a data center"8 and the recent Mythos model that Anthropic claims found cyber vulnerabilities in every major OS (including OpenBSD). Jensen's reply reframes the premise: "Mythos was trained on a fairly mundane amount of fairly mundane capacity — abundantly available in China."

His core claim is that export controls haven't taken Chinese compute off the table — they've just shifted the bottleneck from logic to energy, and China has energy. "They've got plenty of chips. They've got most of the AI researchers. If you're worried about them, what is the best way to create a safe world? Victimizing them, turning them into an enemy — likely isn't the best answer."

Jensen Huang at 01:06:12
"They have ghost cities. They have ghost data centers." AI is a parallel problem; with abundant free energy, "gang up more chips even if they're seven nanometer" — and Huawei has already demonstrated silicon-photonic mesh that connects 7nm dies into a single supercomputer.12 Jensen · 1:06:05

The sharp exchange is about what Dwarkesh calls the "critical years" thesis — that the next few years of capability gap matter most. Jensen inverts it: "Why are you causing one layer of the AI industry to lose an entire market so that you could benefit another layer?" DeepSeek11 is the concrete fear: the day it ships optimized for Huawei, not Nvidia, is "a horrible outcome for our nation." His preferred future has every AI developer in the world — including in China — building on the American tech stack, because that is what preserves US centrality, not what threatens it.

One architecture, and what Nvidia would be without AI [1:35:06]

Two bow-tie questions. First: why not run Cerebras-style wafer-scale, Dojo-style huge-package, and a no-CUDA architecture all in parallel to hedge? Jensen's answer is empirical rather than ideological: they do simulate it all. "It's just that we don't have a better idea. They're in our simulator provably worse." The one recent exception is folding Grok into the CUDA ecosystem as a higher-ASP, lower-throughput inference segment10 (this is of a piece with the $5B Intel stake announced in late 202515 — accelerators are welcome if they address a real workload Nvidia itself can't), because "the value of tokens has gone up so high" that disaggregating the inference market finally pencils out.

Second: what would Nvidia be without the deep-learning revolution? Still enormous, he says — accelerated computing is a structural bet that general-purpose computing has "largely run its course." CUDA plus a GPU speeds up specific algorithms 100×; AI was the biggest such domain, but molecular dynamics, seismic processing, computational lithography, structured-data processing and quantum chemistry are each their own CUDA-X library. "Tensors is not the only way that you compute."

Annotations & Sources

  1. Anthropic announced on 6 April 2026 a ~3.5 GW expansion of Google TPU capacity (built with Broadcom), coming online through 2027 and sited primarily in the US. Anthropic's run-rate had surpassed $30B at announcement, up from ~$9B at the end of 2025. source →
  2. Nvidia's fiscal Q2 2026 10-Q discloses long-term supply, capacity and inventory purchase obligations with foundry, memory and packaging suppliers. SemiAnalysis has argued the true forward-looking commitment — including informal capacity reservations — is closer to $250B. source →
  3. Nvidia's March 2024 Blackwell announcement claims GB200 NVL72 delivers up to 25× lower TCO/energy and 30× faster real-time LLM inference vs H100. Gains come from FP4 Transformer Engine, HBM3e, and NVLink-5 rack-scale fabric. source →
  4. Nvidia's 22 September 2025 press release commits up to $100B to OpenAI progressively as OpenAI deploys 10 GW of Nvidia systems (first GW on Vera Rubin in 2H 2026). Nvidia's CFO later noted the deal remained a letter of intent without a definitive agreement. source →
  5. Per CoreWeave's October 2025 SEC filings, Nvidia is obligated to buy any unused CoreWeave capacity up to $6.3B through April 2032 — a revenue backstop that lets CoreWeave raise cheaper debt while capping Nvidia's exposure. source →
  6. At GTC 2025 Nvidia unveiled Rubin NVL144 (3.6 EFLOPS dense FP4, HBM4 at 13 TB/s) for 2026 and Rubin Ultra NVL576 (15 EFLOPS FP4) for 2H 2027, followed by a Feynman architecture in 2028 introducing co-packaged optics. source →
  7. Jensen has publicly described the Nvidia–TSMC partnership as built on decades of trust rather than legal agreements, with no signed contract despite cumulative business likely reaching hundreds of billions of dollars. In 2013 TSMC founder Morris Chang offered Huang the TSMC CEO role. source →
  8. Published October 2024, Dario Amodei's essay "Machines of Loving Grace" summarizes advanced AI as "a country of geniuses in a datacenter" — millions of parallel superhuman instances operating at accelerated speed — and projects cluster sizes sufficient to run those populations by ~2027. source →
  9. OpenAI's Triton 1.0 announcement introduced a Python-embedded DSL where researchers with no CUDA experience can write kernels that approach hand-tuned expert performance. It is now the kernel language behind PyTorch's torch.compile, with ports to AMD and other accelerators. source →
  10. SemiAnalysis's InferenceMAX runs nightly inference benchmarks across Nvidia GB200, AMD MI355X and other accelerators, tracking real-world LLM serving performance and cost per token as software stacks evolve. Nvidia's TCO advantage here is what Jensen keeps inviting TPU and Trainium teams to show up and contest. source →
  11. DeepSeek-R1 (20 Jan 2025) showed that RL without SFT (R1-Zero) produces emergent reasoning, and the fully open weights release plus six distilled dense models challenged assumptions that frontier reasoning required Western compute budgets. source →
  12. Huawei's Ascend 910C reportedly delivers ~800 TFLOPS FP16 and 3.2 TB/s memory bandwidth vs H200's 4.8 TB/s. The CloudMatrix 384 rack (384 910C dies) reaches ~300 PFLOPS BF16 — roughly 1.7× a GB200 NVL72 — by trading efficiency for scale and an all-optical mesh. source →
  13. The liquid-cooled GB200 NVL72 rack (120 kW) pairs 36 Grace CPUs with 72 Blackwell GPUs over a 130 TB/s NVLink fabric with 30 TB unified memory, delivering up to 1.4 EFLOPS FP4. source →
  14. Brought online in late October 2025, AWS's Project Rainier spans multiple US data centers using Trainium2 UltraServers (16 chips each, 4-server groups) with petabit-scale EFA networking — giving Anthropic more than 5× its prior training compute and scaling toward 1M+ Trainium2 chips. source →
  15. Announced 18 September 2025 and closed 29 December 2025, Nvidia's $5B common-stock stake (~4%) in Intel is tied to Intel building Nvidia-custom x86 data-center CPUs linked via NVLink, plus consumer x86 SoCs integrating RTX GPU chiplets. source →
Full transcript

00:00 We've seen the valuations of a bunch of

00:02 software companies crash because people

00:04 are expecting AI to commoditize

00:05 software. And there's a a potentially

00:08 naive way of thinking about things which

00:09 is like look Nvidia sends a GDS2 file to

00:13 TSMC. TSMC builds the logic dies. It

00:16 builds the switches. Um then it packages

00:18 them with the HBM that SK Hynix and

00:20 Micron and Samsung make. Then it sends

00:22 it to an ODM in Taiwan where they

00:24 assemble the racks. And so Nvidia is

00:26 fundamentally making software that other

00:27 people are manufacturing. And if

00:28 software gets commoditized, does Nvidia

00:30 get commoditized?

00:32 >> Well, in the end, something has to

00:33 transform electrons to tokens.

00:38 That transformation

00:40 um there's no the transformation of

00:43 electrons to tokens

00:45 uh and making those tokens more valuable

00:48 over time. I I I don't I think that that

00:54 that's hard to hard to um completely

00:57 commoditize

00:59 the transformation from electrons to

01:00 tokens is such an such an incredible

01:03 journey and and making that token. You

01:07 know, it's like making a one molecule

01:09 more valuable than another molecule,

01:11 making one token more valuable than

01:13 another. the amount of artistry,

01:15 engineering, science, invention that

01:17 goes into making that token valuable.

01:21 Obviously, we're we're watching it

01:22 happening in real time. And so, so the

01:26 the the the transformation, the

01:28 manufacturing, um all of the science

01:30 that goes in there is far from un deeply

01:33 understood and it's far from the journey

01:35 is far from far from over. And so, so I

01:38 I doubt that it will happen. Um we're

01:41 going to make it more efficient, of

01:42 course. I mean the whole the whole thing

01:44 about Nvidia

01:45 in fact the way that you frame the

01:47 question is is my mental model of our

01:49 company

01:50 the input is electron the output is

01:53 tokens

01:54 that is in the middle Nvidia and our job

01:58 is to to do as much as necessary as

02:02 little as possible to enable that

02:04 transformation to be done at incredible

02:06 capabilities and and what I mean by as

02:09 little as possible whatever I don't need

02:11 to

02:13 I partner with somebody and I make it

02:14 part of my ecosystem to do. And if you

02:16 look at Nvidia today, we probably have

02:18 the largest ecosystem of partners both

02:20 in supply chain upstream, supply chain

02:22 downstream. all of the computers,

02:25 computer companies and all the

02:26 application developers and all the model

02:28 makers and all the you know AI is a five

02:31 five layer cake if you will and and we

02:34 have ecosystems across the entire five

02:36 layers and and so we try to do as little

02:39 as possible but the part that we have to

02:42 do as it turns out is insanely hard and

02:45 and um

02:46 >> I I don't think that that gets

02:47 commoditized in fact in fact um

02:50 >> uh I also don't think that the the

02:52 enterprise software companies uh

02:55 the tools makers you know most of the

02:57 software companies today are tools

02:59 makers um some of them are not um but

03:02 are some of them are workflow

03:05 um codification

03:07 you know systems um but for a lot of

03:10 companies they're tool makers for

03:11 example you know Excel is a tool

03:13 powerpoint's a tool uh cadence makes

03:15 tools synopsis makes tools

03:18 I I actually see the opposite of what

03:21 people see I think the number of agents

03:24 are going to grow exponentially. The

03:27 number of tool users are going to grow

03:28 exponentially and it's very likely that

03:32 the number of instances of

03:36 all these tools are going to skyrocket.

03:39 It is very likely the number of

03:41 instances of synopsis design compiler is

03:45 going to skyrocket and the number of

03:48 number of agents that are going to be

03:49 using the floor planners and all of our

03:52 layout tools and our design design rule

03:54 checkers. The number of agents that are

03:58 today we're limited by the number of

03:59 engineers. Tomorrow those engineers are

04:01 going to be supported by a bunch of

04:02 agents. We're going to be exploring out

04:04 the design space like you've never seen

04:05 explore before and want to use the tools

04:07 that we use today. And so, so I think I

04:10 think tool use is going to cause cause

04:12 these software companies to skyrocket.

04:14 The reason why it hasn't happened yet is

04:16 because the agents aren't good enough at

04:18 using their tools yet. And so either

04:20 these companies are going to build the

04:21 agents themselves or agents are going to

04:24 get good enough to be able to use those

04:25 tools. And I think it's going to be a

04:27 combination of both. Um I think in your

04:30 latest filings it was you had almost

04:32 hundred billion dollars in purchase

04:33 commitments with people, foundries,

04:36 memory packaging and then uh semi

04:39 analysis has reported that you will have

04:42 $250 billion of these kinds of purchase

04:44 commitments and so one interpretation is

04:45 Nvidia's moat is really that you've

04:47 locked up many years of these scarce

04:50 components that are uh you know somebody

04:52 else might have an accelerator but can

04:54 they actually get the memory to build

04:55 it? Can they actually get the logic to

04:56 build it? And this is really Nvidia's

04:59 big moat for the next few years.

05:00 >> Well, it it's one it's one of the things

05:02 that we can do that is hard for someone

05:04 else to do. The reason why we could we

05:07 we've made enormous commitments

05:09 upstream. Um some of it is explicit.

05:12 These commitments that you mentioned,

05:14 some of it is implicit. Um, for example,

05:17 a lot of the investments that are

05:19 upstream are made by our our supply

05:22 chain because I said to the CEOs, "Let

05:25 me tell you how big this industry is

05:27 going to be and let me explain to you

05:28 why and let me reason through it with

05:30 you and let me show you what I see." And

05:33 so as a result of that that process of

05:36 of uh informing inspiring um aligning

05:41 with CEOs of all different industries

05:45 upstream they're willing to make the

05:47 investments. Now why are they willing to

05:49 make the investments for me and not

05:50 someone else and the reason for that is

05:51 because they know that I have the

05:54 capacity to buy it buy their supply and

05:58 sell it through my downstream. the fact

06:00 that Nvidia's downstream supply chain

06:03 and our downstream demand is so large,

06:07 they're willing to make the investment

06:09 upstream. And so if you look at GTC

06:13 um and and uh you know, people are

06:15 marveled by the scale of GTC and the

06:17 people that go, it's a 360° that the

06:20 entire universe of AI all in one place

06:24 and they they're all in one place

06:26 because they need to see each other. I

06:28 bring them together so that the the

06:29 downstream could see the upstream. The

06:31 upstream could see the downstream and

06:33 all of them could see all the advances

06:35 in AI and very importantly they can all

06:38 meet the AI natives and all the AI

06:39 startups that are all you know being

06:41 being built and all the amazing things

06:43 that are happening so that they could

06:45 see firsthand all the things that I tell

06:46 them. And so I spend a lot of my time

06:49 informing directly or indirectly um our

06:53 supply chain and our partners and our

06:55 ecosystem about the opportunity that's

06:57 that's in front of us. You know, most of

06:59 my keynotes, you know, some some people

07:02 always say, you know, Jensen

07:05 in most keynotes, it's like one

07:07 announcement after another announcement

07:08 after another announcement after another

07:10 announcement.

07:12 our keynotes are there's always a part

07:15 of it that's a little torturous in the

07:17 sense that it's almost comes across like

07:19 an ed like education and and in in fact

07:22 that's exactly on my mind. I need to

07:25 make sure that the entire supply chain

07:27 upstream and downstream the ecosystem

07:30 understands

07:32 what is coming at us, why it's coming,

07:35 when it's coming, how big is it going to

07:37 be, and be able to reason about it

07:39 systematically just like I reason about

07:41 it. and and so so I think the the the

07:45 the moat as you you describe it we're

07:48 able to of course um build for a future

07:52 uh if our next next several years is a

07:55 trillion dollars in in scale we have the

07:57 supply chain to do it without our reach

08:02 the velocity of our business you know

08:05 just as there's cash flow there's supply

08:07 chain flow there turns uh nobody's going

08:10 to build a supply chain for an

08:12 architecture if the architecture the

08:14 business turns is low. And so our

08:17 ability to sustain the scale is only

08:20 because our downstream demand is so

08:22 great and they see it and they all hear

08:24 about it. They they see it all coming.

08:26 And so that's it allows us to do the

08:29 things that we're able to do at the

08:30 scale we're able to do.

08:32 >> I do want to understand more concretely

08:33 whether the upstream can keep up. Um for

08:37 many years now you guys have been 2xing

08:40 revenue year-over-year. You guys have

08:41 been more than tripling the amount of

08:43 flops you're providing to the world year

08:44 over year

08:44 >> and 2xing at the scale now is really

08:46 incredible.

08:47 >> Exactly.

08:47 >> So then you look at logic say you're the

08:51 biggest customer on TSMC's N3 node and

08:55 um you're one of the biggest on uh AI as

08:58 a whole this year is going to be 60% of

08:59 N3. It's going to be 86% next year

09:01 according to some analysis. How how do

09:03 you 2x if you're the majority? Um and

09:07 how do you do that year-over-year? So

09:09 are we are we in a regime now where the

09:11 growth rate in AI compute has to slow

09:13 because of upstream? Do you see a way to

09:15 get around these you know you how do we

09:18 build 2x more fabs year-over-year

09:20 ultimately?

09:21 >> Yeah, at some at some level um the the

09:26 instantaneous demand

09:28 uh is greater than the supply upstream

09:32 and downstream uh in the world. And and

09:37 it could be at any instant any instance

09:41 we could be limited by the number of

09:42 plumbers.

09:43 >> Mhm.

09:44 >> Which which actually happens.

09:46 >> The plumbers are invited to next year's

09:47 GTC.

09:48 >> Yeah. You know, by the way, great idea.

09:51 >> But that's a good condition. You you

09:53 want you want you want a market you want

09:56 an industry where the instantaneous

09:58 demand is greater than the total supply

10:01 of the industry. Um the opposite is

10:03 obviously less good. If we're too far

10:06 apart, uh if one particular item, one

10:09 particular component is too far too far

10:11 away, um obviously obviously the

10:14 industry swarms it. So for example,

10:17 notice people aren't talking very much

10:18 about CoWoS anymore.

10:20 >> Yeah.

10:20 >> And the reason for that is because for

10:22 two years we swarmed a living daylights

10:23 out of it and we double double double on

10:26 on several doubles and and now I think

10:28 we're in a fairly good shape. And TSMC

10:31 now knows that CoWoS supply has to keep

10:34 up with the rest of the logic demand and

10:36 the memory demand and and so so they're

10:38 scaling CoWoS um and they're scaling uh

10:42 you know future packaging technologies

10:44 at the same level as a scale logic which

10:46 is terrific because for a long time

10:48 CoWoS was rather specialty and um uh

10:52 HBM was rather specialty but they're not

10:55 specialties anymore people now realize

10:56 they're mainstream computing technology

10:59 Um and and then and of course uh we're

11:02 now much more able to influence a larger

11:07 scope of our supply chain. In the past

11:10 in the past um you know in the beginning

11:13 of the AI revolution all the things that

11:15 I say now I was saying 5 years ago and

11:18 some people believed in it and invested

11:20 in it. for example, uh, Sanjay and and

11:23 the Micron team. I still remember the

11:25 meeting really well where where I I was

11:28 clear about exactly what's going to

11:29 happen and why it's going to happen and

11:31 and the predictions the predictions that

11:33 that um of today and they they really

11:37 doubled down on it and we partnered with

11:39 them and uh across LPDDR across you know

11:42 HBM memories uh they really invested in

11:44 it and and it it it obviously has been

11:47 tremendous for the company. uh some some

11:50 people came a little bit later and uh

11:52 but they now they're all here and so I I

11:54 think the each one of these generation

11:56 each one of these bottlenecks

11:59 gets a great deal of attention um and

12:02 now we're we're prefetching the

12:04 bottlenecks uh years in advance. So for

12:06 example uh the the the investments that

12:09 we've done uh with uh with Lumentum and

12:12 Coherent and um all of the silicon

12:15 photonics ecosystem uh the last several

12:18 years we really reshaped the ecosystem

12:20 and the supply chain silicon photonics.

12:23 We we u built up an entire supply chain

12:25 around TSMC. We partnered with them on

12:28 COUPE uh invented a whole bunch of

12:30 technology. We licensed uh those patents

12:33 to the supply chain. Keep it nice and

12:34 open. Um, and so we're preparing the

12:37 supply chain through invention of new

12:39 technologies, new workflows, uh, new

12:42 test, new testing equipment,

12:43 double-sided probing, um, investing in

12:46 companies, helping them scale up their

12:48 capacity. Um, and so, so you could see

12:51 that we're trying to shape the ecosystem

12:53 so that it's ready, the supply chain so

12:55 that it's ready to support the scale. It

12:57 seems like some bottlenecks are easier

12:58 than others. And so scaling up CoWoS

13:01 versus scaling up

13:02 >> I went to the hardest one by the way

13:04 >> which is

13:05 >> plumbers.

13:07 >> Yeah,

13:07 >> it's true.

13:08 >> Yeah. Yeah. I actually went to the

13:09 hardest one. Yeah.

13:10 >> Yeah. Plumbers and electricians. And the

13:12 reason for that is because

13:13 >> because and this is one of the concerns

13:14 that I have about of all the doom the

13:16 doomers um describing the end of end of

13:20 work and killing of jobs. And you know,

13:23 one of the things that that that um if

13:26 we discourage people from being software

13:28 engineers, we're going to run out of

13:30 software engineers. And and uh the same

13:33 prediction 10 years ago, some of the

13:35 some of the doomers were were uh uh

13:38 saying that we're telling people

13:40 whatever you do, don't be a radiologist.

13:42 And you might hear some of those some of

13:44 those videos are still on the web. You

13:46 know, radiology is is going to be the

13:48 first career to go. Nobody's the world's

13:50 not going to need any more radiologists.

13:51 Guess what? But we're short of

13:52 radiologists.

13:54 >> Oh, but okay. So, going back to this

13:55 point about well some things you scale

13:58 other things like how do you actually

14:00 get how do you actually manufacture 2x

14:02 the amount of logic a year? Ultimately

14:03 that's bottlenecked by memory and logic

14:05 are bottlenecked by EUV. How do you get to

14:07 2x as many EUV machines a year?

14:09 >> Yeah.

14:10 >> Year over year.

14:10 >> None of that none of that's impossible

14:12 to scale quickly. You just need to you

14:15 you could do all of that is easy to do

14:17 within two or three years.

14:19 You just need a demand signal that it's

14:21 not it once you once you can build one

14:24 you can build 10 and once you can build

14:25 build 10 you can build a million and so

14:28 these things are not not hard to

14:29 replicate. How far down the supply chain

14:31 do you go where you do you go to ASML

14:34 and say hey if I look out three years

14:35 from now for me to for Nvidia to be

14:38 generating two trillion in a year in

14:40 revenue we need way more EUV machines

14:41 and

14:42 >> some of them I have to directly uh some

14:44 of them are indirectly and some of them

14:46 um if I can convince TSMC as ASML will

14:49 be convinced and so that's that you know

14:51 we have to think about the critical

14:53 critical pinch points and uh but if TSMC

14:56 is convinced uh you'll have plenty of EUV

15:00 machines in a few years. And so none of

15:03 that my point is that none of the

15:05 bottlenecks last longer than a couple 2

15:07 three years. None of them. And meanwhile

15:10 meanwhile we're uh improving computing

15:13 efficiency by 10x 20x in the case of

15:16 Hopper to Blackwell some 30 50x um we're

15:20 coming up with new algorithms because

15:22 CUDA is so flexible. Uh we're we're

15:25 developing all kinds of new techniques

15:26 so that we drive efficiency. uh in

15:29 addition to increasing capacity. Yeah.

15:31 And so so there those those are those

15:33 are things that that none of that worry

15:35 me.

15:36 >> It's the stuff that's downstream from

15:38 us. Um energy policies that prevent

15:41 energy from from you know you can't grow

15:44 you can't create you can't create an

15:46 industry without energy. You can't

15:47 create a whole new manufacturing

15:49 industry without energy. Uh we want to

15:51 re-industrialize the United States. We

15:53 want to bring back uh chip manufacturing

15:55 and computer manufacturing and packaging

15:57 and we want to build new things like EVs

15:59 and robots and we want to build AI

16:01 factories and you you can't build any of

16:03 these things without energy and those

16:06 things take a long time but more chip

16:09 capacity that's a two three year problem

16:11 more CoWoS capacity 2 three year problem

16:13 >> interesting I I feel like I have guests

16:15 tell me the exact opposite thing

16:16 sometimes and I don't in this case I

16:18 just don't have the technical knowledge

16:19 to adjudicate but

16:20 >> well the beautiful thing is you're

16:21 talking to the expert Yeah, true, true.

16:25 Um, okay. I want to ask about um your

16:27 competitors.

16:28 >> Yeah.

16:28 >> So, if you look at TPU,

16:31 >> arguably two out of the top three models

16:33 in the world, Claude and Gemini, were

16:36 trained on TPU,

16:39 what does that mean for Nvidia going

16:40 forward?

16:41 >> Um, well, we have we have a very

16:43 different we built a very different

16:44 thing. Um, you know, what what Nvidia

16:48 built is accelerated computing.

16:51 not a tensor processing unit.

16:55 And uh accelerated computing is used for

16:57 all kinds of things. You know, molecular

16:58 dynamics and quantum chromodynamics and

17:02 it's used for data processing,

17:05 data frames, structured data,

17:07 unstructured data. It's used for um

17:11 fluid dynamics, particle physics, you

17:13 know, and in addition, we use it for AI.

17:17 And so accelerated computing is is um

17:20 much more diverse and and although AI is

17:23 the conversation today is obviously very

17:25 important and impactful uh computing is

17:29 much broader than that and what Nvidia

17:32 has done is reinventing reinvented the

17:34 way computing is done from general

17:35 purpose computing to accelerate

17:37 computing. Our market reach is

17:41 far greater than any any TPU can any ASA

17:46 can possibly have. And so if you look at

17:47 our position,

17:49 uh we're the only company that that

17:52 accelerates applications of all kinds.

17:54 We have a gigantic ecosystem and so all

17:57 kinds of frameworks and algorithms all

17:59 run on Nvidia. And because our computers

18:04 are designed to be operated by other

18:07 people, anyone who's an operator could

18:10 buy our systems.

18:13 Most of these homebuilt systems you have

18:16 to be your own operator because it was

18:18 never designed to be flexible enough for

18:20 other people to operate. And so as a

18:22 result of the fact that anybody can

18:24 operate our systems, we're in every

18:26 cloud including Google and Amazon and

18:29 you know Azure and OCI and right and so

18:32 whether you want to operate it to rent

18:35 or operate it if you want to operate to

18:37 rent you better have large ecosystem of

18:39 customers in many industries that be the

18:42 offtakers. if you're operating it if you

18:46 if you want to operate it for yourself

18:48 um we you know we obviously have the

18:50 ability to help you operate yourself

18:52 like for example for Elon with XAI and

18:55 uh because we could we could enable

18:57 operators uh in any any company in any

19:02 industry you could use it uh to build a

19:04 supercomput for uh scientific research

19:07 and drug discovery at Lily and so we can

19:11 help them operate their own

19:12 supercomputer and and use it for the

19:14 entire diversity of drug discovery and

19:17 biological sciences um that that we

19:19 accelerate

19:20 >> and so so there there just you know a

19:23 whole bunch of applications that we can

19:25 address that you can't do so with TPUs

19:28 because Nvidia's built CUDA as a

19:31 fantastic tensor processing unit as well

19:34 but it does you know it does every every

19:36 life cycle of data processing and

19:38 computing and AI and so on so forth and

19:41 so I our our market opportunity is just

19:43 a lot larger. Our reach is a lot greater

19:47 and because we have such a large um we

19:51 basically support every application in

19:53 the world now you could build Nvidia

19:55 systems anywhere and know that there

19:56 will be customers for it

19:58 >> and so it's a very different thing. Uh

Dwarkesh [20:00]: This is going to be sort of a long question. You have spectacular revenue, and this revenue is mostly... you're not making $60 billion a quarter from pharma and quantum. You're making it because AI is an unprecedented technology that is growing unprecedentedly fast. So then the question is what is best for AI specifically. I'm not in the details, but I talk to my AI researcher friends, and they say: look, when I use a TPU, it's this big systolic array that's perfect for doing matrix multiplies, whereas a GPU is very flexible. It's great when you have lots of branching, when you have irregular memory access. But what is AI except these very predictable matrix multiplies, again and again and again? And you don't have to give up any die area for warp schedulers, for switches between threads and memory banks. So the TPU is really optimized for the bulk of this growth in revenue and use case for compute that is coming online right now. I wonder how you react to that.

Jensen [21:01]: Matrix multiplies are an important part of AI, but not the only part. If you want to come up with a new attention mechanism, or if you want to disaggregate in a different way, if you want to come up with a whole new type of architecture altogether, for example a hybrid SSM, or you want to create a model that fuses diffusion and autoregressive somehow, you want an architecture that's just generally programmable. And we run everything you can imagine. That's the advantage: it allows for the invention of new algorithms a lot more easily.

Because it's a programmable system, and the ability to invent new algorithms is really what makes AI advance so quickly. TPUs, like anything else, are impacted by Moore's law, and we know that Moore's law is improving about 25% per year. So the only way to really get 10x leaps, 100x leaps, is to fundamentally change the algorithm and how it's computed, every single year. And that's Nvidia's fundamental advantage, the only reason we were able to make Blackwell 50 times Hopper. When I first announced it, I said Blackwell was going to be 35 times more energy-efficient than Hopper. Nobody believed it. And then Dylan wrote an article saying that, in fact, I had sandbagged: it's actually 50 times. You can't reasonably do that with just Moore's law. The way we solve that problem is new models, parallelized and disaggregated and distributed across a computing system, and without the ability to really get down and come up with new kernels with CUDA, that's really hard to do. So it's the combination of the programmability of our architecture, and the fact that Nvidia is an extreme co-design company, where we can even offload some of the computation into the fabric itself, NVLink for example, or into the network, Spectrum-X, and where we can effect change across the processors, the system, the fabric, the libraries, the algorithm. All of that was done simultaneously. Without CUDA to do that, I wouldn't even know where to start.
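Jensen's arithmetic here is easy to sanity-check. A back-of-envelope sketch (editorial, not from the interview; the 25%-per-year figure is his, the rest is implied compounding):

```python
# Hedged back-of-envelope: if process scaling alone improves performance
# ~25% per year (Jensen's figure), four years of Moore's law gives only
# ~2.4x. A 10x generational leap therefore has to come mostly from
# algorithm and system co-design, not silicon.

def compound(rate: float, years: float) -> float:
    """Cumulative speedup from a fixed annual improvement rate."""
    return (1 + rate) ** years

silicon_only = compound(0.25, 4)                   # ~2.44x over 4 years
target_leap = 10.0                                 # one "10x leap"
codesign_share = target_leap / compound(0.25, 1)   # ~8x must come from
                                                   # algorithms/systems
print(f"{silicon_only:.2f}x from scaling; {codesign_share:.1f}x left over")
```

Under these assumptions, silicon contributes a minority of a 10x generational gain, which is the point Jensen is making about annual algorithm-level reinvention.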

Dwarkesh [23:53]: My sponsor Crusoe was among the first clouds to offer Nvidia's Blackwell and Blackwell Ultra platforms, and they just announced their Nvidia Vera Rubin deployment, scheduled for later this year. But access to state-of-the-art hardware is only part of the story. For example, most inference engines already do KV caching for a single user's forward passes, but Crusoe does it across users and GPUs. So if a thousand agents are running on the same system prompt, Crusoe only has to compute the KV cache once for it to become available to every single GPU in the cluster. This is especially important as systems get more agentic and require much longer prefixes in order to use tools and access files. In a recent benchmark, Crusoe was able to deliver up to 10 times faster time to first token and up to five times better throughput than vLLM. This is just one among many reasons to run your inference workload with Crusoe. And if you need GPUs for training, you don't need to switch clouds; Crusoe's got you covered there, too. Go to crusoe.ai/dwarkesh to learn more.
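The cross-request caching described here is a form of prefix caching. A minimal sketch of the idea (illustrative only; the names and structure are my assumptions, not Crusoe's implementation, and real engines key on hashed token blocks rather than whole prefixes):

```python
# Illustrative sketch of cross-request prefix caching: the KV tensors
# for a shared prompt prefix are computed once, then reused by every
# request that begins with the same tokens.

def compute_kv(tokens: tuple) -> str:
    """Stand-in for the expensive prefill (attention K/V) computation."""
    compute_kv.calls += 1
    return f"kv[{len(tokens)} tokens]"

compute_kv.calls = 0

class PrefixKVCache:
    """Cache keyed by the exact token prefix, shared across requests."""
    def __init__(self) -> None:
        self._cache: dict[tuple, str] = {}

    def get_or_compute(self, prefix: tuple) -> str:
        if prefix not in self._cache:      # miss: run prefill once
            self._cache[prefix] = compute_kv(prefix)
        return self._cache[prefix]         # hit: free for later requests

cache = PrefixKVCache()
system_prompt = tuple(range(1_000))        # shared 1,000-token prefix
for _agent in range(1_000):                # 1,000 agents, same prompt
    cache.get_or_compute(system_prompt)

print(compute_kv.calls)                    # prefill ran once, not 1,000 times
```

The payoff grows with prefix length: agentic workloads with long tool and file preambles amortize one prefill over every request that shares it.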

Dwarkesh [24:47]: So this gets at an interesting question about Nvidia's clientele, where 60% of your revenue is coming from these big five hyperscalers. In a different era, with different customers, say professors running experiments, they're helped a bunch because they need CUDA. They can't use another accelerator; they need to just run PyTorch with CUDA and have everything optimized. But these hyperscalers have the resources to write their own kernels. In fact, they have to, to get that last 5% they need for their specific architecture. Anthropic and Google are mostly running their own accelerators, TPUs and Trainium. And even OpenAI, using GPUs, has Triton, because they decided they need their own kernels. So instead of using CUDA C++, cuBLAS, NCCL and everything, they've got their own stack, which compiles to other accelerators as well. If most of your customers can, and do, make replacements for CUDA, to what extent is CUDA really the thing that is going to make frontier AI happen on Nvidia?

Jensen [25:57]: CUDA is a rich ecosystem, so if you want to build on any computer, building on CUDA first is incredibly smart, because the ecosystem is so rich. We support every framework. If you want to create custom kernels: we contribute enormously to Triton, for example, and the back end of Triton is huge amounts of Nvidia technology. We're delighted to help every framework become as great as it can be, and there are lots and lots of frameworks: there's Triton, there's vLLM, there's SGLang, and more. And now there's a whole bunch of new reinforcement learning frameworks coming out, you've got veRL, you've got NeMo RL, a whole bunch of new ones, and with post-training and reinforcement learning that entire area is just exploding. So if you want to build on an architecture, building on CUDA makes the most sense, because you know the ecosystem is great. You know that if something goes wrong, it's more likely in your code and not in the mountain of code underneath. Don't forget the amount of code you're dealing with when you're building these systems. When something doesn't work, was it you, or was it the computer? You would like it always to be you, to be able to trust the computer. Obviously we still have lots and lots of bugs ourselves, but our system is so well wrung out that you can at least build on top of the foundation. So that's number one: the richness of the ecosystem, the programmability of it, the capability of it.

The second thing: if you're a developer building anything at all, the single most important thing you want, more than anything, is install base. You want the software you write to run on a whole bunch of other computers. You're not building software just for yourself; you're building it for your fleet, or for everybody else's fleet, because you're a framework builder. And Nvidia's CUDA ecosystem is ultimately its great treasure. We are now, I don't know how many, several hundred million GPUs. Every cloud has them, going back to A10, A100, H100, H200, the L series, the P series. There's a whole bunch of them, in all kinds of sizes and shapes. And if you're a robotics company, you want that CUDA stack to actually run in the robot itself. We're literally everywhere. So the install base says that once you develop the software, once you develop the model, it's going to be useful everywhere. The install base is just incredibly valuable.

And then lastly, the fact that we're in every single cloud makes us genuinely unique, because if you're an AI company, an AI developer, you're not exactly sure which CSP you're going to partner with and where you'd like to run. We'd run it everywhere, including on-prem for you if you like. So I think the richness of the ecosystem, the expansiveness of the install base, and the versatility of where we are, that combination makes CUDA invaluable.

Dwarkesh [29:16]: That makes a lot of sense. I guess the thing I'm curious about is whether those advantages matter a lot to your main customers, the kind of companies that can actually build their own software stack, who make up most of your revenue. Especially if you get to a world where AI is especially good at things with tight verification loops that you can RL on. How do you write a kernel that does attention or an MLP most efficiently across a scale-up? That's a very verifiable feedback loop. So can all the hyperscalers write these custom kernels for themselves? Nvidia still has great price-performance, so they might still prefer Nvidia. But then does it just become a question of who offers the best specs, the best flops and memory and memory bandwidth for a given dollar? Historically Nvidia has had, and still has, the best margins in all of AI across hardware and software, 70% plus, because of this CUDA moat. Can you sustain those margins if most of your customers can actually afford to build around the CUDA moat?

Jensen [30:33]: The number of engineers we have assigned to these AI labs is insane, working with them, optimizing their stack. The reason is that nobody knows our architecture better than we do, and these architectures are not as general-purpose as a CPU. A CPU is kind of like a Cadillac: it's a nice cruiser, it never goes too fast, everybody drives it pretty well, it's got cruise control, everything is easy. But in a lot of ways, Nvidia's GPUs, our accelerators, are kind of like F1 racers. I could imagine everybody being able to drive one at 100 miles an hour, but it takes quite a bit of expertise to push it to the limit. We use a ton of AI to create the kernels that we have, and I'm pretty sure we're going to still be needed for quite some time. Our expertise helps our AI lab partners get another 2x out of their stack, easily. It's not unusual that by the time we're done optimizing their stack, or optimizing a particular kernel, their model has sped up by 3x, 2x, 50%.

That's a huge number, especially when you're talking about the installed base of the fleet that they have, all the Hoppers and Blackwells. When you increase it by a factor of two, that doubles the revenues; it directly translates to revenues. Nvidia's computing stack is the best performance per TCO in the world, bar none.

Nobody can demonstrate to me that any single platform in the world today has a better performance-to-TCO ratio. Not one company. And in fact, the benchmarks are out there: Dylan's InferenceMAX is sitting out there for everybody to use, and not one TPU will come; Trainium won't come. I encourage them to use InferenceMAX and demonstrate their incredible inference cost. It's really, really hard; nobody wants to show up. MLPerf: I would welcome Trainium to demonstrate the 40% advantage they claim all the time. I would love to see them demonstrate the cost advantage of TPUs. It makes no sense in my mind; it makes absolutely zero sense on first principles. So I think the reason we're so successful is simply that our TCO is so great.

Second, you say 60% of our customers are the top five, but most of that business is external. For example, most of Nvidia in AWS is for external customers, not internal use. At Azure, obviously all of our customers are external. All of our customers at OCI are external, not internal use. The reason they favor us is that our reach is so great: we can bring them all of the great customers in the world. They're all built on Nvidia. And the reason all these companies are built on Nvidia is that our reach and our versatility are so great. So I think the flywheel is really the install base, the programmability of our architecture, the richness of our ecosystem, and the fact that there are so many AI companies in the world, tens of thousands of them now.

And if you were one of those AI startups, what architecture would you choose? You'd choose the architecture that's most abundant in the world, the one with the largest installed base and a rich ecosystem. So that's the flywheel. That's the reason why, between the combination of: one, our perf per dollar is so great that they have the lowest-cost tokens. Second, our perf per watt is the highest in the world, and if one of our partners builds a one-gigawatt data center, that one-gigawatt data center had better deliver the maximum revenue; the number of tokens directly translates to revenue, and you want to generate as many tokens as possible to maximize the revenue of that data center. We have the highest tokens-per-watt architecture in the world. And lastly, if your goal is to rent the infrastructure, we have the most customers in the world. That's why the flywheel works.

Dwarkesh [35:25]: Interesting. I guess the question comes down to what the actual market structure is, because even if there are other companies, there could have been a world where tens of thousands of AI companies have roughly equal shares of compute. But even through these five hyperscalers, it's really Anthropic and OpenAI using the compute on Amazon, these big foundation labs who can themselves afford, and have the ability, to make different accelerators work...

Jensen [35:54]: No, I think your premise is wrong.

Dwarkesh [35:58]: Maybe. Let me ask you a slightly different question, which is...

Jensen [36:01]: Come back and make me correct your premise.

Dwarkesh [36:04]: Okay, let me just ask a different question, which is...

Jensen [36:08]: But still make me come back and fix it, because it's just too important to AI, too important to the future of science, too important to the future of the industry, that premise.

Dwarkesh [36:20]: Look, let me just finish the question, and then we can address it together.

Jensen: Yeah.

Dwarkesh [36:25]: So if all these things about price-performance and performance per watt are true, why do you think it is that, say, Anthropic just announced a couple of days ago a multi-gigawatt deal with Broadcom and Google for TPUs, the majority of their compute? And for Google, obviously, it's a TPU majority. If I look at these big AI companies, there was some point where it was all Nvidia, and now it's not. So I'm curious how to square that: if these things are true on paper, why are they going with other accelerators?

Jensen [37:03]: Anthropic is a unique instance, not a trend. Without Anthropic, why would there be any TPU growth at all? It's 100% Anthropic. Without Anthropic, why would there be any Trainium growth at all? It's 100% Anthropic. And I think that's fairly well known and well understood. It's not that there's an abundance of ASIC opportunities. There's only one Anthropic.

Dwarkesh [37:31]: But OpenAI deals with AMD. They're building their own Titan accelerator.

Jensen [37:35]: Yeah, but we can all acknowledge they're vastly Nvidia, and we're going to still do a lot of work together. And I'm not offended by other people using something else and trying things. If they don't try these other things, how would they know how good ours is? Sometimes you have to be reminded of it, and we have to continuously earn the position we're in. There are always claims; look at the number of ASICs that have been cancelled. Just because you're going to build an ASIC, you still have to build something better than Nvidia. And it's not that easy building something better than Nvidia. It's not sensible, actually; Nvidia would have to be missing something, seriously, because of our scale, our velocity. We're the only company in the world that's cranking it out every single year. Big leaps, every single year.

Dwarkesh [38:32]: I guess their logic is that it doesn't need to be better. It just needs to be not more than 70% worse, because they're paying you 70% margins.

Jensen [38:39]: No, no, no. Don't forget, even an ASIC's margin is really quite high. Nvidia's margin is 70%, let's say, but an ASIC margin is 65%. What are you really saving?

Dwarkesh [38:51]: Oh, you mean from Broadcom or something?

Jensen [38:52]: Yeah, sure. You've got to pay somebody.

Dwarkesh [38:55]: Yeah.

Jensen [38:56]: So I think the ASIC margins are incredibly good, from what I can tell, and they believe it too. They're quite proud of their incredible ASIC margins. And so you ask the question why.
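This margin exchange is worth working through. Using the figures quoted in the conversation (70% vs. 65% gross margin) and an assumed identical build cost, the buyer's saving is much smaller than "70% margin" suggests:

```python
# Back-of-envelope on the ASIC-margin exchange. The margins are the
# figures quoted above; the identical $10k build cost is an assumption.

def selling_price(unit_cost: float, gross_margin: float) -> float:
    """Price implied by a gross margin: margin = (price - cost) / price."""
    return unit_cost / (1 - gross_margin)

cost = 10_000.0
gpu_price = selling_price(cost, 0.70)    # ~$33.3k at a 70% margin
asic_price = selling_price(cost, 0.65)   # ~$28.6k at a 65% margin

saving = 1 - asic_price / gpu_price
print(f"buyer saves {saving:.0%}")       # ~14%, before any perf/TCO gap
```

Under these assumptions, moving from a 70%-margin merchant chip to a 65%-margin ASIC cuts the buyer's price by only about one seventh, which a modest performance-per-TCO deficit can erase; that is the force of "what are you really saving?"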

Jensen [39:12]: A long time ago, we just didn't have the ability to do it. At the time, I didn't deeply internalize how difficult it would be to build a foundation AI lab like OpenAI or Anthropic, and the fact that they needed huge investments from the suppliers themselves. We just weren't in a position to make the multi-billion-dollar investment into Anthropic so that they could use our compute. But Google and AWS were, and they put in huge investments at the beginning so that Anthropic, in return, would use their compute. We just weren't in a position to do so at the time. My mistake, I would say, is that I didn't deeply internalize that they really had no other options, that a VC would never put $5 or 10 billion of investment into an AI lab in the hopes of it turning out to be Anthropic. That was my miss. But even if I had understood it, I don't think we would have been in a position to do it at the time. I'm not going to make that same mistake again. I'm delighted to invest in OpenAI, delighted to help them scale, and I believe it's essential to do so. And when Anthropic came to us, I was delighted to be an investor, delighted to help them scale; we just weren't able to at the time. If I could rewind everything, if Nvidia could have been as big back then as we are now, I would have been more than happy to do it.

Dwarkesh [41:07]: This is actually quite interesting. For many years, Nvidia has been the company in AI making lots of money, and now you're investing it. It's been reported that you've done up to $30 billion in OpenAI and $10 billion in Anthropic. But now their valuations have increased, and I'm sure they'll continue to increase. Over all these years you were giving them the compute, you saw where AI was headed, and they were worth one-tenth of what they are now a couple of years ago, or even a year ago in some cases, and you had all this cash. There's a world where either Nvidia itself becomes a foundation lab, making the huge investment to make that possible, or makes the deals you've made now, at current valuations, much earlier on. You had the cash to do it. So I'm curious, actually, why not have done it earlier?

Jensen [42:02]: We did it as soon as we could have, and if I could have, I would have done it even earlier. At the time that Anthropic needed us to do it, we just weren't in a position to do it. It wasn't in our sensibility to do so.

Dwarkesh [42:19]: Was that a cash thing, or just...?

Jensen [42:23]: Yeah, the level of investment. We never invested outside the company at the time, not at that scale, and we didn't realize we needed to. I always thought that they could just go raise from VCs, for God's sakes, like all companies do. But what they were trying to do couldn't have been done through VCs. What OpenAI wanted to do couldn't have been done through VCs. I recognize that now; I didn't know it then. But that's their genius. That's why they're smart. They realized it then, that they had to do something like that, and I'm delighted that they did. And even though we caused Anthropic to have to go to somebody else, I'm still happy that it happened. Anthropic's existence is great for the world. I'm delighted for it.

Dwarkesh [43:21]: I guess you're still making a ton of money, and you're making way more money quarter after quarter.

Jensen [43:25]: It's still okay to have regrets.

Dwarkesh [43:27]: So then the question still arises: now that we're here, and you have all this money that you keep making, what should Nvidia be doing with it? One answer says, look, there's this whole middleman ecosystem that has popped up for converting capex into opex for these labs, so that they can rent compute. The chips are really expensive; they make a lot of money over their lifetime, because the models are getting better and the value their tokens generate is increasing, but they're expensive to set up. Nvidia has the money to do the capex. And in fact, it's been reported you're backstopping CoreWeave, up to $6.3 billion, and have invested $2 billion. So why doesn't Nvidia become a cloud itself? Why not become a hyperscaler yourself and rent this compute out? You have all this cash to do it.

Jensen [44:15]: This is a philosophy of the company, and I think it's wise: we should do as much as needed, as little as possible. What that means is, the work we do building our computing platform, if we don't do it, I genuinely believe it doesn't get done. If we didn't take the risks we take, if we didn't build NVLink the way we built it, if we didn't build the whole stack, if we didn't create the ecosystem the way we did, if we didn't dedicate ourselves to 20 years of CUDA while losing money most of that time, nobody else would have done it.

If we didn't create all the CUDA-X libraries, all domain-specific. A decade and a half ago, we pushed into domain-specific libraries, because we realized that if we didn't create them, whether for ray tracing or image generation or even the early works of AI, or for data processing, structured data processing or vector data processing, nobody would. And I am completely certain of that. We created a library for computational lithography called cuLitho. If we didn't create it, nobody would have. So accelerated computing wouldn't have advanced the way it has if we hadn't done what we did. And so we should do that. We should dedicate our company, all of our might, wholeheartedly, to go do that. However, the world has lots of clouds. If I didn't do it, somebody would show up. So following the recipe, the philosophy, of doing as much as needed but as little as possible: that philosophy exists in our company today, and everything I do, I do with that lens.

46:02 In the case of clouds, if we didn't support CoreWeave, these neoclouds, these AI clouds, wouldn't exist. If we didn't help CoreWeave exist, they would not exist. If we didn't support Nscale, they wouldn't be where they are today. If we didn't support Nebius, they wouldn't be where they are today. Now they're doing fantastically. Is that a business model? No: we should do as much as needed, as little as possible. We invest in our ecosystem because I want our ecosystem to thrive, and I want the architecture, I want AI, to be able to connect with as many industries and as many countries as possible, and to make it possible for the planet to be built on AI and on the American tech stack. That vision is exactly what we're pursuing.

46:59 Now, one of the things you mentioned: there are so many great, amazing foundation model companies, and we try to invest in all of them. This is another thing we do. We don't pick winners. We need to support everyone, and it's part of our joy to do so. It's an imperative to our business, but we also go out of our way not to pick winners. So when I invest in one of them, I invest in all of them.

47:27 >> Why do you go out of your way not to pick winners?

47:29 >> Because it's not our job to, number one. Number two, when Nvidia first started, there were 60 graphics companies, 60 3D graphics companies, and we are the only one that survived. If you had taken those 60 companies and asked yourself which one was going to make it, Nvidia would have been at the top of the list not to make it. This is long before you, but Nvidia's graphics architecture was precisely wrong. Not a little bit wrong: we created an architecture that was precisely wrong, and it was an impossible thing for developers to support. It was never going to make it. We reasoned about it from good first principles, but we ended up at the wrong solution. Everybody would have counted us out, and here we are. So I have enough humility to recognize: don't pick winners.

48:29 >> Yeah.

48:30 >> Either let them all take care of themselves, or take care of all of them.

48:34 >> One thing I didn't understand: you said, "Look, we're not prioritizing these neoclouds just because they're new clouds and we want to prop them up." But you also listed a bunch of neoclouds and said they wouldn't exist if it wasn't for Nvidia.

48:47 >> Yeah.

48:47 >> So how are those two things compatible?

48:50 >> First of all, they need to want to exist, and they come to ask us for help. When they want to exist, and they have a business plan, and they have expertise and the passion for it, they obviously have to have some capabilities themselves. But if at the end of the day they need some investment to get off the ground, we would be there for them. The sooner they get their flywheel going, the better. Your question was whether we want to be in the financing business. The answer is no.

49:23 >> Yeah.

49:24 >> We don't want to be, because there are people in the financing business, and we would rather work with all of the people who are in the financing business than be a financier ourselves. So our goal is to focus on what we do, keep our business model as simple as possible, and support our ecosystem. When someone like OpenAI needs an investment at $30 billion scale, because it's still before their IPO, and we deeply believe in them... I deeply believe that they're going to be, well, they're an extraordinary company already today, and they're going to be an incredible company. The world needs them to exist. The world wants them to exist. I want them to exist, and they have the wind at their back. Let's support them and let them scale. Those investments we will do, because they need us to do it. But we're not trying to do as much as possible. We're trying to do as little as possible.


51:13 >> This may be sort of an obvious question, but we've lived many years in this situation where there's a shortage of GPUs, and it's grown now because models are getting better.

51:25 >> We have a shortage of GPUs.

51:27 >> Yes.

51:27 >> Yeah.

51:28 >> And Nvidia is known for divvying up the scarce allocation not just based on the highest bidder, but rather on, "Hey, we want to make sure these neoclouds exist. Let's give some to CoreWeave, let's give some to Crusoe, let's give some to Lambda." Why is that good for Nvidia? First of all, would you agree with this characterization of fracturing the market?

51:49 >> No. Your premise is just wrong.

51:51 >> Yeah.

51:52 >> We're sufficiently mindful about these things. We're very mindful about these things. First of all, if you don't place a PO, all the talking in the world won't make a difference. Until we get a PO, what are we going to do? So job number one: we work really hard with everybody to get a forecast done, because these things take a long time to build, and the data centers take a long time to build, so we align demand and supply through forecasting. Number two, we've tried to forecast with as many people as possible, but in the final analysis, you still had to place an order. If, for whatever reason, you didn't place your order, what can I do? So at some point it's first in, first out. Beyond that, if you're not ready, because your data center is not ready or certain components aren't ready to enable you to stand up a data center, we might decide to serve another customer first. That's just maximizing the throughput of our own factory.

53:09 So we might make some adjustments there. Aside from that, the prioritization is first in, first out.

53:19 >> Yeah. You've got to place a PO. If you don't place a PO... Now, of course, there are stories about that. For example, all of this kind of started from an article about Larry and Elon having dinner with me, where they supposedly begged for GPUs.

53:39 That never happened. We absolutely had dinner, and it was a wonderful dinner. At no time did they beg for GPUs. They just had to place an order, and once they place an order, we do our best to get the capacity to them. Yeah. We're not complicated.

53:59 >> Okay. So it sounds like there's a queue, and based on whether your data center is ready and when you place a purchase order, you get them at a certain time. But it still doesn't sound like the highest bidder just gets it. Is there a reason not to do that?

54:13 >> We never do that.

54:14 >> Okay.

54:15 >> We never do.

54:15 >> Why not just go with the highest bidder?

54:17 >> Because it's a bad business practice. You set your price, and then people decide to buy it or not. I understand that others in the chip industry change their prices when demand is higher, but we just don't. That's just never been a practice of ours. You can count on us. I prefer to be dependable, to be the foundation of the industry. You don't need to second-guess us.

54:53 >> If I quoted you a price, if we quoted you a price, that's it.

54:59 >> And if demand goes through the roof, so be it.

55:02 >> And on the other end, that's why you have a productive relationship with TSMC, right?

55:05 >> Yeah. Nvidia has been doing business with them for, I guess, coming up on 30 years, and Nvidia and TSMC don't have a legal contract.

55:18 There is always some rough justice. Sometimes I'm right, sometimes I'm wrong. Sometimes I got a better deal, sometimes a worse deal. But on the whole, the relationship is incredible. I can completely trust them, and I completely depend on them. And one of the things you can count on with Nvidia is that this year Vera Rubin is going to be incredible. Next year Vera Rubin Ultra will come. The year after that, Feynman will come, and the year after that, I haven't introduced the name yet. Every single year, you can count on us.

55:55 You would have to go find another ASIC team in the world. Pick your ASIC team where you can say: I can bet the farm, I can bet my entire business, that you will be here for me every single year, and that your token cost will decrease by an order of magnitude every single year. I can count on it like I can count on the clock.

56:19 Well, I just said something about TSMC. No other foundry in history could you possibly say that about.

56:29 You can say that about Nvidia today. You can count on us every single year. If you would like to buy a billion dollars' worth of AI factory compute, no problem. If you'd like to buy $100 million, no problem. You'd like to buy $10 million, or just one rack? Not a problem. Or just one graphics card? No problem. If you would like to place an order for a hundred-billion-dollar AI factory, no problem. We're the only company in the world you can say that about today. And I can say that about TSMC as well: I want to buy a billion, no problem. We just have to go through the process of planning for it, and all the things that mature people do.

57:11 So this ability for Nvidia to be the foundation of the world's AI industry, this is a position that has taken us a couple of decades to arrive at, enormous commitment and enormous dedication. And the stability of our company, the consistency of our company, is really, really important.
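A literal reading of the "order of magnitude every single year" token-cost claim compounds like this; the $10 per million tokens starting point is a made-up placeholder, not a real price.

```python
# Toy compounding of a token cost that falls 10x per year.
# The starting price is hypothetical, chosen only for illustration.
def token_cost(start_usd: float, years: int) -> float:
    """Cost after `years` of 10x-per-year decline."""
    return start_usd / (10 ** years)

for year in range(4):
    print(f"year {year}: ${token_cost(10.0, year):.4f} per million tokens")
# year 3 comes out to $0.0100: three orders of magnitude in three years
```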

57:37 >> Okay. I want to ask about China.

57:38 >> Yep.

57:38 >> I actually don't know what I think about whether it's good to sell chips to China or not, but I like to play devil's advocate against my guest. So when Dario, who supports export controls, was on, I asked him why America and China can't both have a country of geniuses in a data center. But since you're on the opposite side, I'll ask you the opposite way.

57:58 One way to think about it: Anthropic actually announced a couple of days ago this model, Mythos, which they're not even releasing publicly, because they say it has such cyber-offensive capabilities that the world isn't ready until they make sure the zero-days are patched up. They say it found thousands of high-severity vulnerabilities across every major operating system and every browser. It found one in OpenBSD, an operating system that's been specifically designed not to have zero-days for the 27 years it's existed. So if Chinese companies, Chinese labs, and the Chinese government had access to the AI chips to train a model like Claude Mythos, with these cyber-offensive capabilities, and run millions of instances of it with more compute, the question is: is that a threat to American companies, to American national security?

58:45 >> First of all, Mythos was trained on fairly mundane capacity, and a fairly mundane amount of it, by an extraordinary company. The amount of capacity and the type of compute it was trained on is abundantly available in China. You first have to realize that chips exist in China. They manufacture 60% of the world's mainstream chips, maybe more. It's a very large industry for them. They have some of the world's greatest computer scientists. As you know, most of the AI researchers in all of these AI labs are Chinese. They have 50% of the world's AI researchers. So the question is: if you're concerned about them, what do you do, considering all the assets they already have? They have an abundance of energy. They have plenty of chips. They have most of the AI researchers. If you're worried about them, what is the best way to create a safe world? Well, victimizing them, turning them into an enemy, likely isn't the best answer.

01:00:11 They are an adversary, and we want the United States to win. But I think having a dialogue, having a research dialogue, is probably the safest thing to do. This is an area that is glaringly missing, because of our current attitude toward China as an adversary.

01:00:33 It is essential that our AI researchers and their AI researchers are actually talking. It is essential that we try to agree on what not to use AI for.

01:00:46 With respect to finding bugs in software: of course, that's what AI is supposed to do. Is it going to find bugs in a lot of software? Of course. There are lots and lots of bugs. There are lots of bugs in the AI software itself. That's what AI is supposed to do, and I'm delighted that AI has reached a level where it can help us be so much more productive. One of the things that is underemphasized is the richness of the ecosystem around cybersecurity and AI: AI security, AI privacy, AI safety. That whole ecosystem of AI startups is trying to create a future for us in which one incredible AI agent is surrounded by thousands of AI agents keeping it safe and keeping it secure. That future surely is going to happen. The idea that you're going to have an AI agent running around with nobody watching after it is kind of insane. So we know very well that this ecosystem needs to thrive. It turns out this ecosystem needs open source. This ecosystem needs open models. It needs open stacks, so that all of these AI researchers and all of these great computer scientists can go build AI systems that are just as formidable and can keep AI safe. So one of the things we need to make sure we do is keep the open-source ecosystem vibrant. That can't be ignored, and a lot of it is coming out of China. We have to not suffocate that.

01:02:40 With respect to China: of course we want the United States to have as much computing as possible. We're limited by energy, but we've got a lot of people working on that, and we have to not make energy a bottleneck for our country.

01:03:00 But what we also want is to make sure that all the AI developers in the world are developing on the American tech stack, and making their contributions, the advancements of AI, especially when it's open source, available to the American ecosystem. It would be extremely foolish to create two ecosystems: an open-source ecosystem that runs only on a foreign tech stack, and a closed ecosystem that runs on the American tech stack. I think that would be a horrible outcome for the United States.

01:03:36 >> There are a lot of things in there, so let me triage the response. I think the concern, going back to the flop difference and the hacking, is this: yes, they have compute, but there are estimates that because they're at 7 nanometers, because they don't have EUV due to chip-making export controls, the amount of flops they're able to actually produce is about one-tenth of what the US has. Could they eventually train a model like Mythos with that? Yes. But because we have more flops, American labs are able to get to these capability levels first. And because Anthropic got there first, they can say, "Okay, we're going to hold on to it for a month while we give American companies access to it, they patch up all their vulnerabilities, and then we release it." Further, even if they train a model like this, the ability to deploy it at scale matters: a cyber hacker is much more dangerous if you have a million of them versus a thousand of them, so that inference compute really matters a lot. And in fact, the fact that they have so many researchers who are so good is the thing that makes it so scary, because what makes engineers and researchers more productive is compute. If you talk to any lab in America, they say the thing bottlenecking them is compute. And there are quotes from the DeepSeek founder, and from other labs' leadership, saying the thing they're bottlenecked on is compute. So the question is: isn't it better that American companies, because they have more compute, get to Claude- or Mythos-level capabilities first and prepare our society for it, before China can get to it with less compute?

01:05:08 >> We should always be first, and we should always have more.

01:05:14 But in order for the outcome you described to be true, you have to take it to the extreme: they would have to have no compute. And if they have some compute, the question is how much is needed. The amount of compute they have in China is enormous. You're talking about the second-largest computing market in the world. If they want to aggregate their compute for deployment, they have plenty of compute to aggregate.

01:05:44 >> But is that true? People do these estimates, and they say SMIC is actually behind on the process nodes, so they're...

01:05:50 >> I'm about to tell you.

01:05:51 >> Okay.

01:05:51 >> The amount of energy they have is incredible, isn't that right? AI is a parallel computing problem, isn't it? Why can't they just put four or ten times as many chips together? Because energy is free. They have so much energy. They have data centers that are sitting completely empty, fully powered. They have ghost cities. They have ghost data centers. They have so much infrastructure capacity.

01:06:17 If they wanted to, they could just gang up more chips, even at 7 nanometers. And their capacity for building chips is one of the largest in the world. The semiconductor industry knows that they have monopolized mainstream chips. They have overcapacity, too much capacity. So the idea that China won't be able to have AI chips is complete nonsense. Now, of course, if you ask me whether the United States would be further ahead if the rest of the world had no compute at all, sure, but that's just not an outcome. That's not a real scenario. They have plenty of compute already. Whatever compute threshold you need for the concern you're worried about, they've already reached it and gone beyond. So I think you misunderstand: AI is a five-layer cake, and at the lowest layer is energy. When you have an abundance of energy, it makes up for chips. When you have an abundance of chips, it makes up for energy. For example, the United States is scarce on energy, which is the reason Nvidia has to keep advancing our architecture and doing this extreme co-design: so that with the few chips we can ship, because the amount of energy is so limited, our throughput per watt is off the charts. But if your watts are completely abundant, if energy is free, what do you care about performance per watt? You have plenty.

01:07:51 So 7 nanometer chips are essentially Hopper. And as for Hopper, I have to tell you: today's models are largely trained on the Hopper generation. So Hopper-class 7 nanometer chips are plenty good. The abundance of energy is their advantage.
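The "gang up more chips" argument reduces to simple arithmetic: with energy effectively free, aggregate throughput is chip count times per-chip rate, paid for in watts. The chip counts, TFLOPS, and wattages below are illustrative assumptions, not real specs.

```python
# Aggregate throughput of a GPU fleet: chips x per-chip rate, plus the
# corresponding power bill. All figures are illustrative placeholders.
def fleet(chips: int, tflops_per_chip: float, watts_per_chip: float) -> dict:
    return {
        "tflops": chips * tflops_per_chip,
        "megawatts": chips * watts_per_chip / 1e6,
    }

strong = fleet(chips=10_000, tflops_per_chip=1_000, watts_per_chip=700)
weak = fleet(chips=20_000, tflops_per_chip=500, watts_per_chip=700)

print(strong)  # 10M aggregate TFLOPS at 7 MW
print(weak)    # the same 10M TFLOPS at 14 MW: parity bought with energy
```

The weaker fleet matches the stronger one's aggregate throughput at twice the power draw, which is exactly the trade Jensen argues an energy-abundant country can afford to make.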

01:08:12 >> But then there's a question of, okay, can they actually manufacture enough chips, given their...

01:08:18 >> But they do. What's the evidence? Huawei just had the largest single year in the history of their company.

01:08:26 >> How many chips did they ship?

01:08:27 >> A ton. Millions. Millions, which is way more than Anthropic has.

01:08:35 >> So there's a question of how much logic SMIC can fab, and there's a question of how much memory.

01:08:39 >> I'm telling you what it is. They have plenty of logic, and they have plenty of HBM2 memory.

01:08:44 >> Right. But as you know, the bottleneck in training and inference on these models is often the amount of memory bandwidth. I don't know the numbers offhand, but HBM2 versus the newest generation can be almost an order of magnitude difference in memory bandwidth, which is...

01:08:58 >> Huawei is a networking company. Huawei is a networking company.

01:09:03 >> But that doesn't change the fact that you need EUV for the most advanced HBM.

01:09:06 >> Not true. Not at all true. You can gang them together, just like we gang them together with NVLink 72. They've already demonstrated silicon photonics connecting all of this compute together into one giant supercomputer.

01:09:21 Your premise is just wrong. The fact of the matter is that their AI development is going just fine. And the best AI researchers in the world, because they are limited in compute, also come up with extremely smart algorithms. Remember what I said: Moore's law is advancing about 25% per year, yet through great computer science we can still improve algorithm performance by 10x. What I'm saying is that great computer science is where the lever is. There is no question about algorithmic invention. There's no question that all the incredible attention mechanisms reduce the amount of compute.

01:10:09 We have to acknowledge that most of the advances in AI came out of algorithmic advances, not just raw hardware. Now, if most advances came from algorithms and computer science and programming, tell me that their army of AI researchers is not their fundamental advantage. And we see it: DeepSeek is not an inconsequential advance. And the day that DeepSeek comes out on Huawei first, that is a horrible outcome for our nation.
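The 25%-per-year versus 10x comparison above is worth compounding out; the rates are Jensen's round numbers, not measured figures.

```python
import math

# Hardware improving ~25%/year (Jensen's round number) vs. a one-time
# 10x algorithmic win: how long does steady hardware progress take to
# deliver the same gain?
def hardware_gain(years: float, annual_rate: float = 0.25) -> float:
    """Cumulative speedup after `years` of compounding improvement."""
    return (1 + annual_rate) ** years

years_to_match_10x = math.log(10) / math.log(1.25)
print(f"5 years of hardware: {hardware_gain(5):.2f}x")  # ~3.05x
print(f"one 10x algorithmic advance equals {years_to_match_10x:.1f} years of hardware")
```

Roughly a decade of steady hardware gains equals a single 10x algorithmic advance, which is the sense in which the researcher pool, not the fab node, is the lever.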

01:10:43 >> Why is that? Because currently you can have a model like DeepSeek that can run on any accelerator if it's open source. Why would that stop being the case in the future?

01:10:50 >> Well, suppose it doesn't. Suppose it's optimized for Huawei. Suppose it's optimized for their architecture.

01:10:56 It would put us at a disadvantage. You described a situation that I perceived to be good news: a company developed software, developed an AI model, and it runs best on the American tech stack. I saw that as good news. You set it up as a premise that it was bad news. I'll give you the actual bad news: AI models around the world being developed that run best on non-American hardware. That is bad news for us.

01:11:28 >> I guess I just don't see the evidence that there are these huge disparities that would prevent you from switching accelerators. American labs are running their models across all the clouds, across all...

01:11:37 >> The evidence: you take a model that's optimized for Nvidia and you try to run it on something else...

01:11:42 >> But American labs do that.

01:11:44 >> And they don't run better. Nvidia's success is perfect evidence. AI models are created on our stack, so they run best on our stack. How is that illogical?

01:11:57 >> I'm just looking at the evidence. Look, Anthropic's models run on GPUs. They run on Trainium. They run on TPUs.

01:12:02 >> A lot of work has to go into making that change. But go to the global south, go to the Middle East: out of the box. If all of the AI models run best on somebody else's tech stack, you've got to be arguing some ridiculous claim right now, that that's a good thing for the United States.

01:12:18 >> But I guess I don't understand the argument. If, say, Chinese companies get to the next Mythos first, and they find all the security vulnerabilities before American companies release their software, but they do it on Nvidia hardware and ship it to the global south, it still runs on Nvidia hardware. How is that good? I mean, okay, it runs on your hardware.

01:12:35 >> It's not good.

01:12:36 >> Right?

01:12:36 >> It's not good. So let's not let it happen.

01:12:39 >> Why do you think it's perfectly fungible, that if you didn't ship them compute, it would exactly be replaced by Huawei? They are behind, right? They have worse chips than you.

01:12:46 >> The evidence is right there now: their chip industry is gigantic.

01:12:49 >> You can just look at the flop or bandwidth or memory comparisons between the H200 and the Huawei 910C. It's roughly half.

01:12:56 >> They use more of it. They use twice as many.

01:12:58 >> I guess it seems like your argument is

01:13:00 they have all this energy that's ready

01:13:01 to go, right? And they need to fill it

01:13:02 with chips

01:13:03 >> and they're good at manufacturing.

01:13:04 >> And I'm sure eventually they would be

01:13:05 able to just

01:13:07 out-manufacture everybody, but there are

01:13:08 these few critical years.

01:13:10 >> What is the critical year you're

01:13:12 talking about?

01:13:12 >> These next few years, we've got these

01:13:14 models that are going to do all the

01:13:15 cyber attacks.

01:13:16 >> If the next critical years are

01:13:18 critical, then we have to make sure that

01:13:20 all of the world's AI models are built

01:13:22 on the American tech stack. These critical

01:13:25 years,

01:13:26 >> Okay, if they're

01:13:28 built on the American tech stack, how would

01:13:29 that prevent them — if they have more

01:13:30 advanced capabilities — from launching the

01:13:32 Mythos-equivalent cyber attacks on us?

01:13:34 >> there's no guarantee either way,

01:13:35 >> but if you have it earlier, we can

01:13:37 prepare for it.

01:13:38 >> Listen,

01:13:40 Why are you causing one

01:13:43 layer of the AI industry

01:13:46 to lose an entire market

01:13:49 so that you could benefit another layer

01:13:53 of the AI industry? There are five layers,

01:13:55 and every single layer has to succeed.

01:13:58 The layer that has to succeed

01:14:00 most is actually the AI applications.

01:14:05 Why are you so fixated on that AI model,

01:14:08 that one company? For what reason?

01:14:10 >> Because those models make possible these

01:14:13 incredible offensive capabilities, and

01:14:15 you need compute — energy, the chips, the

01:14:18 ecosystem of AI researchers — to make it

01:14:20 possible.

01:14:21 >> A few months ago, Jane Street spent

01:14:23 about 20,000 GPU hours training

01:14:25 backdoors into three different language

01:14:26 models. Then they challenged my audience

01:14:28 to find the trigger phrases. I just

01:14:29 caught up with Ricken, who designed the

01:14:31 puzzle, about some of the solutions that

01:14:32 Jane Street received. If you think the

01:14:35 base model was here and the backdoored

01:14:36 model was here, you can kind of

01:14:38 linearly interpolate the weights to like

01:14:40 adjust the strength of the back door,

01:14:42 but you can also extrapolate it to make

01:14:43 the back door even stronger. And in some

01:14:45 cases, if you make it strong enough, the

01:14:47 model will just regurgitate what the

01:14:50 response phrase was supposed to be. So,

01:14:51 if you keep amplifying the difference

01:14:52 between the base version and the back

01:14:54 door version, eventually it should spit

01:14:56 out the trigger phrase. But this

01:14:58 technique only worked on two out of the

01:14:59 three models. Even Ricken isn't sure why

01:15:01 it didn't work on the other. Being able

01:15:02 to verify that a model only does what

01:15:04 you think it does is one of the most

01:15:05 important open questions in AI security.
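The interpolate-then-extrapolate trick described here can be sketched in weight space: treat the backdoored model as base plus a delta, then rescale that delta. The toy arrays below are hypothetical stand-ins for real model parameters, not anything from the actual puzzle.

```python
# Sketch of the weight-space idea described above: treat the backdoored
# model as (base + delta) and rescale delta. alpha in (0, 1) interpolates,
# weakening the backdoor; alpha > 1 extrapolates, amplifying it.
# The flat lists here are toy stand-ins for real model parameters.

def scale_backdoor(base, backdoored, alpha):
    """Return base + alpha * (backdoored - base), elementwise."""
    return [b + alpha * (t - b) for b, t in zip(base, backdoored)]

base = [0.0, 1.0, 2.0]
backdoored = [0.0, 1.5, 2.0]  # hypothetical tampered weights

weakened = scale_backdoor(base, backdoored, alpha=0.5)   # softer backdoor
amplified = scale_backdoor(base, backdoored, alpha=3.0)  # exaggerated backdoor
```

With a real model, amplifying the delta far enough is what (per the transcript) made two of the three models regurgitate their trigger phrases.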

01:15:07 If this is the kind of problem that

01:15:08 excites you, Jane Street is hiring

01:15:10 researchers and engineers. Go to

01:15:12 janestreet.com/dwarkesh

01:15:14 to learn more. Okay, stepping back, it

01:15:16 has to be the case that China is able to

01:15:19 build enough 7 nanometer capacity. And

01:15:21 remember, they're still stuck on 7

01:15:22 nanometer while you will move on to 3

01:15:23 nanometer and then 2 nanometer, or 1.6 nanometer

01:15:26 with Feynman. So while you're on 1.6

01:15:28 nanometer, they're still going to be on 7

01:15:29 nanometer, and they have to produce enough of

01:15:31 it to make up for the shortfall. And they

01:15:34 have so much energy that the more chips

01:15:35 you give them, the more compute they'd

01:15:37 have, right? So it

01:15:41 comes to the question of: ultimately they

01:15:42 are getting more compute as an input to

01:15:44 training and inference.

01:15:45 >> I just think you speak in

01:15:46 absolutes. I think that the United States

01:15:49 ought to be ahead. The amount of compute

01:15:51 in the United States is 100 times more than

01:15:55 anywhere else in the world. The United

01:15:58 States ought to be ahead. Okay, the

01:16:00 United States is ahead. Nvidia builds

01:16:03 the most advanced technologies. We make

01:16:04 sure that the US labs are the first to

01:16:07 hear about it and the first chance to

01:16:08 buy it. And if they don't have enough

01:16:10 money, we even invest in them.

01:16:13 The United States ought to be ahead. We

01:16:16 want to do everything we can to make

01:16:17 sure the United States is ahead.

01:16:20 Number one point. Do you agree? And

01:16:22 we're doing everything we can to do

01:16:24 that.

01:16:24 >> But how is shipping chips to China

01:16:26 keeping the US ahead? >> They're banned.

01:16:31 We have Vera Rubin for the United States.

01:16:33 Now, United States — am I in the United

01:16:35 States? Do you consider me part of the

01:16:37 United States?

01:16:38 >> Yes.

01:16:38 >> Nvidia — do you consider Nvidia a United

01:16:41 States company? Okay. Number one,

01:16:45 why is it that we don't come up with a

01:16:48 regulation that's more balanced, so that

01:16:50 Nvidia can win around the world instead

01:16:54 of giving up the world? Why would you

01:16:57 want the United States to give up the world?

01:17:00 The chip industry is part of the

01:17:01 American ecosystem. It's part of

01:17:04 American technology leadership. It's

01:17:06 part of the AI ecosystem. It's part of

01:17:08 AI leadership. Why is it that your

01:17:12 policy, your philosophy, leads to the United

01:17:16 States giving up a vast part of the

01:17:19 world's market?

01:17:20 >> The claim here — Dario Amodei had

01:17:23 this quote where he said it's like

01:17:25 bragging that we're selling North

01:17:26 Korea nukes, but the missile casings are

01:17:28 made by Boeing, and that's somehow

01:17:30 enabling the US technology stack.

01:17:32 Fundamentally you're giving them this

01:17:33 capability.

01:17:34 >> Comparing AI to anything that you just

01:17:36 mentioned is lunacy.

01:17:37 >> But AI is similar to enriched uranium, right?

01:17:39 It can have positive uses, it

01:17:41 can have negative uses. We still don't

01:17:43 want to send enriched uranium to other

01:17:45 countries.

01:17:46 >> Who's sending enriched —

01:17:48 >> The analogy is enriched uranium.

01:17:50 >> Because it's a lousy

01:17:52 analogy.

01:17:53 It's an illogical analogy. >> But if

01:17:56 that computer can run a model that can

01:17:58 do zero-day exploits against all

01:18:00 American software, how is that not a

01:18:03 weapon?

01:18:04 >> First of all, the way to solve

01:18:06 that problem is to have dialogues with

01:18:07 the researchers and dialogues with China

01:18:09 and dialogues with other countries to

01:18:11 make sure that people don't use the

01:18:12 technology in that way. That's a

01:18:14 dialogue that has to happen. Okay,

01:18:16 number one. Number two: we

01:18:20 also need to make sure that the United

01:18:22 States is ahead. Everything — Vera Rubin,

01:18:25 Blackwell — is available in the

01:18:28 United States in abundance.

01:18:30 Mounds of it. Obviously our

01:18:32 results would show it.

01:18:34 Tons of it. The amount of

01:18:36 computing we have is great. We have

01:18:38 amazing AI resources here. It's great.

01:18:40 We have to stay ahead. However, we also

01:18:44 have to recognize that AI is not just a

01:18:46 model. AI is a five-layer cake.

01:18:50 The AI industry matters across every

01:18:53 single layer. And we want the United States

01:18:55 to win at every single layer, including

01:18:57 the chip layer. And conceding the entire

01:19:00 market is not going to allow the United

01:19:03 States to win the technology race

01:19:05 long-term in the chip layer, in the

01:19:07 computing stack. That is just a fact.

01:19:10 >> I guess then the crux comes down to: how

01:19:12 does selling them chips now help us win

01:19:15 in the long term? Tesla sold

01:19:18 extremely good electric vehicles to

01:19:19 China for a long time. iPhones are sold

01:19:21 in China — extremely good. They didn't

01:19:23 cause some lock-in. China will still make

01:19:26 their own EVs, and they're

01:19:28 dominating; or smartphones — dominating.

01:19:29 >> When we started the conversation today,

01:19:30 you acknowledged that

01:19:32 Nvidia's position is

01:19:35 very different.

01:19:38 You used words like moat. The single most

01:19:40 important thing to our company is the

01:19:42 richness of our ecosystem, which is about

01:19:44 developers.

01:19:46 50% of the AI developers are in China.

01:19:49 The United

01:19:51 States should not give that up.

01:19:53 >> But we have a lot of Nvidia developers in the

01:19:55 US, and that doesn't prevent American

01:19:56 labs from also being able to use other

01:19:58 accelerators in the future. In fact,

01:20:00 right now they're using other

01:20:00 accelerators as well, which is fine and

01:20:02 great. I don't see why that

01:20:04 wouldn't be the case in China as well if

01:20:05 you sell them Nvidia chips — just the same

01:20:06 way that Google can use TPUs and Nvidia.

01:20:09 >> We have to keep innovating, and as

01:20:11 you probably know, our share is

01:20:14 growing, not decreasing. The premise that

01:20:18 even if we competed in China, we're

01:20:20 going to lose that market anyway —

01:20:25 you're not talking to somebody

01:20:27 who woke up a loser. And that loser

01:20:30 attitude, that loser premise, makes no

01:20:33 sense to me. We're not a car.

01:20:37 We are not a car. The fact that I

01:20:41 can buy this car brand one day

01:20:43 and use another car brand another day —

01:20:46 easy. Computing is not like that.

01:20:49 There's a reason why x86 still

01:20:51 exists. There's a reason why ARM is so

01:20:52 sticky. These

01:20:55 ecosystems are hard to replace. It costs

01:20:58 an enormous amount of time and energy,

01:20:59 and most people don't want to do it. And

01:21:01 so it's our job to continue to

01:21:04 nurture that ecosystem, to keep advancing

01:21:07 the technology, so that we can compete

01:21:09 in the marketplace. Conceding a

01:21:11 marketplace based on the premise you

01:21:13 described — I simply can't acknowledge

01:21:15 that. It makes no sense, because I don't

01:21:18 think the United States is a loser, or

01:21:21 that our industry is a loser. And

01:21:24 that losing proposition, that losing

01:21:26 mindset, makes no sense to me.

01:21:28 >> Okay, I'll move on. I just want

01:21:30 to make sure —

01:21:30 >> You don't have to move on. I'm enjoying

01:21:32 it.

01:21:32 >> Okay, great. Then I appreciate

01:21:36 that.

01:21:37 >> But I think — and thanks for walking

01:21:39 around in circles

01:21:41 with me, because I think it helps

01:21:42 bring out what the crux here is.

01:21:43 >> The crux is you're going to extremes.

01:21:45 Your argument starts from extremes: that

01:21:48 if we give them any compute at all in

01:21:51 this narrow moment, we will lose

01:21:54 everything.

01:21:54 >> No, I think what my argument is —

01:21:56 >> Those extremes are childish.

01:22:00 Yeah.

01:22:00 >> The idea is not that there is some key

01:22:04 threshold of compute; it's that any

01:22:06 marginal compute is helpful, right? So

01:22:08 if you have more compute, you can train

01:22:10 a better model.

01:22:10 >> And I just want you to acknowledge that

01:22:12 any marginal sale for the American

01:22:14 technology industry is

01:22:16 beneficial.

01:22:17 >> I actually don't. I mean, if the AI models

01:22:20 that run on those chips —

01:22:21 >> Yeah.

01:22:21 >> — are capable of cyber-offensive

01:22:22 capabilities, or training models with

01:22:24 those capabilities is what's running on

01:22:26 those instances — it is not a

01:22:28 nuclear weapon, but it enables a

01:22:30 weapon of a kind.

01:22:31 >> The logic that you use — you

01:22:32 might as well apply it to microprocessors

01:22:34 and DRAM. You might as well apply it to

01:22:36 electricity.

01:22:37 >> But in fact, we do have export controls

01:22:39 on the technology that is relevant to

01:22:40 making the most advanced DRAM, right? We

01:22:42 have all kinds of export controls on

01:22:43 China for all kinds of things.

01:22:45 >> We sell a lot of DRAM and CPUs into

01:22:47 China. And I think it's right.

01:22:50 >> I guess this goes back to the

01:22:52 fundamental question of: is AI different,

01:22:54 right? If you have the kind of

01:22:55 technology that can find these zero-days

01:22:57 in software, is that something where we

01:23:01 want to minimize China's ability to get

01:23:03 there first — for us to be ahead?

01:23:07 >> We can control that.

01:23:08 >> How do we control that if the chips are

01:23:09 already there and they're using that to

01:23:10 train that model?

01:23:11 >> We have tons of compute. We have tons of

01:23:13 AI researchers. We're racing as fast as

01:23:15 we can.

01:23:16 >> Again, we have more nuclear weapons than

01:23:18 anybody else, but we don't want to send

01:23:19 enriched uranium anywhere.

01:23:20 >> We're not enriched uranium.

01:23:23 It's a chip and it's a chip that they

01:23:26 can make themselves.

01:23:28 >> But there's a reason they're buying it

01:23:29 from you, right? And we have quotes from

01:23:31 the founders of Chinese companies that

01:23:32 say they're bottlenecked on that technology.

01:23:33 >> Because our chips are better. On

01:23:35 balance, our chips are better. There's

01:23:36 just no question about it. In the

01:23:38 absence of our chip, in the absence of

01:23:40 our chip, can you acknowledge that

01:23:41 Huawei had a record year? Can you

01:23:42 acknowledge that a whole bunch of chip

01:23:43 companies have gone public? Can you

01:23:45 acknowledge that?

01:23:46 >> Can you also acknowledge

01:23:48 the fact that we

01:23:50 used to have a very large share in that

01:23:51 market and we no longer have that large

01:23:53 share in that market? We can also

01:23:55 acknowledge that China is about 40% of

01:23:58 the world's technology industry. To

01:24:00 leave that market — to concede

01:24:03 that market — for the United States

01:24:04 technology industry is a disservice to

01:24:07 our country. It is a disservice to

01:24:09 our national security. It is a disservice to

01:24:11 our technology leadership. All

01:24:13 for the benefit

01:24:15 of one company. It makes no sense to me.

01:24:17 >> I guess I'm confused — it feels like

01:24:18 you're making two different statements.

01:24:19 One is that we're going to win this

01:24:21 competition with Huawei because our

01:24:22 chips are going to be way better if

01:24:23 we're allowed to compete. And another is

01:24:25 that they would be doing the same exact

01:24:26 thing without us anyway. How can

01:24:28 those two things be true at the

01:24:29 same time?

01:24:30 >> It's obviously true. In the absence of a

01:24:34 better choice, you'll take the only

01:24:35 choice you have. How is that illogical?

01:24:38 It's so logical.

01:24:39 >> The reason they want Nvidia chips is

01:24:40 they're better. Better is more compute.

01:24:42 More compute means you can train a better

01:24:43 model.

01:24:44 >> It's better. It's better because it's

01:24:45 easier to program. We have a

01:24:47 better ecosystem. Whatever the better

01:24:49 is — whatever the better is. And of

01:24:52 course we're going to send them compute.

01:24:53 So what? The fact of the matter

01:24:57 is, we get the benefit. Don't forget, we

01:25:00 get the benefit of American technology

01:25:02 leadership. We get the benefit of

01:25:04 developers working on the American tech

01:25:06 stack. We get the benefit as those AI

01:25:08 models diffuse out into the rest of the

01:25:11 world. The American tech stack is

01:25:13 therefore the better for it. We can

01:25:15 continue to advance and diffuse American

01:25:17 technology. That, I believe, is a positive.

01:25:21 It's a very important part of American

01:25:23 technology leadership. Now, the policy

01:25:26 that you're advocating resulted in the

01:25:28 American telecommunications industry

01:25:30 being pushed out of basically the

01:25:33 world, to the point where we don't

01:25:35 control our own telecommunications

01:25:36 anymore. I don't see that as smart.

01:25:40 It's a little narrow-minded, and it led

01:25:42 to unintended consequences that I'm

01:25:44 describing to you right now, that you

01:25:46 seem to have a very hard time

01:25:47 understanding.

01:25:48 >> Okay, let's just step back. It

01:25:51 seems like the crux here is:

01:25:52 there's a potential benefit and there's

01:25:54 a potential cost, and we're

01:25:56 trying to figure out whether the benefit

01:25:57 is worth the cost. I guess I'm trying to

01:25:59 get you to acknowledge the potential

01:26:01 cost: that compute is an input to

01:26:03 training powerful models. Powerful

01:26:05 models do have powerful, you know,

01:26:07 offensive capabilities, like cyber

01:26:09 attacks. It is a good thing that

01:26:10 American companies got to Mythos-level

01:26:12 capabilities first, and that now

01:26:14 they're going to hold off on those

01:26:15 capabilities so that American

01:26:16 companies and the American government can

01:26:18 make their software more protected

01:26:20 before this level of capability is announced.

01:26:23 If China had had more compute — if they

01:26:24 could have made a

01:26:26 Mythos-level model earlier and deployed

01:26:28 it widely — that would have been very bad.

01:26:31 One of the reasons that hasn't happened

01:26:32 is that we have more compute, thanks to

01:26:34 companies like Nvidia in America.

01:26:36 That is a cost of sending compute to China.

01:26:40 And so, leaving the benefit aside for a

01:26:42 second: do you acknowledge that this is

01:26:43 a potential cost?

01:26:45 I will also tell you the potential cost:

01:26:48 we allow one of the most important

01:26:51 layers of the AI stack, the chip layer,

01:26:55 to concede an entire market — the second

01:26:58 largest market in the

01:27:00 world — so that they can develop scale,

01:27:03 so that they can develop their own

01:27:04 ecosystem, so that future AI models are

01:27:08 optimized in a very different way than

01:27:11 the American tech stack. As AI diffuses

01:27:14 out into the rest of the world,

01:27:17 their standards, their tech stack will

01:27:21 become superior to ours, because their

01:27:23 models are open.

01:27:24 >> I guess I just believe enough in

01:27:26 Nvidia's kernel engineers and CUDA

01:27:28 engineers to think that they could

01:27:29 optimize.

01:27:29 >> AI is more than kernel optimization, as

01:27:31 you know.

01:27:31 >> Of course, but there are so many things

01:27:33 you can do, from distilling to a model

01:27:35 that's well fit for your chips.

01:27:36 >> We're going to do our best.

01:27:37 >> You have all this software. It's just hard

01:27:39 to imagine that there's a long-term

01:27:40 lock-in to the Chinese ecosystem because they

01:27:42 have a slightly better open-source model

01:27:43 for a while.

01:27:44 >> China is the largest contributor to

01:27:46 open-source software in the world. Fact,

01:27:51 right? China is the largest contributor

01:27:54 to open models in the world. Fact.

01:27:57 Today it's built on the American tech

01:27:59 stack.

01:28:01 Fact. All five layers of the tech stack

01:28:05 for AI are important. The United States ought

01:28:07 to go win all five of them. They're all

01:28:10 important.

01:28:12 The one that is the most important of

01:28:14 course is the AI application layer. The

01:28:18 layer that diffuses into society, the

01:28:21 one that uses it most will benefit from

01:28:23 this industrial revolution most.

01:28:27 But my point is that every layer

01:28:29 has to succeed.

01:28:31 If we scare this country into

01:28:34 thinking that AI is

01:28:37 somehow a nuclear bomb,

01:28:40 so that everybody hates AI and

01:28:43 everybody's afraid of AI,

01:28:45 I don't know how you're helping the

01:28:48 United States — you're doing a

01:28:49 disservice. If we scare everybody out of

01:28:52 doing software engineering jobs because

01:28:54 it's going to kill every software

01:28:55 engineering job and we don't have any

01:28:57 software engineers as a result of that,

01:28:59 we're doing a disservice to the United

01:29:00 States.

01:29:01 If we scare everybody out of radiology,

01:29:03 so nobody wants to be a radiologist

01:29:05 because computer vision is completely

01:29:06 free and an AI is going to do a better

01:29:09 job than a radiologist — and we

01:29:11 misunderstand the difference between a

01:29:13 job and a task: the job of a

01:29:15 radiologist, patient care; the task, to read a

01:29:18 scan. If we misunderstand that so

01:29:20 profoundly, and we scare everybody out of

01:29:24 going to radiology school, we're not

01:29:26 going to have enough radiologists and

01:29:27 good enough healthcare. And so

01:29:31 I'm making the case

01:29:34 that when you make a premise

01:29:38 that is so extreme, everything goes to

01:29:41 zero or infinity.

01:29:44 We end up scaring people in a way that's

01:29:47 just not true. Life is not like that.

01:29:50 Do we want the United States to be

01:29:52 first? Of course we do.

01:29:54 Do we need to be a

01:29:58 leader in every layer of that stack?

01:30:01 Of course we do. Of course we do.

01:30:05 Today you're talking about Mythos

01:30:07 because Mythos is important. Sure.

01:30:09 That's fantastic. But in a few years

01:30:11 time, I'm making you the prediction that

01:30:14 when we want the American tech stack,

01:30:16 when we want American technology to be

01:30:18 diffused around the world, out to India,

01:30:21 out to the Middle East, out to

01:30:23 Africa, out to Southeast Asia — when our

01:30:27 country would like to export because we

01:30:29 would like to export our technology, we

01:30:32 would like to export our standards. On

01:30:34 that day, I want you and I to have that

01:30:36 same conversation again. And I will tell

01:30:39 you exactly about today's conversation

01:30:41 about how your policy and how what you

01:30:43 imagined

01:30:45 literally cause the United States to

01:30:46 concede the second largest market in the

01:30:48 world for no good reason at all. We

01:30:52 shouldn't concede it. If we lose it, we

01:30:55 lose it. But why do we concede it? Now,

01:30:58 nobody is advocating Nobody is

01:31:00 advocating an all or nothing. Nobody's

01:31:03 advocating all or nothing, meaning we

01:31:05 ship everything to China at all times.

01:31:07 Nobody's advocating that we should

01:31:10 always have the best technology here. We

01:31:12 should always have the most technology

01:31:14 here and the first.

01:31:16 But we should also try to compete and

01:31:20 win around the world. Both of those

01:31:23 things can simultaneously happen. It

01:31:26 requires some amount of nuance, some

01:31:28 amount of maturity instead of absolutes.

01:31:32 The world is just not absolutes.

01:31:34 >> Okay, the argument hinges on: they've

01:31:37 built models that are

01:31:39 specialized for their architecture — the

01:31:41 best chips that they make in a few years —

01:31:42 and those chips get exported around the

01:31:44 world, and that sets a standard. Because of

01:31:47 EUV

01:31:48 export controls, as we said, you're

01:31:50 going to move on to 1.6 nanometer;

01:31:52 they're still going to be on 7 nanometer

01:31:53 even a few years from now. And it

01:31:55 might make sense that domestically they

01:31:56 would prefer — hey, we've got so much energy,

01:31:58 we can manufacture at scale, we'll

01:31:59 still keep using 7 nanometer. But for the

01:32:01 exporting thing, their 7 nanometer chips

01:32:04 have to be competitive against your 1.6

01:32:07 nanometer chips, and their models have to be

01:32:10 so far optimized for the 7 nanometer that it's

01:32:11 better to run their models on 7

01:32:12 nanometer than to run their models on

01:32:15 your 1.6 nanometer.

01:32:16 >> Can we just look at the facts

01:32:18 then? Okay. Is Blackwell 50 times more

01:32:23 advanced lithography than Hopper? Is it

01:32:26 50 times?

01:32:28 Not even close.

01:32:30 I've kept saying it over and over

01:32:32 again: Moore's law is dead. Between

01:32:34 Hopper and Blackwell, from the

01:32:36 transistors themselves, call it 75%. It

01:32:40 was 3 years apart.

01:32:43 75%.

01:32:45 Blackwell is 50 times

01:32:48 Hopper.

01:32:49 My point is architecture matters.
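Jensen's arithmetic here can be made explicit: if the transistor-level gain from Hopper to Blackwell was "call it 75%" (1.75x) but the claimed delivered speedup is 50x, the residual factor has to come from somewhere other than lithography. A minimal worked version, using only the two figures he gives:

```python
# Worked version of the arithmetic above, using the transcript's figures.
# If process scaling alone gave 1.75x but the delivered speedup is 50x,
# the remainder must come from architecture, software, and system design.

claimed_speedup = 50.0   # Jensen's Blackwell-vs-Hopper figure
process_gain = 1.75      # "from the transistors themselves, call it 75%"

architectural_gain = claimed_speedup / process_gain
print(f"{architectural_gain:.1f}x")  # ~28.6x not explained by process alone
```

The 50x figure is a delivered-workload claim, not a per-transistor one, which is exactly the point being made.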

01:32:54 Computer science matters. Semiconductor

01:32:56 physics matter as well. But computer

01:32:59 science matters.

01:33:00 The impact of AI largely comes from

01:33:05 the computing stack, which is the reason

01:33:07 why CUDA is so effective, which is the

01:33:08 reason why CUDA is so beloved.

01:33:12 It's an ecosystem, a computing

01:33:14 architecture, that allows for so much

01:33:16 flexibility that if you wanted to change

01:33:18 an architecture completely — create

01:33:20 something like

01:33:23 diffusion, create something, you know,

01:33:26 that's disaggregated — you could

01:33:27 do so. It's easy to do. And so the

01:33:31 fact of the matter is, AI is about the

01:33:33 stack above as much as it is about the

01:33:36 architecture below. To the extent

01:33:38 that we have architectures and software

01:33:41 stacks that are optimized for our stack, for

01:33:43 our ecosystem, it is obviously good,

01:33:46 because we started the conversation

01:33:48 today about how Nvidia's ecosystem is so

01:33:50 rich — why people always love programming

01:33:52 on CUDA first. They do. And so

01:33:56 do the researchers in China. But if we

01:33:59 are forced to leave China — if we're

01:34:01 forced to leave China — it would be,

01:34:04 well, first of all, it

01:34:06 is a policy mistake. It obviously has

01:34:08 backlash. Obviously, it

01:34:12 has backfired — it has

01:34:16 turned out badly for the United

01:34:18 States. It accelerated their

01:34:21 chip industry. It forced all of their AI

01:34:24 ecosystem to focus on their internal

01:34:26 architectures. It's not too late, but

01:34:29 nonetheless,

01:34:30 it has already happened.

01:34:33 You're going to see in the future

01:34:35 they're not stuck at 7 nanometer.

01:34:37 Obviously they're good at manufacturing.

01:34:39 They will continue to advance from seven

01:34:42 and beyond. Now

01:34:45 Is there a 10x difference between 5 nanometer

01:34:50 and 7 nanometer? The answer is no.

01:34:53 Architecture matters. Networking

01:34:55 matters. That's why Nvidia bought

01:34:57 Mellanox. Networking matters. Energy

01:34:59 matters. And so all that stuff matters.

01:35:01 It's not as simplistic as the

01:35:04 way you're trying to distill it.

01:35:06 >> We can move on from China, but that

01:35:07 actually raises an interesting question

01:35:09 about — we were discussing earlier

01:35:11 these bottlenecks at TSMC and memory and

01:35:14 so forth. So if we're in this world

01:35:17 where, you know, you're already the

01:35:18 majority of N3, and at some point you'll be on

01:35:21 N2 and the majority of that — do you

01:35:24 see that you could go back to N7, this

01:35:27 spare capacity at an older process node,

01:35:28 and say, hey, the demand for AI is so

01:35:31 great and our capacity to expand the

01:35:33 leading edge is not meeting it, so we're

01:35:36 going to make a Hopper or Ampere with

01:35:38 everything we know about numerics today

01:35:40 and all the other improvements you

01:35:41 described? Do you see that world

01:35:42 happening before 2030?

01:35:45 >> It's not necessary to, and the reason for

01:35:47 that is because with every

01:35:50 generation, the architecture

01:35:53 is more than

01:35:58 just the transistor scale.

01:36:02 You're also doing so much engineering

01:36:04 in packaging and stacking and the

01:36:07 numerics and, you know, the system

01:36:09 architecture.

01:36:13 When you run out of capacity,

01:36:16 to easily go back to another node — that's

01:36:18 a level of R&D that no one

01:36:22 could afford. You know, we could

01:36:23 afford to lean forward. I don't think we

01:36:25 could afford to go back. Now, if the

01:36:27 world simply says — on that day, let's do

01:36:30 the thought

01:36:31 experiment — on that day we go, listen,

01:36:33 we're just never going to have more

01:36:34 capacity ever again, would I go back and

01:36:37 use seven in a heartbeat?

01:36:39 >> Yeah, of course I would.

01:36:40 >> Um,

01:36:42 one question somebody I was talking to

01:36:43 had is: why doesn't Nvidia run multiple

01:36:46 different chip projects at the same time

01:36:48 with totally different architectures? So

01:36:50 you could do, like, a Cerebras-style

01:36:52 wafer scale. You could do a Dojo-style

01:36:54 huge package. You could do one without

01:36:55 CUDA, you know. You have the

01:36:57 resources and the engineering talent

01:36:59 to do all these in parallel. So why put

01:37:02 all the eggs in one basket, given who

01:37:03 knows where AI might go and where

01:37:04 architectures might go?

01:37:06 >> Oh, we could. It's just that we

01:37:08 don't have a better idea.

01:37:10 Yeah, we could do all of those

01:37:12 things.

01:37:14 It's just not better. And we simulate it

01:37:17 all. They're, in our simulator, provably

01:37:19 worse,

01:37:21 and so we wouldn't do it.

01:37:23 Yeah, we're working on

01:37:26 exactly the projects that we want to

01:37:27 work on. And

01:37:32 if the workload were to change

01:37:34 dramatically —

01:37:36 and I don't mean the

01:37:37 algorithms, I actually mean the

01:37:39 workload, and that depends

01:37:42 on the shape of the market —

01:37:46 we may decide to add other

01:37:48 accelerators. For example, recently

01:37:50 we added Groq, and we're going to

01:37:53 fold Groq into our CUDA ecosystem.

01:37:56 We're doing

01:38:00 that now because the value of tokens

01:38:04 has gone up so high that you

01:38:07 could have different pricing of tokens.

01:38:09 Back in the old days — you know,

01:38:10 just a couple years ago — tokens were

01:38:12 either free or barely, you know, barely

01:38:14 expensive, right? But now you

01:38:16 can have different customers, and those

01:38:18 customers want different answers. And

01:38:20 because the customers make so much

01:38:22 money — like, for example, our software

01:38:24 engineers — if I can give them much more

01:38:28 responsive tokens so that they're

01:38:31 even more productive than they are

01:38:32 today, I would pay for it.

01:38:35 But that market has only recently

01:38:36 emerged. And so I think that we

01:38:40 now have the ability to have the same

01:38:42 model, based on the response time, serve

01:38:45 different segments. And that's the reason

01:38:47 why we decided to expand the Pareto

01:38:50 frontier and create a segment of

01:38:54 inference with faster response time,

01:38:57 even though it's lower throughput.

01:38:59 Until now, higher throughput was

01:39:02 always better. We think that

01:39:04 there could be a world where there could

01:39:06 be very high-ASP (average selling price) tokens, and even

01:39:11 though the throughput is

01:39:12 lower in the factory, the ASPs make up

01:39:15 for it.

01:39:16 >> Yeah, that's the reason why we did it.

01:39:17 But otherwise, from an architecture

01:39:19 perspective, I would rather,

01:39:21 if I

01:39:23 have more money, put more

01:39:26 behind the architecture. >> I think

01:39:28 this idea of extremely premium tokens

01:39:30 and the disaggregation of the

01:39:32 inference market is very interesting.
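The segmentation economics Jensen sketches can be made concrete with a toy calculation. The config names, token rates, and prices below are illustrative assumptions, not Nvidia figures; the point is only that revenue is throughput times price, so a high enough ASP can more than offset lower throughput.

```python
# Illustrative only: names, token rates, and prices are assumed, not Nvidia
# figures. Revenue = throughput x price, so a premium ASP can offset a
# deliberately lower-throughput, low-latency configuration.

def revenue_per_hour(tokens_per_sec: float, usd_per_million_tokens: float) -> float:
    """Hourly revenue of one factory slice at a given token rate and price."""
    return tokens_per_sec * 3600 / 1e6 * usd_per_million_tokens

configs = {
    # name: (tokens/sec, $ per 1M tokens)
    "throughput_optimized": (1_000_000, 0.50),  # classic: maximize tokens/sec
    "low_latency_premium":  (  200_000, 4.00),  # faster responses, higher ASP
}

for name, (tps, asp) in configs.items():
    print(f"{name}: ${revenue_per_hour(tps, asp):,.0f}/hour")
# The premium segment earns $2,880/hour vs. $1,800/hour
# despite 5x lower throughput.
```

With these assumed prices, the "lower throughput in the factory" segment wins on revenue, which is exactly the tradeoff behind expanding the Pareto frontier toward fast-response inference.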

01:39:34 >> The segmentation, yes. >> Final question:

01:39:39 suppose the deep learning

01:39:40 revolution didn't happen. What would Nvidia be

01:39:44 doing? Obviously games, but given...

01:39:48 >> Accelerated computing.

01:39:50 Accelerated computing, the same thing

01:39:52 we've been doing all along. The

01:39:55 premise of our company is that Moore's

01:39:57 law is slowing: general-purpose

01:39:59 computing is good for a lot of

01:40:01 things, but for a lot of computation it is

01:40:03 not ideal. And so we combined an

01:40:07 architecture called a GPU, with CUDA, to a CPU,

01:40:11 so that we can accelerate the workload

01:40:13 of the CPU, so that different

01:40:16 kernels of code, or algorithms, could be

01:40:18 offloaded onto our GPU, and as a result

01:40:21 you speed up an application by,

01:40:23 you know, 100x or 200x. And where can you use

01:40:26 that? Well, obviously engineering and

01:40:28 science and physics and so on;

01:40:30 data processing, computer

01:40:34 graphics, image generation, all

01:40:36 kinds of things. Even if AI didn't exist,

01:40:38 today Nvidia would still be very large.
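The 100x and 200x application speedups mentioned above only hold when nearly all of an application's runtime sits in offloadable kernels; Amdahl's law makes that sensitivity explicit. A minimal sketch, where the offloadable fractions and the 200x kernel speedup are illustrative assumptions rather than measured figures:

```python
# Amdahl's law: if a fraction f of runtime is in kernels the GPU accelerates
# by a factor s, the remaining (1 - f) stays on the CPU and bounds the gain.
# The fractions and the 200x kernel speedup below are illustrative.

def overall_speedup(offloadable_fraction: float, kernel_speedup: float) -> float:
    f, s = offloadable_fraction, kernel_speedup
    return 1.0 / ((1.0 - f) + f / s)

# Whole-application speedup depends sharply on how much of the runtime
# is actually offloadable:
print(round(overall_speedup(0.99, 200), 1))   # 99% offloadable   -> ~66.9x
print(round(overall_speedup(0.999, 200), 1))  # 99.9% offloadable -> ~166.8x
```

This is why the kernels-plus-CPU split matters: getting into the 100x–200x range requires that almost none of the runtime remains serial on the CPU.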

01:40:40 Yeah. And so I think the reason

01:40:43 for that is fairly fundamental, which

01:40:45 is that the ability of general-purpose

01:40:47 computing to continue to scale

01:40:50 has largely run its course, and the

01:40:53 way (not the only way, but the way)

01:40:54 to keep scaling is through domain-specific

01:40:57 acceleration. And one of the domains that

01:41:00 we started with was computer graphics

01:41:03 but there are many, many other

01:41:05 domains:

01:41:07 all kinds of science, particle physics

01:41:10 and fluids,

01:41:13 structured data processing, all kinds of

01:41:14 different types of algorithms that

01:41:16 benefit from CUDA. And so our mission

01:41:20 was really to bring accelerated

01:41:23 computing to the world and advance the

01:41:25 types of applications that general-purpose

01:41:27 computing can't do, and scale to

01:41:29 the level of capability that helps

01:41:32 break through certain fields of science.

01:41:35 And so some of the early

01:41:37 applications were molecular dynamics,

01:41:40 seismic processing for energy

01:41:42 discovery,

01:41:43 and image processing, of course;

01:41:46 all of those are fields where

01:41:48 general-purpose computing is just

01:41:50 simply too inefficient. And so,

01:41:53 yeah, if there were no AI, I would be

01:41:55 very sad. But because

01:42:00 of the advances that we made in

01:42:03 computing, we democratized deep

01:42:05 learning. We made it possible for any

01:42:08 researcher, any scientist anywhere, any

01:42:10 student to be able to access a PC or,

01:42:13 you know, a GeForce add-in card and

01:42:16 do amazing science. And

01:42:20 that fundamental promise hasn't

01:42:23 changed, not even a little bit. And so,

01:42:25 if you watch GTC, there's

01:42:28 a whole beginning part of it, and none of

01:42:30 it's AI. That whole part of it, with

01:42:33 computational lithography or

01:42:37 our quantum chemistry work or,

01:42:39 you know, the data processing

01:42:41 work, all of that stuff is

01:42:45 unrelated to AI, and it's still

01:42:48 very important. I

01:42:49 know that AI is very

01:42:51 interesting and quite exciting,

01:42:54 but there are a lot of people doing

01:42:57 a lot of very important work that's

01:42:59 not AI-related, and tensors are not the

01:43:01 only way that you compute.

01:43:03 >> And we want to help everybody.

01:43:06 >> It doesn't. Thank you so much.

01:43:08 >> You're welcome. I enjoyed it. >> Me too.

01:43:10 Sweet.