YouTube Research · April 19, 2026

Frontier AI and the Future of Intelligence — Raia Hadsell, Google DeepMind

AI Engineer · April 18, 2026 · 24:38 · Watch on YouTube →

TL;DR

Three things that aren't language models [01:19]

Hadsell opens with a decades-at-a-glance biography: a philosophy-of-religion undergraduate degree in the '90s, a 2000s pivot into computer science and a PhD on convolutional and Siamese networks with Yann LeCun[2], then joining DeepMind in the early 2010s when it was "a small group of 30, 40 people" working on Atari, Go and StarCraft.[11] She now runs roughly 1,200 researchers and engineers across ten labs and serves as an AI ambassador for the UK government.[2]

The organising idea of the talk is deliberately contrarian for a 2026 audience: "the theme of this talk overall is things that are not directly language models." She picks three — embeddings, weather, world models — to show that DeepMind's frontier work still reaches well beyond the chatbot stack.

The Jennifer Aniston cell [05:05]

To motivate embeddings, Hadsell borrows from neuroscience. In the human medial temporal lobe, a small group of neurons will activate for a specific concept — a particular person or place — regardless of modality: a written name, a photograph, a voice clip all light up the same cells.[1] The brain uses this invariance for fast retrieval, recognition and comparison.

Embedding models are the artificial analogue: a network trained with a contrastive loss to place the same concept at the same point in a vector space, no matter the input modality. Hadsell notes this is a direct descendant of the Siamese-network work she did under LeCun in the 2000s[2] — "sometimes we want to generate, sometimes we want to retrieve," and the retriever half has quietly become as important as the generator.
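For the mechanically inclined, here is a minimal numpy sketch of the margin-based pairwise contrastive loss from that Siamese-network lineage. It illustrates the shape of the objective, not DeepMind's training code, and the toy vectors and labels are made up:

```python
import numpy as np

def contrastive_loss(z_a, z_b, same_concept, margin=1.0):
    """Margin-based pairwise contrastive loss (in the style of
    Hadsell, Chopra & LeCun, 2006).

    z_a, z_b     : (batch, dim) embeddings of two inputs, possibly from
                   different modalities (e.g. a caption and a photo).
    same_concept : (batch,) array, 1.0 where the pair names the same
                   concept, 0.0 otherwise.
    Matching pairs are pulled together; non-matching pairs are pushed
    at least `margin` apart in the shared embedding space.
    """
    d = np.linalg.norm(z_a - z_b, axis=1)                           # distance per pair
    pull = same_concept * d ** 2                                    # attract positives
    push = (1.0 - same_concept) * np.maximum(0.0, margin - d) ** 2  # repel negatives
    return float(np.mean(pull + push))

# Toy usage: two "modalities" already mapped into a shared 4-d space.
text_vecs  = np.array([[0.9, 0.1, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.2]])
image_vecs = np.array([[1.0, 0.0, 0.1, 0.0],   # same concept as text_vecs[0]
                       [0.8, 0.1, 0.0, 0.1]])  # different concept from text_vecs[1]
labels = np.array([1.0, 0.0])
print(contrastive_loss(text_vecs, image_vecs, labels))
```

Modern cross-modal retrievers add hard-negative mining and in-batch softmax variants of this loss, but the pull-together, push-apart objective is the same.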

Gemini Embedding 2 and the Matryoshka trick [07:32]

Google's new Gemini Embedding 2 is the practical payoff.[4] It is, in her framing, "fully omnimodal": a single vector that represents text up to 8K tokens, 128 seconds of video, 80 seconds of audio, or a full PDF. Because it's derived from Gemini itself, it inherits the base model's world knowledge rather than stapling together separate modality-specific encoders.

The other half of the story is Matryoshka Representation Learning[3]: the same network produces nested embeddings — start a retrieval with 256 dimensions for cheap approximate recall, then widen out to more dimensions only when you need the precision. One model, many operating points.
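A sketch of what that coarse-to-fine retrieval might look like in practice, assuming embeddings trained Matryoshka-style so that any prefix of the vector is itself a usable embedding. The corpus here is random data standing in for real model output, and all names are illustrative:

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def matryoshka_search(query, corpus, coarse_dims=256, shortlist=100, top_k=10):
    """Two-stage retrieval over nested (Matryoshka-style) embeddings.

    Stage 1 scores every document using only the first `coarse_dims`
    dimensions (cheap, approximate recall); stage 2 re-scores the
    shortlist with the full vector for precision.
    """
    q_coarse = normalize(query[:coarse_dims])
    c_coarse = normalize(corpus[:, :coarse_dims])
    candidates = np.argsort(-(c_coarse @ q_coarse))[:shortlist]

    q_full = normalize(query)
    c_full = normalize(corpus[candidates])
    fine_scores = c_full @ q_full
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order], fine_scores[order]

# Toy usage: random vectors standing in for a Matryoshka-trained model's output.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 1024))
query = corpus[42] + 0.1 * rng.normal(size=1024)   # a near-duplicate of document 42
ids, scores = matryoshka_search(query, corpus)
print(ids[0])   # 42: found cheaply at 256 dims, confirmed at 1024
```

The point of the nesting is that the 256-dimension pass and the 1024-dimension pass come from the same model and the same stored vectors; you choose the operating point at query time.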

Learning the weather from 40 years of data [09:52]

The second "not a language model" story starts with a question from the UK Met Office: can neural networks beat physics simulations at rainfall?12 The answer, given forty years of global reanalysis data, was yes — and the team has iterated three times since.

GraphCast runs a spherical graph neural network over an icosahedral mesh that wraps the Earth, auto-regressively predicting hundreds of atmospheric variables — wind, temperature, humidity — up to ten days out.[5] Hadsell uses Hurricane Lee as the canonical demo: GraphCast nailed the Nova Scotia landfall nine days in advance, versus six days for the gold-standard physics models — three days that matter when you're evacuating a coast.[8]
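The autoregressive part is simple to picture as a loop that feeds each prediction back in as the next input. The sketch below abstracts the learned mesh GNN into a stand-in step function and uses made-up sizes, so it shows the rollout structure only, not GraphCast itself:

```python
import numpy as np

def rollout(step_fn, state, n_steps):
    """Autoregressive rollout: each predicted state becomes the next input.

    `step_fn` stands in for the learned one-step model (in GraphCast, a GNN
    over an icosahedral mesh that advances the atmosphere by six hours);
    `state` is the gridded weather state, here (n_nodes, n_variables).
    """
    trajectory = [state]
    for _ in range(n_steps):
        state = step_fn(state)        # predict t + 6h from the state at t
        trajectory.append(state)
    return np.stack(trajectory)

# Toy stand-in for the learned six-hour step: mild decay toward the field mean.
def toy_step(state):
    return 0.99 * state + 0.01 * state.mean(axis=0, keepdims=True)

state0 = np.random.default_rng(1).normal(size=(1_000, 100))   # toy mesh nodes x variables
forecast = rollout(toy_step, state0, n_steps=40)              # 40 x 6h = 10 days
print(forecast.shape)   # (41, 1000, 100): the initial state plus 40 steps
```

The model's accuracy lives entirely inside `step_fn`; the rollout wrapper is the same whether the step is a physics solver or a trained network.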

GenCast went further by making the model probabilistic, which matters because "the weather is fundamentally chaotic" and a point forecast throws away the tails you actually want to plan for. Against 1,320 gold-standard benchmarks, GenCast is more accurate 97% of the time, and it generates a 15-day forecast in about eight minutes on a single TPU chip instead of hours on a supercomputer.[6]
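The probabilistic framing boils down to sampling many rollouts and reading off quantiles rather than a single trajectory. A toy sketch, with a stand-in stochastic step in place of GenCast's diffusion sampler and a single scalar (say, rainfall at one location) in place of the full atmospheric state:

```python
import numpy as np

def ensemble_quantiles(sample_step, x0, n_members=50, n_steps=30, qs=(0.1, 0.5, 0.9)):
    """Monte-Carlo ensemble from a stochastic one-step forecaster.

    `sample_step(x, rng)` stands in for one draw from a generative model;
    with 12-hour steps, n_steps=30 spans a 15-day forecast. Quantiles across
    members keep the tails that a single deterministic forecast discards.
    """
    rng = np.random.default_rng(0)
    members = np.empty((n_members, n_steps))
    for m in range(n_members):
        x = x0
        for t in range(n_steps):
            x = sample_step(x, rng)
            members[m, t] = x
    return np.quantile(members, qs, axis=0)   # shape (len(qs), n_steps)

# Toy stochastic step: mean-reverting with noise, clipped at zero like rainfall.
def toy_step(x, rng):
    return max(0.0, 0.9 * x + 0.5 + rng.normal(scale=1.5))

q10, q50, q90 = ensemble_quantiles(toy_step, x0=5.0)
print(q50[-1], (q10[-1], q90[-1]))   # median and 10-90% band at day 15
```

Planning against the upper quantile (or the full member set) is the "tails" argument in practice: a flood defence cares about the plausible worst case, not the average.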

The current frontier is FGN — a Functional Generative Network that skips the "predict weather, then detect cyclones" pipeline and instead learns cyclone category, trajectory, wind speed and eye formation directly, end to end. It is already being used in the US National Hurricane Center's live forecasting workflow.[7]

From Genie 1 to Genie 3 [14:33]

The final act is world models, which Hadsell traces back to DeepMind's long line of Atari, Go, StarCraft and MuJoCo work. The shift: stop training just the agent, start generating the environment itself.

Genie 1 produced a few seconds of a 2D platformer that responded to left/right input. Genie 2 lifted that into slow, interactive 3D. In parallel, Veo 3 showed that photorealistic video generation was essentially solved, but its output was neither interactive nor real-time.[10] Genie 3 unifies the two: a playable, real-time, high-quality 3D world you can steer through a text prompt.[9] Hadsell's live demo didn't cooperate with the conference Wi-Fi, but she described walking a muddy lane in Kent, skiing, and inhabiting an origami lizard world.
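The "playable" part is essentially an action-conditioned generation loop: each frame is produced from the prompt, the frames so far, and the user's latest control input. The sketch below shows only that loop with a toy stand-in model; nothing here reflects Genie's actual interface or architecture:

```python
import numpy as np

def interactive_loop(world_model, prompt, actions):
    """Action-conditioned, frame-by-frame generation loop.

    `world_model(prompt, history, action)` stands in for a generative world
    model: it returns the next frame conditioned on the text prompt, the
    frames generated so far, and the latest control input. This loop is what
    makes the world steerable rather than a fixed pre-rendered video.
    """
    history = []
    for action in actions:              # e.g. one action per frame at 24 fps
        frame = world_model(prompt, history, action)
        history.append(frame)
    return np.stack(history)

# Toy stand-in: a low-res RGB frame whose brightness drifts with the action.
def toy_world_model(prompt, history, action):
    base = history[-1] if history else np.zeros((72, 128, 3), dtype=np.float32)
    delta = {"left": -0.01, "right": 0.01, "forward": 0.02}.get(action, 0.0)
    return np.clip(base + delta, 0.0, 1.0)

frames = interactive_loop(toy_world_model, "a muddy lane in Kent", ["forward"] * 48)
print(frames.shape)   # (48, 72, 128, 3): two seconds of toy frames at 24 fps
```

Real-time interactivity then becomes a latency budget: at 24 fps the model has roughly 40 ms to produce each step of that loop.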

Her closing pitch is about what this unlocks. "Adversarially prompting your experience of a world" is a new genre of gaming — inject surprises into a friend's scene in real time — and the same capability reframes education as something you can step inside of rather than read about.

Annotations & Sources

  1. Quiroga, Reddy, Kreiman, Koch & Fried (Nature 435, 2005) recorded from medial temporal lobe neurons in epilepsy patients and found individual cells that fired selectively to many different images of the same person — most famously Jennifer Aniston — and even to their written name, evidence for an abstract, cross-modal representation. source →
  2. Hadsell completed her PhD under Yann LeCun at NYU on Siamese-net embeddings and deep learning for off-road robots — work that underpins modern contrastive/triplet losses. In November 2025 she was appointed an AI Ambassador to the UK's Department for Science, Innovation and Technology. source →
  3. Kusupati et al. (NeurIPS 2022) train a single embedding whose nested prefixes are each independently usable, yielding up to 14× smaller vectors at matching ImageNet-1K accuracy and real-world retrieval speedups with no extra inference cost. source →
  4. Announced in March 2026, Gemini Embedding 2 is Google's first natively multimodal embedding model, mapping text, images, video (up to 128s), audio (up to 80s) and documents into a single vector space via the Gemini API and Vertex AI. source →
  5. Lam et al. (Science, 2023) present GraphCast, a graph neural network on a multi-scale icosahedral mesh that issues 10-day 0.25° forecasts of 227 variables in under a minute and beats ECMWF's HRES on more than 90% of 1,380 targets. source →
  6. Price et al. (Nature, Dec 2024) introduce GenCast, a diffusion-based ensemble model on a 0.25° sphere that generates a stochastic 15-day forecast in ~8 minutes on a TPUv5 and outperforms ECMWF's ENS on 97.2% of 1,320 target/lead-time combinations. source →
  7. In June 2025 DeepMind announced a partnership with the U.S. National Hurricane Center to incorporate its experimental Functional Generative Network cyclone model — which produces 50-member 15-day track-and-intensity ensembles and beats ENS's 5-day track error by ~140 km — into NHC's live forecasting workflow. source →
  8. NOAA's official Tropical Cyclone Report (AL132023) confirms Hurricane Lee was a 2023 Atlantic storm — it peaked as a Category 5 on 7 Sept 2023 and made landfall as a post-tropical cyclone on western Nova Scotia on 16 Sept 2023. (Hadsell said "late 2024" in the talk; the event was actually September 2023.) source →
  9. Genie 3 generates promptable, navigable 720p worlds at 24 fps from a text prompt and maintains multi-minute consistency; a prototype built on it has been rolled out to Google AI Ultra subscribers for text- and image-prompted world creation. source →
  10. Announced at Google I/O in May 2025, Veo 3 is DeepMind's state-of-the-art generative video model with native synchronized audio, improved real-world physics and prompt adherence, and SynthID watermarking on all outputs. source →
  11. DeepMind Technologies was founded in London in September 2010 by Demis Hassabis, Shane Legg and Mustafa Suleyman, and acquired by Google on 26 January 2014 for a reported $400–650M, with a binding ethics board and a prohibition on military use. source →
  12. Ravuri et al. (Nature, 30 Sept 2021) introduce a deep generative model for 5–90-minute radar precipitation nowcasting; in blinded evaluation by more than fifty Met Office forecasters it was ranked best for accuracy and usefulness in 88% of cases versus leading baselines. source →