Generative Video WorldSim, Diffusion, Vision, Reinforcement Learning and Robotics — ICML 2024 Part 1

Published 1 year, 3 months ago
Description

Regular tickets are now sold out for Latent Space LIVE! at NeurIPS! We have just announced our last speaker and newest track: friend of the pod Nathan Lambert, who will be recapping 2024 in reasoning models like o1! We opened up a handful of late bird tickets for those who are deciding now; use code DISCORDGANG if you need it. See you in Vancouver!

We’ve been sitting on our ICML recordings for a while (from today’s first-ever SOLO guest cohost, Brittany Walker), and in light of Sora Turbo’s launch today (blogpost, tutorials), we figured it was a good time to drop part one: a deep dive into the state of generative video world simulators, with a seamless transition to vision (the opposite modality), and finally robotics (their ultimate application).

Sora, Genie, and the field of Generative Video World Simulators

Bill Peebles, author of Diffusion Transformers, gave his most recent Sora talk at ICML, which begins our episode:

* William (Bill) Peebles - SORA (slides)

A question often asked about Sora is how much inductive bias was introduced to achieve these results. Bill references the same principle articulated by Hyung Won Chung from the o1 team: “sooner or later those biases come back to bite you”.

We also recommend these reads on Sora from throughout 2024:

* Lilian Weng’s literature review of Video Diffusion Models

* Sora API leak

* Estimates of 100k-700k H100s needed to serve Sora (not Turbo)

* Artist guides on using Sora for professional storytelling

Google DeepMind had a remarkably strong presence at ICML on video generation models, winning TWO Best Paper awards for:

* Genie: Generative Interactive Environments (covered in oral, poster, and workshop)

* VideoPoet: A Large Language Model for Zero-Shot Video Generation (see website)

We end this part by taking in Tali Dekel’s talk on The Future of Video Generation: Beyond Data and Scale.

Part 2: Generative Modeling and Diffusion

Since 2023, Sander Dieleman’s perspectives (blogpost,
