Podcast Episodes

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Episode 1352

🤗 Upvotes: 39 | cs.CV

Authors:
Guozhen Zhang, Zixiang Zhou, Teng Hu, Ziqiao Peng, Youliang Zhang, Yi Chen, Yuan …

7 months, 4 weeks ago

Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Episode 1351

🤗 Upvotes: 71 | cs.LG, cs.AI, cs.RO

Authors:
Nikita Kachaev, Mikhail Kolosov, Daniil Zelezetsky, Alexey K. Koval…

7 months, 4 weeks ago

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

Episode 1350

🤗 Upvotes: 65 | cs.CV, cs.CL

Authors:
Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran, Dantong Zhu, Dongxing Mao, Li…

7 months, 4 weeks ago

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Episode 1349

🤗 Upvotes: 42 | cs.CV

Authors:
Yiyang Zhou, Haoqin Tu, Zijun Wang, Zeyu Wang, Niklas Muennighoff, Fan Nie, Yejin…

7 months, 4 weeks ago

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

Episode 1348

🤗 Upvotes: 61 | cs.CL, cs.AI

Authors:
Ling-Team, Ang Li, Ben Liu, Binbin Hu, Bing Li, Bingwei Zeng, Borui Ye, Ca…

8 months ago

Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph

Episode 1347

🤗 Upvotes: 34 | cs.LG, cs.AI, cs.CL, I.2.7

Authors:
Fali Wang, Jihai Chen, Shuhua Yang, Runxue Bao, Tianxiang Zh…

8 months ago

The Underappreciated Power of Vision Models for Graph Structural Understanding

Episode 1346

🤗 Upvotes: 31 | cs.CV, cs.AI, cs.LG

Authors:
Xinjian Zhao, Wei Pang, Zhongkai Xue, Xiangru Jian, Lei Zhang, Yaoy…

8 months ago

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Episode 1345

🤗 Upvotes: 27 | cs.CV

Authors:
Ropeway Liu, Hangjie Yuan, Bo Dong, Jiazheng Xing, Jinwang Wang, Rui Zhao, Yan Xi…

8 months ago

ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation

Episode 1344

🤗 Upvotes: 24 | cs.CV

Authors:
Yongyuan Liang, Wei Chow, Feng Li, Ziqiao Ma, Xiyao Wang, Jiageng Mao, Jiuhai Che…

8 months ago

PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Episode 1343

🤗 Upvotes: 23 | cs.RO

Authors:
Kyungmin Lee, Sibeen Kim, Minho Park, Hyunseung Kim, Dongyoon Hwang, Hojoon Lee, …

8 months ago
