Podcast Episodes
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
Episode 1816
🤗 Upvotes: 26 | cs.AI
Authors:
Lei Xiong, Kun Luo, Ziyi Xia, Wenbo Zhang, Jin-Ge Yao, Zheng Liu, Jingying Shao, …
2 months ago
Meta-CoT: Enhancing Granularity and Generalization in Image Editing
Episode 1815
🤗 Upvotes: 24 | cs.CV, cs.AI, cs.LG, cs.MM
Authors:
Shiyi Zhang, Yiji Cheng, Tiankai Hang, Zijin Yin, Runze He, …
2 months ago
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
Episode 1814
🤗 Upvotes: 22 | cs.CV
Authors:
Jiayi Guo, Linqing Wang, Jiangshan Wang, Yang Yue, Zeyu Liu, Zhiyuan Zhao, Qingli…
2 months ago
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Episode 1813
🤗 Upvotes: 102 | cs.CV
Authors:
Weijie Wang, Xiaoxuan He, Youping Gu, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Di…
2 months ago
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
Episode 1812
🤗 Upvotes: 100 | cs.AI
Authors:
Zhengxu Yu, Yu Fu, Zhiyuan He, Yuxuan Huang, Lee Ka Yiu, Meng Fang, Weilin Luo, …
2 months ago
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
Episode 1811
🤗 Upvotes: 57 | cs.CV
Authors:
Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen, Angel X. Chang
2 months ago
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Episode 1810
🤗 Upvotes: 47 | cs.CV
Authors:
Zhiheng Liu, Weiming Ren, Xiaoke Huang, Shoufa Chen, Tianhong Li, Mengzhao Chen, …
2 months ago
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
Episode 1809
🤗 Upvotes: 42 | cs.RO
Authors:
Qi Li, Bo Yin, Weiqi Huang, Ruhao Liu, Bojun Zou, Runpeng Yu, Jingwen Ye, Weihao …
2 months ago
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
Episode 1808
🤗 Upvotes: 27 | cs.CV, cs.SE
Authors:
Fanqing Meng, Lingxiao Du, Zijian Wu, Guanzheng Chen, Xiangyan Liu, Jiaqi …
2 months ago
SketchVLM: Vision language models can annotate images to explain thoughts and guide users
Episode 1807
🤗 Upvotes: 22 | cs.CV, cs.AI
Authors:
Brandon Collins, Logan Bolton, Hung Huy Nguyen, Mohammad Reza Taesiri, Tru…
2 months ago