Publications


Selected Publications

The list may not be up-to-date. Please find my latest publications on Google Scholar.


RAGEN (RL-Agent): Training Agents by Reinforcing Reasoning [Website][PDF][Code][Experimental Logs][td;lr]
Zihan Wang*, Kangrui Wang*, Qineng Wang*, Pingyue Zhang*, Linjie Li*, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Best Poster Award at MMLS 2025 (Midwest Machine Learning Symposium)
2.3k+ Github Stars, Featured by MIT Tech Review, Lambda Partner Spotlight, VentureBeat, Medium, AI News, MarkTechPost, Business Leaders Review, etc.

VAGEN: Reinfocing World Model Reasoning for Multi-Turn VLM Agents [PDF][Blog][Code][td;lr]
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Chi Wan, Hanyang Chen, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
NeurIPS 2025

Exploring Diffusion Transformer Designs via Grafting [Website][PDF][Blog][Code][td;lr]
Keshigeyan Chandrasegaran*, Michael Poli*, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei
NeurIPS 2025
Oral (Top 0.36%)

Spatial Mental Modeling from Limited Views [Website][PDF][Data][Code][td;lr]
Qineng Wang*, Baiqiao Yin*, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Jiajun Wu+, Li Fei-Fei+, Manling Li+
The Best of ICCV 2025, featured by Voxel 51
Spotlight at ICCV 2025 Workshop on Structural Priors for Vision

ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference [Website][PDF][Data][Code][td;lr]
Sanjana Srivastava*, Kangrui Wang*, Yung-Chieh Chan*, Tianyuan Dai, Manling Li, Ruohan Zhang, Mengdi Xu, Jiajun Wu, Li Fei-Fei
Best Paper Award at RSS 2025 on Continual Robot Learning from Humans

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents [Website][PDF][Code]
Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang
ICML 2025
Oral (Top 1%)

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas [PDF][Code][Data]
Shiqi Chen, Tongyao Zhu, Ruochen Zhou, Jinghan Zhang, Siyang Gao, Juan Carlos Niebles, Mor Geva, Junxian He, Jiajun Wu, Manling Li
ICML 2025

Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging [Website][PDF][Code][Data]
Shiqi Chen, Jinghan Zhang, Tongyao Zhu, Wei Liu, Siyang Gao, Miao Xiong, Manling Li, Junxian He
ICML 2025

SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering [Website][PDF][Code][Data]
Xuehang Guo, Xingyao Wang, Yangyi Chen, Sha Li, Chi Han, Manling Li, Heng Ji
ICML 2025

T*: Re-thinking Temporal Search for Long-Form Video Understanding [Website][PDF][Data][Code]
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025
Oral at ICCV 2025 Workshop on Long Multi-Scene Video Foundations

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models [Website][PDF][Code]
Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu
CVPR 2025

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models [PDF]
Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu
ICLR 2025

Visually Descriptive Language Modeling for Vector Graphics Reasoning [PDF][Website][Code]
Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji
TMLR

The Law of Knowledge Overshadowing: Towards Understanding, Predicting and Preventing LLM Hallucination [PDF]
Yuji Zhang, Sha Li, Cheng Qian, Jiateng Liu, Pengfei Yu, Chi Han, Yi Fung, Kathleen McKeown, ChengXiang Zhai, Manling Li, Heng Ji
ACL 2025 Findings

Chain-of-Experts: Unlocking the Communication Power of MoEs [PDF][Blog][Code][td;lr]
Zihan Wang, Rui Pan, Jiarui Yao, Róbert Csordás, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [Website][PDF][Code][Data][Docker][PyPi][Doc]
Manling Li*, Shiyu Zhao*, Qineng Wang*, Kangrui Wang*, Yu Zhou*, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu
NeurIPS 2024 D&B Track
Oral (Top 0.6%)
Best Paper Award at SoCal NLP 2024, Top 0.4%

HourVideo: 1-Hour Video-Language Understanding [Website][PDF][Data][Code]
Keshigeyan Chandrasegaran, Agrim Gupta, Taran Kota, Lea M. Hadzic, Jimming He, Cristobal Eyzaguirre, Zane Durante, Manling Li, Jiajun Wu, Li Fei-Fei
NeurIPS 2024 D&B Track

IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos [Website][PDF][Data][Code]
Yunong Liu, Weiyu Liu, Shubh Khanna, Cristobal Eyzaguirre, Manling Li, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Jiajun Wu
NeurIPS 2024 D&B Track

LM-Steer: Word Embeddings Are Steers for Language Models [Website][PDF][Code][Live Demo][Slides][Poster]
Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji
ACL 2024
(Outstanding Paper Award at ACL 2024)

Why Does New Knowledge Create Messy Ripple Effects in LLMs? [PDF]
Jiaxin Qin, Zixuan Zhang, Chi Han, Pengfei Yu, Manling Li, Heng Ji
EMNLP 2024

Deep Concept Injection for Zero-shot Multimodal Reasoning [PDF]
Xudong Lin, Manling Li, Richard Zemel, Heng Ji, Shih-Fu Chang
EMNLP 2024

SmartBook: AI-Assisted Situation Report Generation [PDF]
Revanth Gangi Reddy, Yi Fung, Qi Zeng, Manling Li, Zihan Wang, Paul Sullivan, Heng Ji
arXiv

Controlling Object Existence Hallucinations in Large Vision Language Models [PDF]
Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li
arXiv

Event-centric Multimodal Knowledge Acquisition [PDF]
Manling Li
Thesis Committee: Heng Ji, Jiawei Han, Chengxiang Zhai, Shih-Fu Chang, Kyunghyun Cho
Thesis (ACL Inaugral Best Desseratation Award Honorable Mention)

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners [PDF] [Code]
Zhenhailong Wang*,Manling Li*, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
NeurIPS'22 (equal contribution)

CLIP-Event:Connecting Vision and Text with Event Structures [PDF] [Data] [Code]
Manling Li, Ruochen Xu, Shuohang Wang, Xudong Lin, Chenguang Zhu, Xuedong Huang, Heng Ji, Shih-Fu Chang
CVPR'22
(Oral, Top 4.1%)

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation [PDF] [Code/Data]
Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, David Liem, Ahmed Elsayed, Martha Palmer, Jasmine Rah, Clare Voss, Cynthia Schneider, Boyan Onyshkevych
NAACL'21: System Demonstrations
(Best Demo Paper Award at NAACL2021)

Connecting the Dots: Event Graph Schema Induction with Path Language Modeling [PDF] [Code/Data] [Slides]
Manling Li, Qi Zeng, Ying Lin, Kyunghyun Cho, Heng Ji, Jonathan May, Nathanael Chambers and Clare Voss
EMNLP'20

GAIA: A Fine-grained Multimedia Knowledge Extraction System [PDF] [Code] [Video]
Manling Li*, Alireza Zareian*, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare R. Voss, Dan Napierski, Marjorie Freedman
ACL'20
(Best Demo Paper Award at ACL2020)

Cross-media Structured Common Space for Multimedia Event Extraction [PDF] [Code] [Slides]
Manling Li*, Alireza Zareian*, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
ACL'20 pp.2557–2568

GAIA at SM-KBP 2020: A Dockerized Multi-media Multi-lingual Knowledge Extraction, Clustering, Temporal Tracking and Hypothesis Generation System [PDF] [Project]
Manling Li, Ying Lin, Tuan Manh Lai, Xiaoman Pan, Haoyang Wen, Sha Li, etc %Zhenhailong Wang, Pengfei Yu, Lifu Huang, Di Lu, Qingyun Wang, Haoran Zhang, Qi Zeng, Chi Han, Zixuan Zhang, Yujia Qin, Xiaodan Hu, Nikolaus Parulian, Daniel Campos, Heng Ji, Brian Chen, Xudong Lin, Alireza Zareian, Amith Ananthram, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Yifan Wang, Michael Spector, Mitchell DeHaven, Daniel Napierski, Marjorie Freedman, Pedro Szekely, Haidong Zhu, Ram Nevatia, Yang Bai, Yifan Wang, Ali Sadeghian, Haodi Ma, Daisy Zhe Wang
TAC-KBP: Text Analysis Conference Knowledge Base Population Workshop 2020
Rank 1st in the National Institute of Standards and Technology (NIST) Streaming Multimedia Knowledge Base Population (SM-KBP) 2020

Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization
[PDF] [Code]
Manling Li, Lingyu Zhang, Heng Ji, Rich Radke
ACL'19: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.2190–2196

GAIA at SM-KBP 2019: A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System [PDF] [Project]
Manling Li, Ying Lin, Ananya Subburathinam, Spencer Whitehead, Xiaoman Pan, Di Lu, Qingyun Wang, Tongtao Zhang, Lifu Huang, Heng Ji, Alireza Zareian, Hassan Akbari, Brian Chen, Bo Wu, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Jennifer Chen, Eric Berquist, Kexuan Sun, Xujun Peng, Ryan Gabbard Marjorie Freedman, Pedro Szekely, T.K. Satish Kumar, Arka Sadhu, Ram Nevatia, Miguel Rodriguez, Yifan Wang, Yang Bai, Ali Sadeghian, Daisy Zhe Wang
TAC-KBP: Text Analysis Conference Knowledge Base Population Workshop 2019
Rank 1st in the National Institute of Standards and Technology (NIST) Streaming Multimedia Knowledge Base Population (SM-KBP) 2019