Publications

Please find my latest publications on Google Scholar.


2025


RAGEN (RL-Agent): Training Agents by Reinforcing Reasoning [Website][Code][Tweet]
Zihan Wang*, Kangrui Wang*, Qineng Wang*, Pingyue Zhang*, Linjie Li*, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
Open-source Project: LLM Agent Training via Reinforcement Learning, featuring:
- RL training for LLM Agents under an MDP formulation
- Clean, Minimized Code Organization
- Multi-Env, Multi-Action
- Secret Sauce for RL Training algorithms
- Easy-to-use Agent Environment Protocol

VAGEN: Training VLM Agents with Multi-Turn Reinforcement Learning [Blog][Code][Tweet]
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Qineng Wang*, Linjie Li*, Zhengyuan Yang, Chi Wan, Yiping Lu, Manling Li
Open-source Project: VAGEN is a multi-turn reinforcement learning framework designed specifically for training VLM Agents. VAGEN leverages the TRICO algorithm to efficiently train VLMs for visual agentic tasks.

Chain-of-Experts: Unlocking the Communication Power of MoEs [Blog][Code][Tweet]
Zihan Wang, Rui Pan, Lu Yin, Manling Li, Shiwei Liu
Open-source Project: CoE achieves 17.6% to 42% lower memory usage and reduces the validation loss on the Math task from 1.20 to 1.12 at comparable compute.

Re-thinking Temporal Search for Long-Form Video Understanding [Website][PDF][Data][Code]
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models [Website][PDF][Code]
Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu
CVPR 2025

Foundation Models Meet Embodied Agents [Website/Slides/Videos]
Manling Li, Yunzhu Li, Jiayuan Mao, Wenlong Huang
AAAI 2025: Tutorial
NAACL 2025: Tutorial

Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models [PDF]
Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu
ICLR 2025

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas [PDF][Code][Data]
Shiqi Chen, Tongyao Zhu, Ruochen Zhou, Jinghan Zhang, Siyang Gao, Juan Carlos Niebles, Mor Geva, Junxian He, Jiajun Wu, Manling Li
arXiv

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents [Website][PDF][Code]
Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang
arXiv

SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering [Website][PDF][Code][Data]
Xuehang Guo, Xingyao Wang, Yangyi Chen, Sha Li, Chi Han, Manling Li, Heng Ji
arXiv


2024


Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [Website][PDF][Code][Data][Docker][PyPi][Doc]
Manling Li*, Shiyu Zhao*, Qineng Wang*, Kangrui Wang*, Yu Zhou*, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu
NeurIPS 2024 Benchmark Track (Oral, Top 0.6%)
(Best Paper Award at SoCal NLP 2024, Top 0.4%)

HourVideo: 1-Hour Video-Language Understanding [Website][PDF][Data][Code]
Keshigeyan Chandrasegaran, Agrim Gupta, Taran Kota, Lea M. Hadzic, Jimming He, Cristobal Eyzaguirre, Zane Durante, Manling Li, Jiajun Wu, Li Fei-Fei
NeurIPS 2024 Benchmark Track

IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos [Website][PDF][Data][Code]
Yunong Liu, Weiyu Liu, Shubh Khanna, Cristobal Eyzaguirre, Manling Li, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Jiajun Wu
NeurIPS 2024 Benchmark Track

Visually Descriptive Language Modeling for Vector Graphics Reasoning [PDF][Website][Code]
Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji
arXiv

LM-Steer: Word Embeddings Are Steers for Language Models [Website][PDF][Code][Live Demo][Slides][Poster]
Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji
ACL 2024
(Outstanding Paper Award at ACL 2024)

Why Does New Knowledge Create Messy Ripple Effects in LLMs? [PDF]
Jiaxin Qin, Zixuan Zhang, Chi Han, Pengfei Yu, Manling Li, Heng Ji
EMNLP 2024

Deep Concept Injection for Zero-shot Multimodal Reasoning [PDF]
Xudong Lin, Manling Li, Richard Zemel, Heng Ji, Shih-Fu Chang
EMNLP 2024

Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models [PDF]
Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi Fung, Jing Li, Manling Li, Heng Ji
arXiv

MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders [PDF]
Cheng Li, May Fung, Qingyun Wang, Chi Han, Manling Li, Jindong Wang, Heng Ji
arXiv

Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate [PDF]
Kyungha Kim*, Sangyun Lee*, Kung-Hsiang Huang*, Hou Pong Chan, Manling Li, Heng Ji
arXiv

InfoPattern: Unveiling Information Propagation Patterns in Social Media [PDF]
Chi Han*, Jialiang Xu*, Manling Li*, Hanning Zhang*, Tarek Abdelzaher, Heng Ji
arXiv

SmartBook: AI-Assisted Situation Report Generation [PDF]
Revanth Gangi Reddy, Yi Fung, Qi Zeng, Manling Li, Zihan Wang, Paul Sullivan, Heng Ji
arXiv

Controlling Object Existence Hallucinations in Large Vision Language Models [PDF]
Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li
arXiv


2023


Event-centric Multimodal Knowledge Acquisition [PDF]
Manling Li
Thesis Committee: Heng Ji, Jiawei Han, Chengxiang Zhai, Shih-Fu Chang, Kyunghyun Cho
Thesis

ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation [PDF]
Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji
EMNLP 2023

Defining a New NLP Playground [PDF]
Sha Li, Chi Han, Pengfei Yu, Carl Edwards, Manling Li, Xingyao Wang, Yi Fung, Charles Yu, Joel R. Tetreault, Eduard H Hovy, Heng Ji
EMNLP 2023 Findings

Knowledge-Driven Vision-Language Encoding [Website]
Manling Li, Xudong Lin, Jie Lei, Mohit Bansal, Carl Vondrick, Shih-Fu Chang, Heng Ji
CVPR 2023: Tutorial

Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval [PDF] [Code]
Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang
CVPR 2023

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting [Website]
Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Manling Li, Yangqiu Song, Carl Yang
NeurIPS 2023

Non-Sequential Graph Script Induction via Multimedia Grounding [PDF]
Yu Zhou, Sha Li, Manling Li, Xudong Lin, Shih-Fu Chang, Mohit Bansal and Heng Ji
ACL 2023 ( denotes supervised undergraduate)

A Language First Approach to Procedure Planning [PDF]
Jiateng Liu, Sha Li, Zhenhailong Wang, Manling Li, Heng Ji
ACL 2023 Findings ( denotes supervised undergraduate)

Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification [PDF]
Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch and Jiawei Han
ACL 2023 ( denotes supervised undergraduate)

Multimedia Generative Script Learning for Task Planning [PDF]
Qingyun Wang, Manling Li, Hou Pong Chan, Lifu Huang, Julia Hockenmaier, Girish Chowdhary and Heng Ji
ACL 2023 Findings

Learning to Decompose Visual Features with Latent Textual Prompts [PDF] [Code]
Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alexander Schwing, Heng Ji
ICLR 2023 ( denotes supervised undergraduate)

Knowledge-Driven Vision-Language Pretraining [PDF] [Website]
Manling Li, Xudong Lin, Jie Lei, Mohit Bansal, Shih-Fu Chang, Heng Ji
AAAI 2023: Tutorial

Video Event Extraction via Tracking Visual States of Arguments [PDF] [Code]
Guang Yang, Manling Li, Jiajie Zhang, Xudong Lin, Shih-Fu Chang, Heng Ji
AAAI 2023 ( denotes supervised undergraduate)

ADEPT: A DEbiasing PrompT Framework [PDF] [Code]
Ke Yang, Charles Yu, Yi Fung, Manling Li, Heng Ji
AAAI 2023 ( denotes supervised undergraduate)


2022


Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners [PDF] [Code]
Zhenhailong Wang*, Manling Li*, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
NeurIPS'22 (equal contribution)

CLIP-Event: Connecting Vision and Text with Event Structures [PDF] [Data] [Code]
Manling Li, Ruochen Xu, Shuohang Wang, Xudong Lin, Chenguang Zhu, Xuedong Huang, Heng Ji, Shih-Fu Chang
CVPR'22 (Oral, Top 4.1%)

COVID-19 Claim Radar: A Structured Claim Extraction and Tracking System [PDF] [Code] [Demo] [Video]
Manling Li, Revanth Gangi Reddy, Ziqi Wang, Yi-Shyuan Chiang, Tuan M. Lai, Pengfei Yu, Zixuan Zhang, Heng Ji
ACL'22 Demo

Event Schema Induction with Double Graph Autoencoders [PDF] [Code]
Xiaomeng Jin, Manling Li and Heng Ji
NAACL'22

New Frontiers of Information Extraction [PDF] [Website] [Slides] [Videos]
Muhao Chen, Lifu Huang, Manling Li, Ben Zhou, Heng Ji
NAACL'22: Tutorial

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding [PDF] [Data]
Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avi Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji
AAAI'22


2021


The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction [PDF] [Data]
Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han and Clare Voss
EMNLP'21

Timeline Summarization based on Event Graph Compression via Time-Aware Optimal Transport [PDF] [Data]
Manling Li, Tengfei Ma, Mo Yu, Lingfei Wu, Tian Gao, Heng Ji and Kathleen McKeown
EMNLP'21

Joint Multimedia Event Extraction from Video and Article [PDF] [Data]
Brian Chen, Xudong Lin, Christopher Thomas, Manling Li, Shoya Yoshida, Lovish Chum, Heng Ji and Shih-Fu Chang
EMNLP'21 Findings

Event-centric Natural Language Processing [PDF] [Slides]
Muhao Chen, Hongming Zhang, Qiang Ning, Manling Li, Heng Ji, Kathleen McKeown and Dan Roth
ACL'21: Tutorial

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation [PDF] [Code/Data]
Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, David Liem, Ahmed Elsayed, Martha Palmer, Jasmine Rah, Clare Voss, Cynthia Schneider, Boyan Onyshkevych
NAACL'21: System Demonstrations
(Best Demo Paper Award at NAACL 2021)

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System [PDF] [Code]
Haoyang Wen, Ying Lin, Tuan M. Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi R. Fung, Piyush Mishra, Qing Lyu, Dídac Surís, Brian Chen, Susan W. Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang and Heng Ji
NAACL'21: System Demonstrations

Event-centric Natural Language Processing [PDF] [Slides]
Muhao Chen, Hongming Zhang, Qiang Ning, Manling Li, Heng Ji and Dan Roth
AAAI'21: Tutorial


2020


Connecting the Dots: Event Graph Schema Induction with Path Language Modeling [PDF] [Code/Data] [Slides]
Manling Li, Qi Zeng, Ying Lin, Kyunghyun Cho, Heng Ji, Jonathan May, Nathanael Chambers and Clare Voss
EMNLP'20: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing.

GAIA: A Fine-grained Multimedia Knowledge Extraction System [PDF] [Code] [Video]
Manling Li*, Alireza Zareian*, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare R. Voss, Dan Napierski, Marjorie Freedman
ACL'20: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pp. 77–86
(Best Demo Paper Award at ACL 2020)

Cross-media Structured Common Space for Multimedia Event Extraction [PDF] [Code] [Slides]
Manling Li*, Alireza Zareian*, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
ACL'20: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 2557–2568

GAIA at SM-KBP 2020 - A Dockerized Multi-media Multi-lingual Knowledge Extraction, Clustering, Temporal Tracking and Hypothesis Generation System [PDF] [Project]
Manling Li, Ying Lin, Tuan Manh Lai, Xiaoman Pan, Haoyang Wen, Sha Li, Zhenhailong Wang, Pengfei Yu, Lifu Huang, Di Lu, Qingyun Wang, Haoran Zhang, Qi Zeng, Chi Han, Zixuan Zhang, Yujia Qin, Xiaodan Hu, Nikolaus Parulian, Daniel Campos, Heng Ji, Brian Chen, Xudong Lin, Alireza Zareian, Amith Ananthram, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Yifan Wang, Michael Spector, Mitchell DeHaven, Daniel Napierski, Marjorie Freedman, Pedro Szekely, Haidong Zhu, Ram Nevatia, Yang Bai, Yifan Wang, Ali Sadeghian, Haodi Ma, Daisy Zhe Wang
TAC-KBP: Text Analysis Conference Knowledge Base Population Workshop 2020 (Ranked 1st on the leaderboard.)


2019


Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization [PDF] [Code]
Manling Li, Lingyu Zhang, Heng Ji, Rich Radke
ACL'19: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2190–2196

Multilingual Entity, Relation, Event and Human Value Extraction [PDF] [Code] [Video]
Manling Li, Ying Lin, Joe Hoover, Spencer Whitehead, Clare Voss, Morteza Dehghani, Heng Ji
NAACL'19: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 110–115

GAIA at SM-KBP 2019 - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System [PDF] [Project]
Manling Li, Ying Lin, Ananya Subburathinam, Spencer Whitehead, Xiaoman Pan, Di Lu, Qingyun Wang, Tongtao Zhang, Lifu Huang, Heng Ji, Alireza Zareian, Hassan Akbari, Brian Chen, Bo Wu, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Jennifer Chen, Eric Berquist, Kexuan Sun, Xujun Peng, Ryan Gabbard, Marjorie Freedman, Pedro Szekely, T.K. Satish Kumar, Arka Sadhu, Ram Nevatia, Miguel Rodriguez, Yifan Wang, Yang Bai, Ali Sadeghian, Daisy Zhe Wang
TAC-KBP: Text Analysis Conference Knowledge Base Population Workshop 2019 (Ranked 1st, scoring more than 10% higher than the second-place team.)


2018 and Before


Please see my full list at Google Scholar.