CS 496 Agent AI

Spring 2025

Time: Monday 2:00pm-4:50pm, Apr 1-Jun 7, 2025
Location: Technological Institute L160; some external talks, project presentations, and discussions will be held over Zoom
Instructor: Prof. Manling Li (Email: manling.li@northwestern.edu)
TA: Jiahao Yu (Email: jiahao.yu@northwestern.edu)
Instructor and TA Office Hours: Instructor office hours are Monday 9:00am-10:00am (in person) and may move to Zoom due to travel schedules. TA office hours are on Monday and Wednesday over Zoom (please contact the TA at jiahao.yu@northwestern.edu for details).
Course Google Folder: announced on Canvas.
Assignment Submission: on Canvas: https://canvas.northwestern.edu/courses/230363

Course Summary: This comprehensive course explores two major categories of AI agents: web-based agents that interact with digital environments and embodied agents that operate in physical spaces. Students will learn to design and implement both types of agents, understanding their unique challenges and capabilities, while mastering the integration of LLMs with various interaction modalities.

Prerequisites:

  • Introduction to Machine Learning
  • Python Programming
  • Basic Robotics or Computer Vision
  • Linear Algebra
  • Probability and Statistics

Students who complete this course will be able to:

  • Design web agents that navigate digital environments
  • Design embodied agents for physical interaction
  • Create robust perception and action systems
  • Control decision

Course Syllabus:

Week Topic Details
Week 1 Introduction to Agent AI
  • Definition and Overview of Agents
  • Markov Decision Process (MDP)
  • Agent Formulation based on MDP
  • Role of Large Language Models (LLMs)
    • Different roles of LLMs in perception and interaction with environments.
    • LLMs as reasoning engines for decision-making.
Week 2 Agent Learning Mechanisms
  • Agent Architectures
  • Self-supervised Finetuning
  • Reinforcement learning in agent control.
  • Inference-time scaling for agents.
  • LLMs in Agent Learning
    • Prompt-based few-shot and zero-shot learning for agents.
    • Using LLMs to generate or refine reward functions.
    • LLMs as planners or policy advisors for RL agents.
Week 3 Reasoning and Planning in Agent Models
  • Logical reasoning for agents.
  • Integration of world knowledge.
  • Classical Planning: Search-based planning (BFS, DFS). Optimal algorithms: A*, Dijkstra.
  • Classical Planning: Logic-based planning (STRIPS, PDDL).
Week 4 Benchmarking and Evaluation
  • Evaluation Metrics: Task performance, generalization, and safety.
  • Ethical considerations in agent evaluation.
  • Benchmark Datasets and Tasks
    • Web agent benchmarks: WebArena, WebGPT, CrawlBot.
    • Embodied benchmarks: BEHAVIOR, Habitat, ALFRED.
Week 5 Web Agents
  • Web-Based Environments for Agents: HTML, APIs, and web crawling.
  • Examples of Web Agents
  • LLMs in Web Agents
    • Context understanding and dialogue generation.
    • Fine-tuning LLMs for domain-specific agents.
    • Challenges: Multimodal web understanding.
Week 6 Embodied Agents
  • Foundations of Embodied Intelligence: What defines an embodied agent?
  • Simulated Environments: OpenAI Gym, Habitat, BEHAVIOR, MuJoCo.
  • Tasks for embodied agents (navigation, manipulation).
  • Examples of Embodied Agents: Human-robot interaction; Autonomous driving and drones.
  • LLMs in Embodied Agents
    • Embedding reasoning and planning into embodied systems.
    • Hierarchical decision-making with LLMs.
    • Challenges: Grounding language in physical environments.
Week 7 Embodied Agents Advanced Topics
  • Diffusion Models
  • Vision-Language-Action Models
  • Large World Models
Week 8 Multi-Agent Systems
  • Multi-agent collaboration and negotiation.
  • Emergent behavior in multi-agent settings.
  • Multi-Agent Planning
    • Task allocation and shared planning.
    • Auction-based and distributed algorithms.
  • Examples: Swarm robotics, multi-user web agents.
Week 9 Ethics and Safety in Agent AI
  • Bias in agent decision-making.
  • Social Norms in Agent AI
  • Trustworthiness: Ensuring transparency and accountability.
  • Scaling agents for real-world applications.
Week 10 Final Project Presentations
  • Building a web agent with LLM integration.
  • Creating embodied agents in simulated environments.
  • Evaluating agent performance using real-world scenarios.

Grading:

  • Weekly Reading
    • 20pts in total
    • Submit a paragraph on one paper each week.
  • Mid-Term Exams
    • 30pts in total
    • Open-book exam with open-ended questions on three papers.
  • Term Project
    • 50pts in total:
      • 8pts project proposal (5pts report, 3pts lightning talk)
      • 12pts mid-term project report (8pts report, 4pts milestone presentation)
      • 30pts final project report (20pts report, 10pts presentation)
    • The instructor will provide 10 topics for students to choose from. Students form their own teams, and each team should consist of 3-6 students. Everyone is encouraged to submit papers based on their term projects. By default, all team members receive the same project score, but individual members may receive a higher or lower score based on individual performance, which is assessed in two ways: (1) contribution to the final deliverables (e.g., Git commits and the Final Project Report), and (2) the instructor's and TAs' assessment of the project presentations.