Learning Objectives

By the end of the course, students will be able to:

  • Define the key features of reinforcement learning that distinguish it from standard machine learning.
  • Identify the strengths and limitations of various reinforcement learning algorithms.
  • Formulate and solve sequential decision-making problems by applying relevant reinforcement learning tools.
  • Recognize the connections and distinctions between optimization and reinforcement learning.
  • Extend or discover new applications, algorithms, or theories of reinforcement learning, as a step toward independent research on the topic.

Course Content

  • Week 1
    Introduction
    • Learning objectives and course logistics
    • An overview of RL
    • A primer on optimization
  • Week 2
    Dynamic Programming and Linear Programming
    • Markov Decision Processes (MDPs)
    • Bellman Equations and Bellman Optimality
    • Value/Policy Iteration
    • Linear Programming
  • Week 3
    Value-based RL
    • From Planning to Reinforcement Learning
    • Model-free Prediction
    • Model-free Control
    • Function Approximation
    • Convergence Analysis
  • Week 4
    Policy-based RL I (Algorithms)
    • Overview of Policy-based RL
    • Policy Gradient Estimation
    • Policy Gradient Methods (PG)
    • Natural Policy Gradient (NPG)
    • Beyond PG: TRPO, PPO, etc.
  • Week 5
    Policy-based RL II (Theory)
    • Performance Difference Lemma
    • Global Convergence of Policy Gradient Methods
    • Global Convergence of Natural Policy Gradient Methods
    • Remarks on Sample Efficiency
  • Week 6
    Multi-agent RL and Markov Games
    • RL From Single Agent to Multiple Agents
    • Preliminaries: Normal Form Games and Repeated Games
    • Markov Games and Algorithms
    • Zero-Sum Markov Games and Algorithms
  • Week 7
    Imitation Learning
    • Offline Imitation Learning: Behavior Cloning
    • Online Interactive Imitation Learning: DAGGER, AggreVaTe
    • Inverse Reinforcement Learning: Feature Expectation Matching, Max-Ent IRL
    • Generative Adversarial Imitation Learning (GAIL)
  • Week 8
    Deep RL
    • Algorithms:
      • Overview of Deep RL
      • Value-based Deep RL
      • Policy-based/Actor-Critic Deep RL
    • Theory:
      • From DL Theory to Deep RL Theory
      • Convergence Analysis of Neural TD-learning
      • Convergence Analysis of Neural Actor-Critic
  • Week 9
    Going Beyond: Model-based RL, Offline RL, Many-agent RL
    • Model-based RL
    • Offline RL
    • Many-agent RL
    • Summary and Outlook
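As a small preview of the Week 2 material (MDPs, Bellman optimality, and value iteration), here is a minimal sketch on a toy two-state MDP. The transition table, discount factor, and tolerance below are illustrative assumptions, not course code:

```python
# Value iteration on a toy 2-state, 2-action MDP (illustrative only).
# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.7, 1, 1.0), (0.3, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9  # discount factor


def value_iteration(P, gamma, tol=1e-8):
    """Iterate the Bellman optimality backup until the sup-norm change is < tol."""
    V = {s: 0.0 for s in P}
    while True:
        # V_new(s) = max_a sum_{s'} p(s'|s,a) * (r + gamma * V(s'))
        V_new = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new


V = value_iteration(P, gamma)

# Greedy policy extracted from the (near-)optimal value function.
greedy = {
    s: max(P[s], key=lambda a, s=s: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
```

Here the fixed point can be checked by hand: taking action 1 in state 1 yields reward 2 forever, so V(1) = 2 / (1 - 0.9) = 20, and the greedy policy selects action 1 in both states.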

Recommended References

There is no required textbook. Lectures and class discussions are mostly based on classical and recent papers on the topic.

RL textbooks

  • [S09] Algorithms for Reinforcement Learning, Csaba Szepesvari, 2009.
  • [SB18] Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto, 2018.
  • [B19] Reinforcement Learning and Optimal Control, Dimitri P. Bertsekas, 2019.
  • [AJK20] Reinforcement Learning: Theory and Algorithms, Alekh Agarwal, Nan Jiang, and Sham M. Kakade, 2020.
  • [M21] Control Systems and Reinforcement Learning, Sean Meyn, Cambridge University Press, 2021.
  • [KWM22] Algorithms for Decision Making, Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray, MIT Press, 2022.

Optimization foundations

ML and AI foundations

Conferences and workshop proceedings

Useful Links

People