Learning Objectives

By the end of the course, students will be able to:

  • Define the key features of RL that distinguish it from standard ML;
  • Identify the strengths and limitations of various reinforcement learning algorithms;
  • Formulate and solve sequential decision-making problems by applying relevant reinforcement learning tools;
  • Recognize the common ground connecting optimization and RL;
  • Generalize or discover “new” applications, algorithms, or theories of reinforcement learning, as a step toward conducting independent research on the topic.

Course Content

  • Week 1
    Introduction
    • Learning objectives and course logistics
    • An overview of RL
    • A primer on optimization
  • Week 2
    Dynamic Programming and Linear Programming
    • Markov Decision Processes (MDPs)
    • Bellman Equations and Bellman Optimality
    • Value/Policy Iteration (see the sketch after this outline)
    • Linear Programming
  • Week 3
    Value-based RL
    • From Planning to Reinforcement Learning
    • Model-free Prediction
    • Model-free Control (see the sketch after this outline)
    • Function Approximation
    • Convergence Analysis
  • Week 4
    Policy-based RL I (Algorithms)
    • Overview of Policy-based RL
    • Policy Gradient Estimation (see the sketch after this outline)
    • Policy Gradient Methods (PG)
    • Natural Policy Gradient (NPG)
    • Beyond PG: TRPO, PPO, etc.
  • Week 5
    Policy-based RL II (Theory)
    • Performance Difference Lemma (see the statement after this outline)
    • Global Convergence of Policy Gradient Methods
    • Global Convergence of Natural Policy Gradient Methods
    • Remarks on Sample Efficiency
  • Week 6
    Multi-agent RL and Markov Games
    • RL From Single Agent to Multiple Agents
    • Preliminaries: Normal Form Games and Repeated Games
    • Markov Games and Algorithms
    • Zero-Sum Markov Games and Algorithms
  • Week 7
    Imitation Learning
    • Offline Imitation Learning: Behavior Cloning (see the sketch after this outline)
    • Online Interactive Imitation Learning: DAgger, AggreVaTe
    • Inverse Reinforcement Learning: Feature Expectation Matching, Max-Ent IRL
    • Generative Adversarial Imitation Learning (GAIL)
  • Week 8
    Deep RL
    • Algorithms:
      • Actor-Critic Methods
      • Overview of Deep RL
      • Value-based Deep RL
      • Policy-based/Actor-Critic Deep RL
    • Theory:
      • From DL Theory to Deep RL Theory
      • Convergence Analysis of Neural TD-learning
      • Convergence Analysis of Neural Actor-Critic
  • Week 9
    Going Beyond: Model-based RL, Offline RL, Many-agent RL
    • Model-based RL
    • Offline RL
    • Many-agent RL
    • Summary and Outlook
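
To make a few of the topics above concrete, the sketches below give minimal, illustrative implementations; the environments, rewards, data, and hyperparameters in them are hypothetical placeholders, not part of the course materials.

Week 2 (Value/Policy Iteration): a value-iteration sketch on a made-up two-state MDP, repeatedly applying the Bellman optimality backup until the values reach an (approximate) fixed point.

    # Minimal value iteration on a hypothetical two-state, two-action MDP.
    # P, R, and gamma are illustrative assumptions.
    import numpy as np

    gamma = 0.9
    # P[s, a, s'] = transition probability; R[s, a] = expected reward.
    P = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.3, 0.7]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    V = np.zeros(2)
    for _ in range(1000):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:  # stop near the fixed point
            break
        V = V_new

    print("V* ≈", V, "| greedy policy:", Q.argmax(axis=1))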
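
Week 3 (Model-free Control): a tabular Q-learning sketch on the same hypothetical MDP, replacing the known model with sampled transitions and an epsilon-greedy behavior policy.

    # Minimal tabular Q-learning; the environment and hyperparameters are
    # the same illustrative assumptions as in the sketch above.
    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.3, 0.7]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    gamma, alpha, eps = 0.9, 0.1, 0.1

    Q = np.zeros((2, 2))
    s = 0
    for _ in range(20000):
        # epsilon-greedy action selection
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        s_next = rng.choice(2, p=P[s, a])  # sample a transition from the MDP
        # Q-learning update: bootstrap from the greedy value at s_next
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

    print("Q ≈", Q.round(2), "| greedy policy:", Q.argmax(axis=1))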
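
Week 4 (Policy Gradient Estimation): a REINFORCE-style score-function estimator for a softmax policy on a made-up three-armed bandit, using the identity grad log pi(a) = e_a − pi for the softmax parameterization.

    # Minimal REINFORCE sketch on a hypothetical 3-armed bandit;
    # mean rewards and step size are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    mean_rewards = np.array([1.0, 0.5, -0.5])  # hypothetical arm means
    theta = np.zeros(3)                        # softmax logits

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for _ in range(2000):
        pi = softmax(theta)
        a = rng.choice(3, p=pi)
        r = mean_rewards[a] + rng.normal(scale=0.1)  # noisy reward sample
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0            # grad log pi(a) = e_a - pi
        theta += 0.05 * r * grad_log_pi  # stochastic gradient ascent step

    print("learned policy:", softmax(theta).round(3))  # should favor arm 0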
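
Week 5 (Performance Difference Lemma): the version of the lemma usually covered (e.g., in [AJK20]) states that, for any two policies π and π′ in a γ-discounted MDP,

    V^π(s₀) − V^{π′}(s₀) = (1 / (1 − γ)) · E_{s ~ d^π_{s₀}, a ~ π(·|s)} [ A^{π′}(s, a) ],

where d^π_{s₀} is the normalized discounted state-visitation distribution of π started from s₀ and A^{π′} is the advantage function of π′. This identity is the workhorse behind the global-convergence results for PG and NPG covered that week.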
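
Week 7 (Behavior Cloning): offline imitation learning reduced to supervised learning, fitting a policy to expert state-action pairs. The "expert" below is a hypothetical threshold rule on a 1-D state; the logistic policy and data are illustrative only.

    # Minimal behavior-cloning sketch: logistic policy fit to fake expert data.
    import numpy as np

    rng = np.random.default_rng(1)
    states = rng.uniform(-1, 1, size=500)          # expert-visited states
    expert_actions = (states > 0).astype(float)    # hypothetical expert rule

    w, b = 0.0, 0.0  # policy: pi(a=1|s) = sigmoid(w*s + b)
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(w * states + b)))
        err = p - expert_actions     # gradient of the cross-entropy loss
        w -= 0.5 * np.mean(err * states)
        b -= 0.5 * np.mean(err)

    acc = np.mean((p > 0.5) == (expert_actions > 0.5))
    print(f"cloned-policy training accuracy: {acc:.2f}")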

Recommended References

There is no required textbook. Lectures and class discussions are based primarily on classic and recent papers on the topic.

RL textbooks:

  • [S09] Algorithms for Reinforcement Learning, by Csaba Szepesvári, 2009.
  • [SB18] Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto, 2018.
  • [B19] Reinforcement Learning and Optimal Control, by Dimitri P. Bertsekas, 2019.
  • [AJK20] Reinforcement Learning: Theory and Algorithms, by Alekh Agarwal, Nan Jiang, and Sham M. Kakade, 2020.
  • [M21] Control Systems and Reinforcement Learning, by Sean Meyn, Cambridge University Press, 2021.
  • [KWM22] Algorithms for Decision Making, by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray, MIT Press, 2022.

Optimization foundations:

ML/AI foundations:

Conferences and Workshop Proceedings

Useful Links