RL Theory Reading Group (2022 Winter)


Course: CS6789: Foundations of Reinforcement Learning (Cornell) (taught entirely in English)
Course description: This is an advanced, theory-heavy course. Before starting, students should have a solid grasp of linear algebra, probability theory, optimization, and machine learning.
Start date: January 2022
How to participate: First, make sure you want to take part. Then add the organizer on WeChat to join the WeChat group. Each participant will present at least one set of slides.

Meeting Times


Every Monday, 8:30-9:30 AM
Every Thursday, 8:30-9:30 AM

Staff


Organizer: Yanjie Ze (SJTU)
Email: zeyanjie@sjtu.edu.cn
WeChat ID: zeyanjie

Meeting Information

The reading group will meet via Tencent Meeting; the meeting link will be posted in the WeChat group.

If you are not enrolled or waitlisted but want to have access, please email zeyanjie@sjtu.edu.cn to ask for permission. We will make a decision based on the capacity of the class and your research background.

Course Notes: RL Theory and Algorithms

The course will be largely based on the working draft of the book "Reinforcement Learning Theory and Algorithms" (referred to as AJKS), available here. These notes will be updated throughout the term. If you find typos or errors, please let us know. We would appreciate it!

Schedule (tentative)

Date | Lecture | Reading | Slides/HW | Lecturer | Video
01/24 | Fundamentals: Markov Decision Processes | Ch.1 | Slides, Annotated slides, HW0 | Yanjie Ze | bilibili
01/27 | Fundamentals: Policy Iteration and Value Iteration | Ch.1 | Slides, Annotated slides | Qi Liu | bilibili
01/30 | Fundamentals: Computational Complexity & The LP-Formulation | Ch.1 | Slides, Annotated slides | Jiaqi Xue | bilibili
02/03 | Fundamentals: Statistical Limits with a Generative Model | Ch.2 | Slides, Annotated slides | Zhuowen Zheng | TBD
02/07 | Fundamentals: Generalization in RL | Ch.5 | Slides_1, Slides_2, Annotated Slides_1, Annotated Slides_2 | Tianyi Bai | TBD
02/10 | Fundamentals: Lower Bounds (the Linear Q* Assumption) & Bellman Completeness | Ch.3 | Slides_2_continued, Slides_Bellman_complete, Slides_2_continued_annotated, HW1 | Zhouliang Yu | TBD
02/14 | Fundamentals: Linear Bellman Completeness (continued) | Ch.3 | Slides, Annotated Slides | Youcheng Li | TBD
02/17 | Fundamentals: Fitted Dynamic Programming | Ch.4 | Slides, Annotated Slides | Yan Zhang | TBD
TBD | Exploration: Multi-armed Bandits / Linear Bandits | Ch.6 | Slides, Annotated Slides | Fanpeng Meng | TBD
TBD | Exploration: Linear Bandits | Ch.6 | Slides, Annotated slides | Tiancheng Fang | TBD
TBD | Exploration: Efficient Exploration in Tabular MDPs | Ch.7 | Slides, Annotated Slides | Yanjie Ze | TBD
TBD | Exploration: Efficient Exploration in Linear MDPs | Ch.8 | Slides, Annotated Slides | Qi Liu | TBD
TBD | Exploration: Linear MDPs (continued) | | Slides, Annotated Slides | Jiaqi Xue | TBD
TBD | Exploration: Learning in Large Scale MDPs (Bellman rank) | Ch.9 | Slides, Annotated Slides | Zhuowen Zheng | TBD
TBD | Exploration: Learning in Large Scale MDPs (Bellman rank, continued) | | Slides, Annotated Slides | Tianyi Bai | TBD
TBD | Policy Optimization: Policy Gradient and Convergence | Ch.11 | Slides, Annotated Slides, HW2 | Zhouliang Yu | TBD
TBD | Policy Optimization: Global Convergence | Ch.12 | Slides, Annotated Slides | Youcheng Li | TBD
TBD | Policy Optimization: Natural Policy Gradient (NPG) and its Global Convergence | Ch.12 | Slides, Annotated Slides | Yan Zhang | TBD
TBD | Policy Optimization: NPG and Function Approximation | Ch.12 | Slides, Annotated Slides | Fanpeng Meng | TBD
TBD | Policy Optimization: Function Approximation | Ch.13 | Slides, Annotated Slides | Tiancheng Fang | TBD
TBD | Control: Linear Quadratic Regulators (LQRs) | Ch.16 | Slides, Annotated Slides | TBD | TBD
TBD | Control: Convex Parameterization for Linear Systems and Online Control | Ch.16 | Slides, Annotated Slides | TBD | TBD
TBD | Offline RL: Recent Advancements in Offline RL | | Slides, HW3 | TBD | TBD
TBD | Imitation Learning: Behavior Cloning and Distribution Matching | | Slides | TBD | TBD
TBD | Imitation Learning: Interactive Imitation Learning | | Slides | TBD | TBD