RL Theory Reading Group (2022 Winter)

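Course material: CS6789: Foundations of Reinforcement Learning (Cornell) (entirely in English)
Course description: This is an advanced, theory-heavy course. Before starting, students should have a solid grasp of linear algebra, probability theory, optimization, and machine learning.
Start date: January 2022
How to join: First, confirm that you want to participate. Then add the organizer on WeChat to join the WeChat group. Each participant will present at least one set of slides.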

Meeting Times

Every Monday, 8:30-9:30 AM
Every Thursday, 8:30-9:30 AM

Staff

Organizer: Yanjie Ze (SJTU)
Email: zeyanjie@sjtu.edu.cn
WeChat ID: zeyanjie

Meeting Information

The discussion group will meet on Tencent Meeting; the meeting details will be posted in the WeChat group.

If you are not enrolled or waitlisted but want access, please email zeyanjie@sjtu.edu.cn to ask for permission. We will decide based on the capacity of the class and your research background.

Course Notes: RL Theory and Algorithms

The course will be largely based on the working draft of the book "Reinforcement Learning Theory and Algorithms", available here. We will be updating these notes in AJKS throughout the course of the term. If you find typos or errors, please let us know. We would appreciate it!

Schedule (tentative)

Date	Lecture	Reading	Slides/HW	Lecturer	Video
01/24	Fundamentals: Markov Decision Processes	Ch.1	Slides, Annotated slides, HW0	Yanjie Ze	bilibili
01/27	Fundamentals: Policy Iteration and Value Iteration	Ch.1	Slides, Annotated slides	Qi Liu	bilibili
01/30	Fundamentals: Computational Complexity & The LP-Formulation	Ch.1	Slides, Annotated slides	Jiaqi Xue	bilibili
02/03	Fundamentals: Statistical Limits with a Generative Model	Ch.2	Slides, Annotated slides	Zhuowen Zheng	TBD
02/07	Fundamentals: Generalization in RL	Ch.5	Slides_1, Slides_2, Annotated Slides_1, Annotated Slides_2	Tianyi Bai	TBD
02/10	Fundamentals: Lower Bounds (the Linear Q* Assumption) & Bellman Completeness	Ch.3	Slides_2_continued, Slides_Bellman_complete, Slides_2_continued_annotated, HW1	Zhouliang Yu	TBD
02/14	Fundamentals: Linear Bellman Completion (Continued)	Ch.3	Slides, Annotated Slides	Youcheng Li	TBD
02/17	Fundamentals: Fitted Dynamic Programming	Ch.4	Slides, Annotated Slides	Yan Zhang	TBD
TBD	Exploration: Multi-armed Bandits / Linear Bandits	Ch.6	Slides, Annotated Slides	Fanpeng Meng	TBD
TBD	Exploration: Linear Bandits	Ch.6	Slides, Annotated slides	Tiancheng Fang	TBD
TBD	Exploration: Efficient Exploration in Tabular MDPs	Ch.7	Slides, Annotated Slides	Yanjie Ze	TBD
TBD	Exploration: Efficient Exploration in Linear MDPs	Ch.8	Slides, Annotated Slides	Qi Liu	TBD
TBD	Exploration: Linear MDP (Continued)		Slides, Annotated Slides	Jiaqi Xue	TBD
TBD	Exploration: Learning in Large Scale MDPs (Bellman rank)	Ch.9	Slides, Annotated Slides	Zhuowen Zheng	TBD
TBD	Exploration: Learning in Large Scale MDPs (Bellman rank Continued)		Slides, Annotated Slides	Tianyi Bai	TBD
TBD	Policy Optimization: Policy Gradient and Convergence	Ch.11	Slides, Annotated Slides, HW2	Zhouliang Yu	TBD
TBD	Policy Optimization: Global Convergence	Ch.12	Slides, Annotated Slides	Youcheng Li	TBD
TBD	Policy Optimization: Natural Policy Gradient (NPG) and its Global Convergence	Ch.12	Slides, Annotated Slides	Yan Zhang	TBD
TBD	Policy Optimization: NPG and Function Approximation	Ch.12	Slides, Annotated Slides	Fanpeng Meng	TBD
TBD	Policy Optimization: Function Approximation	Ch.13	Slides, Annotated Slides	Tiancheng Fang	TBD
TBD	Control: Linear Quadratic Regulators (LQRs)	Ch.16	Slides, Annotated Slides	TBD	TBD
TBD	Control: Convex parameterization for linear systems and online control	Ch.16	Slides, Annotated Slides	TBD	TBD
TBD	Offline RL: Recent Advancements in Offline RL		Slides, HW3	TBD	TBD
TBD	Imitation Learning: Behavior Cloning and Distribution Matching		Slides	TBD	TBD
TBD	Imitation Learning: Interactive Imitation Learning		Slides	TBD	TBD