| Date | Lecture | Reading | Slides/HW | Lecturer | Video |
|---|---|---|---|---|---|
| 01/24 | Fundamentals: Markov Decision Processes | Ch. 1 | Slides, Annotated Slides, HW0 | Yanjie Ze | bilibili |
| 01/27 | Fundamentals: Policy Iteration and Value Iteration | Ch. 1 | Slides, Annotated Slides | Qi Liu | bilibili |
| 01/30 | Fundamentals: Computational Complexity & the LP Formulation | Ch. 1 | Slides, Annotated Slides | Jiaqi Xue | bilibili |
| 02/03 | Fundamentals: Statistical Limits with a Generative Model | Ch. 2 | Slides, Annotated Slides | Zhuowen Zheng | TBD |
| 02/07 | Fundamentals: Generalization in RL | Ch. 5 | Slides_1, Slides_2, Annotated Slides_1, Annotated Slides_2 | Tianyi Bai | TBD |
| 02/10 | Fundamentals: Lower Bounds (the Linear Q\* Assumption) & Bellman Completeness | Ch. 3 | Slides_2_continued, Slides_Bellman_complete, Slides_2_continued_annotated, HW1 | Zhouliang Yu | TBD |
| 02/14 | Fundamentals: Linear Bellman Completion (Continued) | Ch. 3 | Slides, Annotated Slides | Youcheng Li | TBD |
| 02/17 | Fundamentals: Fitted Dynamic Programming | Ch. 4 | Slides, Annotated Slides | Yan Zhang | TBD |
| TBD | Exploration: Multi-armed Bandits / Linear Bandits | Ch. 6 | Slides, Annotated Slides | Fanpeng Meng | TBD |
| TBD | Exploration: Linear Bandits | Ch. 6 | Slides, Annotated Slides | Tiancheng Fang | TBD |
| TBD | Exploration: Efficient Exploration in Tabular MDPs | Ch. 7 | Slides, Annotated Slides | Yanjie Ze | TBD |
| TBD | Exploration: Efficient Exploration in Linear MDPs | Ch. 8 | Slides, Annotated Slides | Qi Liu | TBD |
| TBD | Exploration: Linear MDPs (Continued) | | Slides, Annotated Slides | Jiaqi Xue | TBD |
| TBD | Exploration: Learning in Large-Scale MDPs (Bellman Rank) | Ch. 9 | Slides, Annotated Slides | Zhuowen Zheng | TBD |
| TBD | Exploration: Learning in Large-Scale MDPs (Bellman Rank, Continued) | | Slides, Annotated Slides | Tianyi Bai | TBD |
| TBD | Policy Optimization: Policy Gradient and Convergence | Ch. 11 | Slides, Annotated Slides, HW2 | Zhouliang Yu | TBD |
| TBD | Policy Optimization: Global Convergence | Ch. 12 | Slides, Annotated Slides | Youcheng Li | TBD |
| TBD | Policy Optimization: Natural Policy Gradient (NPG) and its Global Convergence | Ch. 12 | Slides, Annotated Slides | Yan Zhang | TBD |
| TBD | Policy Optimization: NPG and Function Approximation | Ch. 12 | Slides, Annotated Slides | Fanpeng Meng | TBD |
| TBD | Policy Optimization: Function Approximation | Ch. 13 | Slides, Annotated Slides | Tiancheng Fang | TBD |
| TBD | Control: Linear Quadratic Regulators (LQRs) | Ch. 16 | Slides, Annotated Slides | TBD | TBD |
| TBD | Control: Convex Parameterization for Linear Systems and Online Control | Ch. 16 | Slides, Annotated Slides | TBD | TBD |
| TBD | Offline RL: Recent Advancements in Offline RL | | Slides, HW3 | TBD | TBD |
| TBD | Imitation Learning: Behavior Cloning and Distribution Matching | | Slides | TBD | TBD |
| TBD | Imitation Learning: Interactive Imitation Learning | | Slides | TBD | TBD |