Reinforcement Learning / Fall 2025
Updates
- 2025-11-14
  New Lecture is up: Temporal-Difference Learning [slides]
- 2025-11-07
  New Assignment released: [MP #2 - Markov Decision Process]
- 2025-11-07
  New Lecture is up: Monte Carlo Methods [slides]
- 2025-10-31
  New Lecture is up: Dynamic Programming [slides]
- 2025-10-24
  New Lecture is up: Markov Decision Process [slides]
- 2025-10-18
  New Assignment released: [MP #1 - Bandit Algorithms]
- 2025-10-17
  We have included an explanation of our bonus point policy in each assignment to help you better understand how the points are added up.
Course Description
This is an undergraduate-level course on reinforcement learning. We will discuss the foundations of reinforcement learning, starting from multi-armed bandits and moving on to Markov decision processes, planning, on-policy and off-policy learning, and recent developments in the context of deep learning.
You can use this QR code to join our WeChat group for notifications and group discussions. Please set your in-group alias to 'class number + name' (班号+姓名), e.g., '计33-王宏宁'.

Time: Friday, 3:20pm-4:55pm, Teaching Building 4 (四教), Room 4104.
Office Hours: To keep our discussions well prepared and effective, please request a meeting 24 hours in advance via email or WeChat.
1. Instructor, Monday, 4pm-5pm, FIT 3-520;
2. TA, Tuesday, 11am-12pm, FIT 4-504.
