Title: q-Learning in Continuous Time
Speaker: Prof. Xun Yu Zhou (周迅宇)
Affiliation: Columbia University, New York
Date: 2023-04-25
Time: 14:00-15:00
Venue: Tencent Meeting: 175 998 314 (password: 123456)
   
Abstract:

We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation. As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. We jointly characterize the q-function and the value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving the underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms and time-discretized conventional Q-learning algorithms. This is joint work with Yanwei Jia.
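For readers unfamiliar with the little q-function, the following minimal LaTeX sketch illustrates the two objects named in the abstract: the first-order expansion of the conventional Q-function over a short time step, and the Gibbs (softmax) policy generated from the q-function. The notation here (J^pi for the value function of policy pi, Delta t for the step size, gamma for the entropy-regularization temperature) is assumed for illustration and is not taken verbatim from the talk.

% Sketch only: notation assumed, not quoted from the talk.
% The (little) q-function is the first-order term of the conventional
% Q-function in the step size, and the improved policy is the Gibbs
% measure it generates.
\begin{align*}
  Q^{\pi}_{\Delta t}(t,x,a) &= J^{\pi}(t,x) + q^{\pi}(t,x,a)\,\Delta t + o(\Delta t), \\
  \pi'(a \mid t,x) &\propto \exp\!\Bigl(\tfrac{1}{\gamma}\, q^{\pi}(t,x,a)\Bigr).
\end{align*}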

学术海报.pdf

   
Serial number in the School's seminar series this year: 790
