We study the exploratory Hamilton--Jacobi--Bellman (HJB) equation arising from the entropy-regularized exploratory control problem, which was formulated by Wang, Zariphopoulou and Zhou (J. Mach. Learn. Res., 21, 2020) in the context of reinforcement learning in continuous time and space. We establish the well-posedness and regularity of the viscosity solution to the equation, and we derive an explicit rate of convergence as the exploration weight diminishes to zero. Time permitting, I will also discuss the analysis of a policy iteration algorithm for this control problem. This talk is based on joint works with Xun Yu Zhou, Hung Tran and Wenpin Tang.
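For context, a schematic form of the exploratory HJB equation studied in the talk, written here under assumed one-dimensional notation (drift $b$, diffusion $\sigma$, running reward $r$, temperature $\lambda$, and relaxed controls $\pi$ ranging over densities on the action space $U$), is
\[
  v_t(t,x) + \sup_{\pi \in \mathcal{P}(U)} \int_U \Bigl[ b(x,u)\, v_x(t,x) + \tfrac{1}{2}\,\sigma^2(x,u)\, v_{xx}(t,x) + r(x,u) - \lambda \ln \pi(u) \Bigr]\, \pi(u)\, \mathrm{d}u = 0,
\]
whose pointwise maximizer is the Gibbs density
\[
  \pi^*(u \mid t,x) \propto \exp\!\Bigl( \tfrac{1}{\lambda}\bigl[ b(x,u)\, v_x(t,x) + \tfrac{1}{2}\,\sigma^2(x,u)\, v_{xx}(t,x) + r(x,u) \bigr] \Bigr);
\]
the exploration weight $\lambda$ tending to zero is the regime in which the convergence rate above is derived.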