In this paper we investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for the iterative PDEs involved in the algorithm (see, e.g., Huang-Wang-Zhou (2023)), we provide a simple proof from scratch of the convergence of the PIA. Our approach builds on probabilistic representation formulae for solutions of PDEs and their derivatives. Moreover, in the finite horizon model and in the infinite horizon model with a large discount factor, similar arguments lead to a super-exponential rate of convergence with little additional work. Finally, with some extra effort we show that our approach can be extended to the diffusion control case in the one-dimensional setting, again with a super-exponential rate of convergence. In addition, building upon this approach, we address model uncertainty and propose a new algorithm that does not require the explicit functional form of the coefficients, relying instead on observable values. This significantly enhances the applicability of our results in practical scenarios.
Gaozhan Wang20250526.pdf