Speaker Bio: Zhihan is a fourth-year PhD student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Prof. Maryam Fazel. His research interests are broadly in statistics, optimization, and machine learning.
Abstract: We propose Compatible Mirror Policy Optimization (CoMPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$ norm to measure function approximation errors (regardless of the mirror map), CoMPO uses the Bregman divergence induced by the specific mirror map for policy projection. This compatibility bridges the gap between theory and practice: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing them with strong convergence guarantees.
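For readers unfamiliar with the term, the sketch below illustrates what "the Bregman divergence induced by a mirror map" means. It uses the standard definition $D_h(p, q) = h(p) - h(q) - \langle \nabla h(q), p - q \rangle$ and checks two textbook cases: the negative-entropy mirror map yields the KL divergence on the probability simplex, while the half squared Euclidean norm yields half the squared $L_2$ distance. This is not the CoMPO algorithm itself; the function names and the example distributions are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def bregman(h, grad_h, p, q):
    """Bregman divergence D_h(p, q) = h(p) - h(q) - <grad h(q), p - q>."""
    return h(p) - h(q) - np.dot(grad_h(q), p - q)

# Mirror map 1 (illustrative): negative entropy, h(x) = sum_i x_i log x_i.
def neg_entropy(x):
    return np.sum(x * np.log(x))

def grad_neg_entropy(x):
    return np.log(x) + 1.0

# Mirror map 2 (illustrative): half squared Euclidean norm, h(x) = 0.5 * ||x||^2.
def half_sq_norm(x):
    return 0.5 * np.dot(x, x)

def grad_half_sq_norm(x):
    return x

# Two hypothetical action distributions on a 3-action simplex.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

d_entropy = bregman(neg_entropy, grad_neg_entropy, p, q)
d_euclid = bregman(half_sq_norm, grad_half_sq_norm, p, q)

# Closed forms that the two Bregman divergences should match.
kl = np.sum(p * np.log(p / q))
half_l2_sq = 0.5 * np.sum((p - q) ** 2)

print("entropy mirror map:  ", d_entropy, "vs KL:", kl)
print("Euclidean mirror map:", d_euclid, "vs 0.5*||p-q||^2:", half_l2_sq)
```

The two divergences generally disagree on the same pair of distributions, which is why the way the approximation error is measured can depend on the choice of mirror map.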