Speaker: Prof. Chandrajit Bajaj , UT Austin
Abstract: A Hamiltonian represents the energy of a dynamical system in phase space with coordinates of position and momentum. The Hamilton’s equations of motion are obtainable as coupled symplectic differential equations. In this talk I shall show how optimized decision making (action sequences) can be obtained via a reinforcement learning problem wherein the agent interacts with the unknown environment to simultaneously learn a Hamiltonian surrogate and the optimal action sequences using Hamilton dynamics, by invoking the Pontryagin Maximum Principle. We use optimal control theory to define an optimal control gradient flow, which guides the reinforcement learning process of the agent to progressively optimize the Hamiltonian while simultaneously converging to the optimal action sequence. Extensions to stochastic Hamiltonians leading to stochastic action sequences and the free-energy principle shall also be discussed. This is joint work with Taemin Heo, Minh Nguyen.