CIS
IEEE Members: Free
Non-members: Free
Length: 00:48:52
Approximation of cost functions within a low-dimensional space of basis functions is a major approach in approximate dynamic programming. It may be implemented by well-established methods such as temporal differences, aggregation, and Bellman error methods. We show that all of these methods can be viewed within a unified framework, based on an extended form of the Galerkin approximation approach that involves projected equations. The extension differs from the classical Galerkin approach in two major ways: first, the implementation is simulation-based, and second, the projection is done using a (weighted) Euclidean seminorm (rather than a norm). This extension carries over to weighted multistep projected Bellman equations, similar to those of multistep TD(λ)-type methods. An important new feature is that the associated weights need not be geometrically distributed and may be state-dependent. This allows greater flexibility to design simulation methods with desirable bias-variance and exploration characteristics, in the context of standard and optimistic policy iteration methods. Moreover, it allows us to establish a close connection between projected equation and aggregation methods, and to develop for the first time multistep aggregation methods of the TD(λ) type.
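To make the weighted multistep projected Bellman equation concrete, the following is a minimal sketch, not the method presented in the talk. It assumes a fixed policy with a known transition matrix P, cost vector g, and discount factor alpha, and it solves the projected equation exactly by matrix computation rather than by simulation. The multistep weights are state-independent here for simplicity, and the function names, the example chain, and the basis matrix Phi are all hypothetical. The projection weights xi may contain zeros (a seminorm), provided the resulting linear system remains nonsingular.

```python
import numpy as np

def multistep_operator(P, g, alpha, weights):
    """Return (A, b) with T_w J = A J + b, where
    T_w = sum_l weights[l-1] * T^l and T J = g + alpha * P J.
    The weights are assumed nonnegative and summing to 1."""
    n = P.shape[0]
    A = np.zeros((n, n))
    b = np.zeros(n)
    M = np.eye(n)           # holds (alpha * P)^l, starting at l = 0
    partial = np.zeros(n)   # holds sum_{k < l} (alpha * P)^k g
    for w in weights:
        partial = partial + M @ g   # cost accumulated over the first l steps
        M = alpha * (P @ M)         # advance to (alpha * P)^l
        A += w * M
        b += w * partial
    return A, b

def solve_projected_equation(Phi, xi, P, g, alpha, weights):
    """Solve Phi' Xi (Phi r - T_w(Phi r)) = 0 for r, i.e. the projected
    equation Phi r = Pi T_w(Phi r) under the xi-weighted Euclidean
    (semi)norm. Assumes the resulting matrix C is nonsingular."""
    A, b = multistep_operator(P, g, alpha, weights)
    Xi = np.diag(xi)
    C = Phi.T @ Xi @ (np.eye(len(g)) - A) @ Phi
    d = Phi.T @ Xi @ b
    return np.linalg.solve(C, d)

# Hypothetical 3-state chain and a 2-dimensional basis (illustrative values only).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
g = np.array([1.0, 0.0, 2.0])
alpha = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
xi = np.array([0.5, 0.5, 0.0])  # zero weight on state 3: a seminorm, not a norm

# Geometric (TD(lambda)-like, truncated and renormalized) versus non-geometric weights.
lam = 0.7
geometric = np.array([(1 - lam) * lam ** l for l in range(50)])
geometric = geometric / geometric.sum()
non_geometric = np.array([0.2, 0.0, 0.8])

r_geometric = solve_projected_equation(Phi, xi, P, g, alpha, geometric)
r_non_geometric = solve_projected_equation(Phi, xi, P, g, alpha, non_geometric)
```

With a full basis (Phi equal to the identity) and strictly positive xi, any choice of weights summing to 1 recovers the exact cost vector, since that vector is a fixed point of every power of T. The choice of weights only matters once the basis is restricted, which is where the flexibility described above comes in.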