Paper
10 November 2022
Policy improvement in dynamic programming
Zhewen Zhang
Proceedings Volume 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022); 123483Q (2022) https://doi.org/10.1117/12.2641811
Event: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 2022, Zhuhai, China
Abstract
Policy improvement has a long history and is an essential element of dynamic programming. Its methods fall into four general categories: heuristic methods, approximation methods, sampling methods, and numerical improvement. Alongside these classic policy improvement methods, several variants are also introduced, including lambda policy iteration, path-integral policy improvement, high-confidence policy improvement, and finite-sample analysis of SARSA with linear function approximation. This paper introduces these policy improvement methods, evaluates them, and compares them from three perspectives: training speed, sampling efficiency, and method capability.
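For context, the classical policy improvement step in dynamic programming replaces the current policy with one that acts greedily with respect to the current value function. The following is a minimal sketch for a tabular MDP; the array layouts and names (transition tensor P, reward matrix R, discount gamma) are illustrative assumptions, not code from the paper.

import numpy as np

def policy_improvement(P, R, V, gamma=0.9):
    """Greedy policy improvement for a tabular MDP.

    P: transition probabilities, shape (S, A, S')  -- assumed layout
    R: expected immediate rewards, shape (S, A)    -- assumed layout
    V: current state-value estimates, shape (S,)
    Returns the greedy policy as an array of action indices, shape (S,).
    """
    # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    # Act greedily with respect to the one-step lookahead values.
    return np.argmax(Q, axis=1)

# Tiny usage example on a 2-state, 2-action MDP (illustrative numbers).
if __name__ == "__main__":
    P = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.0, 1.0]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    V = np.zeros(2)
    print(policy_improvement(P, R, V))  # greedy action index per state

Alternating this greedy step with policy evaluation yields policy iteration; the categories surveyed in the paper differ mainly in how the evaluation and the improvement step are approximated or sampled.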
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhewen Zhang "Policy improvement in dynamic programming", Proc. SPIE 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 123483Q (10 November 2022); https://doi.org/10.1117/12.2641811
KEYWORDS
Monte Carlo methods
Computer programming
Statistical analysis
Numerical analysis
Algorithm development
Data modeling
Machine learning