11 May 2022

In this paper, we investigate the exploration-exploitation dilemma in reinforcement learning algorithms. We adapt information-directed sampling (IDS), an exploration framework that measures the information gain of a policy, to continuous reinforcement learning. To stabilize the off-policy learning process and further improve sample efficiency, we propose a randomized learning target and dynamically adjust the update-to-data ratio for different parts of the neural network model. Experiments show that our approach significantly outperforms existing methods and successfully completes tasks with highly sparse reward signals.
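The abstract only names the IDS framework, so the following sketch illustrates the general idea rather than the paper's actual method: IDS-style action selection over a finite set of candidate actions, where an ensemble of Q-functions supplies both an expected-regret estimate and an ensemble-variance proxy for information gain. All names here (ids_select, q_ensemble, candidate_actions) and the variance proxy are assumptions made for illustration.

    # A minimal sketch of IDS-style action selection for continuous control.
    # Assumes an ensemble of Q-functions; all names are illustrative, not
    # taken from the paper.
    import numpy as np

    def ids_select(q_ensemble, state, candidate_actions, eps=1e-8):
        """Pick the candidate action minimizing the IDS ratio
        (expected regret)^2 / information gain.

        q_ensemble: list of callables q(state, action) -> float
        candidate_actions: array of shape (n_candidates, action_dim)
        """
        # Q-value of each candidate under each ensemble member,
        # shape (n_members, n_candidates).
        q_values = np.array([[q(state, a) for a in candidate_actions]
                             for q in q_ensemble])
        mean_q = q_values.mean(axis=0)   # epistemic mean per action
        var_q = q_values.var(axis=0)     # ensemble disagreement per action
        regret = mean_q.max() - mean_q   # expected regret vs. best candidate
        # Ensemble variance stands in for information gain; eps avoids
        # division by zero for actions the ensemble agrees on.
        ratio = regret ** 2 / (var_q + eps)
        return candidate_actions[np.argmin(ratio)]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Five toy linear Q-functions standing in for a critic ensemble.
        q_ensemble = [
            (lambda s, a, w=rng.normal(size=3): float(w @ a)) for _ in range(5)
        ]
        candidates = rng.uniform(-1.0, 1.0, size=(32, 3))
        action = ids_select(q_ensemble, state=None, candidate_actions=candidates)
        print("IDS-selected action:", action)

The key property of the ratio is that it favors actions whose regret is small relative to how much the ensemble disagrees about them, so exploration concentrates where the model is both plausible and uncertain.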
