11 May 2022

In this paper, we investigate the exploration-exploitation dilemma in reinforcement learning algorithms. We adapt information-directed sampling (IDS), an exploration framework that measures the information gain of a policy, to continuous reinforcement learning. To stabilize the off-policy learning process and further improve sample efficiency, we propose a randomized learning target and dynamically adjust the update-to-data ratio for different parts of the neural network model. Experiments show that our approach significantly outperforms existing methods and successfully completes tasks with highly sparse reward signals.
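The abstract only names the IDS framework, so the following sketch illustrates the general idea rather than the paper's actual method: IDS-style action selection over a finite set of candidate actions, where an ensemble of Q-functions supplies both an expected-regret estimate and an ensemble-variance proxy for information gain. All names here (ids_select, q_ensemble, candidate_actions) and the variance proxy are assumptions made for illustration.

    # A minimal sketch of IDS-style action selection for continuous control.
    # Assumes an ensemble of Q-functions; all names are illustrative, not
    # taken from the paper.
    import numpy as np

    def ids_select(q_ensemble, state, candidate_actions, eps=1e-8):
        """Pick the candidate action minimizing the IDS ratio
        (expected regret)^2 / information gain.

        q_ensemble: list of callables q(state, action) -> float
        candidate_actions: array of shape (n_candidates, action_dim)
        """
        # Q-value of each candidate under each ensemble member,
        # shape (n_members, n_candidates).
        q_values = np.array([[q(state, a) for a in candidate_actions]
                             for q in q_ensemble])
        mean_q = q_values.mean(axis=0)   # epistemic mean per action
        var_q = q_values.var(axis=0)     # ensemble disagreement per action
        regret = mean_q.max() - mean_q   # expected regret vs. best candidate
        # Ensemble variance stands in for information gain; eps avoids
        # division by zero for actions the ensemble agrees on.
        ratio = regret ** 2 / (var_q + eps)
        return candidate_actions[np.argmin(ratio)]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Five toy linear Q-functions standing in for a critic ensemble.
        q_ensemble = [
            (lambda s, a, w=rng.normal(size=3): float(w @ a)) for _ in range(5)
        ]
        candidates = rng.uniform(-1.0, 1.0, size=(32, 3))
        action = ids_select(q_ensemble, state=None, candidate_actions=candidates)
        print("IDS-selected action:", action)

The key property of the ratio is that it favors actions whose regret is small relative to how much the ensemble disagrees about them, so exploration concentrates where the model is both plausible and uncertain.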
