26 Oct 2020

We present a general model for actor-critic methods that allows combining value function estimates as a means to further reduce the variance of the policy gradient and improve the learning result. We demonstrate the potential of this architecture by implementing an example case that learns several of the PyBullet continuous control robotic tasks through OpenAI Gym. By experimenting with a special case, we show the effect of the external parameters on the overall performance of the policy optimization algorithm.
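
To make the core idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration of using a combination of two value estimates as a variance-reducing baseline in a policy-gradient update. The two-critic design, the convex weight alpha, and all names here are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
# Hypothetical sketch: policy gradient with a combined value baseline.
# The convex combination of two critics via `alpha` is an illustrative
# assumption, not the exact method presented in the paper.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy for continuous control."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim))
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

class Critic(nn.Module):
    """State-value estimator V(s)."""
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def policy_gradient_loss(actor, critics, alpha, obs, act, returns):
    """Policy-gradient loss whose baseline is a convex combination of
    two critic estimates; a lower-variance baseline tightens the
    advantage estimate without biasing the gradient."""
    v1, v2 = critics[0](obs), critics[1](obs)
    baseline = alpha * v1 + (1.0 - alpha) * v2   # combined value estimate
    advantage = (returns - baseline).detach()    # baseline is not trained here
    logp = actor.dist(obs).log_prob(act).sum(-1)
    return -(logp * advantage).mean()

# Example usage with random tensors; in practice obs/act/returns come
# from rollouts in a PyBullet Gym environment such as "HopperBulletEnv-v0".
obs, act, returns = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32)
actor, critics = Actor(8, 2), [Critic(8), Critic(8)]
loss = policy_gradient_loss(actor, critics, 0.5, obs, act, returns)
loss.backward()
```

In such a setup, the mixing weight alpha (or a learned equivalent) would be one of the external parameters whose effect on overall performance the experiments examine.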
