bsuite/experiments/cartpole_swingup/cartpole_swingup.py · master · AIcrowd / research / bsuite

Apr 23, 2020

DeepMind authored Apr 23, 2020

Return is non-monotonic in this problem, so the current code cherry-picks the peak return reached during the episode.

Also applied the same change to base cartpole for consistency and efficiency; cartpole return is monotonic, so this was not a bug there.

PiperOrigin-RevId: 308033113
Change-Id: I9add00d41f8e87d518e00c3fef9cd9ad7ad18d0b

beb16302
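
To illustrate the distinction the commit message draws, here is a minimal sketch of how a peak-of-running-return score can differ from the end-of-episode return when rewards may be negative. The reward sequences and the function names (`running_returns`, `peak_return`, `final_return`) are hypothetical and chosen only for illustration; this is not the bsuite code touched by the commit.

```python
from itertools import accumulate


def running_returns(rewards):
    """Cumulative return after each step of one episode."""
    return list(accumulate(rewards))


def peak_return(rewards):
    """Highest cumulative return seen at any step (assumes at least one step)."""
    return max(running_returns(rewards))


def final_return(rewards):
    """Cumulative return at the end of the episode."""
    return sum(rewards)


# Made-up rewards with negative steps: the running return rises, then falls.
swingup_like = [1.0, 1.0, -0.5, -0.5, 1.0, -2.0]

# Made-up non-negative rewards: the running return never decreases.
cartpole_like = [1.0, 1.0, 1.0, 1.0]

for name, rewards in [("swingup_like", swingup_like),
                      ("cartpole_like", cartpole_like)]:
    print(name, "peak:", peak_return(rewards), "final:", final_return(rewards))
```

With non-negative rewards the two scores coincide, which matches the message's remark that the base cartpole case was not a bug. With mixed-sign rewards the peak can overstate the return actually achieved by the end of the episode.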
