DeepMind authored
Return is non-monotonic in this problem, so the existing logic cherry-picks the peak return reached during the episode. Also applied the same change to base cartpole for consistency and efficiency; cartpole return is monotonic, so that case was not a bug.

PiperOrigin-RevId: 308033113
Change-Id: I9add00d41f8e87d518e00c3fef9cd9ad7ad18d0b
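To illustrate why a non-monotonic return makes peak tracking misleading, here is a minimal sketch. It is not the code touched by this commit: the function names and reward sequences are hypothetical, chosen only to contrast the peak running return with the return at the end of the episode.

```python
from typing import Sequence


def peak_running_return(rewards: Sequence[float]) -> float:
    """Highest running return seen at any step (the cherry-picked value).

    Hypothetical helper; assumes a non-empty reward sequence.
    """
    total = 0.0
    best = float("-inf")
    for reward in rewards:
        total += reward
        best = max(best, total)
    return best


def final_return(rewards: Sequence[float]) -> float:
    """Return accumulated over the whole episode."""
    return float(sum(rewards))


# Made-up rewards: costs or negative rewards make the running return
# rise and then fall, so the peak overstates the episode's outcome.
non_monotonic_rewards = [1.0, 1.0, -1.0, -1.0]

# Made-up rewards: with non-negative rewards the running return never
# decreases, so the peak and the final return coincide.
monotonic_rewards = [1.0, 1.0, 1.0, 1.0]
```

With non-negative rewards the two functions agree, which matches the note that the base cartpole case was not a bug; they diverge only when the running return can decrease.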
Commit beb16302