    Calculate best episode using full episode return in cartpole_swingup. · beb16302
    DeepMind authored
    Return is non-monotonic in this problem; the current code cherry-picks the peak return reached during the episode instead of the return at the end of the episode.
    
    Also applied the same change to base cartpole for consistency and efficiency; cartpole return is monotonic, so the old behaviour was not a bug there.
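
The difference between the two ways of scoring an episode can be illustrated with a minimal sketch. This is not the code touched by the commit: the function names and the reward sequences are hypothetical, chosen only to show why taking the peak of the running return overstates performance when return is non-monotonic, and why it makes no difference when return is monotonic.

```python
def peak_running_return(rewards):
    """Highest value the cumulative return reaches at any step (the cherry-picked score)."""
    total = 0.0
    best = float("-inf")
    for reward in rewards:
        total += reward
        best = max(best, total)
    return best


def full_episode_return(rewards):
    """Cumulative return at the end of the episode."""
    return sum(rewards)


# Hypothetical reward sequences, for illustration only.
swingup_like = [1.0, 1.0, -1.0, -1.0]   # return rises, then falls (non-monotonic)
cartpole_like = [1.0, 1.0, 1.0, 1.0]    # return only ever increases (monotonic)

for name, rewards in [("swingup_like", swingup_like), ("cartpole_like", cartpole_like)]:
    print(name, peak_running_return(rewards), full_episode_return(rewards))
```

For the non-monotonic sequence the two scores differ; for the monotonic one they coincide, which matches the commit message's remark that base cartpole was not affected.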
    
    PiperOrigin-RevId: 308033113
    Change-Id: I9add00d41f8e87d518e00c3fef9cd9ad7ad18d0b