    f779cf56
    Fix sequence buffer bug and add some test coverage. · f779cf56
    John Aslanides authored
    We weren't resetting the buffer state correctly when draining trajectories.
    
    Note that several unrelated bugs initially masked this bug:
    - This bug only shows up when `max_sequence_length` is shorter than the episode length, a scenario that the agent integration tests do not currently cover (they only run against catch: episode len = 10, max_sequence_length=32); I will fix this coverage issue in a follow-up change.
    - This bug causes the actor-critic agents to crash on experiments with long episode lengths (e.g. cartpole, mountain_car). These crashes are not obvious in the high-level benchmarking/analysis (radar plot) because crashed runs (i.e. DNFs) are not counted as 'failures' and therefore do not lower the score; I'll add a separate change to resolve this as well.
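
For illustration only, here is a minimal sketch of the behaviour the fix is after: draining hands back the stored transitions and leaves the buffer empty, so the next sequence starts from a clean state. The `SequenceBuffer` class, its methods, and the `run_episodes` helper are hypothetical names made up for this sketch and are not the actual code touched by this change. The helper exercises episodes both shorter and longer than the maximum sequence length.

```python
class SequenceBuffer:
    """Hypothetical toy buffer holding at most `max_sequence_length` items."""

    def __init__(self, max_sequence_length):
        self._max_sequence_length = max_sequence_length
        self._transitions = []

    def __len__(self):
        return len(self._transitions)

    def full(self):
        return len(self._transitions) == self._max_sequence_length

    def append(self, transition):
        if self.full():
            raise ValueError("Cannot append to a full buffer; drain it first.")
        self._transitions.append(transition)

    def drain(self):
        """Return the stored transitions and reset the buffer state."""
        trajectory = self._transitions
        # Resetting here is the important part: without it, the next
        # sequence would start on top of stale transitions.
        self._transitions = []
        return trajectory


def run_episodes(num_episodes, episode_length, max_sequence_length):
    """Fill and drain the buffer; return the length of each drained trajectory."""
    buffer = SequenceBuffer(max_sequence_length)
    lengths = []
    for episode in range(num_episodes):
        for step in range(episode_length):
            buffer.append((episode, step))
            if buffer.full():
                lengths.append(len(buffer.drain()))
        # Drain whatever is left at the end of the episode.
        if len(buffer) > 0:
            lengths.append(len(buffer.drain()))
    return lengths
```

Under these assumptions, a test that only uses short episodes (length 10 with a maximum sequence length of 32) never fills the buffer mid-episode, which is why a second test with long episodes is worth having.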
    
    PiperOrigin-RevId: 304403772
    Change-Id: I9dfc2f1b152737e4b10d8afde681e2dadcc85a6f