John Aslanides authored
We weren't resetting the buffer state correctly when draining trajectories.

Note that several unrelated bugs initially masked this bug:

- This bug only shows up when `max_sequence_length` is shorter than the episode length, a scenario that is not currently covered by the agent integration tests (they only run against catch: episode length = 10, `max_sequence_length` = 32). I will fix this coverage issue in a follow-up change.
- This bug causes the actor-critic agents to crash on experiments with long episode lengths (e.g. cartpole, mountain_car). These crashes don't show up obviously in high-level benchmarking/analysis (radar plot) because crashed runs (i.e. DNFs) don't count as 'failures', and so do not adversely affect the score. I'll add a separate change to resolve this as well.

PiperOrigin-RevId: 304403772
Change-Id: I9dfc2f1b152737e4b10d8afde681e2dadcc85a6f
f779cf56
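To illustrate the class of bug described in this commit message, here is a minimal sketch of a fixed-capacity trajectory buffer whose `drain` method resets its internal state. This is not the repository's code: the class name `SequenceBuffer`, its methods, and the use of plain integers as transitions are all assumptions made for illustration. The sketch assumes the caller drains whenever the buffer is full or the episode ends, so an episode longer than the buffer capacity forces a drain mid-episode.

```python
class SequenceBuffer:
    """Hypothetical fixed-capacity buffer of transitions (illustration only)."""

    def __init__(self, max_sequence_length):
        self._max_sequence_length = max_sequence_length
        # Preallocated storage; only the first `_t` slots hold live data.
        self._storage = [None] * max_sequence_length
        self._t = 0  # Write index, i.e. the number of live transitions.

    def append(self, transition):
        if self.full():
            raise ValueError('Cannot append to a full buffer; drain it first.')
        self._storage[self._t] = transition
        self._t += 1

    def drain(self):
        if self.empty():
            raise ValueError('Cannot drain an empty buffer.')
        # Slicing copies the live prefix, so later writes cannot alter it.
        trajectory = self._storage[:self._t]
        # Reset the write index. Skipping this step would leave the buffer
        # reporting itself as full after a mid-episode drain, so the next
        # append would raise.
        self._t = 0
        return trajectory

    def empty(self):
        return self._t == 0

    def full(self):
        return self._t == self._max_sequence_length


# A 7-step "episode" with capacity 3: the buffer fills twice mid-episode.
buffer = SequenceBuffer(max_sequence_length=3)
trajectories = []
for step in range(7):
    buffer.append(step)
    episode_ended = step == 6
    if buffer.full() or episode_ended:
        trajectories.append(buffer.drain())

print(trajectories)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In this sketch, a test that only uses episodes shorter than the buffer capacity would never exercise the mid-episode drain path, which is one way such a bug could stay hidden.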