Evaluator service timeout: return done[__all__] and -1 reward
Instead of terminating the session on a timeout, the evaluator should terminate the episode by setting done[__all__], and the reward should be -1.
If possible, also merge the episode action saving code from evaluator2 in neurips2020_flatland_baselines.
Questions:
- Can the done[__all__] response to step be generated outside of the env?
- Or do we need genuine observations?
- Should clients be able to handle empty observations for done agents?
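
To make the first question concrete, here is a minimal sketch of a step response built without calling the env. It is not the evaluator's actual code: the function name `build_timeout_step_response`, the use of `None` as the empty observation, the per-agent reward of -1, and the empty info dict are all assumptions for illustration.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical helper, not part of the existing evaluator service.
def build_timeout_step_response(
    agent_handles: Iterable[int],
    timeout_reward: float = -1.0,
) -> Tuple[Dict[int, Any], Dict[int, float], Dict[Any, bool], Dict[str, Any]]:
    """Build an (observations, rewards, dones, infos) tuple for a timed-out episode."""
    handles = list(agent_handles)

    # Assumption: clients accept None as an empty observation for done agents.
    observations = {handle: None for handle in handles}

    # Assumption: every agent receives the timeout reward.
    rewards = {handle: timeout_reward for handle in handles}

    # Mark each agent as done and set the "__all__" flag to end the episode.
    dones: Dict[Any, bool] = {handle: True for handle in handles}
    dones["__all__"] = True

    # Assumption: an empty info dict is acceptable here.
    infos: Dict[str, Any] = {}

    return observations, rewards, dones, infos


if __name__ == "__main__":
    obs, rewards, dones, infos = build_timeout_step_response(range(3))
    print(dones)
    print(rewards)
```

This only works if clients tolerate empty observations, so it depends on the answer to the third question. If genuine observations are required, the response has to come from the env instead.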