MAX_EPISODE_STEPS: Workaround?

Flatland can limit the maximum number of steps in an episode. Once this limit is reached, the "game is over". I found that this feature was enabled through a workaround, and I don't think that is a good idea. (The code has been in the codebase for many weeks.)

I don't think we can simply set all agents' done signals to true. Most of our applications only check whether all agents are done: if they are, the main loop stops and the application ends. Semantically, however, this is completely wrong, because not all agents reached their goal, so they are not done. If I want to train my multi-agent system under the assumption that each agent has to reach its target before a given time limit, the environment always reports that all agents are done. Of course, the faster the agents reach their targets, the better the rewards get. But I would prefer to introduce an explicit signal that says "max number of steps reached, game over".
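A minimal sketch of the explicit signal proposed above, assuming a design in which the per-agent `dones` stay `True` only for agents that actually reached their target, and a separate flag reports that the step limit was hit. `StepResult`, `make_step_result` and `max_steps_reached` are hypothetical names, not part of the Flatland API. The split is similar to the `terminated`/`truncated` distinction in Gymnasium's `step()` API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StepResult:
    """Hypothetical step result that keeps "agent reached its target"
    separate from "episode cut off by the step limit"."""
    dones: Dict[int, bool]           # True only for agents that reached their target
    max_steps_reached: bool = False  # explicit "max steps reached, game over" signal

    @property
    def all_agents_done(self) -> bool:
        # Every agent really finished (no step limit involved).
        return all(self.dones.values())

    @property
    def episode_over(self) -> bool:
        # The main loop stops if every agent is done OR the step limit was hit.
        return self.all_agents_done or self.max_steps_reached


def make_step_result(dones: Dict[int, bool],
                     elapsed_steps: int,
                     max_episode_steps: Optional[int] = None) -> StepResult:
    """Builds a StepResult without overwriting any per-agent done flag."""
    limit_hit = max_episode_steps is not None and elapsed_steps >= max_episode_steps
    return StepResult(dones=dict(dones), max_steps_reached=limit_hit)


# Usage: agent 1 has not reached its target when the limit is hit.
result = make_step_result({0: True, 1: False}, elapsed_steps=100, max_episode_steps=100)
print(result.episode_over, result.all_agents_done)
```

With this split the main loop can still stop on `episode_over`, while training code can tell an agent that reached its target apart from one that was cut off by the time limit.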

https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/rail_env.py#L464

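The relevant code sets `dones["__all__"]` to `True`, so all agents appear to be done (percentage of dones == 1):

```python
# flatland/envs/rail_env.py: once the step limit is reached,
# the episode is forced to end by setting the "__all__" done flag
if (self._max_episode_steps is not None) and (self._elapsed_steps >= self._max_episode_steps):
    self.dones["__all__"] = True
```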
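A small sketch of why this distorts the evaluation metric. It assumes (the evaluator code is not shown here) that the percentage complete is the fraction of per-agent `dones` that are `True`, and it models what the two `dones` dictionaries in the output below suggest: once the limit is hit, every flag ends up `True`. `percentage_complete` and `apply_step_limit_workaround` are hypothetical helpers, not Flatland functions.

```python
from typing import Dict, Hashable, Optional


def percentage_complete(dones: Dict[Hashable, bool]) -> float:
    """Hypothetical metric: fraction of agents whose done flag is True,
    ignoring the aggregate "__all__" key."""
    agent_flags = [flag for key, flag in dones.items() if key != "__all__"]
    if not agent_flags:
        return 0.0
    return sum(agent_flags) / len(agent_flags)


def apply_step_limit_workaround(dones: Dict[Hashable, bool],
                                elapsed_steps: int,
                                max_episode_steps: Optional[int]) -> Dict[Hashable, bool]:
    """Models the observed effect: once the limit is reached,
    every flag (including "__all__") is forced to True."""
    if max_episode_steps is not None and elapsed_steps >= max_episode_steps:
        return {key: True for key in dones}
    return dict(dones)


# 50 agents, only agents 0..8 reached their target.
dones = {i: i < 9 for i in range(50)}
dones["__all__"] = False
print(percentage_complete(dones))
forced = apply_step_limit_workaround(dones, elapsed_steps=500, max_episode_steps=500)
print(percentage_complete(forced))
```

Under these assumptions the metric is 9/50 = 0.18 before the flags are forced and 1.0 afterwards, which is consistent with the `Mean Percentage Complete : 1.0` reported in the log even though most agents had not finished.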


```
## Client Performance Stats
====================================================================================================
         - env_creation_wait_time_mean  :0.3616136908531189
         - internal_env_step_time_mean  :0.0005864716410984741
====================================================================================================
{'mean_reward': -15104.62, 'mean_normalized_reward': -23.24, 'mean_percentage_complete': 1.0}

{0: True, 1: True, 2: True, 3: True, 4: True, 5: True, 6: True, 7: True, 8: True, 9: False, 10: False, 11: False, 12: False, 13: False, 14: False, 15: False, 16: False, 17: False, 18: False, 19: False, 20: False, 21: False, 22: False, 23: False, 24: False, 25: False, 26: False, 27: False, 28: False, 29: False, 30: False, 31: False, 32: False, 33: False, 34: False, 35: False, 36: False, 37: False, 38: False, 39: False, 40: False, 41: False, 42: False, 43: False, 44: False, 45: False, 46: False, 47: False, 48: False, 49: False, '__all__': False}

{0: True, 1: True, 2: True, 3: True, 4: True, 5: True, 6: True, 7: True, 8: True, 9: True, 10: True, 11: True, 12: True, 13: True, 14: True, 15: True, 16: True, 17: True, 18: True, 19: True, 20: True, 21: True, 22: True, 23: True, 24: True, 25: True, 26: True, 27: True, 28: True, 29: True, 30: True, 31: True, 32: True, 33: True, 34: True, 35: True, 36: True, 37: True, 38: True, 39: True, 40: True, 41: True, 42: True, 43: True, 44: True, 45: True, 46: True, 47: True, 48: True, 49: True, '__all__': True}

Overall Message Queue Latency :  0.09819219317434237
====================================================================================================
====================================================================================================
## Server Performance Stats
====================================================================================================
         - message_queue_latency_mean   :0.09822061918025596
         - internal_env_step_time_mean  :0.0010862428976029538
====================================================================================================
####################################################################################################
EVALUATION COMPLETE !!
####################################################################################################
# Mean Reward : -15104.62
# Mean Normalized Reward : -23.24
# Mean Percentage Complete : 1.0
####################################################################################################
####################################################################################################
```

Edited Oct 24, 2019 by mohanty