Inconsistent rewards
The rewards returned by the step function in flatland/envs/rail_env.py should be dictionaries, but in two cases they are lists. These cases occur when the all flag is set in "dones". This can be fixed by replacing the list comprehensions at lines 293 and 188 of flatland/envs/rail_env.py with dictionary comprehensions:
Line 293:
    self.rewards_dict = [0 * r + global_reward for r in self.rewards_dict]
->  self.rewards_dict = {i: 0 * r + global_reward for i, r in self.rewards_dict.items()}
Line 188:
    self.rewards_dict = [r + global_reward for r in self.rewards_dict]
->  self.rewards_dict = {i: r + global_reward for i, r in self.rewards_dict.items()}
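A minimal standalone sketch (with hypothetical reward values and agent handles) showing both problems with the original list comprehension: iterating over a dict yields its keys rather than its values, and the comprehension produces a list, discarding the handle-to-reward mapping.

```python
# Hypothetical agent handle -> reward mapping, mirroring rewards_dict
rewards_dict = {0: -1.0, 1: -1.0}
global_reward = 1.0

# Buggy version: iterates over the dict's KEYS (agent handles),
# not its values, and returns a list instead of a dict
buggy = [0 * r + global_reward for r in rewards_dict]
print(buggy)   # [1.0, 1.0] -- the agent handles are lost

# Fixed version: a dictionary comprehension over .items()
# preserves the handle -> reward mapping
fixed = {i: 0 * r + global_reward for i, r in rewards_dict.items()}
print(fixed)   # {0: 1.0, 1: 1.0}
```

With the fix, downstream code that indexes rewards by agent handle (e.g. `rewards[agent.handle]`) keeps working even after the "all done" branch is taken.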