diff --git a/docs/flatland_2.0.md b/docs/flatland_2.0.md
index c41fea1377513721b3b96d705ddb6e4c81ccf6f4..05982babbab45e6aa0d819424c317d5ae05bb2ac 100644
--- a/docs/flatland_2.0.md
+++ b/docs/flatland_2.0.md
@@ -90,14 +90,14 @@ This is very common for railway networks where the initial plan usually needs to
 
 We implemented a Poisson process to simulate delays by stopping agents at random times for random durations. The parameters necessary for the stochastic events can be provided when creating the environment.
 
-```
 # Use the malfunction generator to break agents from time to time
-stochastic_data = {'prop_malfunction': 0.5,  # Percentage of defective agents
-                   'malfunction_rate': 30,  # Rate of malfunction occurence
-                   'min_duration': 3,  # Minimal duration of malfunction
-                   'max_duration': 10  # Max duration of malfunction
-                   }
-
+```
+stochastic_data = {
+    'prop_malfunction': 0.5,  # Percentage of defective agents
+    'malfunction_rate': 30,  # Rate of malfunction occurrence
+    'min_duration': 3,  # Minimal duration of malfunction
+    'max_duration': 10  # Max duration of malfunction
+}
 ```
 
 The parameters are as follows:
@@ -109,12 +109,23 @@ The parameters are as follows:
 You can introduce stochasticity by simply creating the env as follows:
 
 ```
-# Use a the malfunction generator to break agents from time to time
-stochastic_data = {'prop_malfunction': 0.1,  # Percentage of defective agents
-                   'malfunction_rate': 30,  # Rate of malfunction occurence
-                   'min_duration': 3,  # Minimal duration of malfunction
-                   'max_duration': 20  # Max duration of malfunction
-                   }
+env = RailEnv(
+    ...
+    stochastic_data=stochastic_data,  # Malfunction data generator
+    ...
+)
+```
+In your controller, you can check whether an agent is malfunctioning:
+```
+obs, rew, done, info = env.step(actions)
+...
+action_dict = dict()
+for a in range(env.get_num_agents()):
+    if info['malfunction'][a] == 0:
+        action_dict.update({a: ...})
+
+```
+
 # Custom observation builder
 tree_observation = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())
@@ -154,6 +165,18 @@ The different speed profiles can be generated using the `schedule_generator`, wh
 Keep in mind that the *fastest speed* is 1 and all slower speeds must be between 1 and 0. For the submission scoring you can assume that there will be no more than 5 speed profiles.
 
+
+Later versions of **Flat**land might have varying speeds during episodes. Therefore, we return the agent speeds.
+Notice that we do not guarantee that the speed will be computed at each step, but we will return it at each step whenever this is not costly.
+In your controller, you can get the agents' speed from the `info` returned by `step`:
+```
+obs, rew, done, info = env.step(actions)
+...
+for a in range(env.get_num_agents()):
+    speed = info['speed'][a]
+```
+
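+Since a speed is the fraction of a cell that an agent traverses per time step, you can estimate how many time steps an agent needs to cross a cell. A minimal sketch (the `steps_per_cell` helper is illustrative and not part of the Flatland API; it ignores malfunctions):
+```
+def steps_per_cell(speed):
+    # speed 1 -> 1 step per cell, speed 1/2 -> 2 steps, speed 1/4 -> 4 steps
+    return int(round(1.0 / speed))
+
+for a in range(env.get_num_agents()):
+    print("agent {} needs {} steps per cell".format(a, steps_per_cell(info['speed'][a])))
+```
+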
 ## Actions and observation with different speed levels
 
 Because the different speeds are implemented as fractions, the agents' ability to perform actions has been updated. We **do not allow actions to change within the cell**.
 This means that each agent can only choose an action to be taken when entering a cell.
@@ -166,18 +189,97 @@ This action is then executed when a step to the next cell is valid. For example
 
 - Agents can make observations at any time step. Make sure to discard observations without any information. See this [example](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/training_navigation.py) for a simple implementation.
 - The environment checks if an agent is allowed to move to the next cell only at the time of the switch to the next cell.
-You can check whether an action has an effect in the environment's next step:
+In your controller, you can check whether an agent requires an action by checking `info`:
 ```
 obs, rew, done, info = env.step(actions)
 ...
 action_dict = dict()
 for a in range(env.get_num_agents()):
-    if info['actionable_agents'][a]:
+    if info['action_required'][a] and info['malfunction'][a] == 0:
         action_dict.update({a: ...})
 ```
-Notice that `info['actionable_agents'][a]` does not mean that the action has an effect:
-if the next cell is blocked, the action cannot be performed. If the action is valid, it will be performend, though.
+Notice that `info['action_required'][a]` does not mean that the action will have an effect:
+if the next cell is blocked or the agent breaks down, the action cannot be performed and an action will be required again in the next step.
+
+## Rail Generators and Schedule Generators
+The separation between rail generator and schedule generator reflects the organisational separation in the railway domain:
+- Infrastructure Manager (IM): responsible for the layout and maintenance of tracks
+- Railway Undertaking (RU): operates trains on the infrastructure
+Usually, there is a third organisation, which ensures discrimination-free access to the infrastructure for concurrent requests in a **schedule planning phase**.
+However, in the **Flat**land challenge, we focus on the re-scheduling problem during live operations.
+
+Technically:
+```
+RailGeneratorProduct = Tuple[GridTransitionMap, Optional[Any]]
+RailGenerator = Callable[[int, int, int, int], RailGeneratorProduct]
+
+AgentPosition = Tuple[int, int]
+ScheduleGeneratorProduct = Tuple[List[AgentPosition], List[AgentPosition], List[AgentPosition], List[float]]
+ScheduleGenerator = Callable[[GridTransitionMap, int, Optional[Any]], ScheduleGeneratorProduct]
+```
+
+We can then produce `RailGenerator`s by currying:
+```
+def sparse_rail_generator(num_cities=5, num_intersections=4, num_trainstations=2, min_node_dist=20, node_radius=2,
+                          num_neighb=3, grid_mode=False, enhance_intersection=False, seed=0):
+
+    def generator(width, height, num_agents, num_resets=0):
+
+        # generate the grid and (optionally) some hints for the schedule_generator
+        ...
+
+        return grid_map, {'agents_hints': {
+            'num_agents': num_agents,
+            'agent_start_targets_nodes': agent_start_targets_nodes,
+            'train_stations': train_stations
+        }}
+
+    return generator
+```
+And, similarly, `ScheduleGenerator`s:
+```
+def sparse_schedule_generator(speed_ratio_map: Mapping[float, float] = None) -> ScheduleGenerator:
+    def generator(rail: GridTransitionMap, num_agents: int, hints: Any = None):
+        # place agents:
+        # - initial position
+        # - initial direction
+        # - (initial) speed
+        # - malfunction
+        ...
+
+        return agents_position, agents_direction, agents_target, speeds, agents_malfunction
+
+    return generator
+```
+Notice that the `rail_generator` may pass `agents_hints` to the `schedule_generator`, which the latter may interpret.
+For instance, the way the `sparse_rail_generator` generates the grid already determines each agent's start and target.
+Hence, `rail_generator` and `schedule_generator` have to match if `schedule_generator` presupposes some specific `agents_hints`.
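+
+For example, the two sparse generators above can be wired together when constructing the environment. A minimal sketch (the concrete parameter values are illustrative only):
+```
+env = RailEnv(width=50,
+              height=50,
+              rail_generator=sparse_rail_generator(num_cities=10, num_trainstations=50, seed=5),
+              # speed ratio map: 25% of the agents at each of four speeds
+              schedule_generator=sparse_schedule_generator({1.: 0.25, 1. / 2.: 0.25, 1. / 3.: 0.25, 1. / 4.: 0.25}),
+              number_of_agents=10)
+```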
+
+The environment's `reset` takes care of applying the two generators:
+```
+    def __init__(self,
+                 ...
+                 rail_generator: RailGenerator = random_rail_generator(),
+                 schedule_generator: ScheduleGenerator = random_schedule_generator(),
+                 ...
+                 ):
+        self.rail_generator: RailGenerator = rail_generator
+        self.schedule_generator: ScheduleGenerator = schedule_generator
+
+    def reset(self, regen_rail=True, replace_agents=True):
+        rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets)
+
+        ...
+
+        if replace_agents:
+            agents_hints = None
+            if optionals and 'agents_hints' in optionals:
+                agents_hints = optionals['agents_hints']
+            self.agents_static = EnvAgentStatic.from_lists(
+                *self.schedule_generator(self.rail, self.get_num_agents(), hints=agents_hints))
+```
+
 ## Example code
diff --git a/flatland/envs/rail_env.py b/flatland/envs/rail_env.py
index 9000aaa96ce25f8de0308be9ceb0c406cc522275..848ac15aab5f9b0c933751da8f278633bb1077b8 100644
--- a/flatland/envs/rail_env.py
+++ b/flatland/envs/rail_env.py
@@ -218,6 +218,8 @@ class RailEnv(Environment):
         if replace_agents then regenerate the agents static.
         Relies on the rail_generator returning agent_static lists (pos, dir, target)
         """
+
+        # TODO: can we not move 'self.rail_generator(..)' into the 'if regen_rail or self.rail is None' branch?
         rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets)
 
         if optionals and 'distance_maps' in optionals:
@@ -312,7 +314,9 @@
         if self.dones["__all__"]:
             self.rewards_dict = {i: r + global_reward for i, r in self.rewards_dict.items()}
             info_dict = {
-                'actionable_agents': {i: False for i in range(self.get_num_agents())}
+                'action_required': {i: False for i in range(self.get_num_agents())},
+                'malfunction': {i: 0 for i in range(self.get_num_agents())},
+                'speed': {i: 0 for i in range(self.get_num_agents())}
             }
             return self._get_observations(), self.rewards_dict, self.dones, info_dict
@@ -425,18 +429,17 @@
 
             if agent.speed_data['position_fraction'] >= 1.0:
-                # Perform stored action to transition to the next cell
+                # Perform stored action to transition to the next cell as soon as the cell is free
                 cell_free, new_cell_valid, new_direction, new_position, transition_valid = \
                     self._check_action_on_agent(agent.speed_data['transition_action_on_cellexit'], agent)
 
-                # Check that everything is still free and that the agent can move
-                if all([new_cell_valid, transition_valid, cell_free]):
+                if all([new_cell_valid, transition_valid, cell_free]) and agent.malfunction_data['malfunction'] == 0:
                     agent.position = new_position
                     agent.direction = new_direction
                     agent.speed_data['position_fraction'] = 0.0
-                # else:
-                #     # If the agent cannot move due to any reason, we set its state to not moving
-                #     agent.moving = False
+                elif not transition_valid or not new_cell_valid:
+                    # If the agent cannot move due to an invalid transition, we set its state to not moving
+                    agent.moving = False
 
             if np.equal(agent.position, agent.target).all():
                 self.dones[i_agent] = True
@@ -454,15 +457,20 @@
             for k in self.dones.keys():
                 self.dones[k] = True
 
-        actionable_agents = {i: self.agents[i].speed_data['position_fraction'] <= epsilon \
-                             for i in range(self.get_num_agents())
-                             }
+        action_required_agents = {
+            i: self.agents[i].speed_data['position_fraction'] <= epsilon for i in range(self.get_num_agents())
+        }
+        malfunction_agents = {
+            i: self.agents[i].malfunction_data['malfunction'] for i in range(self.get_num_agents())
+        }
+        speed_agents = {i: self.agents[i].speed_data['speed'] for i in range(self.get_num_agents())}
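+
+        # What the controller receives per agent in info_dict:
+        # - action_required: the agent is at a cell border, so its next action choice will be considered
+        # - malfunction: number of remaining malfunction steps (0 if the agent is working normally)
+        # - speed: the agent's current (fractional) speed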
         info_dict = {
-            'actionable_agents': actionable_agents
+            'action_required': action_required_agents,
+            'malfunction': malfunction_agents,
+            'speed': speed_agents
         }
-        for i, agent in enumerate(self.agents):
-            print("  {}: {}".format(i, agent.position))
 
         return self._get_observations(), self.rewards_dict, self.dones, info_dict
 
     def _check_action_on_agent(self, action, agent):
diff --git a/flatland/envs/rail_generators.py b/flatland/envs/rail_generators.py
index c23593463c2b679cfd09fcbcf390c3d5a05acde4..8573c25c31f59d9ddb3d9341d891aa1eae231b8e 100644
--- a/flatland/envs/rail_generators.py
+++ b/flatland/envs/rail_generators.py
@@ -629,13 +629,13 @@ def sparse_rail_generator(num_cities=5, num_intersections=4, num_trainstations=2
                 available_nodes_full = np.delete(available_nodes_full, delete_idx, 0)
 
                 # Priority city to intersection connections
-                if False and current_node < num_cities and len(available_intersections) > 0:
+                if current_node < num_cities and len(available_intersections) > 0:
                     available_nodes = available_intersections
                     delete_idx = np.where(available_cities == current_node)
                     available_cities = np.delete(available_cities, delete_idx, 0)
 
                 # Priority intersection to city connections
-                elif False and current_node >= num_cities and len(available_cities) > 0:
+                elif current_node >= num_cities and len(available_cities) > 0:
                     available_nodes = available_cities
                     delete_idx = np.where(available_intersections == current_node)
                     available_intersections = np.delete(available_intersections, delete_idx, 0)
diff --git a/tests/test_flatland_envs_sparse_rail_generator.py b/tests/test_flatland_envs_sparse_rail_generator.py
index 4f481dba0b370fdb10c5494f182a68434948f4f8..4645d80aadc3eb247d8b60b1c8456fc250d7feb8 100644
--- a/tests/test_flatland_envs_sparse_rail_generator.py
+++ b/tests/test_flatland_envs_sparse_rail_generator.py
@@ -29,7 +29,7 @@ def test_sparse_rail_generator():
     # TODO test assertions!
 
 
-def test_rail_env_actionable():
+def test_rail_env_action_required_info():
     np.random.seed(0)
     speed_ration_map = {1.: 0.25,  # Fast passenger train
                         1. / 2.: 0.25,  # Fast freight train
@@ -54,55 +54,105 @@
                              number_of_agents=10,
                              obs_builder_object=GlobalObsForRailEnv())
     np.random.seed(0)
-    env_only_if_actionable = RailEnv(width=50,
-                                     height=50,
-                                     rail_generator=sparse_rail_generator(num_cities=10,  # Number of cities in map
-                                                                          num_intersections=10,
-                                                                          # Number of interesections in map
-                                                                          num_trainstations=50,
-                                                                          # Number of possible start/targets on map
-                                                                          min_node_dist=6,  # Minimal distance of nodes
-                                                                          node_radius=3,
-                                                                          # Proximity of stations to city center
-                                                                          num_neighb=3,
-                                                                          # Number of connections to other cities
-                                                                          seed=5,  # Random seed
-                                                                          grid_mode=False
-                                                                          # Ordered distribution of nodes
-                                                                          ),
-                                     schedule_generator=sparse_schedule_generator(speed_ration_map),
-                                     number_of_agents=10,
-                                     obs_builder_object=GlobalObsForRailEnv())
+    env_only_if_action_required = RailEnv(width=50,
+                                          height=50,
+                                          rail_generator=sparse_rail_generator(num_cities=10,  # Number of cities in map
+                                                                               num_intersections=10,
+                                                                               # Number of intersections in map
+                                                                               num_trainstations=50,
+                                                                               # Number of possible start/targets on map
+                                                                               min_node_dist=6,  # Minimal distance of nodes
+                                                                               node_radius=3,
+                                                                               # Proximity of stations to city center
+                                                                               num_neighb=3,
+                                                                               # Number of connections to other cities
+                                                                               seed=5,  # Random seed
+                                                                               grid_mode=False
+                                                                               # Ordered distribution of nodes
+                                                                               ),
+                                          schedule_generator=sparse_schedule_generator(speed_ration_map),
+                                          number_of_agents=10,
+                                          obs_builder_object=GlobalObsForRailEnv())
 
     env_renderer = RenderTool(env_always_action, gl="PILSVG", )
     for step in range(100):
         print("step {}".format(step))
 
         action_dict_always_action = dict()
-        action_dict_only_if_actionable = dict()
+        action_dict_only_if_action_required = dict()
         # Chose an action for each agent in the environment
         for a in range(env_always_action.get_num_agents()):
             action = np.random.choice(np.arange(4))
             action_dict_always_action.update({a: action})
-            if step == 0 or info_only_if_actionable['actionable_agents'][a]:
-                action_dict_only_if_actionable.update({a: action})
+            if step == 0 or info_only_if_action_required['action_required'][a]:
+                action_dict_only_if_action_required.update({a: action})
             else:
-                print("[{}] not actionable {}, speed_data={}".format(step, a, env_always_action.agents[a].speed_data))
+                print("[{}] not action_required {}, speed_data={}".format(step, a, env_always_action.agents[a].speed_data))
 
         obs_always_action, rewards_always_action, done_always_action, info_always_action = env_always_action.step(
             action_dict_always_action)
-        obs_only_if_actionable, rewards_only_if_actionable, done_only_if_actionable, info_only_if_actionable = env_only_if_actionable.step(
-            action_dict_only_if_actionable)
+        obs_only_if_action_required, rewards_only_if_action_required, done_only_if_action_required, info_only_if_action_required = env_only_if_action_required.step(
+            action_dict_only_if_action_required)
 
         for a in range(env_always_action.get_num_agents()):
-            assert len(obs_always_action[a]) == len(obs_only_if_actionable[a])
+            assert len(obs_always_action[a]) == len(obs_only_if_action_required[a])
             for i in range(len(obs_always_action[a])):
-                assert np.array_equal(obs_always_action[a][i], obs_only_if_actionable[a][i])
-            assert np.array_equal(rewards_always_action[a], rewards_only_if_actionable[a])
-            assert np.array_equal(done_always_action[a], done_only_if_actionable[a])
-            assert info_always_action['actionable_agents'][a] == info_only_if_actionable['actionable_agents'][a]
+                assert np.array_equal(obs_always_action[a][i], obs_only_if_action_required[a][i])
+            assert np.array_equal(rewards_always_action[a], rewards_only_if_action_required[a])
+            assert np.array_equal(done_always_action[a], done_only_if_action_required[a])
+            assert info_always_action['action_required'][a] == info_only_if_action_required['action_required'][a]
 
         env_renderer.render_env(show=True, show_observations=False, show_predictions=False)
 
         if done_always_action['__all__']:
             break
+
+
+def test_rail_env_malfunction_speed_info():
+    np.random.seed(0)
+    stochastic_data = {'prop_malfunction': 0.5,  # Percentage of defective agents
+                       'malfunction_rate': 30,  # Rate of malfunction occurrence
+                       'min_duration': 3,  # Minimal duration of malfunction
+                       'max_duration': 10  # Max duration of malfunction
+                       }
+    env = RailEnv(width=50,
+                  height=50,
+                  rail_generator=sparse_rail_generator(num_cities=10,  # Number of cities in map
+                                                       num_intersections=10,
+                                                       # Number of intersections in map
+                                                       num_trainstations=50,
+                                                       # Number of possible start/targets on map
+                                                       min_node_dist=6,  # Minimal distance of nodes
+                                                       node_radius=3,
+                                                       # Proximity of stations to city center
+                                                       num_neighb=3,
+                                                       # Number of connections to other cities
+                                                       seed=5,  # Random seed
+                                                       grid_mode=False  # Ordered distribution of nodes
+                                                       ),
+                  schedule_generator=sparse_schedule_generator(),
+                  number_of_agents=10,
+                  obs_builder_object=GlobalObsForRailEnv(),
+                  stochastic_data=stochastic_data)
+
+    env_renderer = RenderTool(env, gl="PILSVG", )
+    for step in range(100):
+        action_dict = dict()
+        # Choose an action for each agent in the environment
+        for a in range(env.get_num_agents()):
+            action = np.random.choice(np.arange(4))
+            action_dict.update({a: action})
+
+        obs, rewards, done, info = env.step(action_dict)
+
+        assert 'malfunction' in info
+        for a in range(env.get_num_agents()):
+            assert info['malfunction'][a] >= 0
+            assert 0 <= info['speed'][a] <= 1
+            assert info['speed'][a] == env.agents[a].speed_data['speed']
+
+        env_renderer.render_env(show=True, show_observations=False, show_predictions=False)
+
+        if done['__all__']:
+            break