Merge branch 'update_rst_docs' into 'master'

Update rst docs See merge request flatland/flatland!140

Commit a447381f authored 5 years ago by Erik Nygren
--- a/docs/intro_observationbuilder.rst
+++ b/docs/intro_observationbuilder.rst
@@ -84,7 +84,8 @@ Note that this simple strategy fails when multiple agents are present, as each a
        """
        def __init__(self):
            super().__init__(max_depth=0)
-            # We set max_depth=0 in because we only need to look at the current position of the agent to decide what direction is shortest.
+            # We set max_depth=0 because we only need to look at the current
+            # position of the agent to decide which direction is shortest.
            self.observation_space = [3]
        def reset(self):
@@ -120,7 +121,8 @@ Note that this simple strategy fails when multiple agents are present, as each a
    env = RailEnv(width=7,
                  height=7,
-                  rail_generator=complex_rail_generator(nr_start_goal=10, nr_extra=1, min_dist=8, max_dist=99999, seed=0),
+                  rail_generator=complex_rail_generator(nr_start_goal=10, nr_extra=1,
+                                                        min_dist=8, max_dist=99999, seed=0),
                  number_of_agents=2,
                  obs_builder_object=SingleAgentNavigationObs())
@@ -154,3 +156,138 @@ navigation to target, and shows the path taken as an animation.
 The code examples above appear in the example file `custom_observation_example.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/custom_observation_example.py>`_. You can run it using :code:`python examples/custom_observation_example.py` from the root folder of the flatland repo.  The two examples are run one after the other.
+Example 3 : Using custom predictors and rendering observation
+-------------------------------------------------------------
+Because the re-scheduling task of the Flatland-Challenge_ requires some short-term planning, custom predictors can be used to anticipate upcoming conflicts and help agents resolve them in time.
+The **Flatland Environment** includes an initial predictor, ShortestPathPredictorForRailEnv_, to give you an idea of what you can do with these predictors.
+Any custom predictor can be passed to the observation builder and then used to build the observation. In this example_ we illustrate how an observation builder can use a predictor to detect conflicts.
+The observation is incomplete: it only contains information about potential conflicts and has no features describing the agents' objectives.
+In addition to using your custom predictor, you can also make your custom observation ready for rendering. (The same can be done for your predictor.)
+All you need to do to render your custom observation is to populate :code:`self.env.dev_obs_dict[handle]` for every agent (all handles). (For the predictor, use :code:`self.env.dev_pred_dict[handle]`.)
+In contrast to the previous examples, we also implement the :code:`def get_many(self, handles=None)` function for this custom observation builder, because we want to call the predictor only once per :code:`env.step()`. The base implementation of :code:`def get_many(self, handles=None)` calls the :code:`get(handle)` function for every handle, which means that it normally does not need to be reimplemented, except in cases like the one below.
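The effect of overriding :code:`get_many` can be seen in isolation with a minimal sketch that does not depend on Flatland. All class names here (``CountingPredictor``, ``PerAgentObs``, ``SharedPredictionObs``) are hypothetical stand-ins invented for illustration; they only mimic the call pattern described above and are not part of the Flatland API.

```python
class CountingPredictor:
    """Hypothetical stand-in for a predictor; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def get(self):
        self.calls += 1
        # One fake prediction per agent handle.
        return {0: "path-0", 1: "path-1", 2: "path-2"}


class PerAgentObs:
    """Queries the predictor inside get(), so once per agent."""

    def __init__(self, predictor):
        self.predictor = predictor

    def get(self, handle):
        return self.predictor.get()[handle]

    def get_many(self, handles=None):
        # Mimics the default behaviour described above: call get() per handle.
        return {h: self.get(h) for h in (handles or [])}


class SharedPredictionObs(PerAgentObs):
    """Overrides get_many() to query the predictor once and reuse the result."""

    def get_many(self, handles=None):
        self.predictions = self.predictor.get()
        return {h: self.get(h) for h in (handles or [])}

    def get(self, handle):
        # Reads the cached predictions instead of querying the predictor again.
        return self.predictions[handle]


naive = PerAgentObs(CountingPredictor())
naive.get_many([0, 1, 2])
shared = SharedPredictionObs(CountingPredictor())
shared.get_many([0, 1, 2])
print(naive.predictor.calls, shared.predictor.calls)  # 3 1
```

With three agents the per-agent variant queries the predictor three times per step, while the overriding variant queries it once, which is the motivation given for reimplementing :code:`get_many`.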
+.. _ShortestPathPredictorForRailEnv: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/predictions.py#L81
+.. _example: https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/custom_observation_example.py#L110
+.. code-block:: python
+    class ObservePredictions(TreeObsForRailEnv):
+        """
+        We use the provided ShortestPathPredictor to illustrate the usage of predictors in your custom observation.
+        We derive our observation builder from TreeObsForRailEnv, to exploit the existing implementation to compute
+        the minimum distances from each grid node to each agent's target.
+        This is necessary so that we can pass the distance map to the ShortestPathPredictor
+        Here we also want to highlight how you can visualize your observation
+        """
+        def __init__(self, predictor):
+            super().__init__(max_depth=0)
+            self.observation_space = [10]
+            self.predictor = predictor
+        def reset(self):
+            # Recompute the distance map, if the environment has changed.
+            super().reset()
+        def get_many(self, handles=None):
+            '''
+            Because we do not want to call the predictor separately for every agent, we implement the get_many function.
+            Here we can call the predictor just once for all the agents and use the predictions to generate our observations.
+            :param handles:
+            :return:
+            '''
+            # Guard against the default handles=None, which cannot be iterated over
+            if handles is None:
+                handles = []
+            self.predictions = self.predictor.get(custom_args={'distance_map': self.distance_map})
+            self.predicted_pos = {}
+            for t in range(len(self.predictions[0])):
+                pos_list = []
+                for a in handles:
+                    pos_list.append(self.predictions[a][t][1:3])
+                # We transform (x,y) coordinates to a single integer number for simpler comparison
+                self.predicted_pos.update({t: coordinate_to_position(self.env.width, pos_list)})
+            observations = {}
+            # Collect all the different observation for all the agents
+            for h in handles:
+                observations[h] = self.get(h)
+            return observations
+        def get(self, handle):
+            '''
+            Let's write a simple observation which just indicates whether or not the agent's own predicted path
+            overlaps with other predicted paths at any time. This is useless for the task of navigation but might
+            help when looking for conflicts. A more complex implementation can be found in the TreeObsForRailEnv class.
+            Each agent receives an observation of length 10, where each element represents a prediction step and its value
+            is:
+             - 0 if no overlap is happening
+             - 1 if at least one other predicted path crosses the predicted cell at that step
+            :param handle: handle of the agent, used as an index
+            :return: Observation of handle
+            '''
+            observation = np.zeros(10)
+            # We track which cells were considered while building the observation and make them accessible
+            # for rendering
+            visited = set()
+            for _idx in range(10):
+                # Check if any of the other predictions overlap with the agent's own prediction
+                x_coord = self.predictions[handle][_idx][1]
+                y_coord = self.predictions[handle][_idx][2]
+                # We add every observed cell to the observation rendering
+                visited.add((x_coord, y_coord))
+                if self.predicted_pos[_idx][handle] in np.delete(self.predicted_pos[_idx], handle, 0):
+                    # Another agent is predicted to pass through the same cell at the same predicted time,
+                    # so we flag the corresponding prediction step
+                    observation[_idx] = 1
+            # This variable will be accessed by the renderer to visualize the observation
+            self.env.dev_obs_dict[handle] = visited
+            return observation
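The overlap test used in :code:`get` can be illustrated without NumPy or Flatland. The following is a minimal sketch under stated assumptions: ``to_position`` and ``conflict_flags`` are hypothetical helpers written for this illustration, and the row-major flattening is an assumption, not necessarily the exact formula used by Flatland's ``coordinate_to_position``.

```python
def to_position(width, cell):
    """Flatten a (row, column) cell into one integer (assumed row-major layout)."""
    row, col = cell
    return row * width + col


def conflict_flags(predicted_cells, handle, width):
    """Return one 0/1 flag per prediction step for the given agent.

    predicted_cells[t][a] is the cell agent a is predicted to occupy at step t.
    """
    flags = []
    for cells_at_t in predicted_cells:
        positions = [to_position(width, c) for c in cells_at_t]
        own = positions[handle]
        # Everyone's position except our own, like np.delete(..., handle, 0).
        others = positions[:handle] + positions[handle + 1:]
        flags.append(1 if own in others else 0)
    return flags


# Three agents, three prediction steps; agents 0 and 1 meet at step 1.
predicted = [
    [(0, 0), (0, 2), (3, 3)],
    [(0, 1), (0, 1), (3, 2)],
    [(0, 2), (0, 0), (3, 1)],
]
print(conflict_flags(predicted, handle=0, width=10))  # [0, 1, 0]
print(conflict_flags(predicted, handle=2, width=10))  # [0, 0, 0]
```

Flattening each cell to a single integer reduces the per-step check to a simple membership test, and each flag lines up with one prediction step, as the docstring of :code:`get` describes.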
+We can then use this new observation builder and the renderer to visualize the observation of each agent.
+.. code-block:: python
+    # Initiate the Predictor
+    CustomPredictor = ShortestPathPredictorForRailEnv(10)
+    # Pass the Predictor to the observation builder
+    CustomObsBuilder = ObservePredictions(CustomPredictor)
+    # Initiate Environment
+    env = RailEnv(width=10,
+                  height=10,
+                  rail_generator=complex_rail_generator(nr_start_goal=5, nr_extra=1, min_dist=8, max_dist=99999, seed=0),
+                  number_of_agents=3,
+                  obs_builder_object=CustomObsBuilder)
+    obs = env.reset()
+    env_renderer = RenderTool(env, gl="PILSVG")
+    # We render the initial step and show the observed cells as colored boxes
+    env_renderer.render_env(show=True, frames=True, show_observations=True, show_predictions=False)
+    action_dict = {}
+    for step in range(100):
+        for a in range(env.get_num_agents()):
+            action = np.random.randint(0, 5)
+            action_dict[a] = action
+        obs, all_rewards, done, _ = env.step(action_dict)
+        print("Rewards: ", all_rewards, "  [done=", done, "]")
+        env_renderer.render_env(show=True, frames=True, show_observations=True, show_predictions=False)
+        time.sleep(0.5)