Commit c94b0dbf authored by Erik Nygren

Merge branch '135_ObsBuilderTut_Tweaks' into 'master'

135 observation builder tutorial tweaks

See merge request flatland/flatland!135
parents b7662ea1 b36069f9
@@ -9,7 +9,7 @@ Welcome to flatland's documentation!
   installation
   about_flatland
   gettingstarted
   intro_observationbuilder
   localevaluation
   modules
   FAQ
...
@@ -5,7 +5,8 @@ Getting Started with custom observations
Overview
--------------
One of the main objectives of the Flatland-Challenge_ is to find a suitable observation (relevant features for the problem at hand) to solve the task. Therefore **Flatland** was built with as much flexibility as possible when it comes to building your custom observations: observations in Flatland environments are fully customizable.
Whenever an environment needs to compute new observations for each agent, it queries an object derived from the :code:`ObservationBuilder` base class, which takes the current state of the environment and returns the desired observation.

.. _Flatland-Challenge: https://www.aicrowd.com/challenges/flatland-challenge
@@ -16,9 +17,9 @@ In this first example we implement all the functions necessary for the observati
Custom observation builder objects need to derive from the `flatland.core.env_observation_builder.ObservationBuilder`_
base class and must implement two methods, :code:`reset(self)` and :code:`get(self, handle)`.

.. _`flatland.core.env_observation_builder.ObservationBuilder` : https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/core/env_observation_builder.py#L13

Below is a simple example that returns observation vectors of size :code:`observation_space = 5` featuring only the ID (handle) of the agent whose
observation vector is being computed:

.. code-block:: python
@@ -38,7 +39,7 @@ observation vector is being computed:
        observation = handle * np.ones((self.observation_space[0],))
        return observation
We can pass an instance of our custom observation builder :code:`SimpleObs` to the :code:`RailEnv` creator as follows:

.. code-block:: python

@@ -48,19 +49,22 @@ We can pass our custom observation builder :code:`SimpleObs` to the :code:`RailE
                  number_of_agents=3,
                  obs_builder_object=SimpleObs())
Anytime :code:`env.reset()` or :code:`env.step()` is called, the observation builder will return the custom observation of all agents initialized in the env.
In the next example we highlight how to derive from existing observation builders and how to access internal variables of **Flatland**.
Example 2 : Single-agent navigation
-----------------------------------
Observation builder objects can of course derive from existing concrete subclasses of ObservationBuilder.
For example, it may be useful to extend the TreeObsForRailEnv_ observation builder.
A feature of this class is that on :code:`reset()`, it pre-computes the lengths of the shortest paths from all
cells and orientations to the target of each agent, i.e. a distance map for each agent.

In this example we exploit these distance maps by implementing an observation builder that shows the current shortest path for each agent as a one-hot observation vector of length 3, whose components represent the possible directions an agent can take (LEFT, FORWARD, RIGHT). All values of the observation vector are set to :code:`0` except for the shortest direction where it is set to :code:`1`.

Using this observation with highly engineered features indicating the agent's shortest path, an agent can then learn to take the corresponding action at each time-step; or we could even hardcode the optimal policy.
Note that this simple strategy fails when multiple agents are present, as each agent would only attempt its greedy solution, which is not usually `Pareto-optimal <https://en.wikipedia.org/wiki/Pareto_efficiency>`_ in this context.
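The core of such a builder reduces to picking, among the directions with a valid transition, the one with the smallest distance to the target, and one-hot encoding it. A self-contained sketch follows; the distance values used here are made up, whereas in the real builder they come from TreeObsForRailEnv's pre-computed distance map.

```python
import numpy as np


def shortest_direction_one_hot(min_distances):
    """Encode the shortest of three candidate directions as a one-hot vector.

    min_distances: distance-to-target when turning LEFT, going FORWARD,
    or turning RIGHT; np.inf marks directions without a valid transition.
    """
    observation = np.zeros(3)
    observation[int(np.argmin(min_distances))] = 1
    return observation


# Hypothetical distances: no LEFT transition, FORWARD is 7 cells, RIGHT is 12.
print(shortest_direction_one_hot([np.inf, 7.0, 12.0]))  # [0. 1. 0.]
```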
.. _TreeObsForRailEnv: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/observations.py#L14
@@ -80,7 +84,7 @@ Using this observation with highly engineer features indicating the agents short
    """
    def __init__(self):
        super().__init__(max_depth=0)
        # We set max_depth=0 because we only need to look at the current position of the agent to decide which direction is shortest.
        self.observation_space = [3]
    def reset(self):
@@ -88,7 +92,8 @@ Using this observation with highly engineer features indicating the agents short
        super().reset()

    def get(self, handle):
        # Here we access agent information from the environment.
        # Information from the environment can be accessed but not changed!
        agent = self.env.agents[handle]
        possible_transitions = self.env.rail.get_transitions(*agent.position, agent.direction)
@@ -124,7 +129,7 @@ Using this observation with highly engineer features indicating the agents short
    print(obs[i])

Finally, the following is an example of hard-coded navigation for single agents that achieves optimal single-agent
navigation to target, and shows the path taken as an animation.

.. code-block:: python
@@ -147,5 +152,5 @@ navigation to target, and show the taken path as an animation.
    env_renderer.render_env(show=True, frames=True, show_observations=False)
    time.sleep(0.1)
The code examples above appear in the example file `custom_observation_example.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/custom_observation_example.py>`_. You can run it using :code:`python examples/custom_observation_example.py` from the root folder of the flatland repo. The two examples are run one after the other.
import random
import time

import numpy as np

from flatland.envs.observations import TreeObsForRailEnv
@@ -94,8 +94,19 @@ obs, all_rewards, done, _ = env.step({0: 0, 1: 1})
for i in range(env.get_num_agents()):
    print(obs[i])
env = RailEnv(width=50,
              height=50,
              rail_generator=random_rail_generator(),
              number_of_agents=1,
              obs_builder_object=SingleAgentNavigationObs())
obs, all_rewards, done, _ = env.step({0: 0})
env_renderer = RenderTool(env, gl="PILSVG")
env_renderer.render_env(show=True, frames=True, show_observations=True)
for step in range(100):
    action = np.argmax(obs[0]) + 1
    obs, all_rewards, done, _ = env.step({0: action})
    print("Rewards: ", all_rewards, " [done=", done, "]")
    env_renderer.render_env(show=True, frames=True, show_observations=True)
    time.sleep(0.1)

x = input()
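The :code:`np.argmax(obs[0]) + 1` line works because of the offset between the observation components and Flatland's action codes: the observation indices are (LEFT, FORWARD, RIGHT), while in RailEnv's action encoding 0 is DO_NOTHING and 1-3 are MOVE_LEFT, MOVE_FORWARD, MOVE_RIGHT. A small self-contained illustration (the observation vector below is made up):

```python
import numpy as np

# Flatland's discrete rail actions (RailEnvActions encoding).
ACTIONS = {0: "DO_NOTHING", 1: "MOVE_LEFT", 2: "MOVE_FORWARD",
           3: "MOVE_RIGHT", 4: "STOP_MOVING"}


def greedy_action(one_hot_obs):
    # Observation components are (LEFT, FORWARD, RIGHT); shift the index
    # by 1 so that index 0 (LEFT) maps to action 1 (MOVE_LEFT), and so on.
    return int(np.argmax(one_hot_obs)) + 1


obs = np.array([0.0, 1.0, 0.0])  # hypothetical: FORWARD is the shortest direction
print(ACTIONS[greedy_action(obs)])  # MOVE_FORWARD
```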