Getting Started Tutorial
========================
Overview
--------

Following are three short tutorials to help new users get acquainted with how
to create RailEnvs, how to train simple DQN agents on them, and how to customize
them.
To use flatland in a project:

.. code-block:: python

    import flatland
Simple Example 1 : Basic Usage
------------------------------
The basic usage of RailEnv environments consists in creating a RailEnv object
endowed with a rail generator, which generates new rail networks on each reset,
and an observation builder object, which is supplied with environment-specific
information at each time step and provides a suitable observation vector to the
agents. After the RailEnv environment is created, reset() must be called on the
environment in order to fully initialize it.
The simplest rail generators are envs.rail_generators.rail_from_manual_specifications_generator
and envs.rail_generators.random_rail_generator.

The first one accepts a list of lists, where each element is a 2-tuple whose
entries represent the 'cell_type' (see core.transitions.RailEnvTransitions) and
the desired clockwise rotation of the cell contents (0, 90, 180 or 270 degrees).
For example,
.. code-block:: python

    env = RailEnv(...,  # width/height and specs elided in this excerpt
                  rail_generator=rail_from_manual_specifications_generator(specs),
                  number_of_agents=1,
                  obs_builder_object=TreeObsForRailEnv(max_depth=2))
    env.reset()
Alternatively, a random environment can be generated (optionally specifying
weights for each cell type to increase or decrease their proportion in the
generated rail networks).

.. code-block:: python
    # Relative weights of each cell type
    transition_probability = [...,  # Cases 0-7 elided in this excerpt
                              0.2,  # Case 8 - turn left
                              0.2,  # Case 9 - turn right
                              1.0]  # Case 10 - mirrored switch

    # Example: generate a random rail
    env = RailEnv(width=10,
                  height=10,
                  rail_generator=random_rail_generator(
                      ...  # cell type weights elided in this excerpt
                      ),
                  number_of_agents=3,
                  obs_builder_object=TreeObsForRailEnv(max_depth=2))
    env.reset()
Environments can be rendered using the utils.rendertools utilities, for example:

.. code-block:: python

    env_renderer = RenderTool(env)
    env_renderer.render_env(show=True)
Finally, the environment can be run by supplying the environment step function
with a dictionary of actions, whose keys are the agents' handles (returned by
env.get_agent_handles()) and whose values are the selected actions.
For example, for a 2-agent environment:
.. code-block:: python

    handles = env.get_agent_handles()
    action_dict = {handles[0]: 0, handles[1]: 0}
    obs, all_rewards, done, _ = env.step(action_dict)
where 'obs', 'all_rewards', and 'done' are also dictionaries indexed by the agents'
handles, whose values correspond to the relevant observations, rewards and terminal
status for each agent. Further, the 'done' dictionary returns an extra key
'__all__' that is set to True after all agents have reached their goals.
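This per-agent dictionary contract can be exercised without Flatland itself. The sketch below uses a stub environment (StubEnv and its fixed episode length are illustrative stand-ins, not part of the Flatland API) to run a loop until the '__all__' key flips to True:

.. code-block:: python

    # Illustrative stub mimicking RailEnv's dict-based step contract.
    # StubEnv is NOT part of Flatland; it finishes after a fixed number of steps.
    class StubEnv:
        def __init__(self, num_agents, episode_length=3):
            self.handles = list(range(num_agents))
            self._t = 0
            self._episode_length = episode_length

        def get_agent_handles(self):
            return self.handles

        def step(self, action_dict):
            self._t += 1
            finished = self._t >= self._episode_length
            obs = {h: [0.0] for h in self.handles}
            rewards = {h: -1.0 for h in self.handles}
            done = {h: finished for h in self.handles}
            done['__all__'] = finished  # extra key, True once every agent is done
            return obs, rewards, done, {}

    env = StubEnv(num_agents=2)
    handles = env.get_agent_handles()
    done = {'__all__': False}
    steps = 0
    while not done['__all__']:
        action_dict = {handles[0]: 0, handles[1]: 0}
        obs, all_rewards, done, _ = env.step(action_dict)
        steps += 1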
In the specific case a TreeObsForRailEnv observation builder is used, it is
possible to print a representation of the returned observations with the
following code. Also, tree observation data is displayed by RenderTool by default.

.. code-block:: python

    for i in range(env.get_num_agents()):
        env.obs_builder.util_print_obs_subtree(tree=obs[i])
The complete code for this part of the Getting Started guide can be found in
* `examples/simple_example_1.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_1.py>`_
* `examples/simple_example_2.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_2.py>`_
* `examples/simple_example_3.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_3.py>`_
Part 2 : Training a Simple Agent on Flatland
--------------------------------------------
This is a brief tutorial on how to train an agent on Flatland.
Here we use a simple random agent to illustrate the process of how to interact with the environment.
The corresponding code can be found in examples/training_example.py and in the baselines repository.
We start by importing the necessary Flatland libraries

.. code-block:: python

    from flatland.envs.rail_generators import complex_rail_generator
    from flatland.envs.schedule_generators import complex_schedule_generator
    from flatland.envs.rail_env import RailEnv
The complex_rail_generator is used in order to guarantee feasible railway network configurations for training.
Next we configure the difficulty of our task by modifying the complex_rail_generator parameters.

.. code-block:: python

    env = RailEnv(width=15,
                  height=15,
                  rail_generator=complex_rail_generator(
                                        nr_start_goal=10,
                                        nr_extra=10,
                                        min_dist=10,
                                        max_dist=99999,
                                        seed=1),
                  number_of_agents=5)
    env.reset()
The difficulty of a railway network depends on the dimensions (`width` x `height`) and the number of agents in the network.
By varying the number of start and goal connections (nr_start_goal) and the number of extra railway elements added (nr_extra),
the number of alternative paths of each agent can be modified. The more possible paths an agent has to reach its target, the easier the task becomes.
Here we don't specify any observation builder but rather use the standard tree observation. If you would like to use a custom observation, please follow
the instructions in the next tutorial.

Feel free to vary these parameters to see how your own agent holds up on different settings. The evaluation set of railway configurations will
cover the whole spectrum from easy to complex tasks.
Once we are set with the environment we can load our preferred agent from either RLlib or any other resource. Here we use a random agent to illustrate the code.

.. code-block:: python

    agent = RandomAgent(state_size, action_size)
We start every trial by resetting the environment

.. code-block:: python

    obs, info = env.reset()

which provides the initial observation for all agents (obs = array of all observations).
In order for the environment to step forward in time we need a dictionary of actions for all active agents.
This dictionary is then passed to the environment, which checks the validity of all actions.

.. code-block:: python

    next_obs, all_rewards, done, _ = env.step(action_dict)
The environment returns an array of new observations, a reward dictionary for all agents, as well as flags indicating which agents are done.
This information can be used to update the policy of your agent; if done['__all__'] == True, the episode terminates.
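The RandomAgent used in this part is not included in the Flatland package itself; a minimal sketch, assuming only the constructor signature shown above, an act() method that ignores its observation, and the 5 discrete RailEnv actions, could look as follows:

.. code-block:: python

    import random

    class RandomAgent:
        """Baseline agent that samples actions uniformly at random."""

        def __init__(self, state_size, action_size):
            self.state_size = state_size
            self.action_size = action_size

        def act(self, obs):
            # Ignore the observation and pick one of the discrete actions.
            return random.randint(0, self.action_size - 1)

        def step(self, memories):
            # A learning agent would update its policy here;
            # the random agent has nothing to learn.
            pass

    agent = RandomAgent(state_size=231, action_size=5)
    action = agent.act(obs=None)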
Part 3 : Customizing Observations and Level Generators
------------------------------------------------------
Example code for generating custom observations given a RailEnv and for generating
random rail maps is available in examples/custom_observation_example.py and
examples/custom_railmap_example.py.
Custom observations can be produced by deriving a new object from the
core.env_observation_builder.ObservationBuilder base class, for example as follows:
.. code-block:: python

    class CustomObs(ObservationBuilder):
        def __init__(self):
            self.observation_space = [5]

        def reset(self):
            return

        def get(self, handle):
            observation = handle * np.ones((5,))
            return observation
It is important that an observation_space is defined with a list of dimensions
of the returned observation tensors. get() returns the observation for the agent
with handle 'handle'.
A RailEnv environment can then be created as usual:

.. code-block:: python

    env = RailEnv(width=7,
                  height=7,
                  rail_generator=random_rail_generator(),
                  number_of_agents=3,
                  obs_builder_object=CustomObs())
As for generating custom rail maps, the RailEnv class accepts a rail_generator
argument that must be a function with arguments `width`, `height`, `num_agents`,
and `num_resets=0`, and that has to return a GridTransitionMap object (the rail map),
and three lists of tuples containing the (row,column) coordinates of each of
num_agent agents, their initial orientation **(0=North, 1=East, 2=South, 3=West)**,
and the position of their targets.
For example, the following custom rail map generator returns an empty map of
size (height, width), with no agents (regardless of num_agents):
.. code-block:: python

    def custom_rail_generator():
        def generator(width, height, num_agents=0, num_resets=0):
            rail_trans = RailEnvTransitions()
            grid_map = GridTransitionMap(width=width, height=height, transitions=rail_trans)
            rail_array = grid_map.grid
            rail_array.fill(0)

            agents_positions = []
            agents_direction = []
            agents_target = []

            return grid_map, agents_positions, agents_direction, agents_target
        return generator
It is worth noting that helpful utilities to manage RailEnv environments and their
related data structures are available in 'envs.env_utils'. In particular,
envs.env_utils.get_rnd_agents_pos_tgt_dir_on_rail is fairly handy for filling in
random (but consistent) agents along with their targets and initial directions,
given a rail map (GridTransitionMap object) and the desired number of agents:
.. code-block:: python

    agents_position, agents_direction, agents_target = get_rnd_agents_pos_tgt_dir_on_rail(
        rail_map,
        num_agents)
The full source code of this example can be found in `examples/training_example.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/training_example.py>`_.
# Level Generation Tutorial
We are currently working on several new level generators. You can expect that the levels in the submission testing will come not from just one but from several different level generators, to make sure that the controllers can handle any railway-specific challenge.
Let's have a look at the `sparse_rail_generator`.
## Sparse Rail Generator
![Example_Sparse](https://i.imgur.com/DP8sIyx.png)
The idea behind the sparse rail generator is to mimic classic railway structures where dense nodes (cities) are sparsely connected to each other and where you have to manage traffic flow between the nodes efficiently.
The cities in this level generator are much simplified in comparison to real city networks but it mimics parts of the problems faced in daily operations of any railway company.
There are a few parameters you can tune to build your own map and test different levels of complexity.

**Warning**: some combinations of parameters do not go well together and will lead to infeasible level generation.
In the worst case, the level generator currently only issues a warning when it cannot build the environment according to the parameters provided,
which will lead to a crash of the whole env.
We are currently working on improvements here and are **happy for any suggestions from your side**.
To build an environment you instantiate a `RailEnv` as follows:
```python
# Initialize the generator
rail_generator = sparse_rail_generator(
    num_cities=10,  # Number of cities in map
    num_intersections=10,  # Number of intersections in map
    num_trainstations=50,  # Number of possible start/targets on map
    min_node_dist=6,  # Minimal distance between nodes
    node_radius=3,  # Proximity of stations to city center
    num_neighb=3,  # Number of connections to other cities
    seed=5,  # Random seed
    grid_mode=False  # Ordered distribution of nodes
)

# Build the environment
env = RailEnv(
    width=50,
    height=50,
    rail_generator=rail_generator,
    schedule_generator=sparse_schedule_generator(),
    number_of_agents=10,
    obs_builder_object=TreeObsForRailEnv(max_depth=3, predictor=shortest_path_predictor)
)

# Call reset on the environment
env.reset()
```
You can see that you now need both a `rail_generator` and a `schedule_generator` to generate a level. These need to work nicely together. The `rail_generator` will only generate the railway infrastructure and provide hints to the `schedule_generator` about where to place agents. The `schedule_generator` will then generate a schedule, meaning it places agents at different train stations and gives them tasks by providing individual targets.
You can tune the following parameters in the `sparse_rail_generator`:
- `num_cities` is the number of cities on a map. Cities are the only nodes that can host start and end points for agent tasks (train stations). Here you have to be careful that the number is not too high, as all the cities have to fit on the map. When `grid_mode=False` you have to be careful when choosing `min_node_dist`, because level generation will fail if not all cities (and intersections) can be placed with at least `min_node_dist` between them.
- `num_intersections` is the number of nodes that don't hold any train stations. They are also the first priority that a city connects to. We use these to allow for sparse connections between cities.
- `num_trainstations` defines the *total* number of train stations in the network. This also sets the maximum number of allowed agents in the environment. This is also a delicate parameter, as there is only a limited amount of space available around nodes, and thus if the number is too high the level generation will fail. *Important*: Only the number of agents provided to the environment will actually produce active train stations. The others will just be present as dead-ends (see figures below).
- `min_node_dist` is only used if `grid_mode=False` and represents the minimal distance between two nodes.
- `node_radius` defines the extent of a city. Each train station is placed at a distance to the closest city node that is smaller or equal to this number.
- `num_neighb` defines the number of neighbouring nodes that connect to each other. This changes the connectivity and thus the number of alternative routes in the network.
- `grid_mode`: True -> nodes evenly distributed in the env, False -> random distribution of nodes
- `enhance_intersection`: True -> extra rail elements added at intersections
- `seed` is used to initialize the random generator
If you run into any bugs with sets of parameters please let us know.
Here is a network with `grid_mode=False` and the parameters from above.
![sparse_random](https://i.imgur.com/Xg7nifF.png)
and here with `grid_mode=True`
![sparse_ordered](https://i.imgur.com/jyA7Pt4.png)
## Example code
To see all the changes in action you can just run the `flatland_example_2_0.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/flatland_2_0_example.py).
# Stochasticity Tutorial
Another area where we improved **Flat**land 2.0 is stochastic events added during the episodes.
This is very common for railway networks, where the initial plan usually needs to be rescheduled during operations, as minor events such as delayed departures from train stations, malfunctions of trains or infrastructure, or just the weather lead to delayed trains.
We implemented a Poisson process to simulate delays by stopping agents at random times for random durations. The parameters necessary for the stochastic events can be provided when creating the environment.
```python
# Use the malfunction generator to break agents from time to time
stochastic_data = {
    'prop_malfunction': 0.5,  # Percentage of defective agents
    'malfunction_rate': 30,  # Rate of malfunction occurrence
    'min_duration': 3,  # Minimal duration of malfunction
    'max_duration': 10  # Max duration of malfunction
}
```
The parameters are as follows:
- `prop_malfunction` is the proportion of agents that can malfunction. `1.0` means that each agent can break.
- `malfunction_rate` is the mean rate of the poisson process in number of environment steps.
- `min_duration` and `max_duration` set the range of malfunction durations, which are sampled uniformly
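This process can be illustrated without Flatland. The following plain-Python sketch (not Flatland's internal implementation; the parameter names mirror `stochastic_data` above) breaks a single healthy agent with probability `1 / malfunction_rate` per step and samples each malfunction duration uniformly:

```python
import random

random.seed(42)

malfunction_rate = 30  # mean number of steps between malfunctions
min_duration, max_duration = 3, 10

remaining = 0  # steps of malfunction left for one agent
durations = []
for step in range(10_000):
    if remaining > 0:
        remaining -= 1  # agent is stopped while broken
    elif random.random() < 1.0 / malfunction_rate:
        remaining = random.randint(min_duration, max_duration)
        durations.append(remaining)

# every sampled duration lies in [min_duration, max_duration]
```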
You can introduce stochasticity by simply creating the env as follows:
```python
env = RailEnv(
    ...
    stochastic_data=stochastic_data,  # Malfunction data generator
    ...
)
```
In your controller, you can check whether an agent is malfunctioning:

```python
obs, rew, done, info = env.step(actions)
...
action_dict = dict()
for a in range(env.get_num_agents()):
    if info['malfunction'][a] == 0:
        action_dict.update({a: ...})
```

A complete example putting all of this together:

```python
# Custom observation builder
tree_observation = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())

# Different agent types (trains) with different speeds.
speed_ration_map = {1.: 0.25,  # Fast passenger train
                    1. / 2.: 0.25,  # Fast freight train
                    1. / 3.: 0.25,  # Slow commuter train
                    1. / 4.: 0.25}  # Slow freight train

env = RailEnv(width=50,
              height=50,
              rail_generator=sparse_rail_generator(num_cities=20,  # Number of cities in map (where train stations are)
                                                   num_intersections=5,  # Number of intersections (no start / target)
                                                   num_trainstations=15,  # Number of possible start/targets on map
                                                   min_node_dist=3,  # Minimal distance of nodes
                                                   node_radius=2,  # Proximity of stations to city center
                                                   num_neighb=4,  # Number of connections to other cities/intersections
                                                   seed=15,  # Random seed
                                                   grid_mode=True,
                                                   enhance_intersection=True
                                                   ),
              schedule_generator=sparse_schedule_generator(speed_ration_map),
              number_of_agents=10,
              stochastic_data=stochastic_data,  # Malfunction data generator
              obs_builder_object=tree_observation)
env.reset()
```
You will quickly realize that this will lead to unforeseen difficulties which means that **your controller** needs to observe the environment at all times to be able to react to the stochastic events.
## Example code
To see all the changes in action you can just run the `flatland_example_2_0.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/flatland_2_0_example.py).
# Different speed profiles Tutorial
One of the main contributions to the complexity of railway network operations stems from the fact that all trains travel at different speeds while sharing a very limited railway network.
In **Flat**land 2.0 this feature will be enabled as well and will lead to much more complex configurations. Here we count on your support if you find bugs or improvements :).
The different speed profiles can be generated using the `schedule_generator`, where you can choose as many different speeds as you like.
Keep in mind that the *fastest speed* is 1 and all slower speeds must be between 1 and 0.
For the submission scoring you can assume that there will be no more than 5 speed profiles.
Later versions of **Flat**land might have varying speeds during episodes. Therefore, we return the agent speeds.
Notice that we do not guarantee that the speed will be computed at each step, but if not costly we will return it at each step.
In your controller, you can get the agents' speed from the `info` returned by `step`:
```python
obs, rew, done, info = env.step(actions)
...
for a in range(env.get_num_agents()):
    speed = info['speed'][a]
```
## Actions and observation with different speed levels
Because the different speeds are implemented as fractions, the agents' ability to perform actions has been updated.
We **do not allow actions to change within a cell**.
This means that each agent can only choose an action to be taken when entering a cell.
This action is then executed when a step to the next cell is valid. For example

- An agent enters a switch and chooses to deviate left. The agent's fractional speed is 1/4, and thus the agent will take 4 time steps to complete its journey through the cell. On the 4th time step the agent will leave the cell deviating left, as chosen at the entry of the cell.
- All actions chosen by the agent during its travels within a cell are ignored.
- Agents can make observations at any time step. Make sure to discard observations without any information. See this [example](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/training_navigation.py) for a simple implementation.
- The environment checks whether an agent is allowed to move to the next cell only at the time of the transition to the next cell.
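A toy simulation (plain Python, not the Flatland implementation) makes the latching behaviour concrete: the action chosen at cell entry is stored, fractional progress accumulates each step, and the stored action is applied only when the agent completes the cell:

```python
speed = 1.0 / 4.0   # slow train: 4 steps per cell
position = 0        # index of the current cell along a toy track
in_cell_progress = 0.0
latched_action = None  # action chosen when entering the current cell
applied_actions = []

for step, chosen_action in enumerate(['LEFT', 'FORWARD', 'RIGHT', 'FORWARD',
                                      'RIGHT', 'LEFT', 'LEFT', 'FORWARD']):
    if latched_action is None:
        # only the action chosen on cell entry counts ...
        latched_action = chosen_action
    # ... actions chosen mid-cell are ignored
    in_cell_progress += speed
    if in_cell_progress >= 1.0:
        # cell completed: execute the latched action and enter the next cell
        applied_actions.append(latched_action)
        position += 1
        in_cell_progress = 0.0
        latched_action = None

# after 8 steps at speed 1/4, the agent has crossed 2 cells,
# using only the actions chosen at each cell entry
```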
In your controller, you can check whether an agent requires an action by checking `info`:
```python
obs, rew, done, info = env.step(actions)
...
action_dict = dict()
for a in range(env.get_num_agents()):
    if info['action_required'][a] and info['malfunction'][a] == 0:
        action_dict.update({a: ...})
```
Notice that `info['action_required'][a]` does not mean that the action will have an effect:
if the next cell is blocked or the agent breaks down, the action cannot be performed and an action will be required again in the next step.
## Rail Generators and Schedule Generators
The separation between rail generator and schedule generator reflects the organisational separation in the railway domain:
- Infrastructure Manager (IM): is responsible for the layout and maintenance of tracks
- Railway Undertaking (RU): operates trains on the infrastructure
Usually, there is a third organisation, which ensures discrimination-free access to the infrastructure for concurrent requests for the infrastructure in a **schedule planning phase**.
However, in the **Flat**land challenge, we focus on the re-scheduling problem during live operations.
Technically,
```python
RailGeneratorProduct = Tuple[GridTransitionMap, Optional[Any]]
RailGenerator = Callable[[int, int, int, int], RailGeneratorProduct]

AgentPosition = Tuple[int, int]
Schedule = collections.namedtuple('Schedule', 'agent_positions '
                                              'agent_directions '
                                              'agent_targets '
                                              'agent_speeds '
                                              'agent_malfunction_rates '
                                              'max_episode_steps')
ScheduleGenerator = Callable[[GridTransitionMap, int, Optional[Any], Optional[int]], Schedule]
```
We can then produce `RailGenerator`s by currying:
```python
def sparse_rail_generator(num_cities=5, num_intersections=4, num_trainstations=2, min_node_dist=20, node_radius=2,
                          num_neighb=3, grid_mode=False, enhance_intersection=False, seed=1):

    def generator(width, height, num_agents, num_resets=0):
        # generate the grid and (optionally) some hints for the schedule_generator
        ...
        return grid_map, {'agents_hints': {
            'num_agents': num_agents,
            'agent_start_targets_nodes': agent_start_targets_nodes,
            'train_stations': train_stations
        }}

    return generator
```
And, similarly, `ScheduleGenerator`s:
```python
def sparse_schedule_generator(speed_ratio_map: Mapping[float, float] = None) -> ScheduleGenerator:
    def generator(rail: GridTransitionMap, num_agents: int, hints: Any = None):
        # place agents:
        # - initial position
        # - initial direction
        # - (initial) speed
        # - malfunction
        ...
        return agents_position, agents_direction, agents_target, speeds, agents_malfunction

    return generator
```
Notice that the `rail_generator` may pass `agents_hints` to the `schedule_generator` which the latter may interpret.
For instance, the way the `sparse_rail_generator` generates the grid, it already determines the agent's goal and target.
Hence, `rail_generator` and `schedule_generator` have to match if `schedule_generator` presupposes some specific `agents_hints`.
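The hand-off can be sketched with toy stand-ins (plain Python stubs; the real generators build an actual GridTransitionMap and a full schedule):

```python
def toy_rail_generator(num_stations=2):
    def generator(width, height, num_agents, num_resets=0):
        # stand-in for a GridTransitionMap: a bare grid of zeros
        grid_map = [[0] * width for _ in range(height)]
        hints = {'agents_hints': {
            'train_stations': [(0, 0), (height - 1, width - 1)][:num_stations]
        }}
        return grid_map, hints
    return generator

def toy_schedule_generator():
    def generator(rail, num_agents, hints=None):
        # this schedule generator presupposes the 'train_stations' hint
        stations = hints['train_stations']
        positions = [stations[i % len(stations)] for i in range(num_agents)]
        targets = [stations[(i + 1) % len(stations)] for i in range(num_agents)]
        return positions, targets
    return generator

rail_gen = toy_rail_generator()
schedule_gen = toy_schedule_generator()

rail, optionals = rail_gen(width=5, height=4, num_agents=2)
positions, targets = schedule_gen(rail, 2, hints=optionals['agents_hints'])
```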
The environment's `reset` takes care of applying the two generators:
```python
def __init__(self,
             ...
             rail_generator: RailGenerator = random_rail_generator(),
             schedule_generator: ScheduleGenerator = random_schedule_generator(),
             ...
             ):
    self.rail_generator: RailGenerator = rail_generator
    self.schedule_generator: ScheduleGenerator = schedule_generator

def reset(self, regenerate_rail=True, regenerate_schedule=True):
    rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets)

    ...

    if replace_agents:
        agents_hints = None
        if optionals and 'agents_hints' in optionals:
            agents_hints = optionals['agents_hints']
        self.agents_static = EnvAgentStatic.from_lists(
            self.schedule_generator(self.rail, self.get_num_agents(), hints=agents_hints))
```
## Example code
To see all the changes in action you can just run the `flatland_example_2_0.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/flatland_2_0_example.py).
# How to get started in Round 2
- [Environment Changes](#environment-changes)
- [Level generation](#level-generation)
- [Observations](#observations)
- [Predictions](#predictions)
## Environment Changes
There have been some major changes in how agents are being handled in the environment in this Flatland update.
### Agents
Agents are no longer permanent entities in the environment. Now agents will be removed from the environment as soon as they finish their task. To keep interactions with the environment as simple as possible, we do not modify the dimensions of the observation vectors nor the number of agents. Agents that have finished do not require any special treatment from the controller. Any action provided to these agents is simply ignored, just like before.
Start positions of agents are *not unique* anymore. This means that many agents can start from the same position on the railway grid. It is important to keep in mind that whatever agent moves first will block the rest of the agents from moving into the same cell. Thus, the controller can already decide the ordering of the agents from the first step.
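Since finished agents simply ignore any action sent to them, a controller can keep its loop unchanged, or skip them explicitly using the `done` dictionary returned by `step`. A small sketch (the `done` values are stand-ins; in practice they come from `env.step(...)`):

```python
# Stand-in values: in practice `done` comes from env.step(...)
done = {0: True, 1: False, 2: False, '__all__': False}
num_agents = 3
MOVE_FORWARD = 2  # one of RailEnv's discrete actions

action_dict = {}
for a in range(num_agents):
    if done.get(a):
        continue  # agent already reached its target and left the grid
    action_dict[a] = MOVE_FORWARD

# only the two still-active agents receive actions
```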
## Level Generation
The levels are now generated using the `sparse_rail_generator` and the `sparse_schedule_generator`
### Rail Generation
The rail generation is done in a sequence of steps:
1. A number of city centers are placed in a grid of size `(height, width)`
2. Each city is connected to two neighbouring cities
3. Internal parallel tracks are generated in each city
### Schedule Generation
The `sparse_schedule_generator` produces tasks for the agents by selecting a starting city and a target city. The agent is then placed on an even track number on the starting city and faced such that a path exists to the target city. The task for the agent is to reach the target position as fast as possible.
In the future we will update how these schedules are generated to allow for more complex tasks.
## Observations
Observations have been updated to reflect the novel features and behaviors of Flatland. Have a look at [observation](https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/observations.py) or the documentation for more details on the observations.
## Predictions
test_id,env_id,n_agents,x_dim,y_dim,n_cities,max_rail_pairs_in_city,n_envs_run,seed,grid_mode,max_rails_between_cities,malfunction_duration_min,malfunction_duration_max,malfunction_interval,speed_ratios
Test_0,Level_0,7,30,30,2,2,10,335971,False,2,20,50,540,"{1.0: 0.25, 0.5: 0.25, 0.33: 0.25, 0.25: 0.25}"
Test_0,Level_1,7,30,30,2,2,10,335972,False,2,20,50,540,"{1.0: 0.25, 0.5: 0.25, 0.33: 0.25, 0.25: 0.25}"
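These per-test parameters are plain CSV with one Python-literal column; a quick way to load them with the standard library (the inlined text stands in for the actual parameter file, whose name is not shown here) is:

```python
import ast
import csv
import io

# In practice this would be open(<parameter file>); inlined here for illustration.
csv_text = """test_id,env_id,n_agents,x_dim,y_dim,speed_ratios
Test_0,Level_0,7,30,30,"{1.0: 0.25, 0.5: 0.25, 0.33: 0.25, 0.25: 0.25}"
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
level = rows[0]
n_agents = int(level['n_agents'])
# the quoted dict literal parses safely with ast.literal_eval
speed_ratios = ast.literal_eval(level['speed_ratios'])
```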
"""Run benchmarks on complex rail flatland."""
import random

import numpy as np

from flatland.envs.generators import complex_rail_generator
from flatland.envs.rail_env import RailEnv


def run_benchmark():
    """Run benchmark on a small number of agents in complex rail environment."""
    random.seed(1)
    np.random.seed(1)

    # Example generate a random rail
    env = RailEnv(width=15, height=15,
                  rail_generator=complex_rail_generator(nr_start_goal=5, nr_extra=20, min_dist=12),
                  number_of_agents=5)

    n_trials = 20
    action_dict = dict()
    action_prob = [0] * 4

    def max_lt(seq, val):
        """
        Return greatest item in seq for which item < val applies.
        None is returned if seq was empty or all items in seq were >= val.
        """
        idx = len(seq) - 1
        while idx >= 0:
            if seq[idx] < val and seq[idx] >= 0:
                return seq[idx]
            idx -= 1
        return None

    for trials in range(1, n_trials + 1):
        # Reset environment
        obs = env.reset()
        for a in range(env.get_num_agents()):
            norm = max(1, max_lt(obs[a], np.inf))
            obs[a] = np.clip(np.array(obs[a]) / norm, -1, 1)

        # Run episode
        for step in range(100):
            # Action
            for a in range(env.get_num_agents()):
                action = np.random.randint(0, 4)
                action_prob[action] += 1
                action_dict.update({a: action})

            # Environment step
            next_obs, all_rewards, done, _ = env.step(action_dict)
            for a in range(env.get_num_agents()):
                norm = max(1, max_lt(next_obs[a], np.inf))
                next_obs[a] = np.clip(np.array(next_obs[a]) / norm, -1, 1)
            if done['__all__']:
                break

        if trials % 100 == 0:
            action_prob = [1] * 4


if __name__ == "__main__":
    run_benchmark()
import random

import numpy as np

from flatland.core.env_observation_builder import ObservationBuilder
from flatland.envs.generators import random_rail_generator
from flatland.envs.rail_env import RailEnv

random.seed(100)
np.random.seed(100)


class CustomObs(ObservationBuilder):
    def __init__(self):
        self.observation_space = [5]

    def reset(self):
        return

    def get(self, handle):
        observation = handle * np.ones((5,))
        return observation


env = RailEnv(width=7,
              height=7,
              rail_generator=random_rail_generator(),
              number_of_agents=3,
              obs_builder_object=CustomObs())

# Print the observation vector for each agent
obs, all_rewards, done, _ = env.step({0: 0})
for i in range(env.get_num_agents()):
    print("Agent ", i, "'s observation: ", obs[i])
import random
import numpy as np
from flatland.core.env_observation_builder import ObservationBuilder
from flatland.envs.line_generators import sparse_line_generator
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
random.seed(100)
np.random.seed(100)
class SimpleObs(ObservationBuilder):
"""
Simplest observation builder. The object returns observation vectors with 5 identical components,
all equal to the ID of the respective agent.
"""
def __init__(self):
super().__init__()
def reset(self):
return
def get(self, handle: int = 0) -> np.ndarray:
observation = handle * np.ones((5,))
return observation
def create_env():
nAgents = 3
n_cities = 2
max_rails_between_cities = 2
max_rails_in_city = 4
seed = 0
env = RailEnv(
width=20,
height=30,
rail_generator=sparse_rail_generator(
max_num_cities=n_cities,
seed=seed,
grid_mode=True,
max_rails_between_cities=max_rails_between_cities,
max_rail_pairs_in_city=max_rails_in_city
),
line_generator=sparse_line_generator(),
number_of_agents=nAgents,
obs_builder_object=SimpleObs()
)
return env
def main():
env = create_env()
env.reset()
    # Print the observation vector for each agent
obs, all_rewards, done, _ = env.step({0: 0})
for i in range(env.get_num_agents()):
print("Agent ", i, "'s observation: ", obs[i])
if __name__ == '__main__':
main()
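The observation contract exercised by `SimpleObs` (a `reset()` hook plus a per-agent `get(handle)`) can be reproduced without Flatland installed. `MiniObsBuilder` below is a hypothetical stand-in for `ObservationBuilder`, not the real base class; it only mimics the two methods the example implements.

```python
import numpy as np

class MiniObsBuilder:
    """Hypothetical minimal observation builder mirroring SimpleObs above."""

    def reset(self):
        # Called by the environment on reset; SimpleObs keeps no state.
        pass

    def get(self, handle: int = 0) -> np.ndarray:
        # Same rule as SimpleObs: 5 components, all equal to the agent's handle.
        return handle * np.ones((5,))

builder = MiniObsBuilder()
observations = {h: builder.get(h) for h in range(3)}
```

With three agents, agent 0 observes a zero vector, agent 1 a vector of ones, and agent 2 a vector of twos — which is exactly what the `print` loop in `main()` shows.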
import getopt
import random
import sys
import time
from typing import List
import numpy as np
from flatland.core.env_observation_builder import ObservationBuilder
from flatland.core.grid.grid4_utils import get_new_position
from flatland.envs.line_generators import sparse_line_generator
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.utils.misc import str2bool
from flatland.utils.rendertools import RenderTool
random.seed(100)
np.random.seed(100)
class SingleAgentNavigationObs(ObservationBuilder):
"""
We build a representation vector with 3 binary components, indicating which of the 3 available directions
for each agent (Left, Forward, Right) lead to the shortest path to its target.
E.g., if taking the Left branch (if available) is the shortest route to the agent's target, the observation vector
will be [1, 0, 0].
"""
def __init__(self):
super().__init__()
def reset(self):
pass
def get(self, handle: int = 0) -> List[int]:
agent = self.env.agents[handle]
if agent.position:
possible_transitions = self.env.rail.get_transitions(*agent.position, agent.direction)
else:
possible_transitions = self.env.rail.get_transitions(*agent.initial_position, agent.direction)
num_transitions = np.count_nonzero(possible_transitions)
# Start from the current orientation, and see which transitions are available;
# organize them as [left, forward, right], relative to the current orientation
# If only one transition is possible, the forward branch is aligned with it.
if num_transitions == 1:
observation = [0, 1, 0]
else:
            min_distances = []
            # Use initial_position while the agent has not yet entered the grid,
            # consistent with the transitions lookup above.
            agent_position = agent.position if agent.position else agent.initial_position
            for direction in [(agent.direction + i) % 4 for i in range(-1, 2)]:
                if possible_transitions[direction]:
                    new_position = get_new_position(agent_position, direction)
min_distances.append(
self.env.distance_map.get()[handle, new_position[0], new_position[1], direction])
else:
min_distances.append(np.inf)
observation = [0, 0, 0]
observation[np.argmin(min_distances)] = 1
return observation
def create_env():
nAgents = 1
n_cities = 2
max_rails_between_cities = 2
max_rails_in_city = 4
seed = 0
env = RailEnv(
width=30,
height=40,
rail_generator=sparse_rail_generator(
max_num_cities=n_cities,
seed=seed,
grid_mode=True,
max_rails_between_cities=max_rails_between_cities,
max_rail_pairs_in_city=max_rails_in_city
),
line_generator=sparse_line_generator(),
number_of_agents=nAgents,
obs_builder_object=SingleAgentNavigationObs()
)
return env
def custom_observation_example_02_SingleAgentNavigationObs(sleep_for_animation, do_rendering):
env = create_env()
obs, info = env.reset()
env_renderer = None
if do_rendering:
env_renderer = RenderTool(env)
env_renderer.render_env(show=True, frames=True, show_observations=False)
for step in range(100):
        action = np.argmax(obs[0]) + 1  # obs is [left, forward, right]; +1 maps onto RailEnv actions 1/2/3
obs, all_rewards, done, _ = env.step({0: action})
print("Rewards: ", all_rewards, " [done=", done, "]")
if env_renderer is not None:
env_renderer.render_env(show=True, frames=True, show_observations=True)
if sleep_for_animation:
time.sleep(0.1)
if done["__all__"]:
break
if env_renderer is not None:
env_renderer.close_window()
def main(args):
try:
        opts, args = getopt.getopt(args, "", ["sleep-for-animation=", "do_rendering="])
except getopt.GetoptError as err:
print(str(err)) # will print something like "option -a not recognized"
sys.exit(2)
sleep_for_animation = True
do_rendering = True
for o, a in opts:
        if o == "--sleep-for-animation":
            sleep_for_animation = str2bool(a)
        elif o == "--do_rendering":
            do_rendering = str2bool(a)
else:
assert False, "unhandled option"
# execute example
custom_observation_example_02_SingleAgentNavigationObs(sleep_for_animation, do_rendering)
if __name__ == '__main__':
if 'argv' in globals():
main(argv)
else:
main(sys.argv[1:])
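The `[left, forward, right]` ordering built by `SingleAgentNavigationObs` comes from modular arithmetic on Flatland's grid-4 direction encoding (0=North, 1=East, 2=South, 3=West). A standalone sketch of that arithmetic:

```python
def relative_directions(direction: int):
    """Absolute directions on the agent's [left, forward, right], given its heading."""
    return [(direction + i) % 4 for i in range(-1, 2)]

# An agent heading East (1) has North (0) on its left and South (2) on its right.
print(relative_directions(1))  # [0, 1, 2]
```

In the episode loop, `np.argmax(obs[0]) + 1` then converts the winning slot of this `[left, forward, right]` vector into the corresponding `RailEnv` movement action (1=left, 2=forward, 3=right).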