Observation and Action Spaces
-----------------------------
This is an introduction to the three standard observations and the action space of **Flatland**.
Action Space
^^^^^^^^^^^^
Flatland is a railway simulation, so the actions of an agent are strongly constrained by the railway network. This means that in many cases not all actions are valid.
The possible actions of an agent are:

- ``0`` **Do Nothing**: If the agent is moving, it continues moving; if it is stopped, it stays stopped.
- ``1`` **Deviate Left**: If the agent is at a switch with a transition to its left, the agent will choose the left path. Otherwise the action has no effect. If the agent is stopped, this action will start agent movement again if allowed by the transitions.
- ``2`` **Go Forward**: This action will start the agent when stopped. It moves the agent forward and chooses the go-straight direction at switches.
- ``3`` **Deviate Right**: Exactly the same as deviate left, but for right turns.
- ``4`` **Stop**: This action causes the agent to stop.
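A minimal sketch of issuing these action codes, assuming ``env`` is an already created and reset ``RailEnv`` instance:

.. code-block:: python

    # Send "Go Forward" (code 2) to every agent; any of the codes 0-4 above is valid.
    action_dict = {handle: 2 for handle in range(env.get_num_agents())}
    obs, rewards, done, info = env.step(action_dict)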
Observation Spaces
^^^^^^^^^^^^^^^^^^
In the **Flatland** environment we have included three basic observations to get started. The figure below illustrates the observation ranges of the different basic observations: ``Global``, ``Local Grid`` and ``Local Tree``.
.. image:: https://i.imgur.com/oo8EIYv.png
   :height: 100
   :width: 200
Global Observation
~~~~~~~~~~~~~~~~~~
Gives a global observation of the entire rail environment.
The observation is composed of the following elements:
- transition map array with dimensions (``env.height``, ``env.width``, ``16``), assuming **16 bits encoding of transitions**.
- Two 2D arrays (``map_height``, ``map_width``, ``2``) containing respectively the position of the given agent target and the positions of the other agents' targets.
- A 3D array (``map_height``, ``map_width``, ``8``) with the **first 4 channels** containing the **one hot encoding** of the direction of the given agent and the second 4 channels containing the positions of the other agents at their position coordinates.
We encourage you to enhance this observation with any layer you think might help solve the problem.
It would also be possible to construct a global observation for a super agent that controls all agents at once.
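As an illustration, such a global observation could be requested as follows (a sketch assuming Flatland's ``GlobalObsForRailEnv`` builder and the default rail generator; the exact ``reset`` return value may differ between versions):

.. code-block:: python

    from flatland.envs.rail_env import RailEnv
    from flatland.envs.observations import GlobalObsForRailEnv

    env = RailEnv(width=20, height=20,
                  number_of_agents=2,
                  obs_builder_object=GlobalObsForRailEnv())
    obs = env.reset()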
Local Grid Observation
~~~~~~~~~~~~~~~~~~~~~~
Gives a local observation of the rail environment around the agent.
The observation is composed of the following elements:
- transition map array of the local environment around the given agent, with dimensions (``2*view_radius + 1``, ``2*view_radius + 1``, ``16``), assuming **16 bits encoding of transitions**.
- Two 2D arrays (``2*view_radius + 1``, ``2*view_radius + 1``, ``2``) containing respectively the agent's own target position and the positions of the other agents' targets, if they are within the agent's vision range.
- A 3D array (``2*view_radius + 1``, ``2*view_radius + 1``, ``4``) containing the one hot encoding of directions of the other agents at their position coordinates, if they are in the agent's vision range.
- A 4-element array with the one-hot encoding of the agent's direction.
Be aware that this observation **does not** contain any clues about the target location if the target is out of range. Thus navigation becomes very difficult on maps where the radius of the observation does not guarantee a visible target at all times.
We encourage you to come up with creative ways to overcome this problem. In the tree observation below we introduce the concept of distance maps.
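One simple way to obtain such a local view is to crop it out of the global transition map. The helper below is purely illustrative (``local_window`` is not part of Flatland):

.. code-block:: python

    import numpy as np

    def local_window(transition_map, row, col, view_radius):
        """Crop a (height, width, 16) transition map to the
        (2*view_radius + 1)^2 window around (row, col),
        zero-padding cells that fall outside the grid."""
        h, w, c = transition_map.shape
        padded = np.zeros((h + 2 * view_radius, w + 2 * view_radius, c),
                          dtype=transition_map.dtype)
        padded[view_radius:view_radius + h, view_radius:view_radius + w] = transition_map
        return padded[row:row + 2 * view_radius + 1, col:col + 2 * view_radius + 1]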
Tree Observation
~~~~~~~~~~~~~~~~
The tree observation is built by exploiting the graph structure of the railway network. The observation is generated by spanning a **4-branched tree** from the current position of the agent. Each branch follows the allowed transitions (the backward branch is only allowed at dead-ends) until a cell with multiple allowed transitions is reached. There, the information gathered along the branch is stored as a node in the tree.
The figure below illustrates how the tree observation is built:
1. From the agent's location, probe all 4 directions (``L:Blue``, ``F:Green``, ``R:Purple``, ``B:Red``), starting with left, and start a branch wherever a transition is allowed:

   1. For each branch, walk along the allowed transitions until you reach a dead-end, a switch or the target destination.
   2. Create a node and fill in the node information as stated below.
   3. If the maximum tree depth is not reached and there are possible transitions, start new branches and repeat the steps above.

2. Fill up all non-existing branches with -infinity, such that the tree size is invariant to the number of possible transitions at branching points.
Note that we always start with the left branch according to the agent orientation. Thus the tree observation is independent of the NESW orientation of cells, and only considers the transitions relative to the agent's orientation.
The colors in the figure below illustrate which branch each cell belongs to. If there are multiple colors in a cell, this cell is visited by different branches of the tree observation.
The right side of the figure shows the resulting tree of the railway network on the left. A cross means that no branch was built. If a node has no children, it is a terminal node (dead-end, max depth reached, or no transition possible). A circle indicates a node filled with the corresponding information, as stated below in Node Information.
.. image:: https://i.imgur.com/sGBBhzJ.png
   :height: 100
   :width: 200
Node Information
~~~~~~~~~~~~~~~~
Each node is filled with information gathered along the path to the node. Currently each node contains 9 features:
- 1: if own target lies on the explored branch, the current distance from the agent in number of cells is stored.
- 2: if another agent's target is detected, the distance in number of cells from the current agent position is stored.
- 3: if another agent is detected, the distance in number of cells from the current agent position is stored.
- 4: possible conflict detected (this only works when we use a predictor and will not be important in this tutorial).
- 5: if an unusable switch (for the agent) is detected, we store the distance. An unusable switch is a switch where the agent does not have any choice of path, but other agents coming from different directions might.
- 6: this feature stores the distance (in number of cells) to the next node (e.g. switch, target or dead-end).
- 7: minimum remaining travel distance from this node to the agent's target, given the direction of the agent, if this path is chosen.
- 8: agent in the same direction found on the path to the node:

  - ``n`` = number of agents present in the same direction (possible future use: number of other agents in the same direction in this branch),
  - ``0`` = no agent present in the same direction.

- 9: agent in the opposite direction on the path to the node:

  - ``n`` = number of agents present in the opposite direction to the observing agent,
  - ``0`` = no agent present in the opposite direction to the observing agent.
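The fixed node layout can be mirrored in code. The sketch below is illustrative only (the names are ours, not Flatland's); missing branches would be padded with ``-inf``, as described for the tree observation above:

.. code-block:: python

    import math
    from collections import namedtuple

    # One tree node holding the 9 features listed above.
    Node = namedtuple('Node', [
        'dist_own_target', 'dist_other_target', 'dist_other_agent',
        'potential_conflict', 'dist_unusable_switch', 'dist_to_next_branch',
        'dist_min_to_target', 'num_agents_same_direction', 'num_agents_opposite_direction'])

    # Placeholder for a branch that does not exist, keeping the observation
    # size invariant to the number of transitions at a branching point.
    PADDING = Node(*([-math.inf] * 9))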
## Railway Specifications
### Overview
Flatland is usually a two-dimensional environment intended for multi-agent problems; in particular, it should serve as a benchmark for many multi-agent reinforcement learning approaches.
The environment can host a broad array of diverse problems reaching from disease spreading to train traffic management.
This documentation illustrates the dynamics and possibilities of Flatland environment and introduces the details of the train traffic management implementation.
### Environment
Before describing Flatland itself, let us first define the terms used in this specification. Flatland is a grid-like n-dimensional space of arbitrary size. A cell is the elementary element of the grid, defined as a location where objects can be placed. The term agent denotes an entity that can move within the grid and must solve tasks. An agent can move in any arbitrary direction along well-defined transitions from cell to cell. The cell an agent occupies must have enough capacity to hold it; every agent reserves exactly one unit of capacity (one resource). The capacity of a cell is usually one, so usually only one agent can occupy a given cell at the same time. The agent's movement possibilities can be restricted by limiting the allowed transitions.
Flatland is a discrete time simulation, i.e. it performs all actions with a constant time step. In each simulation step, time moves forward by an equal duration and each agent can choose an action. For the chosen action, the attached transition is executed. While executing a transition, Flatland checks whether the requested transition is valid; if it is, the agent's position is updated. If the transition is not allowed, the agent does not move.
In general, each cell has exactly one cell type attached to it. The cell type defines the allowed transitions for all agents.
Flatland supports many different types of agents, so the cell type can be further specialized per agent type. Consequently, the allowed transitions for an agent at a given cell are defined by both the cell type and the agent's type.
For each agent type, Flatland can have a different action space.
#### Grid
A rectangular grid of integer shape (dim_x, dim_y) defines the spatial dimensions of the environment.
Within this documentation we use North, East, West, South as orientation indicators, where North is up, South is down, West is left and East is right.
![single_cell](https://drive.google.com/uc?export=view&id=1O6jl2Ha14TV3Wuob5EbaowdYZiFt3aDW)
Cells are enumerated starting from the north-west corner; the East-West axis is the second coordinate and the North-South axis is the first coordinate, as commonly used in matrix notation.
Two cells $`i`$ and $`j`$ ($`i \neq j`$) are considered neighbors when the Euclidean distance between them satisfies $`\|\vec{x_i}-\vec{x_j}\| \leq \sqrt{2}`$. This means that the grid does not wrap around as if on a torus. (Two cells are considered neighbors when they share one edge or one node.)
![cell_table](https://drive.google.com/uc?export=view&id=109cD1uihDvTWnQ7PPTxC9AiNphlsY92r)
For each cell the allowed transitions to all neighboring 4 cells are defined. This can be extended to include transition probabilities as well.
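The transitions of a cell are commonly packed into a 16-bit value: one bit per combination of the 4 agent orientations and the 4 leaving directions. The bit layout below is an assumption for illustration, not necessarily Flatland's internal encoding:

```python
# Bit (4*orientation + direction), counted from the most significant bit;
# orientations/directions use N, E, S, W = 0, 1, 2, 3.
def transition_allowed(cell_transitions: int, orientation: int, direction: int) -> bool:
    bit_index = 15 - (4 * orientation + direction)
    return bool((cell_transitions >> bit_index) & 1)

# Example: a plain north-south straight allows N->N and S->S only.
straight_ns = (1 << 15) | (1 << (15 - (4 * 2 + 2)))
assert transition_allowed(straight_ns, 0, 0)      # heading north, continue north
assert not transition_allowed(straight_ns, 1, 1)  # heading east is impossible here
```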
#### Tile Types
###### Railway Grid
Each cell within the simulation grid has a distinct tile type, which in turn limits the movement possibilities of agents through the cell. For the railway-specific problem, 8 basic tile types can be defined which describe a rail network. As a general fact, in a railway network, whenever a navigation choice must be made, at most two options are available.
The following image gives an overview of the eight basic types. These can be rotated in steps of 45° and mirrored along the North-South or East-West axis. Please refer to Appendix A for a complete list of tiles.
![cell_types](https://drive.google.com/uc?export=view&id=164iowmfRQ9O34hquxLhO2xxt49NE473P)
As a general consistency rule, it can be said that each connection out of a tile must be joined by a connection of a neighboring tile.
![consistency_rule](https://drive.google.com/uc?export=view&id=1iaMIokHZ9BscMJ_Vi9t8QX_-8DzOjBKE)
In the left picture of the image above there is an inconsistency at the eastern end of cell (3,2), since there is no valid neighbor for cell (3,2). In the right picture, cell (3,2) contains a dead-end, which leaves no unconnected transitions.
Case 0 represents a wall, thus no agent can occupy the tile at any time.
Case 1 represents a passage through the tile. While on the tile, the agent can make no navigation decision: it can only decide to continue (i.e. pass on to the next connected tile), wait, or move backwards (to the tile visited before).
Case 2 represents a simple switch: when coming from the top position (south in the example), a navigation choice (West or North) must be made. Generally the straight transition (S->N in the example) is less costly than the bent transition, so in Case 2 the two choices may be rewarded differently. Case 6 is identical to Case 2 from a topological point of view, but there is no preferred choice when coming from South.
Case 3 can be seen as a superposition of two Case 1 passages. As with any other tile, at most one agent can occupy the cell at a given time.
Case 4 represents a single-slip switch. In the example, a navigation choice is possible when coming from West or South.
In Case 5, a navigation choice must be made when coming from any direction.
Case 7 represents a dead-end; only stopping or backwards motion is possible when an agent occupies this cell.
###### Tile Types of Wall-Based Cell Games (Theseus and Minotaur's puzzle, Labyrinth Game)
The Flatland approach can also be used to describe a variety of cell-based logic games. While not going into any detail here, it is worth noting that such games are usually visualized using a cell grid with walls describing forbidden transitions (a negative formulation).
![minotaurus](https://drive.google.com/uc?export=view&id=1WbU6YGopLKqAjVD6-r9UhCIzDfLisb5U)
Left: Wall-based Grid definition (negative definition), Right: lane-based Grid definition (positive definition)
## Train Traffic Management
#### Problem Definition
Additionally, due to the dynamics of train traffic, each transition probability is symmetric in this environment. This means that neighboring cells will always have the same transition probability to each other.
Furthermore, each cell is exclusive and can only be occupied by one agent at any given time.
### Observations
In this early stage of the project it is very difficult to come up with the necessary observation space in order to solve all train related problems. Given our early experiments we therefore propose different observation methods and hope to investigate further options with the crowdsourcing challenge. Below we compare global observation with local observations and discuss the differences in performance and flexibility.
#### Global Observation
Global observations, specifically on a grid like environment, benefit from the vast research results on learning from pixels and the advancements in convolutional neural network algorithms. The observation can simply be generated from the environment state and not much additional computation is necessary to generate the state.
It is reasonable to assume that an observation of the full environment is beneficial for good global solutions. Early experiments also showed promising results on small toy examples.
However, we run into problems when scalability and flexibility become an important factor. Already on small toy examples we could show that flexibility quickly becomes an issue when the problem instances differ too much. When scaling the problem instances the decision performance of the algorithm diminishes and re-training becomes necessary.
Given the complexity of real-world railway networks (especially in Switzerland), we do not believe that a global observation is suited for this problem.
#### Local Observation
Given that scalability and speed are the main requirements for our use cases local observations offer an interesting novel approach. Local observations require some additional computations to be extracted from the environment state but could in theory be performed in parallel for each agent.
With early experiments (presentation GTC, details below) we could show that even with local observations multiple agents can find feasible, global solutions and most importantly scale seamlessly to larger problem instances.
Below we highlight two different forms of local observations and elaborate on their benefits.
##### Local Field of View
This form of observation is very similar to the global view approach, in that it consists of a grid like input. In this setup each agent has its own observation that depends on its current location in the environment.
Given an agent's location, the observation is simply an $`n \times m`$ grid around the agent. The observation grid does not need to be symmetric or square, nor does it need to be centered on the agent.
**Benefits** of this approach again come from the vast research findings using convolutional neural networks and the comparably small computational effort to generate each observation.
**Drawbacks** mostly come from the specific details of train traffic dynamics, most notably the limited degrees of freedom. Considering the actions and directions an agent can choose in any given cell, it becomes clear that a grid-like observation around an agent will not contain much useful information, as most of the observed cells are neither reachable nor relevant to the agent's decisions.
![local_grid](https://drive.google.com/uc?export=view&id=1kZzinMOs7hlPaSJJeIiaQ7lAz2erXuHx)
##### Tree Search
From our past experiences and the nature of railway networks (they are a graph) it seems most suitable to use a local tree search as an observation for the agents.
A tree search on a grid is of course computationally expensive compared to a simple rectangular observation. Luckily, the limited allowed transitions of the railway implementation vastly reduce the complexity of the tree search. The figure below illustrates the observed tiles when using a local tree search. The information content of such an observation is much higher than that of the grid observation proposed above.
**Benefit** of this approach is the incorporation of allowed transitions into the observation generation, and thus an improved information density in the observation. From our experience this is currently the most suitable observation space for the problem.
**Drawback** is mostly the computational cost of generating the observation tree for each agent. Depending on how we model the tree search, we will be able to perform all searches in parallel. Because the agents cannot see the global system, the environment needs to provide some information about the global environment locally to the agent, e.g. the position of its destination.
**Unclear** is whether or not we should rotate the tree search according to the agent, such that decisions are always made relative to the agent's direction of travel.
![local_tree](https://drive.google.com/uc?export=view&id=1biob77VFskCsa3HwNsDH-gks9k965JEb)
_Figure 3: A local tree search moves along the allowed transitions, originating from the agent's position. This observation contains much more relevant information but has a higher computational cost. This figure illustrates an agent that can move east from its current position. The thick lines indicate the allowed transitions to a depth of eight._
We have gained some insights into using and aggregating the information along the tree search. This should be part of the early investigation while implementing Flatland. One possibility would also be to leave this up to the participants of the Flatland challenge.
#### Communication
Given the complexity and the high interdependence of the multi-agent system, some form of communication might be necessary. This needs to be investigated under the following constraints:
* Communication must converge in a feasible time
* Communication…
Depending on the game configuration, every agent can be informed about the positions of the other agents present in its observation range. For a local observation space, the agent knows the distance to the next agent (of a defined agent type) in each direction. If no agent is present, the distance can simply be -1 or null.
#### Action Negotiation
In order to avoid illicit situations (for example, agents crashing into each other), the intended actions of each agent in the observation range are known. Depending on the known movement intentions, new movement intentions must be generated by the agents. This is called a negotiation round. After a fixed number of negotiation rounds, the last intended action is executed for each agent. An illicit situation results in ending the game with a fixed low reward.
### Actions
#### Navigation
The agent can be located at any cell except case 0 cells. The agent can move along the rails to another unoccupied cell, or it can simply wait where it is currently located.
As described above, Flatland is a discrete time simulation: at each step every agent can choose an action, the attached transition is checked for validity, and the agent's position is updated only if the transition is allowed.
If the agent calls an action whose attached transition is not allowed at the current cell, the agent will not move. Waiting at the current cell is always a valid action: the waiting action is the action whose attached transition maps the current cell onto itself.
An agent can move with a definable maximum speed. The default and absolute maximum speed is one spatial unit per time step. If an agent is defined to move more slowly, it can take a navigation action only every N steps, with N being an integer; for the transition to be made, the same action must be taken N times consecutively. An agent can also have a maximum speed of 0, in which case it can never take a navigation step. This is the case when an agent represents a good to be transported, which can never move on its own.
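The fractional-speed rule can be sketched as follows; `SpeedCounter` is our illustration, not Flatland's implementation:

```python
# Illustrative only: bookkeeping for an agent with a fractional maximum speed.
class SpeedCounter:
    def __init__(self, speed: float):
        assert 0.0 <= speed <= 1.0    # speed 0 models a good that never moves alone
        self.speed = speed            # e.g. 0.25 -> one cell every 4 time steps
        self.position_fraction = 0.0  # progress within the current cell

    def step(self) -> bool:
        """Advance one time step; True means the cell transition completes now."""
        self.position_fraction += self.speed
        if self.position_fraction >= 1.0:
            self.position_fraction = 0.0
            return True
        return False
```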
An agent can be defined to be picked up/dropped off by another agent, or to pick up/drop off another agent. When agent A is picked up by agent B, we say that A is linked to B. The linked agent loses all its navigation possibilities; in exchange, it inherits the position of the linking agent for as long as it is linked. Linking and unlinking between two agents is only possible when the participating agents have the same space-time coordinates at the moment of the linking or unlinking action.
#### Transportation
In railways, the transportation of goods or passengers is essential, so agents can transport goods or passengers depending on the agent's type. If the agent is a freight train, it transports goods; if it is a passenger train, it transports passengers only. The transportation capacity of both kinds of trains is limited: passenger trains have a maximum number of seats, and freight trains have a maximum tonnage.
Passengers can board or switch trains only at stations. Passengers are agents with traveling needs. A typical passenger wants to move from a starting location to a destination, and may do so by train or on foot. Consequently, a future version of Flatland must also support passenger movement (walking) in the grid, not only travel by train. The goal of a passenger is to reach its destination in an optimal manner; the quality of the journey is measured by the reward function.
Goods are transported over the railway network only. Goods are agents with transportation needs. They can start their transportation chain at any station, and each good has a station attached as its destination, which marks the end of the transportation: once a good reaches its destination, it disappears (i.e. it leaves Flatland). Goods cannot move independently on the grid; they can only move by train, and they can switch trains at any station. The goal of the system is to find the right trains for the goods so as to obtain a feasible transportation chain, whose quality is measured by the reward function.
### Environment Rules
* Depending on the cell type, a cell must have a given number of neighbouring cells of a given type.
* There must not exist a state in which the occupation capacity of a cell is violated.
* An agent can move by at most one cell per time step.
* Agents related to each other through transport (one carries another) must be at the same place at the same time.
### Environment Configuration
The environment should allow for a broad class of problem instances. Thus the configuration file for each problem instance should contain:
* Cell types allowed
* Agent types allowed
* Objects allowed
* Level generator to use
* Episodic or non-episodic task
* Duration
* Reward function
* Observation types allowed
* Actions allowed
* Dimensions of Environment?
For the train traffic the configurations should be as follows:
Cell types: Case 0 - 7
Agent Types allowed: Active Agents with Speed 1 and no goals, Passive agents with goals
Object allowed: None
Level Generator to use: ?
Reward function: as described below
Observation Type: Local, Targets known
It should be checked, prior to solving the problem, that the goal location of each agent can be reached.
### Reward Function
#### Railway-specific Use-Cases
A first idea for a generically applicable cost function is as follows. For each agent and each goal, sum up:
* the time step at which the goal was reached, when no target time is given in the goal;
* the absolute value of the difference between the target time and the arrival time of the agent.
An additional refinement that has proven meaningful in situations where no target time is given is to weight the longest arrival time more heavily than the sum of all arrival times.
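As a sketch, this cost could be computed as follows (`episode_cost` and its parameters are hypothetical names for illustration):

```python
def episode_cost(arrival_times, target_times=None, makespan_weight=0.0):
    """Sum of per-agent costs, optionally weighting the latest arrival higher."""
    if target_times is None:
        per_agent = list(arrival_times)  # no target time given: use arrival times
    else:
        per_agent = [abs(t - a) for a, t in zip(arrival_times, target_times)]
    return sum(per_agent) + makespan_weight * max(per_agent)
```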
#### Further Examples (Games)
### Initialization
Given that we want a generalizable agent to solve the problem, training must be performed on a diverse training set. We therefore need a level generator which can create novel tasks to be solved in a reliable and fast fashion.
#### Level Generator
Each problem instance can have its own level generator.
The inputs to the level generator should be:
* Spatial and temporal dimensions of environment
* Reward type
* Over all task
* Collaboration or competition
* Number of agents
* Further level parameters
* Environment complexity
* Stochasticity and error
* Random or pre designed environment
The output of the level generator should be:
* Feasible environment
* Observation setup for the required number of agents
* Initial rewards, positions and observations
### Railway Use Cases
In this section we define a few simple tasks related to railway traffic that we believe would be well suited for a crowdsourcing challenge. The tasks are ordered according to their complexity. The Flatland repo must at least support all these types of use cases.
#### Simple Navigation
In order to onboard the broad reinforcement learning community this task is intended as an introduction to the Railway@Flatland environment.
##### Task
A single agent is placed at an arbitrary (permitted) cell and is given a target cell (reachable by the rules of Flatland). The task is to arrive at the target destination in as few time steps as possible.
##### Actions
In this task an agent can perform transitions (max. 3 possibilities) or stop. Therefore, the agent can choose an action in the range $`a \in [0,4]`$.
##### Reward
The reward is -1 for each time step and 10 if the agent stops at the destination. We might add -1 for invalid moves to speed up exploration and learning.
##### Observation
If we choose a local observation scheme, we need to provide the agent with some information about the distance to its target. This could be achieved by a distance map, by using waypoints, or by providing the agent with a broad sense of direction.
#### Multi Agent Navigation and Dispatching
This task is intended as a natural extension of the navigation task.
##### Task
A number of agents ($`n`$ agents) are placed at arbitrary (permitted) cells and given individual target cells (reachable by the rules of Flatland). The task is for the group to arrive at the target destinations in as few time steps as possible, i.e. the goal is to minimize the longest path over *ALL* agents.
##### Actions
In this task an agent can perform transitions (max. 3 possibilities) or stop. Therefore, the agent can choose an action in the range $`a \in [0,4]`$.
##### Reward
The reward is -1 for each time step and 10 if all the agents stop at the destination. We can further punish collisions between agents and illegal moves to speed up learning.
##### Observation
If we choose a local observation scheme, we need to provide the agent with some information about the distance to its target. This could be achieved by a distance map or by using waypoints.
The agents must see each other in their tree searches.
##### Previous learnings
Training an agent by itself first, to understand the main task, turned out to be beneficial.
It might be necessary to add the "intended" path of each agent to the observation in order to obtain intelligent multi-agent behavior.
A communication layer might be necessary to improve agent performance.
#### Multi Agent Navigation and Dispatching with Schedule
#### Transport Chains (Transportation of goods and passengers)
### Benefits of Transition Model
Using a grid world with 8 transition possibilities to the neighboring cells constitutes a very flexible environment, which can model many different types of problems.
Considering the recent advancements in machine learning, this approach also allows the use of convolutions to process the observation states of agents. For the specific case of railway simulation, the grid world unfortunately also brings a few drawbacks.
Most notably, the railway network only offers action possibilities at elements with more than two transition possibilities. Thus, using a less dense graph than a grid, the railway network could be represented by a simpler graph. However, we believe that moving from a grid-like example where many transitions are allowed towards the railway network with fewer transitions is the simplest approach for the broad reinforcement learning community.
## Rail Generators and Schedule Generators
The separation between rail generator and schedule generator reflects the organisational separation in the railway domain:
- Infrastructure Manager (IM): is responsible for the layout and maintenance of tracks
- Railway Undertaking (RU): operates trains on the infrastructure
Usually, there is a third organisation, which ensures discrimination-free access to the infrastructure for concurrent requests for the infrastructure in a **schedule planning phase**.
However, in the **Flat**land challenge, we focus on the re-scheduling problem during live operations.
Technically,
```python
from typing import Any, Callable, Mapping, Optional, Tuple
import collections

# Import path may vary across flatland versions:
from flatland.core.transition_map import GridTransitionMap

RailGeneratorProduct = Tuple[GridTransitionMap, Optional[Any]]
RailGenerator = Callable[[int, int, int, int], RailGeneratorProduct]

AgentPosition = Tuple[int, int]
Schedule = collections.namedtuple('Schedule', 'agent_positions '
                                              'agent_directions '
                                              'agent_targets '
                                              'agent_speeds '
                                              'agent_malfunction_rates '
                                              'max_episode_steps')
ScheduleGenerator = Callable[[GridTransitionMap, int, Optional[Any], Optional[int]], Schedule]
```
We can then produce `RailGenerator`s by currying:
```python
def sparse_rail_generator(num_cities=5, num_intersections=4, num_trainstations=2, min_node_dist=20, node_radius=2,
                          num_neighb=3, grid_mode=False, enhance_intersection=False, seed=1):
    def generator(width, height, num_agents, num_resets=0):
        # generate the grid and (optionally) some hints for the schedule_generator
        ...

        return grid_map, {'agents_hints': {
            'num_agents': num_agents,
            'agent_start_targets_nodes': agent_start_targets_nodes,
            'train_stations': train_stations
        }}

    return generator
```
And, similarly, `ScheduleGenerator`s:
```python
def sparse_schedule_generator(speed_ratio_map: Mapping[float, float] = None) -> ScheduleGenerator:
    def generator(rail: GridTransitionMap, num_agents: int, hints: Any = None):
        # place agents:
        # - initial position
        # - initial direction
        # - (initial) speed
        # - malfunction
        ...

        return agents_position, agents_direction, agents_target, speeds, agents_malfunction

    return generator
```
Notice that the `rail_generator` may pass `agents_hints` to the `schedule_generator` which the latter may interpret.
For instance, the way the `sparse_rail_generator` generates the grid already determines each agent's start and target.
Hence, `rail_generator` and `schedule_generator` have to match if `schedule_generator` presupposes some specific `agents_hints`.
The environment's `reset` takes care of applying the two generators:
```python
def __init__(self,
             ...
             rail_generator: RailGenerator = random_rail_generator(),
             schedule_generator: ScheduleGenerator = random_schedule_generator(),
             ...
             ):
    self.rail_generator: RailGenerator = rail_generator
    self.schedule_generator: ScheduleGenerator = schedule_generator

def reset(self, regenerate_rail=True, regenerate_schedule=True):
    rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets)

    ...

    if regenerate_schedule:
        agents_hints = None
        if optionals and 'agents_hints' in optionals:
            agents_hints = optionals['agents_hints']

        self.agents_static = EnvAgentStatic.from_lists(
            self.schedule_generator(self.rail, self.get_num_agents(), hints=agents_hints))
```
### RailEnv Speeds
One of the main contributions to the complexity of railway network operations stems from the fact that all trains travel at different speeds while sharing a very limited railway network.
The different speed profiles can be generated using the `schedule_generator`, where you can choose as many different speeds as you like.
Keep in mind that the *fastest speed* is 1, and all slower speeds must lie between 0 and 1.
For the submission scoring you can assume that there will be no more than 5 speed profiles.
Currently (as of **Flat**land 2.0), an agent keeps its speed over the whole episode.
Because the different speeds are implemented as fractions, the agents' ability to perform actions has been updated: we **do not allow actions to change within a cell**.
This means that each agent can only choose an action when entering a cell (i.e. when its positional fraction is 0).
There are real railway-specific considerations, such as reserved blocks, that are similar to this behavior.
But more importantly, we disabled mid-cell actions to simplify the use of machine learning algorithms with the environment: if we allowed stop actions in the middle of cells, the controller would need to make many more observations, not only at cell changes.
(This is not set in stone and could be updated if the need arises.)
The chosen action is then executed when a step to the next cell is valid. For example:
- The agent enters a switch and chooses to deviate left. The agent's fractional speed is 1/4, thus the agent will take 4 time steps to complete its journey through the cell. On the 4th time step, the agent will leave the cell deviating left, as chosen on entering the cell.
- All actions chosen by the agent during its travel within a cell are ignored.
- Agents can make observations at any time step. Make sure to discard observations without any information. See this [example](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/training_navigation.py) for a simple implementation.
- The environment checks whether an agent is allowed to move to the next cell only at the time of the switch to the next cell.
In your controller, you can check whether an agent requires an action by checking `info`:
```python
obs, rew, done, info = env.step(actions)
...
action_dict = dict()
for a in range(env.get_num_agents()):
    if info['action_required'][a]:
        action_dict.update({a: ...})
```
Notice the following about `info['action_required'][a]`:
* If the agent breaks down (see stochasticity below) on entering the cell (no distance elapsed in the cell), an action is required as long as the agent is broken down;
when it gets back to work, the action chosen just before will be taken and executed at the end of the cell; you may check whether the agent
gets healthy again in the next step by checking `info['malfunction'][a] == 1`.
* When the agent has spent enough time in the cell, the next cell may not be free and the agent has to wait.
Since later versions of **Flat**land might have varying speeds during episodes, we return the agents' speed. In your controller, you can get an agent's speed from the `info` returned by `step`:
```python
obs, rew, done, info = env.step(actions)
...
for a in range(env.get_num_agents()):
    speed = info['speed'][a]
```
Notice that we do not guarantee that the speed will be computed at each step; but if it is not costly, we will return it at each step.
### RailEnv Malfunctioning / Stochasticity
Stochastic events may happen during the episodes.
This is very common for railway networks, where the initial plan usually needs to be rescheduled during operations, as minor events such as delayed departures from train stations, malfunctions of trains or infrastructure, or simply the weather lead to delayed trains.
We implemented a Poisson process to simulate delays by stopping agents at random times for random durations. The parameters necessary for the stochastic events can be provided when creating the environment.
```python
# Use the malfunction generator to break agents from time to time
stochastic_data = {
    'prop_malfunction': 0.5,  # Percentage of defective agents
    'malfunction_rate': 30,   # Rate of malfunction occurrence
    'min_duration': 3,        # Minimal duration of malfunction
    'max_duration': 10        # Max duration of malfunction
}
```
The parameters are as follows:
- `prop_malfunction` is the proportion of agents that can malfunction; `1.0` means that every agent can break.
- `malfunction_rate` is the mean rate of the Poisson process, in number of environment steps.
- `min_duration` and `max_duration` set the range of malfunction durations; durations are sampled uniformly from this range.
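For illustration, one step of such a process could be sampled as follows (a sketch consistent with the parameter names above, not Flatland's internal code):

```python
import random

def maybe_break(rng: random.Random, malfunction_rate: float,
                min_duration: int, max_duration: int) -> int:
    """Return a malfunction duration in steps (0 = no malfunction this step)."""
    if rng.random() < 1.0 / malfunction_rate:           # mean rate in env steps
        return rng.randint(min_duration, max_duration)  # uniform duration
    return 0
```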
You can introduce stochasticity by simply creating the env as follows:
```python
env = RailEnv(
...
stochastic_data=stochastic_data, # Malfunction data generator
...
)
env.reset()
```
In your controller, you can check whether an agent is malfunctioning:
```python
obs, rew, done, info = env.step(actions)
...
action_dict = dict()
for a in range(env.get_num_agents()):
    if info['malfunction'][a] == 0:
        action_dict.update({a: ...})
# Custom observation builder
tree_observation = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())

# Different agent types (trains) with different speeds.
speed_ratio_map = {1.: 0.25,       # Fast passenger train
                   1. / 2.: 0.25,  # Fast freight train
                   1. / 3.: 0.25,  # Slow commuter train
                   1. / 4.: 0.25}  # Slow freight train

env = RailEnv(width=50,
              height=50,
              rail_generator=sparse_rail_generator(num_cities=20,  # Number of cities in map (where train stations are)
                                                   num_intersections=5,  # Number of intersections (no start / target)
                                                   num_trainstations=15,  # Number of possible start/targets on map
                                                   min_node_dist=3,  # Minimal distance of nodes
                                                   node_radius=2,  # Proximity of stations to city center
                                                   num_neighb=4,  # Number of connections to other cities/intersections
                                                   seed=15,  # Random seed
                                                   grid_mode=True,
                                                   enhance_intersection=True
                                                   ),
              schedule_generator=sparse_schedule_generator(speed_ratio_map),
              number_of_agents=10,
              stochastic_data=stochastic_data,  # Malfunction data generator
              obs_builder_object=tree_observation)
env.reset()
```
### Observation Builders
Every `RailEnv` has an `obs_builder`. The `obs_builder` has full access to the `RailEnv`.
The `obs_builder` is called in the `step()` function to produce the observations.
```python
env = RailEnv(
...
obs_builder_object=TreeObsForRailEnv(
max_depth=2,
predictor=ShortestPathPredictorForRailEnv(max_depth=10)
),
...
)
env.reset()
```
The two principal observation builders provided are global and tree.
#### Global Observation Builder
`GlobalObsForRailEnv` gives a global observation of the entire rail environment.
* transition map array with dimensions (env.height, env.width, 16), assuming 16 bits encoding of transitions.
* Two 2D arrays (map_height, map_width, 2) containing respectively the position of the given agent's target and the positions of the other agents' targets.
* A 3D array (map_height, map_width, 4) with
    - the first channel containing the agent's position and direction
    - the second channel containing the other agents' positions and directions
    - the third channel containing agent malfunctions
    - the fourth channel containing agent fractional speeds
#### Tree Observation Builder
`TreeObsForRailEnv` computes the current observation for each agent.
The observation vector is composed of 4 sequential parts, corresponding to data from the up to 4 possible
movements in a `RailEnv` ("up to" because only a subset of the possible transitions is allowed in RailEnv).
The possible movements are sorted relative to the current orientation of the agent, rather than NESW as for
the transitions. The order is:
```console
[data from 'left'] + [data from 'forward'] + [data from 'right'] + [data from 'back']
```
Each branch data is organized as:
```console
[root node information] +
[recursive branch data from 'left'] +
[... from 'forward'] +
[... from 'right'] +
[... from 'back']
```
Each node's information is composed of 11 features:
1. if own target lies on the explored branch, the current distance from the agent in number of cells is stored.
2. if another agent's target is detected, the distance in number of cells from the agent's current location
is stored.
3. if another agent is detected, the distance in number of cells from the current agent position is stored.
4. possible conflict detected:
```console
tot_dist = another agent is predicted to pass through this cell at the same time as the observing agent; the distance in number of cells from the current agent position is stored
0 = no other agent reserves the same cell at a similar time
```
5. if an unusable switch (for the agent) is detected, we store the distance.
6. this feature stores the distance in number of cells to the next branching point (current node).
7. minimum distance from the node to the agent's target, given the direction of the agent, if this path is chosen.
8. agent in the same direction:
```console
n = number of agents present in the same direction
    (possible future use: number of other agents in the same direction in this branch)
0 = no agent present in the same direction
```
9. agent in the opposite direction:
```console
n = number of agents present in the opposite direction to the observing agent (i.e. potential conflict)
    (possible future use: number of other agents in the opposite direction in this branch, i.e. number of conflicts)
0 = no agent present in the opposite direction to the observing agent
```
```
10. malfunctioning/blocking agents:
```console
n = number of time steps the observed agent remains blocked
```
11. slowest observed fractional speed of an agent in the same direction:
```console
1 if no agent is observed
min_fractional_speed otherwise
```
Missing/padding nodes are filled in with -inf (truncated).
Missing values in a present node are filled in with +inf (truncated).
In the case of the root node, the values are [0, 0, 0, 0, distance from agent to target, own malfunction, own speed].
In case the target node is reached, the values are [0, 0, 0, 0, 0].
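Before feeding the flattened observation to a neural network, the +/-inf padding described above has to be truncated. A minimal sketch (the names are ours):

```python
import numpy as np

def preprocess_tree_obs(flat_obs, clip=100.0):
    """Replace the +/-inf padding with a finite clip value and rescale."""
    arr = np.asarray(flat_obs, dtype=float)
    arr = np.nan_to_num(arr, posinf=clip, neginf=-clip)
    return np.clip(arr, -clip, clip) / clip
```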
### Predictors
Predictors make predictions on future agents' moves based on the current state of the environment.
They are decoupled from observation builders in order to encapsulate the functionality and make it reusable.
For instance, `TreeObsForRailEnv` optionally uses the predicted trajectories while exploring
the branches of an agent's future moves to detect future conflicts.
The general call structure is as follows:
```python
RailEnv.step()
-> ObservationBuilder.get_many()
-> self.predictor.get()
self.get()
self.get()
...
```
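A skeleton consistent with this call structure might look as follows; the exact flatland `PredictionBuilder` interface may differ, so treat this purely as a sketch:

```python
class StandStillPredictor:
    """Toy predictor: assumes every agent stays on its current cell."""

    def __init__(self, max_depth: int = 10):
        self.max_depth = max_depth
        self.env = None

    def set_env(self, env):
        self.env = env

    def get(self, handle=None):
        # One predicted position per future time step, per agent handle.
        return {agent.handle: [agent.position] * self.max_depth
                for agent in self.env.agents}
```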
### Maximum number of allowed time steps in an episode
Whenever the schedule within RailEnv is generated, the maximum number of allowed time steps in an episode is calculated
according to the following formula:
```python
RailEnv._max_episode_steps = timedelay_factor * alpha * (env.width + env.height + ratio_nr_agents_to_nr_cities)
```
where the following default values are used: `timedelay_factor=4`, `alpha=2` and `ratio_nr_agents_to_nr_cities=20`.
If participants want to use their own formula, they have to override the method `compute_max_episode_steps()` of the class `RailEnv`.
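For example, plugging the defaults into the formula for a 30x30 grid:

```python
timedelay_factor, alpha, ratio_nr_agents_to_nr_cities = 4, 2, 20
max_episode_steps = timedelay_factor * alpha * (30 + 30 + ratio_nr_agents_to_nr_cities)
assert max_episode_steps == 640
```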
## Rendering Specifications
### Scope
This doc specifies the software to meet the requirements in the Visualization requirements doc.
### References
- [Visualization Requirements](visualization)
- [Core Spec](./core)
### Interfaces
#### Interface with Environment Component
- Environment produces the Env Snapshot data structure (TBD)
- Renderer reads the Env Snapshot
- Connection between Env and Renderer, either:
- Environment “invokes” the renderer in-process
- Renderer “connects” to the environment
- Eg Env acts as a server, Renderer as a client
- Either
- The Env sends a Snapshot to the renderer and waits for rendering
- Or:
- The Env puts snapshots into a rendering queue
- The renderer blocks / waits on the queue, waiting for a new snapshot to arrive
- If several snapshots are waiting, delete and skip them and just render the most recent
- Delete the snapshot after rendering
- Optionally
- Render every frame / time step
- Or, render frames without blocking environment
- Render frames in separate process / thread
##### Environment Snapshot
#### Data Structure
A definition of the data structure is to be provided in the Core requirements or Interfaces doc.
##### Example only
Top-level dictionary
- World nd-array
- Each element represents available transitions in a cell
- List of agents
- Agent location, orientation, movement (forward / stop / turn?)
- Observation
- Rectangular observation
- Maybe just dimensions - width + height (ie no need for contents)
- Can be highlighted in display as per minigrid
- Tree-based observation
- TBD
#### Existing Tools / Libraries
1. Pygame
    1. Very easy to use; dead simple to add sprites etc. [Link](https://studywolf.wordpress.com/2015/03/06/arm-visualization-with-pygame/)
    2. No inbuilt support for threads/processes. Does get faster when using pypy/psyco.
2. PyQt
    1. Somewhat simple, a little more verbose to use the different modules.
    2. Multi-threaded via QThread! Yay! (Doesn't block the main thread that does the real work.) [Link](https://nikolak.com/pyqt-threading-tutorial/)
##### How to structure the code
1. Define draw functions/classes for each primitive
    1. Primitives: Agents (Trains), Railroad, Grass, Houses etc.
2. Background. Initialize the background before starting the episode.
    1. Static objects in the scene: directly draw those primitives once and cache them.
##### Proposed Interfaces
To-be-filled
#### Technical Graphics Considerations
##### Overlay dynamic primitives over the background at each time step.
No point trying to figure out incremental changes; every primitive needs to be explicitly drawn anyway (that's how these renderers work).
## Visualization
![logo](https://drive.google.com/uc?export=view&id=1rstqMPJXFJd9iD46z1A5Rus-W0Ww6O8i)
### Introduction & Scope
Broad requirements for human-viewable display of a single Flatland Environment.
#### Context
Shows this software component in relation to some of the other components. We name the component the "Renderer". Multiple agents interact with a single Environment. A renderer interacts with the environment, and displays on screen, and/or into movie or image files.
### Requirements
#### Primary Requirements
1. Visualize or render the state of the environment
    1. Read an Environment + Agent Snapshot provided by the Environment component
    2. Display onto a local screen in real-time (or near real-time)
    3. Include all the agents
    4. Illustrate the agent observations (typically subsets of the grid / world)
    5. 2d-rendering only
2. Output visualisation into movie / image files for use in later animation
3. Should not impose control-flow constraints on the Environment
    1. Should not force the env to respond to events
    2. Should not drive the "main loop" of inference or training
#### Secondary / Optional Requirements
1. During training (possibly across multiple processes or machines / OS instances), display a single training environment,
    1. without holding up the other environments in the training.
    2. Some training environments may be remote to the display machine (eg using GCP / AWS).
    3. Attach to / detach from a running environment / training cluster without restarting training.
2. Provide a switch to make use of graphics / artwork provided by a graphic artist
    1. Fast / compact mode for general use
    2. Beauty mode for publicity / demonstrations
3. Provide a switch between smooth / continuous animation of an agent (slower) vs jumping from cell to cell (faster)
    1. Smooth / continuous translation between cells
    2. Smooth / continuous rotation
4. Speed - ideally capable of 60fps (see performance metrics)
5. Window view - only render part of the environment, or a single agent and agents nearby.
    1. May not be feasible to render very large environments
    2. Possibly more than one window, ie one for each selected agent
    3. Window(s) can be tied to agents, ie they move around with the agent, and optionally rotate with the agent.
6. Interactive scaling
    1. eg wide view, narrow / enlarged view
    2. eg with mouse scrolling & zooming
7. Minimize the necessary skill-set for participants
    1. Python API to a GUI toolkit, no need for C/C++
8. View on various media:
    1. Linux & Windows local display
    2. Browser
#### Performance Metrics
Here are some performance metrics which the Renderer should meet.
| | Per second | Target Time (ms) | Prototype time (ms) |
| --- | --- | --- | --- |
| Write an agent update (ie env as client providing an agent update) | | 0.1 | |
| Draw an environment window 20x20 | 60 | 16 | |
| Draw an environment window 50 x 50 | 10 | | |
| Draw an agent update on an existing environment window, 5 agents visible | | 1 | |
#### Example Visualization
### Reference Documents
Link to this doc: https://docs.google.com/document/d/1Y4Mw0Q6r8PEOvuOZMbxQX-pV2QKDuwbZJBvn18mo9UU/edit#
#### Core Specification
This specifies the system containing the environment and agents - this will be able to run independently of the renderer.
[https://docs.google.com/document/d/1RN162b8wSfYTBblrdE6-Wi_zSgQTvVm6ZYghWWKn5t8/edit](https://docs.google.com/document/d/1RN162b8wSfYTBblrdE6-Wi_zSgQTvVm6ZYghWWKn5t8/edit)
The data structure which the renderer needs to read initially resides here.
#### Visualization Specification
This will specify the software which will meet the requirements documented here.
[https://docs.google.com/document/d/1XYOe_aUIpl1h_RdHnreACvevwNHAZWT0XHDL0HsfzRY/edit#](https://docs.google.com/document/d/1XYOe_aUIpl1h_RdHnreACvevwNHAZWT0XHDL0HsfzRY/edit#)
#### Interface Specification
This will specify the interfaces through which the different components communicate
Getting Started Tutorial
========================

Overview
--------
Following are three short tutorials to help new users get acquainted with how
to create RailEnvs, how to train simple DQN agents on them, and how to customize
them.
To use flatland in a project:
.. code-block:: python

    import flatland
Simple Example 1 : Basic Usage
------------------------------
The basic usage of RailEnv environments consists in creating a RailEnv object
endowed with a rail generator, that generates new rail networks on each reset,
and an observation generator object, that is supplied with environment-specific
information at each time step and provides a suitable observation vector to the
agents. After the RailEnv environment is created, one needs to call reset() on the
environment in order to fully initialize it.
The simplest rail generators are envs.rail_generators.rail_from_manual_specifications_generator
and envs.rail_generators.random_rail_generator.
The first one accepts a list of lists in which each element is a 2-tuple, whose
entries represent the 'cell_type' (see core.transitions.RailEnvTransitions) and
the desired clockwise rotation of the cell contents (0, 90, 180 or 270 degrees).
For example,
.. code-block:: python

    # `specs` is the list of lists of (cell_type, rotation) tuples described above;
    # the environment dimensions must match its shape.
    env = RailEnv(width=len(specs[0]),
                  height=len(specs),
                  rail_generator=rail_from_manual_specifications_generator(specs),
                  number_of_agents=1,
                  obs_builder_object=TreeObsForRailEnv(max_depth=2))
    env.reset()
Alternatively, a random environment can be generated (optionally specifying
weights for each cell type to increase or decrease their proportion in the
generated rail networks).
.. code-block:: python

    # (cell-type probabilities for Cases 0-7 elided in this excerpt)
    transition_probability = [...,
                              0.2,  # Case 8 - turn left
                              0.2,  # Case 9 - turn right
                              1.0]  # Case 10 - mirrored switch

    # Example generate a random rail
    env = RailEnv(width=10,
                  height=10,
                  rail_generator=random_rail_generator(
                      cell_type_relative_proportion=transition_probability
                  ),
                  number_of_agents=3,
                  obs_builder_object=TreeObsForRailEnv(max_depth=2))
    env.reset()
Environments can be rendered using the utils.rendertools utilities, for example:
.. code-block:: python
    env_renderer = RenderTool(env)
    env_renderer.render_env(show=True)
Finally, the environment can be run by supplying the environment step function
with a dictionary of actions whose keys are agents' handles (returned by
env.get_agent_handles() ) and the corresponding values the selected actions.
For example, for a 2-agents environment:
.. code-block:: python

    handles = env.get_agent_handles()
    action_dict = {handles[0]: 0, handles[1]: 0}
    obs, all_rewards, done, _ = env.step(action_dict)
where 'obs', 'all_rewards', and 'done' are also dictionaries indexed by the agents'
handles, whose values correspond to the relevant observations, rewards and terminal
status for each agent. Further, the 'done' dictionary returns an extra key
'__all__' that is set to True after all agents have reached their goals.
In the specific case that a TreeObsForRailEnv observation builder is used, it is
possible to print a representation of the returned observations with the
following code. Also, tree observation data is displayed by RenderTool by default.
.. code-block:: python
    for i in range(env.get_num_agents()):
        env.obs_builder.util_print_obs_subtree(tree=obs[i])
The complete code for this part of the Getting Started guide can be found in:

* `examples/simple_example_1.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_1.py>`_
* `examples/simple_example_2.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_2.py>`_
Part 2 : Training a Simple Agent on Flatland
--------------------------------------------
This is a brief tutorial on how to train an agent on Flatland.
Here we use a simple random agent to illustrate the process of how to interact with the environment.
The corresponding code can be found in examples/training_example.py and in the baselines repository
you find a tutorial to train a `DQN <https://arxiv.org/abs/1312.5602>`_ agent to solve the navigation task.
We start by importing the necessary Flatland libraries
.. code-block:: python
    from flatland.envs.rail_generators import complex_rail_generator
    from flatland.envs.schedule_generators import complex_schedule_generator
    from flatland.envs.rail_env import RailEnv
The complex_rail_generator is used in order to guarantee feasible railway network configurations for training.
Next we configure the difficulty of our task by modifying the complex_rail_generator parameters.
.. code-block:: python
    env = RailEnv(width=15,
                  height=15,
                  rail_generator=complex_rail_generator(
                      nr_start_goal=10,
                      nr_extra=10,
                      min_dist=10,
                      max_dist=99999,
                      seed=1),
                  number_of_agents=5)
    env.reset()
The difficulty of a railway network depends on the dimensions (`width` x `height`) and the number of agents in the network.
By varying the number of start and goal connections (nr_start_goal) and the number of extra railway elements added (nr_extra)
the number of alternative paths of each agent can be modified. The more possible paths an agent has to reach its target, the easier the task becomes.
Here we don't specify any observation builder but rather use the standard tree observation. If you would like to use a custom observation please follow
the instructions in the next tutorial.
Feel free to vary these parameters to see how your own agent holds up on different settings. The evaluation set of railway configurations will
cover the whole spectrum from easy to complex tasks.
Once we are set with the environment we can load our preferred agent from either RLlib or any other resource. Here we use a random agent to illustrate the code.
.. code-block:: python
    agent = RandomAgent(state_size, action_size)
We start every trial by resetting the environment
.. code-block:: python
    obs, info = env.reset()
This provides the initial observation for all agents (obs = dictionary of all agents' observations).
In order for the environment to step forward in time we need a dictionary of actions for all active agents.
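A minimal sketch of how such an action dictionary can be built with the random agent from above (the agent.act(observation) interface is an assumption mirroring examples/training_example.py):

.. code-block:: python

    # One action per agent, keyed by the agent's handle
    action_dict = dict()
    for a in range(env.get_num_agents()):
        action = agent.act(obs[a])
        action_dict.update({a: action})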
This dictionary is then passed to the environment which checks the validity of all actions and moves the agents accordingly.
.. code-block:: python
next_obs, all_rewards, done, _ = env.step(action_dict)
The environment returns a dictionary of new observations, a reward dictionary for all agents as well as flags indicating which agents are done.
This information can be used to update the policy of your agent and if done['__all__'] == True the episode terminates.
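Putting the pieces together, a condensed sketch of the whole interaction loop (the agent.act and agent.step hooks are assumptions modelled on the random agent in examples/training_example.py):

.. code-block:: python

    obs, info = env.reset()
    for step in range(500):
        # Choose an action for every agent and advance the environment
        action_dict = {a: agent.act(obs[a]) for a in range(env.get_num_agents())}
        next_obs, all_rewards, done, _ = env.step(action_dict)
        # Hand the transition back to the (learning) agent
        for a in range(env.get_num_agents()):
            agent.step((obs[a], action_dict[a], all_rewards[a], next_obs[a], done[a]))
        obs = next_obs
        if done['__all__']:
            break

The full source code of this example can be found in `examples/training_example.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/training_example.py>`_.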
Part 3 : Customizing Observations and Level Generators
-------------------------------------------------------
Example code for generating custom observations given a RailEnv and for generating
random rail maps is available in examples/custom_observation_example.py and
examples/custom_railmap_example.py.
Custom observations can be produced by deriving a new object from the
core.env_observation_builder.ObservationBuilder base class, for example as follows:
.. code-block:: python
class CustomObs(ObservationBuilder):
def __init__(self):
self.observation_space = [5]
def reset(self):
return
def get(self, handle):
observation = handle*np.ones((5,))
return observation
It is important that an observation_space is defined with a list of dimensions
of the returned observation tensors. get() returns the observation for the
agent with handle 'handle'.
A RailEnv environment can then be created as usual:
.. code-block:: python
env = RailEnv(width=7,
height=7,
rail_generator=random_rail_generator(),
number_of_agents=3,
obs_builder_object=CustomObs())
As for generating custom rail maps, the RailEnv class accepts a rail_generator
argument that must be a function with arguments 'width', 'height', 'num_agents',
and 'num_resets=0', and that has to return a GridTransitionMap object (the rail map),
and three lists of tuples containing the (row,column) coordinates of each of
num_agent agents, their initial orientation (0=North, 1=East, 2=South, 3=West),
and the position of their targets.
For example, the following custom rail map generator returns an empty map of
size (height, width), with no agents (regardless of num_agents):
.. code-block:: python
    from flatland.core.transitions import RailEnvTransitions
    from flatland.core.transition_map import GridTransitionMap

    def custom_rail_generator():
        def generator(width, height, num_agents=0, num_resets=0):
            rail_trans = RailEnvTransitions()
            grid_map = GridTransitionMap(width=width, height=height, transitions=rail_trans)
            rail_array = grid_map.grid
            rail_array.fill(0)
            agents_positions = []
            agents_direction = []
            agents_target = []
            return grid_map, agents_positions, agents_direction, agents_target
        return generator
It is worth noting that helpful utilities to manage RailEnv environments and their
related data structures are available in 'envs.env_utils'. In particular,
envs.env_utils.get_rnd_agents_pos_tgt_dir_on_rail is fairly handy to fill in
random (but consistent) agents along with their targets and initial directions,
given a rail map (GridTransitionMap object) and the desired number of agents:
.. code-block:: python
agents_position, agents_direction, agents_target = get_rnd_agents_pos_tgt_dir_on_rail(
rail_map,
num_agents)
Custom observations and custom predictors Tutorial
==================================================
Overview
--------
One of the main objectives of the Flatland-Challenge_ is to find a suitable observation (relevant features for the problem at hand) to solve the task. Therefore **Flatland** was built with as much flexibility as possible when it comes to building your custom observations: observations in Flatland environments are fully customizable.
Whenever an environment needs to compute new observations for each agent, it queries an object derived from the :code:`ObservationBuilder` base class, which takes the current state of the environment and returns the desired observation.
.. _Flatland-Challenge: https://www.aicrowd.com/challenges/flatland-challenge
Example 1 : Simple (but useless) observation
--------------------------------------------
In this first example we implement all the functions necessary for the observation builder to be valid and work with **Flatland**.
Custom observation builder objects need to derive from the `flatland.core.env_observation_builder.ObservationBuilder`_
base class and must implement two methods, :code:`reset(self)` and :code:`get(self, handle)`.
.. _`flatland.core.env_observation_builder.ObservationBuilder` : https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/core/env_observation_builder.py#L13
Below is a simple example that returns observation vectors of size 5 featuring only the ID (handle) of the agent whose
observation vector is being computed:
.. code-block:: python
class SimpleObs(ObservationBuilder):
"""
Simplest observation builder. The object returns observation vectors with 5 identical components,
all equal to the ID of the respective agent.
"""
def reset(self):
return
def get(self, handle):
observation = handle * np.ones(5)
return observation
We can pass an instance of our custom observation builder :code:`SimpleObs` to the :code:`RailEnv` creator as follows:
.. code-block:: python
env = RailEnv(width=7,
height=7,
rail_generator=random_rail_generator(),
number_of_agents=3,
obs_builder_object=SimpleObs())
env.reset()
Anytime :code:`env.reset()` or :code:`env.step()` is called, the observation builder will return the custom observation of all agents initialized in the env.
In the next example we highlight how to derive from existing observation builders and how to access internal variables of **Flatland**.
Example 2 : Single-agent navigation
-------------------------------------
Observation builder objects can of course derive from existing concrete subclasses of ObservationBuilder.
For example, it may be useful to extend the TreeObsForRailEnv_ observation builder.
A feature of this class is that on :code:`reset()`, it pre-computes the lengths of the shortest paths from all
cells and orientations to the target of each agent, i.e. a distance map for each agent.
In this example we exploit these distance maps by implementing an observation builder that shows the current shortest path for each agent as a one-hot observation vector of length 3, whose components represent the possible directions an agent can take (LEFT, FORWARD, RIGHT). All values of the observation vector are set to :code:`0` except for the shortest direction where it is set to :code:`1`.
Using this observation with highly engineered features indicating the agent's shortest path, an agent can then learn to take the corresponding action at each time-step; or we could even hardcode the optimal policy.
Note that this simple strategy fails when multiple agents are present, as each agent would only attempt its greedy solution, which is not usually `Pareto-optimal <https://en.wikipedia.org/wiki/Pareto_efficiency>`_ in this context.
.. _TreeObsForRailEnv: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/observations.py#L14
.. code-block:: python
from flatland.envs.observations import TreeObsForRailEnv
class SingleAgentNavigationObs(TreeObsForRailEnv):
"""
We derive our observation builder from TreeObsForRailEnv, to exploit the existing implementation to compute
the minimum distances from each grid node to each agent's target.
We then build a representation vector with 3 binary components, indicating which of the 3 available directions
for each agent (Left, Forward, Right) lead to the shortest path to its target.
E.g., if taking the Left branch (if available) is the shortest route to the agent's target, the observation vector
will be [1, 0, 0].
"""
def __init__(self):
super().__init__(max_depth=0)
            # We set max_depth=0 because we only need to look at the current
# position of the agent to decide what direction is shortest.
def reset(self):
# Recompute the distance map, if the environment has changed.
super().reset()
def get(self, handle):
# Here we access agent information from the environment.
# Information from the environment can be accessed but not changed!
agent = self.env.agents[handle]
possible_transitions = self.env.rail.get_transitions(*agent.position, agent.direction)
num_transitions = np.count_nonzero(possible_transitions)
# Start from the current orientation, and see which transitions are available;
# organize them as [left, forward, right], relative to the current orientation
# If only one transition is possible, the forward branch is aligned with it.
if num_transitions == 1:
observation = [0, 1, 0]
else:
min_distances = []
for direction in [(agent.direction + i) % 4 for i in range(-1, 2)]:
if possible_transitions[direction]:
new_position = self._new_position(agent.position, direction)
min_distances.append(self.env.distance_map.get()[handle, new_position[0], new_position[1], direction])
else:
min_distances.append(np.inf)
observation = [0, 0, 0]
observation[np.argmin(min_distances)] = 1
return observation
env = RailEnv(width=7,
height=7,
rail_generator=complex_rail_generator(nr_start_goal=10, nr_extra=1, \
min_dist=8, max_dist=99999, seed=1),
number_of_agents=2,
obs_builder_object=SingleAgentNavigationObs())
env.reset()
obs, all_rewards, done, _ = env.step({0: 0, 1: 1})
for i in range(env.get_num_agents()):
print(obs[i])
Finally, the following is an example of hard-coded navigation for single agents that achieves optimal single-agent
navigation to target, and shows the path taken as an animation.
.. code-block:: python
env = RailEnv(width=50,
height=50,
rail_generator=random_rail_generator(),
number_of_agents=1,
obs_builder_object=SingleAgentNavigationObs())
env.reset()
obs, all_rewards, done, _ = env.step({0: 0})
env_renderer = RenderTool(env, gl="PILSVG")
env_renderer.render_env(show=True, frames=True, show_observations=False)
for step in range(100):
action = np.argmax(obs[0])+1
obs, all_rewards, done, _ = env.step({0:action})
print("Rewards: ", all_rewards, " [done=", done, "]")
env_renderer.render_env(show=True, frames=True, show_observations=False)
time.sleep(0.1)
The code examples above appear in the example file `custom_observation_example.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/custom_observation_example.py>`_. You can run it using :code:`python examples/custom_observation_example.py` from the root folder of the flatland repo. The two examples are run one after the other.
Example 3 : Using custom predictors and rendering observation
-------------------------------------------------------------
Because the re-scheduling task of the Flatland-Challenge_ requires some short-term planning, we allow the possibility to use custom predictors that help predict upcoming conflicts and help agents solve them in a timely manner.
In the **Flatland Environment** we included an initial predictor ShortestPathPredictorForRailEnv_ to give you an idea what you can do with these predictors.
Any custom predictor can be passed to the observation builder and then be used to build the observation. In this example_ we illustrate how an observation builder can be used to detect conflicts using a predictor.
The observation is incomplete as it only contains information about potential conflicts and has no features about the agent's objectives.
In addition to using your custom predictor you can also make your custom observation ready for rendering. (This can be done in a similar way for your predictor).
All you need to do in order to render your custom observation is to populate :code:`self.env.dev_obs_dict[handle]` for every agent (all handles). (For the predictor use :code:`self.env.dev_pred_dict[handle]`).
In contrast to the previous examples we also implement the :code:`def get_many(self, handles=None)` function for this custom observation builder. The reasoning here is that we want to call the predictor only once per :code:`env.step()`. The base implementation of :code:`def get_many(self, handles=None)` will call the :code:`get(handle)` function for all handles, which means that it normally does not need to be reimplemented, except for cases such as the one below.
.. _ShortestPathPredictorForRailEnv: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/predictions.py#L81
.. _example: https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/custom_observation_example.py#L110
.. code-block:: python
class ObservePredictions(TreeObsForRailEnv):
"""
We use the provided ShortestPathPredictor to illustrate the usage of predictors in your custom observation.
We derive our observation builder from TreeObsForRailEnv, to exploit the existing implementation to compute
the minimum distances from each grid node to each agent's target.
        This is necessary so that we can pass the distance map to the ShortestPathPredictor.
        Here we also want to highlight how you can visualize your observation.
"""
def __init__(self, predictor):
super().__init__(max_depth=0)
self.predictor = predictor
def reset(self):
# Recompute the distance map, if the environment has changed.
super().reset()
def get_many(self, handles=None):
            '''
            Because we do not want to call the predictor separately for every agent we implement the get_many function.
            Here we can call the predictor just once for all the agents and use the predictions to generate our observations.
            :param handles: list of agent handles to compute observations for
            :return: dictionary of observations, indexed by agent handle
            '''
            self.predictions = self.predictor.get()
            self.predicted_pos = {}
            for t in range(len(self.predictions[0])):
                pos_list = []
                for a in handles:
                    pos_list.append(self.predictions[a][t][1:3])
                # We transform (x,y) coordinates to a single integer number for simpler comparison
                self.predicted_pos.update({t: coordinate_to_position(self.env.width, pos_list)})
            observations = {}
            # Collect the observations of all the agents
            for h in handles:
                observations[h] = self.get(h)
            return observations
def get(self, handle):
            '''
            Let's write a simple observation which just indicates whether or not the own predicted path
            overlaps with other predicted paths at any time. This is useless for the task of navigation but might
            help when looking for conflicts. A more complex implementation can be found in the TreeObsForRailEnv class.
            Each agent receives an observation of length 10, where each element represents a prediction step and its
            value is:
            - 0 if no overlap is happening
            - 1 if another agent predicts to pass through the same cell at the same prediction step
            :param handle: the handle (index) of the agent
            :return: observation for the given agent
            '''
            observation = np.zeros(10)
            # We are going to track which cells were considered while building the observation
            # and make them accessible for rendering
            visited = set()
            for _idx in range(10):
                # Check if any of the other predictions overlap with the agent's own prediction
                x_coord = self.predictions[handle][_idx][1]
                y_coord = self.predictions[handle][_idx][2]
                # We add every observed cell to the observation rendering
                visited.add((x_coord, y_coord))
                if self.predicted_pos[_idx][handle] in np.delete(self.predicted_pos[_idx], handle, 0):
                    # Another agent is predicted to pass through the same cell at the same predicted time
                    observation[_idx] = 1
            # This variable will be accessed by the renderer to visualize the observation
            self.env.dev_obs_dict[handle] = visited
return observation
We can then use this new observation builder and the renderer to visualize the observation of each agent.
.. code-block:: python
# Initiate the Predictor
CustomPredictor = ShortestPathPredictorForRailEnv(10)
# Pass the Predictor to the observation builder
CustomObsBuilder = ObservePredictions(CustomPredictor)
# Initiate Environment
env = RailEnv(width=10,
height=10,
rail_generator=complex_rail_generator(nr_start_goal=5, nr_extra=1, min_dist=8, max_dist=99999, seed=1),
number_of_agents=3,
obs_builder_object=CustomObsBuilder)
    obs, info = env.reset()
env_renderer = RenderTool(env, gl="PILSVG")
    # We render the initial step and show the observed cells as colored boxes
env_renderer.render_env(show=True, frames=True, show_observations=True, show_predictions=False)
action_dict = {}
for step in range(100):
for a in range(env.get_num_agents()):
action = np.random.randint(0, 5)
action_dict[a] = action
obs, all_rewards, done, _ = env.step(action_dict)
print("Rewards: ", all_rewards, " [done=", done, "]")
env_renderer.render_env(show=True, frames=True, show_observations=True, show_predictions=False)
time.sleep(0.5)
How to access environment and agent data for observation builders
------------------------------------------------------------------
When building your custom observation builder, you might want to aggregate and define your own features that are different from the raw env data. In this section we introduce how such information can be accessed and how you can build your own features out of them.
Transitions maps
~~~~~~~~~~~~~~~~
The transition maps build the base for all movement in the environment. They contain all the information about allowed transitions for the agent at any given position. Because railway movement is limited to the railway tracks, these are important features for any controller that wants to interact with the environment. All functionality and features of a transition map can be found here_.
.. _here: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/core/transition_map.py
**Get Transitions for cell**
To access the possible transitions at any given cell there are different possibilities:
1. You provide a cell position and an orientation in that cell (usually the orientation of the agent) and call :code:`cell_transitions = env.rail.get_transitions(*position, direction)`. In return you get a 4-element vector indicating the allowed transitions, ordered as :code:`[North, East, South, West]` given the initial orientation. The position is a tuple of the form :code:`(x, y)` where :code:`x in [0, height]` and :code:`y in [0, width]`. This can be used for branching in a tree search and when looking for all possible allowed paths of an agent as it will provide a simple way to get the possible trajectories.
2. When more detailed information about the cell in general is necessary you can also get the full transitions of a cell by calling :code:`transition_int = env.rail.get_full_transitions(*position)`. This will return an :code:`int16` for the cell representing the allowed transitions. To understand the transitions returned it is best to represent it as a binary number :code:`bin(transition_int)`, where the bits have the following meaning: :code:`NN NE NS NW EN EE ES EW SN SE SS SW WN WE WS WW`. For example the binary code :code:`1000 0000 0010 0000` represents a straight where an agent facing north can transition north and an agent facing south can transition south, and no other transitions are possible. To get a better feeling for what the binary representations of the elements look like go to this Link_.
.. _Link: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/core/grid/rail_env_grid.py#L29
These two objects can be used for example to detect switches that are usable by other agents but not the observing agent itself. This can be an important feature when actions have to be taken in order to avoid conflicts.
.. code-block:: python
cell_transitions = self.env.rail.get_transitions(*position, direction)
transition_bit = bin(self.env.rail.get_full_transitions(*position))
total_transitions = transition_bit.count("1")
num_transitions = np.count_nonzero(cell_transitions)
# Detect Switches that can only be used by other agents.
if total_transitions > 2 > num_transitions:
unusable_switch_detected = True
Agent information
~~~~~~~~~~~~~~~~~~
The agents are represented as an agent class and are provided when the environment is instantiated. Because agents can have different properties it is helpful to know how to access this information.
You can simply access the three main types of agent information in the following ways with :code:`agent = env.agents[handle]`:
**Agent basic information**
All the agents in the initialized environment can be found in the :code:`env.agents` list. Given the index of an agent you have access to:
- Agent position :code:`agent.position` which returns the current coordinates :code:`(x, y)` of the agent.
- Agent target :code:`agent.target` which returns the target coordinates :code:`(x, y)`.
- Agent direction :code:`agent.direction` which is an int representing the current orientation :code:`{0: North, 1: East, 2: South, 3: West}`.
- Agent moving :code:`agent.moving` where 0 means the agent is currently not moving and 1 indicates the agent is moving.
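For example, a short read-only sketch that prints these attributes for every agent (illustrative only):

.. code-block:: python

    # Inspect the basic attributes of each agent; this information can be read but not changed
    for handle in range(env.get_num_agents()):
        agent = env.agents[handle]
        print(handle, agent.position, agent.direction, agent.target, agent.moving)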
**Agent speed information**
Beyond the basic agent information we can also access more details about the agent's type by looking at its speed data:
- Agent max speed :code:`agent.speed_data["speed"]` which defines the traveling speed when the agent is moving.
- Agent position fraction :code:`agent.speed_data["position_fraction"]` which is a number between 0 and 1 and indicates when the move to the next cell will occur. Each agent's speed is 1 or a smaller fraction. At each :code:`env.step()` the agent moves forward at its fractional speed and only changes to the next cell when the cumulated fractions satisfy :code:`agent.speed_data["position_fraction"] >= 1.`
- Agents can move at different speeds, which can be set up by modifying :code:`agent.speed_data` within the schedule_generator. For an example refer to Link_Schedule_Generators_.
.. _Link_Schedule_Generators: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/schedule_generators.py#L59
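A short sketch of reading these fields for one agent (illustrative only):

.. code-block:: python

    agent = env.agents[handle]
    speed = agent.speed_data["speed"]                 # e.g. 0.25 for a slow freight train
    fraction = agent.speed_data["position_fraction"]  # progress within the current cell
    at_cell_boundary = fraction == 0.0                # a fresh action can only take effect here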
**Agent malfunction information**
Similar to the speed data you can also access individual data about the malfunctions of an agent. All data is available through :code:`agent.malfunction_data` with:
- Indication how long the agent is still malfunctioning :code:`'malfunction'` by an integer counting down at each time step. 0 means the agent is ok and can move.
- Poisson rate at which malfunctions happen for this agent :code:`'malfunction_rate'`
- Number of steps until the next malfunction will occur :code:`'next_malfunction'`
- Number of malfunctions this agent has had so far :code:`'nr_malfunctions'`
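Analogously, the malfunction data can be read as follows (illustrative sketch):

.. code-block:: python

    malfunction_data = env.agents[handle].malfunction_data
    if malfunction_data['malfunction'] == 0:
        # Agent is ok and can move; malfunction_data['next_malfunction'] steps
        # remain until the next malfunction occurs
        pass
    else:
        remaining = malfunction_data['malfunction']  # steps until the agent works again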
# Level Generation Tutorial
We are currently working on different new level generators, and you can expect that the levels in the submission testing will not all come from just one level generator but rather from several, to be sure that the controllers can handle any railway-specific challenge.
Let's have a look at the `sparse_rail_generator`.
## Sparse Rail Generator
![Example_Sparse](https://i.imgur.com/DP8sIyx.png)
The idea behind the sparse rail generator is to mimic classic railway structures where dense nodes (cities) are sparsely connected to each other and where you have to manage traffic flow between the nodes efficiently.
The cities in this level generator are much simplified in comparison to real city networks but it mimics parts of the problems faced in daily operations of any railway company.
There are a few parameters you can tune to build your own map and test levels of different complexity.
**Warning**: some combinations of parameters do not go well together and will lead to infeasible level generation.
The level generator currently only issues a warning when it cannot build the environment according to the parameters provided; in the worst case this will lead to a crash of the whole env.
We are currently working on improvements here and are **happy for any suggestions from your side**.
To build an environment you instantiate a `RailEnv` as follows:
```python
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.schedule_generators import sparse_schedule_generator
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.predictions import ShortestPathPredictorForRailEnv

shortest_path_predictor = ShortestPathPredictorForRailEnv()

# Initialize the generator
rail_generator = sparse_rail_generator(
    num_cities=10,  # Number of cities in map
    num_intersections=10,  # Number of intersections in map
    num_trainstations=50,  # Number of possible start/targets on map
    min_node_dist=6,  # Minimal distance between nodes
    node_radius=3,  # Proximity of stations to city center
    num_neighb=3,  # Number of connections to other cities
    seed=5,  # Random seed
    grid_mode=False  # Ordered distribution of nodes
)

# Build the environment
env = RailEnv(
    width=50,
    height=50,
    rail_generator=rail_generator,
    schedule_generator=sparse_schedule_generator(),
    number_of_agents=10,
    obs_builder_object=TreeObsForRailEnv(max_depth=3, predictor=shortest_path_predictor)
)

# Call reset on the environment
env.reset()
```
You can see that you now need both a `rail_generator` and a `schedule_generator` to generate a level. These need to work nicely together. The `rail_generator` will only generate the railway infrastructure and provide hints to the `schedule_generator` about where to place agents. The `schedule_generator` will then generate a schedule, meaning it places agents at different train stations and gives them tasks by providing individual targets.
You can tune the following parameters in the `sparse_rail_generator`:
- `num_cities` is the number of cities on a map. Cities are the only nodes that can host start and end points for agent tasks (train stations). Here you have to be careful that the number is not too high as all the cities have to fit on the map. When `grid_mode=False` you have to be careful when choosing `min_node_dist` because level generation will fail if not all cities (and intersections) can be placed with at least `min_node_dist` between them.
- `num_intersections` is the number of nodes that don't hold any trainstations. They are also the first priority that a city connects to. We use these to allow for sparse connections between cities.
- `num_trainstations` defines the *total* number of trainstations in the network. This also sets the max number of allowed agents in the environment. This is also a delicate parameter as there is only a limited amount of space available around nodes, and thus if the number is too high the level generation will fail. *Important*: Only the number of agents provided to the environment will actually produce active train stations. The others will just be present as dead-ends (see figures below).
- `min_node_dist` is only used if `grid_mode=False` and represents the minimal distance between two nodes.
- `node_radius` defines the extent of a city. Each trainstation is placed at a distance to the closest city node that is smaller or equal to this number.
- `num_neighb` defines the number of neighbouring nodes that connect to each other. This changes the connectivity and thus the amount of alternative routes in the network.
- `grid_mode`: True -> nodes evenly distributed in the env, False -> random distribution of nodes
- `enhance_intersection`: True -> Extra rail elements added at intersections
- `seed` is used to initialize the random generator
If you run into any bugs with sets of parameters please let us know.
Here is a network with `grid_mode=False` and the parameters from above.
![sparse_random](https://i.imgur.com/Xg7nifF.png)
and here with `grid_mode=True`
![sparse_ordered](https://i.imgur.com/jyA7Pt4.png)
## Example code
To see all the changes in action you can just run the `flatland_example_2_0.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/flatland_2_0_example.py).
# Stochasticity Tutorial
Another area where we improved **Flat**land 2.0 is the stochastic events added during the episodes.
This is very common for railway networks where the initial plan usually needs to be rescheduled during operations, as minor events such as delayed departure from trainstations, malfunctions on trains or infrastructure or just the weather lead to delayed trains.
We implemented a Poisson process to simulate delays by stopping agents at random times for random durations. The parameters necessary for the stochastic events can be provided when creating the environment.
```python
# Use the malfunction generator to break agents from time to time
stochastic_data = {
    'prop_malfunction': 0.5,  # Percentage of defective agents
    'malfunction_rate': 30,   # Rate of malfunction occurrence
    'min_duration': 3,        # Minimal duration of malfunction
    'max_duration': 10        # Max duration of malfunction
}
```
The parameters are as follows:
- `prop_malfunction` is the proportion of agents that can malfunction. `1.0` means that each agent can break.
- `malfunction_rate` is the mean rate of the Poisson process in number of environment steps.
- `min_duration` and `max_duration` set the range of malfunction durations. They are sampled uniformly.
You can introduce stochasticity by simply creating the env as follows:
```python
env = RailEnv(
...
stochastic_data=stochastic_data, # Malfunction data generator
...
)
```
In your controller, you can check whether an agent is malfunctioning:
```python
obs, rew, done, info = env.step(actions)
...
action_dict = dict()
for a in range(env.get_num_agents()):
if info['malfunction'][a] == 0:
        action_dict.update({a: ...})
```

A complete example combining the malfunction generator with different speed profiles and a tree observation builder:

```python
# Custom observation builder
tree_observation = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())
# Different agent types (trains) with different speeds.
speed_ration_map = {1.: 0.25, # Fast passenger train
1. / 2.: 0.25, # Fast freight train
1. / 3.: 0.25, # Slow commuter train
1. / 4.: 0.25} # Slow freight train
env = RailEnv(width=50,
height=50,
rail_generator=sparse_rail_generator(num_cities=20, # Number of cities in map (where train stations are)
num_intersections=5, # Number of intersections (no start / target)
num_trainstations=15, # Number of possible start/targets on map
min_node_dist=3, # Minimal distance of nodes
node_radius=2, # Proximity of stations to city center
num_neighb=4, # Number of connections to other cities/intersections
seed=15, # Random seed
grid_mode=True,
enhance_intersection=True
),
schedule_generator=sparse_schedule_generator(speed_ration_map),
number_of_agents=10,
stochastic_data=stochastic_data, # Malfunction data generator
obs_builder_object=tree_observation)
env.reset()
```
You will quickly realize that this will lead to unforeseen difficulties which means that **your controller** needs to observe the environment at all times to be able to react to the stochastic events.
## Example code
To see all the changes in action you can just run the `flatland_example_2_0.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/flatland_2_0_example.py).
# Different speed profiles Tutorial
One of the main contributions to the complexity of railway network operations stems from the fact that all trains travel at different speeds while sharing a very limited railway network.
In **Flat**land 2.0 this feature will be enabled as well and will lead to much more complex configurations. Here we count on your support if you find bugs or improvements :).
The different speed profiles can be generated using the `schedule_generator`, where you can actually choose as many different speeds as you like.
Keep in mind that the *fastest speed* is 1 and all slower speeds must be between 0 and 1.
For the submission scoring you can assume that there will be no more than 5 speed profiles.
Later versions of **Flat**land might have varying speeds during episodes. Therefore, we return the agent speeds.
Notice that we do not guarantee that the speed will be computed at each step, but if not costly we will return it at each step.
In your controller, you can get the agents' speed from the `info` returned by `step`:
```python
obs, rew, done, info = env.step(actions)
...
for a in range(env.get_num_agents()):
speed = info['speed'][a]
```
## Actions and observation with different speed levels
Because the different speeds are implemented as fractions, the agent's ability to perform actions has been updated.
We **do not allow actions to change within the cell**.
This means that each agent can only choose an action to be taken when entering a cell.
This action is then executed when a step to the next cell is valid. For example
- Agent enters switch and chooses to deviate left. Agent fractional speed is 1/4 and thus the agent will take 4 time steps to complete its journey through the cell. On the 4th time step the agent will leave the cell deviating left as chosen at the entry of the cell.
- All actions chosen by the agent during its travels within a cell are ignored
- Agents can make observations at any time step. Make sure to discard observations without any information. See this [example](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/training_navigation.py) for a simple implementation.
- The environment checks whether an agent is allowed to move to the next cell only at the time of the transition to the next cell.
In your controller, you can check whether an agent requires an action by checking `info`:
```python
obs, rew, done, info = env.step(actions)
...
action_dict = dict()
for a in range(env.get_num_agents()):
if info['action_required'][a] and info['malfunction'][a] == 0:
action_dict.update({a: ...})
```
Notice that `info['action_required'][a]` does not mean that the action will have an effect:
if the next cell is blocked or the agent breaks down, the action cannot be performed and an action will be required again in the next step.
## Rail Generators and Schedule Generators
The separation between rail generator and schedule generator reflects the organisational separation in the railway domain
- Infrastructure Manager (IM): is responsible for the layout and maintenance of tracks
- Railway Undertaking (RU): operates trains on the infrastructure
Usually, there is a third organisation which ensures discrimination-free access to the infrastructure for concurrent requests in a **schedule planning phase**.
However, in the **Flat**land challenge, we focus on the re-scheduling problem during live operations.
Technically,
```python
RailGeneratorProduct = Tuple[GridTransitionMap, Optional[Any]]
RailGenerator = Callable[[int, int, int, int], RailGeneratorProduct]
AgentPosition = Tuple[int, int]
Schedule = collections.namedtuple('Schedule', 'agent_positions '
'agent_directions '
'agent_targets '
'agent_speeds '
'agent_malfunction_rates '
'max_episode_steps')
ScheduleGenerator = Callable[[GridTransitionMap, int, Optional[Any], Optional[int]], Schedule]
```
We can then produce `RailGenerator`s by currying:
```python
def sparse_rail_generator(num_cities=5, num_intersections=4, num_trainstations=2, min_node_dist=20, node_radius=2,
num_neighb=3, grid_mode=False, enhance_intersection=False, seed=1):
def generator(width, height, num_agents, num_resets=0):
# generate the grid and (optionally) some hints for the schedule_generator
...
return grid_map, {'agents_hints': {
'num_agents': num_agents,
'agent_start_targets_nodes': agent_start_targets_nodes,
'train_stations': train_stations
}}
return generator
```
And, similarly, `ScheduleGenerator`s:
```python
def sparse_schedule_generator(speed_ratio_map: Mapping[float, float] = None) -> ScheduleGenerator:
def generator(rail: GridTransitionMap, num_agents: int, hints: Any = None):
# place agents:
# - initial position
# - initial direction
# - (initial) speed
# - malfunction
...
return agents_position, agents_direction, agents_target, speeds, agents_malfunction
return generator
```
Notice that the `rail_generator` may pass `agents_hints` to the `schedule_generator` which the latter may interpret.
For instance, the way the `sparse_rail_generator` generates the grid, it already determines each agent's start and target.
Hence, `rail_generator` and `schedule_generator` have to match if `schedule_generator` presupposes some specific `agents_hints`.
The environment's `reset` takes care of applying the two generators:
```python
def __init__(self,
...
rail_generator: RailGenerator = random_rail_generator(),
schedule_generator: ScheduleGenerator = random_schedule_generator(),
...
):
self.rail_generator: RailGenerator = rail_generator
self.schedule_generator: ScheduleGenerator = schedule_generator
def reset(self, regenerate_rail=True, regenerate_schedule=True):
rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets)
...
        if regenerate_schedule:
agents_hints = None
if optionals and 'agents_hints' in optionals:
agents_hints = optionals['agents_hints']
self.agents_static = EnvAgentStatic.from_lists(
self.schedule_generator(self.rail, self.get_num_agents(), hints=agents_hints))
```
## Example code
To see all the changes in action you can just run the `flatland_example_2_0.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/flatland_2_0_example.py).
# How to get started in Round 2
- [Environment Changes](#environment-changes)
- [Level generation](#level-generation)
- [Observations](#observations)
- [Predictions](#predictions)
## Environment Changes
There have been some major changes in how agents are being handled in the environment in this Flatland update.
### Agents
Agents are no longer permanent entities in the environment. Now agents will be removed from the environment as soon as they finish their task. To keep interactions with the environment as simple as possible we do not modify the dimensions of the observation vectors nor the number of agents. Agents that have finished do not require any special treatment from the controller. Any action provided to these agents is simply ignored, just like before.
Start positions of agents are *not unique* anymore. This means that many agents can start from the same position on the railway grid. It is important to keep in mind that whatever agent moves first will block the rest of the agents from moving into the same cell. Thus, the controller can already decide the ordering of the agents from the first step.
## Level Generation
The levels are now generated using the `sparse_rail_generator` and the `sparse_schedule_generator`.
### Rail Generation
The rail generation is done in a sequence of steps:
1. A number of city centers are placed in a grid of size `(height, width)`
2. Each city is connected to two neighbouring cities
3. Internal parallel tracks are generated in each city
### Schedule Generation
The `sparse_schedule_generator` produces tasks for the agents by selecting a starting city and a target city. The agent is then placed on an even track number in the starting city, facing in a direction such that a path to the target city exists. The task for the agent is to reach the target position as fast as possible.
In the future we will update how these schedules are generated to allow for more complex tasks.
## Observations
Observations have been updated to reflect the novel features and behaviors of Flatland. Have a look at [observation](https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/observations.py) or the documentation for more details on the observations.
## Predictions