Merge branch '161-spec-doc-in-sphinx' into 'master'

#80 specifications as markdown moved from wiki

Closes #161, #159, and #79

See merge request flatland/flatland!182

Commit 9b298461 authored 5 years ago by Christian Eichenberger
--- a/Makefile
+++ b/Makefile
@@ -67,10 +67,11 @@ coverage: ## check code coverage quickly with the default Python
 	$(BROWSER) htmlcov/index.html

 docs: ## generate Sphinx HTML documentation, including API docs
-	rm -f docs/flatland.rst
+	rm -f docs/flatland*.rst
 	rm -f docs/modules.rst
-	sphinx-apidoc -o docs/ flatland
+	sphinx-apidoc --force -a -e -o docs/ flatland
 	$(MAKE) -C docs clean
+	cp *.md docs
 	$(MAKE) -C docs html
 	pydeps --no-config --noshow flatland -o docs/_build/html/flatland.svg
 	$(BROWSER) docs/_build/html/index.html

--- a/changelog.md
+++ b/changelog.md
-# Keeping track of major Flatland Changes
+Keeping track of major Flatland Changes
+=======================================
+
+Changes since Flatland 0.3
+--------------------------

-## Changes since Flatland 0.3
 ### Changes in stock predictors
 The stock `ShortestPathPredictorForRailEnv` now respects the different agent speeds and updates their prediction accordingly.

@@ -68,12 +71,12 @@ The duration of a malfunction is uniformly drawn from the intervall `[min_durati

 The baselines repository is not yet fully updated to handle multi-speed and stochastic events. Training needs to be modified to omit all states in between the states where an agent can choose an action. Simple navigation training is already up to date. See [here](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/training_navigation.py) for more details.

-## Changes since Flatland 0.2
-
+Changes since Flatland 0.2
+--------------------------
 Please list all major changes since the last version:

 - Refactoring of rendering code: CamelCase functions changed to snake_case
 - Tree Observation: added a new feature, `unusable_switch`, which indicates switches that are not branching points for the observing agent
 - Updated the shortest path predictor
 - Updated conflict detection with predictor
- Episodes length can be set as maximum number of steps allowed.
\ No newline at end of file
+- Episodes length can be set as maximum number of steps allowed.
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -33,7 +33,7 @@ sys.path.insert(0, os.path.abspath('..'))

 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
-extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode', 'sphinx.ext.intersphinx']
+extensions = ['recommonmark', 'sphinx.ext.autodoc', 'sphinx.ext.viewcode', 'sphinx.ext.intersphinx']

 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
@@ -41,8 +41,13 @@ templates_path = ['_templates']
 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 #
-# source_suffix = ['.rst', '.md']
-source_suffix = '.rst'
+# https://www.sphinx-doc.org/en/master/usage/markdown.html
+source_suffix = {
+    '.rst': 'restructuredtext',
+    '.txt': 'markdown',
+    '.md': 'markdown',
+}
+

 # The master toctree document.
 master_doc = 'index'

--- a/docs/flatland.baselines.rst
+++ b/docs/flatland.baselines.rst
-flatland.baselines package
-==========================
-
-Submodules
----------
-
-flatland.baselines.dueling\_double\_dqn module
----------------------------------------------
-
-.. automodule:: flatland.baselines.dueling_double_dqn
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.baselines.model module
-------------------------------
-
-.. automodule:: flatland.baselines.model
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.baselines
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.core.grid.rst
+++ b/docs/flatland.core.grid.rst
-flatland.core.grid package
-==========================
-
-Submodules
----------
-
-flatland.core.grid.grid4 module
-------------------------------
-
-.. automodule:: flatland.core.grid.grid4
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid4\_astar module
--------------------------------------
-
-.. automodule:: flatland.core.grid.grid4_astar
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid4\_utils module
--------------------------------------
-
-.. automodule:: flatland.core.grid.grid4_utils
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid8 module
-------------------------------
-
-.. automodule:: flatland.core.grid.grid8
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid\_utils module
-------------------------------------
-
-.. automodule:: flatland.core.grid.grid_utils
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.rail\_env\_grid module
-----------------------------------------
-
-.. automodule:: flatland.core.grid.rail_env_grid
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.core.grid
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.core.rst
+++ b/docs/flatland.core.rst
-flatland.core package
-=====================
-
-Submodules
----------
-
-flatland.core.env module
------------------------
-
-.. automodule:: flatland.core.env
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.transitions module
--------------------------------
-
-.. automodule:: flatland.core.transitions
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.core
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.envs.rst
+++ b/docs/flatland.envs.rst
-flatland.envs package
-=====================
-
-Submodules
----------
-
-flatland.envs.rail\_env module
------------------------------
-
-.. automodule:: flatland.envs.rail_env
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.envs
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.evaluators.rst
+++ b/docs/flatland.evaluators.rst
-flatland.evaluators package
-===========================
-
-Submodules
----------
-
-flatland.evaluators.aicrowd\_helpers module
-------------------------------------------
-
-.. automodule:: flatland.evaluators.aicrowd_helpers
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.evaluators.client module
---------------------------------
-
-.. automodule:: flatland.evaluators.client
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.evaluators.messages module
-----------------------------------
-
-.. automodule:: flatland.evaluators.messages
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.evaluators.service module
----------------------------------
-
-.. automodule:: flatland.evaluators.service
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.evaluators
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.rst
+++ b/docs/flatland.rst
-flatland package
-================
-
-Subpackages
-----------
-
-.. toctree::
-
-   flatland.core
-   flatland.envs
-   flatland.evaluators
-   flatland.utils
-
-Submodules
----------
-
-flatland.cli module
-------------------
-
-.. automodule:: flatland.cli
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/flatland.utils.rst
+++ b/docs/flatland.utils.rst
-flatland.utils package
-======================
-
-Module contents
---------------
-
-.. automodule:: flatland.utils
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -10,11 +10,16 @@ Welcome to flatland's documentation!
   about_flatland
   gettingstarted
   intro_observationbuilder
+   specifications/specifications.md
   localevaluation
   modules
   FAQ
   contributing
   authors
+   changelog.md
+   flatland_2.0.md
+
+

 Indices and tables
 ==================

--- a/docs/specifications/core.md
+++ b/docs/specifications/core.md
+# Core Specifications
+## Environment Class Overview
+
+The Environment class contains all the functions necessary for the interaction between the agents and the environment. The base Environment class is derived from `rllib.env.MultiAgentEnv` ([ray](https://github.com/ray-project/ray)).
+
+The functions are specific to each realization of Flatland (e.g. Railway, Vaccination, ...).
+In particular, we retain the rllib interface in the use of the `step()` function, which accepts a dictionary of actions indexed by the agents' handles (returned by `get_agent_handles()`) and returns dictionaries of observations, rewards, dones and infos.
+
+```python
+class Environment:
+    """Base interface for multi-agent environments in Flatland.
+
+    Agents are identified by agent ids (handles).
+    Examples:
+        >>> obs = env.reset()
+        >>> print(obs)
+        {
+            "train_0": [2.4, 1.6],
+            "train_1": [3.4, -3.2],
+        }
+        >>> obs, rewards, dones, infos = env.step(
+            action_dict={
+                "train_0": 1, "train_1": 0})
+        >>> print(rewards)
+        {
+            "train_0": 3,
+            "train_1": -1,
+        }
+        >>> print(dones)
+        {
+            "train_0": False,    # train_0 is still running
+            "train_1": True,     # train_1 is done
+            "__all__": False,    # the env is not done
+        }
+        >>> print(infos)
+        {
+            "train_0": {},  # info for train_0
+            "train_1": {},  # info for train_1
+        }
+    """
+
+    def __init__(self):
+        pass
+
+    def reset(self):
+        """
+        Resets the env and returns observations from agents in the environment.
+
+        Returns:
+        obs : dict
+            New observations for each agent.
+        """
+        raise NotImplementedError()
+
+    def step(self, action_dict):
+        """
+        Performs an environment step with simultaneous execution of actions for
+        agents in action_dict.
+        Returns observations from agents in the environment.
+        The returns are dicts mapping from agent_id strings to values.
+
+        Parameters
+        ----------
+        action_dict : dict
+            Dictionary of actions to execute, indexed by agent id.
+
+        Returns
+        -------
+        obs : dict
+            New observations for each ready agent.
+        rewards: dict
+            Reward values for each ready agent.
+        dones : dict
+            Done values for each ready agent. The special key "__all__"
+            (required) is used to indicate env termination.
+        infos : dict
+            Optional info values for each agent id.
+        """
+        raise NotImplementedError()
+
+    def render(self):
+        """
+        Perform rendering of the environment.
+        """
+        raise NotImplementedError()
+
+    def get_agent_handles(self):
+        """
+        Returns a list of agents' handles to be used as keys in the step()
+        function.
+        """
+        raise NotImplementedError()
+
+```
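To make the `step()` contract concrete, here is a hypothetical toy environment, `CountdownEnv` (not part of Flatland), in which each agent is done after a fixed number of steps in which it received an action. The base class is repeated in minimal form so that the sketch runs on its own; the observation and reward definitions are purely illustrative assumptions.

```python
from typing import Dict, List


class Environment:
    """Minimal copy of the interface above, so that this sketch runs on its own."""

    def reset(self):
        raise NotImplementedError()

    def step(self, action_dict):
        raise NotImplementedError()

    def get_agent_handles(self):
        raise NotImplementedError()


class CountdownEnv(Environment):
    """Hypothetical toy env: an agent is done after `horizon` steps in which it received an action."""

    def __init__(self, handles: List[str], horizon: int = 2):
        self.handles = list(handles)
        self.horizon = horizon
        self.steps_taken: Dict[str, int] = {h: 0 for h in self.handles}

    def get_agent_handles(self) -> List[str]:
        return list(self.handles)

    def reset(self) -> Dict[str, int]:
        self.steps_taken = {h: 0 for h in self.handles}
        # Observation per agent: number of remaining steps.
        return {h: self.horizon for h in self.handles}

    def step(self, action_dict):
        obs, rewards, dones, infos = {}, {}, {}, {}
        for h in self.handles:
            # Agents without an entry in action_dict do not advance; the action value itself is ignored.
            if h in action_dict and self.steps_taken[h] < self.horizon:
                self.steps_taken[h] += 1
            remaining = self.horizon - self.steps_taken[h]
            obs[h] = remaining
            rewards[h] = 0 if remaining == 0 else -1  # -1 per step until the agent is done
            dones[h] = remaining == 0
            infos[h] = {}
        # The special "__all__" key signals termination of the whole episode.
        dones["__all__"] = all(dones[h] for h in self.handles)
        return obs, rewards, dones, infos


env = CountdownEnv(["train_0", "train_1"], horizon=2)
obs = env.reset()  # {"train_0": 2, "train_1": 2}
obs, rewards, dones, infos = env.step({"train_0": 1, "train_1": 0})
obs, rewards, dones, infos = env.step({"train_0": 1})
print(dones)  # {'train_0': True, 'train_1': False, '__all__': False}
```

Note how `train_1`, which received no action in the second step, simply stays where it is, and `"__all__"` only becomes `True` once every agent is done.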
--- a/docs/specifications/img/UML_flatland.png
+++ b/docs/specifications/img/UML_flatland.png
--- a/docs/specifications/railway.md
+++ b/docs/specifications/railway.md
--- a/docs/specifications/rendering.md
+++ b/docs/specifications/rendering.md
+# Rendering Specifications
+
+## Scope
+This doc specifies the software to meet the requirements in the Visualization requirements doc.
+
+## References
+- [Visualization Requirements](visualization)
+- [Core Spec](core)
+
+## Interfaces
+### Interface with Environment Component
+
+- The Environment produces the Env Snapshot data structure (TBD)
+- The Renderer reads the Env Snapshot
+- Connection between Env and Renderer, either:
+    - The Environment “invokes” the renderer in-process, or
+    - The Renderer “connects” to the environment
+        - e.g. the Env acts as a server and the Renderer as a client
+- Snapshot hand-over, either:
+    - The Env sends a Snapshot to the renderer and waits for rendering, or
+    - The Env puts snapshots into a rendering queue
+        - The renderer blocks on the queue, waiting for a new snapshot to arrive
+        - If several snapshots are waiting, skip all but the most recent one and render only that
+        - Delete the snapshot after rendering
+- Optionally, either:
+    - Render every frame / time step, or
+    - Render frames without blocking the environment
+        - Render frames in a separate process / thread
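The queue-based hand-over above can be sketched with the standard library `queue` and `threading` modules. This is a minimal sketch under the assumption that snapshots are arbitrary Python objects; `take_latest_snapshot` and `render_loop` are hypothetical names, not an existing Flatland API.

```python
import queue
import threading
from typing import Any, Callable, Optional


def take_latest_snapshot(snapshots: "queue.Queue", timeout: Optional[float] = None) -> Any:
    """Block until a snapshot arrives, then skip any older ones and return the newest."""
    latest = snapshots.get(timeout=timeout)  # raises queue.Empty if the timeout expires
    while True:
        try:
            latest = snapshots.get_nowait()  # a newer snapshot is waiting: drop the older one
        except queue.Empty:
            return latest


def render_loop(snapshots: "queue.Queue", render: Callable[[Any], None],
                stop: threading.Event, poll_seconds: float = 0.1) -> None:
    """Renderer side: meant to run in its own thread so the environment never waits for drawing."""
    while not stop.is_set():
        try:
            snapshot = take_latest_snapshot(snapshots, timeout=poll_seconds)
        except queue.Empty:
            continue  # nothing new yet; check the stop flag again
        render(snapshot)  # the snapshot is not kept after rendering
```

In use, the renderer would be started with `threading.Thread(target=render_loop, args=(snapshots, draw, stop), daemon=True)`, and the environment only calls `snapshots.put(snapshot)` on an unbounded `queue.Queue`, which never blocks the environment.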
+
+#### Environment Snapshot
+
+### Data Structure
+
+The data structure is to be defined in the Core requirements or Interfaces doc.
+
+
+
+##### Example only
+
+Top-level dictionary
+
+- World nd-array
+    - Each element represents the available transitions in a cell
+- List of agents
+    - Agent location, orientation, movement (forward / stop / turn?)
+    - Observation
+        - Rectangular observation
+            - Maybe just dimensions: width + height (i.e. no need for contents)
+            - Can be highlighted in the display as in minigrid
+        - Tree-based observation
+            - TBD
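As an illustration only, the example snapshot above could be typed with dataclasses. The class and field names, the nested list standing in for the world nd-array, and the direction encoding (0=N, 1=E, 2=S, 3=W) are assumptions; the tree-based observation is left out because it is still TBD.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


@dataclass
class AgentSnapshot:
    position: Cell
    direction: int  # assumed encoding: 0=N, 1=E, 2=S, 3=W
    movement: str   # e.g. "forward", "stop", "turn_left", "turn_right"
    # Rectangular observation: only its (width, height) is needed for highlighting.
    observation_size: Optional[Tuple[int, int]] = None


@dataclass
class EnvSnapshot:
    # Stand-in for the world nd-array: one transition bitmask per cell.
    world: List[List[int]]
    agents: List[AgentSnapshot] = field(default_factory=list)

    def occupied_cells(self) -> List[Cell]:
        """Cells on which the renderer has to draw a train."""
        return [agent.position for agent in self.agents]


snapshot = EnvSnapshot(world=[[0, 0], [0, 0]],
                       agents=[AgentSnapshot((0, 1), 1, "forward", (5, 5))])
print(snapshot.occupied_cells())  # [(0, 1)]
```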
+
+### Existing Tools / Libraries
+1. Pygame
+    1. Very easy to use; adding sprites etc. is very simple. [Link](https://studywolf.wordpress.com/2015/03/06/arm-visualization-with-pygame/)
+    2. No built-in support for threads/processes. Does get faster when using PyPy/Psyco.
+2. PyQt
+    1. Somewhat simple, though using the different modules is a little more verbose.
+    2. Multi-threaded via QThread, which does not block the main thread that does the real work. [Link](https://nikolak.com/pyqt-threading-tutorial/)
+
+#### How to structure the code
+
+1. Define draw functions/classes for each primitive
+    1. Primitives: Agents (Trains), Railroad, Grass, Houses etc.
+2. Background: initialize the background before starting the episode.
+    1. Static objects in the scene: draw those primitives once and cache the result.
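The structure above (one draw function per primitive, a background drawn once and cached, dynamic primitives drawn on top at every step) can be sketched with a toy text renderer. This is a hypothetical illustration, not the Flatland renderer: a character grid stands in for a real canvas such as a Pygame surface.

```python
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]
Canvas = List[List[str]]


# One draw function per primitive.
def draw_grass(canvas: Canvas, cell: Cell) -> None:
    canvas[cell[0]][cell[1]] = "."


def draw_rail(canvas: Canvas, cell: Cell) -> None:
    canvas[cell[0]][cell[1]] = "#"


def draw_train(canvas: Canvas, cell: Cell, label: str) -> None:
    canvas[cell[0]][cell[1]] = label


class AsciiRenderer:
    """Toy text renderer: static background cached once, trains redrawn every step."""

    def __init__(self, height: int, width: int, rail_cells: Iterable[Cell]):
        background: Canvas = [[" "] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                draw_grass(background, (r, c))
        for cell in rail_cells:
            draw_rail(background, cell)
        self._background = background  # cached before the episode starts

    def render(self, trains: Dict[str, Cell]) -> str:
        # Start each frame from a copy of the cached background ...
        canvas = [row[:] for row in self._background]
        # ... and overlay every dynamic primitive, without tracking what changed.
        for label, cell in trains.items():
            draw_train(canvas, cell, label)
        return "\n".join("".join(row) for row in canvas)


renderer = AsciiRenderer(2, 3, rail_cells=[(0, 0), (0, 1), (0, 2)])
print(renderer.render({"A": (0, 1)}))
```

Because each frame starts from a copy, the cached background is never modified, so the next frame needs no clean-up of the previous train positions.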
+
+#### Proposed Interfaces
+To-be-filled
+
+### Technical Graphics Considerations
+
+#### Overlay dynamic primitives over the background at each time step.
+
+There is no point in trying to figure out what changed: every dynamic primitive has to be drawn explicitly anyway (that is how these renderers work).
\ No newline at end of file
--- a/docs/specifications/specifications.md
+++ b/docs/specifications/specifications.md
+Flatland Environment Specifications
+==========================
+
+In human-readable language:
+* code base overview (hand-drawn concept)
+* key concepts (generators, envs) and how they are linked
+* links to the relevant code base
+
+## Overview
+![UML_flatland.png](img/UML_flatland.png)
+[Diagram Source](https://confluence.sbb.ch/x/pQfsSw)
+
+## [Core](core)
+
+
+## Rail Generators and Schedule Generators
+The separation between rail generator and schedule generator reflects the organisational separation in the railway domain:
+
+- Infrastructure Manager (IM): is responsible for the layout and maintenance of tracks
+- Railway Undertaking (RU): operates trains on the infrastructure
+
+Usually, there is a third organisation, which ensures discrimination-free access to the infrastructure for concurrent requests in a **schedule planning phase**.
+However, in the **Flat**land challenge, we focus on the re-scheduling problem during live operations.
+
+Technically, the generators have the following types:
+```python
+from typing import Any, Callable, List, Optional, Tuple
+
+from flatland.core.transition_map import GridTransitionMap
+
+RailGeneratorProduct = Tuple[GridTransitionMap, Optional[Any]]
+RailGenerator = Callable[[int, int, int, int], RailGeneratorProduct]
+
+AgentPosition = Tuple[int, int]
+ScheduleGeneratorProduct = Tuple[List[AgentPosition], List[AgentPosition], List[AgentPosition], List[float]]
+ScheduleGenerator = Callable[[GridTransitionMap, int, Optional[Any]], ScheduleGeneratorProduct]
+```
+
+We can then produce `RailGenerator`s by currying:
+```
+def sparse_rail_generator(num_cities=5, num_intersections=4, num_trainstations=2, min_node_dist=20, node_radius=2,
+                          num_neighb=3, grid_mode=False, enhance_intersection=False, seed=0):
+
+    def generator(width, height, num_agents, num_resets=0):
+    
+        # generate the grid and (optionally) some hints for the schedule_generator
+        ...
+         
+        return grid_map, {'agents_hints': {
+            'num_agents': num_agents,
+            'agent_start_targets_nodes': agent_start_targets_nodes,
+            'train_stations': train_stations
+        }}
+
+    return generator
+```
+And, similarly, `ScheduleGenerator`s:
+```
+def sparse_schedule_generator(speed_ratio_map: Mapping[float, float] = None) -> ScheduleGenerator:
+    def generator(rail: GridTransitionMap, num_agents: int, hints: Any = None):
+        # place agents:
+        # - initial position
+        # - initial direction
+        # - (initial) speed
+        # - malfunction
+        ...
+                
+        return agents_position, agents_direction, agents_target, speeds, agents_malfunction
+
+    return generator
+```
+Notice that the `rail_generator` may pass `agents_hints` to the `schedule_generator`, which the latter may interpret.
+For instance, the way the `sparse_rail_generator` generates the grid already determines the agents' start and target positions.
+Hence, `rail_generator` and `schedule_generator` have to match if the `schedule_generator` presupposes some specific `agents_hints`.
+
+The environment's `reset` takes care of applying the two generators:
+```
+    def __init__(self,
+            ...
+             rail_generator: RailGenerator = random_rail_generator(),
+             schedule_generator: ScheduleGenerator = random_schedule_generator(),
+             ...
+             ):
+        self.rail_generator: RailGenerator = rail_generator
+        self.schedule_generator: ScheduleGenerator = schedule_generator
+        
+    def reset(self, regen_rail=True, replace_agents=True):
+        rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets)
+
+        ...
+
+        if replace_agents:
+            agents_hints = None
+            if optionals and 'agents_hints' in optionals:
+                agents_hints = optionals['agents_hints']
+            self.agents_static = EnvAgentStatic.from_lists(
+                *self.schedule_generator(self.rail, self.get_num_agents(), hints=agents_hints))
+```
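The currying pattern and the hint hand-over performed in `reset()` can be exercised end to end without Flatland. The sketch below is hypothetical throughout: a nested list stands in for `GridTransitionMap`, `line_rail_generator` and `station_schedule_generator` are toy generators, the direction encoding (1=E, 3=W) is assumed, and the schedule generator returns the four lists of the `ScheduleGeneratorProduct` alias (no malfunction data).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # toy stand-in for GridTransitionMap
RailGeneratorProduct = Tuple[Grid, Optional[Dict[str, Any]]]
RailGenerator = Callable[[int, int, int, int], RailGeneratorProduct]


def line_rail_generator(row: int = 0) -> RailGenerator:
    """Hypothetical rail generator: a single straight track with a station at each end."""
    def generator(width: int, height: int, num_agents: int, num_resets: int = 0) -> RailGeneratorProduct:
        grid = [[0] * width for _ in range(height)]
        grid[row] = [1] * width  # 1 marks a rail cell in this toy encoding
        stations = [(row, 0), (row, width - 1)]
        # The rail generator already knows the stations, so it hands them on as hints.
        return grid, {'agents_hints': {'num_agents': num_agents, 'train_stations': stations}}
    return generator


def station_schedule_generator(speed: float = 1.0):
    """Hypothetical schedule generator that presupposes 'train_stations' hints."""
    def generator(rail: Grid, num_agents: int, hints: Any = None):
        if not hints or 'train_stations' not in hints:
            raise ValueError("station_schedule_generator needs 'train_stations' hints")
        west, east = hints['train_stations']
        # Even agents travel west -> east, odd agents east -> west.
        positions = [west if i % 2 == 0 else east for i in range(num_agents)]
        directions = [1 if i % 2 == 0 else 3 for i in range(num_agents)]  # assumed 1=E, 3=W
        targets = [east if i % 2 == 0 else west for i in range(num_agents)]
        speeds = [speed] * num_agents
        return positions, directions, targets, speeds
    return generator


def apply_generators(rail_generator, schedule_generator, width, height, num_agents, num_resets=0):
    """Same hint hand-over as in reset() above, outside of RailEnv."""
    rail, optionals = rail_generator(width, height, num_agents, num_resets)
    agents_hints = None
    if optionals and 'agents_hints' in optionals:
        agents_hints = optionals['agents_hints']
    return rail, schedule_generator(rail, num_agents, hints=agents_hints)


rail, (positions, directions, targets, speeds) = apply_generators(
    line_rail_generator(), station_schedule_generator(0.5), width=4, height=2, num_agents=3)
print(positions)  # [(0, 0), (0, 3), (0, 0)]
```

Pairing `station_schedule_generator` with a rail generator that returns no `train_stations` hints raises a `ValueError`, which is exactly the "generators have to match" constraint described above.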
+
+
+## RailEnv Speeds
+One of the main contributions to the complexity of railway network operations stems from the fact that all trains travel at different speeds while sharing a very limited railway network. 
+
+The different speed profiles can be generated using the `schedule_generator`, where you can choose as many different speeds as you like.
+Keep in mind that the *fastest speed* is 1 and all slower speeds must be between 1 and 0. 
+For the submission scoring you can assume that there will be no more than 5 speed profiles.
+
+
+Currently (as of **Flat**land 2.0), an agent keeps its speed over the whole episode. 
+
+Because the different speeds are implemented as fractions, the agents' ability to perform actions has been updated.
+We **do not allow actions to change within the cell**.
+This means that each agent can only choose an action to be taken when entering a cell.
+This action is then executed when a step to the next cell is valid. For example:
+
+- The agent enters a switch and chooses to deviate left. The agent's fractional speed is 1/4, and thus the agent will take 4 time steps to complete its journey through the cell. On the 4th time step the agent will leave the cell deviating left, as chosen at the entry of the cell.
+    - All actions chosen by the agent during its travels within a cell are ignored
+    - Agents can make observations at any time step. Make sure to discard observations without any information. See this [example](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/training_navigation.py) for a simple implementation.
+- The environment checks whether the agent is allowed to move to the next cell only at the time of the switch to the next cell
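The rule above can be illustrated with a toy model, assuming an agent adds its fractional speed to its progress through the cell at every step and leaves the cell once the progress reaches 1; the case where the next cell is blocked is not modelled. `simulate_cell_traversal` is a hypothetical helper, not Flatland's implementation. `Fraction` avoids floating-point drift for speeds such as 1/3.

```python
from fractions import Fraction
from typing import List, Optional


def simulate_cell_traversal(speed: Fraction, actions: List[str]) -> List[Optional[str]]:
    """Toy model: per time step, the action executed when leaving a cell, else None."""
    progress = Fraction(0)
    chosen: Optional[str] = None
    executed: List[Optional[str]] = []
    for action in actions:
        if progress == 0:
            chosen = action  # only the action given on entering the cell counts
        progress += speed
        if progress >= 1:
            executed.append(chosen)  # leave the cell with the action chosen on entry
            progress = Fraction(0)
        else:
            executed.append(None)  # actions given mid-cell are ignored
    return executed


print(simulate_cell_traversal(Fraction(1, 4), ["left", "right", "forward", "stop"]))
# [None, None, None, 'left']
```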
+
+In your controller, you can check whether an agent requires an action by checking `info`: 
+```
+obs, rew, done, info = env.step(actions) 
+...
+action_dict = dict()
+for a in range(env.get_num_agents()):
+    if info['action_required'][a]:
+        action_dict.update({a: ...})
+
+```
+Notice the following cases for `info['action_required'][a]`:
+* If the agent breaks down (see stochasticity below) on entering the cell (no distance elapsed in the cell), an action is required as long as the agent is broken down;
+  when it gets back to work, the action chosen just before will be taken and executed at the end of the cell; you may check whether the agent
+  gets healthy again in the next step by checking `info['malfunction'][a] == 1`.
+* When the agent has spent enough time in the cell, the next cell may not be free and the agent has to wait.
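The controller loop shown above can be wrapped in a small helper that only queries a policy for agents that need an action. In this sketch `build_action_dict` and `policy` are hypothetical, and `info` is a hand-written dict that only mimics the `info['action_required'][a]` indexing used above; the action value `2` is an arbitrary placeholder.

```python
from typing import Any, Callable, Dict


def build_action_dict(info: Dict[str, Dict[int, Any]], num_agents: int,
                      policy: Callable[[int], int]) -> Dict[int, int]:
    """Query the (hypothetical) policy only for agents that need an action now."""
    action_dict: Dict[int, int] = {}
    for a in range(num_agents):
        if info['action_required'][a]:
            action_dict[a] = policy(a)
    return action_dict


fake_info = {'action_required': {0: True, 1: False, 2: True}}
print(build_action_dict(fake_info, 3, policy=lambda a: 2))  # {0: 2, 2: 2}
```

Agents left out of the dictionary keep travelling through their current cell with the action chosen on entry.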
+
+
+Later versions of **Flat**land might have varying speeds during episodes.
+Therefore, we return the agents' speed: in your controller, you can get the agents' speed from the `info` returned by `step`:
+```
+obs, rew, done, info = env.step(actions) 
+...
+for a in range(env.get_num_agents()):
+    speed = info['speed'][a]
+```
+Notice that we do not guarantee that the speed will be computed at each step, but if not costly we will return it at each step.
+
+## RailEnv Malfunctioning / Stochasticity
+
+Stochastic events may happen during the episodes.
+This is very common for railway networks, where the initial plan usually needs to be rescheduled during operations, as minor events such as delayed departures from train stations, malfunctions of trains or infrastructure, or just the weather lead to delayed trains.
+
+We implemented a Poisson process to simulate delays by stopping agents at random times for random durations. The parameters necessary for the stochastic events can be provided when creating the environment.
+
+```python
+# Use the malfunction generator to break agents from time to time
+
+stochastic_data = {
+    'prop_malfunction': 0.5,  # Proportion of agents that can malfunction
+    'malfunction_rate': 30,  # Rate of malfunction occurrence
+    'min_duration': 3,  # Minimal duration of malfunction
+    'max_duration': 10  # Max duration of malfunction
+}
+```
+
+The parameters are as follows:
+
+- `prop_malfunction` is the proportion of agents that can malfunction. `1.0` means that each agent can break.
+- `malfunction_rate` is the mean rate of the Poisson process in number of environment steps.
+- `min_duration` and `max_duration` set the range of malfunction durations. They are sampled uniformly.
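To build intuition for these parameters, the sketch below samples malfunction events per agent. It is a toy model, not Flatland's implementation: it assumes `malfunction_rate` is the mean number of steps between malfunctions (exponential waiting times, i.e. a Poisson process), durations drawn uniformly from `[min_duration, max_duration]`, and that an agent cannot break again while it is broken. `sample_malfunctions` is a hypothetical helper.

```python
import random
from typing import Dict, List, Tuple


def sample_malfunctions(num_agents: int, episode_steps: int, stochastic_data: Dict[str, float],
                        seed: int = 0) -> Dict[int, List[Tuple[int, int]]]:
    """Toy model: per agent, a list of (start_step, duration) malfunction events."""
    rng = random.Random(seed)
    events: Dict[int, List[Tuple[int, int]]] = {}
    for agent in range(num_agents):
        events[agent] = []
        # Only a proportion of the agents can break down at all.
        if rng.random() >= stochastic_data['prop_malfunction']:
            continue
        t = 0.0
        while True:
            # Poisson process: exponential waiting times with mean `malfunction_rate` steps (assumed).
            t += rng.expovariate(1.0 / stochastic_data['malfunction_rate'])
            if t >= episode_steps:
                break
            duration = rng.randint(int(stochastic_data['min_duration']),
                                   int(stochastic_data['max_duration']))  # inclusive bounds
            events[agent].append((int(t), duration))
            t += duration  # a broken agent cannot break again until it is repaired
    return events


data = {'prop_malfunction': 1.0, 'malfunction_rate': 30, 'min_duration': 3, 'max_duration': 10}
print(sample_malfunctions(num_agents=2, episode_steps=100, stochastic_data=data, seed=1))
```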
+
+You can introduce stochasticity by simply creating the env as follows:
+
+```
+env = RailEnv(
+    ...
+    stochastic_data=stochastic_data,  # Malfunction data generator
+    ...    
+)
+```
+In your controller, you can check whether an agent is malfunctioning: 
+```python
+from flatland.envs.observations import TreeObsForRailEnv
+from flatland.envs.predictions import ShortestPathPredictorForRailEnv
+from flatland.envs.rail_env import RailEnv
+from flatland.envs.rail_generators import sparse_rail_generator
+from flatland.envs.schedule_generators import sparse_schedule_generator
+
+# In the controller: only choose actions for agents that are not malfunctioning
+obs, rew, done, info = env.step(actions)
+...
+action_dict = dict()
+for a in range(env.get_num_agents()):
+    if info['malfunction'][a] == 0:
+        action_dict.update({a: ...})
+
+# Creating the environment: custom observation builder
+tree_observation = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())
+
+# Different agent types (trains) with different speeds, mapped to the share of agents of each type.
+speed_ratio_map = {1.: 0.25,  # Fast passenger train
+                   1. / 2.: 0.25,  # Fast freight train
+                   1. / 3.: 0.25,  # Slow commuter train
+                   1. / 4.: 0.25}  # Slow freight train
+
+env = RailEnv(width=50,
+              height=50,
+              rail_generator=sparse_rail_generator(num_cities=20,  # Number of cities in map (where train stations are)
+                                                   num_intersections=5,  # Number of intersections (no start / target)
+                                                   num_trainstations=15,  # Number of possible start/targets on map
+                                                   min_node_dist=3,  # Minimal distance of nodes
+                                                   node_radius=2,  # Proximity of stations to city center
+                                                   num_neighb=4,  # Number of connections to other cities/intersections
+                                                   seed=15,  # Random seed
+                                                   grid_mode=True,
+                                                   enhance_intersection=True
+                                                   ),
+              schedule_generator=sparse_schedule_generator(speed_ratio_map),
+              number_of_agents=10,
+              stochastic_data=stochastic_data,  # Malfunction data generator
+              obs_builder_object=tree_observation)
+```
+
+
+## Observation Builders
+Every `RailEnv` has an `obs_builder`. The `obs_builder` has full access to the `RailEnv`. 
+The `obs_builder` is called in the `step()` function to produce the observations.
+
+```python
+env = RailEnv(
+    ...
+    obs_builder_object=TreeObsForRailEnv(
+        max_depth=2,
+        predictor=ShortestPathPredictorForRailEnv(max_depth=10)
+    ),
+    ...
+)
+```
+
+The two principal observation builders provided are global and tree.
+
+### Global Observation Builder
+`GlobalObsForRailEnv` gives a global observation of the entire rail environment:
+* A transition map array with dimensions (env.height, env.width, 16),
+  assuming a 16-bit encoding of transitions.
+
+* A 3D array (map_height, map_width, 2) whose two channels contain, respectively, the position of the given agent's
+  target and the positions of the other agents' targets.
+
+* A 3D array (map_height, map_width, 4) with
+    - the first channel containing the agent's position and direction
+    - the second channel containing the other agents' positions and directions
+    - the third channel containing agent malfunctions
+    - the fourth channel containing agent fractional speeds
+
+### Tree Observation Builder
+`TreeObsForRailEnv` computes the current observation for each agent.
+
+The observation vector is composed of 4 sequential parts, corresponding to data from the up to 4 possible
+movements in a `RailEnv` ("up to" because only a subset of the possible transitions is allowed in a `RailEnv`).
+The possible movements are sorted relative to the current orientation of the agent, rather than NESW as for
+the transitions. The order is:
+
+    [data from 'left'] + [data from 'forward'] + [data from 'right'] + [data from 'back']
+
+Each branch data is organized as:
+
+    [root node information] +
+    [recursive branch data from 'left'] +
+    [... from 'forward'] +
+    [... from 'right'] +
+    [... from 'back']
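A sketch of the depth-first flattening order described above: node information first, then the recursively flattened 'left', 'forward', 'right' and 'back' branches, with missing branches padded with `-inf`. `Node`, `flatten` and `expected_length` are hypothetical helpers for illustration, not the code of `TreeObsForRailEnv`; padding the tree to a fixed depth is an assumption, and it gives a vector of `num_features * (1 + 4 + ... + 4**max_depth)` values.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ORDER = ("left", "forward", "right", "back")  # relative to the agent's orientation


@dataclass
class Node:
    features: List[float]
    children: Dict[str, "Node"] = field(default_factory=dict)  # keys from ORDER; missing = no branch


def flatten(node: Optional[Node], depth: int, max_depth: int, num_features: int) -> List[float]:
    """Depth-first: [node info] + [left subtree] + [forward] + [right] + [back]."""
    out = list(node.features) if node is not None else [-math.inf] * num_features  # padding node
    if depth == max_depth:
        return out
    for key in ORDER:
        child = node.children.get(key) if node is not None else None
        out += flatten(child, depth + 1, max_depth, num_features)
    return out


def expected_length(num_features: int, max_depth: int) -> int:
    return num_features * sum(4 ** i for i in range(max_depth + 1))


root = Node([0.0, 0.0], {"forward": Node([1.0, 2.0])})
vector = flatten(root, depth=0, max_depth=1, num_features=2)
print(vector)  # [0.0, 0.0, -inf, -inf, 1.0, 2.0, -inf, -inf, -inf, -inf]
```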
+
+Each node information is composed of 11 features:
+
+1. If the agent's own target lies on the explored branch, the current distance from the agent in number of cells is stored.
+
+2. If another agent's target is detected, the distance in number of cells from the agent's current location is stored.
+
+3. If another agent is detected, the distance in number of cells from the current agent position is stored.
+
+4. Possible conflict detected:
+    - tot_dist: another agent predicts to pass along this cell at the same time as the agent; we store the distance in number of cells from the current agent position
+    - 0: no other agent reserves the same cell at a similar time
+
+5. If a switch that is not usable by the agent is detected, we store the distance.
+
+6. The distance in number of cells to the next branching (current node).
+
+7. Minimum distance from the node to the agent's target, given the direction of the agent, if this path is chosen.
+
+8. Agents in the same direction:
+    - n: number of agents present in the same direction
+      (possible future use: number of other agents in the same direction in this branch)
+    - 0: no agent present in the same direction
+
+9. Agents in the opposite direction:
+    - n: number of agents present in a direction other than the agent's own (so a conflict)
+      (possible future use: number of other agents in other directions in this branch, i.e. number of conflicts)
+    - 0: no agent present in a direction other than the agent's own
+
+10. Malfunctioning/blocking agents:
+    - n: number of time steps the observed agent remains blocked
+
+11. Slowest observed speed of an agent in the same direction:
+    - 1 if no agent is observed
+    - min_fractional speed otherwise
+
+Missing/padding nodes are filled in with -inf (truncated).
+Missing values in present node are filled in with +inf (truncated).
+
+
+In case of the root node, the values are [0, 0, 0, 0, distance from agent to target, own malfunction, own speed]
+In case the target node is reached, the values are [0, 0, 0, 0, 0].
+
+
+## [Rendering](rendering)
+## [Railway](railway)
--- a/docs/specifications/visualization.md
+++ b/docs/specifications/visualization.md
+# Visualization
+
+![logo](https://drive.google.com/uc?export=view&id=1rstqMPJXFJd9iD46z1A5Rus-W0Ww6O8i)
+
+
+# Introduction & Scope
+
+Broad requirements for human-viewable display of a single Flatland Environment.
+
+
+## Context
+
+This section places this software component, which we call the "Renderer", in relation to some of the other components. Multiple agents interact with a single Environment. The Renderer interacts with the environment and displays it on screen and/or writes it into movie or image files.
+
+
+
+
+
+
+
+# Requirements
+
+
+## Primary Requirements
+
+
+
+1. Visualize or render the state of the environment
+    1. Read an Environment + Agent Snapshot provided by the Environment component
+    2. Display onto a local screen in real time (or near real time)
+    3. Include all the agents
+    4. Illustrate the agent observations (typically subsets of the grid / world)
+    5. 2D rendering only
+2. Output the visualisation into movie / image files for use in later animation
+3. Should not impose control-flow constraints on the Environment
+    1. Should not force the env to respond to events
+    2. Should not drive the "main loop" of inference or training
+
+
+## Secondary / Optional Requirements 
+
+
+
+1. During training (possibly across multiple processes or machines / OS instances), display a single training environment:
+    1. Without holding up the other environments in the training
+    2. Including training environments that are remote from the display machine (e.g. on GCP / AWS)
+    3. Attaching to / detaching from a running environment / training cluster without restarting training
+2. Provide a switch to make use of graphics / artwork provided by a graphic artist
+    1. Fast / compact mode for general use
+    2. Beauty mode for publicity / demonstrations
+3. Provide a switch between smooth / continuous animation of an agent (slower) and jumping from cell to cell (faster)
+    1. Smooth / continuous translation between cells
+    2. Smooth / continuous rotation
+4. Speed: ideally capable of 60 fps (see Performance Metrics)
+5. Window view: only render part of the environment, or a single agent and the agents nearby
+    1. It may not be feasible to render very large environments
+    2. Possibly more than one window, i.e. one for each selected agent
+    3. Window(s) can be tied to agents, i.e. they move around with the agent, and optionally rotate with the agent
+6. Interactive scaling
+    1. E.g. wide view, narrow / enlarged view
+    2. E.g. with mouse scrolling & zooming
+7. Minimize the skill set required of participants
+    1. Python API to the GUI toolkit, no need for C/C++
+8. View on various media:
+    1. Linux & Windows local display
+    2. Browser
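+
+The window view (requirement 5) could be implemented by rendering only a rectangular slice of the grid around a selected agent. The sketch below is an illustration only: it assumes (row, column) grid indexing and a fixed window size, and `agent_window` is a hypothetical helper, not part of Flatland. Rotation with the agent is not covered.
+
+```python
+from typing import Tuple
+
+
+def agent_window(agent_pos: Tuple[int, int],
+                 grid_shape: Tuple[int, int],
+                 window_shape: Tuple[int, int]) -> Tuple[int, int, int, int]:
+    """Return (row_start, row_end, col_start, col_end) of a window centred on
+    the agent and shifted so that it stays inside the grid (ends exclusive)."""
+    rows, cols = grid_shape
+    # The window can never be larger than the grid itself.
+    win_h = min(window_shape[0], rows)
+    win_w = min(window_shape[1], cols)
+    r, c = agent_pos
+    # Centre on the agent, then clamp so the window does not leave the grid.
+    row_start = min(max(r - win_h // 2, 0), rows - win_h)
+    col_start = min(max(c - win_w // 2, 0), cols - win_w)
+    return row_start, row_start + win_h, col_start, col_start + win_w
+```
+
+Calling it once per rendered frame with the agent's current position makes the window follow the agent; near the border the window stops at the edge instead of showing cells outside the world.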
+
+
+## Performance Metrics
+
+Here are some performance metrics which the Renderer should meet.
+
+
+<table>
+  <tr>
+   <th>Operation</th>
+   <th style="text-align: right"># per second</th>
+   <th style="text-align: right">Target time (ms)</th>
+   <th style="text-align: right">Prototype time (ms)</th>
+  </tr>
+  <tr>
+   <td>Write an agent update (i.e. env as client providing an agent update)</td>
+   <td></td>
+   <td style="text-align: right">0.1</td>
+   <td></td>
+  </tr>
+  <tr>
+   <td>Draw an environment window 20 x 20</td>
+   <td style="text-align: right">60</td>
+   <td style="text-align: right">16</td>
+   <td></td>
+  </tr>
+  <tr>
+   <td>Draw an environment window 50 x 50</td>
+   <td style="text-align: right">10</td>
+   <td></td>
+   <td></td>
+  </tr>
+  <tr>
+   <td>Draw an agent update on an existing environment window (5 agents visible)</td>
+   <td></td>
+   <td style="text-align: right">1</td>
+   <td></td>
+  </tr>
+</table>
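+
+The "per second" and "target time" figures are related by budget = 1000 ms / rate. For example, 60 draws per second leave about 16.7 ms per draw, which the table rounds to 16 ms. The following is a minimal sketch of checking a draw routine against such a budget; `draw_fn` stands for any hypothetical rendering callable, and the measurement is plain wall-clock timing, not a rigorous benchmark.
+
+```python
+import time
+from typing import Callable
+
+
+def frame_budget_ms(rate_per_second: float) -> float:
+    """Time available for one operation that must run rate_per_second times per second."""
+    return 1000.0 / rate_per_second
+
+
+def mean_time_ms(draw_fn: Callable[[], None], repeats: int = 100) -> float:
+    """Average wall-clock time of draw_fn in milliseconds over `repeats` calls."""
+    start = time.perf_counter()
+    for _ in range(repeats):
+        draw_fn()
+    return (time.perf_counter() - start) * 1000.0 / repeats
+
+
+def meets_rate(draw_fn: Callable[[], None], rate_per_second: float,
+               repeats: int = 100) -> bool:
+    """True if draw_fn is on average fast enough to sustain the given rate."""
+    return mean_time_ms(draw_fn, repeats) <= frame_budget_ms(rate_per_second)
+```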
+
+
+
+## Example Visualization
+
+
+# Reference Documents
+
+Link to this doc: [https://docs.google.com/document/d/1Y4Mw0Q6r8PEOvuOZMbxQX-pV2QKDuwbZJBvn18mo9UU/edit#](https://docs.google.com/document/d/1Y4Mw0Q6r8PEOvuOZMbxQX-pV2QKDuwbZJBvn18mo9UU/edit#)
+
+
+## Core Specification
+
+This specifies the system containing the environment and agents, which will be able to run independently of the renderer.
+
+[https://docs.google.com/document/d/1RN162b8wSfYTBblrdE6-Wi_zSgQTvVm6ZYghWWKn5t8/edit](https://docs.google.com/document/d/1RN162b8wSfYTBblrdE6-Wi_zSgQTvVm6ZYghWWKn5t8/edit)
+
+The data structure that the renderer needs to read is initially defined in that document.
+
+
+## Visualization Specification
+
+This will specify the software which will meet the requirements documented here.
+
+[https://docs.google.com/document/d/1XYOe_aUIpl1h_RdHnreACvevwNHAZWT0XHDL0HsfzRY/edit#](https://docs.google.com/document/d/1XYOe_aUIpl1h_RdHnreACvevwNHAZWT0XHDL0HsfzRY/edit#)
+
+
+## Interface Specification
+
+This will specify the interfaces through which the different components communicate.
+
+
+# Non-requirements (to be deleted below here)
+
+The content below has been copied into the spec doc, where comments may be lost. I'm only preserving it here for a few days to keep the comments, because they don't survive a cut & paste into the other doc.
+
+
+## Interface with Environment Component
+
+
+
+*   Environment produces the Env Snapshot data structure (TBD)
+*   Renderer reads the Env Snapshot
+*   Connection between Env and Renderer, either:
+    *   Environment "invokes" the renderer in-process
+    *   Renderer "connects" to the environment
+        *   E.g. the Env acts as a server and the Renderer as a client
+*   Either:
+    *   The Env sends a Snapshot to the renderer and waits for rendering
+*   Or:
+    *   The Env puts snapshots into a rendering queue
+    *   The renderer blocks / waits on the queue, waiting for a new snapshot to arrive
+        *   If several snapshots are waiting, delete and skip them and just render the most recent
+        *   Delete the snapshot after rendering
+*   Optionally
+    *   Render every frame / time step
+    *   Or, render frames without blocking environment
+        *   Render frames in separate process / thread
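+
+The queue-based option (render only the most recent snapshot, without blocking the environment) could be sketched with the standard library as follows. This is an illustration under assumptions: snapshots are arbitrary Python objects, the renderer runs in a thread of the same process, and `render` is a placeholder for the real drawing call.
+
+```python
+import queue
+import threading
+from typing import Any, Callable, Optional
+
+
+def latest_snapshot(q: "queue.Queue[Any]", timeout: Optional[float] = None) -> Any:
+    """Block until at least one snapshot is queued, then drain the queue and
+    return only the most recent one; older snapshots are discarded."""
+    snapshot = q.get(timeout=timeout)  # wait for the first snapshot
+    while True:
+        try:
+            snapshot = q.get_nowait()  # skip stale snapshots
+        except queue.Empty:
+            return snapshot
+
+
+def renderer_loop(q: "queue.Queue[Any]", render: Callable[[Any], None],
+                  stop: threading.Event) -> None:
+    """Render snapshots until `stop` is set. The environment only ever calls
+    q.put_nowait(snapshot), so it never waits for rendering."""
+    while not stop.is_set():
+        try:
+            snapshot = latest_snapshot(q, timeout=0.1)
+        except queue.Empty:
+            continue  # nothing new yet; check the stop flag again
+        render(snapshot)
+```
+
+Running `renderer_loop` in a `threading.Thread` gives the "render frames in a separate thread" option; a separate process would need a `multiprocessing.Queue` instead, with the same draining logic.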
+
+
+### Environment Snapshot
+
+**Data Structure**
+
+A definition of the data structure is to be provided in the Core Specification.
+
+It is a requirement of the Renderer component that it can read this data structure.
+
+**Example only**
+
+Top-level dictionary
+
+
+
+*   World nd-array
+    *   Each element represents available transitions in a cell
+*   List of agents
+    *   Agent location, orientation, movement (forward / stop / turn?)
+    *   Observation
+        *   Rectangular observation
+            *   Maybe just dimensions: width + height (i.e. no need for contents)
+            *   Can be highlighted in display as per minigrid
+        *   Tree-based observation
+            *   TBD
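+
+For illustration only, the example structure above might look like this as a Python dictionary. All key names and the orientation encoding are hypothetical (the real structure is to be defined in the Core Specification), and a nested list stands in for the nd-array so the sketch needs only the standard library.
+
+```python
+from typing import Any, Dict, List
+
+
+def example_snapshot() -> Dict[str, Any]:
+    """Build a tiny hypothetical Env Snapshot: a 3x3 world and one agent."""
+    # One integer per cell encoding its available transitions (0 = no track).
+    # A numpy nd-array would normally be used here.
+    world: List[List[int]] = [
+        [0, 0, 0],
+        [1, 1, 1],
+        [0, 0, 0],
+    ]
+    agents: List[Dict[str, Any]] = [
+        {
+            "position": (1, 0),     # (row, column) of the agent's cell
+            "orientation": 1,       # assumed encoding: 0=N, 1=E, 2=S, 3=W
+            "movement": "forward",  # forward / stop / turn, as listed above
+            # Rectangular observation: only the dimensions are needed for display.
+            "observation": {"type": "rectangular", "width": 3, "height": 3},
+        }
+    ]
+    return {"world": world, "agents": agents}
+
+
+def check_snapshot(snapshot: Dict[str, Any]) -> None:
+    """Minimal sanity check a renderer might run before drawing a snapshot."""
+    world = snapshot["world"]
+    height, width = len(world), len(world[0])
+    for agent in snapshot["agents"]:
+        r, c = agent["position"]
+        if not (0 <= r < height and 0 <= c < width):
+            raise ValueError(f"agent outside world: {agent['position']}")
+```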
+
+
+## Investigation into Existing Tools / Libraries
+
+
+
+1. Pygame
+    1. Very easy to use; adding sprites etc. is dead simple ([https://studywolf.wordpress.com/2015/03/06/arm-visualization-with-pygame/](https://studywolf.wordpress.com/2015/03/06/arm-visualization-with-pygame/))
+    2. No built-in support for threads/processes. Does get faster when using PyPy/Psyco.
+2. PyQt
+    1. Somewhat simple, though a little more verbose to use the different modules.
+    2. Multi-threaded via QThread, which doesn't block the main thread that does the real work ([https://nikolak.com/pyqt-threading-tutorial/](https://nikolak.com/pyqt-threading-tutorial/))
+
+**How to structure the code**
+
+
+
+1. Define draw functions/classes for each primitive
+    1. Primitives: agents (trains), railroad, grass, houses etc.
+2. Background: initialize the background before starting the episode.
+    1. For static objects in the scene, draw those primitives once and cache the result.
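+
+A minimal sketch of this structure, rendering to a text grid instead of a real graphics toolkit so that it runs with the standard library only. The primitive names and characters are hypothetical; with Pygame or PyQt each draw function would paint a sprite instead of writing a character. Trains are dynamic and are therefore not part of the cached background.
+
+```python
+from functools import lru_cache
+from typing import Callable, Dict, List, Tuple
+
+Canvas = List[List[str]]
+Layout = Tuple[Tuple[str, ...], ...]  # static primitive name for every cell
+
+
+# One draw function per static primitive, each painting a single cell.
+def draw_grass(canvas: Canvas, r: int, c: int) -> None:
+    canvas[r][c] = "."
+
+
+def draw_rail(canvas: Canvas, r: int, c: int) -> None:
+    canvas[r][c] = "="
+
+
+def draw_house(canvas: Canvas, r: int, c: int) -> None:
+    canvas[r][c] = "H"
+
+
+STATIC_PRIMITIVES: Dict[str, Callable[[Canvas, int, int], None]] = {
+    "grass": draw_grass,
+    "rail": draw_rail,
+    "house": draw_house,
+}
+
+
+@lru_cache(maxsize=8)
+def background(layout: Layout) -> Tuple[str, ...]:
+    """Draw all static primitives once; repeated calls with the same layout
+    return the cached result without redrawing."""
+    canvas: Canvas = [[" "] * len(row) for row in layout]
+    for r, row in enumerate(layout):
+        for c, name in enumerate(row):
+            STATIC_PRIMITIVES[name](canvas, r, c)
+    # Return an immutable copy so callers cannot modify the cached image.
+    return tuple("".join(row) for row in canvas)
+```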
+
+**Proposed Interfaces**
+
+To be filled in.
+
+
+## Technical Graphics Considerations
+
+
+### Overlay dynamic primitives over the background at each time step
+
+There is no point trying to figure out what changed: every primitive needs to be drawn explicitly anyway (that's how these renderers work).
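+
+Under this approach each frame is simply a fresh copy of the cached background with every dynamic primitive drawn on top, with no attempt to track what changed since the previous step. A minimal text-based sketch, assuming agents are given as hypothetical (row, column, symbol) tuples and the background is a sequence of strings:
+
+```python
+from typing import Iterable, List, Sequence, Tuple
+
+Agent = Tuple[int, int, str]  # (row, column, symbol): hypothetical agent format
+
+
+def render_frame(background: Sequence[str], agents: Iterable[Agent]) -> List[str]:
+    """Return one frame: the cached background with all agents overlaid.
+
+    The background itself is never modified, so it can be reused every step.
+    """
+    frame = [list(row) for row in background]  # fresh copy each time step
+    for r, c, symbol in agents:
+        frame[r][c] = symbol  # redraw every dynamic primitive, changed or not
+    return ["".join(row) for row in frame]
+```
+
+Because the previous frame is discarded, an agent that moved leaves no trail behind, which is what makes change tracking unnecessary.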
--- a/docs/flatland_2.0.md
+++ b/docs/flatland_2.0.md
-# Flatland 2.0 Introduction
+Flatland 2.0 Introduction
+=========================

 ## What's new?


--- a/requirements_continuous_integration.txt
+++ b/requirements_continuous_integration.txt
@@ -6,6 +6,7 @@ benchmarker>=4.0.1
 coverage>=4.5.1
 Sphinx>=1.8.1
 sphinx-rtd-theme>=0.4.3
+docutils>=0.15.2
 flake8>=3.7.7
 flake8-eradicate>=0.2.0
 twine>=1.12.1
@@ -15,3 +16,4 @@ jupyter-core>=4.5.0
 notebook>=5.7.8
 pytest-xvfb>=1.2.0
 git+https://github.com/who8mylunch/Jupyter_Canvas_Widget.git@bd151ae1509c50b5809944dd3294f58b7b069c86
+recommonmark>=0.6.0