
adc770e6
--- a/docs/flatland.core.grid.rst
+++ b/docs/flatland.core.grid.rst
-flatland.core.grid package
-==========================
-
-Submodules
----------
-
-flatland.core.grid.grid4 module
-------------------------------
-
-.. automodule:: flatland.core.grid.grid4
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid4\_astar module
--------------------------------------
-
-.. automodule:: flatland.core.grid.grid4_astar
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid4\_utils module
--------------------------------------
-
-.. automodule:: flatland.core.grid.grid4_utils
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid8 module
-------------------------------
-
-.. automodule:: flatland.core.grid.grid8
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.grid\_utils module
-------------------------------------
-
-.. automodule:: flatland.core.grid.grid_utils
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.grid.rail\_env\_grid module
-----------------------------------------
-
-.. automodule:: flatland.core.grid.rail_env_grid
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.core.grid
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.core.rst
+++ b/docs/flatland.core.rst
-flatland.core package
-=====================
-
-Submodules
----------
-
-flatland.core.env module
------------------------
-
-.. automodule:: flatland.core.env
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.core.transitions module
--------------------------------
-
-.. automodule:: flatland.core.transitions
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.core
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.envs.rst
+++ b/docs/flatland.envs.rst
-flatland.envs package
-=====================
-
-Submodules
----------
-
-flatland.envs.rail\_env module
------------------------------
-
-.. automodule:: flatland.envs.rail_env
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.envs
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.evaluators.rst
+++ b/docs/flatland.evaluators.rst
-flatland.evaluators package
-===========================
-
-Submodules
----------
-
-flatland.evaluators.aicrowd\_helpers module
-------------------------------------------
-
-.. automodule:: flatland.evaluators.aicrowd_helpers
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.evaluators.client module
---------------------------------
-
-.. automodule:: flatland.evaluators.client
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.evaluators.messages module
-----------------------------------
-
-.. automodule:: flatland.evaluators.messages
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-flatland.evaluators.service module
----------------------------------
-
-.. automodule:: flatland.evaluators.service
-    :members:
-    :undoc-members:
-    :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland.evaluators
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland.rst
+++ b/docs/flatland.rst
-flatland package
-================
-
-Subpackages
-----------
-
-.. toctree::
-
-   flatland.core
-   flatland.envs
-   flatland.evaluators
-   flatland.utils
-
-Submodules
----------
-
-flatland.cli module
-------------------
-
-.. automodule:: flatland.cli
-   :members:
-   :undoc-members:
-   :show-inheritance:
-
-
-Module contents
---------------
-
-.. automodule:: flatland
-   :members:
-   :undoc-members:
-   :show-inheritance:
--- a/docs/flatland.utils.rst
+++ b/docs/flatland.utils.rst
-flatland.utils package
-======================
-
-Module contents
---------------
-
-.. automodule:: flatland.utils
-    :members:
-    :undoc-members:
-    :show-inheritance:
--- a/docs/flatland_2.0.md
+++ b/docs/flatland_2.0.md
-# Flatland 2.0 Introduction (Beta)
-
-Curious to see what's coming in **Flat**land 2.0? Have a look at the current development, report bugs and give us feedback on the environment.
-
-**WARNING**: Flatland 2.0 Beta is under active development and is neither stable nor final. We would, however, like you to play with the code and help us build the best possible environment for multi-agent control problems.
-
-## What's new
-
-In this version of **Flat**land we are moving closer to realistic and more complex railway problems. Earlier versions of **Flat**land, which introduced you to the concept of restricted transitions, were still too simplified to give us feasible solutions for daily operations. Thus the following changes are coming in the next version to bring it closer to real railway network challenges:
-
- **New Level Generator** with fewer connections between different nodes in the network and thus much higher agent densities on rails.
- **Stochastic Events** that cause agents to stop and get stuck for varying numbers of time steps.
- **Different Speed Classes** allow agents to move at different speeds and thus enhance complexity in the search for optimal solutions.
-
-Below we explain these changes in more detail and how you can play with their parametrization. We appreciate *your feedback* on the performance and the difficulty of these levels to help us shape the best possible **Flat**land 2.0 environment.
-
-## Get the new level generators
-Since this version is still in the *beta* phase, you can only install it through the GitLab repository. Once you have downloaded the [Flatland Repository](https://gitlab.aicrowd.com/flatland/flatland), switch to the [147_new_level_generator](https://gitlab.aicrowd.com/flatland/flatland/tree/147_new_level_generator) branch to access the latest changes in **Flat**land.
-
-Once you have switched to this branch, install **Flat**land by running `python setup.py install`.
-
-## Generate levels
-
-We are currently working on several new level generators. You can expect that the levels used in submission testing will not all come from a single generator but from several, to make sure that controllers can handle any railway-specific challenge.
-
-For this early **beta** testing we suggest you have a look at the `sparse_rail_generator` and `realistic_rail_generator`.
-
-### Sparse Rail Generator
-![Example_Sparse](https://i.imgur.com/DP8sIyx.png)
-
-The idea behind the sparse rail generator is to mimic classic railway structures where dense nodes (cities) are sparsely connected to each other and where you have to manage the traffic flow between the nodes efficiently. The cities in this level generator are much simplified in comparison to real city networks, but it mimics parts of the problems faced in the daily operations of any railway company.
-
-There are a few parameters you can tune to build your own map and test levels of varying complexity. **Warning**: some combinations of parameters do not go well together and will lead to infeasible level generation. In the worst case, the level generator currently issues a warning when it cannot build the environment according to the parameters provided. This will lead to a crash of the whole env. We are currently working on improvements here and are **happy to receive any suggestions from your side**.
-
-To build an environment, instantiate a `RailEnv` as follows:
-
-```python
-# Initialize the generator
-rail_generator = sparse_rail_generator(num_cities=10,         # Number of cities in map
-                                       num_intersections=10,  # Number of intersections in map
-                                       num_trainstations=50,  # Number of possible start/targets on map
-                                       min_node_dist=6,       # Minimal distance of nodes
-                                       node_radius=3,         # Proximity of stations to city center
-                                       num_neighb=3,          # Number of connections to other cities
-                                       seed=5,                # Random seed
-                                       realistic_mode=True    # Ordered distribution of nodes
-                                       )
-
-# Build the environment
-env = RailEnv(width=50,
-              height=50,
-              rail_generator=rail_generator,
-              number_of_agents=10,
-              obs_builder_object=TreeObsForRailEnv(max_depth=3, predictor=ShortestPathPredictorForRailEnv())
-              )
-```
-
-You can tune the following parameters:
-
- `num_cities` is the number of cities on a map. Cities are the only nodes that can host start and end points for agent tasks (train stations). Be careful that the number is not too high, as all the cities have to fit on the map. When `realistic_mode=False`, be careful when choosing `min_node_dist`, because level generation will fail if not all cities (and intersections) can be placed with at least `min_node_dist` between them.
- `num_intersections` is the number of nodes that don't hold any train stations. Cities connect to them with first priority. We use these to allow for sparse connections between cities.
- `num_trainstations` defines the *total* number of train stations in the network. This also sets the maximum number of agents allowed in the environment. This is also a delicate parameter: there is only a limited amount of space available around nodes, so if the number is too high, level generation will fail. *Important*: only as many train stations as there are agents in the environment will actually be active. The others will just be present as dead-ends (see figures below).
- `min_node_dist` is only used if `realistic_mode=False` and represents the minimal distance between two nodes.
- `node_radius` defines the extent of a city. Each train station is placed at a distance from the closest city node that is smaller than or equal to this number.
- `num_neighb` defines the number of neighbouring nodes that connect to each other. This changes the connectivity and thus the number of alternative routes in the network.
- `seed` is used to initialize the random generator.
- `realistic_mode` currently only changes how the nodes are distributed. If it is set to `True`, the nodes are evenly spread out and cities and intersections are placed between each other.
-
-If you run into any bugs with sets of parameters please let us know.
-
-Here is a network with `realistic_mode=False` and the parameters from above.
-
-![sparse_random](https://i.imgur.com/Xg7nifF.png)
-
-and here with `realistic_mode=True`
-
-![sparse_ordered](https://i.imgur.com/jyA7Pt4.png)
-
-## Add Stochasticity
-
-Another way we improve **Flat**land 2.0 is by adding stochastic events during episodes. Such events are very common in railway networks, where the initial plan usually needs to be rescheduled during operations because minor events such as delayed departures from train stations, malfunctions of trains or infrastructure, or simply the weather lead to delayed trains.
-
-We implemented a Poisson process to simulate delays by stopping agents at random times for random durations. The parameters necessary for the stochastic events can be provided when creating the environment.
-
-```python
-# Use the malfunction generator to break agents from time to time
-stochastic_data = {'prop_malfunction': 0.5,  # Proportion of defective agents
-                   'malfunction_rate': 30,  # Rate of malfunction occurrence
-                   'min_duration': 3,  # Minimal duration of malfunction
-                   'max_duration': 10  # Max duration of malfunction
-                   }
-
-```
-
-The parameters are as follows:
-
- `prop_malfunction` is the proportion of agents that can malfunction. `1.0` means that each agent can break.
- `malfunction_rate` is the mean rate of the Poisson process in number of environment steps.
- `min_duration` and `max_duration` set the range of malfunction durations. They are sampled uniformly.
-
-You can introduce stochasticity by simply creating the env as follows:
-
-```python
-# Use the malfunction generator to break agents from time to time
-stochastic_data = {'prop_malfunction': 0.5,  # Proportion of defective agents
-                   'malfunction_rate': 30,  # Rate of malfunction occurrence
-                   'min_duration': 3,  # Minimal duration of malfunction
-                   'max_duration': 10  # Max duration of malfunction
-                   }
-
-# Use your own observation builder
-TreeObservation = TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv())
-
-env = RailEnv(width=10,
-              height=10,
-              rail_generator=sparse_rail_generator(num_cities=3,  # Number of cities in map (where train stations are)
-                                                   num_intersections=1,  # Number of intersections (no start / target)
-                                                   num_trainstations=8,  # Number of possible start/targets on map
-                                                   min_node_dist=3,  # Minimal distance of nodes
-                                                   node_radius=2,  # Proximity of stations to city center
-                                                   num_neighb=2,  # Number of connections to other cities/intersections
-                                                   seed=15,  # Random seed
-                                                   ),
-              number_of_agents=5,
-              stochastic_data=stochastic_data,  # Malfunction generator data
-              obs_builder_object=TreeObservation)
-```
-
-You will quickly realize that this will lead to unforeseen difficulties, which means that **your controller** needs to observe the environment at all times to be able to react to the stochastic events.
-
-## Add different speed profiles
-
-One of the main contributions to the complexity of railway network operations stems from the fact that trains travel at different speeds while sharing a very limited railway network. In **Flat**land 2.0 this feature will be enabled as well and will lead to much more complex configurations. This is still in early *beta*, and even though the stock observation builders and predictors support these changes, we have not yet fully tested them. Here we count on your support :).
-
-Currently you have to initialize the speed profiles manually after the environment has been reset (*Attention*: this is currently being worked on and will change soon). For agents to have different speed profiles, include this after your `env.reset()` call:
-
-```python
-# Reset environment and get initial observations for all agents
-obs = env.reset()
-for idx in range(env.get_num_agents()):
-    tmp_agent = env.agents[idx]
-    speed = (idx % 4) + 1
-    tmp_agent.speed_data["speed"] = 1 / speed
-```
-
-You can choose as many different speeds as you like. Keep in mind that the *fastest speed* is 1 and all slower speeds must be between 0 and 1. For the submission scoring you can assume that there will be no more than 5 speed profiles.
-
-## Example code
-
-To see all the changes in action, you can run the `flatland_2_0_example.py` file in the examples folder. The file can be found [here](https://gitlab.aicrowd.com/flatland/flatland/blob/147_new_level_generator/examples/flatland_2_0_example.py).
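The malfunction parameters and the fractional speeds described in the Flatland 2.0 notes above can be explored with a small standalone simulation. This is a sketch that does not use Flatland, under stated assumptions: a breakable agent's time to its next failure (counted from the end of its previous repair) is exponentially distributed with mean `malfunction_rate` steps, which is the inter-arrival law of a Poisson process; durations are drawn uniformly from `[min_duration, max_duration]`; and an agent with speed `1/k` needs `k` steps per cell. The helpers `sample_malfunctions` and `steps_to_traverse` are hypothetical.

```python
import random
from fractions import Fraction


def sample_malfunctions(num_agents, prop_malfunction, malfunction_rate,
                        min_duration, max_duration, num_steps, seed=0):
    """Toy malfunction schedule: {agent: [(start_step, duration), ...]}.

    Assumptions (not Flatland's implementation): each agent is breakable
    with probability prop_malfunction; the time from the end of one repair
    to the next failure is exponential with mean malfunction_rate steps;
    each malfunction lasts a duration drawn uniformly from
    [min_duration, max_duration].
    """
    rng = random.Random(seed)
    schedule = {}
    for agent in range(num_agents):
        events = []
        if rng.random() < prop_malfunction:
            t = 0.0
            while True:
                # Exponential waiting time, as between Poisson-process events.
                t += rng.expovariate(1.0 / malfunction_rate)
                if t >= num_steps:
                    break
                duration = rng.randint(min_duration, max_duration)
                events.append((int(t), duration))
                # A broken agent cannot break again until it is repaired.
                t += duration
        schedule[agent] = events
    return schedule


def steps_to_traverse(speed, num_cells):
    """Steps needed to cross num_cells cells at a fractional speed in (0, 1].

    Assumption: the agent accumulates `speed` progress per step and enters
    the next cell whenever its progress reaches 1. Fractions avoid the
    rounding error a float such as 1/3 would introduce.
    """
    speed = Fraction(speed)
    if not 0 < speed <= 1:
        raise ValueError("speed must be in (0, 1]")
    progress, steps, cells = Fraction(0), 0, 0
    while cells < num_cells:
        progress += speed
        steps += 1
        if progress >= 1:
            progress -= 1
            cells += 1
    return steps


if __name__ == "__main__":
    schedule = sample_malfunctions(num_agents=5, prop_malfunction=0.5,
                                   malfunction_rate=30, min_duration=3,
                                   max_duration=10, num_steps=200, seed=1)
    for agent, events in schedule.items():
        print(agent, events)
    # Same speed classes as the speed-profile snippet: 1, 1/2, 1/3, 1/4.
    for idx in range(4):
        speed = Fraction(1, (idx % 4) + 1)
        print(speed, steps_to_traverse(speed, num_cells=3))
```

For three cells, `steps_to_traverse` gives 3, 6, 9 and 12 steps for the four speed classes, which shows why slow trains occupy shared track much longer than fast ones.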
--- a/docs/index.rst
+++ b/docs/index.rst
 Welcome to flatland's documentation!
 ======================================

+.. include:: ../README.rst
+
 .. toctree::
   :maxdepth: 2
   :caption: Contents:

-   readme
-   installation
-   about_flatland
-   gettingstarted
-   intro_observationbuilder
-   localevaluation
-   modules
-   FAQ
-   contributing
-   authors
+   01_readme
+   03_tutorials_toc
+   04_specifications_toc
+   05_apidoc
+   06_contributing
+   07_changes
+   08_authors
+   09_faq_toc
+   10_interface

 Indices and tables
 ==================

--- a/docs/installation.rst
+++ b/docs/installation.rst
-.. highlight:: shell
-
-============
-Installation
-============
-
-Software Runtime & Dependencies
-------------------------------
-
-This is the recommended way to install and run flatland's dependencies.
-
-* Install `Anaconda <https://www.anaconda.com/distribution/>`_ by following the instructions `here <https://www.anaconda.com/distribution/>`_
-* Create a new conda environment 
-
-.. code-block:: console
-
-    $ conda create python=3.6 --name flatland-rl
-    $ conda activate flatland-rl
-
-* Install the necessary dependencies
-
-.. code-block:: console
-
-    $ conda install -c conda-forge cairosvg pycairo
-    $ conda install -c anaconda tk  
-
-
-Stable release
--------------
-
-To install flatland, run this command in your terminal:
-
-.. code-block:: console
-
-    $ pip install flatland-rl
-
-This is the preferred method to install flatland, as it will always install the most recent stable release.
-
-If you don't have `pip`_ installed, this `Python installation guide`_ can guide
-you through the process.
-
-.. _pip: https://pip.pypa.io
-.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/
-
-
-From sources
------------
-
-The sources for flatland can be downloaded from the `Gitlab repo`_.
-
-You can clone the public repository:
-
-.. code-block:: console
-
-    $ git clone git@gitlab.aicrowd.com:flatland/flatland.git
-
-Once you have a copy of the source, you can install it with:
-
-.. code-block:: console
-
-    $ python setup.py install
-
-
-.. _Gitlab repo: https://gitlab.aicrowd.com/flatland/flatland
-
-
-Jupyter Canvas Widget
---------------------
-If you work with Jupyter Notebook, you need to install the Jupyter Canvas Widget. For installation instructions, see
-https://github.com/Who8MyLunch/Jupyter_Canvas_Widget#installation
--- a/docs/interface/pettingzoo.md
+++ b/docs/interface/pettingzoo.md
+# PettingZoo
+
+> PettingZoo (https://www.pettingzoo.ml/) is a collection of multi-agent environments for reinforcement learning. We provide a PettingZoo interface for Flatland.
+
+## Background
+
+PettingZoo is a popular multi-agent environment library (https://arxiv.org/abs/2009.14471) that aims to be the gym standard for Multi-Agent Reinforcement Learning. The following advantages make it suitable for use with Flatland:
+
+- Works with both RLlib (https://docs.ray.io/en/latest/rllib.html) and Stable Baselines 3 (https://stable-baselines3.readthedocs.io/) using wrappers from SuperSuit.
+- Clean API (https://www.pettingzoo.ml/api) with additional facilities for parallel execution, saving observations, recording with the gym monitor, and processing and normalising observations
+- Scikit-learn-inspired API, e.g.:
+
+```python
+act = model.predict(obs, deterministic=True)[0] 
+```
+
+- Parallel learning with just two lines of code when using Stable Baselines 3:
+
+```python
+import supersuit as ss
+env = ss.pettingzoo_env_to_vec_env_v0(env)
+env = ss.concat_vec_envs_v0(env, 8, num_cpus=4, base_class='stable_baselines3')
+```
+
+- Tested with, and supports, various multi-agent environments with agent counts comparable to Flatland, e.g. https://www.pettingzoo.ml/magent
+- The clean interface means we can add a custom experiment-tracking tool such as wandb and have full flexibility to save the information we want
--- a/docs/interface/pettingzoo.rst
+++ b/docs/interface/pettingzoo.rst
+
+PettingZoo
+==========
+
+..
+
+   PettingZoo (https://www.pettingzoo.ml/) is a collection of multi-agent environments for reinforcement learning. We provide a PettingZoo interface for Flatland.
+
+
+Background
+----------
+
+PettingZoo is a popular multi-agent environment library (https://arxiv.org/abs/2009.14471) that aims to be the gym standard for Multi-Agent Reinforcement Learning. The following advantages make it suitable for use with Flatland:
+
+
+* Works with both RLlib (https://docs.ray.io/en/latest/rllib.html) and Stable Baselines 3 (https://stable-baselines3.readthedocs.io/) using wrappers from SuperSuit.
+* Clean API (https://www.pettingzoo.ml/api) with additional facilities for parallel execution, saving observations, recording with the gym monitor, and processing and normalising observations
+* Scikit-learn-inspired API, e.g.:
+
+.. code-block:: python
+
+   act = model.predict(obs, deterministic=True)[0]
+
+
+* Parallel learning with just two lines of code when using Stable Baselines 3:
+
+.. code-block:: python
+
+   import supersuit as ss
+   env = ss.pettingzoo_env_to_vec_env_v0(env)
+   env = ss.concat_vec_envs_v0(env, 8, num_cpus=4, base_class='stable_baselines3')
+
+
+* Tested with, and supports, various multi-agent environments with agent counts comparable to Flatland, e.g. https://www.pettingzoo.ml/magent
+* The clean interface means we can add a custom experiment-tracking tool such as wandb and have full flexibility to save the information we want
--- a/docs/interface/wrappers.md
+++ b/docs/interface/wrappers.md
+# Environment Wrappers
+
+> We provide various environment wrappers that work with both the rail env and the PettingZoo interface.
+
+## Background
+
+These wrappers change certain environment behaviours in ways that can help reinforcement learning training.
+
+## Supported Inbuilt Wrappers
+
+We provide two sample wrappers: the ShortestPathAction wrapper and the SkipNoChoice wrapper. The wrappers require several env properties that are only created on environment reset, so the rail env must be reset before it is wrapped. To use a wrapper, simply pass it the reset rail env. Code samples for each wrapper are shown below.
+
+### ShortestPathAction Wrapper
+
+To use the ShortestPathAction wrapper, wrap the rail env as follows:
+
+```python
+rail_env.reset(random_seed=1)
+rail_env = ShortestPathActionWrapper(rail_env)
+```
+
+The ShortestPathAction wrapper maps the existing action space onto 3 actions: Shortest Path (`0`), Next Shortest Path (`1`) and Stop (`2`). Hence the predicted action must always be one of `0`, `1` or `2`. To route all agents along their shortest paths, pass `0` as the action.
+
+### SkipNoChoice Wrapper
+
+To use the SkipNoChoiceCellsWrapper, wrap the rail env as follows:
+
+```python
+rail_env.reset(random_seed=1)
+rail_env = SkipNoChoiceCellsWrapper(rail_env, accumulate_skipped_rewards=False, discounting=0.0)
+```
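The ShortestPathAction wrapper described above reduces routing to a choice among ranked paths. The sketch below illustrates that idea in isolation and is not the wrapper's code. It assumes each decision offers a hypothetical list of `(raw_action, distance_to_target)` candidates, uses Flatland's raw action ids (`1` left, `2` forward, `3` right, `4` stop), and falls back to the shortest path when no second path exists. `map_wrapper_action` is a hypothetical helper.

```python
from typing import List, Tuple

# Flatland raw action ids; 4 is Stop, as in the action space description.
MOVE_LEFT, MOVE_FORWARD, MOVE_RIGHT, STOP_MOVING = 1, 2, 3, 4

# Wrapper-level actions.
SHORTEST, NEXT_SHORTEST, STOP = 0, 1, 2


def map_wrapper_action(wrapper_action: int,
                       candidates: List[Tuple[int, int]]) -> int:
    """Translate a wrapper action (0, 1 or 2) into a raw Flatland action.

    `candidates` is an assumed list of (raw_action, distance_to_target)
    pairs for the moves available in the agent's current cell.
    """
    if wrapper_action not in (SHORTEST, NEXT_SHORTEST, STOP):
        raise ValueError(f"wrapper action must be 0, 1 or 2, got {wrapper_action}")
    if wrapper_action == STOP or not candidates:
        return STOP_MOVING
    # Rank the available moves by remaining distance (stable for ties).
    ranked = sorted(candidates, key=lambda c: c[1])
    # Fall back to the shortest path when only one route exists.
    index = min(wrapper_action, len(ranked) - 1)
    return ranked[index][0]


if __name__ == "__main__":
    options = [(MOVE_LEFT, 12), (MOVE_FORWARD, 7), (MOVE_RIGHT, 9)]
    print(map_wrapper_action(SHORTEST, options))       # forward, distance 7
    print(map_wrapper_action(NEXT_SHORTEST, options))  # right, distance 9
```

Keeping the policy's output to three ranked choices is what lets a learner ignore the raw left/forward/right geometry of each switch.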
--- a/docs/interface/wrappers.rst
+++ b/docs/interface/wrappers.rst
+
+Environment Wrappers
+====================
+
+..
+
+   We provide various environment wrappers that work with both the rail env and the PettingZoo interface.
+
+
+Background
+----------
+
+These wrappers change certain environment behaviours in ways that can help reinforcement learning training.
+
+Supported Inbuilt Wrappers
+--------------------------
+
+We provide two sample wrappers: the ShortestPathAction wrapper and the SkipNoChoice wrapper. The wrappers require several env properties that are only created on environment reset, so the rail env must be reset before it is wrapped. To use a wrapper, simply pass it the reset rail env. Code samples for each wrapper are shown below.
+
+ShortestPathAction Wrapper
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To use the ShortestPathAction wrapper, wrap the rail env as follows:
+
+.. code-block:: python
+
+   rail_env.reset(random_seed=1)
+   rail_env = ShortestPathActionWrapper(rail_env)
+
+The ShortestPathAction wrapper maps the existing action space onto 3 actions: Shortest Path (\ ``0``\ ), Next Shortest Path (\ ``1``\ ) and Stop (\ ``2``\ ). Hence the predicted action must always be one of ``0``, ``1`` or ``2``. To route all agents along their shortest paths, pass ``0`` as the action.
+
+SkipNoChoice Wrapper
+^^^^^^^^^^^^^^^^^^^^
+
+To use the SkipNoChoiceCellsWrapper, wrap the rail env as follows:
+
+.. code-block:: python
+
+   rail_env.reset(random_seed=1)
+   rail_env = SkipNoChoiceCellsWrapper(rail_env, accumulate_skipped_rewards=False, discounting=0.0)
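With `accumulate_skipped_rewards=True`, the rewards an agent collects while the SkipNoChoice wrapper steps it through cells without a routing choice have to be folded into a single value. One plausible reading, assumed here rather than taken from the wrapper's source, is a discounted return that uses `discounting` as the discount factor. `fold_skipped_rewards` is a hypothetical helper.

```python
from typing import Sequence


def fold_skipped_rewards(skipped: Sequence[float], decision_reward: float,
                         accumulate_skipped_rewards: bool,
                         discounting: float) -> float:
    """Combine rewards from skipped steps with the reward at the next decision.

    Assumed semantics: the discounted return seen from the first skipped
    step, r_0 + g*r_1 + ... + g^n * decision_reward, with g = discounting.
    """
    if not accumulate_skipped_rewards:
        # Only the reward observed at the decision step is reported.
        return decision_reward
    total = decision_reward
    # Fold backwards so each earlier reward discounts everything after it.
    for reward in reversed(skipped):
        total = reward + discounting * total
    return total


if __name__ == "__main__":
    print(fold_skipped_rewards([-1, -1], -1, True, 1.0))  # undiscounted sum
    print(fold_skipped_rewards([-1, -1], -1, True, 0.5))
```

Under this reading, `discounting=0.0` keeps only the first skipped reward, so with the settings in the snippet above the accumulation would have little effect even if it were switched on.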
--- a/docs/localevaluation.rst
+++ b/docs/localevaluation.rst
-================
-Local Evaluation
-================
-
-This document explains how to evaluate your submissions locally before making
-an official submission to the competition.
-
-Requirements
--------------
-
-* **flatland-rl** : We expect that you have ``flatland-rl`` installed by following the instructions in :doc:`installation`.
-
-* **redis** : Additionally, you will need to have `redis installed <https://redis.io/topics/quickstart>`_ and **running in the background.**
-
-Test Data
--------------
-
-* **test env data** : You can `download and untar the test-env-data <https://www.aicrowd.com/challenges/flatland-challenge/dataset_files>`_
-  at a location of your choice, let's say ``/path/to/test-env-data/``. After untarring, the folder structure should look something like this:
-
-
-.. code-block:: console
-
-    .
-    └── test-env-data
-        ├── Test_0
-        │   ├── Level_0.pkl
-        │   └── Level_1.pkl
-        ├── Test_1
-        │   ├── Level_0.pkl
-        │   └── Level_1.pkl
-        ├..................
-        ├..................
-        ├── Test_8
-        │   ├── Level_0.pkl
-        │   └── Level_1.pkl
-        └── Test_9
-            ├── Level_0.pkl
-            └── Level_1.pkl
-
-Evaluation Service
-------------------
-
-* **start evaluation service** : Then you can start the evaluator by running:
-
-.. code-block:: console
-
-    flatland-evaluator --tests /path/to/test-env-data/
-
-RemoteClient
------------------
-
-* **run client** : Some `sample submission code can be found in the starter-kit <https://github.com/AIcrowd/flatland-challenge-starter-kit/>`_,
-  but before you can run your code locally using ``FlatlandRemoteClient``, you will have to set the ``AICROWD_TESTS_FOLDER`` environment variable to the location where you
-  previously untarred the test-env-data folder:
-
-.. code-block:: console
-
-    export AICROWD_TESTS_FOLDER="/path/to/test-env-data/"
-
-    # or on Windows :
-    # 
-    # set AICROWD_TESTS_FOLDER=\path\to\test-env-data\
-
-    # and then finally run your code
-    python run.py
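Before starting the evaluator, it can help to check that the untarred data matches the layout shown in the local evaluation guide above (`Test_<i>/Level_<j>.pkl` under the folder that `AICROWD_TESTS_FOLDER` points to). A small sketch using only the standard library; `list_test_levels` is a hypothetical helper, not part of flatland-rl.

```python
import os
from pathlib import Path
from typing import List


def list_test_levels(root: str) -> List[str]:
    """Return the level files under root as 'Test_i/Level_j.pkl' strings.

    Assumes the layout root/Test_<i>/Level_<j>.pkl; other files are ignored.
    Sorting is plain lexicographic, so Test_10 sorts before Test_2.
    """
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"test data folder not found: {root}")
    levels = sorted(base.glob("Test_*/Level_*.pkl"))
    return [p.relative_to(base).as_posix() for p in levels]


if __name__ == "__main__":
    # Same variable the remote client reads; falls back to the current dir.
    folder = os.environ.get("AICROWD_TESTS_FOLDER", ".")
    for level in list_test_levels(folder):
        print(level)
```

An empty listing usually means the variable points one level too high or too low relative to the `Test_*` folders.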
--- a/docs/make.bat
+++ b/docs/make.bat
-@ECHO OFF
-
-pushd %~dp0
-
-REM Command file for Sphinx documentation
-
-if "%SPHINXBUILD%" == "" (
-	set SPHINXBUILD=python -msphinx
-)
-set SOURCEDIR=.
-set BUILDDIR=_build
-set SPHINXPROJ=flatland
-
-if "%1" == "" goto help
-
-%SPHINXBUILD% >NUL 2>NUL
-if errorlevel 9009 (
-	echo.
-	echo.The Sphinx module was not found. Make sure you have Sphinx installed,
-	echo.then set the SPHINXBUILD environment variable to point to the full
-	echo.path of the 'sphinx-build' executable. Alternatively you may add the
-	echo.Sphinx directory to PATH.
-	echo.
-	echo.If you don't have Sphinx installed, grab it from
-	echo.http://sphinx-doc.org/
-	exit /b 1
-)
-
-%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
-goto end
-
-:help
-%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
-
-:end
-popd
--- a/docs/modules.rst
+++ b/docs/modules.rst
-flatland
-========
-
-.. toctree::
-   :maxdepth: 4
-
-   flatland
--- a/docs/specifications/core.md
+++ b/docs/specifications/core.md
+## Core Specifications
+
+### Environment Class Overview
+
+The Environment class contains all the functions necessary for the interactions between the agents and the environment. The base Environment class is modelled on rllib.env.MultiAgentEnv (https://github.com/ray-project/ray).
+
+The functions are specific to each realization of Flatland (e.g. Railway, Vaccination, ...).
+In particular, we retain the rllib interface in the use of the step() function, which accepts a dictionary of actions indexed by the agents' handles (returned by get_agent_handles()) and returns dictionaries of observations, rewards, dones and infos.
+
+```python
+class Environment:
+    """Base interface for multi-agent environments in Flatland.
+
+    Agents are identified by agent ids (handles).
+    Examples:
+        >>> obs, info = env.reset()
+        >>> print(obs)
+        {
+            "train_0": [2.4, 1.6],
+            "train_1": [3.4, -3.2],
+        }
+        >>> obs, rewards, dones, infos = env.step(
+            action_dict={
+                "train_0": 1, "train_1": 0})
+        >>> print(rewards)
+        {
+            "train_0": 3,
+            "train_1": -1,
+        }
+        >>> print(dones)
+        {
+            "train_0": False,    # train_0 is still running
+            "train_1": True,     # train_1 is done
+            "__all__": False,    # the env is not done
+        }
+        >>> print(infos)
+        {
+            "train_0": {},  # info for train_0
+            "train_1": {},  # info for train_1
+        }
+    """
+
+    def __init__(self):
+        pass
+
+    def reset(self):
+        """
+        Resets the env and returns observations from agents in the environment.
+
+        Returns
+        -------
+        obs : dict
+            New observations for each agent.
+        """
+        raise NotImplementedError()
+
+    def step(self, action_dict):
+        """
+        Performs an environment step with simultaneous execution of actions for
+        agents in action_dict.
+        Returns observations from agents in the environment.
+        The returns are dicts mapping from agent_id strings to values.
+
+        Parameters
+        ----------
+        action_dict : dict
+            Dictionary of actions to execute, indexed by agent id.
+
+        Returns
+        -------
+        obs : dict
+            New observations for each ready agent.
+        rewards: dict
+            Reward values for each ready agent.
+        dones : dict
+            Done values for each ready agent. The special key "__all__"
+            (required) is used to indicate env termination.
+        infos : dict
+            Optional info values for each agent id.
+        """
+        raise NotImplementedError()
+
+    def render(self):
+        """
+        Perform rendering of the environment.
+        """
+        raise NotImplementedError()
+
+    def get_agent_handles(self):
+        """
+        Returns a list of agents' handles to be used as keys in the step()
+        function.
+        """
+        raise NotImplementedError()
+
+```
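The `step()` protocol specified above can be exercised with a toy environment. The class below is hypothetical and standalone (it does not subclass `Environment`, to stay self-contained): each agent must send action `1` a fixed number of times to reach its target, and the dictionaries it returns have the shapes described in the docstring, including the `"__all__"` key in `dones`. Following the docstring example, `reset()` returns `(obs, info)`.

```python
from typing import Dict, List, Tuple


class CountdownEnv:
    """Toy multi-agent env following the dict-based reset/step protocol.

    Not part of Flatland: an agent's observation is the number of "move"
    actions (action 1) it still needs before reaching its target.
    """

    def __init__(self, num_agents: int = 2, distance: int = 3):
        self.num_agents = num_agents
        self.distance = distance
        self.remaining: Dict[str, int] = {}

    def get_agent_handles(self) -> List[str]:
        return [f"train_{i}" for i in range(self.num_agents)]

    def reset(self) -> Tuple[Dict[str, int], Dict[str, dict]]:
        self.remaining = {h: self.distance for h in self.get_agent_handles()}
        obs = dict(self.remaining)
        return obs, {h: {} for h in obs}

    def step(self, action_dict: Dict[str, int]):
        rewards, dones = {}, {}
        for handle, action in action_dict.items():
            if self.remaining[handle] > 0 and action == 1:
                self.remaining[handle] -= 1
            # -1 per step until the target is reached, 0 afterwards.
            rewards[handle] = 0 if self.remaining[handle] == 0 else -1
        for handle in self.get_agent_handles():
            dones[handle] = self.remaining[handle] == 0
        # The special "__all__" key signals that the whole episode is over.
        dones["__all__"] = all(dones[h] for h in self.get_agent_handles())
        obs = dict(self.remaining)
        infos = {h: {} for h in obs}
        return obs, rewards, dones, infos


if __name__ == "__main__":
    env = CountdownEnv(num_agents=2, distance=2)
    obs, info = env.reset()
    obs, rewards, dones, infos = env.step({"train_0": 1, "train_1": 0})
    print(obs, rewards, dones)
```

A training loop would keep calling `step()` with one action per handle until `dones["__all__"]` is `True`.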
--- a/docs/specifications/img/UML_flatland.png
+++ b/docs/specifications/img/UML_flatland.png
--- a/docs/specifications/intro.md
+++ b/docs/specifications/intro.md
+## Intro
+
+Written in human-readable language, the specifications provide:
+- code base overview (hand-drawn concept)
+- key concepts (generators, envs) and how they are linked
+- links to the relevant parts of the code base
+
+![Overview](img/UML_flatland.png)
+[Diagram Source](https://confluence.sbb.ch/x/pQfsSw)
--- a/docs/observation_actions.rst
+++ b/docs/observation_actions.rst
-=============================
+
 Observation and Action Spaces
-=============================
+-----------------------------
 This is an introduction to the three standard observations and the action space of **Flatland**.

 Action Space
-============
+^^^^^^^^^^^^
 Flatland is a railway simulation. Thus the actions of an agent are strongly limited to the railway network. This means that in many cases not all actions are valid.
 The possible actions of an agent are

@@ -15,7 +15,7 @@ The possible actions of an agent are
 - ``4`` **Stop**: This action causes the agent to stop.

 Observation Spaces
-==================
+^^^^^^^^^^^^^^^^^^
 In the **Flatland** environment we have included three basic observations to get started. The figure below illustrates the observation range of the different basic observation: ``Global``, ``Local Grid`` and ``Local Tree``.

 .. image:: https://i.imgur.com/oo8EIYv.png
@@ -24,7 +24,7 @@ In the **Flatland** environment we have included three basic observations to get

   
 Global Observation
------------------
+~~~~~~~~~~~~~~~~~~
 Gives a global observation of the entire rail environment.

 The observation is composed of the following elements:
@@ -37,7 +37,7 @@ We encourage you to enhance this observation with any layer you think might help
 It would also be possible to construct a global observation for a super agent that controls all agents at once.

 Local Grid Observation
----------------------
+~~~~~~~~~~~~~~~~~~~~~~
 Gives a local observation of the rail environment around the agent.
 The observation is composed of the following elements:

@@ -50,7 +50,7 @@ Be aware that this observation **does not** contain any clues about target locat
 We encourage you to come up with creative ways to overcome this problem. In the tree observation below we introduce the concept of distance maps.

 Tree Observation
----------------
+~~~~~~~~~~~~~~~~
 The tree observation is built by exploiting the graph structure of the railway network. The observation is generated by spanning a **4 branched tree** from the current position of the agent. Each branch follows the allowed transitions (backward branch only allowed at dead-ends) until a cell with multiple allowed transitions is reached. Here the information gathered along the branch is stored as a node in the tree.
 The figure below illustrates how the tree observation is built:

@@ -73,7 +73,7 @@ The right side of the figure shows the resulting tree of the railway network on
    
    
 Node Information
----------------
+~~~~~~~~~~~~~~~~
 Each node is filled with information gathered along the path to the node. Currently each node contains 9 features:

 - 1: if own target lies on the explored branch the current distance from the agent in number of cells is stored.