From cfae84003d9cf87c832d469188aea70f1ad2ba01 Mon Sep 17 00:00:00 2001
From: spiglerg <spiglerg@gmail.com>
Date: Thu, 23 May 2019 12:47:32 +0000
Subject: [PATCH] Update gettingstarted.rst

---
 docs/gettingstarted.rst | 81 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/docs/gettingstarted.rst b/docs/gettingstarted.rst
index c94cb434..0b7c552f 100644
--- a/docs/gettingstarted.rst
+++ b/docs/gettingstarted.rst
@@ -5,7 +5,9 @@ Getting Started
 Overview
 --------------
 
-Following are three short tutorials to help new users get acquainted with how to create RailEnvs, how to train simple DQN agents on them, and how to customize them.
+Following are three short tutorials to help new users get acquainted with how 
+to create RailEnvs, how to train simple DQN agents on them, and how to customize 
+them.
 
 To use flatland in a project:
 
@@ -17,6 +19,83 @@ To use flatland in a project:
 Part 1 : Basic Usage
 --------------
 
+The basic usage of RailEnv environments consists of creating a RailEnv object
+endowed with a rail generator, which generates new rail networks on each reset,
+and an observation generator object, which is supplied with environment-specific
+information at each time step and provides a suitable observation vector to the
+agents.
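+
+The code examples below assume that the relevant classes have already been
+imported. A possible set of imports, following the module paths referenced in
+this tutorial (exact paths may differ between flatland versions), is:
+
+.. code-block:: python
+
+    # Module paths as referenced in this tutorial; they may differ in other
+    # versions of flatland.
+    from flatland.envs.rail_env import RailEnv
+    from flatland.envs.generators import rail_from_manual_specifications_generator, random_rail_generator
+    from flatland.envs.observations import TreeObsForRailEnv
+    from flatland.utils.rendertools import RenderTool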
+
+The simplest rail generators are envs.generators.rail_from_manual_specifications_generator 
+and envs.generators.random_rail_generator.
+
+The first one accepts a list of lists, each element of which is a 2-tuple whose
+entries represent the 'cell_type' (see core.transitions.RailEnvTransitions) and
+the desired clockwise rotation of the cell contents (0, 90, 180 or 270 degrees).
+For example,
+
+.. code-block:: python
+
+    specs = [[(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)],
+             [(0, 0), (0, 0), (0, 0), (0, 0), (7, 0), (0, 0)],
+             [(7, 270), (1, 90), (1, 90), (1, 90), (2, 90), (7, 90)],
+             [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]]
+
+    env = RailEnv(width=6,
+                  height=4,
+                  rail_generator=rail_from_manual_specifications_generator(specs),
+                  number_of_agents=1,
+                  obs_builder_object=TreeObsForRailEnv(max_depth=2))
+
+Alternatively, a random environment can be generated (optionally specifying
+weights for each cell type to increase or decrease its proportion in the
+generated rail networks).
+
+.. code-block:: python
+
+    # Relative weights of each cell type to be used by the random rail generators.
+    transition_probability = [1.0,  # empty cell - Case 0
+                              1.0,  # Case 1 - straight
+                              1.0,  # Case 2 - simple switch
+                              0.3,  # Case 3 - diamond crossing
+                              0.5,  # Case 4 - single slip
+                              0.5,  # Case 5 - double slip
+                              0.2,  # Case 6 - symmetrical
+                              0.0,  # Case 7 - dead end
+                              0.2,  # Case 8 - turn left
+                              0.2,  # Case 9 - turn right
+                              1.0]  # Case 10 - mirrored switch
+    
+    # Example: generate a random rail
+    env = RailEnv(width=10,
+                  height=10,
+                  rail_generator=random_rail_generator(cell_type_relative_proportion=transition_probability),
+                  number_of_agents=3,
+                  obs_builder_object=TreeObsForRailEnv(max_depth=2))
+
+Environments can be rendered using the utils.rendertools utilities, for example:
+
+.. code-block:: python
+
+    env_renderer = RenderTool(env, gl="QT")
+    env_renderer.renderEnv(show=True)
+
+
+Finally, the environment can be run by supplying the environment's step function
+with a dictionary of actions whose keys are the agents' handles (returned by
+env.get_agent_handles()) and whose values are the selected actions.
+For example, for a 2-agent environment:
+
+.. code-block:: python
+
+    handles = env.get_agent_handles()
+    action_dict = {handles[0]:0, handles[1]:0}
+    obs, all_rewards, done, _ = env.step(action_dict)
+
+where 'obs', 'all_rewards', and 'done' are also dictionaries indexed by the
+agents' handles, whose values correspond to the relevant observations, rewards
+and terminal status for each agent. Further, the 'done' dictionary contains an
+extra key '__all__' that is set to True once all agents have reached their goals.
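+
+A complete episode can then be run by repeating the call above until the
+'__all__' flag is set. The following is a minimal sketch: it assumes that the
+agents' actions are small integer codes with 5 discrete actions available, as
+in the example above, and simply picks them at random.
+
+.. code-block:: python
+
+    import random
+
+    handles = env.get_agent_handles()
+    # Step the environment with random actions until every agent is done
+    # (assumes 5 discrete integer action codes per agent).
+    for _ in range(100):
+        action_dict = {handle: random.randint(0, 4) for handle in handles}
+        obs, all_rewards, done, _ = env.step(action_dict)
+        if done['__all__']:
+            break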
+
 
 
 Part 2 : Training a Simple DQN Agent
-- 
GitLab