Update observation_actions.rst

Commit ac43ecbc authored 5 years ago by Erik Nygren
--- a/docs/observation_actions.rst
+++ b/docs/observation_actions.rst
@@ -25,9 +25,67 @@ In the **Flatland** environment we have included three basic observations to get
 Global Observation
 ------------------
+Gives a global observation of the entire rail environment.
+The observation is composed of the following elements:
+    - A transition map array with dimensions (env.height, env.width, 16), assuming a 16-bit encoding of transitions.
+    - Two 2D arrays, stacked with dimensions (map_height, map_width, 2), containing respectively the position of the given agent's target and the positions of the other agents' targets.
+    - A 3D array (map_height, map_width, 8) whose first 4 channels contain the one-hot encoding of the direction of the given agent and whose last 4 channels contain the positions of the other agents at their position coordinates.
+Feel free to enhance this observation with any layer you think might help solve the problem.
+It would also be possible to construct a global observation for a super agent that controls all agents at once.
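
As a rough illustration of the layers listed above, the following sketch unpacks a 16-bit transition value into 16 binary channels and one-hot encodes a direction. The helper names, the most-significant-bit-first channel order, and the direction order (0=North, 1=East, 2=South, 3=West) are assumptions made for this example and are not taken from the Flatland API.

```python
from typing import List

NUM_TRANSITION_BITS = 16  # from the text: 16-bit encoding of transitions
NUM_DIRECTIONS = 4        # assumed order: 0=North, 1=East, 2=South, 3=West


def transition_channels(cell_value: int) -> List[int]:
    """Unpack a 16-bit transition value into 16 binary channels (MSB first)."""
    return [(cell_value >> (NUM_TRANSITION_BITS - 1 - i)) & 1
            for i in range(NUM_TRANSITION_BITS)]


def one_hot_direction(direction: int) -> List[int]:
    """One-hot encode a direction index in the range 0..3."""
    return [1 if i == direction else 0 for i in range(NUM_DIRECTIONS)]


def transition_map(grid: List[List[int]]) -> List[List[List[int]]]:
    """Build a (height, width, 16) nested list from a grid of 16-bit values."""
    return [[transition_channels(value) for value in row] for row in grid]


# Hypothetical 2 x 3 grid; the cell values are arbitrary 16-bit integers.
grid = [[0b1000000000100000, 0, 0],
        [0, 0b0000000000000001, 0]]
tmap = transition_map(grid)
shape = (len(tmap), len(tmap[0]), len(tmap[0][0]))
```

Here ``shape`` matches the (env.height, env.width, 16) layout described above for a 2 x 3 grid. In practice these layers would typically be stored as NumPy arrays rather than nested lists.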
 Local Grid Observation
 ----------------------
+Gives a local observation of the rail environment around the agent.
+The observation is composed of the following elements:
+    - A transition map array of the local environment around the given agent, with dimensions (2*view_radius + 1, 2*view_radius + 1, 16), assuming a 16-bit encoding of transitions.
+    - Two 2D arrays, stacked with dimensions (2*view_radius + 1, 2*view_radius + 1, 2), containing respectively the agent's own target position and the positions of the other agents' targets, if they are in the agent's vision range.
+    - A 3D array (2*view_radius + 1, 2*view_radius + 1, 4) containing the one-hot encoding of the directions of the other agents at their position coordinates, if they are in the agent's vision range.
+    - A 4-element array with the one-hot encoding of the agent's own direction.
+Be aware that this observation does not contain any clues about the target location while the target lies outside the agent's vision range. Thus navigation on maps where the radius of the observation does not guarantee a visible target at all times will become very difficult.
+We encourage you to come up with creative ways to overcome this problem. In the tree observation below we introduce the concept of distance maps.
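
To make the (2*view_radius + 1) window and the visibility problem concrete, here is a minimal sketch that crops a square window around an agent and checks whether a target falls inside it. The function names and the zero padding for cells outside the map are assumptions for illustration, not the Flatland implementation.

```python
from typing import List, Tuple


def local_window(grid: List[List[int]], position: Tuple[int, int],
                 view_radius: int, fill: int = 0) -> List[List[int]]:
    """Return the (2*view_radius + 1) square window centred on position.

    Cells outside the grid are set to `fill` (an assumed padding value).
    """
    height, width = len(grid), len(grid[0])
    row0, col0 = position
    window = []
    for row in range(row0 - view_radius, row0 + view_radius + 1):
        window_row = []
        for col in range(col0 - view_radius, col0 + view_radius + 1):
            inside = 0 <= row < height and 0 <= col < width
            window_row.append(grid[row][col] if inside else fill)
        window.append(window_row)
    return window


def target_visible(position: Tuple[int, int], target: Tuple[int, int],
                   view_radius: int) -> bool:
    """True if the target lies inside the square view around position."""
    return (abs(target[0] - position[0]) <= view_radius
            and abs(target[1] - position[1]) <= view_radius)


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
window = local_window(grid, position=(0, 0), view_radius=1)
```

With the agent in the top-left corner, the first row and first column of ``window`` are padding, and a target at (2, 2) is not visible with ``view_radius=1``.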
 Tree Observation
 ----------------
+The tree observation is built by exploiting the graph structure of the railway network. The observation is generated by spanning a 4-branched tree from the current position of the agent. Each branch follows the allowed transitions (a backward branch is only allowed at dead-ends) until a cell with multiple allowed transitions is reached. There, the information gathered along the branch is stored as a node in the tree.
+.. image:: https://i.imgur.com/C4LbqPJ.png
+    :height: 100
+    :width: 200
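
The branch-following rule described above can be sketched as a simple walk over a successor table. The ``successors`` dictionary, the ``State`` tuple, and the ``max_steps`` guard are hypothetical constructs for this example; the actual tree builder works on the environment's transition map.

```python
from typing import Dict, List, Tuple

State = Tuple[Tuple[int, int], int]  # ((row, col), direction)


def follow_branch(successors: Dict[State, List[State]], start: State,
                  max_steps: int = 1000) -> Tuple[State, int]:
    """Walk from start until a cell with zero or several successors is reached.

    Returns the state where the walk stopped and the number of cells travelled.
    """
    state, distance = start, 0
    while distance < max_steps:
        options = successors.get(state, [])
        if len(options) != 1:  # switch (several options) or end of track (none)
            break
        state = options[0]
        distance += 1
    return state, distance


# Hypothetical straight track heading east (direction 1) along row 0,
# ending in a switch at cell (0, 3) with two possible continuations.
successors = {
    ((0, 0), 1): [((0, 1), 1)],
    ((0, 1), 1): [((0, 2), 1)],
    ((0, 2), 1): [((0, 3), 1)],
    ((0, 3), 1): [((0, 4), 1), ((1, 3), 2)],
}
node_state, cells_travelled = follow_branch(successors, ((0, 0), 1))
```

The walk stops at the switch, which is where a tree node would be created and the next level of branches would start.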
+Node Information
+----------------
+Each node is filled with information gathered along the path to the node. Currently each node contains 9 features:
+- 1: if the agent's own target lies on the explored branch, the distance from the agent's current position, in number of cells, is stored.
+- 2: if another agent's target is detected, the distance in number of cells from the current agent position is stored.
+- 3: if another agent is detected, the distance in number of cells from the current agent position is stored.
+- 4: possible conflict detected (this only works when we use a predictor and will not be important in this tutorial)
+- 5: if an unusable switch (for the agent) is detected, we store the distance. An unusable switch is a switch where the agent does not have any choice of path, but other agents coming from different directions might.
+- 6: This feature stores the distance (in number of cells) to the next node (e.g. switch or target or dead-end)
+- 7: minimum remaining travel distance from node to the agent's target given the direction of the agent if this path is chosen
+- 8: agents travelling in the same direction found on the path to the node
+    - n = number of agents travelling in the same direction (possible future use: number of other agents in the same direction in this branch)
+    - 0 = no agent travelling in the same direction
+- 9: agents travelling in the opposite direction found on the path to the node
+    - n = number of agents travelling in a different direction than the observing agent
+    - 0 = no agent travelling in a different direction than the observing agent
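
The nine features listed above can be pictured as one fixed-size record per node. The sketch below is only an illustration: the field names, the ``max_depth`` parameter, and the choice to count the root as a node are assumptions, not the library's data structure.

```python
from typing import NamedTuple


class NodeFeatures(NamedTuple):
    """One tree node; field order follows features 1 to 9 in the list above."""
    dist_own_target: float               # 1
    dist_other_target: float             # 2
    dist_other_agent: float              # 3
    potential_conflict: float            # 4
    dist_unusable_switch: float          # 5
    dist_to_next_node: float             # 6
    dist_min_to_target: float            # 7
    num_agents_same_direction: int       # 8
    num_agents_opposite_direction: int   # 9


def num_tree_nodes(max_depth: int, branching: int = 4) -> int:
    """Nodes in a full tree with the given branching factor, root included."""
    return sum(branching ** level for level in range(max_depth + 1))


def flat_observation_size(max_depth: int) -> int:
    """Length of the tree once flattened to one feature vector."""
    return num_tree_nodes(max_depth) * len(NodeFeatures._fields)


# Placeholder node: float("inf") is an assumed marker for "nothing detected".
INF = float("inf")
empty_node = NodeFeatures(INF, INF, INF, INF, INF, INF, INF, 0, 0)
```

Under these assumptions, a tree of depth 2 has 1 + 4 + 16 = 21 nodes, so the flattened observation holds 21 * 9 = 189 values.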