From ac43ecbc7ee9d505c076f74e185a21e6b72f7136 Mon Sep 17 00:00:00 2001
From: mlerik <baerenjesus@gmail.com>
Date: Tue, 9 Jul 2019 17:59:14 +0000
Subject: [PATCH] Update observation_actions.rst

---
 docs/observation_actions.rst | 58 ++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/docs/observation_actions.rst b/docs/observation_actions.rst
index 9de50d7..44763b4 100644
--- a/docs/observation_actions.rst
+++ b/docs/observation_actions.rst
@@ -25,9 +25,67 @@ In the **Flatland** environment we have included three basic observations to get
 
 Global Observation
 ------------------
+Gives a global observation of the entire rail environment.
+
+The observation is composed of the following elements:
+
+    - A transition map array with dimensions (env.height, env.width, 16), assuming a 16-bit encoding of transitions.
+    - Two 2D arrays, stacked into shape (map_height, map_width, 2), containing respectively the position of the given agent's target and the positions of the other agents' targets.
+    - A 3D array (map_height, map_width, 8) with the first 4 channels containing the one-hot encoding of the direction of the given agent and the last 4 channels containing the positions of the other agents at their position coordinates.
+
+Feel free to enhance this observation with any layer you think might help solve the problem.
+It would also be possible to construct a global observation for a super agent that controls all agents at once.
 
 Local Grid Observation
 ----------------------
+Gives a local observation of the rail environment around the agent.
+The observation is composed of the following elements:
+
+    - A transition map array of the local environment around the given agent, with dimensions (2*view_radius + 1, 2*view_radius + 1, 16), assuming a 16-bit encoding of transitions.
+    - Two 2D arrays, stacked into shape (2*view_radius + 1, 2*view_radius + 1, 2), containing respectively the agent's target position and the positions of the other agents' targets, where these fall within the agent's vision range.
+    - A 3D array (2*view_radius + 1, 2*view_radius + 1, 4) containing the one-hot encoding of the directions of the other agents at their position coordinates, if they are in the agent's vision range.
+    - A 4-element array with the one-hot encoding of the given agent's direction.
+
+Be aware that this observation does not contain any clues about the target location when the target lies outside the agent's vision range. Navigation therefore becomes very difficult on maps where the radius of the observation does not guarantee a visible target at all times.
+We encourage you to come up with creative ways to overcome this problem. In the tree observation below we introduce the concept of distance maps.
 
 Tree Observation
 ----------------
+The tree observation is built by exploiting the graph structure of the railway network. The observation is generated by spanning a 4-branched tree from the current position of the agent. Each branch follows the allowed transitions (a backward branch is only allowed at dead-ends) until a cell with multiple allowed transitions is reached. At that cell, the information gathered along the branch is stored as a node in the tree.
+
+.. image:: https://i.imgur.com/C4LbqPJ.png
+   :height: 100
+   :width: 200
+
+
+Node Information
+----------------
+Each node is filled with information gathered along the path to the node. Currently each node contains 9 features:
+
+- 1: If the agent's own target lies on the explored branch, the distance from the agent in number of cells is stored.
+
+- 2: If another agent's target is detected, the distance in number of cells from the current agent's position is stored.
+
+- 3: If another agent is detected, the distance in number of cells from the current agent's position is stored.
+
+- 4: Possible conflict detected (this only works when a predictor is used and will not be important in this tutorial).
+
+
+- 5: If a switch that is unusable for the agent is detected, the distance to it is stored. An unusable switch is a switch where the agent does not have any choice of path, but other agents coming from different directions might.
+ + +- 6: This feature stores the distance (in number of cells) to the next node (e.g. switch or target or dead-end) + +- 7: minimum remaining travel distance from node to the agent's target given the direction of the agent if this path is chosen + + +- 8: agent in the same direction found on path to node + - n = number of agents present same direction (possible future use: number of other agents in the same direction in this branch) + - 0 = no agent present same direction + +- 9: agent in the opposite direction on path to node + - n = number of agents present other direction than myself + - 0 = no agent present other direction than myself + + + -- GitLab