Commit bcd06459 authored by Erik Nygren

minor bugfixes. Simple distance map test (thanks Christian). This test will be enhanced soon.

parent 82c15121
@@ -16,8 +16,10 @@ class TreeObsForRailEnv(ObservationBuilder):
 TreeObsForRailEnv object.
 This object returns observation vectors for agents in the RailEnv environment.
-The information is local to each agent and exploits the tree structure of the rail
+The information is local to each agent and exploits the graph structure of the rail
 network to simplify the representation of the state of the environment for each agent.
+For details about the features in the tree observation see the get() function.
 """
 observation_dim = 9
@@ -204,7 +206,7 @@ class TreeObsForRailEnv(ObservationBuilder):
 [... from 'right] +
 [... from 'back']
-Finally, each node information is composed of 8 floating point values:
+Each node information is composed of 9 features:
 #1: if own target lies on the explored branch the current distance from the agent in number of cells is stored.
......
 """
-Definition of the RailEnv environment and related level-generation functions.
-Generator functions are functions that take width, height and num_resets as arguments and return
-a GridTransitionMap object.
+Definition of the RailEnv environment.
 """
 # TODO: _ this is a global method --> utils or remove later
@@ -46,20 +43,35 @@ class RailEnv(Environment):
 to avoid bottlenecks.
 The valid actions in the environment are:
-0: do nothing
-1: turn left and move to the next cell; if the agent was not moving, movement is started
-2: move to the next cell in front of the agent; if the agent was not moving, movement is started
-3: turn right and move to the next cell; if the agent was not moving, movement is started
-4: stop moving
+- 0: do nothing (continue moving or stay still)
+- 1: turn left at switch and move to the next cell; if the agent was not moving, movement is started
+- 2: move to the next cell in front of the agent; if the agent was not moving, movement is started
+- 3: turn right at switch and move to the next cell; if the agent was not moving, movement is started
+- 4: stop moving
 Moving forward in a dead-end cell makes the agent turn 180 degrees and step
 to the cell it came from.
 The actions of the agents are executed in order of their handle to prevent
 deadlocks and to allow them to learn relative priorities.
-TODO: WRITE ABOUT THE REWARD FUNCTION, and possibly allow for alpha and
-beta to be passed as parameters to __init__().
+Reward Function:
+It costs each agent a step_penalty for every time-step taken in the environment. Independent of the movement
+of the agent. Currently all other penalties such as penalty for stopping, starting and invalid actions are set to 0.
+alpha = 1
+beta = 1
+Reward function parameters:
+- invalid_action_penalty = 0
+- step_penalty = -alpha
+- global_reward = beta
+- stop_penalty = 0 # penalty for stopping a moving agent
+- start_penalty = 0 # penalty for starting a stopped agent
 """
 def __init__(self,
......
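For readers following the new reward documentation in the RailEnv docstring, here is a minimal sketch of how those parameters could combine into a per-step reward. It is not the code from this commit: `RewardParams`, `step_reward`, and the boolean flags are hypothetical names chosen for illustration. The commit text also does not say when `global_reward` is paid out, so the `all_done` condition is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardParams:
    """Hypothetical container for the values listed in the RailEnv docstring."""
    alpha: float = 1.0
    beta: float = 1.0
    invalid_action_penalty: float = 0.0
    stop_penalty: float = 0.0   # penalty for stopping a moving agent
    start_penalty: float = 0.0  # penalty for starting a stopped agent

    @property
    def step_penalty(self) -> float:
        # Docstring: step_penalty = -alpha
        return -self.alpha

    @property
    def global_reward(self) -> float:
        # Docstring: global_reward = beta
        return self.beta


def step_reward(params: RewardParams, *, invalid_action: bool = False,
                stopped: bool = False, started: bool = False,
                all_done: bool = False) -> float:
    """Reward for one agent in one time-step (illustrative only)."""
    # The step penalty applies every time-step, independent of movement.
    reward = params.step_penalty
    if invalid_action:
        reward += params.invalid_action_penalty
    if stopped:
        reward += params.stop_penalty
    if started:
        reward += params.start_penalty
    if all_done:
        # ASSUMPTION: the docstring does not state when global_reward is granted.
        reward += params.global_reward
    return reward
```

With the documented defaults (alpha = 1, all other penalties 0), an agent that stops, starts, or picks an invalid action gets the same reward as one that keeps moving. Only the step penalty differentiates outcomes.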