@@ -12,6 +12,11 @@ The possible actions of an agent are
...
- 3 *Deviate Right*: Exactly the same as deviate left but for right turns.
- 4 *Stop*: This action causes the agent to stop; this is necessary to avoid conflicts in multi-agent setups (not needed for pure navigation).
## Shortest path predictor
With multiple agents a lot of conflicts will arise on the railway network. These conflicts arise because different agents want to occupy the same cells at the same time. Due to the nature of the railway network and the dynamics of the railway agents (they cannot turn around), conflicts have to be detected in advance in order to avoid them. If agents are facing each other and have no option to deviate from their paths, this is called a *deadlock*.
Therefore we introduce a simple prediction function that predicts the most likely (here: shortest) path of all the agents. Furthermore, the prediction is withdrawn if an agent stops and replaced by a prediction that the agent will stay put. The predictions allow the agents to detect possible conflicts before they happen and thus take countermeasures.
*ATTENTION*: This is a very basic implementation of a predictor. It will not solve all problems because it always predicts shortest paths and not alternative routes. It is up to you to come up with much more clever predictors to avoid conflicts!
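To make the notion of a predicted conflict concrete, here is a small standalone sketch (plain Python, not part of the Flatland API; the function and variable names are made up for illustration): given each agent's predicted cell per time step, any cell claimed by more than one agent at the same step is a potential conflict.
```
from collections import defaultdict

def find_predicted_conflicts(predicted_positions):
    """predicted_positions maps agent handle -> list of (row, col) cells, one per time step."""
    occupancy = defaultdict(list)  # (time_step, cell) -> agents predicted to be there
    for handle, path in predicted_positions.items():
        for t, cell in enumerate(path):
            occupancy[(t, cell)].append(handle)
    # Any cell claimed by more than one agent at the same time step is a potential conflict
    return {key: agents for key, agents in occupancy.items() if len(agents) > 1}

# Agents 0 and 1 are both predicted to be in cell (2, 3) at time step 2 -> potential conflict
print(find_predicted_conflicts({0: [(0, 3), (1, 3), (2, 3)],
                                1: [(4, 3), (3, 3), (2, 3)]}))
```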
## Tree Observation
Flatland offers three basic observations from the beginning. We encourage you to develop your own observations that are better suited for this specific task.
...
@@ -57,34 +62,41 @@ Each node is filled with information gathered along the path to the node. Curren
For training purposes the tree is flattened into a single array.
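As a rough illustration of what such a flattening could look like (this is a generic sketch, not Flatland's actual node layout; the dictionary-based nodes, the feature count and the padding value are assumptions), a tree with a fixed branching factor can be flattened by a depth-first traversal that pads missing branches so the array always has the same length:
```
BRANCHES = 4            # assumed: one child per possible movement direction
FEATURES_PER_NODE = 5   # assumed number of features per node for this sketch

def flatten_tree(node, depth, max_depth):
    """Depth-first flattening; absent subtrees are padded so the array length is always the same."""
    if node is None:
        missing_nodes = sum(BRANCHES ** i for i in range(max_depth - depth + 1))
        return [-float('inf')] * (FEATURES_PER_NODE * missing_nodes)
    flat = list(node["features"])
    if depth < max_depth:
        for branch in range(BRANCHES):
            flat += flatten_tree(node["children"].get(branch), depth + 1, max_depth)
    return flat
```
With this scheme every flattened observation has length `FEATURES_PER_NODE * sum(BRANCHES ** i for i in range(max_depth + 1))`, which matches the state size computed later when setting up the agent.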
## Training
### Setting up the environment
Let us now train a simple double dueling DQN agent to find its target and avoid conflicts on Flatland. We start by importing the necessary packages from Flatland. Note that we now also import a predictor from `flatland.envs.predictions`:
```
import numpy as np  # needed for the random level parameters below

from flatland.envs.generators import complex_rail_generator
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.predictions import ShortestPathPredictorForRailEnv
from flatland.envs.rail_env import RailEnv
from flatland.utils.rendertools import RenderTool
from utils.observation_utils import norm_obs_clip, split_tree
```
For this simple example we want to train on randomly generated levels using the `complex_rail_generator`. The training curriculum will use different sets of parameters throughout training to enhance the generalizability of the solution:
```
# Initialize a random map with a random number of agents
x_dim = np.random.randint(8, 20)
y_dim = np.random.randint(8, 20)
n_agents = np.random.randint(3, 8)
n_goals = n_agents + np.random.randint(0, 3)
min_dist = int(0.75 * min(x_dim, y_dim))
tree_depth = 3
```
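One simple way to realize such a curriculum (a sketch under the assumption that the environment is rebuilt from freshly sampled values every so often; the helper name and the resampling interval are made up for illustration) is to wrap the sampling in a small function:
```
# Hypothetical helper (not part of the original script): draw a fresh set of
# environment parameters so successive training environments differ in map
# size and number of agents.
def sample_env_parameters():
    x_dim = np.random.randint(8, 20)
    y_dim = np.random.randint(8, 20)
    n_agents = np.random.randint(3, 8)
    n_goals = n_agents + np.random.randint(0, 3)
    min_dist = int(0.75 * min(x_dim, y_dim))
    return x_dim, y_dim, n_agents, n_goals, min_dist

# A new environment could then be built from sample_env_parameters()
# every few hundred episodes as part of the curriculum.
```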
As mentioned above, for this experiment we are going to use the tree observation, so we load the observation builder. We also now use the predictor, which is passed to the observation builder:
```
# We are training an Agent using the Tree Observation (depth tree_depth) together with the shortest path predictor
observation = TreeObsForRailEnv(max_depth=tree_depth, predictor=ShortestPathPredictorForRailEnv())
```
And pass it as an argument to the environment setup:
...
@@ -101,31 +113,21 @@ env = RailEnv(width=x_dim,
We have now successfully set up the environment for training. To visualize it during training we also initialize the renderer:
```
env_renderer = RenderTool(env, gl="PILSVG", )
```
### Setting up the agent
To set up an appropriate agent we need the state and action space sizes. From the discussion above about the tree observation we end up with:
[**Adrian**: I just wonder why this is not done in a separate method of the observation, e.g. `get_state_size`; then we don't have to write down much more, and the user doesn't need to understand anything about the observation. I suggest moving this into the observation and having the base ObservationBuilder declare it as an abstract method. ...]
```
# Given the depth of the tree observation and the number of features per node we get the following state_size
# (each node has up to 4 children; features_per_node is the per-node feature count also used by split_tree)
nr_nodes = sum(4 ** i for i in range(tree_depth + 1))
state_size = features_per_node * nr_nodes

# The action space of flatland is 5 discrete actions
action_size = 5
```
In the `multi_agent_training.py` file you will find further variables that we initialize in order to keep track of the training progress.
Below you see example code to train an agent. It is important to note that we reshape and normalize the tree observation provided by the environment to facilitate training.
To do so, we use the utility functions `split_tree(tree=np.array(obs[a]), num_features_per_node=features_per_node, current_depth=0)` and `norm_obs_clip()`. Feel free to modify the normalization as you see fit.
...
@@ -143,81 +145,105 @@ To do so, we use the utility functions `split_tree(tree=np.array(obs[a]), num_fe
```
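The code omitted above assembles the normalized observation for each agent. A possible sketch of that step is shown below; the three return values of `split_tree`, the separate normalization of the distance and agent parts, and the concatenation order are assumptions made for illustration:
```
agent_obs = [None] * env.get_num_agents()
for a in range(env.get_num_agents()):
    data, distance, agent_data = split_tree(tree=np.array(obs[a]),
                                            num_features_per_node=features_per_node,
                                            current_depth=0)
    # Normalize the different parts of the tree observation separately
    data = norm_obs_clip(data, fixed_radius=observation_radius)
    distance = norm_obs_clip(distance)
    agent_data = np.clip(agent_data, -1, 1)
    agent_obs[a] = np.concatenate((data, distance, agent_data))
```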
We now use the normalized `agent_obs` for our training loop:
[**Adrian**: Same question as above: why is this not done in the observation class?]
```
for trials in range(1, n_trials + 1):
    # Reset environment
    obs = env.reset(True, True)
    if not Training:
        env_renderer.set_new_rail()

    # Split the observation tree into its parts and normalize the observation using the utility functions.
    data = norm_obs_clip(data, fixed_radius=observation_radius)

    scores_window.append(score / max_steps)  # save most recent score
    scores.append(np.mean(scores_window))
    dones_list.append((np.mean(done_window)))
```
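The fragment above only shows the bookkeeping at the end of an episode. As a hedged sketch of what the per-step interaction inside an episode might look like (the agent's `act` and `step` methods, the epsilon value `eps` and the `normalize_observation` helper are assumptions and may differ from the actual training script):
```
for step in range(max_steps):
    # Epsilon-greedy action selection for every agent, based on its normalized observation
    action_dict = {a: agent.act(agent_obs[a], eps=eps) for a in range(env.get_num_agents())}

    # Environment step: observations, rewards and done flags come back as dicts keyed by agent handle
    next_obs, all_rewards, done, _ = env.step(action_dict)

    for a in range(env.get_num_agents()):
        # normalize_observation stands for the split_tree/norm_obs_clip steps shown above
        agent_next_obs = normalize_observation(next_obs[a])
        agent.step(agent_obs[a], action_dict[a], all_rewards[a], agent_next_obs, done[a])
        agent_obs[a] = agent_next_obs
        score += all_rewards[a]

    if done['__all__']:
        break
```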
Running the `multi_agent_training.py` file trains a simple agent to navigate to any random target within the railway network. After running you should see a learning curve similar to this one: