Each node is filled with information gathered along the path to the node. Currently each node contains 9 features.
For training purposes the tree is flattened into a single array (with depth 2 and 9 features per node this yields 9 × (1 + 4 + 16) = 189 values, as computed in the state-size snippet below).
## Training
### Setting up the environment
Let us now train a simple double dueling DQN agent to navigate to its target on Flatland. We start by importing the necessary modules:
```
import numpy as np

from flatland.envs.generators import complex_rail_generator
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.rail_env import RailEnv
from flatland.utils.rendertools import RenderTool
from utils.observation_utils import norm_obs_clip, split_tree
```
For this simple example we want to train on randomly generated levels using the `complex_rail_generator`. We use the following parameters for our first experiment:
```
# Parameters for the environment
x_dim = 10    # width of the grid
y_dim = 10    # height of the grid
n_agents = 1  # number of trains
n_goals = 5   # number of start-goal pairs the generator creates
min_dist = 5  # minimum distance between a start and its goal
```
As mentioned above, for this experiment we are going to use the tree observation and thus we load the observation builder:
```
# We are training an Agent using the Tree Observation with depth 2
observation_builder = TreeObsForRailEnv(max_depth=2)
```
And pass it as an argument to the environment setup:
```
env = RailEnv(width=x_dim,
              height=y_dim,
              rail_generator=complex_rail_generator(nr_start_goal=n_goals,
                                                    nr_extra=5,
                                                    min_dist=min_dist,
                                                    max_dist=99999,
                                                    seed=0),
              obs_builder_object=observation_builder,
              number_of_agents=n_agents)
```
We have now successfully set up the environment for training. To visualize it we also initialize the renderer:
```
env_renderer = RenderTool(env, gl="PILSVG")
```
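If you want to inspect a freshly generated level before training starts, you can reset the environment once and render a single frame, using the same `renderEnv` call that appears in the training loop below:
```
# Quick sanity check: generate a level and draw it once
obs = env.reset()
env_renderer.renderEnv(show=True, show_observations=True)
```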
### Setting up the agent
To set up an appropriate agent we need the sizes of the state and action spaces. From the discussion of the tree observation above we end up with:
```
# Given the depth of the tree observation and the number of features per node we get the following state_size
features_per_node = 9
tree_depth = 2
nr_nodes = 0
for i in range(tree_depth + 1):
    nr_nodes += np.power(4, i)             # 1 + 4 + 16 = 21 nodes for a depth-2 tree
state_size = features_per_node * nr_nodes  # 9 * 21 = 189

# The action space of flatland is 5 discrete actions
action_size = 5
```
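The `agent` used in the training loop below is the double dueling DQN agent from the baselines repository; its implementation is not reproduced here. To give an idea of the dueling part, here is a minimal, illustrative PyTorch sketch of a dueling Q-network, not the repository's actual code:
```
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Illustrative dueling Q-network: separate value and advantage streams."""

    def __init__(self, state_size, action_size, hidden_size=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_size, hidden_size), nn.ReLU())
        self.value = nn.Linear(hidden_size, 1)                 # state value V(s)
        self.advantage = nn.Linear(hidden_size, action_size)   # advantages A(s, a)

    def forward(self, state):
        x = self.shared(state)
        v = self.value(x)
        a = self.advantage(x)
        # Combine the streams; subtracting the mean advantage keeps Q identifiable
        return v + a - a.mean(dim=1, keepdim=True)
```
The "double" part refers to using the online network to select the greedy action and the target network to evaluate it when computing the TD target, which reduces Q-value overestimation.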
In the `training_navigation.py` file you will find further variables that we initialize in order to keep track of the training progress.
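For reference, the training loop below assumes bookkeeping variables along these lines; the names match the loop, but the values shown here are illustrative, not the ones from the repository:
```
n_trials = 1000                              # number of training episodes
max_steps = 3 * (x_dim + y_dim)              # per-episode step budget (illustrative)
eps, eps_end, eps_decay = 1.0, 0.005, 0.998  # epsilon-greedy schedule (illustrative)
Training = True                              # set to False to render a trained agent
action_dict = dict()                         # maps agent handle -> chosen action
action_prob = [0] * action_size              # counts how often each action is taken
agent_obs = [None] * env.get_num_agents()    # current normalized observations
agent_next_obs = [None] * env.get_num_agents()
```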
Below you see example code to train an agent. It is important to note that we reshape and normalize the tree observation provided by the environment to facilitate training.
To do so, we use the utility functions `split_tree(tree=np.array(obs[a]), num_features_per_node=features_per_node, current_depth=0)` and `norm_obs_clip()`. Feel free to modify the normalization as you see fit.
```
# Split the observation tree into its parts and normalize the observation using the utility functions.
# Build agent specific local observation
for a in range(env.get_num_agents()):
    rail_data, distance_data, agent_data = split_tree(tree=np.array(obs[a]),
                                                      num_features_per_node=features_per_node,
                                                      current_depth=0)
    rail_data = norm_obs_clip(rail_data)
    distance_data = norm_obs_clip(distance_data)
    agent_data = np.clip(agent_data, -1, 1)
    agent_obs[a] = np.concatenate((np.concatenate((rail_data, distance_data)), agent_data))
```
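`norm_obs_clip` is provided by `utils/observation_utils.py`. For intuition only, a simplified stand-in could look like the following; this is a hypothetical sketch, not the shipped implementation:
```
import numpy as np

def norm_obs_clip_sketch(obs, clip_min=-1, clip_max=1):
    # Hypothetical sketch: ignore sentinel "infinity" entries, scale the rest
    # to [0, 1], then clip everything into [clip_min, clip_max].
    obs = np.asarray(obs, dtype=float)
    finite = obs[obs < 1000]   # assume very large values mark unreachable branches
    max_obs = finite.max() if finite.size else 1.0
    return np.clip(obs / max(max_obs, 1.0), clip_min, clip_max)
```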
We now use the normalized `agent_obs` for our training loop:
```
for trials in range(1, n_trials + 1):

    # Reset environment
    obs = env.reset(True, True)
    if not Training:
        env_renderer.set_new_rail()

    # Split the observation tree into its parts and normalize the observation using the utility functions.
    # Build agent specific local observation
    for a in range(env.get_num_agents()):
        rail_data, distance_data, agent_data = split_tree(tree=np.array(obs[a]),
                                                          num_features_per_node=features_per_node,
                                                          current_depth=0)
        rail_data = norm_obs_clip(rail_data)
        distance_data = norm_obs_clip(distance_data)
        agent_data = np.clip(agent_data, -1, 1)
        agent_obs[a] = np.concatenate((np.concatenate((rail_data, distance_data)), agent_data))

    # Reset score and done
    score = 0
    env_done = 0

    # Run episode
    for step in range(max_steps):

        # Only render when not training
        if not Training:
            env_renderer.renderEnv(show=True, show_observations=True)

        # Choose the actions
        for a in range(env.get_num_agents()):
            if not Training:
                eps = 0

            action = agent.act(agent_obs[a], eps=eps)
            action_dict.update({a: action})

            # Count the number of actions taken for statistics
            action_prob[action] += 1

        # Environment step
        next_obs, all_rewards, done, _ = env.step(action_dict)

        for a in range(env.get_num_agents()):
            rail_data, distance_data, agent_data = split_tree(tree=np.array(next_obs[a]),
                                                              num_features_per_node=features_per_node,
                                                              current_depth=0)
            rail_data = norm_obs_clip(rail_data)
            distance_data = norm_obs_clip(distance_data)
            agent_data = np.clip(agent_data, -1, 1)
            agent_next_obs[a] = np.concatenate((np.concatenate((rail_data, distance_data)), agent_data))

        # Update replay buffer and train agent
        for a in range(env.get_num_agents()):

            # Remember and train agent
            if Training:
                agent.step(agent_obs[a], action_dict[a], all_rewards[a], agent_next_obs[a], done[a])

            # Update the current score
            score += all_rewards[a] / env.get_num_agents()

        agent_obs = agent_next_obs.copy()
        if done['__all__']:
            env_done = 1
            break

    # Epsilon decay
    eps = max(eps_end, eps_decay * eps)  # decrease epsilon after each episode
```
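As a quick sanity check of the schedule: with the illustrative `eps_decay = 0.998` from above and `eps` starting at 1.0, epsilon drops to roughly 0.37 after 500 episodes and to about 0.135 after 1000 (0.998^1000 ≈ 0.135), so exploration fades gradually rather than switching off abruptly.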
Running the `training_navigation.py` file trains a simple agent to navigate to any random target within the railway network. After training you should see a learning curve similar to this one:
*(learning curve plot)*