diff --git a/torch_training/Getting_Started_Training.md b/torch_training/Getting_Started_Training.md
index b4a5a6fd085d143aeb8f419b96633d6f73f92cab..6acd488f241c2c45276ce4b5f9dd848120c51dd9
--- a/torch_training/Getting_Started_Training.md
+++ b/torch_training/Getting_Started_Training.md
@@ -54,7 +54,7 @@ Each node is filled with information gathered along the path to the node. Curren
 For training purposes the tree is flattened into a single array.
 
 ## Training
-
+### Setting up the environment
 Let us now train a simple double dueling DQN agent to navigate to its target on flatland. We start by importing the necessary flatland modules:
 ```
 from flatland.envs.generators import complex_rail_generator
@@ -62,4 +62,139 @@ from flatland.envs.observations import TreeObsForRailEnv
 from flatland.envs.rail_env import RailEnv
 from flatland.utils.rendertools import RenderTool
 from utils.observation_utils import norm_obs_clip, split_tree
-```
\ No newline at end of file
+import numpy as np
+```
+For this simple example we want to train on randomly generated levels using the `complex_rail_generator`. We use the following parameters for our first experiment:
+```
+# Parameters for the Environment
+x_dim = 10
+y_dim = 10
+n_agents = 1
+n_goals = 5
+min_dist = 5
+```
+As mentioned above, for this experiment we are going to use the tree observation, so we load the observation builder:
+```
+# We are training an Agent using the Tree Observation with depth 2
+observation_builder = TreeObsForRailEnv(max_depth=2)
+```
+
+and pass it as an argument to the environment setup:
+```
+env = RailEnv(width=x_dim,
+              height=y_dim,
+              rail_generator=complex_rail_generator(nr_start_goal=n_goals, nr_extra=5, min_dist=min_dist,
+                                                    max_dist=99999,
+                                                    seed=0),
+              obs_builder_object=observation_builder,
+              number_of_agents=n_agents)
+```
+We have now successfully set up the environment for training. To visualize it we also initialize the renderer:
+```
+env_renderer = RenderTool(env, gl="PILSVG")
+```
+### Setting up the agent
+To set up an appropriate agent we need the state and action space sizes. From the discussion above about the tree observation we end up with:
+```
+# Given the depth of the tree observation and the number of features per node we get the following state_size
+features_per_node = 9
+tree_depth = 2
+nr_nodes = 0
+for i in range(tree_depth + 1):
+    nr_nodes += np.power(4, i)
+state_size = features_per_node * nr_nodes
+
+# The action space of flatland is 5 discrete actions
+action_size = 5
+```
+In the `training_navigation.py` file you will find further variables that we initialize in order to keep track of the training progress.
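+A minimal sketch of these variables, together with the agent itself, could look like the snippet below. The concrete numbers are only illustrative, and the `Agent` import path and constructor arguments are assumptions based on the baseline `dueling_double_dqn.py` module rather than a fixed API, so adapt them to your local setup:
+```
+from collections import deque
+
+from dueling_double_dqn import Agent  # assumed import path of the baseline DQN agent
+
+# Epsilon-greedy exploration schedule (illustrative values)
+eps = 1.0
+eps_end = 0.005
+eps_decay = 0.998
+
+# Episode budget: number of training episodes and steps per episode (illustrative values)
+n_trials = 5000
+max_steps = int(3 * (x_dim + y_dim))
+Training = True
+
+# Per-agent containers used by the training loop below
+action_dict = dict()
+agent_obs = [None] * env.get_num_agents()
+agent_next_obs = [None] * env.get_num_agents()
+
+# Statistics to keep track of the training progress
+scores_window = deque(maxlen=100)
+scores = []
+action_prob = [0] * action_size
+
+# The double dueling DQN agent itself; the exact constructor arguments depend on your
+# version of dueling_double_dqn.py
+agent = Agent(state_size, action_size)
+```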
+Below you see example code to train an agent. It is important to note that we reshape and normalize the tree observation provided by the environment to facilitate training.
+To do so, we use the utility functions `split_tree(tree=np.array(obs[a]), num_features_per_node=features_per_node, current_depth=0)` and `norm_obs_clip()`. Feel free to modify the normalization as you see fit.
+```
+# Split the observation tree into its parts and normalize the observation using the utility functions.
+# Build agent specific local observation
+for a in range(env.get_num_agents()):
+    rail_data, distance_data, agent_data = split_tree(tree=np.array(obs[a]),
+                                                      num_features_per_node=features_per_node,
+                                                      current_depth=0)
+    rail_data = norm_obs_clip(rail_data)
+    distance_data = norm_obs_clip(distance_data)
+    agent_data = np.clip(agent_data, -1, 1)
+    agent_obs[a] = np.concatenate((np.concatenate((rail_data, distance_data)), agent_data))
+```
+We now use the normalized `agent_obs` for our training loop:
+
+```
+for trials in range(1, n_trials + 1):
+
+    # Reset environment
+    obs = env.reset(True, True)
+    if not Training:
+        env_renderer.set_new_rail()
+
+    # Split the observation tree into its parts and normalize the observation using the utility functions.
+    # Build agent specific local observation
+    for a in range(env.get_num_agents()):
+        rail_data, distance_data, agent_data = split_tree(tree=np.array(obs[a]),
+                                                          num_features_per_node=features_per_node,
+                                                          current_depth=0)
+        rail_data = norm_obs_clip(rail_data)
+        distance_data = norm_obs_clip(distance_data)
+        agent_data = np.clip(agent_data, -1, 1)
+        agent_obs[a] = np.concatenate((np.concatenate((rail_data, distance_data)), agent_data))
+
+    # Reset score and done
+    score = 0
+    env_done = 0
+
+    # Run episode
+    for step in range(max_steps):
+
+        # Only render when not training
+        if not Training:
+            env_renderer.renderEnv(show=True, show_observations=True)
+
+        # Choose the actions
+        for a in range(env.get_num_agents()):
+            if not Training:
+                eps = 0
+
+            action = agent.act(agent_obs[a], eps=eps)
+            action_dict.update({a: action})
+
+            # Count number of actions taken for statistics
+            action_prob[action] += 1
+
+        # Environment step
+        next_obs, all_rewards, done, _ = env.step(action_dict)
+
+        for a in range(env.get_num_agents()):
+            rail_data, distance_data, agent_data = split_tree(tree=np.array(next_obs[a]),
+                                                              num_features_per_node=features_per_node,
+                                                              current_depth=0)
+            rail_data = norm_obs_clip(rail_data)
+            distance_data = norm_obs_clip(distance_data)
+            agent_data = np.clip(agent_data, -1, 1)
+            agent_next_obs[a] = np.concatenate((np.concatenate((rail_data, distance_data)), agent_data))
+
+        # Update replay buffer and train agent
+        for a in range(env.get_num_agents()):
+
+            # Remember and train agent
+            if Training:
+                agent.step(agent_obs[a], action_dict[a], all_rewards[a], agent_next_obs[a], done[a])
+
+            # Update the current score
+            score += all_rewards[a] / env.get_num_agents()
+
+        agent_obs = agent_next_obs.copy()
+        if done['__all__']:
+            env_done = 1
+            break
+
+    # Epsilon decay
+    eps = max(eps_end, eps_decay * eps)  # decrease epsilon
+```
+
+Running the `training_navigation.py` file trains a simple agent to navigate to any random target within the railway network. After running, you should see a learning curve similar to this one:
+
+
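+If you want to keep the trained policy around, you can store the network weights at the end of training and reload them later to watch the agent with `Training = False` (which enables the rendering calls in the loop above and sets `eps = 0`). The attribute name `qnetwork_local` and the checkpoint path below are assumptions based on the baseline `dueling_double_dqn.py` implementation, so adjust them to your own agent class:
+```
+import torch
+
+# Save the trained policy network (attribute name and path are assumptions, see above)
+torch.save(agent.qnetwork_local.state_dict(), './Nets/navigation_checkpoint.pth')
+
+# ...and restore it before an evaluation run with Training = False
+agent.qnetwork_local.load_state_dict(torch.load('./Nets/navigation_checkpoint.pth'))
+```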