*pycache*
*ppo_policy*
torch_training/Nets/
# Default ignored files
/workspace.xml
MIT License
Copyright (c) 2019 SBB AG and AIcrowd
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
include AUTHORS.md
include CONTRIBUTING.rst
include changelog.md
include LICENSE
include README.md
include requirements_torch_training.txt
recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-include docs *.rst *.md conf.py *.jpg *.png *.gif
# ⚠️ Deprecated repository
This repository is deprecated! Please go to:
#### **https://gitlab.aicrowd.com/flatland/flatland-examples**
## Torch Training
The `torch_training` folder shows an example of how to train agents with a DQN implemented in PyTorch.
The links below provide introductions to training an agent on Flatland:
- Training an agent for navigation ([Introduction](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/Getting_Started_Training.md))
- Training multiple agents to avoid conflicts ([Introduction](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/Multi_Agent_Training_Intro.md))
Use these introductions to get familiar with the Flatland environment. Then build your own predictors, observations and agents to further improve performance and solve the most complex environments of the challenge.
With the above introductions you will be able to solve tasks like the one shown below, and more...
![Conflict_Avoidance](https://i.imgur.com/AvBHKaD.gif)
## Sequential Agent
This is a very simple baseline that shows how the `complex_level_generator` generates feasible network configurations.
If you run the `run_test.py` file you will see a simple agent that solves the level by sequentially running each agent along its shortest path.
This is very inefficient, but it solves all the instances generated by `complex_level_generator`. However, when scored in the AIcrowd competition, this agent fails because of the time it needs to solve an episode.
Here you see it in action:
![Sequential_Agent](https://i.imgur.com/DsbG6zK.gif)
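The core idea can be sketched roughly as follows (a hypothetical sketch, not the actual `run_test.py` code; the shortest-path policy and the stop action id are supplied by the caller, since their exact values depend on the Flatland version):

```python
# Hypothetical sketch of the sequential baseline: release one agent at a time
# along its shortest path while every other agent is told to stand still.
def run_sequentially(env, shortest_path_action, stop_action, max_steps=200):
    """env: a Flatland RailEnv;
    shortest_path_action(env, handle) -> next action along the agent's shortest path;
    stop_action: the action id that keeps an agent stopped."""
    env.reset()
    for handle in range(env.get_num_agents()):  # agents run strictly one after another
        for _ in range(max_steps):
            actions = {h: stop_action for h in range(env.get_num_agents())}
            actions[handle] = shortest_path_action(env, handle)
            _, _, dones, _ = env.step(actions)
            if dones[handle]:  # the active agent reached its target, release the next one
                break
```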
This repository allows you to run multi-agent training on the Rail Environment with the RLlib library.
It should be cloned inside the main flatland repository.
## Installation:
```sh
pip install ray
pip install gin-config
```
To start a grid search over some parameters, you can create a folder containing a config.gin file (see the example in `grid_search_configs/n_agents_grid_search/config.gin`).
Then, adapt the config.gin file path at the end of the `grid_search_train.py` file (a minimal sketch of that script's tail is shown below).
The results will be stored inside the folder, and the learning curves can be visualized in
TensorBoard:
```
tensorboard --logdir=/path/to/folder_containing_config_gin_file
```
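For reference, here is a minimal sketch of what the tail of such a launcher script can look like. It mirrors the pattern used by `train_experiment.py` further down in this repository; the folder path is a placeholder and `run_experiment` is reduced to a stub, so it only works with a config.gin that binds the parameters shown:

```python
import gin

@gin.configurable
def run_experiment(name="experiment", n_agents=1, local_dir=None):
    # Stub: the real run_experiment in train_experiment.py launches tune.run(...).
    print(f"would run '{name}' with {n_agents} agents, writing results to {local_dir}")

if __name__ == "__main__":
    config_folder = "grid_search_configs/n_agents_grid_search"  # folder containing config.gin
    gin.parse_config_file(config_folder + "/config.gin")
    run_experiment(local_dir=config_folder)  # every other parameter comes from config.gin
```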
## Gin config files
In each config.gin file, all the parameters of the `run_experiment` function, except `local_dir`, have to be specified.
For example, to set the number of agents initialized at the beginning of each simulation, add the following line:
```
run_experiment.n_agents = 2
```
If several numbers of agents should be explored during the experiment, one can pass the following value to the `n_agents` parameter:
```
run_experiment.n_agents = {"grid_search": [2,5]}
```
which tells the Tune library to try several values for this parameter.
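Under the hood, such a dictionary is handed to Ray Tune's grid search. Below is a minimal, self-contained sketch of the same mechanism with Tune used directly; the `toy_train` trainable is hypothetical and not part of this repository:

```python
import ray
from ray import tune

def toy_train(config, reporter):
    # Hypothetical trainable: report a metric so the stopping criterion can trigger.
    reporter(num_iterations_trained=1, score=0.1 * config["n_agents"])

if __name__ == "__main__":
    ray.init()
    tune.run(
        toy_train,
        stop={"num_iterations_trained": 1},
        # Same effect as run_experiment.n_agents = {"grid_search": [2, 5]} in config.gin:
        config={"n_agents": tune.grid_search([2, 5])},
    )
```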
To reference a class or an object within gin, first register it in the `train_experiment.py` script by adding the following line:
```
gin.external_configurable(TreeObsForRailEnv)
```
and then a `TreeObsForRailEnv` object can be referenced in the `config.gin` file:
```
run_experiment.obs_builder = {"grid_search": [@TreeObsForRailEnv(), @GlobalObsForRailEnv()]}
TreeObsForRailEnv.max_depth = 2
```
Note that `@TreeObsForRailEnv` references the class, while `@TreeObsForRailEnv()` instantiates an object of this class.
More documentation on how to use gin-config can be found in the library's GitHub repository: https://github.com/google/gin-config
from flatland.envs.rail_env import RailEnv
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.generators import random_rail_generator
from ray.rllib.utils.seed import seed as set_seed
import numpy as np
class RailEnvRLLibWrapper(MultiAgentEnv):
def __init__(self, config):
# width,
# height,
# rail_generator=random_rail_generator(),
# number_of_agents=1,
# obs_builder_object=TreeObsForRailEnv(max_depth=2)):
super(MultiAgentEnv, self).__init__()
if hasattr(config, "vector_index"):
vector_index = config.vector_index
else:
vector_index = 1
#self.rail_generator = config["rail_generator"](nr_start_goal=config['number_of_agents'], min_dist=5,
# nr_extra=30, seed=config['seed'] * (1+vector_index))
set_seed(config['seed'] * (1+vector_index))
#self.env = RailEnv(width=config["width"], height=config["height"],
self.env = RailEnv(width=10, height=20,
number_of_agents=config["number_of_agents"], obs_builder_object=config['obs_builder'])
self.env.load('/mount/SDC/flatland/baselines/torch_training/railway/complex_scene.pkl')
self.width = self.env.width
self.height = self.env.height
def reset(self):
self.agents_done = []
obs = self.env.reset(False, False)
o = dict()
# o['agents'] = obs
# obs[0] = [obs[0], np.ones((17, 17)) * 17]
# obs['global_obs'] = np.ones((17, 17)) * 17
self.rail = self.env.rail
self.agents = self.env.agents
self.agents_static = self.env.agents_static
self.dev_obs_dict = self.env.dev_obs_dict
return obs
def step(self, action_dict):
obs, rewards, dones, infos = self.env.step(action_dict)
# print(obs)
d = dict()
r = dict()
o = dict()
# print(self.agents_done)
# print(dones)
for agent, done in dones.items():
if agent not in self.agents_done:
if agent != '__all__':
o[agent] = obs[agent]
r[agent] = rewards[agent]
d[agent] = dones[agent]
for agent, done in dones.items():
if done and agent != '__all__':
self.agents_done.append(agent)
self.rail = self.env.rail
self.agents = self.env.agents
self.agents_static = self.env.agents_static
self.dev_obs_dict = self.env.dev_obs_dict
#print(obs)
#return obs, rewards, dones, infos
# oo = dict()
# oo['agents'] = o
# o['global'] = np.ones((17, 17)) * 17
# o[0] = [o[0], np.ones((17, 17)) * 17]
# o['global_obs'] = np.ones((17, 17)) * 17
# r['global_obs'] = 0
# d['global_obs'] = True
return o, r, d, infos
def get_agent_handles(self):
return self.env.get_agent_handles()
def get_num_agents(self):
return self.env.get_num_agents()
from ray.rllib.models import ModelCatalog, Model
from ray.rllib.models.misc import normc_initializer
import tensorflow as tf
class ConvModelGlobalObs(Model):
def _build_layers_v2(self, input_dict, num_outputs, options):
"""Define the layers of a custom model.
Arguments:
input_dict (dict): Dictionary of input tensors, including "obs",
"prev_action", "prev_reward", "is_training".
num_outputs (int): Output tensor must be of size
[BATCH_SIZE, num_outputs].
options (dict): Model options.
Returns:
(outputs, feature_layer): Tensors of size [BATCH_SIZE, num_outputs]
and [BATCH_SIZE, desired_feature_size].
When using dict or tuple observation spaces, you can access
the nested sub-observation batches here as well:
Examples:
>>> print(input_dict)
{'prev_actions': <tf.Tensor shape=(?,) dtype=int64>,
'prev_rewards': <tf.Tensor shape=(?,) dtype=float32>,
'is_training': <tf.Tensor shape=(), dtype=bool>,
'obs': (observation, features)
"""
# Convolutional Layer #1
Relu = tf.nn.relu
BatchNormalization = tf.layers.batch_normalization
Dropout = tf.layers.dropout
Dense = tf.contrib.layers.fully_connected
map_size = int(input_dict['obs'][0].shape[0])
N_CHANNELS = 96
conv1 = Relu(self.conv2d(input_dict['obs'], N_CHANNELS, 'valid', strides=(2, 2)))
# conv2 = Relu(self.conv2d(conv1, 64, 'valid'))
# conv3 = Relu(self.conv2d(conv2, 64, 'valid'))
conv2_flat = tf.reshape(conv1, [-1, int(N_CHANNELS * ((map_size-3 + 1)/2)**2)])
# conv4_feature = tf.concat((conv2_flat, input_dict['obs'][1]), axis=1)
s_fc1 = Relu(Dense(conv2_flat, 256))
layerN_minus_1 = Relu(Dense(s_fc1, 64))
layerN = Dense(layerN_minus_1, num_outputs)
return layerN, layerN_minus_1
def conv2d(self, x, out_channels, padding, strides=(1,1)):
return tf.layers.conv2d(x, out_channels, kernel_size=[3, 3], padding=padding,
use_bias=True, strides=strides)
class LightModel(Model):
def _build_layers_v2(self, input_dict, num_outputs, options):
"""Define the layers of a custom model.
Arguments:
input_dict (dict): Dictionary of input tensors, including "obs",
"prev_action", "prev_reward", "is_training".
num_outputs (int): Output tensor must be of size
[BATCH_SIZE, num_outputs].
options (dict): Model options.
Returns:
(outputs, feature_layer): Tensors of size [BATCH_SIZE, num_outputs]
and [BATCH_SIZE, desired_feature_size].
When using dict or tuple observation spaces, you can access
the nested sub-observation batches here as well:
Examples:
>>> print(input_dict)
{'prev_actions': <tf.Tensor shape=(?,) dtype=int64>,
'prev_rewards': <tf.Tensor shape=(?,) dtype=float32>,
'is_training': <tf.Tensor shape=(), dtype=bool>,
'obs': (observation, features)
"""
# print(input_dict)
# Convolutional Layer #1
self.sess = tf.get_default_session()
Relu = tf.nn.relu
BatchNormalization = tf.layers.batch_normalization
Dropout = tf.layers.dropout
Dense = tf.contrib.layers.fully_connected
#conv1 = Relu(self.conv2d(input_dict['obs'][0], 32, 'valid'))
conv1 = Relu(self.conv2d(input_dict['obs'], 32, 'valid'))
conv2 = Relu(self.conv2d(conv1, 16, 'valid'))
# conv3 = Relu(self.conv2d(conv2, 64, 'valid'))
conv4_flat = tf.reshape(conv2, [-1, 16 * (17-2*2)**2])
#conv4_feature = tf.concat((conv4_flat, input_dict['obs'][1]), axis=1)
s_fc1 = Relu(Dense(conv4_flat, 128, weights_initializer=normc_initializer(1.0)))
# layerN_minus_1 = Relu(Dense(s_fc1, 256, use_bias=False))
layerN = Dense(s_fc1, num_outputs, weights_initializer=normc_initializer(0.01))
return layerN, s_fc1
def conv2d(self, x, out_channels, padding):
return tf.layers.conv2d(x, out_channels, kernel_size=[3, 3], padding=padding, use_bias=True)
# weights_initializer=normc_initializer(1.0))
import numpy as np
from ray.rllib.models.preprocessors import Preprocessor
def max_lt(seq, val):
"""
Return the greatest item in seq for which item < val applies.
0 is returned if seq is empty or no item in seq is smaller than val.
"""
max = 0
idx = len(seq) - 1
while idx >= 0:
if seq[idx] < val and seq[idx] >= 0 and seq[idx] > max:
max = seq[idx]
idx -= 1
return max
def min_lt(seq, val):
"""
Return the smallest item in seq for which item > val applies.
np.inf is returned if seq is empty or no item in seq is greater than val.
"""
min = np.inf
idx = len(seq) - 1
while idx >= 0:
if seq[idx] > val and seq[idx] < min:
min = seq[idx]
idx -= 1
return min
def norm_obs_clip(obs, clip_min=-1, clip_max=1):
"""
Normalize an observation by the range between its (bounded) min and max values and clip the result.
:param obs: observation that should be normalized
:param clip_min: minimum value the normalized observation is clipped to
:param clip_max: maximum value the normalized observation is clipped to
:return: normalized and clipped observation
"""
max_obs = max(1, max_lt(obs, 1000))
min_obs = max(0, min_lt(obs, 0))
if max_obs == min_obs:
return np.clip(np.array(obs)/ max_obs, clip_min, clip_max)
norm = np.abs(max_obs - min_obs)
if norm == 0:
norm = 1.
return np.clip((np.array(obs)-min_obs)/ norm, clip_min, clip_max)
class CustomPreprocessor(Preprocessor):
def _init_shape(self, obs_space, options):
return (111,)
def transform(self, observation):
if len(observation) == 111:
return norm_obs_clip(observation)
else:
return observation
class ConvModelPreprocessor(Preprocessor):
def _init_shape(self, obs_space, options):
out_shape = (obs_space[0].shape[0], obs_space[0].shape[1], sum([space.shape[2] for space in obs_space]))
return out_shape
def transform(self, observation):
return np.concatenate([observation[0],
observation[1],
observation[2]], axis=2)
# class NoPreprocessor:
# def _init_shape(self, obs_space, options):
# num_features = 0
# for space in obs_space:
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 50
run_experiment.hidden_sizes = [32, 32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_{config[n_agents]}_agents_conv_model_{config[conv_model]}_"
run_experiment.horizon = 50
run_experiment.seed = 123
#run_experiment.conv_model = {"grid_search": [True, False]}
run_experiment.conv_model = False
#run_experiment.obs_builder = {"grid_search": [@GlobalObsForRailEnv(), @GlobalObsForRailEnvDirectionDependent]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
run_experiment.obs_builder = @TreeObsForRailEnv()
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.entropy_coeff = 0.01
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 100
run_experiment.hidden_sizes = {"grid_search": [[32, 32], [64, 64], [128, 128], [256, 256]]}
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_entropy_coeff_{config[entropy_coeff]}_{config[hidden_sizes][0]}_hidden_sizes_"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.entropy_coeff = {"grid_search": [1e-3, 1e-2, 0]}
run_experiment.obs_builder = {"grid_search": [@LocalObsForRailEnv()]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 100
run_experiment.hidden_sizes = [32,32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = {"grid_search": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_entropy_coeff_{config[entropy_coeff]}_{config[n_agents]}_agents_"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.entropy_coeff = {"grid_search": [1e-3, 1e-2, 0]}
run_experiment.obs_builder = {"grid_search": [@TreeObsForRailEnv()]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 100
run_experiment.hidden_sizes = [32, 32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_{config[n_agents]}_agents"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.obs_builder = {"grid_search": [@LocalObsForRailEnv()]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.name = "observation_benchmark_loaded_env_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 50
run_experiment.hidden_sizes = [32, 32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}"#_entropy_coeff_{config[entropy_coeff]}_{config[hidden_sizes][0]}_hidden_sizes_"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.conv_model = False
run_experiment.entropy_coeff = 1e-2
run_experiment.obs_builder = @TreeObsForRailEnv()#{"grid_search": [@LocalObsForRailEnv(), @TreeObsForRailEnv(), @GlobalObsForRailEnv(), @GlobalObsForRailEnvDirectionDependent()]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
from baselines.RLLib_training.RailEnvRLLibWrapper import RailEnvRLLibWrapper
import gym
from flatland.envs.generators import complex_rail_generator
# Import PPO trainer: we can replace these imports by any other trainer from RLLib.
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG
from ray.rllib.agents.ppo.ppo import PPOTrainer as Trainer
# from baselines.CustomPPOTrainer import PPOTrainer as Trainer
from ray.rllib.agents.ppo.ppo_policy_graph import PPOPolicyGraph as PolicyGraph
# from baselines.CustomPPOPolicyGraph import CustomPPOPolicyGraph as PolicyGraph
from ray.rllib.models import ModelCatalog
from ray.tune.logger import pretty_print
from baselines.RLLib_training.custom_preprocessors import CustomPreprocessor, ConvModelPreprocessor
from baselines.RLLib_training.custom_models import ConvModelGlobalObs
import ray
import numpy as np
from ray.tune.logger import UnifiedLogger
import tempfile
import gin
from ray import tune
from ray.rllib.utils.seed import seed as set_seed
from flatland.envs.observations import TreeObsForRailEnv, GlobalObsForRailEnv,\
LocalObsForRailEnv, GlobalObsForRailEnvDirectionDependent
from flatland.utils.rendertools import RenderTool
import time
gin.external_configurable(TreeObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnv)
gin.external_configurable(LocalObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnvDirectionDependent)
from ray.rllib.models.preprocessors import TupleFlatteningPreprocessor
ModelCatalog.register_custom_preprocessor("tree_obs_prep", CustomPreprocessor)
ModelCatalog.register_custom_preprocessor("global_obs_prep", TupleFlatteningPreprocessor)
ModelCatalog.register_custom_preprocessor("conv_obs_prep", ConvModelPreprocessor)
ModelCatalog.register_custom_model("conv_model", ConvModelGlobalObs)
ray.init()#object_store_memory=150000000000, redis_max_memory=30000000000)
CHECKPOINT_PATH = '/home/guillaume/EPFL/Master_Thesis/flatland/baselines/RLLib_training/experiment_configs/' \
'conv_model_test/ppo_policy_TreeObsForRailEnv_5_agents_conv_model_False_ial1g3w9/checkpoint_51/checkpoint-51'
N_EPISODES = 3
N_STEPS_PER_EPISODE = 50
def render_training_result(config):
print('Init Env')
set_seed(config['seed'], config['seed'], config['seed'])
transition_probability = [15, # empty cell - Case 0
5, # Case 1 - straight
5, # Case 2 - simple switch
1, # Case 3 - diamond crossing
1, # Case 4 - single slip
1, # Case 5 - double slip
1, # Case 6 - symmetrical
0, # Case 7 - dead end
1, # Case 1b (8) - simple turn right
1, # Case 1c (9) - simple turn left
1] # Case 2b (10) - simple switch mirrored
# Example configuration to generate a random rail
env_config = {"width": config['map_width'],
"height": config['map_height'],
"rail_generator": complex_rail_generator,
"number_of_agents": config['n_agents'],
"seed": config['seed'],
"obs_builder": config['obs_builder']}
# Observation space and action space definitions
if isinstance(config["obs_builder"], TreeObsForRailEnv):
obs_space = gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(105,))
preprocessor = "tree_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnv):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 8)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnvDirectionDependent):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 5)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], LocalObsForRailEnv):
view_radius = config["obs_builder"].view_radius
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 16)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 2)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 4)),
gym.spaces.Box(low=0, high=1, shape=(4,))))
preprocessor = "global_obs_prep"
else:
raise ValueError("Undefined observation space")
act_space = gym.spaces.Discrete(4)
# Dict with the different policies to train
policy_graphs = {
config['policy_folder_name'].format(**locals()): (PolicyGraph, obs_space, act_space, {})
}
def policy_mapping_fn(agent_id):
return config['policy_folder_name'].format(**locals())
# Trainer configuration
trainer_config = DEFAULT_CONFIG.copy()
if config['conv_model']:
trainer_config['model'] = {"custom_model": "conv_model", "custom_preprocessor": preprocessor}
else:
trainer_config['model'] = {"fcnet_hiddens": config['hidden_sizes'], "custom_preprocessor": preprocessor}
trainer_config['multiagent'] = {"policy_graphs": policy_graphs,
"policy_mapping_fn": policy_mapping_fn,
"policies_to_train": list(policy_graphs.keys())}
trainer_config["horizon"] = config['horizon']
trainer_config["num_workers"] = 0
trainer_config["num_cpus_per_worker"] = 3
trainer_config["num_gpus"] = 0
trainer_config["num_gpus_per_worker"] = 0
trainer_config["num_cpus_for_driver"] = 1
trainer_config["num_envs_per_worker"] = 1
trainer_config['entropy_coeff'] = config['entropy_coeff']
trainer_config["env_config"] = env_config
trainer_config["batch_mode"] = "complete_episodes"
trainer_config['simple_optimizer'] = False
trainer_config['postprocess_inputs'] = True
trainer_config['log_level'] = 'WARN'
env = RailEnvRLLibWrapper(env_config)
trainer = Trainer(env=RailEnvRLLibWrapper, config=trainer_config)
trainer.restore(CHECKPOINT_PATH)
policy = trainer.get_policy(config['policy_folder_name'].format(**locals()))
env_renderer = RenderTool(env, gl="PIL", show=True)
for episode in range(N_EPISODES):
observation = env.reset()
for i in range(N_STEPS_PER_EPISODE):
action, _, infos = policy.compute_actions(list(observation.values()), [])
env_renderer.renderEnv(show=True, frames=True, iEpisode=episode, iStep=i,
action_dict=action)
logits = infos['behaviour_logits']
actions = dict()
for j, logit in enumerate(logits):
actions[j] = np.argmax(logit)
time.sleep(1)
observation, _, _, _ = env.step(action)
env_renderer.close_window()
@gin.configurable
def run_experiment(name, num_iterations, n_agents, hidden_sizes, save_every,
map_width, map_height, horizon, policy_folder_name, local_dir, obs_builder,
entropy_coeff, seed, conv_model):
render_training_result(
config={"n_agents": n_agents,
"hidden_sizes": hidden_sizes, # Array containing the sizes of the network layers
"save_every": save_every,
"map_width": map_width,
"map_height": map_height,
"local_dir": local_dir,
"horizon": horizon, # Max number of time steps
'policy_folder_name': policy_folder_name,
"obs_builder": obs_builder,
"entropy_coeff": entropy_coeff,
"seed": seed,
"conv_model": conv_model
})
if __name__ == '__main__':
gin.external_configurable(tune.grid_search)
dir = '/home/guillaume/EPFL/Master_Thesis/flatland/baselines/RLLib_training/experiment_configs/conv_model_test' # To Modify
gin.parse_config_file(dir + '/config.gin')
run_experiment(local_dir=dir)
from flatland.envs import rail_env
from flatland.envs.rail_env import random_rail_generator
from baselines.RailEnvRLLibWrapper import RailEnvRLLibWrapper
from flatland.utils.rendertools import RenderTool
import random
import gym
import matplotlib.pyplot as plt
from flatland.envs.generators import complex_rail_generator
import ray.rllib.agents.ppo.ppo as ppo
import ray.rllib.agents.dqn.dqn as dqn
from ray.rllib.agents.ppo.ppo import PPOTrainer
from ray.rllib.agents.dqn.dqn import DQNTrainer
from ray.rllib.agents.ppo.ppo_policy_graph import PPOPolicyGraph
from ray.rllib.agents.dqn.dqn_policy_graph import DQNPolicyGraph
from ray.tune.registry import register_env
from ray.rllib.models import ModelCatalog
from ray.tune.logger import pretty_print
from baselines.CustomPreprocessor import CustomPreprocessor
import ray
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv
# RailEnv.__bases__ = (RailEnv.__bases__[0], MultiAgentEnv)
ModelCatalog.register_custom_preprocessor("my_prep", CustomPreprocessor)
ray.init()
def train(config):
print('Init Env')
random.seed(1)
np.random.seed(1)
transition_probability = [15, # empty cell - Case 0
5, # Case 1 - straight
5, # Case 2 - simple switch
1, # Case 3 - diamond crossing
1, # Case 4 - single slip
1, # Case 5 - double slip
1, # Case 6 - symmetrical
0, # Case 7 - dead end
1, # Case 1b (8) - simple turn right
1, # Case 1c (9) - simple turn left
1] # Case 2b (10) - simple switch mirrored
# Example generate a random rail
"""
env = RailEnv(width=10,
height=10,
rail_generator=random_rail_generator(cell_type_relative_proportion=transition_probability),
number_of_agents=1)
"""
env_config = {"width": 20,
"height":20,
"rail_generator":complex_rail_generator(nr_start_goal=5, min_dist=5, max_dist=99999, seed=0),
"number_of_agents":5}
"""
env = RailEnv(width=20,
height=20,
rail_generator=rail_from_list_of_saved_GridTransitionMap_generator(
['../notebooks/temp.npy']),
number_of_agents=3)
"""
# if config['render']:
# env_renderer = RenderTool(env, gl="QT")
# plt.figure(figsize=(5,5))
obs_space = gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(105,))
act_space = gym.spaces.Discrete(4)
# Dict with the different policies to train
policy_graphs = {
"ppo_policy": (PPOPolicyGraph, obs_space, act_space, {})
}
def policy_mapping_fn(agent_id):
return f"ppo_policy"
agent_config = ppo.DEFAULT_CONFIG.copy()
agent_config['model'] = {"fcnet_hiddens": [32, 32], "custom_preprocessor": "my_prep"}
agent_config['multiagent'] = {"policy_graphs": policy_graphs,
"policy_mapping_fn": policy_mapping_fn,
"policies_to_train": list(policy_graphs.keys())}
agent_config["horizon"] = 50
agent_config["num_workers"] = 0
# agent_config["sample_batch_size"]: 1000
#agent_config["num_cpus_per_worker"] = 40
#agent_config["num_gpus"] = 2.0
#agent_config["num_gpus_per_worker"] = 2.0
#agent_config["num_cpus_for_driver"] = 5
#agent_config["num_envs_per_worker"] = 15
agent_config["env_config"] = env_config
#agent_config["batch_mode"] = "complete_episodes"
ppo_trainer = PPOTrainer(env=RailEnvRLLibWrapper, config=agent_config)
for i in range(100000 + 2):
print("== Iteration", i, "==")
print("-- PPO --")
print(pretty_print(ppo_trainer.train()))
# if i % config['save_every'] == 0:
# checkpoint = ppo_trainer.save()
# print("checkpoint saved at", checkpoint)
train({})
from baselines.RLLib_training.RailEnvRLLibWrapper import RailEnvRLLibWrapper
import gym
from flatland.envs.generators import complex_rail_generator
# Import PPO trainer: we can replace these imports by any other trainer from RLLib.
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG
from ray.rllib.agents.ppo.ppo import PPOTrainer as Trainer
# from baselines.CustomPPOTrainer import PPOTrainer as Trainer
from ray.rllib.agents.ppo.ppo_policy_graph import PPOPolicyGraph as PolicyGraph
# from baselines.CustomPPOPolicyGraph import CustomPPOPolicyGraph as PolicyGraph
from ray.rllib.models import ModelCatalog
from ray.tune.logger import pretty_print
from baselines.RLLib_training.custom_preprocessors import CustomPreprocessor, ConvModelPreprocessor
from baselines.RLLib_training.custom_models import ConvModelGlobalObs
import ray
import numpy as np
from ray.tune.logger import UnifiedLogger
import tempfile
import gin
from ray import tune
from ray.rllib.utils.seed import seed as set_seed
from flatland.envs.observations import TreeObsForRailEnv, GlobalObsForRailEnv,\
LocalObsForRailEnv, GlobalObsForRailEnvDirectionDependent
gin.external_configurable(TreeObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnv)
gin.external_configurable(LocalObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnvDirectionDependent)
from ray.rllib.models.preprocessors import TupleFlatteningPreprocessor
ModelCatalog.register_custom_preprocessor("tree_obs_prep", CustomPreprocessor)
ModelCatalog.register_custom_preprocessor("global_obs_prep", TupleFlatteningPreprocessor)
ModelCatalog.register_custom_preprocessor("conv_obs_prep", ConvModelPreprocessor)
ModelCatalog.register_custom_model("conv_model", ConvModelGlobalObs)
ray.init()#object_store_memory=150000000000, redis_max_memory=30000000000)
def train(config, reporter):
print('Init Env')
set_seed(config['seed'], config['seed'], config['seed'])
config['map_width']= 20
config['map_height']= 10
config['n_agents'] = 8
# Example configuration to generate a random rail
env_config = {"width": config['map_width'],
"height": config['map_height'],
"rail_generator": complex_rail_generator,
"number_of_agents": config['n_agents'],
"seed": config['seed'],
"obs_builder": config['obs_builder']}
# Observation space and action space definitions
if isinstance(config["obs_builder"], TreeObsForRailEnv):
obs_space = gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(111,))
preprocessor = "tree_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnv):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 8)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnvDirectionDependent):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 5)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], LocalObsForRailEnv):
view_radius = config["obs_builder"].view_radius
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 16)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 2)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 4)),
gym.spaces.Box(low=0, high=1, shape=(4,))))
preprocessor = "global_obs_prep"
else:
raise ValueError("Undefined observation space")
act_space = gym.spaces.Discrete(4)
# Dict with the different policies to train
policy_graphs = {
config['policy_folder_name'].format(**locals()): (PolicyGraph, obs_space, act_space, {})
}
def policy_mapping_fn(agent_id):
return config['policy_folder_name'].format(**locals())
# Trainer configuration
trainer_config = DEFAULT_CONFIG.copy()
if config['conv_model']:
trainer_config['model'] = {"custom_model": "conv_model", "custom_preprocessor": preprocessor}
else:
trainer_config['model'] = {"fcnet_hiddens": config['hidden_sizes'], "custom_preprocessor": preprocessor}
trainer_config['multiagent'] = {"policy_graphs": policy_graphs,
"policy_mapping_fn": policy_mapping_fn,
"policies_to_train": list(policy_graphs.keys())}
trainer_config["horizon"] = config['horizon']
trainer_config["num_workers"] = 0
trainer_config["num_cpus_per_worker"] = 3
trainer_config["num_gpus"] = 0
trainer_config["num_gpus_per_worker"] = 0
trainer_config["num_cpus_for_driver"] = 1
trainer_config["num_envs_per_worker"] = 1
trainer_config['entropy_coeff'] = config['entropy_coeff']
trainer_config["env_config"] = env_config
trainer_config["batch_mode"] = "complete_episodes"
trainer_config['simple_optimizer'] = False
trainer_config['postprocess_inputs'] = True
trainer_config['log_level'] = 'WARN'
def logger_creator(conf):
"""Creates a Unified logger with a default logdir prefix
containing the agent name and the env id
"""
logdir = config['policy_folder_name'].format(**locals())
logdir = tempfile.mkdtemp(
prefix=logdir, dir=config['local_dir'])
return UnifiedLogger(conf, logdir, None)
logger = logger_creator
trainer = Trainer(env=RailEnvRLLibWrapper, config=trainer_config, logger_creator=logger)
for i in range(100000 + 2):
print("== Iteration", i, "==")
print(pretty_print(trainer.train()))
if i % config['save_every'] == 0:
checkpoint = trainer.save()
print("checkpoint saved at", checkpoint)
reporter(num_iterations_trained=trainer._iteration)
@gin.configurable
def run_experiment(name, num_iterations, n_agents, hidden_sizes, save_every,
map_width, map_height, horizon, policy_folder_name, local_dir, obs_builder,
entropy_coeff, seed, conv_model):
tune.run(
train,
name=name,
stop={"num_iterations_trained": num_iterations},
config={"n_agents": n_agents,
"hidden_sizes": hidden_sizes, # Array containing the sizes of the network layers
"save_every": save_every,
"map_width": map_width,
"map_height": map_height,
"local_dir": local_dir,
"horizon": horizon, # Max number of time steps
'policy_folder_name': policy_folder_name,
"obs_builder": obs_builder,
"entropy_coeff": entropy_coeff,
"seed": seed,
"conv_model": conv_model
},
resources_per_trial={
"cpu": 2,
"gpu": 0.0
},
local_dir=local_dir
)
if __name__ == '__main__':
gin.external_configurable(tune.grid_search)
dir = '/mount/SDC/flatland/baselines/RLLib_training/experiment_configs/observation_benchmark_loaded_env' # To Modify
gin.parse_config_file(dir + '/config.gin')
run_experiment(local_dir=dir)
{'Test_0':[20,20,20,3],
'Test_1':[10,10,3,4321],
'Test_2':[10,10,5,123],
'Test_3':[50,50,5,21],
'Test_4':[50,50,20,85],
'Test_5':[100,100,5,436],
'Test_6':[100,100,20,6487],
'Test_7':[100,100,50,567],
'Test_8':[100,10,20,3245],
'Test_9':[10,100,20,632]
}
#ray==0.7.0
gym==0.12.5
opencv-python==4.1.0.25
#tensorflow==1.13.1
lz4==2.1.10
gin-config==0.1.4
git+https://gitlab.aicrowd.com/flatland/flatland.git
importlib-metadata>=0.17
importlib_resources>=1.0.2
torch>=1.1.0