*pycache*
*ppo_policy*
torch_training/Nets/
# Default ignored files
/workspace.xml
MIT License
Copyright (c) 2019 SBB AG and AIcrowd
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
include AUTHORS.md
include CONTRIBUTING.rst
include changelog.md
include LICENSE
include README.md
include requirements_torch_training.txt
recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
recursive-include docs *.rst *.md conf.py *.jpg *.png *.gif
# ⚠️ Deprecated repository
This repository is deprecated! Please go to:
#### **https://gitlab.aicrowd.com/flatland/flatland-examples**
## Torch Training
The `torch_training` folder shows an example of how to train agents with a DQN implemented in PyTorch.
The links below provide introductions to training an agent on Flatland:
- Training an agent for navigation ([Introduction](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/Getting_Started_Training.md))
- Training multiple agents to avoid conflicts ([Introduction](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/Multi_Agent_Training_Intro.md))
Use these introductions to get familiar with the Flatland environment. Then build your own predictors, observations and agents to further improve performance and solve the most complex environments of the challenge.
With the above introductions you will be able to solve tasks like the one shown below, and more...
![Conflict_Avoidance](https://i.imgur.com/AvBHKaD.gif)
## Sequential Agent
This is a very simple baseline that shows how the `complex_level_generator` generates feasible network configurations.
If you run the `run_test.py` file you will see a simple agent that solves the level by sequentially running each agent along its shortest path.
This is very inefficient, but it solves all the instances generated by `complex_level_generator`. However, when scored in the AIcrowd competition, this agent fails because of the time it needs to solve an episode.
Here you see it in action:
![Sequential_Agent](https://i.imgur.com/DsbG6zK.gif)
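The core idea can be sketched roughly as follows (a hypothetical sketch, not the actual `run_test.py` code; the shortest-path policy and the stop action id are supplied by the caller, since their exact values depend on the Flatland version):

```python
# Hypothetical sketch of the sequential baseline: release one agent at a time
# along its shortest path while every other agent is told to stand still.
def run_sequentially(env, shortest_path_action, stop_action, max_steps=200):
    """env: a Flatland RailEnv;
    shortest_path_action(env, handle) -> next action along the agent's shortest path;
    stop_action: the action id that keeps an agent stopped."""
    env.reset()
    for handle in range(env.get_num_agents()):  # agents run strictly one after another
        for _ in range(max_steps):
            actions = {h: stop_action for h in range(env.get_num_agents())}
            actions[handle] = shortest_path_action(env, handle)
            _, _, dones, _ = env.step(actions)
            if dones[handle]:  # the active agent reached its target, release the next one
                break
```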
This repository allows you to run multi-agent training on the Rail Environment with the RLlib library.
It should be cloned inside the main flatland repository.
## Installation:
```sh
pip install ray
pip install gin-config
```
To start a grid search over some parameters, you can create a folder containing a config.gin file (see the example in `grid_search_configs/n_agents_grid_search/config.gin`).
Then, adapt the config.gin file path at the end of the `grid_search_train.py` file (a minimal sketch of that script's tail is shown below).
The results will be stored inside the folder, and the learning curves can be visualized in
TensorBoard:
```
tensorboard --logdir=/path/to/folder_containing_config_gin_file
```
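For reference, here is a minimal sketch of what the tail of such a launcher script can look like. It mirrors the pattern used by `train_experiment.py` further down in this repository; the folder path is a placeholder and `run_experiment` is reduced to a stub, so it only works with a config.gin that binds the parameters shown:

```python
import gin

@gin.configurable
def run_experiment(name="experiment", n_agents=1, local_dir=None):
    # Stub: the real run_experiment in train_experiment.py launches tune.run(...).
    print(f"would run '{name}' with {n_agents} agents, writing results to {local_dir}")

if __name__ == "__main__":
    config_folder = "grid_search_configs/n_agents_grid_search"  # folder containing config.gin
    gin.parse_config_file(config_folder + "/config.gin")
    run_experiment(local_dir=config_folder)  # every other parameter comes from config.gin
```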
## Gin config files
In each config.gin file, all the parameters of the `run_experiment` function, except `local_dir`, have to be specified.
For example, to set the number of agents initialized at the beginning of each simulation, add the following line:
```
run_experiment.n_agents = 2
```
If several numbers of agents should be explored during the experiment, one can pass the following value to the `n_agents` parameter:
```
run_experiment.n_agents = {"grid_search": [2,5]}
```
which tells the Tune library to try several values for this parameter.
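Under the hood, such a dictionary is handed to Ray Tune's grid search. Below is a minimal, self-contained sketch of the same mechanism with Tune used directly; the `toy_train` trainable is hypothetical and not part of this repository:

```python
import ray
from ray import tune

def toy_train(config, reporter):
    # Hypothetical trainable: report a metric so the stopping criterion can trigger.
    reporter(num_iterations_trained=1, score=0.1 * config["n_agents"])

if __name__ == "__main__":
    ray.init()
    tune.run(
        toy_train,
        stop={"num_iterations_trained": 1},
        # Same effect as run_experiment.n_agents = {"grid_search": [2, 5]} in config.gin:
        config={"n_agents": tune.grid_search([2, 5])},
    )
```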
To reference a class or an object within gin, first register it in the `train_experiment.py` script by adding the following line:
```
gin.external_configurable(TreeObsForRailEnv)
```
and then a `TreeObsForRailEnv` object can be referenced in the `config.gin` file:
```
run_experiment.obs_builder = {"grid_search": [@TreeObsForRailEnv(), @GlobalObsForRailEnv()]}
TreeObsForRailEnv.max_depth = 2
```
Note that `@TreeObsForRailEnv` references the class, while `@TreeObsForRailEnv()` instantiates an object of this class.
More documentation on how to use gin-config can be found in the library's GitHub repository: https://github.com/google/gin-config
from flatland.envs.rail_env import RailEnv
from ray.rllib.env.multi_agent_env import MultiAgentEnv
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.generators import random_rail_generator
from ray.rllib.utils.seed import seed as set_seed
import numpy as np
class RailEnvRLLibWrapper(MultiAgentEnv):
def __init__(self, config):
# width,
# height,
# rail_generator=random_rail_generator(),
# number_of_agents=1,
# obs_builder_object=TreeObsForRailEnv(max_depth=2)):
super(MultiAgentEnv, self).__init__()
if hasattr(config, "vector_index"):
vector_index = config.vector_index
else:
vector_index = 1
#self.rail_generator = config["rail_generator"](nr_start_goal=config['number_of_agents'], min_dist=5,
# nr_extra=30, seed=config['seed'] * (1+vector_index))
set_seed(config['seed'] * (1+vector_index))
#self.env = RailEnv(width=config["width"], height=config["height"],
self.env = RailEnv(width=10, height=20,
number_of_agents=config["number_of_agents"], obs_builder_object=config['obs_builder'])
self.env.load('/mount/SDC/flatland/baselines/torch_training/railway/complex_scene.pkl')
self.width = self.env.width
self.height = self.env.height
def reset(self):
self.agents_done = []
obs = self.env.reset(False, False)
o = dict()
# o['agents'] = obs
# obs[0] = [obs[0], np.ones((17, 17)) * 17]
# obs['global_obs'] = np.ones((17, 17)) * 17
self.rail = self.env.rail
self.agents = self.env.agents
self.agents_static = self.env.agents_static
self.dev_obs_dict = self.env.dev_obs_dict
return obs
def step(self, action_dict):
obs, rewards, dones, infos = self.env.step(action_dict)
# print(obs)
d = dict()
r = dict()
o = dict()
# print(self.agents_done)
# print(dones)
for agent, done in dones.items():
if agent not in self.agents_done:
if agent != '__all__':
o[agent] = obs[agent]
r[agent] = rewards[agent]
d[agent] = dones[agent]
for agent, done in dones.items():
if done and agent != '__all__':
self.agents_done.append(agent)
self.rail = self.env.rail
self.agents = self.env.agents
self.agents_static = self.env.agents_static
self.dev_obs_dict = self.env.dev_obs_dict
#print(obs)
#return obs, rewards, dones, infos
# oo = dict()
# oo['agents'] = o
# o['global'] = np.ones((17, 17)) * 17
# o[0] = [o[0], np.ones((17, 17)) * 17]
# o['global_obs'] = np.ones((17, 17)) * 17
# r['global_obs'] = 0
# d['global_obs'] = True
return o, r, d, infos
def get_agent_handles(self):
return self.env.get_agent_handles()
def get_num_agents(self):
return self.env.get_num_agents()
from ray.rllib.models import ModelCatalog, Model
from ray.rllib.models.misc import normc_initializer
import tensorflow as tf
class ConvModelGlobalObs(Model):
def _build_layers_v2(self, input_dict, num_outputs, options):
"""Define the layers of a custom model.
Arguments:
input_dict (dict): Dictionary of input tensors, including "obs",
"prev_action", "prev_reward", "is_training".
num_outputs (int): Output tensor must be of size
[BATCH_SIZE, num_outputs].
options (dict): Model options.
Returns:
(outputs, feature_layer): Tensors of size [BATCH_SIZE, num_outputs]
and [BATCH_SIZE, desired_feature_size].
When using dict or tuple observation spaces, you can access
the nested sub-observation batches here as well:
Examples:
>>> print(input_dict)
{'prev_actions': <tf.Tensor shape=(?,) dtype=int64>,
'prev_rewards': <tf.Tensor shape=(?,) dtype=float32>,
'is_training': <tf.Tensor shape=(), dtype=bool>,
'obs': (observation, features)
"""
# Convolutional Layer #1
Relu = tf.nn.relu
BatchNormalization = tf.layers.batch_normalization
Dropout = tf.layers.dropout
Dense = tf.contrib.layers.fully_connected
map_size = int(input_dict['obs'][0].shape[0])
N_CHANNELS = 96
conv1 = Relu(self.conv2d(input_dict['obs'], N_CHANNELS, 'valid', strides=(2, 2)))
# conv2 = Relu(self.conv2d(conv1, 64, 'valid'))
# conv3 = Relu(self.conv2d(conv2, 64, 'valid'))
conv2_flat = tf.reshape(conv1, [-1, int(N_CHANNELS * ((map_size-3 + 1)/2)**2)])
# conv4_feature = tf.concat((conv2_flat, input_dict['obs'][1]), axis=1)
s_fc1 = Relu(Dense(conv2_flat, 256))
layerN_minus_1 = Relu(Dense(s_fc1, 64))
layerN = Dense(layerN_minus_1, num_outputs)
return layerN, layerN_minus_1
def conv2d(self, x, out_channels, padding, strides=(1,1)):
return tf.layers.conv2d(x, out_channels, kernel_size=[3, 3], padding=padding,
use_bias=True, strides=strides)
class LightModel(Model):
def _build_layers_v2(self, input_dict, num_outputs, options):
"""Define the layers of a custom model.
Arguments:
input_dict (dict): Dictionary of input tensors, including "obs",
"prev_action", "prev_reward", "is_training".
num_outputs (int): Output tensor must be of size
[BATCH_SIZE, num_outputs].
options (dict): Model options.
Returns:
(outputs, feature_layer): Tensors of size [BATCH_SIZE, num_outputs]
and [BATCH_SIZE, desired_feature_size].
When using dict or tuple observation spaces, you can access
the nested sub-observation batches here as well:
Examples:
>>> print(input_dict)
{'prev_actions': <tf.Tensor shape=(?,) dtype=int64>,
'prev_rewards': <tf.Tensor shape=(?,) dtype=float32>,
'is_training': <tf.Tensor shape=(), dtype=bool>,
'obs': (observation, features)
"""
# print(input_dict)
# Convolutional Layer #1
self.sess = tf.get_default_session()
Relu = tf.nn.relu
BatchNormalization = tf.layers.batch_normalization
Dropout = tf.layers.dropout
Dense = tf.contrib.layers.fully_connected
#conv1 = Relu(self.conv2d(input_dict['obs'][0], 32, 'valid'))
conv1 = Relu(self.conv2d(input_dict['obs'], 32, 'valid'))
conv2 = Relu(self.conv2d(conv1, 16, 'valid'))
# conv3 = Relu(self.conv2d(conv2, 64, 'valid'))
conv4_flat = tf.reshape(conv2, [-1, 16 * (17-2*2)**2])
#conv4_feature = tf.concat((conv4_flat, input_dict['obs'][1]), axis=1)
s_fc1 = Relu(Dense(conv4_flat, 128, weights_initializer=normc_initializer(1.0)))
# layerN_minus_1 = Relu(Dense(s_fc1, 256, use_bias=False))
layerN = Dense(s_fc1, num_outputs, weights_initializer=normc_initializer(0.01))
return layerN, s_fc1
def conv2d(self, x, out_channels, padding):
return tf.layers.conv2d(x, out_channels, kernel_size=[3, 3], padding=padding, use_bias=True)
# weights_initializer=normc_initializer(1.0))
import numpy as np
from ray.rllib.models.preprocessors import Preprocessor
def max_lt(seq, val):
"""
Return the greatest item in seq for which item < val applies.
0 is returned if seq is empty or no item in seq is smaller than val.
"""
max = 0
idx = len(seq) - 1
while idx >= 0:
if seq[idx] < val and seq[idx] >= 0 and seq[idx] > max:
max = seq[idx]
idx -= 1
return max
def min_lt(seq, val):
"""
Return the smallest item in seq for which item > val applies.
np.inf is returned if seq is empty or no item in seq is greater than val.
"""
min = np.inf
idx = len(seq) - 1
while idx >= 0:
if seq[idx] > val and seq[idx] < min:
min = seq[idx]
idx -= 1
return min
def norm_obs_clip(obs, clip_min=-1, clip_max=1):
"""
Normalize an observation by the range between its (bounded) min and max values and clip the result.
:param obs: observation that should be normalized
:param clip_min: minimum value the normalized observation is clipped to
:param clip_max: maximum value the normalized observation is clipped to
:return: normalized and clipped observation
"""
max_obs = max(1, max_lt(obs, 1000))
min_obs = max(0, min_lt(obs, 0))
if max_obs == min_obs:
return np.clip(np.array(obs)/ max_obs, clip_min, clip_max)
norm = np.abs(max_obs - min_obs)
if norm == 0:
norm = 1.
return np.clip((np.array(obs)-min_obs)/ norm, clip_min, clip_max)
class CustomPreprocessor(Preprocessor):
def _init_shape(self, obs_space, options):
return (111,)
def transform(self, observation):
if len(observation) == 111:
return norm_obs_clip(observation)
else:
return observation
class ConvModelPreprocessor(Preprocessor):
def _init_shape(self, obs_space, options):
out_shape = (obs_space[0].shape[0], obs_space[0].shape[1], sum([space.shape[2] for space in obs_space]))
return out_shape
def transform(self, observation):
return np.concatenate([observation[0],
observation[1],
observation[2]], axis=2)
# class NoPreprocessor:
# def _init_shape(self, obs_space, options):
# num_features = 0
# for space in obs_space:
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 50
run_experiment.hidden_sizes = [32, 32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_{config[n_agents]}_agents_conv_model_{config[conv_model]}_"
run_experiment.horizon = 50
run_experiment.seed = 123
#run_experiment.conv_model = {"grid_search": [True, False]}
run_experiment.conv_model = False
#run_experiment.obs_builder = {"grid_search": [@GlobalObsForRailEnv(), @GlobalObsForRailEnvDirectionDependent]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
run_experiment.obs_builder = @TreeObsForRailEnv()
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.entropy_coeff = 0.01
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 100
run_experiment.hidden_sizes = {"grid_search": [[32, 32], [64, 64], [128, 128], [256, 256]]}
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_entropy_coeff_{config[entropy_coeff]}_{config[hidden_sizes][0]}_hidden_sizes_"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.entropy_coeff = {"grid_search": [1e-3, 1e-2, 0]}
run_experiment.obs_builder = {"grid_search": [@LocalObsForRailEnv()]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 100
run_experiment.hidden_sizes = [32,32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = {"grid_search": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_entropy_coeff_{config[entropy_coeff]}_{config[n_agents]}_agents_"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.entropy_coeff = {"grid_search": [1e-3, 1e-2, 0]}
run_experiment.obs_builder = {"grid_search": [@TreeObsForRailEnv()]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.name = "observation_benchmark_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 100
run_experiment.hidden_sizes = [32, 32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}_{config[n_agents]}_agents"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.obs_builder = {"grid_search": [@LocalObsForRailEnv()]}# [@TreeObsForRailEnv(), @GlobalObsForRailEnv() ]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
run_experiment.name = "observation_benchmark_loaded_env_results"
run_experiment.num_iterations = 1002
run_experiment.save_every = 50
run_experiment.hidden_sizes = [32, 32]
run_experiment.map_width = 20
run_experiment.map_height = 20
run_experiment.n_agents = 5
run_experiment.policy_folder_name = "ppo_policy_{config[obs_builder].__class__.__name__}"#_entropy_coeff_{config[entropy_coeff]}_{config[hidden_sizes][0]}_hidden_sizes_"
run_experiment.horizon = 50
run_experiment.seed = 123
run_experiment.conv_model = False
run_experiment.entropy_coeff = 1e-2
run_experiment.obs_builder = @TreeObsForRailEnv()#{"grid_search": [@LocalObsForRailEnv(), @TreeObsForRailEnv(), @GlobalObsForRailEnv(), @GlobalObsForRailEnvDirectionDependent()]}
TreeObsForRailEnv.max_depth = 2
LocalObsForRailEnv.view_radius = 5
from baselines.RLLib_training.RailEnvRLLibWrapper import RailEnvRLLibWrapper
import gym
from flatland.envs.generators import complex_rail_generator
# Import PPO trainer: we can replace these imports by any other trainer from RLLib.
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG
from ray.rllib.agents.ppo.ppo import PPOTrainer as Trainer
# from baselines.CustomPPOTrainer import PPOTrainer as Trainer
from ray.rllib.agents.ppo.ppo_policy_graph import PPOPolicyGraph as PolicyGraph
# from baselines.CustomPPOPolicyGraph import CustomPPOPolicyGraph as PolicyGraph
from ray.rllib.models import ModelCatalog
from ray.tune.logger import pretty_print
from baselines.RLLib_training.custom_preprocessors import CustomPreprocessor, ConvModelPreprocessor
from baselines.RLLib_training.custom_models import ConvModelGlobalObs
import ray
import numpy as np
from ray.tune.logger import UnifiedLogger
import tempfile
import gin
from ray import tune
from ray.rllib.utils.seed import seed as set_seed
from flatland.envs.observations import TreeObsForRailEnv, GlobalObsForRailEnv,\
LocalObsForRailEnv, GlobalObsForRailEnvDirectionDependent
from flatland.utils.rendertools import RenderTool
import time
gin.external_configurable(TreeObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnv)
gin.external_configurable(LocalObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnvDirectionDependent)
from ray.rllib.models.preprocessors import TupleFlatteningPreprocessor
ModelCatalog.register_custom_preprocessor("tree_obs_prep", CustomPreprocessor)
ModelCatalog.register_custom_preprocessor("global_obs_prep", TupleFlatteningPreprocessor)
ModelCatalog.register_custom_preprocessor("conv_obs_prep", ConvModelPreprocessor)
ModelCatalog.register_custom_model("conv_model", ConvModelGlobalObs)
ray.init()#object_store_memory=150000000000, redis_max_memory=30000000000)
CHECKPOINT_PATH = '/home/guillaume/EPFL/Master_Thesis/flatland/baselines/RLLib_training/experiment_configs/' \
'conv_model_test/ppo_policy_TreeObsForRailEnv_5_agents_conv_model_False_ial1g3w9/checkpoint_51/checkpoint-51'
N_EPISODES = 3
N_STEPS_PER_EPISODE = 50
def render_training_result(config):
print('Init Env')
set_seed(config['seed'], config['seed'], config['seed'])
transition_probability = [15, # empty cell - Case 0
5, # Case 1 - straight
5, # Case 2 - simple switch
1, # Case 3 - diamond crossing
1, # Case 4 - single slip
1, # Case 5 - double slip
1, # Case 6 - symmetrical
0, # Case 7 - dead end
1, # Case 1b (8) - simple turn right
1, # Case 1c (9) - simple turn left
1] # Case 2b (10) - simple switch mirrored
# Example configuration to generate a random rail
env_config = {"width": config['map_width'],
"height": config['map_height'],
"rail_generator": complex_rail_generator,
"number_of_agents": config['n_agents'],
"seed": config['seed'],
"obs_builder": config['obs_builder']}
# Observation space and action space definitions
if isinstance(config["obs_builder"], TreeObsForRailEnv):
obs_space = gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(105,))
preprocessor = "tree_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnv):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 8)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnvDirectionDependent):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 5)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], LocalObsForRailEnv):
view_radius = config["obs_builder"].view_radius
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 16)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 2)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 4)),
gym.spaces.Box(low=0, high=1, shape=(4,))))
preprocessor = "global_obs_prep"
else:
raise ValueError("Undefined observation space")
act_space = gym.spaces.Discrete(4)
# Dict with the different policies to train
policy_graphs = {
config['policy_folder_name'].format(**locals()): (PolicyGraph, obs_space, act_space, {})
}
def policy_mapping_fn(agent_id):
return config['policy_folder_name'].format(**locals())
# Trainer configuration
trainer_config = DEFAULT_CONFIG.copy()
if config['conv_model']:
trainer_config['model'] = {"custom_model": "conv_model", "custom_preprocessor": preprocessor}
else:
trainer_config['model'] = {"fcnet_hiddens": config['hidden_sizes'], "custom_preprocessor": preprocessor}
trainer_config['multiagent'] = {"policy_graphs": policy_graphs,
"policy_mapping_fn": policy_mapping_fn,
"policies_to_train": list(policy_graphs.keys())}
trainer_config["horizon"] = config['horizon']
trainer_config["num_workers"] = 0
trainer_config["num_cpus_per_worker"] = 3
trainer_config["num_gpus"] = 0
trainer_config["num_gpus_per_worker"] = 0
trainer_config["num_cpus_for_driver"] = 1
trainer_config["num_envs_per_worker"] = 1
trainer_config['entropy_coeff'] = config['entropy_coeff']
trainer_config["env_config"] = env_config
trainer_config["batch_mode"] = "complete_episodes"
trainer_config['simple_optimizer'] = False
trainer_config['postprocess_inputs'] = True
trainer_config['log_level'] = 'WARN'
env = RailEnvRLLibWrapper(env_config)
trainer = Trainer(env=RailEnvRLLibWrapper, config=trainer_config)
trainer.restore(CHECKPOINT_PATH)
policy = trainer.get_policy(config['policy_folder_name'].format(**locals()))
env_renderer = RenderTool(env, gl="PIL", show=True)
for episode in range(N_EPISODES):
observation = env.reset()
for i in range(N_STEPS_PER_EPISODE):
action, _, infos = policy.compute_actions(list(observation.values()), [])
env_renderer.renderEnv(show=True, frames=True, iEpisode=episode, iStep=i,
action_dict=action)
logits = infos['behaviour_logits']
actions = dict()
for j, logit in enumerate(logits):
actions[j] = np.argmax(logit)
time.sleep(1)
observation, _, _, _ = env.step(action)
env_renderer.close_window()
@gin.configurable
def run_experiment(name, num_iterations, n_agents, hidden_sizes, save_every,
map_width, map_height, horizon, policy_folder_name, local_dir, obs_builder,
entropy_coeff, seed, conv_model):
render_training_result(
config={"n_agents": n_agents,
"hidden_sizes": hidden_sizes, # Array containing the sizes of the network layers
"save_every": save_every,
"map_width": map_width,
"map_height": map_height,
"local_dir": local_dir,
"horizon": horizon, # Max number of time steps
'policy_folder_name': policy_folder_name,
"obs_builder": obs_builder,
"entropy_coeff": entropy_coeff,
"seed": seed,
"conv_model": conv_model
})
if __name__ == '__main__':
gin.external_configurable(tune.grid_search)
dir = '/home/guillaume/EPFL/Master_Thesis/flatland/baselines/RLLib_training/experiment_configs/conv_model_test' # To Modify
gin.parse_config_file(dir + '/config.gin')
run_experiment(local_dir=dir)
from flatland.envs import rail_env
from flatland.envs.rail_env import random_rail_generator
from baselines.RailEnvRLLibWrapper import RailEnvRLLibWrapper
from flatland.utils.rendertools import RenderTool
import random
import gym
import matplotlib.pyplot as plt
from flatland.envs.generators import complex_rail_generator
import ray.rllib.agents.ppo.ppo as ppo
import ray.rllib.agents.dqn.dqn as dqn
from ray.rllib.agents.ppo.ppo import PPOTrainer
from ray.rllib.agents.dqn.dqn import DQNTrainer
from ray.rllib.agents.ppo.ppo_policy_graph import PPOPolicyGraph
from ray.rllib.agents.dqn.dqn_policy_graph import DQNPolicyGraph
from ray.tune.registry import register_env
from ray.rllib.models import ModelCatalog
from ray.tune.logger import pretty_print
from baselines.CustomPreprocessor import CustomPreprocessor
import ray
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv
# RailEnv.__bases__ = (RailEnv.__bases__[0], MultiAgentEnv)
ModelCatalog.register_custom_preprocessor("my_prep", CustomPreprocessor)
ray.init()
def train(config):
print('Init Env')
random.seed(1)
np.random.seed(1)
transition_probability = [15, # empty cell - Case 0
5, # Case 1 - straight
5, # Case 2 - simple switch
1, # Case 3 - diamond crossing
1, # Case 4 - single slip
1, # Case 5 - double slip
1, # Case 6 - symmetrical
0, # Case 7 - dead end
1, # Case 1b (8) - simple turn right
1, # Case 1c (9) - simple turn left
1] # Case 2b (10) - simple switch mirrored
# Example generate a random rail
"""
env = RailEnv(width=10,
height=10,
rail_generator=random_rail_generator(cell_type_relative_proportion=transition_probability),
number_of_agents=1)
"""
env_config = {"width": 20,
"height":20,
"rail_generator":complex_rail_generator(nr_start_goal=5, min_dist=5, max_dist=99999, seed=0),
"number_of_agents":5}
"""
env = RailEnv(width=20,
height=20,
rail_generator=rail_from_list_of_saved_GridTransitionMap_generator(
['../notebooks/temp.npy']),
number_of_agents=3)
"""
# if config['render']:
# env_renderer = RenderTool(env, gl="QT")
# plt.figure(figsize=(5,5))
obs_space = gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(105,))
act_space = gym.spaces.Discrete(4)
# Dict with the different policies to train
policy_graphs = {
"ppo_policy": (PPOPolicyGraph, obs_space, act_space, {})
}
def policy_mapping_fn(agent_id):
return f"ppo_policy"
agent_config = ppo.DEFAULT_CONFIG.copy()
agent_config['model'] = {"fcnet_hiddens": [32, 32], "custom_preprocessor": "my_prep"}
agent_config['multiagent'] = {"policy_graphs": policy_graphs,
"policy_mapping_fn": policy_mapping_fn,
"policies_to_train": list(policy_graphs.keys())}
agent_config["horizon"] = 50
agent_config["num_workers"] = 0
# agent_config["sample_batch_size"]: 1000
#agent_config["num_cpus_per_worker"] = 40
#agent_config["num_gpus"] = 2.0
#agent_config["num_gpus_per_worker"] = 2.0
#agent_config["num_cpus_for_driver"] = 5
#agent_config["num_envs_per_worker"] = 15
agent_config["env_config"] = env_config
#agent_config["batch_mode"] = "complete_episodes"
ppo_trainer = PPOTrainer(env=RailEnvRLLibWrapper, config=agent_config)
for i in range(100000 + 2):
print("== Iteration", i, "==")
print("-- PPO --")
print(pretty_print(ppo_trainer.train()))
# if i % config['save_every'] == 0:
# checkpoint = ppo_trainer.save()
# print("checkpoint saved at", checkpoint)
train({})
from baselines.RLLib_training.RailEnvRLLibWrapper import RailEnvRLLibWrapper
import gym
from flatland.envs.generators import complex_rail_generator
# Import PPO trainer: we can replace these imports by any other trainer from RLLib.
from ray.rllib.agents.ppo.ppo import DEFAULT_CONFIG
from ray.rllib.agents.ppo.ppo import PPOTrainer as Trainer
# from baselines.CustomPPOTrainer import PPOTrainer as Trainer
from ray.rllib.agents.ppo.ppo_policy_graph import PPOPolicyGraph as PolicyGraph
# from baselines.CustomPPOPolicyGraph import CustomPPOPolicyGraph as PolicyGraph
from ray.rllib.models import ModelCatalog
from ray.tune.logger import pretty_print
from baselines.RLLib_training.custom_preprocessors import CustomPreprocessor, ConvModelPreprocessor
from baselines.RLLib_training.custom_models import ConvModelGlobalObs
import ray
import numpy as np
from ray.tune.logger import UnifiedLogger
import tempfile
import gin
from ray import tune
from ray.rllib.utils.seed import seed as set_seed
from flatland.envs.observations import TreeObsForRailEnv, GlobalObsForRailEnv,\
LocalObsForRailEnv, GlobalObsForRailEnvDirectionDependent
gin.external_configurable(TreeObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnv)
gin.external_configurable(LocalObsForRailEnv)
gin.external_configurable(GlobalObsForRailEnvDirectionDependent)
from ray.rllib.models.preprocessors import TupleFlatteningPreprocessor
ModelCatalog.register_custom_preprocessor("tree_obs_prep", CustomPreprocessor)
ModelCatalog.register_custom_preprocessor("global_obs_prep", TupleFlatteningPreprocessor)
ModelCatalog.register_custom_preprocessor("conv_obs_prep", ConvModelPreprocessor)
ModelCatalog.register_custom_model("conv_model", ConvModelGlobalObs)
ray.init()#object_store_memory=150000000000, redis_max_memory=30000000000)
def train(config, reporter):
print('Init Env')
set_seed(config['seed'], config['seed'], config['seed'])
config['map_width']= 20
config['map_height']= 10
config['n_agents'] = 8
# Example configuration to generate a random rail
env_config = {"width": config['map_width'],
"height": config['map_height'],
"rail_generator": complex_rail_generator,
"number_of_agents": config['n_agents'],
"seed": config['seed'],
"obs_builder": config['obs_builder']}
# Observation space and action space definitions
if isinstance(config["obs_builder"], TreeObsForRailEnv):
obs_space = gym.spaces.Box(low=-float('inf'), high=float('inf'), shape=(111,))
preprocessor = "tree_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnv):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 8)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], GlobalObsForRailEnvDirectionDependent):
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 16)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 5)),
gym.spaces.Box(low=0, high=1, shape=(config['map_height'], config['map_width'], 2))))
if config['conv_model']:
preprocessor = "conv_obs_prep"
else:
preprocessor = "global_obs_prep"
elif isinstance(config["obs_builder"], LocalObsForRailEnv):
view_radius = config["obs_builder"].view_radius
obs_space = gym.spaces.Tuple((
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 16)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 2)),
gym.spaces.Box(low=0, high=1, shape=(2 * view_radius + 1, 2 * view_radius + 1, 4)),
gym.spaces.Box(low=0, high=1, shape=(4,))))
preprocessor = "global_obs_prep"
else:
raise ValueError("Undefined observation space")
act_space = gym.spaces.Discrete(4)
# Dict with the different policies to train
policy_graphs = {
config['policy_folder_name'].format(**locals()): (PolicyGraph, obs_space, act_space, {})
}
def policy_mapping_fn(agent_id):
return config['policy_folder_name'].format(**locals())
# Trainer configuration
trainer_config = DEFAULT_CONFIG.copy()
if config['conv_model']:
trainer_config['model'] = {"custom_model": "conv_model", "custom_preprocessor": preprocessor}
else:
trainer_config['model'] = {"fcnet_hiddens": config['hidden_sizes'], "custom_preprocessor": preprocessor}
trainer_config['multiagent'] = {"policy_graphs": policy_graphs,
"policy_mapping_fn": policy_mapping_fn,
"policies_to_train": list(policy_graphs.keys())}
trainer_config["horizon"] = config['horizon']
trainer_config["num_workers"] = 0
trainer_config["num_cpus_per_worker"] = 3
trainer_config["num_gpus"] = 0
trainer_config["num_gpus_per_worker"] = 0
trainer_config["num_cpus_for_driver"] = 1
trainer_config["num_envs_per_worker"] = 1
trainer_config['entropy_coeff'] = config['entropy_coeff']
trainer_config["env_config"] = env_config
trainer_config["batch_mode"] = "complete_episodes"
trainer_config['simple_optimizer'] = False
trainer_config['postprocess_inputs'] = True
trainer_config['log_level'] = 'WARN'
def logger_creator(conf):
"""Creates a Unified logger with a default logdir prefix
containing the agent name and the env id
"""
logdir = config['policy_folder_name'].format(**locals())
logdir = tempfile.mkdtemp(
prefix=logdir, dir=config['local_dir'])
return UnifiedLogger(conf, logdir, None)
logger = logger_creator
trainer = Trainer(env=RailEnvRLLibWrapper, config=trainer_config, logger_creator=logger)
for i in range(100000 + 2):
print("== Iteration", i, "==")
print(pretty_print(trainer.train()))
if i % config['save_every'] == 0:
checkpoint = trainer.save()
print("checkpoint saved at", checkpoint)
reporter(num_iterations_trained=trainer._iteration)
@gin.configurable
def run_experiment(name, num_iterations, n_agents, hidden_sizes, save_every,
map_width, map_height, horizon, policy_folder_name, local_dir, obs_builder,
entropy_coeff, seed, conv_model):
tune.run(
train,
name=name,
stop={"num_iterations_trained": num_iterations},
config={"n_agents": n_agents,
"hidden_sizes": hidden_sizes, # Array containing the sizes of the network layers
"save_every": save_every,
"map_width": map_width,
"map_height": map_height,
"local_dir": local_dir,
"horizon": horizon, # Max number of time steps
'policy_folder_name': policy_folder_name,
"obs_builder": obs_builder,
"entropy_coeff": entropy_coeff,
"seed": seed,
"conv_model": conv_model
},
resources_per_trial={
"cpu": 2,
"gpu": 0.0
},
local_dir=local_dir
)
if __name__ == '__main__':
gin.external_configurable(tune.grid_search)
dir = '/mount/SDC/flatland/baselines/RLLib_training/experiment_configs/observation_benchmark_loaded_env' # To Modify
gin.parse_config_file(dir + '/config.gin')
run_experiment(local_dir=dir)
{'Test_0':[20,20,20,3],
'Test_1':[10,10,3,4321],
'Test_2':[10,10,5,123],
'Test_3':[50,50,5,21],
'Test_4':[50,50,20,85],
'Test_5':[100,100,5,436],
'Test_6':[100,100,20,6487],
'Test_7':[100,100,50,567],
'Test_8':[100,10,20,3245],
'Test_9':[10,100,20,632]
}
#ray==0.7.0
gym==0.12.5
opencv-python==4.1.0.25
#tensorflow==1.13.1
lz4==2.1.10
gin-config==0.1.4
git+https://gitlab.aicrowd.com/flatland/flatland.git
importlib-metadata>=0.17
importlib_resources>=1.0.2
torch>=1.1.0