# 🚂 Flatland Baselines

This repository contains reinforcement learning baselines for the [NeurIPS 2020 Flatland Challenge](https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge/).

📈 [**Results**](https://app.wandb.ai/masterscrat/flatland/reports/Flatland-Baselines--Vmlldzo4OTc5NA)

## Provided baselines
```{note}
Looking for something simpler to start with? We also provide a simpler Dueling Double DQN method implemented in PyTorch, without relying on RLlib: **https://gitlab.aicrowd.com/flatland/flatland-examples**
```
### RL Methods
- Ape-X
- PPO
- CCPPO
- Pure Imitation Learning: MARWIL
- Mixed IL/RL: DQfD
### Custom observations
- Density observations
- Local conflict observations
### Tricks
- Action skipping
- Action masking
## Organisation
Experiments consist of one or many RLlib YAML config files alongside a MARKDOWN file containing results, plots and a detailed description of the methodology.
All files are stored in an experiment folder under `experiments/<env-name>/<experiment-name>`:
- [Tree observations w/ fully connected network](experiments/flatland_random_sparse_small/tree_obs_fc_net)
- [Global observations w/ convnet](experiments/flatland_random_sparse_small/global_obs_conv_net)
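For example, the resulting layout might look as follows (the file names inside each experiment folder are illustrative; each folder holds the RLlib YAML config(s) and the results markdown described above):

```
experiments/
└── flatland_random_sparse_small/
    ├── tree_obs_fc_net/
    │   ├── ppo.yaml     # RLlib config(s)
    │   └── README.md    # results, plots, methodology
    └── global_obs_conv_net/
        └── ...
```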
## Setup

The setup uses conda, [install it](https://www.anaconda.com/products/individual) if necessary.
```
# with GPU support:
conda env create -f environment-gpu.yml
conda activate flatland-baseline-gpu-env

# or, without GPU support:
conda env create -f environment-cpu.yml
conda activate flatland-baseline-cpu-env
```
You may need to install/update bazel: [Ubuntu guide](https://docs.bazel.build/versions/master/install-ubuntu.html)
## Usage
Training example, using one of the baseline configs included in this repository (any experiment config can be passed to `-f`):
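`python ./train.py -f baselines/global_density_obs/sparse_small_apex_expdecay_maxt1000.yaml`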
Evaluation example:
`python ./rollout.py /tmp/ray/checkpoint_dir/checkpoint-0 --run PPO --no-render --config '{"env_config": {"test": true}}' --episodes 1000 --out rollouts.pkl`
Note that `-f` overrides all other trial-specific command-line options.
## Notes

- The basic structure of this repository is adapted from [https://github.com/spMohanty/rl-experiments/](https://github.com/spMohanty/rl-experiments/)
# Combined Observation
```{admonition} TL;DR
This observation allows combining multiple observations by specifying them in the run config.
```
### 💡 The idea
Provide a simple way to combine multiple observations.
### 🗂️ Files and usage
The observation is defined in `neurips2020-flatland-baselines/envs/flatland/observations/combined_obs.py`.
To combine multiple observations, instead of directly putting the observation settings under `observation_config`, use the names of the observations you want to combine as keys and provide the corresponding observation configs as values, as in the excerpt below.
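For instance, to combine the tree and local conflict observations (excerpt from the full config reproduced at the end of this page):

```yaml
observation: combined
observation_config:
  tree:
    max_depth: 2
    shortest_path_max_depth: 30
  localConflict:
    max_depth: 2
    shortest_path_max_depth: 30
    n_local: 5
```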
An example config is located in `neurips2020-flatland-baselines/baselines/combined_tree_local_conflict_obs/sparse_small_apex_maxdepth2_spmaxdepth30.yaml` and can be run with
`python ./train.py -f baselines/combined_tree_local_conflict_obs/sparse_small_apex_maxdepth2_spmaxdepth30.yaml`
### 📦 Implementation Details
This observation does not generate any information for the agent by itself; it simply concatenates the outputs of the specified observations.
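A minimal sketch of that behaviour (function and method names here are hypothetical, not the actual code in `combined_obs.py`):

```python
import numpy as np

def combined_observation(child_observations, agent_handle):
    """Naively concatenate the flattened outputs of several observation
    builders into a single feature vector for one agent."""
    parts = [np.asarray(obs.get(agent_handle)).ravel() for obs in child_observations]
    return np.concatenate(parts)
```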
### 📈 Results
Since this observation is meant as a helper to easily explore combinations of observations, there is no meaningful baseline. However, we did a run combining the tree and local conflict observations as a sanity check (see link below).
### 🔗 Links
* [W&B report for test run](https://app.wandb.ai/masterscrat/flatland/reports/Tree-and-Conflict-Obs-|-sparse-small_v0--VmlldzoxNTc4MzU)
### 🌟 Credits
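The full example config, `baselines/combined_tree_local_conflict_obs/sparse_small_apex_maxdepth2_spmaxdepth30.yaml`: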
```yaml
flatland-sparse-small-combined-obs-tree-local-conflict-apex:
  run: APEX
  env: flatland_sparse
  stop:
    timesteps_total: 15000000  # 1.5e7
  checkpoint_freq: 10
  checkpoint_at_end: True
  keep_checkpoints_num: 5
  checkpoint_score_attr: episode_reward_mean
  num_samples: 3
  config:
    num_workers: 13
    num_envs_per_worker: 5
    num_gpus: 0
    env_config:
      observation: combined
      observation_config:
        tree:
          max_depth: 2
          shortest_path_max_depth: 30
        localConflict:
          max_depth: 2
          shortest_path_max_depth: 30
          n_local: 5
      generator: sparse_rail_generator
      generator_config: small_v0
      resolve_deadlocks: False
      deadlock_reward: 0
      density_reward_factor: 0
      wandb:
        project: flatland
        entity: masterscrat
        tags: ["small_v0", "tree_and_local_conflict", "apex"]  # TODO should be set programmatically
    model:
      fcnet_activation: relu
      fcnet_hiddens: [256, 256]
      vf_share_layers: True
    evaluation_num_workers: 2
    evaluation_interval: 100
    evaluation_num_episodes: 100
    evaluation_config:
      explore: False
      env_config:
        observation: combined
        observation_config:
          tree:
            max_depth: 2
            shortest_path_max_depth: 30
          localConflict:
            max_depth: 2
            shortest_path_max_depth: 30
            n_local: 5
        regenerate_rail_on_reset: True
        regenerate_schedule_on_reset: True
        render: False
```
# Global Density Observation
```{admonition} TL;DR
This is a global observation that provides an agent with information on its own and the other agents' predicted paths. The paths are predicted from each agent's shortest path to its respective target. The information is encoded into a density map.
```
### 💡 The idea
The density observation is based on the idea that every agent's path to its target is represented in a discrete map of the environment, assigning each location (cell) a value that encodes if and when the cell will be occupied. For simplicity, we assume that each agent follows the shortest path to its target and do not consider alternative paths. The individual values along the agents' shortest paths are combined into a "density" for each cell. For example, if all agents occupied the same cell at the same time step, the density would be very high; if the agents used the same cell at different time steps, the density for that cell would be lower. The density map therefore potentially allows the agents to learn from the (projected) cell occupancy distribution.
### 🗂️ Files and usage
The observation is defined in `neurips2020-flatland-baselines/envs/flatland/observations/global_density_obs.py`; the model used in the example is in `neurips2020-flatland-baselines/models/global_dens_obs_model.py`.
The observation can be configured with the following parameters:
* width and height: have to correspond to the shape of the environment
* max_t: maximum number of time steps the path of each agent is predicted for
* encoding: defines how the time information is factored into the density value (2d options: exp_decay, lin_decay, binary; 3d option: series; see the next section for more details)
An example config is located in `neurips2020-flatland-baselines/baselines/global_density_obs/sparse_small_apex_expdecay_maxt1000.yaml` and can be run with
`python ./train.py -f baselines/global_density_obs/sparse_small_apex_expdecay_maxt1000.yaml`
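The relevant observation settings from that config (the full file is reproduced at the end of this page):

```yaml
observation: density
observation_config:
  width: 25
  height: 25
  max_t: 1000
  encoding: exp_decay
```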
### 📦 Implementation Details
The observation for each agent consists of two arrays representing the cells of the environment. The first array contains the density values for the agent itself, and the second one the mean of the other agents' values for each cell. The arrays are either two- or three-dimensional depending on the encoding.
The idea behind the encoding parameter is to provide a way to compress the space and time information into a 2d representation. However, it is possible to get a 3d observation with a separate 2d density map for each time step by using the option "series" (for time series) for the encoding. In this case, a binary representation of the individual agent occupancies is used.
The other options use a function of the time step *t* and the maximal time step *max_t* to determine the density value:
* exp_decay: e^(-t / max_t^(1/2))
* lin_decay: (max_t - t) / max_t
* binary: 1
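As an illustration, the three 2d encodings could be implemented as follows (a sketch of the formulas above; the function name and signature are hypothetical, not taken from `global_density_obs.py`):

```python
import math

def density_value(t: int, max_t: int, encoding: str) -> float:
    """Density contribution of a cell predicted to be occupied at time step t."""
    if encoding == "exp_decay":
        return math.exp(-t / math.sqrt(max_t))  # e^(-t / max_t^(1/2))
    if encoding == "lin_decay":
        return (max_t - t) / max_t
    if encoding == "binary":
        return 1.0
    raise ValueError(f"Unknown encoding: {encoding}")
```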
We created a custom model (GlobalDensObsModel) for this observation that uses a convolutional neural network to process it. For the experiments, we used the IMPALA architecture (see links section).
### 📈 Results
We trained the agents with the different encoding options and different values for max_t using Ape-X (see links section). However, we didn't search systematically or exhaustively for the best settings.
The best runs achieved around 45% mean completion on the sparse, small Flatland environment with max_t = 1000 and encoding = exp_decay. The mean completion rate is considerably lower than with the tree observation, but it shows that learning from global observations is possible and can inform approaches that combine local, tree and global observations.
More information on the runs can be found in the Weights & Biases report linked below.
### 🔗 Links
* [IMPALA Paper – IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (Espeholt et al.)](https://arxiv.org/abs/1802.01561)
* [Ape-X Paper – Distributed Prioritized Experience Replay (Horgan et al.)](https://arxiv.org/abs/1803.00933)
* [W&B report for training runs](https://app.wandb.ai/masterscrat/flatland/reports/Density-Obs-|-sparse-small_v0--VmlldzoxMTYxMDE)
### 🌟 Credits
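The full example config, `baselines/global_density_obs/sparse_small_apex_expdecay_maxt1000.yaml`: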
```yaml
flatland-sparse-small-density-cnn-apex:
  run: APEX
  env: flatland_sparse
  stop:
    timesteps_total: 15000000  # 1.5e7
  checkpoint_freq: 10
  checkpoint_at_end: True
  keep_checkpoints_num: 5
  checkpoint_score_attr: episode_reward_mean
  num_samples: 3
  config:
    num_workers: 13
    num_envs_per_worker: 5
    num_gpus: 0
    hiddens: []
    dueling: False
    env_config:
      observation: density
      observation_config:
        width: 25
        height: 25
        max_t: 1000
        encoding: exp_decay
      generator: sparse_rail_generator
      generator_config: small_v0
      wandb:
        project: flatland
        entity: masterscrat
        tags: ["small_v0", "density_obs", "apex"]  # TODO should be set programmatically
    model:
      custom_model: global_dens_obs_model
      custom_options:
        architecture: impala
        architecture_options:
          residual_layers: [[16, 2], [32, 4]]
    evaluation_num_workers: 2
    evaluation_interval: 100
    evaluation_num_episodes: 100
    evaluation_config:
      explore: False
      env_config:
        observation: density
        observation_config:
          width: 25
          height: 25
          max_t: 1000
          encoding: exp_decay
        regenerate_rail_on_reset: True
        regenerate_schedule_on_reset: True
        render: False
```
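For reference, the CPU conda environment file `environment-cpu.yml` used in the setup above (excerpt):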
```yaml
name: flatland-baseline-cpu-env
dependencies:
  - conda-build
  - python=3.7
  # …
  - pip
  - cairo
  - pip:
    - flatland-rl==2.2.1
```