# 🚂 Flatland Baselines

This repository contains reinforcement learning baselines for the [NeurIPS 2020 Flatland Challenge](https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge/).

## Provided baselines

### RL Methods

- Double Dueling DQN 
- Ape-X
- PPO
- CCPPO
- Pure Imitation Learning: MARWIL
- Mixed IL/RL: DQfD 

### Custom observations

- Density observations
- Local conflict observations

### Tricks

- Action skipping
- Action masking

## Organisation

Each experiment consists of one or more RLlib YAML config files alongside a Markdown file containing results, plots and a detailed description of the methodology.

All files are stored in an experiment folder under `experiments/<env-name>/<experiment-name>`.

- [Tree observations w/ fully connected network](experiments/flatland_random_sparse_small/tree_obs_fc_net)
- [Global observations w/ convnet](experiments/flatland_random_sparse_small/global_obs_conv_net)
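
The folder convention above can be walked programmatically. The following is a minimal sketch that assumes only the `experiments/<env-name>/<experiment-name>` layout; the helper name `list_experiment_configs` is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_experiment_configs(root="experiments"):
    """Map '<env-name>/<experiment-name>' to the YAML config names found there.

    Hypothetical helper: it assumes only the folder convention described above.
    """
    configs = {}
    # Exactly two directory levels: <env-name>/<experiment-name>/<config>.yaml
    for path in sorted(Path(root).glob("*/*/*.yaml")):
        key = f"{path.parent.parent.name}/{path.parent.name}"
        configs.setdefault(key, []).append(path.name)
    return configs


if __name__ == "__main__":
    for experiment, files in list_experiment_configs().items():
        print(experiment, files)
```

Run from the repository root, this prints each experiment folder together with the config files that could be passed to `train.py -f`.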

## Setup

The setup uses conda; [install it](https://www.anaconda.com/products/individual) if necessary.

```
# with GPU support:
conda env create -f environment-gpu.yml
conda activate flatland-baseline-gpu-env

# or, without GPU support:
conda env create -f environment-cpu.yml
conda activate flatland-baseline-cpu-env
```

## Usage

Training example:

`python ./train.py -f experiments/flatland_random_sparse_small/global_obs_conv_net/ppo.yaml`

Evaluation example:

`python ./rollout.py /tmp/ray/checkpoint_dir/checkpoint-0 --run PPO --no-render
        --config '{"env_config": {"test": true}}' --episodes 1000 --out rollouts.pkl`

Note that `-f` overrides all other trial-specific command-line options.
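
As an illustration of this precedence, here is a minimal sketch built on a simplified argument parser. It is not the actual `train.py` logic: the `--env` option, the `resolve_trials` function and its `load_file` parameter are hypothetical stand-ins chosen for the example.

```python
import argparse


def resolve_trials(argv, load_file):
    """Return the trial settings to run.

    Hypothetical sketch: when -f is given, the file contents are used as-is
    and the trial-specific command-line options are ignored.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--config-file", default=None)
    parser.add_argument("--run", default=None)  # trial-specific option
    parser.add_argument("--env", default=None)  # trial-specific option (assumed)
    args = parser.parse_args(argv)

    if args.config_file is not None:
        # The config file wins; --run and --env are not consulted.
        return load_file(args.config_file)
    return {"default": {"run": args.run, "env": args.env}}
```

With this structure, passing both `-f ppo.yaml` and `--run DQN` yields whatever the file defines, and the `--run` value is silently dropped.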

## Notes

- The basic structure of this repository is adapted from [https://github.com/spMohanty/rl-experiments/](https://github.com/spMohanty/rl-experiments/)