Commit 46862bc1 authored by nilabha

Upd configs and readme

parent f2c50e49
@@ -2,14 +2,14 @@ flatland-random-sparse-small-tree-marwil-fc-ppo:
run: MARWIL
env: flatland_sparse
stop:
timesteps_total: 10000000 # 1e7
timesteps_total: 1000000000 # 1e9
checkpoint_freq: 10
checkpoint_at_end: True
keep_checkpoints_num: 5
checkpoint_score_attr: episode_reward_mean
config:
beta:
grid_search: [0,0.25, 1] # compare IL (beta=0) vs MARWIL
grid_search: [0, 0.25, 0.5, 0.75, 1] # compare IL (beta=0) vs MARWIL
input: /tmp/flatland-out
input_evaluation: [is, wis, simulation]
# effective batch_size: train_batch_size * num_agents_in_each_environment [5, 10]
flatland-random-sparse-small-tree-fc-ppo:
run: APEX
env: flatland_sparse
stop:
timesteps_total: 100000000 # 1e8
checkpoint_freq: 10
checkpoint_at_end: True
keep_checkpoints_num: 5
checkpoint_score_attr: episode_reward_mean
config:
input:
"/tmp/flatland-out": 0.25
sampler: 0.75
input_evaluation: [is, wis, simulation]
num_workers: 2
num_envs_per_worker: 1
num_gpus: 0
env_config:
observation: tree
observation_config:
max_depth: 2
shortest_path_max_depth: 30
generator: sparse_rail_generator
generator_config: small_v0
wandb:
project: flatland
entity: masterscrat
tags: ["small_v0", "tree_obs", "apex_DQfD"] # TODO should be set programmatically
model:
custom_model: custom_loss_model
custom_options:
input_files: /tmp/flatland-out
lambda1: 1
lambda2: 1
flatland-random-sparse-small-tree-fc-ppo:
run: APEX
env: flatland_sparse
stop:
timesteps_total: 100000000 # 1e8
checkpoint_freq: 10
checkpoint_at_end: True
keep_checkpoints_num: 5
checkpoint_score_attr: episode_reward_mean
config:
input: /tmp/flatland-out
input_evaluation: [is, wis, simulation]
num_workers: 2
num_envs_per_worker: 1
num_gpus: 0
env_config:
observation: tree
observation_config:
max_depth: 2
shortest_path_max_depth: 30
generator: sparse_rail_generator
generator_config: small_v0
wandb:
project: flatland
entity: masterscrat
tags: ["small_v0", "tree_obs", "apex_IL"] # TODO should be set programmatically
model:
fcnet_activation: relu
fcnet_hiddens: [256, 256]
vf_share_layers: True # False
\ No newline at end of file
flatland-random-sparse-small-tree-fc-ppo:
run: APEX
env: flatland_sparse
stop:
timesteps_total: 100000000 # 1e8
checkpoint_freq: 10
checkpoint_at_end: True
keep_checkpoints_num: 5
checkpoint_score_attr: episode_reward_mean
config:
input:
"/tmp/flatland-out": 0.25
sampler: 0.75
input_evaluation: [is, wis, simulation]
num_workers: 2
num_envs_per_worker: 1
num_gpus: 0
env_config:
observation: tree
observation_config:
max_depth: 2
shortest_path_max_depth: 30
generator: sparse_rail_generator
generator_config: small_v0
wandb:
project: flatland
entity: masterscrat
tags: ["small_v0", "tree_obs", "apex_Mixed_IL"] # TODO should be set programmatically
model:
fcnet_activation: relu
fcnet_hiddens: [256, 256]
vf_share_layers: True # False
\ No newline at end of file
@@ -15,11 +15,15 @@ In the config file set the input location as follows
`input: /tmp/flatland-out`
The experiences are copied into the `/tmp/flatland-out` folder. A glob is used to find all experiences saved in JSON format.
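For reference, a minimal sketch of the offline-data settings these configs use (mirroring the `input` and `input_evaluation` entries in the MARWIL config above):

```yaml
config:
  # Offline experiences are read from this directory (all saved *.json files are globbed)
  input: /tmp/flatland-out
  # Off-policy estimators for the offline data: importance sampling (is),
  # weighted importance sampling (wis), plus simulation rollouts
  input_evaluation: [is, wis, simulation]
```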
## On Policy (MARWIL)
## On Policy ([MARWIL](http://papers.nips.cc/paper/7866-exponentially-weighted-imitation-learning-for-batched-historical-data.pdf))
### Phase 1
This is for Pure Imitation Learning with Input Evaluation using IS,WIS and Simulation
This is for Pure Imitation Learning with Input Evaluation using IS, WIS and Simulation.
We use the trainImitate.py file, which is very similar to the train.py file.
TODO:
Make train.py flexible enough to also be used for Imitation Learning.
Config file: `MARWIL.yaml`
```bash
python trainImitate.py -f experiments/flatland_sparse/small_v0/tree_obs_fc_net/ImitationLearning/MARWIL.yaml
```
@@ -32,26 +36,30 @@ Config file: `MARWIL.yaml`
Set beta to non-zero values (e.g. 0.25 and 1) to compare against the pure imitation (beta = 0) MARWIL approach.
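As a sketch, this comparison is just a grid search over `beta` (mirroring `MARWIL.yaml` above; `beta = 0` reduces MARWIL to pure imitation learning):

```yaml
config:
  beta:
    # beta = 0 is pure imitation learning; beta > 0 applies MARWIL's advantage weighting
    grid_search: [0, 0.25, 0.5, 0.75, 1]
```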
## Off Policy (DQN)
### Ape-X
### DQN (TODO: Ape-X not working)
#### Phase 1
This is for Pure Imitation Learning with Input Evaluation using IS, WIS and Simulation.
Config file: `apex_IL.yaml`
Config file: `dqn_IL.yaml`
```bash
python trainImitate.py -f experiments/flatland_sparse/small_v0/tree_obs_fc_net/ImitationLearning/apex_IL.yaml
python trainImitate.py -f experiments/flatland_sparse/small_v0/tree_obs_fc_net/ImitationLearning/dqn_IL.yaml
```
#### Phase 2
Replace Config file to `apex_mixed_IL.yaml`
Replace the config file with `dqn_mixed_IL.yaml`.
Note that we no longer use Simulation for input evaluation, as we have a sampler which runs the environment in the specified proportion.
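A sketch of the mixed-input block this note refers to (the proportions follow the mixed-IL configs above; per the note, `simulation` can be dropped from `input_evaluation` since the sampler already runs the environment):

```yaml
config:
  input:
    "/tmp/flatland-out": 0.25   # 25% expert (recorded) experiences
    sampler: 0.75               # 75% fresh environment samples
  input_evaluation: [is, wis]   # no simulation rollouts needed with a live sampler
```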
### [Ape-X DQfD](https://arxiv.org/pdf/1805.11593.pdf)
Involves mixed training in the ratio 25% (expert) and 75% (simulation). This is a deviation from the earlier [DQfD](https://arxiv.org/pdf/1704.03732.pdf) paper, where there was a pure imitation step.
A nice explanation and summary can be found [here](https://danieltakeshi.github.io/2019/05/11/dqfd-followups/) and [here](https://danieltakeshi.github.io/2019/04/30/il-and-rl/).
Config file: `apex_DQfD.yaml`
**Currently this is not working with custom loss model. Use the config dqn_DQfD.yaml. DQN is similar to Ape-X but is slower **
Config file: `dqn_DQfD.yaml`
(Currently the Ape-X version is not working with the custom loss model. Use the config `dqn_DQfD.yaml`. DQN is similar to Ape-X but slower.)
```bash
python trainImitate.py -f experiments/flatland_sparse/small_v0/tree_obs_fc_net/ImitationLearning/apex_DQfD.yaml
python trainImitate.py -f experiments/flatland_sparse/small_v0/tree_obs_fc_net/ImitationLearning/dqn_DQfD.yaml
```
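For reference, a sketch of the DQfD-style settings from the config above: expert and sampled experiences are mixed 25/75, and the custom loss model reads the expert data again via `input_files` (presumably to add an imitation loss term; `lambda1`/`lambda2` are the loss weights consumed by `custom_loss_model`, and the values here simply mirror the config):

```yaml
config:
  input:
    "/tmp/flatland-out": 0.25         # 25% expert experiences
    sampler: 0.75                     # 75% environment samples
  model:
    custom_model: custom_loss_model
    custom_options:
      input_files: /tmp/flatland-out  # expert data consumed by the custom loss
      lambda1: 1                      # loss-term weights used by custom_loss_model
      lambda2: 1
```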
TODO: