Commit d76a69e2 authored by nilabha's avatar nilabha

Changes for pure, PPO, and imitation trainers, converted to use the custom train fn

parent 4c60e6c2
Pipeline #5014 passed with stage
in 13 minutes and 1 second
flatland-sparse-small-tree-fc-apex-il-trainer:
  run: ImitationAgent
  env: flatland_sparse
  stop:
    timesteps_total: 15000000  # 1.5e7
  checkpoint_freq: 50
  checkpoint_at_end: True
  keep_checkpoints_num: 50
  checkpoint_score_attr: episode_reward_mean
  num_samples: 3
  config:
    num_workers: 13
    num_envs_per_worker: 5
    num_gpus: 0
    clip_rewards: False
    vf_clip_param: 500.0
    entropy_coeff: 0.01
    # effective batch_size: train_batch_size * num_agents_in_each_environment [5, 10]
    # see https://github.com/ray-project/ray/issues/4628
    train_batch_size: 1000  # 5000
    rollout_fragment_length: 50  # 100
    sgd_minibatch_size: 100  # 500
    vf_share_layers: False
    env_config:
      observation: tree
      observation_config:
        max_depth: 2
        shortest_path_max_depth: 30
      generator: sparse_rail_generator
      generator_config: small_v0
      wandb:
        project: flatland-paper
        entity: aicrowd
        tags: ["small_v0", "tree_obs", "apex_rllib_il"]  # TODO should be set programmatically
    model:
      fcnet_activation: relu
      fcnet_hiddens: [256, 256]
      vf_share_layers: False  # Should be the same as the PPO vf_share_layers setting
    evaluation_num_workers: 2
    # Enable evaluation, once per training iteration.
    evaluation_interval: 1
    # Run 2 episodes each time evaluation runs.
    evaluation_num_episodes: 2
    # Override the env config for evaluation.
......
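Experiment files like the one above are normally parsed and handed to Ray Tune, where the `run:` key names a registered trainable. A minimal sketch of just the parsing step (the spec below is a trimmed, inlined copy of the config above; the variable names are illustrative, not from the repo):

```python
import yaml

# A trimmed copy of the experiment spec above, inlined for illustration.
EXPERIMENT_YAML = """
flatland-sparse-small-tree-fc-apex-il-trainer:
  run: ImitationAgent
  env: flatland_sparse
  stop:
    timesteps_total: 15000000  # 1.5e7
  config:
    num_workers: 13
    num_envs_per_worker: 5
"""

# Top-level mapping: experiment name -> experiment spec.
experiments = yaml.safe_load(EXPERIMENT_YAML)
spec = experiments["flatland-sparse-small-tree-fc-apex-il-trainer"]

# `run` names the trainable; a custom algorithm like ImitationAgent must
# be registered with tune before the experiment is launched.
print(spec["run"])  # ImitationAgent
```

Note that YAML silently keeps only the last occurrence of a duplicated key, which is why a duplicate `evaluation_interval` entry would be an easy mistake to miss.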
@@ -14,7 +14,7 @@ from ray.tune.resources import resources_to_json
 from ray.tune.tune import _make_scheduler
 from utils.argparser import create_parser
-from utils.loader import load_envs, load_models
+from utils.loader import load_envs, load_models, load_algorithms
 from envs.flatland import get_eval_config
 from ray.rllib.utils import merge_dicts
@@ -29,6 +29,8 @@ torch, _ = try_import_torch()
 # Register all necessary assets in tune registries
 load_envs(os.getcwd()) # Load envs
 load_models(os.getcwd()) # Load models
+from algorithms import CUSTOM_ALGORITHMS
+load_algorithms(CUSTOM_ALGORITHMS) # Load algorithms
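The diff calls `load_algorithms(CUSTOM_ALGORITHMS)` but does not show its body. A plausible sketch of such a helper, registering each custom trainable by name so the YAML's `run:` key resolves to it; the real `utils/loader.py` is not in this diff, and the local `register_trainable`/`REGISTRY` stand in for Ray Tune's global registry:

```python
# Hypothetical sketch: register {name: trainable} pairs so that a YAML
# `run: ImitationAgent` entry resolves to the custom class. In the real
# repo the callback would be ray.tune.registry.register_trainable.
REGISTRY = {}

def register_trainable(name, trainable):
    # Ray stores this in its global tune registry; a dict mimics that here.
    REGISTRY[name] = trainable

def load_algorithms(custom_algorithms, register=register_trainable):
    """Register each custom algorithm under its lookup name."""
    for name, trainable in custom_algorithms.items():
        register(name, trainable)

class ImitationAgent:
    """Placeholder for the repository's custom imitation trainer."""

CUSTOM_ALGORITHMS = {"ImitationAgent": ImitationAgent}
load_algorithms(CUSTOM_ALGORITHMS)
```

Registering by name keeps the YAML configs declarative: swapping trainers means editing `run:`, not the launch script.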
def on_episode_end(info):
......
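The body of `on_episode_end` is truncated in this diff. A typical implementation under RLlib's old dict-based callback API, where `info["episode"]` carries the finished episode, looks roughly like the following; the metric name is an assumption for illustration:

```python
# Hypothetical sketch of an on_episode_end callback body (the diff cuts
# it off). Under RLlib's legacy callback API, `info` is a dict and
# custom_metrics entries are aggregated into the training results.
def on_episode_end(info):
    episode = info["episode"]
    # Record a custom metric that shows up in tune / W&B logs.
    episode.custom_metrics["episode_steps"] = episode.length
```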