Forked from Flatland / baselines
298 commits behind, 1 commit ahead of the upstream repository.

Experimental a3c implementation

Dependencies

  • PyTorch 1.0 with GPU support
  • numpy

Installation

1.) Put all files from this folder into the same folder as flatland, OR 2.) add the path to flatland to sys.path in "main.py"
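For option 2, a minimal sketch of the sys.path addition (the path below is a placeholder, not the repository's actual layout; point it at your own flatland checkout):

```python
import sys

# Placeholder path: replace with the location of your flatland checkout
# so that `import flatland` resolves when main.py runs outside that tree.
FLATLAND_PATH = "/path/to/flatland"  # hypothetical; adjust to your setup
sys.path.insert(0, FLATLAND_PATH)
```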

Running

  • Training: python3 main.py generate
  • Replay: python3 main.py test

Settings

PLEASE NOTE: THIS CODE IS EXPERIMENTAL AND PROVIDED AS IS!

Tested using the complex rail generator with 5 agents and 40 extra connections. The gridworld is 32x32; the local observation is an 8x8 region cropped out of the gridworld. Observations consist of 3 temporal steps:

  • Transitions: 8x8 float tensor
  • Positions (all agents): 8x8 float tensor
  • Position (agent x): 8x8 float tensor
  • Target (agent x): 8x8 float tensor

Fusion of the 3 temporal steps -> the state has size 3x4x8x8
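A minimal sketch of that state layout, with shapes taken from the description above (the function and variable names are illustrative, not the repository's actual API):

```python
import numpy as np

# Each temporal step contributes four 8x8 float channel maps:
# transitions, all agents' positions, agent x's position, agent x's target.
REGION = 8    # cropped local-observation size (hardcoded in main.py)
CHANNELS = 4  # transitions, positions (all), position (agent x), target (agent x)
STEPS = 3     # temporal steps fused into one state

def build_state(step_obs):
    """step_obs: list of STEPS arrays, each of shape (CHANNELS, REGION, REGION)."""
    return np.stack(step_obs).astype(np.float32)  # stack along a new time axis

steps = [np.zeros((CHANNELS, REGION, REGION)) for _ in range(STEPS)]
state = build_state(steps)
print(state.shape)  # (3, 4, 8, 8)
```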

The grid-size and the region-size are hardcoded in main.py!

Hyperparameters

  • FlatLink.n_step_return_ = 50, length of the n-step return
  • FlatLink.gamma_ = 0.5, discount factor
  • FlatLink.num_epochs = 20, epochs used for training the model
  • FlatLink.play_epochs = 50, overall number of training epochs (number of play/train-model sequences)
  • FlatLink.thres = 2.00, threshold for choosing a random vs. predicted action
  • FlatLink.max_iter = 300, maximum number of steps allowed per game
  • FlatLink.play_game_number = 10, number of games played per training epoch
  • FlatLink.rate_decay = 0.97, per-epoch decay rate for thres
  • FlatLink.replay_buffer_size = 4000, replay buffer size (balanced buffer)
  • FlatLink.min_thres = 0.05, minimum value for thres

  • FlatNet.loss_value_ = 0.01, weight for the value loss
  • FlatNet.loss_entropy_ = 0.05, weight for the entropy loss
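The values above can be collected in one place, as sketched below. SimpleNamespace objects stand in for the repository's FlatLink and FlatNet instances (a hypothetical substitution; in the actual code these attributes would be set on the real objects in main.py). The last line illustrates the documented decay rule for thres.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a FlatLink instance, holding the
# documented hyperparameter values.
link = SimpleNamespace(
    n_step_return_=50,        # length of the n-step return
    gamma_=0.5,               # discount factor
    num_epochs=20,            # epochs used for training the model
    play_epochs=50,           # overall number of play/train cycles
    thres=2.00,               # threshold for random vs. predicted action
    max_iter=300,             # maximum number of steps per game
    play_game_number=10,      # games played per training epoch
    rate_decay=0.97,          # per-epoch decay rate for thres
    replay_buffer_size=4000,  # replay buffer size (balanced buffer)
    min_thres=0.05,           # minimum value for thres
)

# Hypothetical stand-in for a FlatNet instance (loss weights).
net = SimpleNamespace(
    loss_value_=0.01,    # weight for the value loss
    loss_entropy_=0.05,  # weight for the entropy loss
)

# Decay rule implied by the settings: thres shrinks each epoch,
# but never drops below min_thres.
link.thres = max(link.thres * link.rate_decay, link.min_thres)
print(round(link.thres, 2))  # 1.94
```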

Remarks

The convergence rate is very slow; no properly trained weights are available at this time.