Forked from Flatland / baselines
298 commits behind, 1 commit ahead of the upstream repository.

Experimental a3c implementation

Dependencies

  • PyTorch 1.0 with GPU support
  • numpy

Installation

1.) Put all files from this folder into the same folder as flatland, OR 2.) add the path to flatland to sys.path in "main.py"
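For option 2, a minimal sketch of the sys.path addition (the path below is a placeholder, not the repository's actual layout; point it at your own flatland checkout):

```python
import sys

# Placeholder path: replace with the location of your flatland checkout
# so that `import flatland` resolves when main.py runs outside that tree.
FLATLAND_PATH = "/path/to/flatland"  # hypothetical; adjust to your setup
sys.path.insert(0, FLATLAND_PATH)
```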

Running

  • Training: python3 main.py generate
  • Replay: python3 main.py test

Settings

PLEASE NOTE: THIS CODE IS EXPERIMENTAL AND PROVIDED AS IS!

Tested using the complex rail generator with 5 agents and 40 extra connections. The gridworld is 32x32; the local observation is an 8x8 region cropped out of the gridworld. Observations consist of 3 temporal steps:

  • Transitions: 8x8 float tensor
  • Positions (all agents): 8x8 float tensor
  • Position (agent x): 8x8 float tensor
  • Target (agent x): 8x8 float tensor

Fusion of the 3 temporal steps -> the state has size 3x4x8x8
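A minimal sketch of that state layout, with shapes taken from the description above (the function and variable names are illustrative, not the repository's actual API):

```python
import numpy as np

# Each temporal step contributes four 8x8 float channel maps:
# transitions, all agents' positions, agent x's position, agent x's target.
REGION = 8    # cropped local-observation size (hardcoded in main.py)
CHANNELS = 4  # transitions, positions (all), position (agent x), target (agent x)
STEPS = 3     # temporal steps fused into one state

def build_state(step_obs):
    """step_obs: list of STEPS arrays, each of shape (CHANNELS, REGION, REGION)."""
    return np.stack(step_obs).astype(np.float32)  # stack along a new time axis

steps = [np.zeros((CHANNELS, REGION, REGION)) for _ in range(STEPS)]
state = build_state(steps)
print(state.shape)  # (3, 4, 8, 8)
```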

The grid-size and the region-size are hardcoded in main.py!

Hyperparameters

  • FlatLink.n_step_return_ = 50, length of the n-step return
  • FlatLink.gamma_ = 0.5, discount factor
  • FlatLink.num_epochs = 20, epochs used for training the model
  • FlatLink.play_epochs = 50, overall number of training epochs (number of play/train-model sequences)
  • FlatLink.thres = 2.00, threshold for choosing a random vs. predicted action
  • FlatLink.max_iter = 300, maximum number of steps allowed per game
  • FlatLink.play_game_number = 10, number of games played per training epoch
  • FlatLink.rate_decay = 0.97, per-epoch decay rate for thres
  • FlatLink.replay_buffer_size = 4000, replay buffer size (balanced buffer)
  • FlatLink.min_thres = 0.05, minimum value for thres

  • FlatNet.loss_value_ = 0.01, weight for the value loss
  • FlatNet.loss_entropy_ = 0.05, weight for the entropy loss
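The values above can be collected in one place, as sketched below. SimpleNamespace objects stand in for the repository's FlatLink and FlatNet instances (a hypothetical substitution; in the actual code these attributes would be set on the real objects in main.py). The last line illustrates the documented decay rule for thres.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a FlatLink instance, holding the
# documented hyperparameter values.
link = SimpleNamespace(
    n_step_return_=50,        # length of the n-step return
    gamma_=0.5,               # discount factor
    num_epochs=20,            # epochs used for training the model
    play_epochs=50,           # overall number of play/train cycles
    thres=2.00,               # threshold for random vs. predicted action
    max_iter=300,             # maximum number of steps per game
    play_game_number=10,      # games played per training epoch
    rate_decay=0.97,          # per-epoch decay rate for thres
    replay_buffer_size=4000,  # replay buffer size (balanced buffer)
    min_thres=0.05,           # minimum value for thres
)

# Hypothetical stand-in for a FlatNet instance (loss weights).
net = SimpleNamespace(
    loss_value_=0.01,    # weight for the value loss
    loss_entropy_=0.05,  # weight for the entropy loss
)

# Decay rule implied by the settings: thres shrinks each epoch,
# but never drops below min_thres.
link.thres = max(link.thres * link.rate_decay, link.min_thres)
print(round(link.thres, 2))  # 1.94
```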

Remarks

The convergence rate is very slow; no properly trained weights are available at this time.