# Experimental A3C implementation
## Dependencies
- PyTorch 1.0 with GPU support
- numpy
## Installation
Either:
1. Put all files from this folder into the same folder as flatland, or
2. Add the path to flatland to `sys.path` in `main.py`.
## Running
- Training: `python3 main.py generate`
- Replay: `python3 main.py test`
## Settings
PLEASE NOTE: THIS CODE IS EXPERIMENTAL AND PROVIDED AS IS!
Tested using the complex rail generator with 5 agents and 40 extra connections. The gridworld is 32x32; the local observation is an 8x8 region cropped out of the gridworld. The observation consists of 3 temporal steps, each with four channels:
- Transitions: 8x8 float tensor
- Positions (all agents): 8x8 float tensor
- Position (agent x): 8x8 float tensor
- Target (agent x): 8x8 float tensor

Fusing the 3 temporal steps yields a state of size 3x4x8x8.
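A minimal sketch of how such a stacked observation could be assembled (the channel names and the cropping helper are illustrative assumptions, not taken from the actual code):

```python
import numpy as np

def crop(grid, center, size=8):
    """Crop a size x size window centered on `center` from a 2D grid,
    zero-padding at the borders (hypothetical helper)."""
    padded = np.pad(grid, size, mode="constant")
    r, c = center[0] + size, center[1] + size
    return padded[r - size // 2 : r + size // 2,
                  c - size // 2 : c + size // 2]

def build_state(history):
    """`history` is a list of 3 dicts, one per temporal step, each holding
    the four 8x8 channels listed above. Returns a (3, 4, 8, 8) float array."""
    steps = []
    for obs in history:
        steps.append(np.stack([obs["transitions"],
                               obs["positions_all"],
                               obs["position_agent"],
                               obs["target_agent"]]).astype(np.float32))
    return np.stack(steps)  # shape (3, 4, 8, 8)
```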
The grid size and the region size are hardcoded in `main.py`!
## Hyperparameters
- `FlatLink.n_step_return_ = 50`: length of n-step return
- `FlatLink.gamma_ = 0.5`: discount factor
- `FlatLink.num_epochs = 20`: epochs used for training the model
- `FlatLink.play_epochs = 50`: overall number of training epochs (number of play/train-model sequences)
- `FlatLink.thres = 2.00`: threshold for choosing between a random and a predicted action
- `FlatLink.max_iter = 300`: maximum number of steps allowed per game
- `FlatLink.play_game_number = 10`: number of games played per training epoch
- `FlatLink.rate_decay = 0.97`: decay rate for `thres` per epoch
- `FlatLink.replay_buffer_size = 4000`: replay buffer size (balanced buffer)
- `FlatLink.min_thres = 0.05`: minimum value for `thres`
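Several of these parameters work together: `thres` decays by `rate_decay` each epoch but never drops below `min_thres`, and rewards are accumulated as a discounted n-step return. A small sketch of both, under the assumption that the schedule is a simple geometric decay (function names are hypothetical):

```python
def exploration_threshold(epoch, thres=2.00, rate_decay=0.97, min_thres=0.05):
    """Threshold for choosing a random vs. predicted action at `epoch`,
    decayed geometrically and clipped at the minimum."""
    return max(thres * rate_decay ** epoch, min_thres)

def n_step_return(rewards, bootstrap_value, gamma=0.5, n=50):
    """Discounted n-step return: r_0 + gamma*r_1 + ... + gamma^n * V(s_n)."""
    ret = bootstrap_value * gamma ** min(len(rewards), n)
    for i, r in enumerate(rewards[:n]):
        ret += (gamma ** i) * r
    return ret
```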
- `FlatNet.loss_value_ = 0.01`: weight for value loss
- `FlatNet.loss_entropy_ = 0.05`: weight for entropy loss
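These weights suggest the standard actor-critic objective L = L_policy + 0.01 * L_value - 0.05 * H(pi). A NumPy sketch of that combination for a single step (the exact loss inside `FlatNet` is an assumption):

```python
import numpy as np

def a3c_loss(log_prob_taken, advantage, value_pred, target_return,
             action_probs, loss_value_=0.01, loss_entropy_=0.05):
    """Combined A3C loss for one step (hypothetical reconstruction):
    - policy loss: -log pi(a|s) * advantage (advantage treated as constant)
    - value loss: squared error between predicted value and n-step return
    - entropy bonus: subtracted with its weight to encourage exploration."""
    policy_loss = -log_prob_taken * advantage
    value_loss = (value_pred - target_return) ** 2
    entropy = -np.sum(action_probs * np.log(action_probs + 1e-8))
    return policy_loss + loss_value_ * value_loss - loss_entropy_ * entropy
```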
## Remarks
The convergence rate is very slow; no properly trained weights are available at this time.