I think there is a small bug in training_navigation.py.
In torch_training/training_navigation.py,
at line 151 the same observation is passed to both the state and next_state arguments of the agent.step() method.
for a in range(env.get_num_agents()):
    # Only update the values when we are done or when an action was taken and thus relevant information is present
    if update_values or done[a]:
        agent.step(agent_obs_buffer[a], agent_action_buffer[a], all_rewards[a],
                   agent_obs[a], done[a])
        cummulated_reward[a] = 0.
        agent_obs_buffer[a] = agent_obs[a].copy()
        agent_action_buffer[a] = action_dict[a]
    if next_obs[a]:
        agent_obs[a] = normalize_observation(next_obs[a], tree_depth, observation_radius=10)
I suggest the following change:
for a in range(env.get_num_agents()):
    if next_obs[a]:
        agent_next_obs[a] = normalize_observation(next_obs[a], tree_depth, observation_radius=10)
        agent_obs[a] = agent_next_obs[a]
    # Only update the values when we are done or when an action was taken and thus relevant information is present
    if update_values or done[a]:
        agent.step(agent_obs_buffer[a], agent_action_buffer[a], all_rewards[a],
                   agent_next_obs[a], done[a])
        cummulated_reward[a] = 0.
        agent_obs_buffer[a] = agent_obs[a].copy()
        agent_action_buffer[a] = action_dict[a]
(Moreover, cummulated_reward is misspelled; it should be cumulated_reward. Python does not mind, it is just a naming typo.)
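To illustrate why passing the same observation as both state and next_state matters: a DQN-style agent.step() typically stores a (state, action, reward, next_state, done) transition and later bootstraps its one-step TD target from next_state. If state and next_state are identical, the target bootstraps from the wrong Q-values. This is a minimal sketch with made-up values and a hypothetical td_target helper, not the actual torch_training API:

```python
def td_target(q_next, reward, done, gamma=0.99):
    # Standard one-step TD target: r + gamma * max_a Q(s', a),
    # with the bootstrap term dropped on terminal steps.
    return reward + gamma * max(q_next) * (1.0 - float(done))

# Toy Q-values for the current observation and the (distinct) next observation.
q_state = [0.2, 0.5]
q_next_state = [1.0, 0.1]

# Correct: the target bootstraps from the *next* observation's Q-values.
correct = td_target(q_next_state, reward=-1.0, done=False)  # -1.0 + 0.99 * 1.0

# Buggy: passing the same observation for both state and next_state
# makes the target bootstrap from the current state's Q-values instead.
buggy = td_target(q_state, reward=-1.0, done=False)         # -1.0 + 0.99 * 0.5

print(correct)  # ~ -0.01
print(buggy)    # ~ -0.505
```

The numbers are arbitrary; the point is that the two targets differ whenever Q(state) != Q(next_state), so the buggy call trains the network toward a systematically wrong target.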
Let me know if this helps, and sorry if I wasted your time.
Edited by lorenzo_palloni