I think there is a small bug in training_navigation.py.
In torch_training/training_navigation.py,
at line 151 the same observation is passed to both the state and next_state arguments of the agent.step() method.
for a in range(env.get_num_agents()):
    # Only update the values when we are done or when an action was taken and thus relevant information is present
    if update_values or done[a]:
        agent.step(agent_obs_buffer[a], agent_action_buffer[a], all_rewards[a],
                   agent_obs[a], done[a])
        cummulated_reward[a] = 0.
        agent_obs_buffer[a] = agent_obs[a].copy()
        agent_action_buffer[a] = action_dict[a]
    if next_obs[a]:
        agent_obs[a] = normalize_observation(next_obs[a], tree_depth, observation_radius=10)
I suggest the following change:
for a in range(env.get_num_agents()):
    if next_obs[a]:
        agent_next_obs[a] = normalize_observation(next_obs[a], tree_depth, observation_radius=10)
        agent_obs[a] = agent_next_obs[a]
    # Only update the values when we are done or when an action was taken and thus relevant information is present
    if update_values or done[a]:
        agent.step(agent_obs_buffer[a], agent_action_buffer[a], all_rewards[a],
                   agent_next_obs[a], done[a])
        cummulated_reward[a] = 0.
        agent_obs_buffer[a] = agent_obs[a].copy()
        agent_action_buffer[a] = action_dict[a]
(Moreover, cummulated_reward is misspelled; it should be cumulated_reward. Python does not mind, it is just a naming typo.)
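To illustrate why passing the same observation as both state and next_state matters: a DQN-style agent.step() typically stores a (state, action, reward, next_state, done) transition and later bootstraps its one-step TD target from next_state. If state and next_state are identical, the target bootstraps from the wrong Q-values. This is a minimal sketch with made-up values and a hypothetical td_target helper, not the actual torch_training API:

```python
def td_target(q_next, reward, done, gamma=0.99):
    # Standard one-step TD target: r + gamma * max_a Q(s', a),
    # with the bootstrap term dropped on terminal steps.
    return reward + gamma * max(q_next) * (1.0 - float(done))

# Toy Q-values for the current observation and the (distinct) next observation.
q_state = [0.2, 0.5]
q_next_state = [1.0, 0.1]

# Correct: the target bootstraps from the *next* observation's Q-values.
correct = td_target(q_next_state, reward=-1.0, done=False)  # -1.0 + 0.99 * 1.0

# Buggy: passing the same observation for both state and next_state
# makes the target bootstrap from the current state's Q-values instead.
buggy = td_target(q_state, reward=-1.0, done=False)         # -1.0 + 0.99 * 0.5

print(correct)  # ~ -0.01
print(buggy)    # ~ -0.505
```

The numbers are arbitrary; the point is that the two targets differ whenever Q(state) != Q(next_state), so the buggy call trains the network toward a systematically wrong target.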
Let me know if this helps, and sorry if I wasted your time.
Edited by lorenzo_palloni