Malfunction last step missed
I simulated further until the agent’s malfunction ended, and the agent seems to “exit” the malfunction with the position_fraction that I expected it to have before the malfunction started (in this case: 0.666666). Here is some concrete data for the same agent as before:
1. I read from env.agents the following data: position_fraction=0.333333 malfunction=1 next_malfunction=40
2. I call env.step(…)
3. I read from env.agents the following data: position_fraction=0.666666 malfunction=0 next_malfunction=40
So it seems that the move from position_fraction 0.333333 to 0.666666 is not “lost”, but rather delayed. I guess it’s all caused by a different expectation of when malfunction is updated.
From these examples, I guess malfunction is updated at the beginning of the env.step(…) call, while to me it seems more natural to have it updated at the end of env.step(…), so that:
- malfunction >= 1 means the agent is blocked for that many env.step(…) calls (currently it doesn’t mean that)
- next_malfunction >= 1 means that there are that many env.step(…) calls left before the agent is blocked by the next malfunction (currently it doesn’t mean that)
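
To make the difference concrete, here is a toy sketch of the two orderings. It is not Flatland's actual code: ToyAgent, step_update_at_start and step_update_at_end are names I made up for illustration, and the speed of 1/3 is only assumed from the position_fraction values above. It models just the malfunction countdown, not next_malfunction.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ToyAgent:
    # Hypothetical stand-in for an entry of env.agents (not the real class).
    position_fraction: Fraction
    malfunction: int
    speed: Fraction = Fraction(1, 3)  # assumed from 0.333333 -> 0.666666


def step_update_at_start(agent: ToyAgent) -> None:
    """What I think happens now: decrement the counter, then try to move."""
    if agent.malfunction > 0:
        agent.malfunction -= 1
    if agent.malfunction == 0:
        agent.position_fraction += agent.speed


def step_update_at_end(agent: ToyAgent) -> None:
    """What I expected: check the counter, move or stay, then decrement."""
    if agent.malfunction == 0:
        agent.position_fraction += agent.speed
    else:
        agent.malfunction -= 1


# Same starting state as in the data above.
a = ToyAgent(Fraction(1, 3), malfunction=1)
b = ToyAgent(Fraction(1, 3), malfunction=1)

step_update_at_start(a)
step_update_at_end(b)

print("update at start:", a.position_fraction, a.malfunction)
print("update at end:  ", b.position_fraction, b.malfunction)
```

With the update at the start, the agent already moves during the step in which malfunction was still 1, which matches what I observe. With the update at the end, malfunction=1 would block the agent for exactly one more step.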
Is there a reason for the current behavior rather than the one I’m expecting? Of course, now that I have more or less reverse-engineered the issue, I can work around it, but it still seems a bit unnatural to me.