Synchronization of agent steps

Story

As a Project Member, I want to discuss and update the way steps are executed in agents. Currently each agent's action is executed sequentially. This means that optimizers struggle to find optimal solutions, as the agent index introduces an asymmetry. We want to change this behavior and remove the asymmetry in one of two ways:

Enforcing a minimal release time for each cell (e.g. 1 step). A new agent can then only enter a cell once that cell has been empty for at least one time step.
Testing all legal actions sequentially and updating the agents' true positions only after all agents have performed their tests. This, however, introduces potential conflicts where multiple agents want to enter the same cell at the same time. Such a conflict could be regarded as a crash, and the environment would terminate. This is too much of a change for the currently running challenge.
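
To make the discussion concrete, here is a minimal Python sketch of both options. It is an illustration under assumptions, not the environment's actual code: the names `can_enter`, `synchronous_step`, `last_occupied`, and `min_release` are hypothetical, cells are plain `(row, col)` tuples, and every requested target cell is assumed to have passed the legality test already.

```python
from collections import defaultdict


def can_enter(cell, t, last_occupied, occupied, min_release=1):
    """Option 1 (hypothetical helper): minimal release time per cell.

    last_occupied maps a cell to the last time step at which it held an agent.
    The cell may be entered at step t only if it is free now and has been
    empty for at least min_release full steps before t.
    """
    if cell in occupied:
        return False
    last = last_occupied.get(cell)
    return last is None or t - last - 1 >= min_release


def synchronous_step(positions, targets):
    """Option 2 (hypothetical helper): test first, commit afterwards.

    positions maps agent -> current cell; targets maps agent -> desired cell.
    Agents without a target stay where they are. No position is updated
    until every agent's request has been collected, so the result does not
    depend on the agent index.
    Returns (new_positions, conflicts).
    """
    desired = {agent: targets.get(agent, cell) for agent, cell in positions.items()}

    # Phase 1: collect all claims per cell without moving anyone.
    claims = defaultdict(list)
    for agent, cell in desired.items():
        claims[cell].append(agent)

    conflicts = {cell: agents for cell, agents in claims.items() if len(agents) > 1}
    if conflicts:
        # Assumption: a conflict counts as a crash, so nobody moves and the
        # caller decides whether to terminate the episode.
        return dict(positions), conflicts

    # Phase 2: commit all moves at once.
    return desired, {}


positions = {"a": (0, 0), "b": (0, 2)}
print(synchronous_step(positions, {"a": (0, 1), "b": (0, 1)}))  # conflict on (0, 1)
print(synchronous_step(positions, {"a": (0, 1)}))               # only "a" moves
```

The sketch only detects agents claiming the same cell. Two agents swapping cells head-on would need an additional check, which is left out here.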

Acceptance Criteria

We agree as a team on which way to go
We open an issue to resolve the problem by implementing the discussed solution