Reward function test
Write a test that validates the reward function:
- Reward is always -time_penalty, even if:
  - No action is chosen
  - An invalid action is chosen
  - Agents block each other and get stuck
- Reward is +1 for all agents if and only if all agents reach their targets
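
The cases above could be checked along the following lines. This is only a sketch: the real environment and its API are not described here, so ToyCorridorEnv, its step() signature, the action names, and the TIME_PENALTY value are all hypothetical stand-ins chosen to make the example runnable. The test functions are the part meant to carry over to the real environment.

```python
from typing import Dict, List

TIME_PENALTY = 0.1  # assumed value; the real one would come from the environment


class ToyCorridorEnv:
    """Hypothetical stand-in environment: agents move along a 1D corridor."""

    MOVES = {"left": -1, "right": 1, "stay": 0}

    def __init__(self, starts: List[int], targets: List[int], length: int,
                 time_penalty: float = TIME_PENALTY) -> None:
        self.positions = list(starts)
        self.targets = list(targets)
        self.length = length
        self.time_penalty = time_penalty

    def step(self, actions: Dict[int, str]) -> Dict[int, float]:
        for agent in range(len(self.positions)):
            # A missing or unknown action is treated as "do not move".
            delta = self.MOVES.get(actions.get(agent), 0)
            new_pos = self.positions[agent] + delta
            in_bounds = 0 <= new_pos < self.length
            occupied = any(pos == new_pos
                           for other, pos in enumerate(self.positions)
                           if other != agent)
            # A move into a wall or into another agent is dropped.
            if in_bounds and not occupied:
                self.positions[agent] = new_pos
        all_done = self.positions == self.targets
        reward = 1.0 if all_done else -self.time_penalty
        return {agent: reward for agent in range(len(self.positions))}


def expected_penalty(n_agents: int) -> Dict[int, float]:
    return {agent: -TIME_PENALTY for agent in range(n_agents)}


def test_penalty_when_no_action_is_chosen():
    env = ToyCorridorEnv(starts=[0, 4], targets=[3, 1], length=5)
    assert env.step({}) == expected_penalty(2)


def test_penalty_when_invalid_action_is_chosen():
    env = ToyCorridorEnv(starts=[0, 4], targets=[3, 1], length=5)
    assert env.step({0: "jump", 1: "fly"}) == expected_penalty(2)


def test_penalty_when_agents_block_each_other():
    env = ToyCorridorEnv(starts=[1, 2], targets=[3, 0], length=5)
    rewards = env.step({0: "right", 1: "left"})
    assert env.positions == [1, 2]  # neither agent could move
    assert rewards == expected_penalty(2)


def test_plus_one_only_when_all_agents_reach_target():
    # Only agent 0 arrives: every agent still gets the time penalty.
    env = ToyCorridorEnv(starts=[0, 4], targets=[1, 2], length=5)
    assert env.step({0: "right", 1: "left"}) == expected_penalty(2)
    # Both agents arrive in the same step: every agent gets +1.
    env = ToyCorridorEnv(starts=[0, 4], targets=[1, 3], length=5)
    assert env.step({0: "right", 1: "left"}) == {0: 1.0, 1: 1.0}
```

The last test covers both directions of the "if and only if": a partial arrival must not produce +1, and a full arrival must. With pytest, the functions are collected automatically by their test_ prefix.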