Provide Information for to make reward shaping
I think the environment should return a dict with all (reward) relevant information, such as
agent tried to perform an illeagal move agent tried to move to an occupied cell agent reached goal I think it would also be useful to provide some of the checks in env.step() as functions. for example def is_valid_move(agent, action) --> bool