Save episode timesteps for the JavaScript renderer
- Provide a flag in the RailEnv constructor to enable storing episode information.
- Record each agent's position and orientation in the env, timestep by timestep.
- Include the agent positions and orientations in the msgpack-serialized env under an "episodes" key (possibly storing an array of episodes).
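The first two steps could look roughly like the sketch below. It is not RailEnv code: `RecordingEnv`, the `record_steps` flag, the `cur_episode` attribute and the stand-in `Agent` dataclass are all hypothetical names chosen for illustration, and `step` replaces the real transition logic with a trivial move.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for an agent; only the fields the recorder needs.
@dataclass
class Agent:
    position: Tuple[int, int]  # (row, col)
    direction: int             # orientation, e.g. 0..3


class RecordingEnv:
    """Sketch of an env with a hypothetical `record_steps` constructor flag."""

    def __init__(self, agents: List[Agent], record_steps: bool = False):
        self.agents = agents
        self.record_steps = record_steps
        # One entry per timestep; each entry holds one [row, col, orientation] per agent.
        self.cur_episode: List[List[List[int]]] = []

    def _record(self) -> None:
        # Only store anything when the flag was set in the constructor.
        if self.record_steps:
            self.cur_episode.append(
                [[a.position[0], a.position[1], a.direction] for a in self.agents]
            )

    def reset(self) -> None:
        self.cur_episode = []
        self._record()  # store the initial state as timestep 0

    def step(self, moves: List[Tuple[int, int, int]]) -> None:
        # Placeholder transition: apply (d_row, d_col, new_direction) to each agent.
        for agent, (dr, dc, new_dir) in zip(self.agents, moves):
            agent.position = (agent.position[0] + dr, agent.position[1] + dc)
            agent.direction = new_dir
        self._record()  # store the state after this step
```

For example, `env = RecordingEnv([Agent((0, 0), 1), Agent((2, 3), 2)], record_steps=True)` followed by `env.reset()` and `env.step([(0, 1, 1), (1, 0, 2)])` leaves `env.cur_episode` equal to `[[[0, 0, 1], [2, 3, 2]], [[0, 1, 1], [3, 3, 2]]]`. Plain lists of ints are used instead of tuples so that a msgpack round-trip, which decodes arrays as lists by default, compares equal to the recorded data.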
So, create a new key "episodes" in the root dict with the nesting episodes -> episode -> timestep -> agent -> [row, col, orientation], where A -> B means A contains many Bs. Here "agent" just means a tuple of (int row, int col, int orientation). Alternatively, "episodes" holds a list of episodes, each of which is a 3D array with dimensions timesteps x agents x [row, col, orientation].
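A minimal sketch of that layout, assuming each episode is already a nested list of shape timesteps x agents x 3. The helper names `add_episodes` and `episode_shape` are hypothetical, and the `"grid"` entry in the example stands in for whatever fields the serialized env already contains. The resulting dict holds only dicts, lists and ints, so it can be handed to msgpack unchanged.

```python
from typing import Any, Dict, List, Tuple

# Layout: episodes -> episode -> timestep -> agent -> [row, col, orientation]
Episode = List[List[List[int]]]


def add_episodes(env_dict: Dict[str, Any], episodes: List[Episode]) -> Dict[str, Any]:
    """Return a copy of env_dict with an "episodes" key added.

    Each episode must be a rectangular timesteps x agents x 3 array of ints.
    """
    for ep in episodes:
        n_agents = len(ep[0]) if ep else 0
        for step in ep:
            if len(step) != n_agents:
                raise ValueError("every timestep must list the same number of agents")
            for entry in step:
                if len(entry) != 3 or not all(isinstance(v, int) for v in entry):
                    raise ValueError("each agent entry must be [row, col, orientation] ints")
    out = dict(env_dict)  # shallow copy: the caller's dict is left untouched
    out["episodes"] = [[[list(e) for e in step] for step in ep] for ep in episodes]
    return out


def episode_shape(ep: Episode) -> Tuple[int, int, int]:
    """(timesteps, agents, 3) for a rectangular episode."""
    return (len(ep), len(ep[0]) if ep else 0, 3)
```

With `ep = [[[0, 0, 1], [2, 3, 2]], [[0, 1, 1], [3, 3, 2]]]`, `add_episodes({"grid": [[0]]}, [ep])["episodes"]` is `[ep]` and `episode_shape(ep)` is `(2, 2, 3)`. Validating rectangularity up front keeps the JavaScript side free to index `episodes[e][t][a]` without bounds checks per timestep.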