Evaluator service timeout: set `done["__all__"]` and return a reward of -1

When the evaluator service times out, it should not terminate the session. Instead, it should end the current episode by setting `done["__all__"]` to `True` and returning a reward of -1.
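A minimal sketch of the requested timeout response is below. It assumes the evaluator builds per-agent `rewards` and `done` dicts keyed by agent handle, plus the `"__all__"` key, similar to a Flatland `RailEnv.step` result. It also assumes the -1 penalty is applied to each agent. The function name `timeout_step_response` and the constant `TIMEOUT_REWARD` are hypothetical and not part of the existing evaluator.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical constant: the -1 penalty requested in this issue.
TIMEOUT_REWARD = -1.0


def timeout_step_response(
    agent_handles: List[int],
) -> Tuple[Dict[int, float], Dict[Union[int, str], bool]]:
    """Build rewards and done flags for a step that hit the evaluator timeout.

    Assumption: the penalty is given to every agent. Every agent is also
    marked done, and done["__all__"] is set so that the client moves on
    to the next episode and the whole session is not killed.
    """
    rewards = {handle: TIMEOUT_REWARD for handle in agent_handles}
    done: Dict[Union[int, str], bool] = {handle: True for handle in agent_handles}
    done["__all__"] = True
    return rewards, done
```

In the timeout handler, the evaluator would send this response to the client in place of raising and closing the session. It would then continue with the next episode as usual.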

If possible, also merge in the per-episode action-saving code from evaluator2 in neurips2020_flatland_baselines.

Questions: