Group scores in the evaluation service by test group (#321)

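At the moment, the evaluation service simply takes the mean of the rewards across all episodes. We want to compute the final score by grouping the envs by test group instead: first take the mean score within each test group, then take the mean of those per-group means. The final score is therefore a mean of means, so every test group carries equal weight regardless of how many episodes it contains.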
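The difference between the two aggregations can be illustrated with a minimal sketch. This is not the evaluation service's actual code: the `(test_id, score)` tuple layout, the function names, and the sample values are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def overall_mean(episodes):
    """Current behaviour as described: one flat mean over all episodes."""
    return mean(score for _, score in episodes)


def grouped_mean(episodes):
    """Proposed behaviour: mean of the per-test-group means."""
    # Hypothetical input layout: each episode is a (test_id, score) pair.
    by_group = defaultdict(list)
    for test_id, score in episodes:
        by_group[test_id].append(score)
    # Average inside each group first, then average the group means.
    return mean(mean(scores) for scores in by_group.values())


# Made-up sample: three episodes in one test group, one in another.
episodes = [
    ("Test_0", 1.0),
    ("Test_0", 0.8),
    ("Test_0", 0.9),
    ("Test_1", 0.2),
]

flat = overall_mean(episodes)      # roughly 0.725, dominated by Test_0
grouped = grouped_mean(episodes)   # roughly 0.55, both groups weigh the same
print(flat, grouped)
```

With unevenly sized groups the two numbers diverge: the flat mean favours the group with more episodes, while the grouped mean treats each test group equally.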