Create Proper Baseline
What: Create a basic learning agent that stays frozen, or at least mostly frozen, and serves as the reference against which later changes are compared. In particular, the architecture shown in Figure 3 of https://arxiv.org/pdf/2006.13760.pdf should be runnable in RLlib.
Why: A fixed reference point means that any claimed improvement can be shown to actually outperform the baseline, rather than merely being asserted.
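
One way to keep the baseline frozen and make comparisons repeatable is to pin its settings in an immutable record and compare per-seed returns against it. The sketch below is only an illustration under assumptions: `BaselineSpec`, its fields, the environment id `ExampleEnv-v0`, and the one-standard-deviation rule in `beats_baseline` are all hypothetical and are not taken from the paper or from RLlib.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class BaselineSpec:
    """Hypothetical record of the settings that define the frozen baseline."""
    name: str
    env_id: str        # placeholder environment id, not a real registration
    seeds: tuple       # seeds every comparison run must reuse
    train_steps: int   # training budget shared by baseline and candidates


def summarize(returns_per_seed):
    """Return (mean, sample standard deviation) of per-seed mean episode returns."""
    return mean(returns_per_seed), stdev(returns_per_seed)


def beats_baseline(baseline_returns, candidate_returns):
    """Crude placeholder rule, not a statistical test: the candidate's mean
    must exceed the baseline's mean by more than one baseline standard deviation."""
    base_mean, base_std = summarize(baseline_returns)
    cand_mean, _ = summarize(candidate_returns)
    return cand_mean > base_mean + base_std


# Example with made-up numbers, one mean return per seed.
spec = BaselineSpec(name="baseline", env_id="ExampleEnv-v0",
                    seeds=(0, 1, 2), train_steps=1000)
print(beats_baseline([10.0, 12.0, 11.0], [20.0, 21.0, 19.0]))
```

Because the dataclass is declared with `frozen=True`, accidental edits to the baseline settings raise an error instead of silently changing what candidates are compared against. A real comparison would likely need more seeds and a proper significance test.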