Changes for pure, PPO, and imitation: converted to use a custom train fn

1 job for rllib-IL in 13 minutes and 1 second (queued for 52 seconds)