# RLlib NetHackChallenge Benchmark

This is a baseline model for the NetHack Challenge, based on [RLlib](https://github.com/ray-project/ray#rllib-quick-start). It comes with all the code you need to train, run, and submit a model, and you can choose from a variety of algorithms implemented in RLlib. We provide default configurations and hyperparameters for 4 algorithms:

* IMPALA
* DQN
* PPO
* A2C

You're not restricted to these algorithms: others can be added with minimal effort in `train.py` and `util/loading.py`. This implementation runs many simultaneous environments with dynamic batching.

## Installation

To get this running, first create a virtual environment (for example with conda):

```bash
conda create -n nle-competition python=3.8
conda activate nle-competition
```

Then install the requirements at the root of this repository, both from `requirements.txt` and from `setup.py`:

```bash
pip install -r requirements.txt
pip install -e .
```

This installs the repository as a Python package in editable mode, meaning any changes you make to the code will be picked up.

## Running The Baseline

Once installed, run the following from the root of the repository:

```bash
python nethack_baselines/rllib/train.py
```

This trains the default algorithm (IMPALA) with default hyperparameters. You can choose a different algorithm as follows:

```bash
python nethack_baselines/rllib/train.py algo=ppo
```

You can also override other hyperparameters on the command line:

```bash
python nethack_baselines/rllib/train.py algo=ppo num_sgd_iter=5 total_steps=1000000
```

An important configuration is the number of CPUs and GPUs available, which can be set with `num_cpus` and `num_gpus`; the higher these numbers (especially CPUs), the faster training will be. This configuration can also be changed by adjusting `config.yaml`.

The output of training is written to an `outputs` directory at the root of the repository, with each run in its own date- and time-stamped folder.
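Since every option shown on the command line can equally be set in `config.yaml`, a sketch of the relevant entries might look like the following. The key names are inferred from the command-line overrides above; check the actual file for the full set of options and their defaults.

```yaml
# Hypothetical excerpt of config.yaml — key names taken from the
# command-line overrides documented above, defaults are illustrative.
algo: impala        # one of: impala, dqn, ppo, a2c
num_cpus: 8         # more CPUs -> more parallel environments -> faster training
num_gpus: 1
total_steps: 1000000
```

A command-line override such as `algo=ppo` takes precedence over the value in the file, so the YAML is a good place for machine-specific settings like `num_cpus` and `num_gpus` while you vary algorithm hyperparameters per run.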
## Making a submission

Once training is complete, model checkpoints will be available in `outputs//