diff --git a/README.md b/README.md
index 83333603b1037b86168991a4e367ded7e213ded9..025677acde96011426e5fd7cf47c0fb3eba9e6d0 100644
--- a/README.md
+++ b/README.md
@@ -1,59 +1,76 @@
-# Nethack Challenge - Starter Kit
+# **NeurIPS 2021 - The NetHack Challenge** - Getting started
+* **Challenge page** - https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge
+* **IMPORTANT - [Accept the rules before you submit](https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge/challenge_rules)**
+* **Join the discord server** - https://discord.gg/zkFWQmSWBA
+* Clone the starter kit to start competing - TODO Add final starter kit link
-👉 [Challenge page](https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge)
-
-
-💬 [Join the discord server](https://discord.gg/zkFWQmSWBA)
-
-
-This repository is the Nethack Challenge **Submission template and Starter kit**!
-
-Clone the repository to compete now!
-
-**This repository contains**:
+This repository is the Nethack Challenge **Submission template and Starter kit**! It contains:
 * **Documentation** on how to submit your models to the leaderboard
 * **The procedure** for best practices and information on how we evaluate your agent, etc.
-* **Starter code** for you to get started!
-
+* **Baselines** for you to get started with training easily
+<p style="text-align:center"><img style="text-align:center" src="https://raw.githubusercontent.com/facebookresearch/nle/master/dat/nle/example_run.gif"></p>
 # Table of Contents
 1. [Competition Procedure](#competition-procedure)
-2. [How to access and use dataset](#how-to-access-and-use-dataset)
-3. [How to start participating](#how-to-start-participating)
-4. [How do I specify my software runtime / dependencies?](#how-do-i-specify-my-software-runtime-dependencies-)
-5. [What should my code structure be like ?](#what-should-my-code-structure-be-like-)
-6. [How to make submission](#how-to-make-submission)
-7. [Other concepts](#other-concepts)
-8. [Important links](#-important-links)
-
-
-<p style="text-align:center"><img style="text-align:center" src="https://raw.githubusercontent.com/facebookresearch/nle/master/dat/nle/example_run.gif"></p>
 # Competition Procedure
 The NetHack Learning Environment (NLE) is a Reinforcement Learning environment presented at NeurIPS 2020. NLE is based on NetHack 3.6.6 and designed to provide a standard RL interface to the game, and comes with tasks that function as a first step to evaluate agents on this new environment. You can read more about NLE in the NeurIPS 2020 paper.
-
 We are excited that this competition offers machine learning students, researchers and NetHack-bot builders the opportunity to participate in a grand challenge in AI without prohibitive computational costs—and we are eagerly looking forward to the wide variety of submissions.
 **The following is a high level description of how this process works**
-
-
 1. **Sign up** to join the competition [on the AIcrowd website](https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge).
 2. **Clone** this repo and start developing your solution.
 3. **Train** your models on NLE and write rollout code in `rollout.py`.
 4. [**Submit**](#how-to-submit-a-model) your trained models to [AIcrowd Gitlab](https://gitlab.aicrowd.com) for evaluation [(full instructions below)](#how-to-submit-a-model). The automated evaluation setup will evaluate the submissions against the NLE environment for a fixed number of rollouts to compute and report the metrics on the leaderboard of the competition.
-# How to run the environment
+
+
+# Installation - Nethack Learning Environment
-Install the environment from the [original nethack repository](https://github.com/facebookresearch/nle)
+NLE requires `python>=3.5`, `cmake>=3.14` to be installed and available both when building the
+package, and at runtime.
+
+On **MacOS**, one can use `Homebrew` as follows:
+
+``` bash
+$ brew install cmake
+```
+
+On a plain **Ubuntu 18.04** distribution, `cmake` and other dependencies
+can be installed by doing:
+
+```bash
+# Python and most build deps
+$ sudo apt-get install -y build-essential autoconf libtool pkg-config \
+    python3-dev python3-pip python3-numpy git flex bison libbz2-dev
+
+# recent cmake version
+$ wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | sudo apt-key add -
+$ sudo apt-add-repository 'deb https://apt.kitware.com/ubuntu/ bionic main'
+$ sudo apt-get update && sudo apt-get --allow-unauthenticated install -y \
+    cmake \
+    kitware-archive-keyring
+```
+
+Afterwards, set up your Python environment. We advise using a conda
+environment for this:
+
+```bash
+$ conda create -n nle python=3.8
+$ conda activate nle
+$ pip install git+https://github.com/facebookresearch/nle.git@eric/competition --no-binary nle
+```
+
+Find more details in the [original NLE repository](https://github.com/facebookresearch/nle)
 # How to start participating
@@ -63,7 +80,7 @@ Install the environment from the [original nethack repository](https://github.co
 You can add your SSH Keys to your GitLab account by going to your profile settings [here](https://gitlab.aicrowd.com/profile/keys). If you do not have SSH Keys, you will first need to [generate one](https://docs.gitlab.com/ee/ssh/README.html#generating-a-new-ssh-key-pair).
-2. **Clone the repository**
+2. **Clone the repository** - TODO
     ```
     git clone git@github.com:AIcrowd/neurips-2021-nethack-starter-kit.git
@@ -71,20 +88,22 @@ You can add your SSH Keys to your GitLab account by going to your profile settin
 3. **Install** competition specific dependencies!
     ```
-    cd neurips-2021-nethack-starter-kit
-    pip install -r requirements.txt
+    pip install aicrowd-api
+    pip install aicrowd-gym
+
+    ## Install NLE according to the instructions above
     ```
 4. Try out random rollout script in `rollout.py`.
-## How do I specify my software runtime / dependencies ?
+## How do I specify my software runtime / dependencies ? - TODO
 We accept submissions with custom runtime, so you don't need to worry about which libraries or framework to pick from.
-The configuration files typically include `requirements.txt` (pypi packages), `environment.yml` (conda environment), `apt.txt` (apt packages) or even your own `Dockerfile`.
+The configuration files typically include `requirements.txt` (pypi packages), `apt.txt` (apt packages) or even your own `Dockerfile`.
-You can check detailed information about the same in the 👉 [RUNTIME.md](/docs/RUNTIME.md) file.
+You can find detailed information in the [RUNTIME.md](/docs/RUNTIME.md) file.
 ## What should my code structure be like ?
@@ -96,7 +115,7 @@ The different files and directories have following meaning:
 ├── aicrowd.json           # Submission meta information - like your username
 ├── apt.txt                # Packages to be installed inside docker image
 ├── requirements.txt       # Python packages to be installed
-├── rollout.py             # Your rollout code
+├── rollout.py             # Your rollout code - can use a batched agent
 ├── run.sh                 # Submission entrypoint
 └── utility                # The utility scripts to provide smoother experience to you.
     ├── docker_build.sh
@@ -130,7 +149,7 @@ The submission entrypoint is a bash script `run.sh`, you can call any arbitrary
 👉 [SUBMISSION.md](/docs/SUBMISSION.md)
-**Best of Luck** 🎉 🎉
+
 # Other Information
@@ -142,21 +161,23 @@ To be added.
 To be added.
-## Contributing
+## Contributing? - TODO
 To be added
-## Contributors
+## Contributors - TODO
-- [Shivam Khandelwal](https://www.aicrowd.com/participants/shivam)
 - [Jyotish Poonganam](https://www.aicrowd.com/participants/jyotish)
 - [Dipam chakraborty](https://www.aicrowd.com/participants/dipam)
+- [Shivam Khandelwal](https://www.aicrowd.com/participants/shivam)
-# 📎 Important links
+# 📎 Important links - TODO
 💪 Challenge Page: https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge
 🗣️ Discussion Forum: https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge/discussion
 🏆 Leaderboard: https://www.aicrowd.com/challenges/neurips-2021-nethack-challenge/leaderboards
+
+**Best of Luck** 🎉 🎉
\ No newline at end of file
diff --git a/agent.py b/agent.py
deleted file mode 100644
index 86d3a85a7efc6abc32d596c824b1d399cfa5fe69..0000000000000000000000000000000000000000
--- a/agent.py
+++ /dev/null
@@ -1,99 +0,0 @@
-import aicrowd_gym
-import nle
-import numpy as np
-from tqdm import trange
-from custom_wrappers import EarlyTerminationNethack
-
-from batched_env import BactchedEnv
-
-class BatchedAgent:
-    """
-    Simple Batched agent interface
-    Main motivation is to speedup runs by increasing gpu utilization
-    """
-    def __init__(self, num_envs):
-        """
-        Setup your model
-        Load your weights etc
-        """
-        self.num_envs = num_envs
-
-    def preprocess_observations(self, observations, rewards, dones, infos):
-        """
-        Add any preprocessing steps, for example rerodering/stacking for torch/tf in your model
-        """
-        pass
-
-    def batched_step(self):
-        """
-        Return a list of actions
-        """
-        pass
-
-class RandomBatchedAgent(BatchedAgent):
-    def __init__(self, num_envs, num_actions):
-        super().__init__(num_envs)
-        self.num_actions = num_actions
-        self.seeded_state = np.random.RandomState(42)
-
-    def preprocess_observations(self, observations, rewards, dones, infos):
-        return observations, rewards, dones, infos
-
-    def batched_step(self, observations, rewards, dones, infos):
-        rets = self.preprocess_observations(observations, rewards, dones, infos)
-        observations, rewards, dones, infos = rets
-        actions = self.seeded_state.randint(self.num_actions, size=self.num_envs)
-        return actions
-
-
-if __name__ == '__main__':
-
-    def nethack_make_fn():
-
-        # These settings will be fixed by the aicrowd evaluator
-        env = aicrowd_gym.make('NetHackChallenge-v0',
-                        observation_keys=("glyphs",
-                                          "chars",
-                                          "colors",
-                                          "specials",
-                                          "blstats",
-                                          "message",
-                                          "tty_chars",
-                                          "tty_colors",
-                                          "tty_cursor",))
-
-        # This wrapper will always be added on the aicrowd evaluator
-        env = EarlyTerminationNethack(env=env,
-                                      minimum_score=1000,
-                                      cutoff_timesteps=50000)
-
-        # Add any wrappers you need
-
-        return env
-
-
-    # Change the num_envs as you need, for example reduce if your GPU doesn't fit
-    # but increasing above 32 is not advisable for the Nethack Challenge 2021
-    num_envs = 16
-    batched_env = BactchedEnv(env_make_fn=nethack_make_fn, num_envs=num_envs)
-
-    # This part can be left as is
-    observations = batched_env.batch_reset()
-    rewards = [0.0 for _ in range(num_envs)]
-    dones = [False for _ in range(num_envs)]
-    infos = [{} for _ in range(num_envs)]
-
-    # Change this to your agent interface
-    num_actions = batched_env.envs[0].action_space.n
-    agent = RandomBatchedAgent(num_envs, num_actions)
-
-    # The evaluation setup will automatically stop after the requisite number of rollouts
-    # But you can change this if you want
-    for _ in trange(1000000000000):
-
-        # Ideally this part can be left unchanged
-        actions = agent.batched_step(observations, rewards, dones, infos)
-
-        observations, rewards, dones, infos = batched_env.batch_step(actions)
-        for done_idx in np.where(dones)[0]:
-            observations[done_idx] = batched_env.single_env_reset(done_idx)
diff --git a/local_evaluation.py b/local_evaluation.py
new file mode 100644
index 0000000000000000000000000000000000000000..1ae846db360bd8784da333f143d13b8b903e0636
--- /dev/null
+++ b/local_evaluation.py
@@ -0,0 +1,37 @@
+## This file is intended to emulate the evaluation on AIcrowd
+
+# IMPORTANT - Differences to expect
+# * Not all of the environment's functions are available
+# * The run might be slower than your local run
+# * Resources might vary from your local machine
+
+from submission_agent import SubmissionConfig, LocalEvaluationConfig
+
+from rollout import run_batched_rollout
+from nethack_baselines.utils.batched_env import BactchedEnv
+
+
+# Ideally you shouldn't need to change anything below
+def add_evaluation_wrappers_fn(env_make_fn):
+    max_episodes = LocalEvaluationConfig.LOCAL_EVALUATION_NUM_EPISODES
+    # TODO: use LOCAL_EVALUATION_NUM_EPISODES for limiting episodes
+    return env_make_fn
+
+def evaluate():
+    submission_env_make_fn = SubmissionConfig.submission_env_make_fn
+    num_envs = SubmissionConfig.NUM_PARALLEL_ENVIRONMENTS
+    Agent = SubmissionConfig.Submision_Agent
+
+    evaluation_env_fn = add_evaluation_wrappers_fn(submission_env_make_fn)
+    batched_env = BactchedEnv(env_make_fn=evaluation_env_fn,
+                              num_envs=num_envs)
+
+    num_envs = batched_env.num_envs
+    num_actions = batched_env.num_actions
+
+    agent = Agent(num_envs, num_actions)
+
+    run_batched_rollout(batched_env, agent)
+
+if __name__ == '__main__':
+    evaluate()
diff --git a/nethack_baselines/other_examples/random_rollouts.py b/nethack_baselines/other_examples/random_rollouts.py
new file mode 100644
index 0000000000000000000000000000000000000000..1e1ca7f1b0f1ccda8c7f3408550b1ed0803c6893
--- /dev/null
+++ b/nethack_baselines/other_examples/random_rollouts.py
@@ -0,0 +1,27 @@
+# This is intended as an example of a barebones submission
+# Note that not using BatchedEnv may not meet the timeout requirement.
+
+import aicrowd_gym
+import nle
+
+def main():
+    """
+    This function will be called for the training phase.
+    """
+
+    # This allows us to limit the features of the environment
+    # that we don't want participants to use during the submission
+    env = aicrowd_gym.make("NetHackChallenge-v0")
+
+    env.reset()
+    done = False
+    episode_count = 0
+    while episode_count < 200:
+        _, _, done, _ = env.step(env.action_space.sample())
+        if done:
+            episode_count += 1
+            print(episode_count)
+            env.reset()
+
+if __name__ == "__main__":
+    main()
diff --git a/nethack_baselines/random_submission_agent.py b/nethack_baselines/random_submission_agent.py
new file mode 100644
index 0000000000000000000000000000000000000000..f215651f8acdc1daef462281065d8f99062d6c45
--- /dev/null
+++ b/nethack_baselines/random_submission_agent.py
@@ -0,0 +1,21 @@
+import numpy as np
+
+from nethack_baselines.utils.batched_agent import BatchedAgent
+
+class RandomAgent(BatchedAgent):
+    def __init__(self, num_envs, num_actions):
+        super().__init__(num_envs, num_actions)
+        self.seeded_state = np.random.RandomState(42)
+
+    def preprocess_observations(self, observations, rewards, dones, infos):
+        return observations, rewards, dones, infos
+
+    def postprocess_actions(self, actions):
+        return actions
+
+    def batched_step(self, observations, rewards, dones, infos):
+        rets = self.preprocess_observations(observations, rewards, dones, infos)
+        observations, rewards, dones, infos = rets
+        actions = self.seeded_state.randint(self.num_actions, size=self.num_envs)
+        actions = self.postprocess_actions(actions)
+        return actions
\ No newline at end of file
diff --git a/nethack_baselines/rllib_submission_agent.py b/nethack_baselines/rllib_submission_agent.py
new file mode 100644
index 0000000000000000000000000000000000000000..48cdce85287243a96a9e7d47855104acbcb79837
--- /dev/null
+++ b/nethack_baselines/rllib_submission_agent.py
@@ -0,0 +1 @@
+placeholder
diff --git a/nethack_baselines/torchbeast_submission_agent.py b/nethack_baselines/torchbeast_submission_agent.py
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/nethack_baselines/utils/batched_agent.py b/nethack_baselines/utils/batched_agent.py
new file mode 100644
index 0000000000000000000000000000000000000000..71089572bec7f5a4fa4548264a932e39581d503c
--- /dev/null
+++ b/nethack_baselines/utils/batched_agent.py
@@ -0,0 +1,30 @@
+class BatchedAgent:
+    """
+    Simple batched agent interface
+    Main motivation is to speed up runs by increasing GPU utilization
+    """
+    def __init__(self, num_envs, num_actions):
+        """
+        Set up your model
+        Load your weights etc
+        """
+        self.num_envs = num_envs
+        self.num_actions = num_actions
+
+    def preprocess_observations(self, observations, rewards, dones, infos):
+        """
+        Add any preprocessing steps, for example reordering/stacking for torch/tf in your model
+        """
+        pass
+
+    def postprocess_actions(self, actions):
+        """
+        Add any postprocessing steps, for example converting to lists
+        """
+        pass
+
+    def batched_step(self, observations, rewards, dones, infos):
+        """
+        Return a list of actions, one per environment
+        """
+        pass
diff --git a/batched_env.py b/nethack_baselines/utils/batched_env.py
similarity index 97%
rename from batched_env.py
rename to nethack_baselines/utils/batched_env.py
index a12c0fc4869f702448685882da274d7a490c4b6b..cff66a6286851dbb592f97d56096c72c19147057 100644
--- a/batched_env.py
+++ b/nethack_baselines/utils/batched_env.py
@@ -1,5 +1,4 @@
 import gym
-import nle
 import numpy as np
 from tqdm import trange
 from collections.abc import Iterable
@@ -11,6 +10,7 @@ class BactchedEnv:
         """
         self.num_envs = num_envs
         self.envs = [env_make_fn() for _ in range(self.num_envs)]
+        self.num_actions = self.envs[0].action_space.n
         # TODO: Can have different settings for each env? Probably not needed for Nethack
 
     def batch_step(self, actions):
@@ -67,7 +67,7 @@ if __name__ == '__main__':
                                               "tty_colors",
                                               "tty_cursor",))
 
-    num_envs = 16
+    num_envs = 4
     batched_env = BactchedEnv(env_make_fn=nethack_make_fn, num_envs=num_envs)
     observations = batched_env.batch_reset()
     num_actions = batched_env.envs[0].action_space.n
diff --git a/custom_wrappers.py b/nethack_baselines/utils/evaluation_utils/custom_wrappers.py
similarity index 100%
rename from custom_wrappers.py
rename to nethack_baselines/utils/evaluation_utils/custom_wrappers.py
diff --git a/nethack_baselines/utils/nethack_env_creation.py b/nethack_baselines/utils/nethack_env_creation.py
new file mode 100644
index 0000000000000000000000000000000000000000..893f63ca0fd7c029c1b015e2c5751bae61acad2c
--- /dev/null
+++ b/nethack_baselines/utils/nethack_env_creation.py
@@ -0,0 +1,19 @@
+import nle
+
+# For your local evaluation, aicrowd_gym is completely identical to gym
+import aicrowd_gym
+
+def nethack_make_fn():
+    # These settings will be fixed by the AIcrowd evaluator
+    # This allows us to limit the features of the environment
+    # that we don't want participants to use during the submission
+    return aicrowd_gym.make('NetHackChallenge-v0',
+                            observation_keys=("glyphs",
+                                              "chars",
+                                              "colors",
+                                              "specials",
+                                              "blstats",
+                                              "message",
+                                              "tty_chars",
+                                              "tty_colors",
+                                              "tty_cursor",))
\ No newline at end of file
diff --git a/rollout.py b/rollout.py
index 69146af9ad2d4005f1a4dd4733b0f12fba962b77..a267eb0e0d15e6a2aea4c91bd76e7940e74cea2f 100644
--- a/rollout.py
+++ b/rollout.py
@@ -1,28 +1,51 @@
 #!/usr/bin/env python
-# This file is the entrypoint for your submission
-# You can modify this file to include your code or directly call your functions/modules from here.
 
-import aicrowd_gym
-import nle
+############################################################
+## Ideally you shouldn't need to change this file at all ##
+############################################################
 
-def main():
+import numpy as np
+
+from nethack_baselines.utils.batched_env import BactchedEnv
+from submission_agent import SubmissionConfig
+
+def run_batched_rollout(batched_env, agent):
     """
-    This function will be called for training phase.
+    Runs the batched rollout loop for the given agent and batched environment
     """
 
-    # This allows us to limit the features of the environment
-    # that we don't want participants to use during the submission
-    env = aicrowd_gym.make("NetHackChallenge-v0")
+    num_envs = batched_env.num_envs
+
+    # This part can be left as is
+    observations = batched_env.batch_reset()
+    rewards = [0.0 for _ in range(num_envs)]
+    dones = [False for _ in range(num_envs)]
+    infos = [{} for _ in range(num_envs)]
 
-    env.reset()
-    done = False
     episode_count = 0
-    while episode_count < 20:
-        _, _, done, _ = env.step(env.action_space.sample())
-        if done:
+
+    # The evaluator will automatically stop after the required number of episodes for the development/test phase
+    while episode_count < 10000:
+        actions = agent.batched_step(observations, rewards, dones, infos)
+
+        observations, rewards, dones, infos = batched_env.batch_step(actions)
+        for done_idx in np.where(dones)[0]:
+            observations[done_idx] = batched_env.single_env_reset(done_idx)
             episode_count += 1
-            print(episode_count)
-            env.reset()
+            print("Episodes Completed :", episode_count)
 
 if __name__ == "__main__":
-    main()
+
+    submission_env_make_fn = SubmissionConfig.submission_env_make_fn
+    NUM_PARALLEL_ENVIRONMENTS = SubmissionConfig.NUM_PARALLEL_ENVIRONMENTS
+    Agent = SubmissionConfig.Submision_Agent
+
+    batched_env = BactchedEnv(env_make_fn=submission_env_make_fn,
+                              num_envs=NUM_PARALLEL_ENVIRONMENTS)
+
+    num_envs = batched_env.num_envs
+    num_actions = batched_env.num_actions
+
+    agent = Agent(num_envs, num_actions)
+
+    run_batched_rollout(batched_env, agent)
diff --git a/run.sh b/run.sh
index 8d5ce5024b87c19ce4ebffe3c37cb94150c62d70..76852fd5ce2094e13de8cae65ec63d98bb365477 100755
--- a/run.sh
+++ b/run.sh
@@ -1,4 +1,4 @@
 #!/bin/bash
-python agent.py
+python rollout.py
diff --git a/submission_agent.py b/submission_agent.py
new file mode 100644
index 0000000000000000000000000000000000000000..76a7ea9af4019ce1ed8a32fe0a4b13a37c3efb48
--- /dev/null
+++ b/submission_agent.py
@@ -0,0 +1,40 @@
+from nethack_baselines.random_submission_agent import RandomAgent
+# from nethack_baselines.torchbeast_submission_agent import TorchBeastAgent
+# from nethack_baselines.rllib_submission_agent import RLlibAgent
+
+from wrappers import addtimelimitwrapper_fn
+
+################################################
+# Import your own agent code                   #
+# Set Submision_Agent to your agent            #
+# Set NUM_PARALLEL_ENVIRONMENTS as needed      #
+# Set submission_env_make_fn to your wrappers  #
+# Test with local_evaluation.py                #
+################################################
+
+
+class SubmissionConfig:
+    ## Add your own agent class
+    # Submision_Agent = TorchBeastAgent
+    # Submision_Agent = RLlibAgent
+    Submision_Agent = RandomAgent
+
+
+    ## Change the NUM_PARALLEL_ENVIRONMENTS as you need
+    ## for example, reduce it if your model doesn't fit in GPU memory
+    ## Increasing above 32 is not advisable for the Nethack Challenge 2021
+    NUM_PARALLEL_ENVIRONMENTS = 16
+
+
+    ## Add a function that creates your nethack env
+    ## Mainly this is to add wrappers
+    ## Add your wrappers to wrappers.py and change the name here
+    ## IMPORTANT: Don't "call" the function, only provide the name
+    submission_env_make_fn = addtimelimitwrapper_fn
+
+
+class LocalEvaluationConfig:
+    # Change this to locally check a different number of rollouts
+    # The AIcrowd submission evaluator will not use this
+    # It is only for your local evaluation
+    LOCAL_EVALUATION_NUM_EPISODES = 50
diff --git a/wrappers.py b/wrappers.py
new file mode 100644
index 0000000000000000000000000000000000000000..a89fe5ee99e0662942b221db350b3ea39778d130
--- /dev/null
+++ b/wrappers.py
@@ -0,0 +1,12 @@
+from gym.wrappers import TimeLimit
+
+from nethack_baselines.utils.nethack_env_creation import nethack_make_fn
+
+def addtimelimitwrapper_fn():
+    """
+    An example of how to add wrappers to the nethack_make_fn
+    Should return a gym env which wraps the nethack gym env
+    """
+    env = nethack_make_fn()
+    env = TimeLimit(env, max_episode_steps=100_000_000)
+    return env
\ No newline at end of file
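For reference when filling in `SubmissionConfig.Submision_Agent`: `run_batched_rollout` calls `agent.batched_step(observations, rewards, dones, infos)` once per step with one entry per environment, and expects one action per environment back. The sketch below is a minimal, hypothetical agent built on that contract. It uses a local stand-in for `BatchedAgent` (with the `preprocess_observations` / `postprocess_actions` hooks that `RandomAgent` overrides) so it runs without NLE or NumPy; `RepeatOnRewardAgent` and its repeat-the-rewarded-action rule are illustrative only and are not one of this PR's baselines.

```python
import random
from typing import List, Sequence


class BatchedAgent:
    """Local stand-in for nethack_baselines.utils.batched_agent.BatchedAgent (assumed interface)."""

    def __init__(self, num_envs: int, num_actions: int):
        self.num_envs = num_envs
        self.num_actions = num_actions

    def preprocess_observations(self, observations, rewards, dones, infos):
        return observations, rewards, dones, infos

    def postprocess_actions(self, actions):
        return actions

    def batched_step(self, observations, rewards, dones, infos):
        raise NotImplementedError


class RepeatOnRewardAgent(BatchedAgent):
    """Hypothetical agent: in each env, repeat the previous action if it was
    rewarded (and the episode did not just end), otherwise sample a new one."""

    def __init__(self, num_envs: int, num_actions: int, seed: int = 42):
        super().__init__(num_envs, num_actions)
        self.rng = random.Random(seed)  # seeded, in the spirit of RandomAgent's RandomState(42)
        self.last_actions = [0] * num_envs

    def preprocess_observations(self, observations, rewards, dones, infos):
        # Zero the reward of any env whose episode just ended, so the new
        # episode does not inherit the previous episode's last action.
        rewards = [0.0 if done else float(r) for r, done in zip(rewards, dones)]
        return observations, rewards, dones, infos

    def postprocess_actions(self, actions: Sequence[int]) -> List[int]:
        # One plain Python int per environment.
        return [int(a) for a in actions]

    def batched_step(self, observations, rewards, dones, infos):
        observations, rewards, dones, infos = self.preprocess_observations(
            observations, rewards, dones, infos)
        actions = [
            self.last_actions[i] if rewards[i] > 0
            else self.rng.randrange(self.num_actions)
            for i in range(self.num_envs)
        ]
        self.last_actions = actions
        return self.postprocess_actions(actions)


if __name__ == "__main__":
    agent = RepeatOnRewardAgent(num_envs=4, num_actions=8)
    first = agent.batched_step([{}] * 4, [0.0] * 4, [False] * 4, [{}] * 4)
    second = agent.batched_step([{}] * 4, [1.0, 0.0, 1.0, 0.0],
                                [False, False, True, False], [{}] * 4)
    print(first, second)  # second[0] repeats first[0]; env 2 finished, so it resamples
```

Keeping all per-environment logic inside `batched_step` means the same class works for any `NUM_PARALLEL_ENVIRONMENTS`. A neural-network agent would typically replace the list comprehension with a single batched forward pass over all environments, which is the GPU-utilization motivation stated in the `BatchedAgent` docstring.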
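`local_evaluation.py` reads `LOCAL_EVALUATION_NUM_EPISODES` but only leaves a TODO for actually enforcing it, while `run_batched_rollout` counts completed episodes against a hard-coded 10000. One possible way to honor the local limit is to pass it into the loop itself. This is a sketch under stated assumptions: `run_limited_rollout`, `CountdownBatchedEnv` and `ZeroAgent` are hypothetical names, and the stub env only mimics the `num_envs`, `num_actions`, `batch_reset`, `batch_step` and `single_env_reset` members that `BactchedEnv` uses in this diff, so the snippet runs without NLE installed.

```python
from typing import Dict, List, Sequence


class CountdownBatchedEnv:
    """Hypothetical stand-in for BactchedEnv: env i ends its episode every
    (i + 2) steps and pays reward 1.0 per step. Only the members the
    rollout loop touches are implemented."""

    def __init__(self, num_envs: int, num_actions: int = 4):
        self.num_envs = num_envs
        self.num_actions = num_actions
        self._steps = [0] * num_envs

    def single_env_reset(self, idx: int) -> Dict[str, int]:
        self._steps[idx] = 0
        return {"step": 0}

    def batch_reset(self) -> List[Dict[str, int]]:
        return [self.single_env_reset(i) for i in range(self.num_envs)]

    def batch_step(self, actions: Sequence[int]):
        observations, rewards, dones, infos = [], [], [], []
        for i, _action in enumerate(actions):
            self._steps[i] += 1
            observations.append({"step": self._steps[i]})
            rewards.append(1.0)
            dones.append(self._steps[i] >= i + 2)
            infos.append({})
        return observations, rewards, dones, infos


class ZeroAgent:
    """Hypothetical agent that always picks action 0."""

    def __init__(self, num_envs: int, num_actions: int):
        self.num_envs = num_envs

    def batched_step(self, observations, rewards, dones, infos) -> List[int]:
        return [0] * self.num_envs


def run_limited_rollout(batched_env, agent, max_episodes: int) -> List[float]:
    """Same loop shape as run_batched_rollout, but stops once max_episodes
    episodes have finished and returns each finished episode's total reward."""
    num_envs = batched_env.num_envs
    observations = batched_env.batch_reset()
    rewards = [0.0] * num_envs
    dones = [False] * num_envs
    infos = [{} for _ in range(num_envs)]

    episode_returns: List[float] = []
    running = [0.0] * num_envs  # reward accumulated in each env's current episode
    while len(episode_returns) < max_episodes:
        actions = agent.batched_step(observations, rewards, dones, infos)
        observations, rewards, dones, infos = batched_env.batch_step(actions)
        for i in range(num_envs):
            running[i] += rewards[i]
            if dones[i]:
                episode_returns.append(running[i])
                running[i] = 0.0
                observations[i] = batched_env.single_env_reset(i)
                if len(episode_returns) == max_episodes:
                    break  # stop mid-batch once the limit is reached
    return episode_returns


if __name__ == "__main__":
    env = CountdownBatchedEnv(num_envs=3)
    print(run_limited_rollout(env, ZeroAgent(3, env.num_actions), max_episodes=5))
    # prints [2.0, 3.0, 2.0, 4.0, 2.0]
```

With the real classes, the analogous call would be `run_limited_rollout(batched_env, agent, LocalEvaluationConfig.LOCAL_EVALUATION_NUM_EPISODES)`. Counting in the loop, rather than in an env wrapper as `add_evaluation_wrappers_fn` hints at, avoids having to share one counter across the separate environment copies that `BactchedEnv` creates.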