Commit ae1d25d2 authored by roberta_raileanu

Fix some typos and links.

parent 78a7ed3e
%% Cell type:markdown id:fdc6fcd4 tags:
# What is NetHack?
NetHack is a [roguelike](https://en.wikipedia.org/wiki/Roguelike) computer game that was first introduced in the late 1980s. At the beginning of the game your hero is placed in a dungeon, with the goal of descending to the bottom of over 50 procedurally generated levels to retrieve the Amulet of Yendor. Once it is obtained, your hero must escape the dungeon, unlocking five extremely challenging final levels, before offering the Amulet to your in-game deity.
A key feature of NetHack is that it is *visually* simple, with observations made up solely of ASCII characters, yet it is complex in almost every other way!
There are several reasons why it is particularly challenging:
1) The game is randomized, with everything from the map layouts to the outcomes of actions decided by the roll of a die.
2) Unlike most modern games, you cannot reload a saved game: when you die, you start again from scratch. Given the game's randomness (see above), this makes it especially "unforgiving" (as described on the wiki). Indeed, deaths are so common that there is even an acronym: YASD, which stands for Yet Another Stupid Death.
3) It is incredibly complex, with hundreds of different characters to observe and many more potential sequences of actions.
Thus, unlike other games played by AI agents, NetHack cannot be solved by the average human in just a few hours of gameplay. Instead, expert players often take many years to solve it, assuming they are able to at all!
> The player character can be any one of the following roles: archeologist, barbarian, cave[wo]man, healer, knight, monk, priest[ess], ranger, rogue, samurai, tourist, valkyrie, or wizard. They each have varying difficulties, strengths, weaknesses, quests and starting items.
>
> The player can also choose from the five races: human, elf, dwarf, gnome, or orc, and the three alignments: lawful, neutral or chaotic. The available races and alignments are dependent on the role one picks.
Each starting combination will alter the game experience, and thus impact the difficulty of the game and the most suitable strategy. For example, wizards start with magic and magical items, while rangers begin with a bow and arrow; elves are generally intelligent whereas dwarves are strong!
It's worth noting that these different starting characters can really affect the performance of agents learning to play the game. In the original NLE paper, agents on the Score task (the most similar to the NetHack Challenge) averaged 738 for monk, 538 for valkyrie and 314 for wizard, but only 11 for tourist! For the NetHack Challenge, the character is randomized during evaluation, so it is likely wise to develop agents that perform well across a variety of hero configurations.
%% Cell type:markdown id:1e8bc401 tags:
* `.` : Dungeon Floor
* `<` and `>` : Stairs up and down
* `|` and `-` : Walls
* `+` : Doors
It is also common to see Fountains: `{`, Traps: `^`, Altars: `_` and Hallways: `#`.
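As a small illustration, the symbols listed above can be kept in a lookup table and counted over a 2D grid of character codes. The sketch below assumes a NumPy `uint8` array like the `chars` observation described later. `DUNGEON_FEATURES` and `count_features` are hypothetical names, not part of NLE.

```python
from collections import Counter

import numpy as np

# Hypothetical lookup table built from the symbols listed above.
DUNGEON_FEATURES = {
    ".": "floor",
    "<": "staircase up",
    ">": "staircase down",
    "|": "wall",
    "-": "wall",
    "+": "door",
    "{": "fountain",
    "^": "trap",
    "_": "altar",
    "#": "hallway",
}


def count_features(chars: np.ndarray) -> Counter:
    """Count known dungeon features in a 2D array of character codes."""
    counts = Counter()
    for code in chars.ravel():
        name = DUNGEON_FEATURES.get(chr(int(code)))
        if name is not None:
            counts[name] += 1
    return counts


# A tiny hand-made room: walls, two floor squares, stairs down and a door.
rows = ["-----", "|.>.|", "--+--"]
grid = np.array([[ord(c) for c in row] for row in rows], dtype=np.uint8)
print(count_features(grid))
```

NetHack reuses some symbols: `+` is also drawn for spellbooks, and `#` for sinks and trees. A table keyed on characters is therefore only approximate, and the `glyphs` observation is the unambiguous one.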
#### Items
NetHack has a [vast number of items](https://nethackwiki.com/wiki/Item) for in-game use, and many objects can be picked up and included in the inventory. Once included, the agent can choose to use them in a number of different ways - often with some imaginative consequences: you can `apply` a towel to a weapon to clean off grease, but you can `wear` it too (it will wrap around your head)!
Heroes will need to use items as well as possible to navigate the dungeons, not least in finding fresh food to eat (unless they can find a [different way](https://nethackwiki.com/wiki/Prayer) to stave off hunger).
%% Cell type:markdown id:143f1ca8 tags:
%% Cell type:markdown id:b449cc83 tags:
#### Taking Actions
To make this vast array of complex skills achievable, NetHack has a large action space (referred to as `commands`). The game takes inputs that correspond directly to keys on the keyboard, including modifiers such as Ctrl, Shift and Meta. The [full list of commands](https://nethackwiki.com/wiki/Commands) is extensive, including both in-game actions and meta-commands such as help or viewing the inventory.
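Because commands are keystrokes, each one can be encoded as a single byte. The sketch below uses the bit conventions NetHack's command handling follows: Ctrl keeps only the low five bits (`& 0x1f`), and Meta sets the high bit (`| 0x80`). The helper names `ctrl` and `meta` are our own. NLE defines its own action enumeration, which is what you would use in practice.

```python
def ctrl(key: str) -> int:
    """Ctrl+key: keep only the low five bits of the ASCII code."""
    return ord(key) & 0x1F


def meta(key: str) -> int:
    """Meta+key (often Alt): set the high bit of the ASCII code."""
    return ord(key) | 0x80


# A few NetHack commands expressed as byte values.
commands = {
    "search": ord("s"),  # plain key
    "kick": ctrl("d"),   # Ctrl-D
    "pray": meta("p"),   # Meta-p, a shortcut for the extended command #pray
}
for name, code in commands.items():
    print(f"{name:>6}: {code}")
```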
For the NetHack Challenge we provide an action space that is as close to the full set of commands as possible, blocking only a few commands such as those that modify option settings. This should provide a significant challenge to all AI agents, while also giving them the potential to fully master the game. It may be worthwhile to constrain this action space with some inductive bias, possibly even using a curriculum of [increasing action spaces](http://proceedings.mlr.press/v119/farquhar20a.html).
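One simple way to impose such a bias is to mask the policy's logits so that only a growing subset of actions can be chosen. The NumPy sketch below illustrates plain action masking only; it is not the method of the linked paper. The stage schedule `(8, 24)` and the 32-action space are made-up values.

```python
import numpy as np


def masked_softmax(logits: np.ndarray, allowed: np.ndarray) -> np.ndarray:
    """Softmax over logits, giving zero probability to disallowed actions."""
    masked = np.where(allowed, logits, -np.inf)
    masked = masked - masked.max()  # subtract the max for numerical stability
    exp = np.exp(masked)            # exp(-inf) == 0 for disallowed actions
    return exp / exp.sum()


def curriculum_mask(num_actions: int, stage: int, schedule=(8, 24)) -> np.ndarray:
    """Allow the first schedule[stage] actions; later stages unlock everything."""
    limit = schedule[stage] if stage < len(schedule) else num_actions
    return np.arange(num_actions) < limit


num_actions = 32  # illustrative only; the real action space is larger
logits = np.zeros(num_actions)
for stage in range(3):
    probs = masked_softmax(logits, curriculum_mask(num_actions, stage))
    print(stage, int((probs > 0).sum()))
```

Which actions to unlock first is a design choice. Movement keys are a natural starting subset, but ordering them by index as done here is purely for illustration.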
%% Cell type:markdown id:85bfc579 tags:
#### Structure of the NetHack world
The collective name for all levels of the game is the "Mazes of Menace". Your hero starts in the Dungeons of Doom, which lie above the underworld Gehennom and below the five Planes that form the final stages of the game.
The Dungeons also contain various branches, the locations of which are often randomized. For example, the Gnomish Mines will always be generated between dungeon levels 2 and 4. There is also a Sokoban branch, located between levels 2 and 9. In order to reach the Amulet (and win the game), adventurers must complete the Quest, another branch, the location of which varies depending on the role.
This is just a brief foray into the details of the game. For more detail on the Mazes of Menace see the [nethackwiki page](https://nethackwiki.com/wiki/Mazes_of_Menace).
%% Cell type:markdown id:bc29bf70 tags:
# What is the NetHack Learning Environment (NLE)?
### `NetHackChallenge-v0`
The NLE contains several NetHack-based tasks for agent training, but a new environment has been created especially for the competition: `NetHackChallenge-v0`. It is based on the `NetHackScore-v0` task used in the NeurIPS paper, but contains some key modifications to bring out the full experience of NetHack:
* The action space of the environment is greatly expanded to allow all keys on the keyboard
* Menus, yes/no questions, cursor-movement, and text-input modalities are enabled.
* A random character (represented as `'@'`) instead of a single default (e.g. `'mon-hum-neu-mal'`)
This makes the game particularly challenging, while also providing additional opportunity for savvy agents!
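The default character string uses NetHack's role-race-alignment-gender abbreviations, so `'mon-hum-neu-mal'` is a neutral male human monk. The hypothetical helper below is not part of NLE. It splits a full four-part string and treats `'@'` as "random", as in `NetHackChallenge-v0`.

```python
from typing import Dict, Optional

FIELDS = ("role", "race", "alignment", "gender")


def parse_character(spec: str) -> Optional[Dict[str, str]]:
    """Split e.g. 'mon-hum-neu-mal' into its parts; '@' (random) returns None."""
    if spec == "@":
        return None
    parts = spec.split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected role-race-alignment-gender, got {spec!r}")
    return dict(zip(FIELDS, parts))


print(parse_character("mon-hum-neu-mal"))
print(parse_character("@"))
```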
NLE is loaded as a gym environment, with all the typical functions that reinforcement learning (RL) researchers will be familiar with. For those using a symbolic approach, the interaction loop typically looks like this:
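``` python
import gym
import nle  # importing nle registers the NetHack environments with gym

env = gym.make("NetHackChallenge-v0")
obs = env.reset()  # initial observation
done = False
total_reward = 0
while not done:
    action = agent.act(obs)  # agent (your own policy) processes observation and computes an action
    obs, reward, done, info = env.step(action)  # updates the new observation and provides the reward/done
    total_reward += reward  # keep track of cumulative reward
```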
When the episode is over (very likely a YASD), `total_reward` will be the agent's score. This is used to train RL agents and to gauge the current performance of symbolic agents.
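The interaction loop above can be wrapped into a reusable function that averages the score over several episodes. This sketch assumes the four-value `step` API used in that loop. `run_episode`, `average_score` and the `agent.act` interface are placeholders for your own code. The tiny stub environment exists only so the example runs without NLE installed.

```python
from typing import Any, Tuple


def run_episode(env, agent) -> float:
    """Play one episode and return the cumulative reward (the score)."""
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


def average_score(env, agent, episodes: int = 5) -> float:
    """Mean score over several independent episodes."""
    return sum(run_episode(env, agent) for _ in range(episodes)) / episodes


class StubEnv:
    """Stand-in for a gym environment: +1 reward per step, ends after 3 steps."""

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}


class FixedAgent:
    def act(self, obs: Any) -> int:
        return 0  # always choose action 0


print(average_score(StubEnv(), FixedAgent()))
```

Because the character is randomized in `NetHackChallenge-v0`, averaging over many episodes gives a much more reliable estimate than a single run.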
## Code Examples
#### Observing the Dungeon
The elements **`glyphs`**, **`chars`**, **`colors`**, and **`specials`** are tensors representing the (batched) 2D symbolic observation of the dungeon. Our agents primarily use the first three.
* **`glyphs`** - the integers identifying the specific object at each square of the dungeon (e.g. a hell-hound)
* **`chars`** - the characters used to render the glyphs on the screen (e.g. `d`)
* **`colors`** - the colors used to render the glyphs on the screen (e.g. red)
* **`specials`** - any special modifications used when rendering the glyphs on the screen (e.g. it's invisible!)
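To see how the `chars` tensor relates to what is drawn on screen, its rows can be turned back into text. This is a minimal NumPy sketch that assumes a single, unbatched 2D `uint8` array. `render_chars` is a hypothetical helper, and the tiny grid stands in for a real observation, whose dungeon grid in NLE is 21 rows by 79 columns.

```python
import numpy as np


def render_chars(chars: np.ndarray) -> str:
    """Convert a 2D array of character codes into newline-separated text."""
    return "\n".join(row.tobytes().decode("ascii") for row in chars)


chars = np.full((3, 5), ord(" "), dtype=np.uint8)
chars[1, 1:4] = [ord(c) for c in ".@d"]  # floor, the hero and a canine monster
print(render_chars(chars))
```

The `d` here could be anything from a jackal to a hell-hound. That ambiguity is why the `glyphs` observation, which identifies the exact object, is usually the more informative input.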
%% Cell type:markdown id:ab199a8b tags:
#### BLStats and Message
Along the top of the screen is a topline message that the game uses to communicate with you. Paying close attention to what the game tells you can often make the difference between life and death! This message is encoded in the **`message`** observation.
Also of interest are the stats along the bottom line of the screen. These are extracted into **`blstats`**, which contains a lot of useful information such as hit points, gold and the current dungeon depth.
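The `message` observation is a fixed-length array of byte codes padded with zeros, so it needs decoding before it reads as text. The sketch below assumes that layout with a length of 256. It also assumes `blstats` positions `BL_HP = 10` and `BL_HPMAX = 11`, which are our guess at NLE's ordering: check them against NLE's own definitions before relying on them. Both helper functions are hypothetical.

```python
import numpy as np

# Assumed blstats positions (hypothetical; verify against NLE's definitions).
BL_HP = 10
BL_HPMAX = 11


def decode_message(message: np.ndarray) -> str:
    """Turn the zero-padded byte array into a Python string."""
    return message.tobytes().decode("ascii", errors="replace").rstrip("\x00")


def hp_fraction(blstats: np.ndarray) -> float:
    """Fraction of maximum hit points remaining, e.g. to decide when to retreat."""
    hp_max = int(blstats[BL_HPMAX])
    return int(blstats[BL_HP]) / hp_max if hp_max > 0 else 0.0


message = np.zeros(256, dtype=np.uint8)
text = b"You hear the footsteps of a guard on patrol."
message[: len(text)] = np.frombuffer(text, dtype=np.uint8)
print(decode_message(message))

blstats = np.zeros(27, dtype=np.int64)  # length chosen for illustration only
blstats[BL_HP], blstats[BL_HPMAX] = 4, 16
print(hp_fraction(blstats))
```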
%% Cell type:markdown id:ccc71153 tags:
#### Terminal Rendering
Finally, NLE provides you with the raw terminal screen, should you want to learn from it. This lets you see menus and pop-ups that are not otherwise shown in the dungeon observations.
The observations are simple:
* **`tty_chars`** the characters at each point on the screen
* **`tty_colors`** the colors at each point on the screen
* **`tty_cursor`** the location of the cursor on the screen (NOTE: it's not always on the hero!)
The first two are what is rendered when you call `env.render()` in human mode, and the cursor is pretty self-explanatory.
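The terminal observations can be inspected in the same way. This sketch assumes `tty_chars` is a 2D `uint8` array (24 by 80 in NLE) and that `tty_cursor` holds (row, column). `char_under_cursor` is a hypothetical helper that reports which character the cursor sits on. This makes it easy to notice when the cursor is not on the hero, for example after a `--More--` prompt.

```python
import numpy as np


def char_under_cursor(tty_chars: np.ndarray, tty_cursor: np.ndarray) -> str:
    """Return the character at the cursor, assuming tty_cursor is (row, col)."""
    row, col = (int(v) for v in tty_cursor)
    return chr(int(tty_chars[row, col]))


tty_chars = np.full((24, 80), ord(" "), dtype=np.uint8)
tty_chars[0, :8] = np.frombuffer(b"--More--", dtype=np.uint8)  # topline prompt
tty_chars[10, 40] = ord("@")  # the hero somewhere on the map
print(repr(char_under_cursor(tty_chars, np.array([10, 40], dtype=np.uint8))))
print(repr(char_under_cursor(tty_chars, np.array([0, 7], dtype=np.uint8))))
```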
%% Cell type:code id:fdd30978 tags:
``` python
print(obs['tty_cursor'])
```
%% Cell type:markdown id:250d9f6a tags:
# Next Steps?
Included in the starter kit is a [Torchbeast](https://arxiv.org/abs/1910.03552) implementation of [IMPALA](https://arxiv.org/abs/1802.01561), a large scale distributed RL algorithm adapted for NLE. A similar model was used in the original NLE paper to produce non-trivial learning curves for environments such as NetHackScore-v0.
In the original NLE paper, the agent architecture was as follows:
![Model](./model.png)
As can be seen, the model used both an agent-centric view and a global view, each processed by convolutional neural network (CNN) layers. In addition, the blstats are processed with an MLP. Finally, the embeddings are passed into an LSTM to deal with partial observability.
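An agent-centric view can be produced by cropping a fixed-size window around the hero from the global grid, with padding at the edges. This NumPy sketch rests on our own assumptions: the hero's (row, col) is taken as given (for example from `blstats`), the radius is arbitrary, and the padding value `0` is a placeholder. The baseline's actual crop size and padding may differ.

```python
import numpy as np


def crop_around(grid: np.ndarray, row: int, col: int, radius: int, pad_value: int = 0) -> np.ndarray:
    """Return a (2*radius+1) x (2*radius+1) window centred on (row, col)."""
    padded = np.pad(grid, radius, mode="constant", constant_values=pad_value)
    # After padding, the original (row, col) sits at (row + radius, col + radius),
    # so a window starting at (row, col) in padded coordinates is centred on it.
    return padded[row : row + 2 * radius + 1, col : col + 2 * radius + 1]


glyphs = np.arange(1, 21 * 79 + 1).reshape(21, 79)  # stand-in for a glyphs observation
view = crop_around(glyphs, row=0, col=0, radius=4)
print(view.shape)  # (9, 9)
print(view[4, 4] == glyphs[0, 0])  # the centre of the crop is the hero's square
```

Padding keeps the crop the same shape even when the hero stands at the map edge, which is what a CNN with a fixed input size needs.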
The baseline is almost identical, with one key difference: we haven't added a CNN encoder for the `message` observation. This architecture may provide a promising starting point for development, but the sky is the limit for new ideas! Check out the [README.md](./nethack_baselines/torchbeast/README.md) to get started!
%% Cell type:markdown id:af86ddfe tags:
And if you want to learn more about NetHack, check out:
......