# IJCAI2022-NMMO-PVE-STAGE1 BASELINES
## Install
```bash
pip install git+http://gitlab.aicrowd.com/henryz/ijcai2022nmmo.git
pip install -r requirements.txt
```

A modified [monobeast](https://github.com/facebookresearch/) baseline is provided in `monobeast/`. 
- `monobeast/my-submission/`: Code for submission. 
    - For a successful submission, you must copy all files under this directory, together with the model checkpoint, to [`ijcai2022-nmmo-starter-kit/my-submission/`](https://gitlab.aicrowd.com/neural-mmo/ijcai2022-nmmo-starter-kit/-/tree/main/my-submission) (see the sketch after this list).
- `monobeast/training/`: Code for training.
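
If you find it convenient, the copy step can be scripted. The sketch below assumes the starter kit is checked out next to this repository and that the checkpoint is written under `monobeast/training/results/`; both paths are assumptions, adjust them to your setup.

```python
# Hypothetical copy helper -- the paths are assumptions, adjust to your layout.
import shutil
from pathlib import Path

SRC = Path("monobeast/my-submission")
CKPT = Path("monobeast/training/results/model.pt")   # assumed checkpoint location
DST = Path("ijcai2022-nmmo-starter-kit/my-submission")

shutil.copytree(SRC, DST, dirs_exist_ok=True)   # copy all submission files
shutil.copy2(CKPT, DST / CKPT.name)             # plus the trained checkpoint
```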

## Train and evaluation
```bash
cd monobeast/training

# train
bash train.sh

# plot
python plot.py

# local evaluation
cd ../my-submission
python eval.py
```

## Implementation Details

We provide a baseline implementation, an improved version of the previous baseline [tag]. It reaches a top1 ratio of 0.8 after about 2 days of training with 1 V100 GPU and 8 CPU cores. The training curve is shown below.

![training curve](plot.png)


### **Overview: RL Move + Scripted Attack**

There are two types of actions, move and attack, which can be executed simultaneously. Foraging happens automatically whenever a player steps on a grass or water tile, so it is not included in the action space.

For simplicity, we only use RL to learn the move strategy and use a script for the attack strategy. The script always targets the closest defeatable enemy.

```python
import nmmo
# Scripted and ScriptedTeam are the scripted-baseline base classes shipped with
# the competition environment (see the ijcai2022nmmo package for their import path).


class Attack(Scripted):
    '''Scripted attack policy: always target the closest defeatable enemy.'''
    name = 'Attack_'

    def __call__(self, obs):
        super().__call__(obs)       # parse the raw observation

        self.scan_agents()          # find visible entities
        self.target_weak()          # pick the closest defeatable one
        self.style = nmmo.action.Range
        self.attack()               # fill self.actions with the attack
        return self.actions


class AttackTeam(ScriptedTeam):
    agent_klass = Attack
```
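
How the scripted attack gets merged with the learned move is an implementation detail of the baseline code; the snippet below is only a rough sketch of the idea, with a hypothetical helper name, not the actual wrapper code.

```python
# Rough sketch (hypothetical helper): NMMO accepts one action dict per agent per
# tick, so the policy's move and the script's attack can simply be merged.
def combine_actions(move_action: dict, attack_action: dict) -> dict:
    actions = {}
    actions.update(move_action)    # e.g. {nmmo.action.Move: {...}} from the RL policy
    actions.update(attack_action)  # e.g. {nmmo.action.Attack: {...}} from the Attack script
    return actions                 # both parts are executed in the same tick
```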

### **Feature and network**
We convert the observation into a 15\*15\*17 (width, height, channel) local map. The content of each channel is described below. For details, refer to [`FeatureParser`](./monobeast/training/torchbeast/neural_mmo/train_wrapper.py).

- terrain (channel 0-6): lava, water, grass, scrub, forest, stone
- camp (channel 7-10): none, teammate, npc, opponent
- entity (channel 11-16): level, damage, timealive, food, water, health, is_freezed

```python
import numpy as np
from gym import spaces  # the spec is expressed with gym's spaces API

feature_spec = {
    "terrain": spaces.Box(low=0, high=6, shape=(15, 15), dtype=np.int64),
    "camp": spaces.Box(low=0, high=4, shape=(15, 15), dtype=np.int64),
    "entity": spaces.Box(low=0, high=4, shape=(7, 15, 15), dtype=np.float32),
    "va": spaces.Box(low=0, high=2, shape=(5,), dtype=np.int64),
}
```
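
The actual conversion from this dict to the stacked local map lives in `FeatureParser`; the function below is only a rough sketch of the idea (one-hot the categorical maps and stack them with the entity planes), with the category counts left as arguments since they are not pinned down here.

```python
# Rough sketch of building the stacked local map -- not the actual FeatureParser.
# Pass the category counts that FeatureParser uses for terrain and camp.
import numpy as np

def to_local_map(obs, n_terrain, n_camp):
    terrain = np.eye(n_terrain, dtype=np.float32)[obs["terrain"]]  # (15, 15, n_terrain)
    camp = np.eye(n_camp, dtype=np.float32)[obs["camp"]]           # (15, 15, n_camp)
    planes = [terrain.transpose(2, 0, 1), camp.transpose(2, 0, 1), obs["entity"]]
    return np.concatenate(planes, axis=0)        # (n_terrain + n_camp + 7, 15, 15)
```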

The network is implemented as a 5-layer CNN. See [net.py](./monobeast/training/torchbeast/neural_mmo/net.py) for details.
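
The real model is in `net.py`; the block below is only an illustrative PyTorch sketch of what a 5-layer CNN over the 17-channel local map could look like. The layer widths, the value head, and the 5-way move head (suggested by the `va` mask above) are assumptions, not the baseline's actual architecture.

```python
# Illustrative sketch only -- see net.py for the real model.
import torch.nn as nn

class MoveNet(nn.Module):
    def __init__(self, in_channels: int = 17, n_moves: int = 5):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.policy = nn.Linear(64 * 15 * 15, n_moves)  # move logits
        self.value = nn.Linear(64 * 15 * 15, 1)         # state-value estimate

    def forward(self, x, va):
        # x: (B, 17, 15, 15) local map; va: (B, 5) valid-action mask
        h = self.cnn(x).flatten(1)
        logits = self.policy(h).masked_fill(va == 0, float("-inf"))
        return logits, self.value(h)
```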

### **Reward**
We provide a simple reward design, shown below. See [`train_wrapper.py`](./monobeast/training/torchbeast/neural_mmo/train_wrapper.py) for implementation details.
```math
R_t = \Delta_{PlayerDefeats} + \Delta_{Equipment} + \Delta_{Exploration} + \Delta_{Foraging}
```

It is a dense reward that provides a frequent learning signal, which makes training easier, but it has no direct relation to the [evaluation metrics](https://www.aicrowd.com/challenges/ijcai-2022-the-neural-mmo-challenge#evaluation).
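
As a minimal sketch of this delta-style reward (the keys are placeholders taken from the formula, not necessarily the names used in `train_wrapper.py`):

```python
# Minimal sketch of the delta-style reward above. Keys are placeholders taken
# from the formula; see train_wrapper.py for the stats the baseline really uses.
REWARD_KEYS = ["PlayerDefeats", "Equipment", "Exploration", "Foraging"]

def compute_reward(prev_stats: dict, curr_stats: dict) -> float:
    # R_t = sum of per-key increments since the previous step
    return sum(curr_stats[k] - prev_stats[k] for k in REWARD_KEYS)
```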



### **Hints for getting stronger agents...**
- Learn attack strategy using RL.
- Optimize feature design and network architecture.
- Use team reward instead of individual reward.
- Tune hyper-parameters.
- Advanced options: distributed RL, self-play, league training, PBT, ... 


## Baselines based on other frameworks
The [NeuralMMO-baselines](https://github.com/NeuralMMO/baselines/tree/ijcai-competition) repository (ijcai-competition branch) implements baseline agents based on commonly used RL frameworks such as [cleanrl](https://github.com/vwxyzjn/cleanrl), [sb3](https://github.com/DLR-RM/stable-baselines3), and [rllib](https://github.com/ray-project/ray/tree/master/rllib). These baselines are provided for participants who are familiar with and prefer those frameworks. Choose your favorite to implement your own agent.


## FAQ

##### 1. How can I speed up training?
Ans: You can increase `num_actors`, but keep it within the number of CPU cores available.

For example, on a machine with 16 cores you could set `num_actors` to 15 for the fastest training, but that would consume most of your compute resources and make the machine very sluggish. We recommend setting `num_actors` to 12 in this situation.

##### 2. How do I handle the "unable to open shared memory object" error when running the monobeast baseline?
Ans: This error usually occurs when the number of open file descriptors exceeds your system's limit.

You can raise the limit or use smaller values of `num_actors`, `batch_size`, and `unroll_length`. Refer to the [PyTorch multiprocessing documentation](https://pytorch.org/docs/stable/multiprocessing.html#sharing-strategies) for more information.
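
For example, two common workarounds (suggestions only, not something the baseline applies by default) are to switch PyTorch's tensor-sharing strategy or to raise the soft file-descriptor limit at the top of the training script:

```python
# Common workarounds (not applied by the baseline by default).
import resource
import torch.multiprocessing as mp

# 1) Share tensors via the file system instead of file descriptors.
mp.set_sharing_strategy("file_system")

# 2) Raise the soft open-file limit up to the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```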