nethack issues
https://gitlab.aicrowd.com/groups/nethack/-/issues
2021-07-18T10:07:02Z

https://gitlab.aicrowd.com/nethack/neurips-2021-the-nethack-challenge/-/issues/11
crash with hydra colorlog error
2021-07-18T10:07:02Z
christophe_cerisara

I first installed torchbeast as stated (on a RedHat Linux machine with 1 high-end GPU), then pip-installed the requirements from this repo in the same conda environment, and finally ran
```
HYDRA_FULL_ERROR=1 python polyhydra.py actor_device=cpu
```
It crashes with this error:
```
[DEBUG:49231 cmd:817 2021-06-13 15:48:15,820] Popen(['git', 'version'], cwd=/gpfsdswork/projects/rech/knb/uyr14tk/home/xtofNLE/neurips-2021-the-nethack-challenge/nethack_baselines/torchbeast, universal_newlines=False, shell=None, istream=None)
/gpfsdswork/projects/rech/knb/uyr14tk/home/xtofNLE/neurips-2021-the-nethack-challenge/nethack_baselines/torchbeast/polyhydra.py:108: UserWarning:
config_path is not specified in @hydra.main().
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_hydra_main_config_path for more information.
@hydra.main(config_name="config")
[DEBUG:49231 utils:252 2021-06-13 15:48:16,605] Setting JobRuntime:name=UNKNOWN_NAME
[DEBUG:49231 utils:252 2021-06-13 15:48:16,606] Setting JobRuntime:name=polyhydra
/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py:389: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/defaults_list_override for more information.
warnings.warn(msg, UserWarning)
Traceback (most recent call last):
File "/gpfsdswork/projects/rech/knb/uyr14tk/home/xtofNLE/neurips-2021-the-nethack-challenge/nethack_baselines/torchbeast/polyhydra.py", line 149, in <module>
main()
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/main.py", line 49, in decorated_main
_run_hydra(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/utils.py", line 367, in _run_hydra
run_and_report(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/utils.py", line 368, in <lambda>
lambda: hydra.run(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 87, in run
cfg = self.compose_config(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 564, in compose_config
cfg = self.config_loader.load_configuration(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 146, in load_configuration
return self._load_configuration_impl(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 239, in _load_configuration_impl
defaults_list = create_defaults_list(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 719, in create_defaults_list
defaults, tree = _create_defaults_list(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 689, in _create_defaults_list
defaults_tree = _create_defaults_tree(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 337, in _create_defaults_tree
ret = _create_defaults_tree_impl(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 420, in _create_defaults_tree_impl
return _expand_virtual_root(repo, root, overrides, skip_missing)
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 262, in _expand_virtual_root
subtree = _create_defaults_tree_impl(
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 476, in _create_defaults_tree_impl
_update_overrides(defaults_list, overrides, parent, interpolated_subtree)
File "/gpfswork/rech/knb/uyr14tk/home/.conda/envs/torchbeast/lib/python3.9/site-packages/hydra/_internal/defaults_list.py", line 367, in _update_overrides
raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: In config: Override 'hydra/job_logging : colorlog' is defined before 'hydra/hydra_logging: colorlog'.
Overrides must be at the end of the defaults list
```

https://gitlab.aicrowd.com/nethack/neurips-2021-the-nethack-challenge/-/issues/14
loading pre-trained checkpoint failed
2021-09-24T10:12:17Z
Ghost User

@eric_hammy I set `AGENT = TorchBeastAgent` in submission_config.py, and `MODEL_DIR = "./saved_models/torchbeast/pretrained_0.5B"` in agents/torchbeast_agent.py.
Then I ran `$ python test_submission.py ` but it gives me the following error message. (commit id: 3f9ef7f14a)
```
Traceback (most recent call last):
File "test_submission.py", line 36, in <module>
evaluate()
File "test_submission.py", line 25, in evaluate
agent = Agent(num_envs, batched_env.num_actions)
File "/data/private/research/AgentLearning/nethack_challenge/agents/torchbeast_agent.py", line 26, in __init__
self.model = load_model(MODEL_DIR, self.device)
File "/data/private/research/AgentLearning/nethack_challenge/nethack_baselines/torchbeast/models/__init__.py", line 54, in load_model
checkpoint_states = torch.load(flags.checkpoint, map_location=device)
File "/root/anaconda3/envs/nle_challenge/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/root/anaconda3/envs/nle_challenge/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
result = unpickler.load()
File "/root/anaconda3/envs/nle_challenge/lib/python3.8/site-packages/omegaconf/basecontainer.py", line 108, in __setstate__
key_type = d["_metadata"].key_type
KeyError: '_metadata'
```

https://gitlab.aicrowd.com/nethack/neurips-2021-the-nethack-challenge/-/issues/15
the median score from evaluation server seems to be much lower than local test
2021-09-24T10:13:47Z
Ghost User

@eric_hammy I trained torchbeast_agent from scratch for 1B, and tested the model using test_submission.py. I did 5 test runs, where each run evaluated 512 episodes, and the resulting median score was **400 +- 25.** Then I submitted exactly the same model and code to the evaluation server, but the computed median score was only **322**, which is much lower than my local test result (400 +- 25).
What could make the difference between local and remote test results? The submission ID is `149676`, and config.yaml is as below.
```
name: null
wandb: false
project: nethack_challenge
entity: user1
group: group1
mock: false
single_ttyrec: true
num_seeds: 0
write_profiler_trace: false
fn_penalty_step: constant
penalty_time: 0.0
penalty_step: -0.01
reward_lose: 0
reward_win: 100
state_counter: none
character: '@'
mode: train
env: challenge
num_actors: 256
total_steps: 1000000000.0
batch_size: 32
unroll_length: 80
num_learner_threads: 1
num_inference_threads: 1
disable_cuda: false
learner_device: cuda:1
actor_device: cuda:0
max_learner_queue_size: null
learning_rate: 0.0002
grad_norm_clipping: 40
alpha: 0.99
momentum: 0
epsilon: 1.0e-06
entropy_cost: 0.001
baseline_cost: 0.5
discounting: 0.999
normalize_reward: true
model: baseline
use_lstm: true
hidden_dim: 256
embedding_dim: 64
layers: 5
crop_dim: 9
use_index_select: true
restrict_action_space: true
msg:
  hidden_dim: 64
  embedding_dim: 32
load_dir: null
savedir: ../../NetHackChallenge-v0-random-char
```
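
One way to judge whether a gap like 400 versus 322 could come from episode-to-episode variance alone is to put a bootstrap interval around the local median. The sketch below is only an illustration under stated assumptions: it assumes the per-episode scores of a local run have been collected into a plain list (how to extract them from test_submission.py is not shown here), and the function names `bootstrap_median_interval` and `is_within_interval` are hypothetical helpers, not part of the challenge code. It says nothing about how the evaluation server computes its score.

```python
import random
import statistics


def bootstrap_median_interval(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Return (median, low, high) using a percentile bootstrap on the median."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same interval
    n = len(scores)
    # Resample the episodes with replacement and record the median each time.
    medians = sorted(
        statistics.median(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    low = medians[int((alpha / 2) * n_resamples)]
    high = medians[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.median(scores), low, high


def is_within_interval(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


if __name__ == "__main__":
    # Placeholder data only: replace with the real per-episode scores of one run.
    rng = random.Random(1)
    placeholder_scores = [int(rng.expovariate(1 / 500)) for _ in range(512)]
    median, low, high = bootstrap_median_interval(placeholder_scores)
    print(f"median={median:.0f}, 95% interval=[{low:.0f}, {high:.0f}]")
```

If the server-side median falls well outside the interval obtained from the real local scores, sampling noise alone is an unlikely explanation, and differences in environment, seeds, or evaluation settings would be the next things to look at.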