Closed
Open
Created Jul 06, 2021 by Ghost User@ghost

The median score from the evaluation server seems to be much lower than in local tests

@eric_hammy I trained torchbeast_agent from scratch for 1B steps and evaluated the model with test_submission.py. I did 5 test runs of 512 episodes each, and the resulting median score was 400 ± 25. I then submitted exactly the same model and code to the evaluation server, but the median score it computed was only 322, well below my local result. What could cause this difference between local and remote results? The submission ID is 149676, and config.yaml is below.

name: null
wandb: false
project: nethack_challenge
entity: user1
group: group1
mock: false
single_ttyrec: true
num_seeds: 0
write_profiler_trace: false
fn_penalty_step: constant
penalty_time: 0.0
penalty_step: -0.01
reward_lose: 0
reward_win: 100
state_counter: none
character: '@'
mode: train
env: challenge
num_actors: 256
total_steps: 1000000000.0
batch_size: 32
unroll_length: 80
num_learner_threads: 1
num_inference_threads: 1
disable_cuda: false
learner_device: cuda:1
actor_device: cuda:0
max_learner_queue_size: null
learning_rate: 0.0002
grad_norm_clipping: 40
alpha: 0.99
momentum: 0
epsilon: 1.0e-06
entropy_cost: 0.001
baseline_cost: 0.5
discounting: 0.999
normalize_reward: true
model: baseline
use_lstm: true
hidden_dim: 256
embedding_dim: 64
layers: 5
crop_dim: 9
use_index_select: true
restrict_action_space: true
msg:
  hidden_dim: 64
  embedding_dim: 32
load_dir: null
savedir: ../../NetHackChallenge-v0-random-char
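
To judge whether a server median of 322 could be explained by sampling noise alone, one option is to put an interval around the local median. The following is only a minimal sketch using the Python standard library. It assumes the per-episode scores of each local run are available as plain lists of numbers; the function names and the `local_runs` variable in the usage note are hypothetical and are not part of test_submission.py or the challenge starter kit.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def per_run_medians(runs: Sequence[Sequence[float]]) -> List[float]:
    """Return the median episode score of each evaluation run."""
    return [statistics.median(run) for run in runs]


def bootstrap_median_interval(
    scores: Sequence[float],
    n_resamples: int = 1000,
    coverage: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap interval for the median of `scores`.

    Resamples the scores with replacement, takes the median of each
    resample, and returns the lower and upper percentiles that bracket
    the requested coverage.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(scores)
    medians = sorted(
        statistics.median(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - coverage) / 2.0
    lower = medians[int(tail * (n_resamples - 1))]
    upper = medians[int((1.0 - tail) * (n_resamples - 1))]
    return lower, upper


def is_within(value: float, interval: Tuple[float, float]) -> bool:
    """True if `value` lies inside the closed interval."""
    lower, upper = interval
    return lower <= value <= upper
```

With five local runs stored in a hypothetical `local_runs` list, `per_run_medians(local_runs)` gives the run-to-run spread, and `is_within(322, bootstrap_median_interval(all_scores))` (where `all_scores` is the concatenation of the runs) indicates whether the server result is plausible under resampling. If it falls well outside the interval, a difference in evaluation settings is a more likely explanation than noise, although this check cannot say which setting differs.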
Edited Jul 06, 2021 by Ghost User