sfwatergit / Flatland

Commit d36631af
Authored 5 years ago by Erik Nygren
removed reward function bug which led to agent choosing invalid actions
Parent: 5d1de868
Changes: 1 changed file, flatland/envs/rail_env.py (24 additions, 24 deletions)
@@ -165,8 +165,8 @@ class RailEnv(Environment):
         self.restart_agents()
 
-        for iAgent in range(self.get_num_agents()):
-            agent = self.agents[iAgent]
+        for i_agent in range(self.get_num_agents()):
+            agent = self.agents[i_agent]
             agent.speed_data['position_fraction'] = 0.0
 
         self.num_resets += 1
@@ -195,31 +195,31 @@ class RailEnv(Environment):
         # Reset the step rewards
         self.rewards_dict = dict()
-        for iAgent in range(self.get_num_agents()):
-            self.rewards_dict[iAgent] = 0
+        for i_agent in range(self.get_num_agents()):
+            self.rewards_dict[i_agent] = 0
 
         if self.dones["__all__"]:
             self.rewards_dict = {i: r + global_reward for i, r in self.rewards_dict.items()}
             return self._get_observations(), self.rewards_dict, self.dones, {}
 
         # for i in range(len(self.agents_handles)):
-        for iAgent in range(self.get_num_agents()):
-            agent = self.agents[iAgent]
+        for i_agent in range(self.get_num_agents()):
+            agent = self.agents[i_agent]
             agent.old_direction = agent.direction
             agent.old_position = agent.position
-            if self.dones[iAgent]:  # this agent has already completed...
+            if self.dones[i_agent]:  # this agent has already completed...
                 continue
 
-            if iAgent not in action_dict:  # no action has been supplied for this agent
-                action_dict[iAgent] = RailEnvActions.DO_NOTHING
+            if i_agent not in action_dict:  # no action has been supplied for this agent
+                action_dict[i_agent] = RailEnvActions.DO_NOTHING
 
-            if action_dict[iAgent] < 0 or action_dict[iAgent] > len(RailEnvActions):
-                print('ERROR: illegal action=', action_dict[iAgent],
-                      'for agent with index=', iAgent,
+            if action_dict[i_agent] < 0 or action_dict[i_agent] > len(RailEnvActions):
+                print('ERROR: illegal action=', action_dict[i_agent],
+                      'for agent with index=', i_agent,
                       '"DO NOTHING" will be executed instead')
-                action_dict[iAgent] = RailEnvActions.DO_NOTHING
+                action_dict[i_agent] = RailEnvActions.DO_NOTHING
 
-            action = action_dict[iAgent]
+            action = action_dict[i_agent]
 
             if action == RailEnvActions.DO_NOTHING and agent.moving:
                 # Keep moving
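The guard in the hunk above is the part of step() that silently repairs bad input: a missing action becomes DO_NOTHING, and an out-of-range action is reported and then also replaced with DO_NOTHING. Below is a minimal, self-contained sketch of that pattern in plain Python. It is not the Flatland API itself: the enum member values and the sanitize_actions helper are assumptions made only so the snippet runs standalone.

from enum import IntEnum

class RailEnvActions(IntEnum):
    # Same action names as in flatland/envs/rail_env.py; the numeric
    # values here are an assumption for this standalone sketch.
    DO_NOTHING = 0
    MOVE_LEFT = 1
    MOVE_FORWARD = 2
    MOVE_RIGHT = 3
    STOP_MOVING = 4

def sanitize_actions(action_dict, num_agents):
    """Hypothetical helper mirroring the per-agent guard in the hunk above:
    fill in missing actions and replace illegal ones with DO_NOTHING."""
    for i_agent in range(num_agents):
        if i_agent not in action_dict:  # no action supplied for this agent
            action_dict[i_agent] = RailEnvActions.DO_NOTHING
        elif action_dict[i_agent] < 0 or action_dict[i_agent] > len(RailEnvActions):
            print('ERROR: illegal action=', action_dict[i_agent],
                  'for agent with index=', i_agent,
                  '"DO NOTHING" will be executed instead')
            action_dict[i_agent] = RailEnvActions.DO_NOTHING
    return action_dict

# Agent 0 sent an illegal action, agent 1 sent none, agent 2 is fine:
print(sanitize_actions({0: 99, 2: RailEnvActions.MOVE_FORWARD}, num_agents=3))
# agents 0 and 1 both end up with DO_NOTHING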
@@ -228,12 +228,12 @@ class RailEnv(Environment):
             if action == RailEnvActions.STOP_MOVING and agent.moving and agent.speed_data['position_fraction'] == 0.:
                 # Only allow halting an agent on entering new cells.
                 agent.moving = False
-                self.rewards_dict[iAgent] += stop_penalty
+                self.rewards_dict[i_agent] += stop_penalty
 
             if not agent.moving and not (action == RailEnvActions.DO_NOTHING or action == RailEnvActions.STOP_MOVING):
                 # Allow agent to start with any forward or direction action
                 agent.moving = True
-                self.rewards_dict[iAgent] += start_penalty
+                self.rewards_dict[i_agent] += start_penalty
 
             # Now perform a movement.
             # If the agent is in an initial position within a new cell (agent.speed_data['position_fraction']<eps)
@@ -269,18 +269,18 @@ class RailEnv(Environment):
                 else:
                     # TODO: an invalid action was chosen after entering the cell. The agent cannot move.
-                    self.rewards_dict[iAgent] += invalid_action_penalty
-                    self.rewards_dict[iAgent] += step_penalty * agent.speed_data['speed']
+                    self.rewards_dict[i_agent] += invalid_action_penalty
+                    self.rewards_dict[i_agent] += step_penalty * agent.speed_data['speed']
                     agent.moving = False
-                    self.rewards_dict[iAgent] += stop_penalty
+                    self.rewards_dict[i_agent] += stop_penalty
 
                     continue
             else:
                 # TODO: an invalid action was chosen after entering the cell. The agent cannot move.
-                self.rewards_dict[iAgent] += invalid_action_penalty
-                self.rewards_dict[iAgent] += step_penalty * agent.speed_data['speed']
+                self.rewards_dict[i_agent] += invalid_action_penalty
+                self.rewards_dict[i_agent] += step_penalty * agent.speed_data['speed']
                 agent.moving = False
-                self.rewards_dict[iAgent] += stop_penalty
+                self.rewards_dict[i_agent] += stop_penalty
                 continue
@@ -302,9 +302,9 @@ class RailEnv(Environment):
                     agent.speed_data['position_fraction'] = 0.0
 
             if np.equal(agent.position, agent.target).all():
-                self.dones[iAgent] = True
+                self.dones[i_agent] = True
             else:
-                self.rewards_dict[iAgent] += step_penalty * agent.speed_data['speed']
+                self.rewards_dict[i_agent] += step_penalty * agent.speed_data['speed']
 
         # Check for end of episode + add global reward to all rewards!
         if np.all([np.array_equal(agent2.position, agent2.target) for agent2 in self.agents]):
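The last hunk feeds the end-of-episode check shown at its bottom: once every agent sits on its target, the @@ -195 hunk's comprehension tops every entry of rewards_dict up with global_reward. The toy example below distills those two pieces into runnable form; the _Agent class, the coordinates, and the reward values are illustrative stand-ins, not Flatland code.

import numpy as np

class _Agent:
    """Illustrative stand-in for a Flatland agent: just position and target."""
    def __init__(self, position, target):
        self.position = np.array(position)
        self.target = np.array(target)

# Both toy agents have already reached their targets.
agents = [_Agent((0, 1), (0, 1)), _Agent((2, 3), (2, 3))]
rewards_dict = {0: -1.0, 1: -1.0}  # per-step penalties accumulated so far
global_reward = 1.0                # assumed value, for illustration

# Mirrors: if np.all([np.array_equal(agent2.position, agent2.target) ...]):
if np.all([np.array_equal(a.position, a.target) for a in agents]):
    # Mirrors the top-up in the @@ -195 hunk.
    rewards_dict = {i: r + global_reward for i, r in rewards_dict.items()}

print(rewards_dict)  # {0: 0.0, 1: 0.0}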