Commit 15a725f0 authored by u214892

#67 multiple agents

parent e959427a
%% Cell type:markdown id: tags:
### Simple Example 3 - Manual Control
By default this runs a few "move forward" actions for two agents, in a separate window.
If you uncomment the "input" line below, it opens a text box in the Jupyter notebook, allowing basic manual control.
E.g. enter `"0 2 s<enter>"` to tell agent 0 to move forward, then step the environment.
You should see the red agent step forward and the env return a reward, looking like this:
`Rewards: {0: -1.0, 1: -1.0} [done= {0: False, 1: False, '__all__': False} ]`
Note that this example is set up to use the straightforward "PIL" renderer - without the special SBB artwork!
The agent observations are displayed as squares of varying sizes, in a paler version of the agent colour. The targets are half-size squares in the full agent colour.
You can switch to the "PILSVG" renderer, which is prettier but currently renders the agents one step behind, because it needs to know which way each agent is turning. This can be confusing if you are debugging step-by-step.
The image below shows what the separate window should look like.
%% Cell type:markdown id: tags:
![simple_example_3.png](simple_example_3.png)
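For reference, here is a minimal sketch (not part of the original notebook) of how a manual command string such as `"0 2 1 2 s"` breaks down into the `{agent_id: action}` dictionary that `env.step` expects; the interactive loop further down does the same thing token by token. The `parse_command` helper is hypothetical, introduced purely for illustration.
``` python
# Hypothetical helper, for illustration only: turn a manual-control command such as
# "0 2 1 2 s" into the {agent_id: action} dict expected by env.step().
# Actions: 1 = turn left + move, 2 = move forward, 3 = turn right + move.
def parse_command(cmd):
    tokens = cmd.split()
    action_dict = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == 's':  # 's' marks "step the environment now"
            i += 1
            continue
        agent_id, action = int(tokens[i]), int(tokens[i + 1])
        action_dict[agent_id] = action
        i += 2
    return action_dict

print(parse_command("0 2 1 2 s"))  # -> {0: 2, 1: 2}: both agents move forward
# Passing this dict to env.step() returns obs, rewards, done (plus one more value);
# rewards and done are per-agent dicts such as {0: -1.0, 1: -1.0} and
# {0: False, 1: False, '__all__': False}.
```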
%% Cell type:code id: tags:
``` python
import random
import numpy as np
import time
from flatland.envs.generators import random_rail_generator
from flatland.envs.observations import TreeObsForRailEnv
from flatland.envs.rail_env import RailEnv
from flatland.utils.rendertools import RenderTool
```
%% Cell type:code id: tags:
``` python
random.seed(1)
np.random.seed(1)
env = RailEnv(width=7,
              height=7,
              rail_generator=random_rail_generator(),
              number_of_agents=2,
              obs_builder_object=TreeObsForRailEnv(max_depth=2))
# Step the environment once (action 0 for agent 0), then print the observation tree for each agent
obs, all_rewards, done, _ = env.step({0: 0})
for i in range(env.get_num_agents()):
    env.obs_builder.util_print_obs_subtree(tree=obs[i])
env_renderer = RenderTool(env, gl="PIL")
# env_renderer = RenderTool(env, gl="PILSVG")
env_renderer.renderEnv(show=True, frames=True)
print("Manual control: s=perform step, q=quit, [agent id] [1-2-3 action] \
(turnleft+move, move forward, turnright+move)")
```
%% Output
[0, 0, 0, 0, 3.0, 0, 0]
L: [0, -inf, -inf, -inf, -inf, -inf, -inf]
L: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, inf, inf, inf, 2, 1.0, 0]
L: [0, 0, 1, inf, inf, 3, 0]
F: [0, 0, inf, inf, inf, 5, 8.0]
R: [0, 0, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
L: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
L: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
[0, 0, 0, 0, 8.0, 0, 0]
L: [0, -inf, -inf, -inf, -inf, -inf, -inf]
L: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, inf, inf, inf, 2, 6.0, 0]
L: [0, 0, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, inf, inf, inf, 3, 5.0]
R: [0, 0, inf, inf, inf, 6, 6.0]
B: [0, 0, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
L: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
L: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
F: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
R: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
B: [-inf, -inf, -inf, -inf, -inf, -inf, -inf]
Manual control: s=perform step, q=quit, [agent id] [1-2-3 action] (turnleft+move, move forward, turnright+move)
%% Cell type:code id: tags:
``` python
for step in range(10):
    # This is an example command, setting agent 0's action to 2 (move forward), and agent 1's action to 2,
    # then stepping the environment.
    cmd = "0 2 1 2 s"
    # uncomment this input statement if you want to try interactive manual commands
    # cmd = input(">> ")
    cmds = cmd.split(" ")
    action_dict = {}
    i = 0
    while i < len(cmds):
        if cmds[i] == 'q':
            import sys
            sys.exit()
        elif cmds[i] == 's':
            obs, all_rewards, done, _ = env.step(action_dict)
            action_dict = {}
            print("Rewards: ", all_rewards, " [done=", done, "]")
        else:
            agent_id = int(cmds[i])
            action = int(cmds[i + 1])
            action_dict[agent_id] = action
            i = i + 1
        i += 1
    env_renderer.renderEnv(show=True, frames=True)
    time.sleep(0.3)
```
%% Output
Rewards: {0: -1.0, 1: -1.0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: -1.0, 1: -1.0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: -1.0, 1: -1.0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: -1.0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: -1.0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: -1.0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: -1.0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: 0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: 0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
Rewards: {0: 0, 1: 0} [done= {0: False, 1: False, '__all__': False} ]
@@ -113,8 +113,6 @@ deps =
     -r{toxinidir}/requirements_dev.txt
     -r{toxinidir}/requirements_continuous_integration.txt
 commands =
-    ; install current version of flatland to be used by notebooks
-    sh -c 'python setup.py install'
     ; run tests from subfolder to ensure that resources are accessed via resources and not via relative paths
     sh -c 'mkdir -p {envtmpdir}/6f59bc68108c3895b1828abdd04b9a06'
     sh -c 'jupyter nbextension install --py --sys-prefix widgetsnbextension'