diff --git a/README.rst b/README.rst
index 1559be647c840b90c149a59d9ec612fa79ba0704..12a1a2b8130444ac4dc0ccee5338045879b95d20 100644
--- a/README.rst
+++ b/README.rst
@@ -7,12 +7,12 @@ Flatland
 .. image:: https://gitlab.aicrowd.com/flatland/flatland/badges/master/pipeline.svg
      :target: https://gitlab.aicrowd.com/flatland/flatland/pipelines
      :alt: Test Running
-
+
 .. image:: https://gitlab.aicrowd.com/flatland/flatland/badges/master/coverage.svg
      :target: https://gitlab.aicrowd.com/flatland/flatland/pipelines
      :alt: Test Coverage
 
-'
+'
 
 .. image:: https://i.imgur.com/0rnbSLY.gif
   :width: 800
@@ -21,9 +21,9 @@ Flatland
 
 Flatland is a opensource toolkit for developing and comparing Multi Agent Reinforcement Learning algorithms in little (or ridiculously large !) gridworlds. The base environment is a two-dimensional grid in which many agents can be placed, and each agent must solve one or more navigational tasks in the grid world. More details about the environment and the problem statement can be found in the `official docs <http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/>`_.
 
-This library was developed by `SBB <https://www.sbb.ch/en/>`_ , `AIcrowd <https://www.aicrowd.com/>`_ and numerous contributors and AIcrowd research fellows from the AIcrowd community.
+This library was developed by `SBB <https://www.sbb.ch/en/>`_ , `AIcrowd <https://www.aicrowd.com/>`_ and numerous contributors and AIcrowd research fellows from the AIcrowd community.
 
-This library was developed specifically for the `Flatland Challenge <https://www.aicrowd.com/challenges/flatland-challenge>`_ in which we strongly encourage you to take part in.
+This library was developed specifically for the `Flatland Challenge <https://www.aicrowd.com/challenges/flatland-challenge>`_ in which we strongly encourage you to take part.
 
 **NOTE This document is best viewed in the official documentation site at** `Flatland-RL Docs <http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/readme.html>`_
 
@@ -45,13 +45,13 @@ Quick Start
 
 * Install `Anaconda <https://www.anaconda.com/distribution/>`_ by following the instructions `here <https://www.anaconda.com/distribution/>`_
 * Install the dependencies and the library
-
+
 .. code-block:: console
 
     $ conda create python=3.6 --name flatland-rl
     $ conda activate flatland-rl
     $ conda install -c conda-forge cairosvg pycairo
-    $ conda install -c anaconda tk
+    $ conda install -c anaconda tk
     $ pip install flatland-rl
 
 * Test that the installation works
@@ -70,7 +70,8 @@ Basic usage of the RailEnv environment used by the Flatland Challenge
 
     import numpy as np
     import time
-    from flatland.envs.generators import complex_rail_generator
+    from flatland.envs.rail_generators import complex_rail_generator
+    from flatland.envs.schedule_generators import complex_schedule_generator
     from flatland.envs.rail_env import RailEnv
     from flatland.utils.rendertools import RenderTool
 
@@ -84,6 +85,7 @@ Basic usage of the RailEnv environment used by the Flatland Challenge
                     min_dist=8,
                     max_dist=99999,
                     seed=0),
+            schedule_generator=complex_schedule_generator(),
             number_of_agents=NUMBER_OF_AGENTS)
 
     env_renderer = RenderTool(env)
@@ -105,7 +107,7 @@ Basic usage of the RailEnv environment used by the Flatland Challenge
         env_renderer.render_env(show=True, frames=False, show_observations=False)
         time.sleep(0.3)
 
-and **ideally** you should see something along the lines of
+and **ideally** you should see something along the lines of
 
 .. image:: https://i.imgur.com/VrTQVeM.gif
    :align: center
@@ -118,7 +120,7 @@ Contributions
 Flatland is an opensource project, and we very much value all and any contributions you make towards the project.
 Please follow the `Contribution Guidelines <http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/contributing.html>`_ for more details on how you can successfully contribute to the project. We enthusiastically look forward to your contributions.
 
-Partners
+Partners
 ============
 .. image:: https://i.imgur.com/OSCXtde.png
    :target: https://sbb.ch
@@ -142,4 +144,4 @@ Authors
 Acknowledgements
 ====================
 * Vaibhav Agrawal <theinfamouswayne@gmail.com>
-* Anurag Ghosh
+* Anurag Ghosh
diff --git a/docs/gettingstarted.rst b/docs/gettingstarted.rst
index 22b9be7d79df7f88273761369fffa0c47811ed1a..0cfa43bb130880b84a71f2ce982cfefefb9deffd 100644
--- a/docs/gettingstarted.rst
+++ b/docs/gettingstarted.rst
@@ -5,8 +5,8 @@ Getting Started
 Overview
 --------------
 
-Following are three short tutorials to help new users get acquainted with how
-to create RailEnvs, how to train simple DQN agents on them, and how to customize
+Following are three short tutorials to help new users get acquainted with how
+to create RailEnvs, how to train simple DQN agents on them, and how to customize
 them.
 
 To use flatland in a project:
@@ -19,17 +19,17 @@ To use flatland in a project:
 Part 1 : Basic Usage
 --------------
 
-The basic usage of RailEnv environments consists in creating a RailEnv object
-endowed with a rail generator, that generates new rail networks on each reset,
-and an observation generator object, that is supplied with environment-specific
-information at each time step and provides a suitable observation vector to the
+The basic usage of RailEnv environments consists in creating a RailEnv object
+endowed with a rail generator, that generates new rail networks on each reset,
+and an observation generator object, that is supplied with environment-specific
+information at each time step and provides a suitable observation vector to the
 agents.
 
-The simplest rail generators are envs.generators.rail_from_manual_specifications_generator
-and envs.generators.random_rail_generator.
+The simplest rail generators are envs.rail_generators.rail_from_manual_specifications_generator
+and envs.rail_generators.random_rail_generator.
 
-The first one accepts a list of lists whose each element is a 2-tuple, whose
-entries represent the 'cell_type' (see core.transitions.RailEnvTransitions) and
+The first one accepts a list of lists in which each element is a 2-tuple whose
+entries represent the 'cell_type' (see core.transitions.RailEnvTransitions) and
 the desired clockwise rotation of the cell contents (0, 90, 180 or 270 degrees).
 For example,
 
@@ -46,8 +46,8 @@ For example,
                   number_of_agents=1,
                   obs_builder_object=TreeObsForRailEnv(max_depth=2))
 
-Alternatively, a random environment can be generated (optionally specifying
-weights for each cell type to increase or decrease their proportion in the
+Alternatively, a random environment can be generated (optionally specifying
+weights for each cell type to increase or decrease their proportion in the
 generated rail networks).
 
 .. code-block:: python
@@ -64,7 +64,7 @@ generated rail networks).
                               0.2,  # Case 8 - turn left
                               0.2,  # Case 9 - turn right
                               1.0]  # Case 10 - mirrored switch
-
+
     # Example generate a random rail
     env = RailEnv(width=10,
                   height=10,
@@ -82,8 +82,8 @@ Environments can be rendered using the utils.rendertools utilities, for example:
     env_renderer.render_env(show=True)
 
 
-Finally, the environment can be run by supplying the environment step function
-with a dictionary of actions whose keys are agents' handles (returned by
+Finally, the environment can be run by supplying the environment step function
+with a dictionary of actions whose keys are agents' handles (returned by
 env.get_agent_handles() ) and the corresponding values the selected actions.
 For example, for a 2-agents environment:
 
@@ -93,25 +93,25 @@ For example, for a 2-agents environment:
     action_dict = {handles[0]:0, handles[1]:0}
     obs, all_rewards, done, _ = env.step(action_dict)
 
-where 'obs', 'all_rewards', and 'done' are also dictionary indexed by the agents'
-handles, whose values correspond to the relevant observations, rewards and terminal
-status for each agent. Further, the 'dones' dictionary returns an extra key
+where 'obs', 'all_rewards', and 'done' are also dictionaries indexed by the agents'
+handles, whose values correspond to the relevant observations, rewards and terminal
+status for each agent. Further, the 'done' dictionary contains an extra key
 '__all__' that is set to True after all agents have reached their goals.
 
 
-In the specific case a TreeObsForRailEnv observation builder is used, it is
-possible to print a representation of the returned observations with the
+In the specific case where a TreeObsForRailEnv observation builder is used, it is
+possible to print a representation of the returned observations with the
 following code. Also, tree observation data is displayed by RenderTool by default.
 
 .. code-block:: python
 
     for i in range(env.get_num_agents()):
         env.obs_builder.util_print_obs_subtree(
-                tree=obs[i],
+                tree=obs[i],
                 num_features_per_node=5
         )
 
-The complete code for this part of the Getting Started guide can be found in
+The complete code for this part of the Getting Started guide can be found in
 
 * `examples/simple_example_1.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_1.py>`_
 * `examples/simple_example_2.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_2.py>`_
@@ -130,7 +130,8 @@ We start by importing the necessary Flatland libraries
 
 .. code-block:: python
 
-    from flatland.envs.generators import complex_rail_generator
+    from flatland.envs.rail_generators import complex_rail_generator
+    from flatland.envs.schedule_generators import complex_schedule_generator
     from flatland.envs.rail_env import RailEnv
 
 The complex_rail_generator is used in order to guarantee feasible railway network configurations for training.
@@ -141,19 +142,19 @@ Next we configure the difficulty of our task by modifying the complex_rail_gener
     env = RailEnv( width=15,
                    height=15,
                    rail_generator=complex_rail_generator(
-                                       nr_start_goal=10,
-                                       nr_extra=10,
-                                       min_dist=10,
-                                       max_dist=99999,
+                                       nr_start_goal=10,
+                                       nr_extra=10,
+                                       min_dist=10,
+                                       max_dist=99999,
                                        seed=0),
                    number_of_agents=5)
-
+
 The difficulty of a railway network depends on the dimensions (`width` x `height`) and the number of agents in the network.
 By varying the number of start and goal connections (nr_start_goal) and the number of extra railway elements added (nr_extra)
 the number of alternative paths of each agents can be modified. The more possible paths an agent has to reach its target the easier the task becomes.
 Here we don't specify any observation builder but rather use the standard tree observation. If you would like to use a custom obervation please follow
 the instructions in the next tutorial.
-Feel free to vary these parameters to see how your own agent holds up on different setting. The evalutation set of railway configurations will
+Feel free to vary these parameters to see how your own agent holds up on different settings. The evaluation set of railway configurations will
 cover the whole spectrum from easy to complex tasks.
 
 Once we are set with the environment we can load our preferred agent from either RLlib or any other ressource. Here we use a random agent to illustrate the code.
@@ -182,18 +183,18 @@ This dictionary is then passed to the environment which checks the validity of a
 .. code-block:: python
 
     next_obs, all_rewards, done, _ = env.step(action_dict)
-
+
 The environment returns an array of new observations, reward dictionary for all agents as well as a flag for which agents are done.
 This information can be used to update the policy of your agent and if done['__all__'] == True the episode terminates.
 
 Part 3 : Customizing Observations and Level Generators
 --------------
 
-Example code for generating custom observations given a RailEnv and to generate
-random rail maps are available in examples/custom_observation_example.py and
+Example code for generating custom observations given a RailEnv and for generating
+random rail maps is available in examples/custom_observation_example.py and
 examples/custom_railmap_example.py .
 
-Custom observations can be produced by deriving a new object from the
+Custom observations can be produced by deriving a new object from the
 core.env_observation_builder.ObservationBuilder base class, for example as follows:
 
 .. code-block:: python
@@ -201,16 +202,16 @@ core.env_observation_builder.ObservationBuilder base class, for example as follo
     class CustomObs(ObservationBuilder):
         def __init__(self):
             self.observation_space = [5]
-
+
         def reset(self):
             return
-
+
         def get(self, handle):
             observation = handle*np.ones((5,))
             return observation
 
-It is important that an observation_space is defined with a list of dimensions
-of the returned observation tensors. get() returns the observation for each agent,
+It is important that an observation_space is defined with a list of dimensions
+of the returned observation tensors. get() returns the observation for each agent,
 of handle 'handle'.
 
 A RailEnv environment can then be created as usual:
@@ -223,14 +224,14 @@ A RailEnv environment can then be created as usual:
                   number_of_agents=3,
                   obs_builder_object=CustomObs())
 
-As for generating custom rail maps, the RailEnv class accepts a rail_generator
-argument that must be a function with arguments `width`, `height`, `num_agents`,
+As for generating custom rail maps, the RailEnv class accepts a rail_generator
+argument that must be a function with arguments `width`, `height`, `num_agents`,
 and `num_resets=0`, and that has to return a GridTransitionMap object (the rail map),
-and three lists of tuples containing the (row,column) coordinates of each of
-num_agent agents, their initial orientation **(0=North, 1=East, 2=South, 3=West)**,
+and three lists of tuples containing the (row,column) coordinates of each of
+num_agents agents, their initial orientation **(0=North, 1=East, 2=South, 3=West)**,
 and the position of their targets.
 
-For example, the following custom rail map generator returns an empty map of
+For example, the following custom rail map generator returns an empty map of
 size (height, width), with no agents (regardless of num_agents):
 
 .. code-block:: python
@@ -241,18 +242,18 @@ size (height, width), with no agents (regardless of num_agents):
             grid_map = GridTransitionMap(width=width, height=height, transitions=rail_trans)
             rail_array = grid_map.grid
             rail_array.fill(0)
-
+
             agents_positions = []
             agents_direction = []
             agents_target = []
-
+
             return grid_map, agents_positions, agents_direction, agents_target
         return generator
 
-It is worth to note that helpful utilities to manage RailEnv environments and their
-related data structures are available in 'envs.env_utils'. In particular,
-envs.env_utils.get_rnd_agents_pos_tgt_dir_on_rail is fairly handy to fill in
-random (but consistent) agents along with their targets and initial directions,
+It is worth noting that helpful utilities to manage RailEnv environments and their
+related data structures are available in 'envs.env_utils'. In particular,
+envs.env_utils.get_rnd_agents_pos_tgt_dir_on_rail is fairly handy to fill in
+random (but consistent) agents along with their targets and initial directions,
 given a rail map (GridTransitionMap object) and the desired number of agents:
 
 .. code-block:: python