From 651ce77b8d6441f78a6bb7b74a3c9785a75f4785 Mon Sep 17 00:00:00 2001
From: hagrid67 <jdhwatson@gmail.com>
Date: Mon, 29 Jul 2019 20:30:30 +0100
Subject: [PATCH] made rst http link to Wikipedia Pareto Optimality

---
 docs/getting_start_with_observationbuilder.rst | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/getting_start_with_observationbuilder.rst b/docs/getting_start_with_observationbuilder.rst
index 5f655989..c74e198f 100644
--- a/docs/getting_start_with_observationbuilder.rst
+++ b/docs/getting_start_with_observationbuilder.rst
@@ -58,11 +58,13 @@ Example 2 : Single-agent navigation
 Observation builder objects can of course derive from existing concrete subclasses of ObservationBuilder.
 For example, it may be useful to extend the TreeObsForRailEnv_ observation builder.
-A feature of this class is that on :code:`reset()`, it pre-computes the length of the shortest paths from all
+A feature of this class is that on :code:`reset()`, it pre-computes the lengths of the shortest paths from all
 cells and orientations to the target of each agent, i.e. a distance map for each agent.
+
 In this example we exploit these distance maps by implementing an observation builder that shows the current
 shortest path for each agent as a one-hot observation vector of length 3, whose components represent
 the possible directions an agent can take (LEFT, FORWARD, RIGHT). All values of the observation vector are set to :code:`0` except for the shortest direction where it is set to :code:`1`.
-Using this observation with highly engineered features indicating the agent's shortest path, an agent can then learn to take the corresponding action at each time-step, or we could even hardcode the optimal policy.
-Please note, however, that this simple strategy fails when multiple agents are present, as each agent would only attempt its greedy solution, which is not usually Pareto-optimal in this context.
+
+Using this observation with highly engineered features indicating the agent's shortest path, an agent can then learn to take the corresponding action at each time-step; or we could even hardcode the optimal policy.
+Note that this simple strategy fails when multiple agents are present, as each agent would only attempt its greedy solution, which is not usually `Pareto-optimal <https://en.wikipedia.org/wiki/Pareto_efficiency>`_ in this context.
 
 .. _TreeObsForRailEnv: https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/observations.py#L14
--
GitLab
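
For reference, the kind of builder the patched paragraph describes might be sketched roughly as below. This is a minimal sketch, not code taken from the docs or the patch: the class name, the distance-map layout :code:`[agent, row, col, direction]`, and the direction-to-offset table are assumptions about the Flatland API of that era and may need adjusting against the current library::

    import numpy as np

    from flatland.envs.observations import TreeObsForRailEnv


    class SingleAgentNavigationObs(TreeObsForRailEnv):
        """Sketch of the builder described above: a length-3 one-hot vector
        over (LEFT, FORWARD, RIGHT) marking the direction that follows the
        agent's shortest path to its target."""

        # Offsets for the grid directions 0=North, 1=East, 2=South, 3=West.
        DIRECTION_TO_OFFSET = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

        def __init__(self):
            # max_depth=0: we only need the distance map, not the observation tree.
            super().__init__(max_depth=0)

        def reset(self):
            # Let the parent class (re-)compute the per-agent distance map.
            super().reset()

        def get(self, handle):
            agent = self.env.agents[handle]
            # Allowed transitions out of the agent's cell, given its orientation.
            possible_transitions = self.env.rail.get_transitions(*agent.position, agent.direction)

            # For LEFT, FORWARD, RIGHT (relative to the agent), look up the remaining
            # distance to the target from the neighbouring cell in that direction.
            min_distances = []
            for direction in [(agent.direction + offset) % 4 for offset in (-1, 0, 1)]:
                if possible_transitions[direction]:
                    row, col = agent.position
                    d_row, d_col = self.DIRECTION_TO_OFFSET[direction]
                    # Assumed distance-map layout: [agent, row, col, direction].
                    min_distances.append(self.distance_map[handle, row + d_row, col + d_col, direction])
                else:
                    min_distances.append(np.inf)

            observation = np.zeros(3)
            observation[int(np.argmin(min_distances))] = 1  # mark the shortest direction
            return observation

An instance of such a builder would be passed to the environment at construction time (e.g. via the :code:`obs_builder_object` argument of :code:`RailEnv`), so that each call to :code:`step()` returns the one-hot shortest-direction vector per agent.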