diff --git a/AUTHORS.md b/AUTHORS.md index 272f8db9e2713c06d5489f17031ae2922409402b..f4bb6c13e046e25a243293b8c277c068f73ac675 100644 --- a/AUTHORS.md +++ b/AUTHORS.md @@ -1,5 +1,4 @@ -Credits -======= +# Credits Development ----------- diff --git a/FAQ.md b/FAQ.md new file mode 100644 index 0000000000000000000000000000000000000000..b43ee05e3100e5d292efb5387cfa837099eae634 --- /dev/null +++ b/FAQ.md @@ -0,0 +1,117 @@ +# Frequently Asked Questions (FAQs) + +## Questions about the Flatland Challenge: +These are the most common questions regarding the [Flatland Challenge](https://www.aicrowd.com/challenges/flatland-challenge). +If your question is not answered, please check the [Forum](https://discourse.aicrowd.com/c/flatland-challenge?_ga=2.33753761.1627822449.1571622829-1432296534.1549103074) and post your question there. + +### How can I win prizes in this challenge? +You can win prizes in different categories. + +Best Solution Prize: Won by the participants with the best-performing submission on our test set. Only your rankings from Round 1 and Round 2 are taken into account. Check the leaderboard on this site regularly for the latest information on your ranking. + +The top three submissions in this category will be awarded the following cash prizes (in Swiss Francs): + +CHF 7’500.- for first prize + +CHF 5’000.- for second prize + +CHF 2’500.- for third prize + +Community Contributions Prize: Awarded to the person/group who makes the biggest contribution to the community, for example by generating new observations and sharing them with the community. + +The top submission in this category will be awarded the following cash prize (in Swiss Francs): CHF 5’000.- + +In addition, we will hand-pick and award up to five (5) travel grants to the Applied Machine Learning Days 2019 in Lausanne, Switzerland. Participants with promising solutions may be invited to present their solutions at SBB in Bern, Switzerland. + +To check your eligibility, please read the prizes section in the [rules](https://www.aicrowd.com/challenges/flatland-challenge/challenge_rules/68). + +### What are the deadlines for the Flatland Challenge? +- The beta round starts on the 1st of July 2019 and ends on the 30th of July 2019 +- Round 1 closed on Sunday, 13th of October 2019, 12 PM UTC+1 +- Round 2 closes on Sunday, 5th of January 2020, 12 PM UTC+1 + +### How is the score of a submission computed? +The scores of your submission are computed as follows: + +1. Mean number of agents done, in other words, how many agents reached their target in time. +2. Mean reward is just the mean of the cumulated reward. +3. If multiple participants have the same number of done agents, we compute a "normalized" reward as follows: +``` +normalized_reward = cumulative_reward / (self.env._max_episode_steps + self.env.get_num_agents()) +``` +The mean number of agents done is the primary score value; only when it is tied do we use the "normalized" reward to determine the position on the leaderboard. + +### How do I submit to the Flatland Challenge? +Follow the instructions in the [starter kit](https://github.com/AIcrowd/flatland-challenge-starter-kit) to get your first submission. + +### Can I use env variables with my controller? +Yes, you can. You can access all environment variables as you please. We recommend you use a custom observation builder to do so as explained [here](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html#custom-observations-and-custom-predictors-tutorial).
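+A minimal sketch of such a custom observation builder is shown below. The class name `AgentStateObs` and the choice of returned variables are illustrative only; the base class `ObservationBuilder` with its `reset()`/`get()` hooks is the interface assumed here (see the import added in `flatland/envs/rail_env_utils.py` in this change set):
+```
+from flatland.core.env_observation_builder import ObservationBuilder
+
+
+class AgentStateObs(ObservationBuilder):
+    """Expose a few raw agent variables from the env as the observation."""
+
+    def reset(self):
+        # Called by env.reset(); nothing to precompute for this sketch.
+        pass
+
+    def get(self, handle=0):
+        # self.env is set by the environment via set_env().
+        agent = self.env.agents[handle]
+        return (agent.position, agent.direction,
+                agent.malfunction_data['malfunction'])
+```
+Pass an instance to the environment constructor, e.g. `RailEnv(..., obs_builder_object=AgentStateObs())`.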
+ ### What are the time limits for my submission? +If there is no action on the server for 10 minutes, the submission will be cancelled and a time-out error will be produced. + +If the submission takes longer than 8 hours in total, a time-out will occur. + +### What are the parameters for the environments for the submission scoring? +The environments vary in size and number of agents as well as malfunction parameters. The upper limits of these variables for submissions are: +- `(x_dim, y_dim) <= (150, 150)` +- `n_agents <= 250` (this might be updated) +- `malfunction rates`: this is currently being refactored + +## Questions about the Flatland Repository: +This section provides you with information about the most common questions around the Flatland repository. If your question is still not answered, either reach out to the contacts listed on the repository directly or open an issue by following these [guidelines](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/06_contributing.html). +### How can I get started with Flatland? +Install Flatland by running `pip install -U flatland-rl` or directly from source by cloning the flatland repository and running `python setup.py install` in the repository directory. + +These [Tutorials](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html) help you get a basic understanding of the Flatland environment. +### How do I train agents on Flatland? +Once you have installed Flatland, head over to the [baselines repository](https://gitlab.aicrowd.com/flatland/baselines) to see how you can train your own reinforcement learning agent on Flatland. + +Check out this [tutorial](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/Getting_Started_Training.md?_ga=2.193077805.1627822449.1571622829-1432296534.1549103074) to get a sense of how it works. + +### What is an observation builder and which should I use? +Observation builders give you the possibility to generate custom observations for your controller (reinforcement learning agent, optimization algorithm, ...). The observation builder has access to all environment data and can perform any operations on them as long as they are not changed. +This [tutorial](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html#custom-observations-and-custom-predictors-tutorial) will give you a sense of how to use them. +### What is a predictor and which one should I use? +Because railway traffic is limited to rails, many decisions that you have to take need to consider future situations and detect upcoming conflicts ahead of time. Therefore, Flatland provides the possibility of predictors that predict where agents will be in the future. We provide a stock predictor that assumes each agent just travels along its shortest path. +You can build more elaborate predictors and use them as part of your observation builder. You can find more information [here](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html#custom-observations-and-custom-predictors-tutorial). +### What information is available about each agent? +Each agent is an object and contains the following information: + +- `initial_position = attrib(type=Tuple[int, int])`: The initial position of an agent. This is where the agent will enter the environment. It is the start of the agent's journey. +- `position = attrib(default=None, type=Optional[Tuple[int, int]])`: This is the actual position of the agent. It is updated every step of the environment.
Before the agent has entered the environment and after it leaves the environment, it is set to `None`. +- `direction = attrib(type=Grid4TransitionsEnum)`: This is the direction an agent is facing. The values for directions are `North:0`, `East:1`, `South:2` and `West:3`. +- `target = attrib(type=Tuple[int, int])`: This is the target position the agent has to find and reach. Once the agent reaches this position its task is done. +- `moving = attrib(default=False, type=bool)`: Because agents can have malfunctions or be stopped because their path is blocked, we store the current state of an agent. If `agent.moving == True` the agent is currently advancing. If it is `False` the agent is either blocked or broken. +- `speed_data = attrib(default=Factory(lambda: dict({'position_fraction': 0.0, 'speed': 1.0, 'transition_action_on_cellexit': 0})))`: This contains all the relevant information about the speed of an agent: + - The attribute `'position_fraction'` indicates how far the agent has advanced within the cell. As soon as this value becomes larger than `1`, the agent advances to the next cell as defined by `'transition_action_on_cellexit'`. + - The attribute `'speed'` defines the travel speed of an agent. It can be any fraction smaller than 1. + - The attribute `'transition_action_on_cellexit'` contains the information about the action that will be performed at the exit of the cell. Due to speeds smaller than 1, agents have to take several steps within a cell. We however only allow an action to be chosen at cell entry. +- `malfunction_data = attrib(default=Factory(lambda: dict({'malfunction': 0, 'malfunction_rate': 0, 'next_malfunction': 0, 'nr_malfunctions': 0,'moving_before_malfunction': False})))`: Contains all information relevant for agent malfunctions: + - The attribute `'malfunction'` indicates if the agent is currently broken. If the value is larger than `0` the agent is broken. The integer value represents the number of `env.step()` calls the agent will still be broken. + - The attribute `'next_malfunction'` will be REMOVED as it serves no purpose anymore; malfunctions are now generated by a Poisson process. + - The attribute `'nr_malfunctions'` is a counter that keeps track of the number of malfunctions a specific agent has had. + - The attribute `'moving_before_malfunction'` is an internal parameter used to automatically restart agents that were moving before the malfunction, once the malfunction is fixed. +- `status = attrib(default=RailAgentStatus.READY_TO_DEPART, type=RailAgentStatus)`: The status of the agent explains what the agent is currently doing. It can be in either one of these states: + - `READY_TO_DEPART` not in grid yet (position is None) + - `ACTIVE` in grid (position is not None), not done + - `DONE` in grid (position is not None), but done + - `DONE_REMOVED` removed from grid (position is None) + +### Can I use my own reward function? +Yes, you can do reward shaping as you please. All information can be accessed directly in the env. +### What are rail and schedule generators? +To generate environments for Flatland you need to provide a railway infrastructure (rail) and a set of tasks for each agent to complete (schedule). +### What is the max number of timesteps per episode? +The maximum number of timesteps is `max_time_steps = 4 * 2 * (env.width + env.height + 20)` +### What are malfunctions and what can I do to resolve them? +Malfunctions occur according to a Poisson process. They hinder an agent from performing its actions and updating its position.
While an agent is malfunctioning, it blocks the paths of other agents. There is nothing you can do to fix an agent; it will get fixed automatically as soon as `agent.malfunction_data['malfunction'] == 0`. +You can however adjust the other agents' actions to avoid delay propagation within the railway network and keep traffic as smooth as possible. + +### Can agents communicate with each other? +There is no communication layer built into Flatland directly. You can however build a communication layer outside of the Flatland environment if necessary. + +## Questions about bugs +### Why are my trains drawn outside of the rails? +If you render your environment and the agents appear to be off the rails, it is usually due to changes in the railway infrastructure. Make sure that you reset your renderer anytime the infrastructure changes by calling `env_renderer.reset()`. diff --git a/FAQ_Bugs.md b/FAQ_Bugs.md new file mode 100644 index 0000000000000000000000000000000000000000..ac63dbf06294d4d736c4058fadd254f1e6d5be7b --- /dev/null +++ b/FAQ_Bugs.md @@ -0,0 +1,5 @@ +# FAQ about bugs + +### Why are my trains drawn outside of the rails? +If you render your environment and the agents appear to be off the rails, it is usually due to changes in the railway infrastructure. Make sure that you reset your renderer anytime the infrastructure changes by calling `env_renderer.reset()`. diff --git a/FAQ_Challenge.md b/FAQ_Challenge.md new file mode 100644 index 0000000000000000000000000000000000000000..988e76a4d2c497b485f7bb422d7c68ec15229c80 --- /dev/null +++ b/FAQ_Challenge.md @@ -0,0 +1,55 @@ +# FAQ about the Flatland Challenge + +These are the most common questions regarding the [Flatland Challenge](https://www.aicrowd.com/challenges/flatland-challenge). +If your question is not answered, please check the [Forum](https://discourse.aicrowd.com/c/flatland-challenge?_ga=2.33753761.1627822449.1571622829-1432296534.1549103074) and post your question there. + +### How can I win prizes in this challenge? +You can win prizes in different categories. + +Best Solution Prize: Won by the participants with the best-performing submission on our test set. Only your rankings from Round 1 and Round 2 are taken into account. Check the leaderboard on this site regularly for the latest information on your ranking. + +The top three submissions in this category will be awarded the following cash prizes (in Swiss Francs): + +- CHF 7'500.- for first prize +- CHF 5'000.- for second prize +- CHF 2'500.- for third prize + +Community Contributions Prize: Awarded to the person/group who makes the biggest contribution to the community, for example by generating new observations and sharing them with the community. + +The top submission in this category will be awarded the following cash prize (in Swiss Francs): CHF 5'000.- + +In addition, we will hand-pick and award up to five (5) travel grants (up to 1'500 CHF each) to the Applied Machine Learning Days 2019 in Lausanne, Switzerland. Participants with promising solutions may be invited to present their solutions at SBB in Bern, Switzerland. + +To check your eligibility, please read the prizes section in the [rules](https://www.aicrowd.com/challenges/flatland-challenge/challenge_rules/68). +### What are the deadlines for the Flatland Challenge? +- The beta round starts on the 1st of July 2019 and ends on the 30th of July 2019 +- Round 1 closed on Sunday, 13th of October 2019, 12 PM UTC+1 +- Round 2 closes on Sunday, 5th of January 2020, 12 PM
UTC+1 + +### How is the score of a submission computed? +The scores of your submission are computed as follows: + +1. Mean number of agents done, in other words, how many agents reached their target in time. +2. Mean reward is just the mean of the cumulated reward. +3. If multiple participants have the same number of done agents, we compute a "normalized" reward as follows: +``` +normalized_reward = cumulative_reward / (self.env._max_episode_steps + self.env.get_num_agents()) +``` +The mean number of agents done is the primary score value; only when it is tied do we use the "normalized" reward to determine the position on the leaderboard. + +### How do I submit to the Flatland Challenge? +Follow the instructions in the [starter kit](https://github.com/AIcrowd/flatland-challenge-starter-kit) to get your first submission. + +### Can I use env variables with my controller? +Yes, you can. You can access all environment variables as you please. We recommend you use a custom observation builder to do so as explained [here](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html#custom-observations-and-custom-predictors-tutorial). + +### What are the time limits for my submission? +If there is no action on the server for 10 minutes, the submission will be cancelled and a time-out error will be produced. + +If the submission takes longer than 8 hours in total, a time-out will occur. + +### What are the parameters for the environments for the submission scoring? +The environments vary in size and number of agents as well as malfunction parameters. The upper limits of these variables for submissions are: +- `(x_dim, y_dim) <= (150, 150)` +- `n_agents <= 250` (this might be updated) +- `malfunction rates`: this is currently being refactored diff --git a/FAQ_Repository.md b/FAQ_Repository.md new file mode 100644 index 0000000000000000000000000000000000000000..18528d184f37be9a78515c4a7ba34eda31fd9ab0 --- /dev/null +++ b/FAQ_Repository.md @@ -0,0 +1,53 @@ +# FAQ about the Flatland Repository + +This section provides you with information about the most common questions around the Flatland repository. If your question is still not answered, either reach out to the contacts listed on the repository directly or open an issue by following these [guidelines](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/06_contributing.html). +### How can I get started with Flatland? +Install Flatland by running `pip install -U flatland-rl` or directly from source by cloning the flatland repository and running `python setup.py install` in the repository directory. + +These [Tutorials](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html) help you get a basic understanding of the Flatland environment. +### How do I train agents on Flatland? +Once you have installed Flatland, head over to the [baselines repository](https://gitlab.aicrowd.com/flatland/baselines) to see how you can train your own reinforcement learning agent on Flatland. + +Check out this [tutorial](https://gitlab.aicrowd.com/flatland/baselines/blob/master/torch_training/Getting_Started_Training.md?_ga=2.193077805.1627822449.1571622829-1432296534.1549103074) to get a sense of how it works. + +### What is an observation builder and which should I use? +Observation builders give you the possibility to generate custom observations for your controller (reinforcement learning agent, optimization algorithm, ...).
The observation builder has access to all environment data and can perform any operations on them as long as they are not changed. +This [tutorial](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html#custom-observations-and-custom-predictors-tutorial) will give you a sense of how to use them. +### What is a predictor and which one should I use? +Because railway traffic is limited to rails, many decisions that you have to take need to consider future situations and detect upcoming conflicts ahead of time. Therefore, Flatland provides the possibility of predictors that predict where agents will be in the future. We provide a stock predictor that assumes each agent just travels along its shortest path. +You can build more elaborate predictors and use them as part of your observation builder. You can find more information [here](http://flatland-rl-docs.s3-website.eu-central-1.amazonaws.com/03_tutorials.html#custom-observations-and-custom-predictors-tutorial). +### What information is available about each agent? +Each agent is an object and contains the following information: + +- `initial_position = attrib(type=Tuple[int, int])`: The initial position of an agent. This is where the agent will enter the environment. It is the start of the agent's journey. +- `position = attrib(default=None, type=Optional[Tuple[int, int]])`: This is the actual position of the agent. It is updated every step of the environment. Before the agent has entered the environment and after it leaves the environment, it is set to `None`. +- `direction = attrib(type=Grid4TransitionsEnum)`: This is the direction an agent is facing. The values for directions are `North:0`, `East:1`, `South:2` and `West:3`. +- `target = attrib(type=Tuple[int, int])`: This is the target position the agent has to find and reach. Once the agent reaches this position its task is done. +- `moving = attrib(default=False, type=bool)`: Because agents can have malfunctions or be stopped because their path is blocked, we store the current state of an agent. If `agent.moving == True` the agent is currently advancing. If it is `False` the agent is either blocked or broken. +- `speed_data = attrib(default=Factory(lambda: dict({'position_fraction': 0.0, 'speed': 1.0, 'transition_action_on_cellexit': 0})))`: This contains all the relevant information about the speed of an agent: + - The attribute `'position_fraction'` indicates how far the agent has advanced within the cell. As soon as this value becomes larger than `1`, the agent advances to the next cell as defined by `'transition_action_on_cellexit'`. + - The attribute `'speed'` defines the travel speed of an agent. It can be any fraction smaller than 1. + - The attribute `'transition_action_on_cellexit'` contains the information about the action that will be performed at the exit of the cell. Due to speeds smaller than 1, agents have to take several steps within a cell. We however only allow an action to be chosen at cell entry. +- `malfunction_data = attrib(default=Factory(lambda: dict({'malfunction': 0, 'malfunction_rate': 0, 'next_malfunction': 0, 'nr_malfunctions': 0,'moving_before_malfunction': False})))`: Contains all information relevant for agent malfunctions: + - The attribute `'malfunction'` indicates if the agent is currently broken. If the value is larger than `0` the agent is broken. The integer value represents the number of `env.step()` calls the agent will still be broken.
+ - The attribute `'next_malfunction'` will be REMOVED as it serves no purpose anymore; malfunctions are now generated by a Poisson process. + - The attribute `'nr_malfunctions'` is a counter that keeps track of the number of malfunctions a specific agent has had. + - The attribute `'moving_before_malfunction'` is an internal parameter used to automatically restart agents that were moving before the malfunction, once the malfunction is fixed. +- `status = attrib(default=RailAgentStatus.READY_TO_DEPART, type=RailAgentStatus)`: The status of the agent explains what the agent is currently doing. It can be in either one of these states: + - `READY_TO_DEPART` not in grid yet (position is None) + - `ACTIVE` in grid (position is not None), not done + - `DONE` in grid (position is not None), but done + - `DONE_REMOVED` removed from grid (position is None) + +### Can I use my own reward function? +Yes, you can do reward shaping as you please. All information can be accessed directly in the env. +### What are rail and schedule generators? +To generate environments for Flatland you need to provide a railway infrastructure (rail) and a set of tasks for each agent to complete (schedule). +### What is the max number of timesteps per episode? +The maximum number of timesteps is `max_time_steps = 4 * 2 * (env.width + env.height + 20)` +### What are malfunctions and what can I do to resolve them? +Malfunctions occur according to a Poisson process. They hinder an agent from performing its actions and updating its position. While an agent is malfunctioning, it blocks the paths of other agents. There is nothing you can do to fix an agent; it will get fixed automatically as soon as `agent.malfunction_data['malfunction'] == 0`. +You can however adjust the other agents' actions to avoid delay propagation within the railway network and keep traffic as smooth as possible. + +### Can agents communicate with each other? +There is no communication layer built into Flatland directly. You can however build a communication layer outside of the Flatland environment if necessary. diff --git a/changelog.md b/changelog.md index 8a4da1f11c78ba21da8e249250000ce68cac736e..543fc87a23aa808edf53f1a00707381c875aefa2 100644 --- a/changelog.md +++ b/changelog.md @@ -3,6 +3,13 @@ Changelog Changes since Flatland 2.0.0 -------------------------- +### Changes in `EnvAgent` +- class `EnvAgentStatic` was removed, so there is only class `EnvAgent` left, which should simplify the handling of agents. The member `self.agents_static` of `RailEnv` was therefore also removed. Old scenes saved as pickle files cannot be loaded anymore. + +### Changes in malfunction behavior +- agent attribute `next_malfunction` is not used anymore; it will be fully removed in future versions. +- a `break_agent()` function is introduced which induces malfunctions in agents according to a Poisson process +- `_fix_agent_after_malfunction()` fixes agents once attribute `malfunction == 0` ### Changes in `Environment` - moving of member variable `distance_map_computed` to new class `DistanceMap` diff --git a/docs/04_specifications.rst b/docs/04_specifications.rst index 4a7ffee65dac39e4d22ccfad47abecc7b6bb616f..df22008ad972d201fd503ecae4eb9263af698c3e 100644 --- a/docs/04_specifications.rst +++ b/docs/04_specifications.rst @@ -4,4 +4,3 @@ .. include:: specifications/intro_observation_actions.rst .. include:: specifications/rendering.rst .. include:: specifications/visualization.rst -..
include:: specifications/FAQ.rst diff --git a/docs/08_authors.rst b/docs/08_authors.rst index e122f914a87b277e565fc9567af1a7545ec9872b..c7558862a35ab54880c763cf1c61f7c271e639d5 100644 --- a/docs/08_authors.rst +++ b/docs/08_authors.rst @@ -1 +1,6 @@ +Authors +======= +.. toctree:: + :maxdepth: 2 + .. include:: ../AUTHORS.rst diff --git a/docs/09_faq.rst b/docs/09_faq.rst new file mode 100644 index 0000000000000000000000000000000000000000..0b015cf7ccd5e4560695f8d0b22860ccdda24c1c --- /dev/null +++ b/docs/09_faq.rst @@ -0,0 +1,4 @@ +.. include:: ../FAQ_Challenge.rst + +.. include:: ../FAQ_Repository.rst +.. include:: ../FAQ_Bugs.rst diff --git a/docs/09_faq_toc.rst b/docs/09_faq_toc.rst new file mode 100644 index 0000000000000000000000000000000000000000..9fe3936395523fbc3d0757988bddd9f84aecb5a0 --- /dev/null +++ b/docs/09_faq_toc.rst @@ -0,0 +1,7 @@ +FAQ +=== + +.. toctree:: + :maxdepth: 2 + + 09_faq diff --git a/docs/index.rst b/docs/index.rst index 94efbc91d33db3b4c459d31665d98f1c7a333b54..852ef7f31d32f79f4a0537f5db8a942d8c35c841 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -14,6 +14,7 @@ Welcome to flatland's documentation! 06_contributing 07_changes 08_authors + 09_faq_toc Indices and tables ================== diff --git a/docs/specifications/FAQ.rst b/docs/specifications/FAQ.rst deleted file mode 100644 index ee50befd1abc430fc26b401e89f148cc4625ccf3..0000000000000000000000000000000000000000 --- a/docs/specifications/FAQ.rst +++ /dev/null @@ -1,21 +0,0 @@ -======================================== -Frequently Asked Questions (FAQs) -======================================== - -- I get a runtime error with `Click` complaining about the encoding - - .. code-block:: python - - RuntimeError('Click will abort further execution because Python 3 \ - was configured to use ASCII as encoding for ...sk_SK.UTF-8, \ - sl_SI.UTF-8, sr_YU.UTF-8, sv_SE.UTF-8, tr_TR.UTF-8, \ - uk_UA.UTF-8, zh_CN.UTF-8, zh_HK.UTF-8, zh_TW.UTF-8') - - This can be solved by : - - .. code-block:: bash - - export LC_ALL=en_US.utf-8 - export LANG=en_US.utf-8 - - diff --git a/docs/tutorials/01_gettingstarted.rst b/docs/tutorials/01_gettingstarted.rst index c818742144b6ccdc547e96855794a4ab40066394..c5d73178384109758cf0e8bb5c2f25802d8700ca 100644 --- a/docs/tutorials/01_gettingstarted.rst +++ b/docs/tutorials/01_gettingstarted.rst @@ -109,15 +109,12 @@ following code. 
Also, tree observation data is displayed by RenderTool by default for i in range(env.get_num_agents()): env.obs_builder.util_print_obs_subtree( tree=obs[i], - num_features_per_node=5 ) The complete code for this part of the Getting Started guide can be found in * `examples/simple_example_1.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_1.py>`_ * `examples/simple_example_2.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_2.py>`_ -* `examples/simple_example_3.py <https://gitlab.aicrowd.com/flatland/flatland/blob/master/examples/simple_example_3.py>`_ - Part 2 : Training a Simple an Agent on Flatland diff --git a/examples/introduction_flatland_2_1.py b/examples/introduction_flatland_2_1.py index b832717c76735f7b89768e31df5013af77874c33..cf6b69dc30096d27064290aede6af91cec24d5f0 100644 --- a/examples/introduction_flatland_2_1.py +++ b/examples/introduction_flatland_2_1.py @@ -1,9 +1,12 @@ +import numpy as np + # In Flatland you can use custom observation builders and predicitors # Observation builders generate the observation needed by the controller # Preditctors can be used to do short time prediction which can help in avoiding conflicts in the network from flatland.envs.observations import GlobalObsForRailEnv # First of all we import the Flatland rail environment from flatland.envs.rail_env import RailEnv +from flatland.envs.rail_env import RailEnvActions from flatland.envs.rail_generators import sparse_rail_generator from flatland.envs.schedule_generators import sparse_schedule_generator # We also include a renderer because we want to visualize what is going on in the environment @@ -25,10 +28,10 @@ from flatland.utils.rendertools import RenderTool, AgentRenderVariant # The railway infrastructure can be build using any of the provided generators in env/rail_generators.py # Here we use the sparse_rail_generator with the following parameters -width = 50 # With of map -height = 50 # Height of map +width = 16 * 7 # Width of map +height = 9 * 7 # Height of map nr_trains = 20 # Number of trains that have an assigned task in the env -cities_in_map = 12 # Number of cities where agents can start or end +cities_in_map = 20 # Number of cities where agents can start or end seed = 14 # Random seed grid_distribution_of_cities = False # Type of city distribution, if False cities are randomly placed max_rails_between_cities = 2 # Max number of tracks allowed between cities. This is number of entry point to a city @@ -58,10 +61,9 @@ schedule_generator = sparse_schedule_generator(speed_ration_map) # We can furthermore pass stochastic data to the RailEnv constructor which will allow for stochastic malfunctions # during an episode.
-stochastic_data = {'prop_malfunction': 0.3, # Percentage of defective agents - 'malfunction_rate': 30, # Rate of malfunction occurence - 'min_duration': 3, # Minimal duration of malfunction - 'max_duration': 20 # Max duration of malfunction +stochastic_data = {'malfunction_rate': 100, # Rate of malfunction occurrence of a single agent + 'min_duration': 15, # Minimal duration of malfunction + 'max_duration': 50 # Max duration of malfunction } # Custom observation builder without predictor @@ -86,8 +88,8 @@ env.reset() env_renderer = RenderTool(env, gl="PILSVG", agent_render_variant=AgentRenderVariant.AGENT_SHOWS_OPTIONS_AND_BOX, show_debug=False, - screen_height=1000, # Adjust these parameters to fit your resolution - screen_width=1000) # Adjust these parameters to fit your resolution + screen_height=600, # Adjust these parameters to fit your resolution + screen_width=800) # Adjust these parameters to fit your resolution # The first thing we notice is that some agents don't have feasible paths to their target. @@ -108,7 +110,8 @@ class RandomAgent: :param state: input is the observation of the agent :return: returns an action """ - return 2 # np.random.choice(np.arange(self.action_size)) + return np.random.choice([RailEnvActions.MOVE_FORWARD, RailEnvActions.MOVE_RIGHT, RailEnvActions.MOVE_LEFT, + RailEnvActions.STOP_MOVING]) def step(self, memories): """ @@ -204,9 +207,8 @@ print("========================================") for agent_idx, agent in enumerate(env.agents): print( - "Agent {} will malfunction = {} at a rate of {}, the next malfunction will occur in {} step. Agent OK = {}".format( - agent_idx, agent.malfunction_data['malfunction_rate'] > 0, agent.malfunction_data['malfunction_rate'], - agent.malfunction_data['next_malfunction'], agent.malfunction_data['malfunction'] < 1)) + "Agent {} is OK = {}".format( + agent_idx, agent.malfunction_data['malfunction'] < 1)) # Now that you have seen these novel concepts that were introduced you will realize that agents don't need to take # an action at every time step as it will only change the outcome when actions are chosen at cell entry.
@@ -242,7 +244,7 @@ score = 0 # Run episode frame_step = 0 -for step in range(100): +for step in range(500): # Chose an action for each agent in the environment for a in range(env.get_num_agents()): action = controller.act(observations[a]) @@ -254,6 +256,7 @@ for step in range(100): next_obs, all_rewards, done, _ = env.step(action_dict) env_renderer.render_env(show=True, show_observations=False, show_predictions=False) + # env_renderer.gl.save_image('./misc/Fames2/flatland_frame_{:04d}.png'.format(step)) frame_step += 1 # Update replay buffer and train agent for a in range(env.get_num_agents()): @@ -263,5 +266,4 @@ for step in range(100): observations = next_obs.copy() if done['__all__']: break - print('Episode: Steps {}\t Score = {}'.format(step, score)) diff --git a/examples/simple_example_3.py b/examples/simple_example_3.py deleted file mode 100644 index ccbe8682fe5c8744737a452c77257dd4570b6f75..0000000000000000000000000000000000000000 --- a/examples/simple_example_3.py +++ /dev/null @@ -1,53 +0,0 @@ -import random - -import numpy as np - -from flatland.envs.observations import TreeObsForRailEnv -from flatland.envs.rail_env import RailEnv -from flatland.envs.rail_generators import complex_rail_generator -from flatland.envs.schedule_generators import complex_schedule_generator -from flatland.utils.rendertools import RenderTool - -random.seed(1) -np.random.seed(1) - -env = RailEnv(width=7, - height=7, - rail_generator=complex_rail_generator(nr_start_goal=10, nr_extra=1, min_dist=5, max_dist=99999, seed=1), - schedule_generator=complex_schedule_generator(), - number_of_agents=2, - obs_builder_object=TreeObsForRailEnv(max_depth=2)) - -env.reset() - -# Print the observation vector for agent 0 -obs, all_rewards, done, _ = env.step({0: 0}) -for i in range(env.get_num_agents()): - env.obs_builder.util_print_obs_subtree(tree=obs[i]) - -env_renderer = RenderTool(env) -env_renderer.render_env(show=True, frames=True) - -print("Manual control: s=perform step, q=quit, [agent id] [1-2-3 action] \ - (turnleft+move, move to front, turnright+move)") -for step in range(100): - cmd = input(">> ") - cmds = cmd.split(" ") - - action_dict = {} - - i = 0 - while i < len(cmds): - if cmds[i] == 'q': - break - elif cmds[i] == 's': - obs, all_rewards, done, _ = env.step(action_dict) - action_dict = {} - print("Rewards: ", all_rewards, " [done=", done, "]") - else: - agent_id = int(cmds[i]) - action = int(cmds[i + 1]) - action_dict[agent_id] = action - i = i + 1 - i += 1 - env_renderer.render_env(show=True, frames=True) diff --git a/flatland/envs/agent_utils.py b/flatland/envs/agent_utils.py index 6a0e595bbd2e47202a9fc2e78c64f438f8190684..dd639997f2b759e86b0879a84e4ab91f7ffc824b 100644 --- a/flatland/envs/agent_utils.py +++ b/flatland/envs/agent_utils.py @@ -1,8 +1,7 @@ from enum import IntEnum from itertools import starmap -from typing import Tuple, Optional +from typing import Tuple, Optional, NamedTuple -import numpy as np from attr import attrs, attrib, Factory from flatland.core.grid.grid4 import Grid4TransitionsEnum @@ -16,14 +15,25 @@ class RailAgentStatus(IntEnum): DONE_REMOVED = 3 # removed from grid (position is None) -> prediction is None +Agent = NamedTuple('Agent', [('initial_position', Tuple[int, int]), + ('initial_direction', Grid4TransitionsEnum), + ('direction', Grid4TransitionsEnum), + ('target', Tuple[int, int]), + ('moving', bool), + ('speed_data', dict), + ('malfunction_data', dict), + ('handle', int), + ('status', RailAgentStatus), + ('position', Tuple[int, int]), + ('old_direction', 
Grid4TransitionsEnum), + ('old_position', Tuple[int, int])]) + + @attrs -class EnvAgentStatic(object): - """ EnvAgentStatic - Stores initial position, direction and target. - This is like static data for the environment - it's where an agent starts, - rather than where it is at the moment. - The target should also be stored here. - """ +class EnvAgent: + initial_position = attrib(type=Tuple[int, int]) + initial_direction = attrib(type=Grid4TransitionsEnum) direction = attrib(type=Grid4TransitionsEnum) target = attrib(type=Tuple[int, int]) moving = attrib(default=False, type=bool) @@ -42,12 +52,33 @@ class EnvAgentStatic(object): lambda: dict({'malfunction': 0, 'malfunction_rate': 0, 'next_malfunction': 0, 'nr_malfunctions': 0, 'moving_before_malfunction': False}))) + handle = attrib(default=None) + status = attrib(default=RailAgentStatus.READY_TO_DEPART, type=RailAgentStatus) position = attrib(default=None, type=Optional[Tuple[int, int]]) + # used in rendering + old_direction = attrib(default=None) + old_position = attrib(default=None) + + def reset(self): + self.position = None + # TODO: set direction to None: https://gitlab.aicrowd.com/flatland/flatland/issues/280 + self.direction = self.initial_direction + self.status = RailAgentStatus.READY_TO_DEPART + self.old_position = None + self.old_direction = None + self.moving = False + + def to_agent(self) -> Agent: + return Agent(initial_position=self.initial_position, initial_direction=self.initial_direction, + direction=self.direction, target=self.target, moving=self.moving, speed_data=self.speed_data, + malfunction_data=self.malfunction_data, handle=self.handle, status=self.status, + position=self.position, old_direction=self.old_direction, old_position=self.old_position) + @classmethod - def from_lists(cls, schedule: Schedule): - """ Create a list of EnvAgentStatics from lists of positions, directions and targets + def from_schedule(cls, schedule: Schedule): + """ Create a list of EnvAgent from lists of positions, directions and targets """ speed_datas = [] @@ -56,69 +87,19 @@ class EnvAgentStatic(object): 'speed': schedule.agent_speeds[i] if schedule.agent_speeds is not None else 1.0, 'transition_action_on_cellexit': 0}) - # TODO: on initialization, all agents are re-set as non-broken. Perhaps it may be desirable to set - # some as broken? - malfunction_datas = [] for i in range(len(schedule.agent_positions)): malfunction_datas.append({'malfunction': 0, - 'malfunction_rate': schedule.agent_malfunction_rates[i] if schedule.agent_malfunction_rates is not None else 0., + 'malfunction_rate': schedule.agent_malfunction_rates[ + i] if schedule.agent_malfunction_rates is not None else 0., 'next_malfunction': 0, 'nr_malfunctions': 0}) - return list(starmap(EnvAgentStatic, zip(schedule.agent_positions, - schedule.agent_directions, - schedule.agent_targets, - [False] * len(schedule.agent_positions), - speed_datas, - malfunction_datas))) - - def to_list(self): - - # I can't find an expression which works on both tuples, lists and ndarrays - # which converts them all to a list of native python ints. - lPos = self.initial_position - if type(lPos) is np.ndarray: - lPos = lPos.tolist() - - lTarget = self.target - if type(lTarget) is np.ndarray: - lTarget = lTarget.tolist() - - return [lPos, int(self.direction), lTarget, int(self.moving), self.speed_data, self.malfunction_data] - - -@attrs -class EnvAgent(EnvAgentStatic): - """ EnvAgent - replace separate agent_* lists with a single list - of agent objects. 
The EnvAgent represent's the environment's view - of the dynamic agent state. - We are duplicating target in the EnvAgent, which seems simpler than - forcing the env to refer to it in the EnvAgentStatic - """ - handle = attrib(default=None) - old_direction = attrib(default=None) - old_position = attrib(default=None) - - def to_list(self): - return [ - self.position, self.direction, self.target, self.handle, - self.old_direction, self.old_position, self.moving, self.speed_data, self.malfunction_data] - - @classmethod - def from_static(cls, oStatic): - """ Create an EnvAgent from the EnvAgentStatic, - copying all the fields, and adding handle with the default 0. - """ - return EnvAgent(*oStatic.__dict__, handle=0) - - @classmethod - def list_from_static(cls, lEnvAgentStatic, handles=None): - """ Create an EnvAgent from the EnvAgentStatic, - copying all the fields, and adding handle with the default 0. - """ - if handles is None: - handles = range(len(lEnvAgentStatic)) - - return [EnvAgent(**oEAS.__dict__, handle=handle) - for handle, oEAS in zip(handles, lEnvAgentStatic)] + return list(starmap(EnvAgent, zip(schedule.agent_positions, + schedule.agent_directions, + schedule.agent_directions, + schedule.agent_targets, + [False] * len(schedule.agent_positions), + speed_datas, + malfunction_datas, + range(len(schedule.agent_positions))))) diff --git a/flatland/envs/predictions.py b/flatland/envs/predictions.py index 6a4899995ac84257b1845265a5402db9048bd654..c2d342d6b43a445c1deb93dc59476478875bc786 100644 --- a/flatland/envs/predictions.py +++ b/flatland/envs/predictions.py @@ -157,8 +157,8 @@ class ShortestPathPredictorForRailEnv(PredictionBuilder): new_position = agent_virtual_position visited = OrderedSet() for index in range(1, self.max_depth + 1): - # if we're at the target or not moving, stop moving until max_depth is reached - if new_position == agent.target or not agent.moving or not shortest_path: + # if we're at the target, stop moving until max_depth is reached + if new_position == agent.target or not shortest_path: prediction[index] = [index, *new_position, new_direction, RailEnvActions.STOP_MOVING] visited.add((*new_position, agent.direction)) continue diff --git a/flatland/envs/rail_env.py b/flatland/envs/rail_env.py index 125f8f13b445fba6b137bcf2ed521843f84bc278..0dd58813b5568ba95e11f984d15f6bd256100c0f 100644 --- a/flatland/envs/rail_env.py +++ b/flatland/envs/rail_env.py @@ -1,6 +1,7 @@ """ Definition of the RailEnv environment. 
""" +import random # TODO: _ this is a global method --> utils or remove later from enum import IntEnum from typing import List, NamedTuple, Optional, Dict @@ -16,7 +17,7 @@ from flatland.core.grid.grid4 import Grid4TransitionsEnum, Grid4Transitions from flatland.core.grid.grid4_utils import get_new_position from flatland.core.grid.grid_utils import IntVector2D from flatland.core.transition_map import GridTransitionMap -from flatland.envs.agent_utils import EnvAgentStatic, EnvAgent, RailAgentStatus +from flatland.envs.agent_utils import EnvAgent, RailAgentStatus from flatland.envs.distance_map import DistanceMap from flatland.envs.observations import GlobalObsForRailEnv from flatland.envs.rail_generators import random_rail_generator, RailGenerator @@ -181,8 +182,8 @@ class RailEnv(Environment): self.dev_obs_dict = {} self.dev_pred_dict = {} - self.agents: List[EnvAgent] = [None] * number_of_agents # live agents - self.agents_static: List[EnvAgentStatic] = [None] * number_of_agents # static agent information + self.agents: List[EnvAgent] = [] + self.number_of_agents = number_of_agents self.num_resets = 0 self.distance_map = DistanceMap(self.agents, self.height, self.width) @@ -196,52 +197,45 @@ class RailEnv(Environment): # Stochastic train malfunctioning parameters if stochastic_data is not None: - prop_malfunction = stochastic_data['prop_malfunction'] mean_malfunction_rate = stochastic_data['malfunction_rate'] malfunction_min_duration = stochastic_data['min_duration'] malfunction_max_duration = stochastic_data['max_duration'] else: - prop_malfunction = 0. mean_malfunction_rate = 0. malfunction_min_duration = 0. malfunction_max_duration = 0. - # percentage of malfunctioning trains - self.proportion_malfunctioning_trains = prop_malfunction - - # Mean malfunction in number of stops + # Mean malfunction in number of time steps self.mean_malfunction_rate = mean_malfunction_rate # Uniform distribution parameters for malfunction duration self.min_number_of_steps_broken = malfunction_min_duration self.max_number_of_steps_broken = malfunction_max_duration - # Reset environment self.valid_positions = None - # global numpy array of agents position, True means that there is an agent at that cell - self.agent_positions: np.ndarray = np.full((height, width), False) + # global numpy array of agents position, -1 means that the cell is free, otherwise the agent handle is placed + # inside the cell + self.agent_positions: np.ndarray = np.zeros((height, width), dtype=int) - 1 def _seed(self, seed=None): self.np_random, seed = seeding.np_random(seed) + random.seed(seed) return [seed] # no more agent_handles def get_agent_handles(self): return range(self.get_num_agents()) - def get_num_agents(self, static=True): - if static: - return len(self.agents_static) - else: - return len(self.agents) + def get_num_agents(self) -> int: + return len(self.agents) - def add_agent_static(self, agent_static): + def add_agent(self, agent): """ Add static info for a single agent. Returns the index of the new agent. 
""" - self.agents_static.append(agent_static) - return len(self.agents_static) - 1 + self.agents.append(agent) + return len(self.agents) - 1 def set_agent_active(self, handle: int): agent = self.agents[handle] @@ -250,9 +244,11 @@ class RailEnv(Environment): self._set_agent_to_initial_position(agent, agent.initial_position) def restart_agents(self): - """ Reset the agents to their starting positions defined in agents_static + """ Reset the agents to their starting positions """ - self.agents = EnvAgent.list_from_static(self.agents_static) + for agent in self.agents: + agent.reset() + self.active_agents = [i for i in range(len(self.agents))] @staticmethod def compute_max_episode_steps(width: int, height: int, ratio_nr_agents_to_nr_cities: float = 20.0) -> int: @@ -329,7 +325,7 @@ class RailEnv(Environment): optionals = {} if regenerate_rail or self.rail is None: - rail, optionals = self.rail_generator(self.width, self.height, self.get_num_agents(), self.num_resets) + rail, optionals = self.rail_generator(self.width, self.height, self.number_of_agents, self.num_resets) self.rail = rail self.height, self.width = self.rail.grid.shape @@ -342,17 +338,13 @@ class RailEnv(Environment): if optionals and 'distance_map' in optionals: self.distance_map.set(optionals['distance_map']) - # todo change self.agents_static[0] with the refactoring for agents_static -> issue nr. 185 - # https://gitlab.aicrowd.com/flatland/flatland/issues/185 - if regenerate_schedule or regenerate_rail or self.agents_static[0] is None: + if regenerate_schedule or regenerate_rail or self.get_num_agents() == 0: agents_hints = None if optionals and 'agents_hints' in optionals: agents_hints = optionals['agents_hints'] - # TODO https://gitlab.aicrowd.com/flatland/flatland/issues/185 - # why do we need static agents? could we it more elegantly? 
- schedule = self.schedule_generator(self.rail, self.get_num_agents(), agents_hints, self.num_resets) - self.agents_static = EnvAgentStatic.from_lists(schedule) + schedule = self.schedule_generator(self.rail, self.number_of_agents, agents_hints, self.num_resets) + self.agents = EnvAgent.from_schedule(schedule) if agents_hints and 'city_orientations' in agents_hints: ratio_nr_agents_to_nr_cities = self.get_num_agents() / len(agents_hints['city_orientations']) @@ -362,7 +354,7 @@ class RailEnv(Environment): else: self._max_episode_steps = self.compute_max_episode_steps(width=self.width, height=self.height) - self.agent_positions = np.full((self.height, self.width), False) + self.agent_positions = np.zeros((self.height, self.width), dtype=int) - 1 self.restart_agents() @@ -370,20 +362,16 @@ class RailEnv(Environment): for i_agent in range(self.get_num_agents()): self.set_agent_active(i_agent) - for i_agent, agent in enumerate(self.agents): - # A proportion of agent in the environment will receive a positive malfunction rate - if self.np_random.rand() < self.proportion_malfunctioning_trains: - agent.malfunction_data['malfunction_rate'] = self.mean_malfunction_rate - next_breakdown = int( - self._exp_distirbution_synced(rate=agent.malfunction_data['malfunction_rate'])) - agent.malfunction_data['next_malfunction'] = next_breakdown - agent.malfunction_data['malfunction'] = 0 + for agent in self.agents: + # Induce malfunctions + self._break_agent(self.mean_malfunction_rate, agent) - initial_malfunction = self._agent_malfunction(i_agent) - - if initial_malfunction: + if agent.malfunction_data["malfunction"] > 0: agent.speed_data['transition_action_on_cellexit'] = RailEnvActions.DO_NOTHING + # Fix agents that finished their malfunction + self._fix_agent_after_malfunction(agent) + self.num_resets += 1 self._elapsed_steps = 0 @@ -397,62 +385,56 @@ class RailEnv(Environment): info_dict: Dict = { 'action_required': {i: self.action_required(agent) for i, agent in enumerate(self.agents)}, 'malfunction': { - i: self.agents[i].malfunction_data['malfunction'] for i in range(self.get_num_agents()) + i: agent.malfunction_data['malfunction'] for i, agent in enumerate(self.agents) }, - 'speed': {i: self.agents[i].speed_data['speed'] for i in range(self.get_num_agents())}, + 'speed': {i: agent.speed_data['speed'] for i, agent in enumerate(self.agents)}, 'status': {i: agent.status for i, agent in enumerate(self.agents)} } # Return the new observation vectors for each agent observation_dict: Dict = self._get_observations() return observation_dict, info_dict - def _agent_malfunction(self, i_agent) -> bool: + def _fix_agent_after_malfunction(self, agent: EnvAgent): """ - Returns true if the agent enters into malfunction. (False, if not broken down or already broken down before). 
+ Updates agent malfunction variables and fixes broken agents + + Parameters + ---------- + agent """ - agent = self.agents[i_agent] - # Decrease counter for next event only if agent is currently not broken and agent has a malfunction rate - if agent.malfunction_data['malfunction_rate'] >= 1 and agent.malfunction_data['next_malfunction'] > 0 and \ - agent.malfunction_data['malfunction'] < 1: - agent.malfunction_data['next_malfunction'] -= 1 - - # Only agents that have a positive rate for malfunctions and are not currently broken are considered - # If counter has come to zero --> Agent has malfunction - # set next malfunction time and duration of current malfunction - if agent.malfunction_data['malfunction_rate'] >= 1 and 1 > agent.malfunction_data['malfunction'] and \ - agent.malfunction_data['next_malfunction'] < 1: - # Increase number of malfunctions - agent.malfunction_data['nr_malfunctions'] += 1 - - # Next malfunction in number of stops - next_breakdown = int( - self._exp_distirbution_synced(rate=agent.malfunction_data['malfunction_rate'])) - agent.malfunction_data['next_malfunction'] = max(next_breakdown, 1) - # Duration of current malfunction - num_broken_steps = self.np_random.randint(self.min_number_of_steps_broken, - self.max_number_of_steps_broken + 1) + 1 - agent.malfunction_data['malfunction'] = num_broken_steps - agent.malfunction_data['moving_before_malfunction'] = agent.moving - - return True - else: - # The train was broken before... - if agent.malfunction_data['malfunction'] > 0: + # Ignore agents that are OK + if self._is_agent_ok(agent): + return - # Last step of malfunction --> Agent starts moving again after getting fixed - if agent.malfunction_data['malfunction'] < 2: - agent.malfunction_data['malfunction'] -= 1 + # Reduce number of malfunction steps left + if agent.malfunction_data['malfunction'] > 1: + agent.malfunction_data['malfunction'] -= 1 + return - # restore moving state before malfunction without further penalty - self.agents[i_agent].moving = agent.malfunction_data['moving_before_malfunction'] + # Restart agents at the end of their malfunction + agent.malfunction_data['malfunction'] -= 1 + if 'moving_before_malfunction' in agent.malfunction_data: + agent.moving = agent.malfunction_data['moving_before_malfunction'] + return - else: - agent.malfunction_data['malfunction'] -= 1 + def _break_agent(self, rate: float, agent) -> None: + """ + Malfunction generator that breaks agents at a given rate.
- # Nothing left to do with broken agent - return True - return False + Parameters + ---------- + agent + + """ + if agent.malfunction_data['malfunction'] < 1: + if self.np_random.rand() < self._malfunction_prob(rate): + num_broken_steps = self.np_random.randint(self.min_number_of_steps_broken, + self.max_number_of_steps_broken + 1) + 1 + agent.malfunction_data['malfunction'] = num_broken_steps + agent.malfunction_data['moving_before_malfunction'] = agent.moving + agent.malfunction_data['nr_malfunctions'] += 1 + return def step(self, action_dict_: Dict[int, RailEnvActions]): """ @@ -492,10 +474,14 @@ class RailEnv(Environment): "status": {}, } have_all_agents_ended = True # boolean flag to check if all agents are done + for i_agent, agent in enumerate(self.agents): # Reset the step rewards self.rewards_dict[i_agent] = 0 + # Induce malfunction before we do a step, thus a broken agent can't move in this step + self._break_agent(self.mean_malfunction_rate, agent) + # Perform step on the agent self._step_agent(i_agent, action_dict_.get(i_agent)) @@ -508,6 +494,9 @@ class RailEnv(Environment): info_dict["speed"][i_agent] = agent.speed_data['speed'] info_dict["status"][i_agent] = agent.status + # Fix agents that finished their malfunction such that they can perform an action in the next step + self._fix_agent_after_malfunction(agent) + # Check for end of episode + set global reward to all rewards! if have_all_agents_ended: self.dones["__all__"] = True @@ -552,12 +541,9 @@ class RailEnv(Environment): agent.old_direction = agent.direction agent.old_position = agent.position - # is the agent malfunctioning? - malfunction = self._agent_malfunction(i_agent) - # if agent is broken, actions are ignored and agent does not move. # full step penalty in this case - if malfunction: + if agent.malfunction_data['malfunction'] > 0: self.rewards_dict[i_agent] += self.step_penalty * agent.speed_data['speed'] return @@ -644,6 +630,7 @@ class RailEnv(Environment): if np.equal(agent.position, agent.target).all(): agent.status = RailAgentStatus.DONE self.dones[i_agent] = True + self.active_agents.remove(i_agent) agent.moving = False self._remove_agent_from_scene(agent) else: @@ -663,7 +650,7 @@ class RailEnv(Environment): new_position: IntVector2D """ agent.position = new_position - self.agent_positions[agent.position] = True + self.agent_positions[agent.position] = agent.handle def _move_agent_to_new_position(self, agent: EnvAgent, new_position: IntVector2D): """ @@ -676,8 +663,8 @@ class RailEnv(Environment): new_position: IntVector2D """ agent.position = new_position - self.agent_positions[agent.old_position] = False - self.agent_positions[agent.position] = True + self.agent_positions[agent.old_position] = -1 + self.agent_positions[agent.position] = agent.handle def _remove_agent_from_scene(self, agent: EnvAgent): """ @@ -688,7 +675,7 @@ class RailEnv(Environment): ------- agent: EnvAgent object """ - self.agent_positions[agent.position] = False + self.agent_positions[agent.position] = -1 if self.remove_agents_at_target: agent.position = None agent.status = RailAgentStatus.DONE_REMOVED @@ -753,7 +740,7 @@ class RailEnv(Environment): is the cell free or not? 
""" - return not self.agent_positions[position] + return self.agent_positions[position] == -1 def check_action(self, agent: EnvAgent, action: RailEnvActions): """ @@ -826,14 +813,11 @@ class RailEnv(Environment): Returns state of environment in msgpack object """ grid_data = self.rail.grid.tolist() - agent_static_data = [agent.to_list() for agent in self.agents_static] - agent_data = [agent.to_list() for agent in self.agents] + agent_data = [agent.to_agent() for agent in self.agents] msgpack.packb(grid_data, use_bin_type=True) msgpack.packb(agent_data, use_bin_type=True) - msgpack.packb(agent_static_data, use_bin_type=True) msg_data = { "grid": grid_data, - "agents_static": agent_static_data, "agents": agent_data} return msgpack.packb(msg_data, use_bin_type=True) @@ -841,7 +825,7 @@ class RailEnv(Environment): """ Returns agents information in msgpack object """ - agent_data = [agent.to_list() for agent in self.agents] + agent_data = [agent.to_agent() for agent in self.agents] msg_data = { "agents": agent_data} return msgpack.packb(msg_data, use_bin_type=True) @@ -857,8 +841,7 @@ class RailEnv(Environment): data = msgpack.unpackb(msg_data, use_list=False, encoding='utf-8') self.rail.grid = np.array(data["grid"]) # agents are always reset as not moving - self.agents_static = [EnvAgentStatic(d[0], d[1], d[2], moving=False) for d in data["agents_static"]] - self.agents = [EnvAgent(d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7], d[8]) for d in data["agents"]] + self.agents = [EnvAgent(*d[0:12]) for d in data["agents"]] # setup with loaded data self.height, self.width = self.rail.grid.shape self.rail.height = self.height @@ -876,8 +859,7 @@ class RailEnv(Environment): data = msgpack.unpackb(msg_data, use_list=False, encoding='utf-8') self.rail.grid = np.array(data["grid"]) # agents are always reset as not moving - self.agents_static = [EnvAgentStatic(d[0], d[1], d[2], moving=False) for d in data["agents_static"]] - self.agents = [EnvAgent(d[0], d[1], d[2], d[3], d[4], d[5], d[6], d[7], d[8]) for d in data["agents"]] + self.agents = [EnvAgent(*d[0:12]) for d in data["agents"]] if "distance_map" in data.keys(): self.distance_map.set(data["distance_map"]) # setup with loaded data @@ -891,16 +873,13 @@ class RailEnv(Environment): Returns environment information with distance map information as msgpack object """ grid_data = self.rail.grid.tolist() - agent_static_data = [agent.to_list() for agent in self.agents_static] - agent_data = [agent.to_list() for agent in self.agents] + agent_data = [agent.to_agent() for agent in self.agents] msgpack.packb(grid_data, use_bin_type=True) msgpack.packb(agent_data, use_bin_type=True) - msgpack.packb(agent_static_data, use_bin_type=True) distance_map_data = self.distance_map.get() msgpack.packb(distance_map_data, use_bin_type=True) msg_data = { "grid": grid_data, - "agents_static": agent_static_data, "agents": agent_data, "distance_map": distance_map_data} @@ -960,7 +939,7 @@ class RailEnv(Environment): load_data = read_binary(package, resource) self.set_full_state_msg(load_data) - def _exp_distirbution_synced(self, rate): + def _exp_distirbution_synced(self, rate: float) -> float: """ Generates sample from exponential distribution We need this to guarantee synchronity between different instances with same seed. @@ -970,3 +949,28 @@ class RailEnv(Environment): u = self.np_random.rand() x = - np.log(1 - u) * rate return x + + def _malfunction_prob(self, rate: float) -> float: + """ + Probability of a single agent to break. 
According to Poisson process with given rate + :param rate: + :return: + """ + if rate <= 0: + return 0. + else: + return 1 - np.exp(- (1 / rate)) + + def _is_agent_ok(self, agent: EnvAgent) -> bool: + """ + Check if an agent is ok, meaning it can move and is not malfuncitoinig + Parameters + ---------- + agent + + Returns + ------- + True if agent is ok, False otherwise + + """ + return agent.malfunction_data['malfunction'] < 1 diff --git a/flatland/envs/rail_env_utils.py b/flatland/envs/rail_env_utils.py index dc1cff12c0c8b1860859208a13d6403734a2d2ad..7a814891635e86330c5168d944ae5dea421ea5bc 100644 --- a/flatland/envs/rail_env_utils.py +++ b/flatland/envs/rail_env_utils.py @@ -1,3 +1,4 @@ +from flatland.core.env_observation_builder import ObservationBuilder from flatland.envs.observations import TreeObsForRailEnv from flatland.envs.predictions import ShortestPathPredictorForRailEnv from flatland.envs.rail_env import RailEnv @@ -5,15 +6,34 @@ from flatland.envs.rail_generators import rail_from_file from flatland.envs.schedule_generators import schedule_from_file -def load_flatland_environment_from_file(file_name, load_from_package=None, obs_builder_object=None): +def load_flatland_environment_from_file(file_name: str, + load_from_package: str = None, + obs_builder_object: ObservationBuilder = None) -> RailEnv: + """ + Parameters + ---------- + file_name : str + The pickle file. + load_from_package : str + The python module to import from. Example: 'env_data.tests' + This requires that there are `__init__.py` files in the folder structure we load the file from. + obs_builder_object: ObservationBuilder + The obs builder for the `RailEnv` that is created. + + + Returns + ------- + RailEnv + The environment loaded from the pickle file. + """ if obs_builder_object is None: obs_builder_object = TreeObsForRailEnv( max_depth=2, predictor=ShortestPathPredictorForRailEnv(max_depth=10)) - environment = RailEnv(width=1, - height=1, + environment = RailEnv(width=1, # will be overridden when loading from file + height=1, # will be overridden when loading from file rail_generator=rail_from_file(file_name, load_from_package), - number_of_agents=1, + number_of_agents=1, # will be overridden when loading from file schedule_generator=schedule_from_file(file_name, load_from_package), obs_builder_object=obs_builder_object) return environment diff --git a/flatland/envs/schedule_generators.py b/flatland/envs/schedule_generators.py index cb8b1537080f34f5851130a67e4e907b7593371a..656bf70c4e2d6977cf7c2f3816bd6376bb642719 100644 --- a/flatland/envs/schedule_generators.py +++ b/flatland/envs/schedule_generators.py @@ -7,7 +7,7 @@ import numpy as np from flatland.core.grid.grid4_utils import get_new_position from flatland.core.transition_map import GridTransitionMap -from flatland.envs.agent_utils import EnvAgentStatic +from flatland.envs.agent_utils import EnvAgent from flatland.envs.schedule_utils import Schedule AgentPosition = Tuple[int, int] @@ -187,7 +187,7 @@ def random_schedule_generator(speed_ratio_map: Optional[Mapping[float, float]] = """ def generator(rail: GridTransitionMap, num_agents: int, hints: Any = None, - num_resets: int = 0) -> Schedule: + num_resets: int = 0) -> Schedule: _runtime_seed = seed + num_resets np.random.seed(_runtime_seed) @@ -204,7 +204,7 @@ def random_schedule_generator(speed_ratio_map: Optional[Mapping[float, float]] = if len(valid_positions) < num_agents: warnings.warn("schedule_generators: len(valid_positions) < num_agents") return Schedule(agent_positions=[], 
agent_directions=[], - agent_targets=[], agent_speeds=[], agent_malfunction_rates=None) + agent_targets=[], agent_speeds=[], agent_malfunction_rates=None) agents_position_idx = [i for i in np.random.choice(len(valid_positions), num_agents, replace=False)] agents_position = [valid_positions[agents_position_idx[i]] for i in range(num_agents)] @@ -291,25 +291,16 @@ def schedule_from_file(filename, load_from_package=None) -> ScheduleGenerator: with open(filename, "rb") as file_in: load_data = file_in.read() data = msgpack.unpackb(load_data, use_list=False, encoding='utf-8') - - # agents are always reset as not moving - if len(data['agents_static'][0]) > 5: - agents_static = [EnvAgentStatic(d[0], d[1], d[2], d[3], d[4], d[5]) for d in data["agents_static"]] - else: - agents_static = [EnvAgentStatic(d[0], d[1], d[2], d[3]) for d in data["agents_static"]] + agents = [EnvAgent(*d[0:12]) for d in data["agents"]] # setup with loaded data - agents_position = [a.initial_position for a in agents_static] - agents_direction = [a.direction for a in agents_static] - agents_target = [a.target for a in agents_static] - if len(data['agents_static'][0]) > 5: - agents_speed = [a.speed_data['speed'] for a in agents_static] - agents_malfunction = [a.malfunction_data['malfunction_rate'] for a in agents_static] - else: - agents_speed = None - agents_malfunction = None + agents_position = [a.initial_position for a in agents] + agents_direction = [a.direction for a in agents] + agents_target = [a.target for a in agents] + agents_speed = [a.speed_data['speed'] for a in agents] + agents_malfunction = [a.malfunction_data['malfunction_rate'] for a in agents] + return Schedule(agent_positions=agents_position, agent_directions=agents_direction, - agent_targets=agents_target, agent_speeds=agents_speed, - agent_malfunction_rates=agents_malfunction) + agent_targets=agents_target, agent_speeds=agents_speed, agent_malfunction_rates=None) return generator diff --git a/flatland/utils/editor.py b/flatland/utils/editor.py index f8c9afd0358d42c2829dc9b7c1fd7f3ad5198a3e..2a77c960450ec4fbe3aa1ee5dce50d0ab98f9c50 100644 --- a/flatland/utils/editor.py +++ b/flatland/utils/editor.py @@ -10,7 +10,7 @@ from numpy import array import flatland.utils.rendertools as rt from flatland.core.grid.grid4_utils import mirror -from flatland.envs.agent_utils import EnvAgent, EnvAgentStatic +from flatland.envs.agent_utils import EnvAgent from flatland.envs.observations import TreeObsForRailEnv from flatland.envs.rail_env import RailEnv, random_rail_generator from flatland.envs.rail_generators import complex_rail_generator, empty_rail_generator @@ -147,7 +147,7 @@ class View(object): def redraw(self): with self.output_generator: self.oRT.set_new_rail() - self.model.env.agents = self.model.env.agents_static + self.model.env.restart_agents() for a in self.model.env.agents: if hasattr(a, 'old_position') is False: a.old_position = a.position @@ -178,12 +178,12 @@ class View(object): self.writableData[(y - 2):(y + 2), (x - 2):(x + 2), :3] = 0 def xy_to_rc(self, x, y): - rcCell = ((array([y, x]) - self.yxBase)) + rc_cell = ((array([y, x]) - self.yxBase)) nX = np.floor((self.yxSize[0] - self.yxBase[0]) / self.model.env.height) nY = np.floor((self.yxSize[1] - self.yxBase[1]) / self.model.env.width) - rcCell[0] = max(0, min(np.floor(rcCell[0] / nY), self.model.env.height - 1)) - rcCell[1] = max(0, min(np.floor(rcCell[1] / nX), self.model.env.width - 1)) - return rcCell + rc_cell[0] = max(0, min(np.floor(rc_cell[0] / nY), self.model.env.height - 1)) + rc_cell[1] = 
max(0, min(np.floor(rc_cell[1] / nX), self.model.env.width - 1)) + return rc_cell def log(self, *args, **kwargs): if self.output_generator: @@ -215,23 +215,23 @@ class Controller(object): y = event['canvasY'] self.debug("debug:", x, y) - rcCell = self.view.xy_to_rc(x, y) + rc_cell = self.view.xy_to_rc(x, y) bShift = event["shiftKey"] bCtrl = event["ctrlKey"] bAlt = event["altKey"] if bCtrl and not bShift and not bAlt: - self.model.click_agent(rcCell) + self.model.click_agent(rc_cell) self.lrcStroke = [] elif bShift and bCtrl: - self.model.add_target(rcCell) + self.model.add_target(rc_cell) self.lrcStroke = [] elif bAlt and not bShift and not bCtrl: - self.model.clear_cell(rcCell) + self.model.clear_cell(rc_cell) self.lrcStroke = [] - self.debug("click in cell", rcCell) - self.model.debug_cell(rcCell) + self.debug("click in cell", rc_cell) + self.model.debug_cell(rc_cell) if self.model.selected_agent is not None: self.lrcStroke = [] @@ -304,8 +304,8 @@ class Controller(object): self.view.drag_path_element(x, y) # Translate and scale from x,y to integer row,col (note order change) - rcCell = self.view.xy_to_rc(x, y) - self.editor.drag_path_element(rcCell) + rc_cell = self.view.xy_to_rc(x, y) + self.editor.drag_path_element(rc_cell) self.view.redisplay_image() @@ -329,7 +329,7 @@ class Controller(object): def rotate_agent(self, event): self.log("Rotate Agent:", self.model.selected_agent) if self.model.selected_agent is not None: - for agent_idx, agent in enumerate(self.model.env.agents_static): + for agent_idx, agent in enumerate(self.model.env.agents): if agent is None: continue if agent_idx == self.model.selected_agent: @@ -339,13 +339,7 @@ class Controller(object): def restart_agents(self, event): self.log("Restart Agents - nAgents:", self.view.regen_n_agents.value) - if self.model.init_agents_static is not None: - self.model.env.agents_static = [EnvAgentStatic(d[0], d[1], d[2], moving=False) for d in - self.model.init_agents_static] - self.model.env.agents = None - self.model.init_agents_static = None - self.model.env.restart_agents() - self.model.env.reset(False, False) + self.model.env.reset(False, False) self.refresh(event) def regenerate(self, event): @@ -399,7 +393,6 @@ class EditorModel(object): self.env_filename = "temp.pkl" self.set_env(env) self.selected_agent = None - self.init_agents_static = None self.thread = None self.save_image_count = 0 @@ -420,12 +413,12 @@ class EditorModel(object): def set_draw_mode(self, draw_mode): self.draw_mode = draw_mode - def interpolate_path(self, rcLast, rcCell): - if np.array_equal(rcLast, rcCell): + def interpolate_path(self, rcLast, rc_cell): + if np.array_equal(rcLast, rc_cell): return [] rcLast = array(rcLast) - rcCell = array(rcCell) - rcDelta = rcCell - rcLast + rc_cell = array(rc_cell) + rcDelta = rc_cell - rcLast lrcInterp = [] # extra row,col points @@ -457,7 +450,7 @@ class EditorModel(object): lrcInterp = list(map(tuple, g2Interp)) return lrcInterp - def drag_path_element(self, rcCell): + def drag_path_element(self, rc_cell): """Mouse motion event handler for drawing. 
""" lrcStroke = self.lrcStroke @@ -465,15 +458,15 @@ class EditorModel(object): # Store the row,col location of the click, if we have entered a new cell if len(lrcStroke) > 0: rcLast = lrcStroke[-1] - if not np.array_equal(rcLast, rcCell): # only save at transition - lrcInterp = self.interpolate_path(rcLast, rcCell) + if not np.array_equal(rcLast, rc_cell): # only save at transition + lrcInterp = self.interpolate_path(rcLast, rc_cell) lrcStroke.extend(lrcInterp) - self.debug("lrcStroke ", len(lrcStroke), rcCell, "interp:", lrcInterp) + self.debug("lrcStroke ", len(lrcStroke), rc_cell, "interp:", lrcInterp) else: # This is the first cell in a mouse stroke - lrcStroke.append(rcCell) - self.debug("lrcStroke ", len(lrcStroke), rcCell) + lrcStroke.append(rc_cell) + self.debug("lrcStroke ", len(lrcStroke), rc_cell) def mod_path(self, bAddRemove): # disabled functionality (no longer required) @@ -602,7 +595,6 @@ class EditorModel(object): def clear(self): self.env.rail.grid[:, :] = 0 self.env.agents = [] - self.env.agents_static = [] self.redraw() @@ -616,7 +608,7 @@ class EditorModel(object): self.redraw() def restart_agents(self): - self.env.agents = EnvAgent.list_from_static(self.env.agents_static) + self.env.restart_agents() self.redraw() def set_filename(self, filename): @@ -634,7 +626,6 @@ class EditorModel(object): self.env.restart_agents() self.env.reset(False, False) - self.init_agents_static = None self.view.oRT.update_background() self.fix_env() self.set_env(self.env) @@ -644,12 +635,7 @@ class EditorModel(object): def save(self): self.log("save to ", self.env_filename, " working dir: ", os.getcwd()) - temp_store = self.env.agents - # clear agents before save , because we want the "init" position of the agent to expert - self.env.agents = [] self.env.save(self.env_filename) - # reset agents current (current position) - self.env.agents = temp_store def save_image(self): self.view.oRT.gl.save_image('frame_{:04d}.bmp'.format(self.save_image_count)) @@ -689,7 +675,7 @@ class EditorModel(object): self.regen_size_height = size def find_agent_at(self, cell_row_col): - for agent_idx, agent in enumerate(self.env.agents_static): + for agent_idx, agent in enumerate(self.env.agents): if tuple(agent.position) == tuple(cell_row_col): return agent_idx return None @@ -709,15 +695,14 @@ class EditorModel(object): # No if self.selected_agent is None: # Create a new agent and select it. - agent_static = EnvAgentStatic(position=cell_row_col, direction=0, target=cell_row_col, moving=False) - self.selected_agent = self.env.add_agent_static(agent_static) + agent = EnvAgent(position=cell_row_col, direction=0, target=cell_row_col, moving=False) + self.selected_agent = self.env.add_agent(agent) self.view.oRT.update_background() else: # Move the selected agent to this cell - agent_static = self.env.agents_static[self.selected_agent] - agent_static.position = cell_row_col - agent_static.old_position = cell_row_col - self.env.agents = [] + agent = self.env.agents[self.selected_agent] + agent.position = cell_row_col + agent.old_position = cell_row_col else: # Yes # Have they clicked on the agent already selected? 
@@ -728,13 +713,11 @@ class EditorModel(object):
                 # No - select the agent
                 self.selected_agent = agent_idx
-                self.init_agents_static = None

         self.redraw()

-    def add_target(self, rcCell):
+    def add_target(self, rc_cell):
         if self.selected_agent is not None:
-            self.env.agents_static[self.selected_agent].target = rcCell
-            self.init_agents_static = None
+            self.env.agents[self.selected_agent].target = rc_cell
             self.view.oRT.update_background()
             self.redraw()

@@ -752,11 +735,11 @@ class EditorModel(object):
         if self.debug_bool:
             self.log(*args, **kwargs)

-    def debug_cell(self, rcCell):
-        binTrans = self.env.rail.get_full_transitions(*rcCell)
+    def debug_cell(self, rc_cell):
+        binTrans = self.env.rail.get_full_transitions(*rc_cell)
         sbinTrans = format(binTrans, "#018b")[2:]
         self.debug("cell ",
-                   rcCell,
+                   rc_cell,
                    "Transitions: ",
                    binTrans,
                    sbinTrans,
diff --git a/flatland/utils/rendertools.py b/flatland/utils/rendertools.py
index fc96b22d737917e98ec8c5617151f4f30ae22d10..cc496cb94cd2ba0d927749bf813cd449bd70e236 100644
--- a/flatland/utils/rendertools.py
+++ b/flatland/utils/rendertools.py
@@ -77,7 +77,7 @@ class RenderTool(object):
     def update_background(self):
         # create background map
         targets = {}
-        for agent_idx, agent in enumerate(self.env.agents_static):
+        for agent_idx, agent in enumerate(self.env.agents):
             if agent is None:
                 continue
             targets[tuple(agent.target)] = agent_idx
@@ -93,10 +93,9 @@ class RenderTool(object):
         self.new_rail = True

     def plot_agents(self, targets=True, selected_agent=None):
-        color_map = self.gl.get_cmap('hsv',
-                                     lut=max(len(self.env.agents), len(self.env.agents_static) + 1))
+        color_map = self.gl.get_cmap('hsv', lut=(len(self.env.agents) + 1))

-        for agent_idx, agent in enumerate(self.env.agents_static):
+        for agent_idx, agent in enumerate(self.env.agents):
             if agent is None:
                 continue
             color = color_map(agent_idx)
@@ -515,7 +514,7 @@ class RenderTool(object):
         # store the targets
         targets = {}
         selected = {}
-        for agent_idx, agent in enumerate(self.env.agents_static):
+        for agent_idx, agent in enumerate(self.env.agents):
             if agent is None:
                 continue
             targets[tuple(agent.target)] = agent_idx
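With `agents_static` gone, the renderer above sizes its colour table from `env.agents` alone. A sketch of the lookup, assuming matplotlib; the `+ 1` presumably keeps the cyclic 'hsv' map from giving the first and last agent the same hue:

```
import matplotlib.pyplot as plt

agents = range(5)                                  # stand-in for env.agents
color_map = plt.get_cmap('hsv', len(agents) + 1)   # one hue per handle, plus one spare
colors = [color_map(handle) for handle in agents]  # RGBA tuple per agent
```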
diff --git a/tests/test_distance_map.py b/tests/test_distance_map.py
index 3bed89b8ce0947c86593e2f1680ef6082f321d84..c6a96fbefff68c4dbe448fc666e94317729aae6b 100644
--- a/tests/test_distance_map.py
+++ b/tests/test_distance_map.py
@@ -33,13 +33,12 @@ def test_walker():
                   obs_builder_object=TreeObsForRailEnv(max_depth=2,
                                                        predictor=ShortestPathPredictorForRailEnv(max_depth=10)),
                   )
-    # reset to initialize agents_static
     env.reset()

     # set initial position and direction for testing...
-    env.agents_static[0].position = (0, 1)
-    env.agents_static[0].direction = 1
-    env.agents_static[0].target = (0, 0)
+    env.agents[0].position = (0, 1)
+    env.agents[0].direction = 1
+    env.agents[0].target = (0, 0)

     # reset to set agents from agents_static
     env.reset(False, False)
diff --git a/tests/test_flatland_core_transition_map.py b/tests/test_flatland_core_transition_map.py
index 0913e45959d08230a815c33d98fb6de8eb99d956..a569aa35534385698369980566c426cf72b7bb4b 100644
--- a/tests/test_flatland_core_transition_map.py
+++ b/tests/test_flatland_core_transition_map.py
@@ -53,13 +53,11 @@ def test_grid8_set_transitions():

 def check_path(env, rail, position, direction, target, expected, rendering=False):
-    agent = env.agents_static[0]
+    agent = env.agents[0]
     agent.position = position  # south dead-end
     agent.direction = direction  # north
     agent.target = target  # east dead-end
     agent.moving = True
-    # reset to set agents from agents_static
-    # env.reset(False, False)
     if rendering:
         renderer = RenderTool(env, gl="PILSVG")
         renderer.render_env(show=True, show_observations=False)
@@ -76,8 +74,6 @@ def test_path_exists(rendering=False):
         number_of_agents=1,
         obs_builder_object=TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv()),
     )
-
-    # reset to initialize agents_static
     env.reset()

     check_path(
@@ -142,8 +138,6 @@ def test_path_not_exists(rendering=False):
         number_of_agents=1,
         obs_builder_object=TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv()),
     )
-
-    # reset to initialize agents_static
     env.reset()

     check_path(
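The test files that follow all use the same new pattern: configure the agent directly on `env.agents[...]` after `reset()`, keeping the `initial_*` fields in sync by hand, then call `reset(False, False)` to re-run the episode setup. A hypothetical helper that condenses the pattern (the tests themselves inline these assignments):

```
def place_agent(env, handle, position, direction, target):
    # Hypothetical test helper mirroring the inline assignments in the tests below.
    agent = env.agents[handle]          # formerly env.agents_static[handle]
    agent.initial_position = position   # initial_* must be kept in sync by hand,
    agent.position = position           # since agents_static no longer does it
    agent.initial_direction = direction
    agent.direction = direction
    agent.target = target
    agent.moving = True
```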
diff --git a/tests/test_flatland_envs_observations.py b/tests/test_flatland_envs_observations.py
index f425636467ec7cefa0169db006122999b862308a..4bce639c663dec947c8294d0ddbb9c3527afe62f 100644
--- a/tests/test_flatland_envs_observations.py
+++ b/tests/test_flatland_envs_observations.py
@@ -103,26 +103,37 @@ def test_reward_function_conflict(rendering=False):
                   obs_builder_object=TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv()),
                   )
     obs_builder: TreeObsForRailEnv = env.obs_builder
-    # initialize agents_static
     env.reset()

     # set the initial position
-    agent = env.agents_static[0]
+    agent = env.agents[0]
     agent.position = (5, 6)  # south dead-end
+    agent.initial_position = (5, 6)  # south dead-end
     agent.direction = 0  # north
+    agent.initial_direction = 0  # north
     agent.target = (3, 9)  # east dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    agent = env.agents_static[1]
+    agent = env.agents[1]
     agent.position = (3, 8)  # east dead-end
+    agent.initial_position = (3, 8)  # east dead-end
     agent.direction = 3  # west
+    agent.initial_direction = 3  # west
     agent.target = (6, 6)  # south dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    # reset to set agents from agents_static
     env.reset(False, False)
+    env.agents[0].moving = True
+    env.agents[1].moving = True
+    env.agents[0].status = RailAgentStatus.ACTIVE
+    env.agents[1].status = RailAgentStatus.ACTIVE
+    env.agents[0].position = (5, 6)
+    env.agents[1].position = (3, 8)
+    print("\n")
+    print(env.agents[0])
+    print(env.agents[1])

     if rendering:
         renderer = RenderTool(env, gl="PILSVG")
@@ -185,28 +196,34 @@ def test_reward_function_waiting(rendering=False):
                   remove_agents_at_target=False
                   )
     obs_builder: TreeObsForRailEnv = env.obs_builder
-    # initialize agents_static
     env.reset()

     # set the initial position
-    agent = env.agents_static[0]
+    agent = env.agents[0]
     agent.initial_position = (3, 8)  # east dead-end
     agent.position = (3, 8)  # east dead-end
     agent.direction = 3  # west
+    agent.initial_direction = 3  # west
     agent.target = (3, 1)  # west dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    agent = env.agents_static[1]
+    agent = env.agents[1]
     agent.initial_position = (5, 6)  # south dead-end
     agent.position = (5, 6)  # south dead-end
     agent.direction = 0  # north
+    agent.initial_direction = 0  # north
     agent.target = (3, 8)  # east dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    # reset to set agents from agents_static
     env.reset(False, False)
+    env.agents[0].moving = True
+    env.agents[1].moving = True
+    env.agents[0].status = RailAgentStatus.ACTIVE
+    env.agents[1].status = RailAgentStatus.ACTIVE
+    env.agents[0].position = (3, 8)
+    env.agents[1].position = (5, 6)

     if rendering:
         renderer = RenderTool(env, gl="PILSVG")
diff --git a/tests/test_flatland_envs_predictions.py b/tests/test_flatland_envs_predictions.py
index 280d1d1143d06b9b00832b9c3eb6cbf4add0ffb2..4ea41c4a7a4f82e816a2ac926a35dddfff920cbe 100644
--- a/tests/test_flatland_envs_predictions.py
+++ b/tests/test_flatland_envs_predictions.py
@@ -28,15 +28,14 @@ def test_dummy_predictor(rendering=False):
                   number_of_agents=1,
                   obs_builder_object=TreeObsForRailEnv(max_depth=2, predictor=DummyPredictorForRailEnv(max_depth=10)),
                   )
-    # reset to initialize agents_static
     env.reset()

     # set initial position and direction for testing...
-    env.agents_static[0].initial_position = (5, 6)
-    env.agents_static[0].direction = 0
-    env.agents_static[0].target = (3, 0)
+    env.agents[0].initial_position = (5, 6)
+    env.agents[0].initial_direction = 0
+    env.agents[0].direction = 0
+    env.agents[0].target = (3, 0)

-    # reset to set agents from agents_static
     env.reset(False, False)
     env.set_agent_active(0)
@@ -120,20 +119,18 @@ def test_shortest_path_predictor(rendering=False):
                   number_of_agents=1,
                   obs_builder_object=TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv()),
                   )
-
-    # reset to initialize agents_static
     env.reset()

     # set the initial position
-    agent = env.agents_static[0]
+    agent = env.agents[0]
     agent.initial_position = (5, 6)  # south dead-end
     agent.position = (5, 6)  # south dead-end
     agent.direction = 0  # north
+    agent.initial_direction = 0  # north
     agent.target = (3, 9)  # east dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    # reset to set agents from agents_static
     env.reset(False, False)

     if rendering:
@@ -258,27 +255,27 @@ def test_shortest_path_predictor_conflicts(rendering=False):
                   number_of_agents=2,
                   obs_builder_object=TreeObsForRailEnv(max_depth=2, predictor=ShortestPathPredictorForRailEnv()),
                   )
-    # initialize agents_static
     env.reset()

     # set the initial position
-    agent = env.agents_static[0]
+    agent = env.agents[0]
     agent.initial_position = (5, 6)  # south dead-end
     agent.position = (5, 6)  # south dead-end
     agent.direction = 0  # north
+    agent.initial_direction = 0  # north
     agent.target = (3, 9)  # east dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    agent = env.agents_static[1]
+    agent = env.agents[1]
     agent.initial_position = (3, 8)  # east dead-end
     agent.position = (3, 8)  # east dead-end
     agent.direction = 3  # west
+    agent.initial_direction = 3  # west
     agent.target = (6, 6)  # south dead-end
     agent.moving = True
     agent.status = RailAgentStatus.ACTIVE

-    # reset to set agents from agents_static
     observations, info = env.reset(False, False, True)

     if rendering:
diff --git a/tests/test_flatland_envs_rail_env.py b/tests/test_flatland_envs_rail_env.py
index dc4c78f9a6796d8eef3cfbeb4c54409f14406415..00ce283e09d1d8b74524423acffe3311c6117aab 100644
--- a/tests/test_flatland_envs_rail_env.py
+++ b/tests/test_flatland_envs_rail_env.py
@@ -5,7 +5,6 @@ import numpy as np
 from flatland.core.grid.rail_env_grid import RailEnvTransitions
 from flatland.core.transition_map import GridTransitionMap
 from flatland.envs.agent_utils import EnvAgent
-from flatland.envs.agent_utils import EnvAgentStatic
 from flatland.envs.observations import GlobalObsForRailEnv, TreeObsForRailEnv
 from flatland.envs.predictions import ShortestPathPredictorForRailEnv
 from flatland.envs.rail_env import RailEnv
@@ -22,8 +21,8 @@ def test_load_env():
     env.reset()
     env.load_resource('env_data.tests', 'test-10x10.mpk')

-    agent_static = EnvAgentStatic((0, 0), 2, (5, 5), False)
-    env.add_agent_static(agent_static)
+    agent = EnvAgent((0, 0), 2, (5, 5), False)
+    env.add_agent(agent)
     assert env.get_num_agents() == 1

@@ -33,23 +32,23 @@ def test_save_load():
                   schedule_generator=complex_schedule_generator(), number_of_agents=2)
     env.reset()
-    agent_1_pos = env.agents_static[0].position
-    agent_1_dir = env.agents_static[0].direction
-    agent_1_tar = env.agents_static[0].target
-    agent_2_pos = env.agents_static[1].position
-    agent_2_dir = env.agents_static[1].direction
-    agent_2_tar = env.agents_static[1].target
+    agent_1_pos = env.agents[0].position
+    agent_1_dir = env.agents[0].direction
+    agent_1_tar = env.agents[0].target
+    agent_2_pos = env.agents[1].position
+    agent_2_dir = env.agents[1].direction
+    agent_2_tar = env.agents[1].target
     env.save("test_save.dat")
     env.load("test_save.dat")
     assert (env.width == 10)
     assert (env.height == 10)
     assert (len(env.agents) == 2)
-    assert (agent_1_pos == env.agents_static[0].position)
-    assert (agent_1_dir == env.agents_static[0].direction)
-    assert (agent_1_tar == env.agents_static[0].target)
-    assert (agent_2_pos == env.agents_static[1].position)
-    assert (agent_2_dir == env.agents_static[1].direction)
-    assert (agent_2_tar == env.agents_static[1].target)
+    assert (agent_1_pos == env.agents[0].position)
+    assert (agent_1_dir == env.agents[0].direction)
+    assert (agent_1_tar == env.agents[0].target)
+    assert (agent_2_pos == env.agents[1].position)
+    assert (agent_2_dir == env.agents[1].direction)
+    assert (agent_2_tar == env.agents[1].target)


 def test_rail_environment_single_agent():
@@ -164,10 +163,10 @@ def test_dead_end():
     # We try the configuration in the 4 directions:
     rail_env.reset()
-    rail_env.agents = [EnvAgent(initial_position=(0, 2), direction=1, target=(0, 0), moving=False)]
+    rail_env.agents = [EnvAgent(initial_position=(0, 2), initial_direction=1, direction=1, target=(0, 0), moving=False)]

     rail_env.reset()
-    rail_env.agents = [EnvAgent(initial_position=(0, 2), direction=3, target=(0, 4), moving=False)]
+    rail_env.agents = [EnvAgent(initial_position=(0, 2), initial_direction=3, direction=3, target=(0, 4), moving=False)]

     # In the vertical configuration:
     rail_map = np.array(
@@ -188,10 +187,10 @@ def test_dead_end():
                        obs_builder_object=GlobalObsForRailEnv())

     rail_env.reset()
-    rail_env.agents = [EnvAgent(initial_position=(2, 0), direction=2, target=(0, 0), moving=False)]
+    rail_env.agents = [EnvAgent(initial_position=(2, 0), initial_direction=2, direction=2, target=(0, 0), moving=False)]

     rail_env.reset()
-    rail_env.agents = [EnvAgent(initial_position=(2, 0), direction=0, target=(4, 0), moving=False)]
+    rail_env.agents = [EnvAgent(initial_position=(2, 0), initial_direction=0, direction=0, target=(4, 0), moving=False)]

     # TODO make assertions
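As `test_dead_end` above shows, `EnvAgent` now carries `initial_direction` next to `direction`, and both have to be supplied when constructing agents by hand. A minimal example using the keyword arguments from the test (values are the test's own; other attributes take their defaults):

```
from flatland.envs.agent_utils import EnvAgent

agent = EnvAgent(initial_position=(0, 2),
                 initial_direction=1,  # direction at spawn time
                 direction=1,          # current direction, starts out equal
                 target=(0, 0),
                 moving=False)
```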
@@ -246,7 +245,6 @@ def test_rail_env_reset():
     env.reset()
     env.save(file_name)
     dist_map_shape = np.shape(env.distance_map.get())
-    # initialize agents_static
     rails_initial = env.rail.grid
     agents_initial = env.agents
diff --git a/tests/test_flatland_envs_rail_env_shortest_paths.py b/tests/test_flatland_envs_rail_env_shortest_paths.py
index dd64d370077ab12950f0189065c15652e6ad1c6d..8b066028c2c526c1220c32be3356e6eadcd9a117 100644
--- a/tests/test_flatland_envs_rail_env_shortest_paths.py
+++ b/tests/test_flatland_envs_rail_env_shortest_paths.py
@@ -1,6 +1,7 @@
 import sys

 import numpy as np
+import pytest

 from flatland.core.grid.grid4 import Grid4TransitionsEnum
 from flatland.envs.observations import TreeObsForRailEnv
@@ -26,14 +27,13 @@ def test_get_shortest_paths_unreachable():
     env.reset()

     # set the initial position
-    agent = env.agents_static[0]
+    agent = env.agents[0]
     agent.position = (3, 1)  # west dead-end
     agent.initial_position = (3, 1)  # west dead-end
     agent.direction = Grid4TransitionsEnum.WEST
     agent.target = (3, 9)  # east dead-end
     agent.moving = True

-    # reset to set agents from agents_static
     env.reset(False, False)

     actual = get_shortest_paths(env.distance_map)
@@ -42,6 +42,9 @@ def test_get_shortest_paths_unreachable():
     assert actual == expected, "actual={},expected={}".format(actual, expected)


+# todo file test_002.pkl has to be generated automatically
+# see https://gitlab.aicrowd.com/flatland/flatland/issues/279
+@pytest.mark.skip
 def test_get_shortest_paths():
     env = load_flatland_environment_from_file('test_002.pkl', 'env_data.tests')
     env.reset()
@@ -171,6 +174,9 @@ def test_get_shortest_paths():
         "[{}] actual={},expected={}".format(agent_handle, actual[agent_handle], expected[agent_handle])


+# todo file test_002.pkl has to be generated automatically
+# see https://gitlab.aicrowd.com/flatland/flatland/issues/279
+@pytest.mark.skip
 def test_get_shortest_paths_max_depth():
     env = load_flatland_environment_from_file('test_002.pkl', 'env_data.tests')
     env.reset()
@@ -200,6 +206,9 @@ def test_get_shortest_paths_max_depth():
         "[{}] actual={},expected={}".format(agent_handle, actual[agent_handle], expected[agent_handle])


+# todo file Level_distance_map_shortest_path.pkl has to be generated automatically
+# see https://gitlab.aicrowd.com/flatland/flatland/issues/279
+@pytest.mark.skip
 def test_get_shortest_paths_agent_handle():
     env = load_flatland_environment_from_file('Level_distance_map_shortest_path.pkl', 'env_data.tests')
     env.reset()
diff --git a/tests/test_flatland_malfunction.py b/tests/test_flatland_malfunction.py
index d9fa74ed364aed87ba936f74c39b0e4ab31771c0..7e2343770e86fbe70ac05eb0019ca522d3985c29 100644
--- a/tests/test_flatland_malfunction.py
+++ b/tests/test_flatland_malfunction.py
@@ -66,8 +66,7 @@ class SingleAgentNavigationObs(ObservationBuilder):

 def test_malfunction_process():
     # Set fixed malfunction duration for this test
-    stochastic_data = {'prop_malfunction': 1.,
-                       'malfunction_rate': 1000,
+    stochastic_data = {'malfunction_rate': 1,
                        'min_duration': 3,
                        'max_duration': 3}

@@ -81,14 +80,8 @@ def test_malfunction_process():
                   stochastic_data=stochastic_data,  # Malfunction data generator
                   obs_builder_object=SingleAgentNavigationObs()
                   )
-    # reset to initialize agents_static
     obs, info = env.reset(False, False, True, random_seed=10)

-    # Check that a initial duration for malfunction was assigned
-    assert env.agents[0].malfunction_data['next_malfunction'] > 0
-    for agent in env.agents:
-        agent.status = RailAgentStatus.ACTIVE
-
     agent_halts = 0
     total_down_time = 0
     agent_old_position = env.agents[0].position
@@ -101,12 +94,6 @@ def test_malfunction_process():
         for i in range(len(obs)):
             actions[i] = np.argmax(obs[i]) + 1

-        if step % 5 == 0:
-            # Stop the agent and set it to be malfunctioning
-            env.agents[0].malfunction_data['malfunction'] = -1
-            env.agents[0].malfunction_data['next_malfunction'] = 0
-            agent_halts += 1
-
         obs, all_rewards, done, _ = env.step(actions)

         if env.agents[0].malfunction_data['malfunction'] > 0:
@@ -122,12 +109,9 @@ def test_malfunction_process():
         total_down_time += env.agents[0].malfunction_data['malfunction']

     # Check that the appropriate number of malfunctions is achieved
-    assert env.agents[0].malfunction_data['nr_malfunctions'] == 20, "Actual {}".format(
+    assert env.agents[0].malfunction_data['nr_malfunctions'] == 23, "Actual {}".format(
         env.agents[0].malfunction_data['nr_malfunctions'])

-    # Check that 20 stops where performed
-    assert agent_halts == 20
-
     # Check that malfunctioning data was standing around
     assert total_down_time > 0

@@ -135,8 +119,7 @@ def test_malfunction_process_statistically():
     """Tests that malfunctions are produced by stochastic_data!"""
     # Set fixed malfunction duration for this test
-    stochastic_data = {'prop_malfunction': 1.,
-                       'malfunction_rate': 5,
+    stochastic_data = {'malfunction_rate': 5,
                        'min_duration': 5,
                        'max_duration': 5}

@@ -151,21 +134,21 @@ def test_malfunction_process_statistically():
                   obs_builder_object=SingleAgentNavigationObs()
                   )
-    # reset to initialize agents_static
     env.reset(True, True, False, random_seed=10)

     env.agents[0].target = (0, 0)
-
-    agent_malfunction_list = [[0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0],
-                              [0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 6, 5],
-                              [0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 0, 6, 5, 4],
-                              [0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 6, 5, 4],
-                              [6, 6, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0],
-                              [6, 6, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0, 0, 6, 5, 4, 3],
-                              [0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1, 0, 6, 5],
-                              [0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 0, 6, 5, 4, 3, 2, 1, 0],
-                              [0, 0, 0, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0, 6, 5, 4, 3, 2, 1],
-                              [6, 6, 6, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 6, 5, 4, 3, 2, 1, 0]]
+    # Next line only for test generation
+    #agent_malfunction_list = [[] for i in range(10)]
+    agent_malfunction_list = [[0, 5, 4, 3, 2, 1, 0, 0, 0, 5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1],
+                              [0, 0, 0, 0, 0, 0, 5, 4, 3, 2, 1, 0, 0, 5, 4, 3, 2, 1, 0, 0],
+                              [5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1, 0, 0, 0, 5, 4, 3, 2, 1, 0],
+                              [0, 5, 4, 3, 2, 1, 0, 0, 0, 5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1],
+                              [0, 0, 0, 0, 0, 0, 0, 0, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0],
+                              [0, 0, 0, 0, 0, 0, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+                              [0, 0, 0, 0, 0, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 5, 4, 3, 2, 1],
+                              [0, 0, 0, 5, 4, 3, 2, 1, 0, 0, 5, 4, 3, 2, 1, 0, 5, 4, 3, 2],
+                              [5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+                              [5, 4, 3, 2, 1, 0, 0, 0, 0, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0]]

     for step in range(20):
         action_dict: Dict[int, RailEnvActions] = {}
@@ -173,16 +156,16 @@ def test_malfunction_process_statistically():
             # We randomly select an action
             action_dict[agent_idx] = RailEnvActions(np.random.randint(4))
             # For generating tests only:
-            # agent_malfunction_list[agent_idx].append(env.agents[agent_idx].malfunction_data['malfunction'])
+            #agent_malfunction_list[agent_idx].append(env.agents[agent_idx].malfunction_data['malfunction'])
             assert env.agents[agent_idx].malfunction_data['malfunction'] == agent_malfunction_list[agent_idx][step]
         env.step(action_dict)
+    #print(agent_malfunction_list)


 def test_malfunction_before_entry():
-    """Tests that malfunctions are produced by stochastic_data!"""
+    """Tests that malfunctions are working properly for agents before entering the environment!"""
     # Set fixed malfunction duration for this test
-    stochastic_data = {'prop_malfunction': 1.,
-                       'malfunction_rate': 1,
+    stochastic_data = {'malfunction_rate': 2,
                        'min_duration': 10,
                        'max_duration': 10}

@@ -191,55 +174,69 @@ def test_malfunction_before_entry():
     env = RailEnv(width=25,
                   height=30,
                   rail_generator=rail_from_grid_transition_map(rail),
-                  schedule_generator=random_schedule_generator(seed=2),  # seed 12
+                  schedule_generator=random_schedule_generator(seed=1),
                   number_of_agents=10,
                   random_seed=1,
                   stochastic_data=stochastic_data,  # Malfunction data generator
                   )
-    # reset to initialize agents_static
     env.reset(False, False, False, random_seed=10)

     env.agents[0].target = (0, 0)

-    # Print for test generation
-    assert env.agents[0].malfunction_data['malfunction'] == 11
-    assert env.agents[1].malfunction_data['malfunction'] == 11
-    assert env.agents[2].malfunction_data['malfunction'] == 11
-    assert env.agents[3].malfunction_data['malfunction'] == 11
-    assert env.agents[4].malfunction_data['malfunction'] == 11
-    assert env.agents[5].malfunction_data['malfunction'] == 11
-    assert env.agents[6].malfunction_data['malfunction'] == 11
-    assert env.agents[7].malfunction_data['malfunction'] == 11
-    assert env.agents[8].malfunction_data['malfunction'] == 11
-    assert env.agents[9].malfunction_data['malfunction'] == 11
+    # Test initial malfunction values for all agents
+    # we want some agents to be malfunctioning already and some to be working
+    # we want different next_malfunction values for the agents
+    assert env.agents[0].malfunction_data['malfunction'] == 0
+    assert env.agents[1].malfunction_data['malfunction'] == 0
+    assert env.agents[2].malfunction_data['malfunction'] == 10
+    assert env.agents[3].malfunction_data['malfunction'] == 0
+    assert env.agents[4].malfunction_data['malfunction'] == 0
+    assert env.agents[5].malfunction_data['malfunction'] == 0
+    assert env.agents[6].malfunction_data['malfunction'] == 0
+    assert env.agents[7].malfunction_data['malfunction'] == 0
+    assert env.agents[8].malfunction_data['malfunction'] == 10
+    assert env.agents[9].malfunction_data['malfunction'] == 10
+
+    #for a in range(10):
+    #    print("assert env.agents[{}].malfunction_data['malfunction'] == {}".format(a,env.agents[a].malfunction_data['malfunction']))
+
+
+def test_malfunction_values_and_behavior():
+    """
+    Test the malfunction counts down as desired
+    Returns
+    -------
-    for step in range(20):
-        action_dict: Dict[int, RailEnvActions] = {}
-        for agent in env.agents:
-            # We randomly select an action
-            action_dict[agent.handle] = RailEnvActions(2)
-            if step < 10:
-                action_dict[agent.handle] = RailEnvActions(0)
+    """
+    # Set fixed malfunction duration for this test
+
+    rail, rail_map = make_simple_rail2()
+    action_dict: Dict[int, RailEnvActions] = {}
+    stochastic_data = {'malfunction_rate': 0.001,
+                       'min_duration': 10,
+                       'max_duration': 10}
+    env = RailEnv(width=25,
+                  height=30,
+                  rail_generator=rail_from_grid_transition_map(rail),
+                  schedule_generator=random_schedule_generator(seed=2),
+                  stochastic_data=stochastic_data,
+                  number_of_agents=1,
+                  random_seed=1,
+                  )
+    env.reset(False, False, activate_agents=True, random_seed=10)
+
+    # Assertions
+    assert_list = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 10, 9, 8, 7, 6, 5]
+    print("[")
+    for time_step in range(15):
+        # Move in the env
         env.step(action_dict)
-        assert env.agents[1].malfunction_data['malfunction'] == 2
-        assert env.agents[2].malfunction_data['malfunction'] == 2
-        assert env.agents[3].malfunction_data['malfunction'] == 2
-        assert env.agents[4].malfunction_data['malfunction'] == 2
-        assert env.agents[5].malfunction_data['malfunction'] == 2
-        assert env.agents[6].malfunction_data['malfunction'] == 2
-        assert env.agents[7].malfunction_data['malfunction'] == 2
-        assert env.agents[8].malfunction_data['malfunction'] == 2
-        assert env.agents[9].malfunction_data['malfunction'] == 2
-
-    # for a in range(env.get_num_agents()):
-    #     print("assert env.agents[{}].malfunction_data['malfunction'] == {}".format(a,
-    #                                                                                env.agents[a].malfunction_data[
-    #                                                                                    'malfunction']))
+        # Check that the malfunction counter decreases as expected
+        assert env.agents[0].malfunction_data['malfunction'] == assert_list[time_step]


 def test_initial_malfunction():
-    stochastic_data = {'prop_malfunction': 1.,  # Percentage of defective agents
-                       'malfunction_rate': 100,  # Rate of malfunction occurence
+    stochastic_data = {'malfunction_rate': 1000,  # Rate of malfunction occurrence
                        'min_duration': 2,  # Minimal duration of malfunction
                        'max_duration': 5  # Max duration of malfunction
                        }
@@ -254,7 +251,6 @@ def test_initial_malfunction():
                   stochastic_data=stochastic_data,  # Malfunction data generator
                   obs_builder_object=SingleAgentNavigationObs()
                   )
-    # reset to initialize agents_static
     env.reset(False, False, True, random_seed=10)
     print(env.agents[0].malfunction_data)
     env.agents[0].target = (0, 5)
@@ -283,22 +279,22 @@ def test_initial_malfunction():
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.MOVE_FORWARD,
                 malfunction=1,
-                reward=env.start_penalty + env.step_penalty * 1.0
-                # malfunctioning ends: starting and running at speed 1.0
-            ),
+                reward=env.step_penalty
+
+            ),  # malfunctioning ends: starting and running at speed 1.0
             Replay(
-                position=(3, 3),
+                position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.MOVE_FORWARD,
                 malfunction=0,
-                reward=env.step_penalty * 1.0  # running at speed 1.0
+                reward=env.start_penalty + env.step_penalty * 1.0  # running at speed 1.0
             ),
             Replay(
-                position=(3, 4),
+                position=(3, 3),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.MOVE_FORWARD,
                 malfunction=0,
-                reward=env.step_penalty * 1.0  # running at speed 1.0
+                reward=env.step_penalty  # running at speed 1.0
             )
         ],
         speed=env.agents[0].speed_data['speed'],
@@ -346,7 +342,7 @@ def test_initial_malfunction_stop_moving():
                 position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.DO_NOTHING,
-                malfunction=3,
+                malfunction=2,
                 reward=env.step_penalty,  # full step penalty when stopped
                 status=RailAgentStatus.ACTIVE
             ),
@@ -357,7 +353,7 @@ def test_initial_malfunction_stop_moving():
                 position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.STOP_MOVING,
-                malfunction=2,
+                malfunction=1,
                 reward=env.step_penalty,  # full step penalty while stopped
                 status=RailAgentStatus.ACTIVE
             ),
@@ -366,7 +362,7 @@ def test_initial_malfunction_stop_moving():
                 position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.DO_NOTHING,
-                malfunction=1,
+                malfunction=0,
                 reward=env.step_penalty,  # full step penalty while stopped
                 status=RailAgentStatus.ACTIVE
             ),
@@ -416,7 +412,6 @@ def test_initial_malfunction_do_nothing():
                   number_of_agents=1,
                   stochastic_data=stochastic_data,  # Malfunction data generator
                   )
-    # reset to initialize agents_static
     env.reset()
     set_penalties_for_replay(env)
     replay_config = ReplayConfig(
@@ -434,7 +429,7 @@ def test_initial_malfunction_do_nothing():
                 position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.DO_NOTHING,
-                malfunction=3,
+                malfunction=2,
                 reward=env.step_penalty,  # full step penalty while malfunctioning
                 status=RailAgentStatus.ACTIVE
             ),
@@ -445,7 +440,7 @@ def test_initial_malfunction_do_nothing():
                 position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.DO_NOTHING,
-                malfunction=2,
+                malfunction=1,
                 reward=env.step_penalty,  # full step penalty while stopped
                 status=RailAgentStatus.ACTIVE
             ),
@@ -454,7 +449,7 @@ def test_initial_malfunction_do_nothing():
                 position=(3, 2),
                 direction=Grid4TransitionsEnum.EAST,
                 action=RailEnvActions.DO_NOTHING,
-                malfunction=1,
+                malfunction=0,
                 reward=env.step_penalty,  # full step penalty while stopped
                 status=RailAgentStatus.ACTIVE
             ),
@@ -484,45 +479,14 @@ def test_initial_malfunction_do_nothing():
     run_replay_config(env, [replay_config], activate_agents=False)


-def test_initial_nextmalfunction_not_below_zero():
-    random.seed(0)
-    np.random.seed(0)
-
-    stochastic_data = {'prop_malfunction': 1.,  # Percentage of defective agents
-                       'malfunction_rate': 70,  # Rate of malfunction occurence
-                       'min_duration': 2,  # Minimal duration of malfunction
-                       'max_duration': 5  # Max duration of malfunction
-                       }
-
-    rail, rail_map = make_simple_rail2()
-
-    env = RailEnv(width=25,
-                  height=30,
-                  rail_generator=rail_from_grid_transition_map(rail),
-                  schedule_generator=random_schedule_generator(),
-                  number_of_agents=1,
-                  stochastic_data=stochastic_data,  # Malfunction data generator
-                  obs_builder_object=SingleAgentNavigationObs()
-                  )
-    # reset to initialize agents_static
-    env.reset()
-    agent = env.agents[0]
-    env.step({})
-    # was next_malfunction was -1 befor the bugfix https://gitlab.aicrowd.com/flatland/flatland/issues/186
-    assert agent.malfunction_data['next_malfunction'] >= 0, \
-        "next_malfunction should be >=0, found {}".format(agent.malfunction_data['next_malfunction'])
-
-
 def tests_random_interference_from_outside():
     """Tests that malfunctions are produced by stochastic_data!"""
     # Set fixed malfunction duration for this test
-    stochastic_data = {'prop_malfunction': 1.,
-                       'malfunction_rate': 1,
+    stochastic_data = {'malfunction_rate': 1,
                        'min_duration': 10,
                        'max_duration': 10}

     rail, rail_map = make_simple_rail2()
-
     env = RailEnv(width=25,
                   height=30,
                   rail_generator=rail_from_grid_transition_map(rail),
@@ -532,11 +496,8 @@ def tests_random_interference_from_outside():
                   stochastic_data=stochastic_data,  # Malfunction data generator
                   )
     env.reset()
-    # reset to initialize agents_static
     env.agents[0].speed_data['speed'] = 0.33
-    env.agents[0].initial_position = (3, 0)
-    env.agents[0].target = (3, 9)
-    env.reset(False, False, False)
+    env.reset(False, False, False, random_seed=10)
     env_data = []

     for step in range(200):
@@ -565,13 +526,9 @@ def tests_random_interference_from_outside():
                   stochastic_data=stochastic_data,  # Malfunction data generator
                   )
     env.reset()
-    # reset to initialize agents_static
     env.agents[0].speed_data['speed'] = 0.33
-    env.agents[0].initial_position = (3, 0)
-    env.agents[0].target = (3, 9)
-    env.reset(False, False, False)
+    env.reset(False, False, False, random_seed=10)

-    # Print for test generation
     dummy_list = [1, 2, 6, 7, 8, 9, 4, 5, 4]
     for step in range(200):
         action_dict: Dict[int, RailEnvActions] = {}
@@ -586,3 +543,56 @@ def tests_random_interference_from_outside():
         _, reward, _, _ = env.step(action_dict)
         assert reward[0] == env_data[step][0]
         assert env.agents[0].position == env_data[step][1]
+
+
+def test_last_malfunction_step():
+    """
+    Test to check that agent moves when it is not malfunctioning
+
+    """
+
+    # Set fixed malfunction duration for this test
+    stochastic_data = {'malfunction_rate': 5,
+                       'min_duration': 4,
+                       'max_duration': 4}
+
+    rail, rail_map = make_simple_rail2()
+
+    env = RailEnv(width=25,
+                  height=30,
+                  rail_generator=rail_from_grid_transition_map(rail),
+                  schedule_generator=random_schedule_generator(seed=2),
+                  number_of_agents=1,
+                  random_seed=1,
+                  stochastic_data=stochastic_data,  # Malfunction data generator
+                  )
+    env.reset()
+    env.agents[0].speed_data['speed'] = 1. / 3.
+    env.agents[0].target = (0, 0)
+
+    env.reset(False, False, True)
+    # Force malfunction to be off at beginning and next malfunction to happen in 2 steps
+    env.agents[0].malfunction_data['next_malfunction'] = 2
+    env.agents[0].malfunction_data['malfunction'] = 0
+    env_data = []
+    for step in range(20):
+        action_dict: Dict[int, RailEnvActions] = {}
+        for agent in env.agents:
+            # Go forward all the time
+            action_dict[agent.handle] = RailEnvActions(2)
+
+        if env.agents[0].malfunction_data['malfunction'] < 1:
+            agent_can_move = True
+        # Store the position before and after the step
+        pre_position = env.agents[0].speed_data['position_fraction']
+        _, reward, _, _ = env.step(action_dict)
+        # Check if the agent is still allowed to move in this step
+        if env.agents[0].malfunction_data['malfunction'] > 0:
+            agent_can_move = False
+        post_position = env.agents[0].speed_data['position_fraction']
+        # Assert that the agent moved while it was still allowed
+        if agent_can_move:
+            assert pre_position != post_position
+        else:
+            assert post_position == pre_position
diff --git a/tests/test_generators.py b/tests/test_generators.py
index 1e69223daebd24c52137e12eed9dc43d188a9bbd..94e3d7faeb59599fed32493467a65464512aaebf 100644
--- a/tests/test_generators.py
+++ b/tests/test_generators.py
@@ -137,7 +137,6 @@ def tests_rail_from_file():
     env.reset()
     env.save(file_name)
     dist_map_shape = np.shape(env.distance_map.get())
-    # initialize agents_static
     rails_initial = env.rail.grid
     agents_initial = env.agents

@@ -173,7 +172,6 @@ def tests_rail_from_file():
     env2.reset()
     env2.save(file_name_2)

-    # initialize agents_static
     rails_initial_2 = env2.rail.grid
     agents_initial_2 = env2.agents

@@ -211,7 +209,6 @@ def tests_rail_from_file():

     # Test to save without distance map and load with generating distance map

-    # initialize agents_static
     env4 = RailEnv(width=1,
                    height=1,
                    rail_generator=rail_from_file(file_name_2),
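`test_last_malfunction_step` above checks movement through `speed_data['position_fraction']`: a fractional-speed agent advances its fraction in every healthy step and must stand still while broken. A toy model of that bookkeeping (pure Python; the speed is the test's, the forced duration is illustrative):

```
speed = 1. / 3.          # as set in test_last_malfunction_step
position_fraction = 0.
malfunction = 2          # pretend the agent breaks for two steps

for step in range(6):
    if malfunction > 0:
        malfunction -= 1               # broken: fraction must not advance
    else:
        position_fraction += speed     # healthy: move at fractional speed
        if position_fraction >= 1.:
            position_fraction -= 1.    # the agent enters the next cell
```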
diff --git a/tests/test_multi_speed.py b/tests/test_multi_speed.py
index 243ea078d0e920aeaab912f81553e17a5f37b1c1..f83990cc39bf73e50719b2291006eed68d1d1360 100644
--- a/tests/test_multi_speed.py
+++ b/tests/test_multi_speed.py
@@ -437,79 +437,79 @@ def test_multispeed_actions_malfunction_no_blocking():
                 reward=env.step_penalty * 0.5  # recovered: running at speed 0.5
             ),
             Replay(
-                position=(3, 7),
+                position=(3, 8),
                 direction=Grid4TransitionsEnum.WEST,
-                action=RailEnvActions.MOVE_FORWARD,
+                action=None,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             Replay(
                 position=(3, 7),
                 direction=Grid4TransitionsEnum.WEST,
-                action=None,
+                action=RailEnvActions.MOVE_FORWARD,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             Replay(
-                position=(3, 6),
+                position=(3, 7),
                 direction=Grid4TransitionsEnum.WEST,
-                action=RailEnvActions.MOVE_FORWARD,
+                action=None,
                 set_malfunction=2,  # recovers in two steps from now!
                 malfunction=2,
                 reward=env.step_penalty * 0.5  # step penalty for speed 0.5 when malfunctioning
             ),
             # agent recovers in this step; since we're at the beginning, we provide a different action although we're broken!
             Replay(
-                position=(3, 6),
+                position=(3, 7),
                 direction=Grid4TransitionsEnum.WEST,
-                action=RailEnvActions.MOVE_LEFT,
+                action=None,
                 malfunction=1,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             Replay(
-                position=(3, 6),
+                position=(3, 7),
                 direction=Grid4TransitionsEnum.WEST,
                 action=None,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             Replay(
-                position=(4, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 6),
+                direction=Grid4TransitionsEnum.WEST,
                 action=RailEnvActions.STOP_MOVING,
                 reward=env.stop_penalty + env.step_penalty * 0.5  # stopping and step penalty for speed 0.5
             ),
             Replay(
-                position=(4, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 6),
+                direction=Grid4TransitionsEnum.WEST,
                 action=RailEnvActions.STOP_MOVING,
                 reward=env.step_penalty * 0.5  # step penalty for speed 0.5 while stopped
             ),
             Replay(
-                position=(4, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 6),
+                direction=Grid4TransitionsEnum.WEST,
                 action=RailEnvActions.MOVE_FORWARD,
                 reward=env.start_penalty + env.step_penalty * 0.5  # starting and running at speed 0.5
             ),
             Replay(
-                position=(4, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 6),
+                direction=Grid4TransitionsEnum.WEST,
                 action=None,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             # DO_NOTHING keeps moving!
             Replay(
-                position=(5, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 5),
+                direction=Grid4TransitionsEnum.WEST,
                 action=RailEnvActions.DO_NOTHING,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             Replay(
-                position=(5, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 5),
+                direction=Grid4TransitionsEnum.WEST,
                 action=None,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
             Replay(
-                position=(6, 6),
-                direction=Grid4TransitionsEnum.SOUTH,
+                position=(3, 4),
+                direction=Grid4TransitionsEnum.WEST,
                 action=RailEnvActions.MOVE_FORWARD,
                 reward=env.step_penalty * 0.5  # running at speed 0.5
             ),
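In the `test_utils.py` hunk that follows, forcing a malfunction in a replay no longer manipulates `next_malfunction`; the counter is set directly and the agent is marked as not yet repaired. Factored out as a sketch (hypothetical helper name; the dictionary keys are the ones used in the patch):

```
def inject_malfunction(agent, steps):
    # Force a malfunction for `steps` steps, as run_replay_config now does.
    agent.malfunction_data['malfunction'] = steps
    agent.malfunction_data['moving_before_malfunction'] = agent.moving
    agent.malfunction_data['fixed'] = False  # not yet repaired
```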
diff --git a/tests/test_utils.py b/tests/test_utils.py
index 1a98c161829dedc429465b6606101fd19784cbaa..e4fba2aebd795462971e8d1e8f16992c2affbac8 100644
--- a/tests/test_utils.py
+++ b/tests/test_utils.py
@@ -77,9 +77,10 @@ def run_replay_config(env: RailEnv, test_configs: List[ReplayConfig], rendering:
     for step in range(len(test_configs[0].replay)):
         if step == 0:
             for a, test_config in enumerate(test_configs):
-                agent: EnvAgent = env.agents_static[a]
+                agent: EnvAgent = env.agents[a]
                 # set the initial position
                 agent.initial_position = test_config.initial_position
+                agent.initial_direction = test_config.initial_direction
                 agent.direction = test_config.initial_direction
                 agent.target = test_config.target
                 agent.speed_data['speed'] = test_config.speed
@@ -118,9 +119,8 @@ def run_replay_config(env: RailEnv, test_configs: List[ReplayConfig], rendering:
                     # recognizes the agent as potentially malfunctioning
                     # We also set next malfunction to infinity to avoid interference with our tests
                     agent.malfunction_data['malfunction'] = replay.set_malfunction
-                    agent.malfunction_data['malfunction_rate'] = max(agent.malfunction_data['malfunction_rate'], 1)
-                    agent.malfunction_data['next_malfunction'] = np.inf
                     agent.malfunction_data['moving_before_malfunction'] = agent.moving
+                    agent.malfunction_data['fixed'] = False
                 _assert(a, agent.malfunction_data['malfunction'], replay.malfunction, 'malfunction')
             print(step)
         _, rewards_dict, _, info_dict = env.step(action_dict)
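Taken together, the test changes show the new `stochastic_data` layout: `prop_malfunction` and `next_malfunction` are gone, leaving a mean rate plus duration bounds. A sketch of an environment set up the way the updated tests do it (import paths assumed to match the repository's test utilities, in particular `make_simple_rail2`):

```
from typing import Dict

from flatland.envs.rail_env import RailEnv, RailEnvActions
from flatland.envs.rail_generators import rail_from_grid_transition_map
from flatland.envs.schedule_generators import random_schedule_generator
from flatland.utils.simple_rail import make_simple_rail2

stochastic_data = {'malfunction_rate': 5,  # mean number of steps between malfunctions
                   'min_duration': 3,      # minimal malfunction length
                   'max_duration': 3}      # maximal malfunction length

rail, rail_map = make_simple_rail2()
env = RailEnv(width=25, height=30,
              rail_generator=rail_from_grid_transition_map(rail),
              schedule_generator=random_schedule_generator(seed=2),
              number_of_agents=1, random_seed=1,
              stochastic_data=stochastic_data)
obs, info = env.reset(False, False, True, random_seed=10)

action_dict: Dict[int, RailEnvActions] = {0: RailEnvActions.MOVE_FORWARD}
obs, rewards, done, info = env.step(action_dict)
```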