diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md
new file mode 100644
index 0000000000000000000000000000000000000000..e869f32837c7b5def6e63dffc846e1229a3af059
--- /dev/null
+++ b/GETTING_STARTED.md
@@ -0,0 +1,263 @@
+# Getting Started
+
+This page provides basic tutorials on how to use mmdetection.
+For installation instructions, please see [INSTALL.md](INSTALL.md).
+
+## Inference with pretrained models
+
+We provide testing scripts to evaluate a whole dataset (COCO, PASCAL VOC, etc.),
+as well as high-level APIs for easier integration into other projects.
+
+### Test a dataset
+
+- [x] single GPU testing
+- [x] multiple GPU testing
+- [x] visualize detection results
+
+You can use the following command to test a dataset.
+
+```shell
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--gpus ${GPU_NUM}] [--proc_per_gpu ${PROC_NUM}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
+```
+
+Positional arguments:
+- `CONFIG_FILE`: Path to the config file of the corresponding model.
+- `CHECKPOINT_FILE`: Path to the checkpoint file.
+
+Optional arguments:
+- `GPU_NUM`: Number of GPUs used for testing. (default: 1)
+- `PROC_NUM`: Number of processes on each GPU. (default: 1)
+- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
+- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values are: `proposal_fast`, `proposal`, `bbox`, `segm`, `keypoints`.
+- `--show`: If specified, detection results will be plotted on the images and shown in a new window. This is only applicable to single GPU testing.
+
+Examples:
+
+Assume that you have already downloaded the checkpoints to `checkpoints/`.
+
+1. Test Faster R-CNN and show the results.
+
+```shell
+python tools/test.py configs/faster_rcnn_r50_fpn_1x.py \
+    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \
+    --show
+```
+
+2. Test Mask R-CNN and evaluate the bbox and mask AP.
+
+```shell
+python tools/test.py configs/mask_rcnn_r50_fpn_1x.py \
+    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
+    --out results.pkl --eval bbox mask
+```
+
+3. Test Mask R-CNN with 8 GPUs and 2 processes per GPU, and evaluate the bbox and mask AP.
+
+```shell
+python tools/test.py configs/mask_rcnn_r50_fpn_1x.py \
+    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
+    --gpus 8 --proc_per_gpu 2 --out results.pkl --eval bbox mask
+```
+
+### High-level APIs for testing images
+
+Here is an example of building the model and testing given images.
+
+```python
+import mmcv
+from mmcv.runner import load_checkpoint
+from mmdet.models import build_detector
+from mmdet.apis import inference_detector, show_result
+
+cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
+cfg.model.pretrained = None
+
+# construct the model and load checkpoint
+model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
+_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
+
+# test a single image
+img = mmcv.imread('test.jpg')
+result = inference_detector(model, img, cfg)
+show_result(img, result)
+
+# test a list of images
+imgs = ['test1.jpg', 'test2.jpg']
+for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
+    print(i, imgs[i])
+    show_result(imgs[i], result)
+```
+
+
+## Train a model
+
+mmdetection implements distributed training and non-distributed training,
+which use `MMDistributedDataParallel` and `MMDataParallel` respectively.
+
+All outputs (log files and checkpoints) will be saved to the working directory,
+which is specified by `work_dir` in the config file.
+
+**\*Important\***: The default learning rate in config files is for 8 GPUs.
+If you use fewer or more than 8 GPUs, you need to scale the learning rate proportionally
+to the number of GPUs, e.g., 0.01 for 4 GPUs and 0.04 for 16 GPUs.
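+
+As a sketch of the change for 4-GPU training, assuming the config you use defines `lr=0.02` for 8 GPUs (as the provided Faster/Mask R-CNN R-50-FPN configs do), you would halve it in the config file:
+
+```python
+# Linear scaling rule: halve the default 8-GPU learning rate when training on 4 GPUs.
+optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
+```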
+
+### Train with a single GPU
+
+```shell
+python tools/train.py ${CONFIG_FILE}
+```
+
+If you want to specify the working directory in the command, you can add an argument `--work_dir ${YOUR_WORK_DIR}`.
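+
+For example, a hypothetical invocation (the working directory name below is just a placeholder):
+
+```shell
+python tools/train.py configs/faster_rcnn_r50_fpn_1x.py --work_dir work_dirs/faster_rcnn_r50_fpn_1x
+```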
+
+### Train with multiple GPUs
+
+```shell
+./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
+```
+
+Optional arguments are:
+
+- `--validate` (recommended): Perform evaluation every k epochs during training (k defaults to 1).
+- `--work_dir ${WORK_DIR}`: Override the working directory specified in the config file.
+- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
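+
+For example, to train Mask R-CNN on 8 GPUs with evaluation enabled and a custom working directory (the directory name is just a placeholder):
+
+```shell
+./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8 --validate --work_dir work_dirs/mask_rcnn_r50_fpn_1x
+```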
+
+### Train with multiple machines
+
+If you run mmdetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can just use the script `slurm_train.sh`.
+
+```shell
+./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [${GPUS}]
+```
+
+Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition.
+
+```shell
+./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x.py /nfs/xxxx/mask_rcnn_r50_fpn_1x 16
+```
+
+You can check [slurm_train.sh](tools/slurm_train.sh) for full arguments and environment variables.
+
+If you have multiple machines simply connected with Ethernet, you can refer to the
+PyTorch [launch utility](https://pytorch.org/docs/stable/distributed_deprecated.html#launch-utility).
+Training is usually slow if you do not have high-speed networking like InfiniBand.
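+
+As a rough sketch, launching on two machines with 8 GPUs each could look like the following. This assumes `tools/train.py` is invoked with `--launcher pytorch` (as `dist_train.sh` does); the master address and port are placeholders.
+
+```shell
+# On the first machine (node rank 0); 10.1.1.1:29500 is a placeholder master address/port.
+python -m torch.distributed.launch --nnodes=2 --node_rank=0 --nproc_per_node=8 \
+    --master_addr=10.1.1.1 --master_port=29500 \
+    tools/train.py configs/mask_rcnn_r50_fpn_1x.py --launcher pytorch
+
+# On the second machine, run the same command with --node_rank=1.
+python -m torch.distributed.launch --nnodes=2 --node_rank=1 --nproc_per_node=8 \
+    --master_addr=10.1.1.1 --master_port=29500 \
+    tools/train.py configs/mask_rcnn_r50_fpn_1x.py --launcher pytorch
+```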
+
+
+## How-to
+
+### Use my own datasets
+
+The simplest way is to convert your dataset to existing dataset formats (COCO or PASCAL VOC).
+
+Here we show an example of adding a custom dataset of 5 classes, assuming it is also in COCO format.
+
+In `mmdet/datasets/my_dataset.py`:
+
+```python
+from .coco import CocoDataset
+
+
+class MyDataset(CocoDataset):
+
+    CLASSES = ('a', 'b', 'c', 'd', 'e')
+```
+
+In `mmdet/datasets/__init__.py`:
+
+```python
+from .my_dataset import MyDataset
+```
+
+Then you can use `MyDataset` in config files, with the same API as CocoDataset.
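+
+For instance, the dataset settings in a config file might then look roughly like this (the annotation file and image prefix paths are placeholders for your own data; the elided keys follow the existing config files):
+
+```python
+# Sketch of the dataset settings using the custom dataset; paths are placeholders.
+dataset_type = 'MyDataset'
+data_root = 'data/my_dataset/'
+data = dict(
+    imgs_per_gpu=2,
+    workers_per_gpu=2,
+    train=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/train.json',
+        img_prefix=data_root + 'images/',
+        ...),
+    val=dict(
+        type=dataset_type,
+        ann_file=data_root + 'annotations/val.json',
+        img_prefix=data_root + 'images/',
+        ...),
+    ...)
+```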
+
+
+It is also fine if you do not want to convert the annotation format to COCO or PASCAL VOC style.
+We define a simple annotation format, and all existing datasets are
+processed to be compatible with it, either online or offline.
+
+The annotation of a dataset is a list of dicts; each dict corresponds to an image.
+There are 3 fields: `filename` (relative path), `width`, and `height` for testing,
+and an additional field `ann` for training. `ann` is also a dict containing at least 2 fields:
+`bboxes` and `labels`, both of which are numpy arrays. Some datasets may provide
+annotations such as crowd/difficult/ignored bboxes; we use `bboxes_ignore` and `labels_ignore`
+to cover them.
+
+Here is an example.
+```
+[
+    {
+        'filename': 'a.jpg',
+        'width': 1280,
+        'height': 720,
+        'ann': {
+            'bboxes': <np.ndarray, float32> (n, 4),
+            'labels': <np.ndarray, int64> (n, ),
+            'bboxes_ignore': <np.ndarray, float32> (k, 4),
+            'labels_ignore': <np.ndarray, int64> (k, ) (optional field)
+        }
+    },
+    ...
+]
+```
+
+There are two ways to work with custom datasets.
+
+- online conversion
+
+  You can write a new Dataset class that inherits from `CustomDataset` and overrides two methods,
+  `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`,
+  like [CocoDataset](mmdet/datasets/coco.py) and [VOCDataset](mmdet/datasets/voc.py).
+
+- offline conversion
+
+  You can convert the annotation format to the expected format above and save it to
+  a pickle or json file, like [pascal_voc.py](tools/convert_datasets/pascal_voc.py).
+  Then you can simply use `CustomDataset` (a minimal conversion sketch follows this list).
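+
+Here is a minimal offline-conversion sketch. The raw-annotation parser `parse_my_annotations` and the file names are hypothetical placeholders you would replace with your own parsing logic and paths; `mmcv.dump` infers the output format from the file extension.
+
+```python
+import mmcv
+import numpy as np
+
+
+def parse_my_annotations(raw_ann_file):
+    """Hypothetical helper: yield (filename, width, height, objects),
+    where objects is a list of (x1, y1, x2, y2, label) tuples."""
+    raise NotImplementedError
+
+
+def convert(raw_ann_file, out_file):
+    annotations = []
+    for filename, width, height, objects in parse_my_annotations(raw_ann_file):
+        bboxes = np.array([obj[:4] for obj in objects], dtype=np.float32).reshape(-1, 4)
+        labels = np.array([obj[4] for obj in objects], dtype=np.int64)
+        annotations.append(
+            dict(
+                filename=filename,
+                width=width,
+                height=height,
+                ann=dict(bboxes=bboxes, labels=labels)))
+    # Save in the format expected by CustomDataset (pickle here, json also works).
+    mmcv.dump(annotations, out_file)
+
+
+if __name__ == '__main__':
+    convert('raw_annotations.txt', 'my_train.pkl')  # placeholder file names
+```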
+
+### Develop new components
+
+We basically categorize model components into 4 types.
+
+- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet.
+- neck: the component between backbones and heads, e.g., FPN, PAFPN.
+- head: the component for specific tasks, e.g., bbox prediction and mask prediction.
+- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align.
+
+Here we show how to develop a new component, using MobileNet as an example.
+
+1. Create a new file `mmdet/models/backbones/mobilenet.py`.
+
+```python
+import torch.nn as nn
+
+from ..registry import BACKBONES
+
+
+@BACKBONES.register_module
+class MobileNet(nn.Module):
+
+    def __init__(self, arg1, arg2):
+        super(MobileNet, self).__init__()
+
+    def forward(self, x):  # should return a tuple
+        pass
+```
+
+2. Import the module in `mmdet/models/backbones/__init__.py`.
+
+```python
+from .mobilenet import MobileNet
+```
+
+3. Use it in your config file.
+
+```python
+model = dict(
+    ...
+    backbone=dict(
+        type='MobileNet',
+        arg1=xxx,
+        arg2=xxx),
+    ...
+```
+
+For more information on how it works, you can refer to [TECHNICAL_DETAILS.md](TECHNICAL_DETAILS.md) (TODO).
diff --git a/README.md b/README.md
index bffd59a3b8d814ac6e440af2c502abd63320b56d..63836324bf23b219b3199e8bf0ccc931f83c157d 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 
 ## Introduction
 
-The master branch works with **PyTorch 1.0**. If you would like to use PyTorch 0.4.1,
+The master branch works with **PyTorch 1.0** or higher. If you would like to use PyTorch 0.4.1,
 please checkout to the [pytorch-0.4.1](https://github.com/open-mmlab/mmdetection/tree/pytorch-0.4.1) branch.
 
 mmdetection is an open source object detection toolbox based on PyTorch. It is
@@ -24,7 +24,7 @@ a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http
 - **Efficient**
 
   All basic bbox and mask operations run on GPUs now.
-  The training speed is about 5% ~ 20% faster than Detectron for different models.
+  The training speed is nearly 2x faster than Detectron and comparable to maskrcnn-benchmark.
 
 - **State of the art**
 
@@ -108,149 +108,8 @@ Please refer to [INSTALL.md](INSTALL.md) for installation and dataset preparatio
 
 ## Inference with pretrained models
 
-### Test a dataset
+Please see [GETTING_STARTED.md](GETTING_STARTED.md) for the basic usage of mmdetection.
 
-- [x] single GPU testing
-- [x] multiple GPU testing
-- [x] visualize detection results
-
-We allow to run one or multiple processes on each GPU, e.g. 8 processes on 8 GPU
-or 16 processes on 8 GPU. When the GPU workload is not very heavy for a single
-process, running multiple processes will accelerate the testing, which is specified
-with the argument `--proc_per_gpu <PROCESS_NUM>`.
-
-
-To test a dataset and save the results.
-
-```shell
-python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --out <OUT_FILE>
-```
-
-To perform evaluation after testing, add `--eval <EVAL_TYPES>`. Supported types are:
-`[proposal_fast, proposal, bbox, segm, keypoints]`.
-`proposal_fast` denotes evaluating proposal recalls with our own implementation,
-others denote evaluating the corresponding metric with the official coco api.
-
-For example, to evaluate Mask R-CNN with 8 GPUs and save the result as `results.pkl`.
-
-```shell
-python tools/test.py configs/mask_rcnn_r50_fpn_1x.py <CHECKPOINT_FILE> --gpus 8 --out results.pkl --eval bbox segm
-```
-
-It is also convenient to visualize the results during testing by adding an argument `--show`.
-
-```shell
-python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --show
-```
-
-### Test image(s)
-
-We provide some high-level apis (experimental) to test an image.
-
-```python
-import mmcv
-from mmcv.runner import load_checkpoint
-from mmdet.models import build_detector
-from mmdet.apis import inference_detector, show_result
-
-cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
-cfg.model.pretrained = None
-
-# construct the model and load checkpoint
-model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
-_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')
-
-# test a single image
-img = mmcv.imread('test.jpg')
-result = inference_detector(model, img, cfg)
-show_result(img, result)
-
-# test a list of images
-imgs = ['test1.jpg', 'test2.jpg']
-for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
-    print(i, imgs[i])
-    show_result(imgs[i], result)
-```
-
-
-## Train a model
-
-mmdetection implements distributed training and non-distributed training,
-which uses `MMDistributedDataParallel` and `MMDataParallel` respectively.
-
-### Distributed training (Single or Multiples machines)
-
-mmdetection potentially supports multiple launch methods, e.g., PyTorch’s built-in launch utility, slurm and MPI.
-
-We provide a training script using the launch utility provided by PyTorch.
-
-```shell
-./tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> [optional arguments]
-```
-
-Supported arguments are:
-
-- --validate: perform evaluation every k (default=1) epochs during the training.
-- --work_dir <WORK_DIR>: if specified, the path in config file will be replaced.
-
-Expected results in WORK_DIR:
-
-- log file
-- saved checkpoints (every k epochs, defaults=1)
-- a symbol link to the latest checkpoint
-
-**Important**: The default learning rate is for 8 GPUs. If you use less or more than 8 GPUs, you need to set the learning rate proportional to the GPU num. E.g., modify lr to 0.01 for 4 GPUs or 0.04 for 16 GPUs.
-
-### Non-distributed training
-
-Please refer to `tools/train.py` for non-distributed training, which is not recommended
-and left for debugging. Even on a single machine, distributed training is preferred.
-
-### Train on custom datasets
-
-We define a simple annotation format.
-
-The annotation of a dataset is a list of dict, each dict corresponds to an image.
-There are 3 field `filename` (relative path), `width`, `height` for testing,
-and an additional field `ann` for training. `ann` is also a dict containing at least 2 fields:
-`bboxes` and `labels`, both of which are numpy arrays. Some datasets may provide
-annotations like crowd/difficult/ignored bboxes, we use `bboxes_ignore` and `labels_ignore`
-to cover them.
-
-Here is an example.
-```
-[
-    {
-        'filename': 'a.jpg',
-        'width': 1280,
-        'height': 720,
-        'ann': {
-            'bboxes': <np.ndarray> (n, 4),
-            'labels': <np.ndarray> (n, ),
-            'bboxes_ignore': <np.ndarray> (k, 4),
-            'labels_ignore': <np.ndarray> (k, ) (optional field)
-        }
-    },
-    ...
-]
-```
-
-There are two ways to work with custom datasets.
-
-- online conversion
-
-  You can write a new Dataset class inherited from `CustomDataset`, and overwrite two methods
-  `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`, like [CocoDataset](mmdet/datasets/coco.py) and [VOCDataset](mmdet/datasets/voc.py).
-
-- offline conversion
-
-  You can convert the annotation format to the expected format above and save it to
-  a pickle or json file, like [pascal_voc.py](tools/convert_datasets/pascal_voc.py).
-  Then you can simply use `CustomDataset`.
-
-## Technical details
-
-Some implementation details and project structures are described in the [technical details](TECHNICAL_DETAILS.md).
 
 ## Citation