diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md new file mode 100644 index 0000000000000000000000000000000000000000..e869f32837c7b5def6e63dffc846e1229a3af059 --- /dev/null +++ b/GETTING_STARTED.md @@ -0,0 +1,263 @@ +# Getting Started + +This page provides basic tutorials about the usage of mmdetection. +For installation instructions, please see [INSTALL.md](INSTALL.md). + +## Inference with pretrained models + +We provide testing scripts to evaluate a whole dataset (COCO, PASCAL VOC, etc.), +and also some high-level APIs for easier integration into other projects. + +### Test a dataset + +- [x] single GPU testing +- [x] multiple GPU testing +- [x] visualize detection results + +You can use the following command to test a dataset. + +```shell +python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--gpus ${GPU_NUM}] [--proc_per_gpu ${PROC_NUM}] [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show] +``` + +Positional arguments: +- `CONFIG_FILE`: Path to the config file of the corresponding model. +- `CHECKPOINT_FILE`: Path to the checkpoint file. + +Optional arguments: +- `GPU_NUM`: Number of GPUs used for testing. (default: 1) +- `PROC_NUM`: Number of processes on each GPU. (default: 1) +- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file. +- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values are: `proposal_fast`, `proposal`, `bbox`, `segm`, `keypoints`. +- `--show`: If specified, detection results will be plotted on the images and shown in a new window. Only applicable for single GPU testing. + +Examples: + +Assume that you have already downloaded the checkpoints to `checkpoints/`. + +1. Test Faster R-CNN and show the results. + +```shell +python tools/test.py configs/faster_rcnn_r50_fpn_1x.py \ + checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \ + --show +``` + +2. Test Mask R-CNN and evaluate the bbox and mask AP. + +```shell +python tools/test.py configs/mask_rcnn_r50_fpn_1x.py \ + checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \ + --out results.pkl --eval bbox segm +``` + +3. Test Mask R-CNN with 8 GPUs and 2 processes per GPU, and evaluate the bbox and mask AP. + +```shell +python tools/test.py configs/mask_rcnn_r50_fpn_1x.py \ + checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \ + --gpus 8 --proc_per_gpu 2 --out results.pkl --eval bbox segm +``` + +### High-level APIs for testing images + +Here is an example of building the model and testing given images. + +```python +import mmcv +from mmcv.runner import load_checkpoint +from mmdet.models import build_detector +from mmdet.apis import inference_detector, show_result + +cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py') +cfg.model.pretrained = None + +# construct the model and load checkpoint +model = build_detector(cfg.model, test_cfg=cfg.test_cfg) +_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth') + +# test a single image +img = mmcv.imread('test.jpg') +result = inference_detector(model, img, cfg) +show_result(img, result) + +# test a list of images +imgs = ['test1.jpg', 'test2.jpg'] +for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')): + print(i, imgs[i]) + show_result(imgs[i], result) +``` + + +## Train a model + +mmdetection implements distributed training and non-distributed training, +which use `MMDistributedDataParallel` and `MMDataParallel` respectively.
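+
+To make this concrete, below is a rough, hypothetical sketch of how the two modes wrap a detector. The helper name `wrap_model` and its arguments are invented for illustration; the actual wrapping is done inside `mmdet/apis/train.py`, and the wrapper signatures may differ across mmcv versions.
+
+```python
+import torch.nn as nn
+
+from mmcv.parallel import MMDataParallel, MMDistributedDataParallel
+
+
+def wrap_model(model: nn.Module, distributed: bool, num_gpus: int = 1) -> nn.Module:
+    # Illustrative sketch only. `model` is assumed to be built by
+    # mmdet.models.build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg).
+    if distributed:
+        # One process per GPU, launched e.g. by ./tools/dist_train.sh.
+        return MMDistributedDataParallel(model.cuda())
+    # Non-distributed: a single process driving `num_gpus` GPUs.
+    return MMDataParallel(model, device_ids=list(range(num_gpus))).cuda()
+```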
+ +All outputs (log files and checkpoints) will be saved to the working directory, +which is specified by `work_dir` in the config file. + +**\*Important\***: The default learning rate in config files is for 8 GPUs. +If you use fewer or more than 8 GPUs, you need to set the learning rate proportionally +to the number of GPUs, e.g., 0.01 for 4 GPUs and 0.04 for 16 GPUs. + +### Train with a single GPU + +```shell +python tools/train.py ${CONFIG_FILE} +``` + +If you want to specify the working directory in the command, you can add an argument `--work_dir ${YOUR_WORK_DIR}`. + +### Train with multiple GPUs + +```shell +./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments] +``` + +Optional arguments are: + +- `--validate` (recommended): Perform evaluation every k epochs (default: 1) during the training. +- `--work_dir ${WORK_DIR}`: Override the working directory specified in the config file. +- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file. + +### Train with multiple machines + +If you run mmdetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can just use the script `slurm_train.sh`. + +```shell +./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [${GPUS}] +``` + +Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition. + +```shell +./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x.py /nfs/xxxx/mask_rcnn_r50_fpn_1x 16 +``` + +You can check [slurm_train.sh](tools/slurm_train.sh) for full arguments and environment variables. + +If you just have multiple machines connected via Ethernet, you can refer to +the PyTorch [launch utility](https://pytorch.org/docs/stable/distributed_deprecated.html#launch-utility). +It is usually slow if you do not have high-speed networking like InfiniBand. + + +## How-to + +### Use my own datasets + +The simplest way is to convert your dataset to existing dataset formats (COCO or PASCAL VOC). + +Here we show an example of adding a custom dataset of 5 classes, assuming it is also in COCO format. + +In `mmdet/datasets/my_dataset.py`: + +```python +from .coco import CocoDataset + + +class MyDataset(CocoDataset): + + CLASSES = ('a', 'b', 'c', 'd', 'e') +``` + +In `mmdet/datasets/__init__.py`: + +```python +from .my_dataset import MyDataset +``` + +Then you can use `MyDataset` in config files, with the same API as `CocoDataset`. + + +It is also fine if you do not want to convert the annotation format to COCO or PASCAL format. +Actually, we define a simple annotation format and all existing datasets are +processed to be compatible with it, either online or offline. + +The annotation of a dataset is a list of dicts; each dict corresponds to an image. +There are 3 fields, `filename` (relative path), `width` and `height`, for testing, +and an additional field `ann` for training. `ann` is also a dict containing at least 2 fields: +`bboxes` and `labels`, both of which are numpy arrays. Some datasets may provide +annotations like crowd/difficult/ignored bboxes; we use `bboxes_ignore` and `labels_ignore` +to cover them. + +Here is an example. +``` +[ + { + 'filename': 'a.jpg', + 'width': 1280, + 'height': 720, + 'ann': { + 'bboxes': <np.ndarray, float32> (n, 4), + 'labels': <np.ndarray, int64> (n, ), + 'bboxes_ignore': <np.ndarray, float32> (k, 4), + 'labels_ignore': <np.ndarray, int64> (k, ) (optional field) + } + }, + ... +] +``` + +There are two ways to work with custom datasets.
+ +- online conversion + + You can write a new Dataset class that inherits from `CustomDataset`, and override two methods + `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`, + like [CocoDataset](mmdet/datasets/coco.py) and [VOCDataset](mmdet/datasets/voc.py). + +- offline conversion + + You can convert the annotation format to the expected format above and save it to + a pickle or json file, like [pascal_voc.py](tools/convert_datasets/pascal_voc.py). + Then you can simply use `CustomDataset`. + +### Develop new components + +We basically categorize model components into 4 types. + +- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet. +- neck: the component between backbones and heads, e.g., FPN, PAFPN. +- head: the component for specific tasks, e.g., bbox prediction and mask prediction. +- RoI extractor: the part for extracting RoI features from feature maps, e.g., RoI Align. + +Here we show how to develop new components, using MobileNet as an example. + +1. Create a new file `mmdet/models/backbones/mobilenet.py`. + +```python +import torch.nn as nn + +from ..registry import BACKBONES + + +@BACKBONES.register_module +class MobileNet(nn.Module): + + def __init__(self, arg1, arg2): + pass + + def forward(self, x): # should return a tuple + pass +``` + +2. Import the module in `mmdet/models/backbones/__init__.py`. + +```python +from .mobilenet import MobileNet +``` + +3. Use it in your config file. + +```python +model = dict( + ... + backbone=dict( + type='MobileNet', + arg1=xxx, + arg2=xxx), + ... +``` + +For more information on how it works, you can refer to [TECHNICAL_DETAILS.md](TECHNICAL_DETAILS.md) (TODO). diff --git a/README.md b/README.md index bffd59a3b8d814ac6e440af2c502abd63320b56d..63836324bf23b219b3199e8bf0ccc931f83c157d 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ ## Introduction -The master branch works with **PyTorch 1.0**. If you would like to use PyTorch 0.4.1, +The master branch works with **PyTorch 1.0** or higher. If you would like to use PyTorch 0.4.1, please checkout to the [pytorch-0.4.1](https://github.com/open-mmlab/mmdetection/tree/pytorch-0.4.1) branch. mmdetection is an open source object detection toolbox based on PyTorch. It is @@ -24,7 +24,7 @@ a part of the open-mmlab project developed by [Multimedia Laboratory, CUHK](http - **Efficient** All basic bbox and mask operations run on GPUs now. - The training speed is about 5% ~ 20% faster than Detectron for different models. + The training speed is nearly 2x faster than Detectron and comparable to maskrcnn-benchmark. - **State of the art** @@ -108,149 +108,8 @@ Please refer to [INSTALL.md](INSTALL.md) for installation and dataset preparatio ## Inference with pretrained models -### Test a dataset +Please see [GETTING_STARTED.md](GETTING_STARTED.md) for the basic usage of mmdetection. -- [x] single GPU testing -- [x] multiple GPU testing -- [x] visualize detection results - -We allow to run one or multiple processes on each GPU, e.g. 8 processes on 8 GPU -or 16 processes on 8 GPU. When the GPU workload is not very heavy for a single -process, running multiple processes will accelerate the testing, which is specified -with the argument `--proc_per_gpu <PROCESS_NUM>`. - - -To test a dataset and save the results. - -```shell -python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --gpus <GPU_NUM> --out <OUT_FILE> -``` - -To perform evaluation after testing, add `--eval <EVAL_TYPES>`. Supported types are: -`[proposal_fast, proposal, bbox, segm, keypoints]`.
-`proposal_fast` denotes evaluating proposal recalls with our own implementation, -others denote evaluating the corresponding metric with the official coco api. - -For example, to evaluate Mask R-CNN with 8 GPUs and save the result as `results.pkl`. - -```shell -python tools/test.py configs/mask_rcnn_r50_fpn_1x.py <CHECKPOINT_FILE> --gpus 8 --out results.pkl --eval bbox segm -``` - -It is also convenient to visualize the results during testing by adding an argument `--show`. - -```shell -python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --show -``` - -### Test image(s) - -We provide some high-level apis (experimental) to test an image. - -```python -import mmcv -from mmcv.runner import load_checkpoint -from mmdet.models import build_detector -from mmdet.apis import inference_detector, show_result - -cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py') -cfg.model.pretrained = None - -# construct the model and load checkpoint -model = build_detector(cfg.model, test_cfg=cfg.test_cfg) -_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth') - -# test a single image -img = mmcv.imread('test.jpg') -result = inference_detector(model, img, cfg) -show_result(img, result) - -# test a list of images -imgs = ['test1.jpg', 'test2.jpg'] -for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')): - print(i, imgs[i]) - show_result(imgs[i], result) -``` - - -## Train a model - -mmdetection implements distributed training and non-distributed training, -which uses `MMDistributedDataParallel` and `MMDataParallel` respectively. - -### Distributed training (Single or Multiples machines) - -mmdetection potentially supports multiple launch methods, e.g., PyTorch’s built-in launch utility, slurm and MPI. - -We provide a training script using the launch utility provided by PyTorch. - -```shell -./tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> [optional arguments] -``` - -Supported arguments are: - -- --validate: perform evaluation every k (default=1) epochs during the training. -- --work_dir <WORK_DIR>: if specified, the path in config file will be replaced. - -Expected results in WORK_DIR: - -- log file -- saved checkpoints (every k epochs, defaults=1) -- a symbol link to the latest checkpoint - -**Important**: The default learning rate is for 8 GPUs. If you use less or more than 8 GPUs, you need to set the learning rate proportional to the GPU num. E.g., modify lr to 0.01 for 4 GPUs or 0.04 for 16 GPUs. - -### Non-distributed training - -Please refer to `tools/train.py` for non-distributed training, which is not recommended -and left for debugging. Even on a single machine, distributed training is preferred. - -### Train on custom datasets - -We define a simple annotation format. - -The annotation of a dataset is a list of dict, each dict corresponds to an image. -There are 3 field `filename` (relative path), `width`, `height` for testing, -and an additional field `ann` for training. `ann` is also a dict containing at least 2 fields: -`bboxes` and `labels`, both of which are numpy arrays. Some datasets may provide -annotations like crowd/difficult/ignored bboxes, we use `bboxes_ignore` and `labels_ignore` -to cover them. - -Here is an example. 
-``` -[ - { - 'filename': 'a.jpg', - 'width': 1280, - 'height': 720, - 'ann': { - 'bboxes': <np.ndarray> (n, 4), - 'labels': <np.ndarray> (n, ), - 'bboxes_ignore': <np.ndarray> (k, 4), - 'labels_ignore': <np.ndarray> (k, ) (optional field) - } - }, - ... -] -``` - -There are two ways to work with custom datasets. - -- online conversion - - You can write a new Dataset class inherited from `CustomDataset`, and overwrite two methods - `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`, like [CocoDataset](mmdet/datasets/coco.py) and [VOCDataset](mmdet/datasets/voc.py). - -- offline conversion - - You can convert the annotation format to the expected format above and save it to - a pickle or json file, like [pascal_voc.py](tools/convert_datasets/pascal_voc.py). - Then you can simply use `CustomDataset`. - -## Technical details - -Some implementation details and project structures are described in the [technical details](TECHNICAL_DETAILS.md). ## Citation