# Getting Started

This page provides basic tutorials about the usage of mmdetection.
For installation instructions, please see [INSTALL.md](INSTALL.md).

## Inference with pretrained models

We provide testing scripts to evaluate a whole dataset (COCO, PASCAL VOC, etc.),
and also some high-level APIs for easier integration into other projects.

### Test a dataset

- [x] single GPU testing
- [x] multiple GPU testing
- [x] visualize detection results

You can use the following commands to test a dataset.

```shell
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]

# multi-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
```

Optional arguments:
- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values are: `proposal_fast`, `proposal`, `bbox`, `segm`, `keypoints`.
- `--show`: If specified, detection results will be plotted on the images and shown in a new window. Only applicable for single-GPU testing.

Examples:

Assume that you have already downloaded the checkpoints to `checkpoints/`.

1. Test Faster R-CNN and show the results.

```shell
python tools/test.py configs/faster_rcnn_r50_fpn_1x.py \
    checkpoints/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth \
    --show
```

2. Test Mask R-CNN and evaluate the bbox and mask AP.

```shell
python tools/test.py configs/mask_rcnn_r50_fpn_1x.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    --out results.pkl --eval bbox segm
```

3. Test Mask R-CNN with 8 GPUs, and evaluate the bbox and mask AP.

```shell
./tools/dist_test.sh configs/mask_rcnn_r50_fpn_1x.py \
    checkpoints/mask_rcnn_r50_fpn_1x_20181010-069fa190.pth \
    8 --out results.pkl --eval bbox segm
```

### High-level APIs for testing images

Here is an example of building the model and testing given images.

```python
import mmcv
from mmcv.runner import load_checkpoint

from mmdet.models import build_detector
from mmdet.apis import inference_detector, show_result

cfg = mmcv.Config.fromfile('configs/faster_rcnn_r50_fpn_1x.py')
cfg.model.pretrained = None

# construct the model and load the checkpoint
model = build_detector(cfg.model, test_cfg=cfg.test_cfg)
_ = load_checkpoint(model, 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/faster_rcnn_r50_fpn_1x_20181010-3d1b3351.pth')

# test a single image
img = mmcv.imread('test.jpg')
result = inference_detector(model, img, cfg)
show_result(img, result)

# test a list of images
imgs = ['test1.jpg', 'test2.jpg']
for i, result in enumerate(inference_detector(model, imgs, cfg, device='cuda:0')):
    print(i, imgs[i])
    show_result(imgs[i], result)
```

## Train a model

mmdetection supports both distributed and non-distributed training, using `MMDistributedDataParallel` and `MMDataParallel` respectively.

All outputs (log files and checkpoints) will be saved to the working directory, which is specified by `work_dir` in the config file.

**\*Important\***: The default learning rate in the config files is for 8 GPUs. If you use fewer or more than 8 GPUs, you need to scale the learning rate proportionally to the number of GPUs, e.g., 0.01 for 4 GPUs and 0.04 for 16 GPUs.
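For instance, with the base learning rate of 0.02 implied by the numbers above, this linear scaling rule works out as follows (a minimal sketch; the actual base value is whatever `optimizer.lr` is set to in your config):

```python
# Linear scaling rule: scale the learning rate proportionally to the GPU count.
# 0.02 is the 8-GPU base implied by the examples above; check the optimizer
# settings in your own config, as the base value may differ.
base_lr, base_gpus = 0.02, 8
for num_gpus in (4, 8, 16):
    print(num_gpus, 'GPUs -> lr =', base_lr * num_gpus / base_gpus)
# 4 GPUs -> lr = 0.01, 8 GPUs -> lr = 0.02, 16 GPUs -> lr = 0.04
```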
### Train with a single GPU

```shell
python tools/train.py ${CONFIG_FILE}
```

If you want to specify the working directory in the command, you can add an argument `--work_dir ${YOUR_WORK_DIR}`.

### Train with multiple GPUs

```shell
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
```

Optional arguments are:
- `--validate` (recommended): Perform evaluation every k epochs during training (k=1 by default).
- `--work_dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.

### Train with multiple machines

If you run mmdetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can just use the script `slurm_train.sh`.

```shell
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR} [${GPUS}]
```

Here is an example of using 16 GPUs to train Mask R-CNN on the dev partition.

```shell
./tools/slurm_train.sh dev mask_r50_1x configs/mask_rcnn_r50_fpn_1x.py /nfs/xxxx/mask_rcnn_r50_fpn_1x 16
```

You can check [slurm_train.sh](tools/slurm_train.sh) for full arguments and environment variables.

If you have multiple machines connected only with Ethernet, you can refer to the PyTorch [launch utility](https://pytorch.org/docs/stable/distributed_deprecated.html#launch-utility). Training is usually slow if you do not have high-speed networking like InfiniBand.

## How-to

### Use my own datasets

The simplest way is to convert your dataset to an existing dataset format (COCO or PASCAL VOC).

Here we show an example of adding a custom dataset of 5 classes, assuming it is also in COCO format.

In `mmdet/datasets/my_dataset.py`:

```python
from .coco import CocoDataset


class MyDataset(CocoDataset):

    CLASSES = ('a', 'b', 'c', 'd', 'e')
```

In `mmdet/datasets/__init__.py`:

```python
from .my_dataset import MyDataset
```

Then you can use `MyDataset` in config files, with the same API as `CocoDataset`.

It is also fine if you do not want to convert the annotation format to COCO or PASCAL VOC. Actually, we define a simple annotation format, and all existing datasets are processed to be compatible with it, either online or offline.

The annotation of a dataset is a list of dicts, and each dict corresponds to one image. There are three fields for testing: `filename` (relative path), `width`, and `height`, plus an additional field `ann` for training. `ann` is also a dict containing at least two fields, `bboxes` and `labels`, both of which are numpy arrays. Some datasets provide annotations such as crowd/difficult/ignored bboxes; we use `bboxes_ignore` and `labels_ignore` to cover them. Here is an example.

```
[
    {
        'filename': 'a.jpg',
        'width': 1280,
        'height': 720,
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4),
            'labels': <np.ndarray, int64> (n, ),
            'bboxes_ignore': <np.ndarray, float32> (k, 4),
            'labels_ignore': <np.ndarray, int64> (k, ) (optional field)
        }
    },
    ...
]
```

There are two ways to work with custom datasets.

- online conversion

  You can write a new Dataset class inherited from `CustomDataset`, and overwrite two methods, `load_annotations(self, ann_file)` and `get_ann_info(self, idx)`, like [CocoDataset](mmdet/datasets/coco.py) and [VOCDataset](mmdet/datasets/voc.py); see the sketch after this list.

- offline conversion

  You can convert the annotation format to the expected format above and save it to a pickle or json file, like [pascal_voc.py](tools/convert_datasets/pascal_voc.py). Then you can simply use `CustomDataset`.
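As a concrete illustration of the online route, here is a minimal sketch of such a class. The class name is hypothetical, and the sketch assumes the annotation file already stores a list in the middle format above (so `mmcv.load` can read it directly) and that `CustomDataset` keeps the loaded list as `self.img_infos`; a real converter would parse your own format inside `load_annotations` instead.

```python
import mmcv

from .custom import CustomDataset


class MyOnlineDataset(CustomDataset):  # hypothetical name
    """Online conversion: produce the middle format on the fly."""

    def load_annotations(self, ann_file):
        # Must return a list of dicts in the middle format shown above
        # ('filename', 'width', 'height', plus 'ann' for training).
        # Here we assume the file is already such a list saved as
        # pickle/json; otherwise, parse and convert your format here.
        return mmcv.load(ann_file)

    def get_ann_info(self, idx):
        # Must return the 'ann' dict (bboxes, labels, ...) of image idx.
        return self.img_infos[idx]['ann']
```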
### Develop new components

We basically categorize model components into 4 types.

- backbone: usually an FCN network to extract feature maps, e.g., ResNet, MobileNet.
- neck: the component between backbones and heads, e.g., FPN, PAFPN.
- head: the component for specific tasks, e.g., bbox prediction and mask prediction.
- roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align.

Here we show how to develop new components with an example of MobileNet.

1. Create a new file `mmdet/models/backbones/mobilenet.py`.

```python
import torch.nn as nn

from ..registry import BACKBONES


@BACKBONES.register_module
class MobileNet(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass
```

2. Import the module in `mmdet/models/backbones/__init__.py`.

```python
from .mobilenet import MobileNet
```

3. Use it in your config file.

```python
model = dict(
    ...
    backbone=dict(
        type='MobileNet',
        arg1=xxx,
        arg2=xxx),
    ...
)
```

For more information on how it works, you can refer to [TECHNICAL_DETAILS.md](TECHNICAL_DETAILS.md) (TODO).
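To make the backbone contract in step 1 concrete, here is a minimal runnable sketch (a toy stand-in, not a real MobileNet, and not wired to the registry): a backbone takes an image batch and returns a tuple of feature maps, one per scale, which the neck (e.g., FPN) then consumes.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Toy backbone illustrating the interface only (hypothetical)."""

    def __init__(self, in_channels=3, base_channels=16):
        super(TinyBackbone, self).__init__()
        # two downsampling stages, each halving the spatial resolution
        self.stage1 = nn.Sequential(
            nn.Conv2d(in_channels, base_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.stage2 = nn.Sequential(
            nn.Conv2d(base_channels, base_channels * 2, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x):
        c1 = self.stage1(x)
        c2 = self.stage2(c1)
        return (c1, c2)  # a tuple of multi-scale feature maps


feats = TinyBackbone()(torch.rand(1, 3, 64, 64))
print([tuple(f.shape) for f in feats])  # [(1, 16, 32, 32), (1, 32, 16, 16)]
```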