Commit ae856e11 authored by Kai Chen, committed by GitHub

Update some docs (#852)

* update some docs

* update the lr adjusting rule
parent d95727b2
@@ -27,7 +27,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [-
 Optional arguments:
 - `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
 - `EVAL_METRICS`: Items to be evaluated on the results. Allowed values are: `proposal_fast`, `proposal`, `bbox`, `segm`, `keypoints`.
-- `--show`: If specified, detection results will be plotted on the images and shown in a new window. Only applicable for single GPU testing.
+- `--show`: If specified, detection results will be plotted on the images and shown in a new window. (Only applicable for single GPU testing.)
 Examples:
@@ -90,9 +90,8 @@ which uses `MMDistributedDataParallel` and `MMDataParallel` respectively.
 All outputs (log files and checkpoints) will be saved to the working directory,
 which is specified by `work_dir` in the config file.
-**\*Important\***: The default learning rate in config files is for 8 GPUs.
-If you use less or more than 8 GPUs, you need to set the learning rate proportional
-to the GPU num, e.g., 0.01 for 4 GPUs and 0.04 for 16 GPUs.
+**\*Important\***: The default learning rate in config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16).
+According to the [Linear Scaling Rule](https://arxiv.org/abs/1706.02677), you need to set the learning rate proportional to the batch size if you use different GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu.
 ### Train with a single GPU
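
The learning rate rule in this hunk can be checked with a few lines of Python. This is only a sketch: the function name `scale_lr` is hypothetical, and the base learning rate of 0.02 is not stated in the diff. It is inferred from the examples given there (lr=0.01 at batch size 8 implies 0.02 at the default batch size of 16), so verify it against the config you actually use.

```python
def scale_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Scale the learning rate linearly with the total batch size.

    base_lr=0.02 is an assumption inferred from the examples in the docs
    (lr=0.01 for a total batch size of 8); it is not read from any config.
    """
    batch_size = num_gpus * imgs_per_gpu
    return base_lr * batch_size / base_batch_size


# Default setting from the docs: 8 GPUs * 2 img/gpu (batch size 16).
print(scale_lr(8, 2))
# The two examples from the docs: 4 GPUs * 2 img/gpu and 16 GPUs * 4 img/gpu.
print(scale_lr(4, 2))
print(scale_lr(16, 4))
```

Note that the rule depends on the total batch size, not on the GPU count alone, which is the point of the changed wording.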
@@ -110,10 +109,14 @@ If you want to specify the working directory in the command, you can add an argu
 Optional arguments are:
-- `--validate` (recommended): Perform evaluation at every k (default=1) epochs during the training.
+- `--validate` (**strongly recommended**): Perform evaluation at every k (default value is 1, which can be modified like [this](configs/mask_rcnn_r50_fpn_1x.py#L174)) epochs during the training.
 - `--work_dir ${WORK_DIR}`: Override the working directory specified in the config file.
 - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
+Difference between `resume_from` and `load_from`:
+`resume_from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally.
+`load_from` only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.
 ### Train with multiple machines
 If you run mmdetection on a cluster managed with [slurm](https://slurm.schedmd.com/), you can just use the script `slurm_train.sh`.
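
The difference between `resume_from` and `load_from` described in this hunk can be illustrated with a toy model of a checkpoint. This is a sketch, not mmdetection code: the function `start_training` and the dictionary keys `weights`, `optimizer`, and `epoch` are hypothetical stand-ins for whatever a real checkpoint file contains.

```python
def start_training(checkpoint, mode):
    """Return the initial training state for a hypothetical trainer.

    `checkpoint` is a plain dict standing in for a saved checkpoint file;
    its keys are assumptions made for this sketch.
    """
    if mode == "resume_from":
        # Restore everything: weights, optimizer status, and the epoch counter.
        return {
            "weights": checkpoint["weights"],
            "optimizer": checkpoint["optimizer"],
            "epoch": checkpoint["epoch"],
        }
    if mode == "load_from":
        # Take the weights only; the optimizer starts fresh and the epoch is 0.
        return {"weights": checkpoint["weights"], "optimizer": None, "epoch": 0}
    raise ValueError(f"unknown mode: {mode}")


checkpoint = {
    "weights": [0.1, 0.2],
    "optimizer": {"momentum": [0.0, 0.0]},
    "epoch": 7,
}
resumed = start_training(checkpoint, "resume_from")
finetuned = start_training(checkpoint, "load_from")
print(resumed["epoch"], finetuned["epoch"])
```

An interrupted run continues from epoch 7 with its optimizer status intact, while a finetuning run reuses the weights and starts from epoch 0.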
......
@@ -36,6 +36,9 @@ FPN structure in [Path Aggregation Network for Instance Segmentation](https://ar
 1. create a new file in `mmdet/models/necks/pafpn.py`.
 ```python
+from ..registry import NECKS
+@NECKS.register
 class PAFPN(nn.Module):
     def __init__(self,
@@ -97,3 +100,7 @@ Model parameters are only synchronized once at the beginning.
 After a forward and backward pass, gradients will be allreduced among all GPUs,
 and the optimizer will update model parameters.
 Since the gradients are allreduced, the model parameter stays the same for all processes after the iteration.
+## Other information
+For more information, please refer to our [technical report](https://arxiv.org/abs/1906.07155).
\ No newline at end of file
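
The context lines of this last hunk explain why parameters stay identical across processes. A small pure-Python simulation makes the argument concrete. It is a sketch under stated assumptions: `allreduce_mean` and `sgd_step` are hypothetical helpers, the allreduce is modeled as an element-wise average, and plain SGD stands in for the real optimizer.

```python
def allreduce_mean(grads_per_worker):
    """Average gradients element-wise across workers (a stand-in for allreduce)."""
    num_workers = len(grads_per_worker)
    return [sum(g) / num_workers for g in zip(*grads_per_worker)]


def sgd_step(params, grads, lr):
    """Apply one plain SGD update and return the new parameters."""
    return [p - lr * g for p, g in zip(params, grads)]


# Both workers start from the same parameters (synchronized once at the beginning).
params = [[1.0, 2.0], [1.0, 2.0]]
# Each worker computes different gradients from its own mini-batch.
local_grads = [[0.5, -1.0], [1.5, 3.0]]

# After the allreduce, every worker holds the same gradients...
shared_grads = allreduce_mean(local_grads)
# ...so the same update applied to the same parameters keeps the workers in sync.
params = [sgd_step(p, shared_grads, lr=0.1) for p in params]
print(params[0] == params[1])
```

Because every worker applies an identical update to identical parameters, no further parameter synchronization is needed after the first one.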