diff --git a/TECHNICAL_DETAILS.md b/TECHNICAL_DETAILS.md
index 43ea2e6e0f15e171b903f0f9281bdd0aa60058cb..85f1854940f2e4a41245ef80f2f39dbe3879f195 100644
--- a/TECHNICAL_DETAILS.md
+++ b/TECHNICAL_DETAILS.md
@@ -96,4 +96,4 @@
 Each process keeps an isolated model, data loader, and optimizer.
 Model parameters are only synchronized once at the begining.
 After a forward and backward pass, gradients will be allreduced among all GPUs, and the optimizer will update model parameters.
-Since the gradients are all reduced, the model parameter stays the same for all processes after the iteration.
+Since the gradients are allreduced, the model parameter stays the same for all processes after the iteration.
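
Review note: the invariant described in the changed sentence can be checked with a toy simulation. The sketch below is only an illustration in plain Python, not the project's implementation: `allreduce_mean`, `sgd_step`, and `run_iteration` are hypothetical helpers, the gradient values are made up, and the allreduce is assumed to average (a sum would preserve the same invariant).

```python
# Toy simulation of data-parallel SGD with allreduced gradients.
# No GPUs or communication library are involved; the "allreduce" here is a
# plain Python average whose result every simulated process receives.

def allreduce_mean(per_process_grads):
    """Return the element-wise mean gradient, one copy per process."""
    world_size = len(per_process_grads)
    mean = [sum(vals) / world_size for vals in zip(*per_process_grads)]
    return [list(mean) for _ in range(world_size)]

def sgd_step(params, grads, lr):
    """Plain SGD update applied independently by each process's optimizer."""
    return [p - lr * g for p, g in zip(params, grads)]

def run_iteration(params_per_process, grads_per_process, lr=0.1):
    """One iteration: allreduce the local gradients, then update locally."""
    reduced = allreduce_mean(grads_per_process)
    return [sgd_step(p, g, lr) for p, g in zip(params_per_process, reduced)]

# Parameters are synchronized once at the start: every process is identical.
initial = [1.0, -2.0, 0.5]
params = [list(initial) for _ in range(4)]

# Each process reads different data, so its local gradients differ
# (values below are invented for the example).
local_grads = [
    [0.1, 0.2, 0.3],
    [0.4, 0.0, -0.1],
    [-0.2, 0.6, 0.2],
    [0.3, -0.4, 0.0],
]

params = run_iteration(params, local_grads)

# Every process applied the same reduced gradient to the same starting
# parameters, so the copies still match without a second synchronization.
all_equal = all(p == params[0] for p in params)
print(all_equal)
```

Because each process starts from identical parameters and applies an identical reduced gradient, no further parameter broadcast is needed after the first one.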