Lines Matching full:backward
20 it with DDP, and then runs one forward pass, one backward pass, and an optimizer
49 # backward pass
50 loss_fn(outputs, labels).backward()
93 later will take care of gradient synchronization during the backward
101 order is because DDP expects gradients to become ready during the backward
105 be true, and when that happens it could hurt DDP backward speed as the
109 the backward pass when the gradient becomes ready.
113 backward on a subgraph of the model, and DDP finds out which parameters are
114 involved in the backward pass by traversing the autograd graph from the model
116 backward pass, the ``Reducer`` would only wait for unready parameters, but it
119 absent gradients forever during the backward pass. Note that traversing the
122 - **Backward Pass**: The ``backward()`` function is directly invoked on the loss
132 of all parameters. So after the backward pass, the ``grad`` field on the same
148 ``allreduce`` order across processes can lead to wrong results or DDP backward
190 provides the core implementation for gradient synchronization in the backward
211 …nts this overlap when used with TorchDynamo for compiling a whole forward and whole backward graph,
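The matches above all come from one DDP design note: lines 20, 49, and 50 point at its minimal training example, and the later lines at its discussion of how the ``Reducer`` hooks into autograd and runs ``allreduce`` as gradients become ready during ``backward()``. For context, here is a sketch of that forward/backward/optimizer pattern, assuming a CPU-only run on the ``gloo`` backend with a toy ``nn.Linear`` model; the address, port, sizes, and hyperparameters are illustrative placeholders rather than values taken from the matched file::

    import os

    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn as nn
    import torch.optim as optim
    from torch.nn.parallel import DistributedDataParallel as DDP


    def example(rank, world_size):
        # rendezvous settings for the default "env" init method (placeholder values)
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        # wrap the local model; DDP construction broadcasts parameters from rank 0
        # and registers the autograd hooks that the Reducer uses later
        model = nn.Linear(10, 10)
        ddp_model = DDP(model)

        loss_fn = nn.MSELoss()
        optimizer = optim.SGD(ddp_model.parameters(), lr=0.001)

        # forward pass
        outputs = ddp_model(torch.randn(20, 10))
        labels = torch.randn(20, 10)

        # backward pass: as each gradient becomes ready, the Reducer allreduces
        # it bucket by bucket, overlapping communication with computation
        loss_fn(outputs, labels).backward()

        # optimizer step: every rank now applies the same averaged gradients
        optimizer.step()

        dist.destroy_process_group()


    if __name__ == "__main__":
        mp.spawn(example, args=(2,), nprocs=2, join=True)

During ``loss_fn(outputs, labels).backward()`` the registered hooks fire per parameter, and the ``Reducer`` launches the ``allreduce`` operations that synchronize gradients across processes; ``optimizer.step()`` then updates each replica with the same averaged gradients.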