Translation (56): Gradient Clipping in PyTorch
If you spot any issues with the translation, please point them out in the comments. Thanks.
How to do gradient clipping in PyTorch?
Gulzar asked:
- What is the correct way to perform gradient clipping in PyTorch?
- I have an exploding gradients problem.
Answers:
Rahul - vote: 143
A more complete example can be found here.
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()
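For context, here is a minimal self-contained sketch of the same pattern (not the answerer's code): the toy linear model, the synthetic data, and max_norm=1.0 are illustrative assumptions.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x = torch.randn(32, 10)   # synthetic inputs
y = torch.randn(32, 1)    # synthetic targets

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    # Rescale all gradients in-place so their combined norm is at most max_norm
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

torch.nn.utils.clip_grad_norm_ rescales the gradients jointly by their total norm; torch.nn.utils.clip_grad_value_ is the element-wise alternative if you want to clamp each gradient entry to a fixed range instead.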
Charles Xu - vote: 0
I met the same error. I tried clipping the gradient norm, but it didn't work (translator's note: the answerer clarified in the comments that "doesn't work" means it still gave a NaN). I didn't want to change the network or add regularizers, so I changed the optimizer to Adam, and it worked.
Specifically, I then used the model pretrained with Adam to initialize training and fine-tuned with SGD + momentum.
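A hedged sketch of that two-stage recipe; the checkpoint file name, learning rates, and toy model below are illustrative assumptions, not the answerer's actual setup.

import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# Stage 1: pretrain with Adam
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# ... run the usual training loop here ...
torch.save(model.state_dict(), "pretrained_adam.pth")  # hypothetical checkpoint name

# Stage 2: reload the Adam-pretrained weights and fine-tune with SGD + momentum
model.load_state_dict(torch.load("pretrained_adam.pth"))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# ... fine-tuning loop; gradient clipping can still be applied as above ...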
hkchengrex - vote: 3
And if you are using Automatic Mixed Precision (AMP), you need a couple of extra steps before clipping:
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
scaler.scale(loss).backward()

# Unscales the gradients of optimizer's assigned params in-place
scaler.unscale_(optimizer)

# Since the gradients of optimizer's assigned params are unscaled, clips as usual:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# optimizer's gradients are already unscaled, so scaler.step does not unscale them,
# although it still skips optimizer.step() if the gradients contain infs or NaNs.
scaler.step(optimizer)

# Updates the scale for next iteration.
scaler.update()
Reference: https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping
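For completeness, a hedged sketch of a full AMP step built around the snippet above, following the linked AMP examples; the toy model, synthetic data, and max_norm value are assumptions, and it requires a CUDA device.

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 10, device=device)
y = torch.randn(32, 1, device=device)

for step in range(100):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in mixed precision
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()     # scale the loss so fp16 gradients don't underflow
    scaler.unscale_(optimizer)        # unscale gradients in-place before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)            # skips optimizer.step() if gradients contain inf/NaN
    scaler.update()                   # adjust the scale factor for the next iteration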