Translation (56): Gradient Clipping in PyTorch

MWHLS • 2022/04/16 7:40 am • Python, PyTorch, Programming Languages

If you spot any translation problems, please point them out in the comments. Thanks.

How to do gradient clipping in pytorch?

Gulzar asked:
- What is the correct way to perform gradient clipping in pytorch?
- I have an exploding gradients problem.
Answers:
- Rahul - vote: 143
- A more complete example from here:
- ```
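# Fragment from a training loop: model, optimizer, data, hidden, targets
# and args are defined elsewhere.
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
loss.backward()

# Clip after backward() and before step(): rescales all gradients in place
# so that their combined norm is at most args.clip.
torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
optimizer.step()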
```
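
The fragment above relies on names defined elsewhere. As a minimal, self-contained sketch of the same pattern, the following assumes a toy linear model and random data (both hypothetical, not from the answer), with `max_norm` standing in for `args.clip`. It also recomputes the gradient norm after clipping to check that the clip took effect.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup: a small regression model and random data.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(32, 10)
targets = 100.0 * torch.randn(32, 1)  # large targets to provoke large gradients
max_norm = 1.0  # plays the role of args.clip

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# clip_grad_norm_ returns the total norm measured *before* clipping.
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# Recompute the norm over all gradients to confirm the clip took effect.
norm_after = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)
optimizer.step()

print(f"grad norm before clipping: {norm_before.item():.2f}")
print(f"grad norm after clipping:  {norm_after.item():.2f}")
```

Logging the returned pre-clip norm over the course of training is a cheap way to see how often the threshold is actually hit.
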
- Charles Xu - vote: 0
- Well, I ran into the same error. I tried clipping the gradient norm, but it doesn't work.
  Translator's note: in the comments, the answerer clarified that "doesn't work" means it "still gives a 'nan'".
- I don't want to change the network or add regularizers, so I changed the optimizer to Adam, and it works.
- Then I used the model pretrained with Adam to initialize training and used SGD + momentum for fine-tuning. It is now working.
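
The two-stage recipe described in this answer could look roughly like the sketch below. Everything in it is an assumption made for illustration: the toy regression problem, the learning rates, the step counts, and the helper name `train` are hypothetical and do not come from the answer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy problem standing in for the answerer's network.
inputs = torch.randn(64, 10)
targets = inputs @ torch.randn(10, 1)
model = nn.Linear(10, 1)
max_norm = 1.0  # assumed clipping threshold


def train(model, optimizer, steps):
    """Run `steps` full-batch updates with gradient-norm clipping."""
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
    # Report the loss after the final update.
    with torch.no_grad():
        return nn.functional.mse_loss(model(inputs), targets).item()


with torch.no_grad():
    initial_loss = nn.functional.mse_loss(model(inputs), targets).item()

# Stage 1: train with Adam.
adam_loss = train(model, torch.optim.Adam(model.parameters(), lr=0.05), steps=100)

# Stage 2: keep the Adam-trained weights and fine-tune with SGD + momentum.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
final_loss = train(model, sgd, steps=100)

print(f"initial: {initial_loss:.4f}  after Adam: {adam_loss:.4f}  after SGD: {final_loss:.4f}")
```

In a real project the Adam-trained weights would more likely be saved with `model.state_dict()` and loaded into a fresh run; reusing the same `model` object here keeps the sketch short.
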
- hkchengrex - vote: 3
- And if you are using Automatic Mixed Precision (AMP), you need to do a bit more before clipping:
- ```
# scaler is a torch.cuda.amp.GradScaler created once, before the training loop.
optimizer.zero_grad()
loss, hidden = model(data, hidden, targets)
scaler.scale(loss).backward()

# Unscales the gradients of optimizer's assigned params in-place
scaler.unscale_(optimizer)

# Since the gradients of optimizer's assigned params are unscaled, clips as usual:
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# optimizer's gradients are already unscaled, so scaler.step does not unscale them,
# although it still skips optimizer.step() if the gradients contain infs or NaNs.
scaler.step(optimizer)

# Updates the scale for next iteration.
scaler.update()
```
- Reference: https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping
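
For a version that runs end to end, here is a hedged sketch of the AMP sequence with a toy model and random data (both hypothetical). It assumes AMP should only be enabled when a CUDA device is present; on CPU the scaler is created with `enabled=False`, so its methods act as pass-throughs and the loop reduces to ordinary clipped training. It uses `torch.cuda.amp.GradScaler` as in the answer; recent PyTorch releases also expose the class as `torch.amp.GradScaler` and may print a deprecation warning for the older name.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumption: use AMP only when a CUDA device is present; on CPU the scaler
# is created with enabled=False, which turns its methods into pass-throughs.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
inputs = torch.randn(32, 10, device=device)
targets = torch.randn(32, 1, device=device)
max_norm = 1.0

for _ in range(3):
    optimizer.zero_grad()
    # Run the forward pass under autocast so eligible ops use reduced precision.
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()

    scaler.unscale_(optimizer)  # gradients are now in their true scale
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

    scaler.step(optimizer)  # skipped internally if gradients hold infs/NaNs
    scaler.update()
```

The ordering is the point of the example: clipping scaled gradients would compare `max_norm` against values inflated by the loss scale, so `unscale_` has to come first.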