Translation (57): Avoiding 'CUDA out of memory' in PyTorch
If you spot any issues with the translation, please point them out in the comments. Thanks.
How to avoid 'CUDA out of memory' in PyTorch
voilalex asked:
I think it's a pretty common message for PyTorch users with low GPU memory:
RuntimeError: CUDA out of memory. Tried to allocate ? MiB (GPU ?; ? GiB total capacity; ? GiB already allocated; ? MiB free; ? cached)
I tried to process an image by loading each layer to the GPU and then moving it back:
for m in self.children():
    m.cuda()
    x = m(x)
    m.cpu()
    torch.cuda.empty_cache()
But it doesn't seem to be very effective. I'm wondering whether there are any tips and tricks to train large deep learning models while using little GPU memory.
Answers:
SHAGUN SHARMA - vote: 57
Although
import torch
torch.cuda.empty_cache()
provides a good way to clear the occupied CUDA memory, and you can also manually clear variables that are no longer in use with
import gc
del variables
gc.collect()
the error may still appear after running these commands, because PyTorch doesn't actually free the memory; it only clears the references to the memory occupied by the variables. So reducing the batch_size (after restarting the kernel) and finding the optimum batch_size is the best option, although it is not always a feasible one.
Another way to get a deeper insight into how GPU memory is allocated is to use:
torch.cuda.memory_summary(device=None, abbreviated=False)
Both arguments are optional. This gives a readable summary of memory allocation and helps you figure out why CUDA is running out of memory, so you can adjust your code and avoid the error next time.
Passing the data iteratively can help, and changing the size of your network's layers or breaking them down can also be effective (the model itself can occupy a significant amount of memory, for example in transfer learning).
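To make the inspection advice concrete, here is a minimal sketch, not from the original answer; the helper name report_gpu_memory is made up for illustration, while the torch.cuda calls are the ones mentioned above:

import gc
import torch

def report_gpu_memory():
    # Hypothetical helper, for illustration only.
    if not torch.cuda.is_available():
        return
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")
    print(torch.cuda.memory_summary(device=None, abbreviated=False))

# Drop Python references to large tensors, collect them, then release cached blocks.
gc.collect()
torch.cuda.empty_cache()
report_gpu_memory()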
Rahul - vote: 32
Just reduce the batch_size and it will work. While I was training, I got the following error:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 4.29 GiB already allocated; 10.12 MiB free; 4.46 GiB reserved in total by PyTorch)
I was using a batch size of 32, so I just changed it to 15 and it worked for me.
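In practice the batch size is set where the DataLoader is built. A minimal sketch, not part of the original answer (the dummy tensors exist only to make it self-contained):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data, only so the sketch runs on its own.
images = torch.randn(1000, 3, 224, 224)
labels = torch.randint(0, 10, (1000,))
train_dataset = TensorDataset(images, labels)

# Dropping batch_size from 32 to 15, as in the answer, shrinks the memory
# needed per forward/backward pass.
train_loader = DataLoader(train_dataset, batch_size=15, shuffle=True)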
Nicolas Gervais - vote: 21
Send the batches to CUDA iteratively and use small batch sizes. Don't send all of your data to CUDA at once at the beginning. Instead, do it like this:
for e in range(epochs):
    for images, labels in train_loader:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()
        # blablabla
You can also use dtypes that take less memory, such as torch.float16 or torch.half.
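A minimal sketch of the dtype advice, assuming a plain nn.Linear model on a CUDA device (casting an entire model to float16 can hurt numerical stability, so treat this as an illustration rather than a recommendation):

import torch
import torch.nn as nn

model = nn.Linear(1024, 10).cuda().half()  # weights stored as torch.float16
x = torch.randn(8, 1024, device="cuda", dtype=torch.half)
out = model(x)  # activations are float16 too, roughly halving their memory
print(out.dtype)  # torch.float16

In more recent PyTorch versions, automatic mixed precision (torch.cuda.amp.autocast) is usually a safer way to get the same memory savings, but that goes beyond the original answer.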