Translation (58): How does the number of workers in the PyTorch dataloader actually work?
If you spot any translation issues, feel free to point them out in the comments. Thanks.
This one really does have only one answer.
How does the number of workers parameter in PyTorch dataloader actually work?
floyd asked:
- If `num_workers` is 2, does that mean it will put 2 batches in RAM and send 1 of them to the GPU, or does it put 3 batches in RAM and then send 1 of them to the GPU?
- What actually happens when the number of workers is higher than the number of CPU cores? I tried it and it worked fine, but how does it work? (I thought the maximum number of workers I could choose was the number of cores.)
- If I set `num_workers` to 3 and during training there were no batches in memory for the GPU, does the main process wait for its workers to read the batches, or does it read a single batch itself (without waiting for the workers)?
Answers:
Shihab Shahriar Khan - vote: 71
When `num_workers > 0`, only these workers will retrieve data; the main process won't. So when `num_workers=2` you have at most 2 workers simultaneously putting data into RAM, not 3.

Well, our CPU can usually run around 100 processes without trouble, and these worker processes aren't special in any way, so having more workers than CPU cores is OK. But is it efficient? That depends on how busy your CPU cores are with other tasks, the speed of the CPU, the speed of your hard disk, and so on. In short, it's complicated, so setting the number of workers to the number of cores is a good rule of thumb, nothing more.
Nope. Remember that `DataLoader` doesn't just randomly return whatever is available in RAM right now; it uses `batch_sampler` to decide which batch to return next. Each batch is assigned to a worker, and the main process waits until the desired batch has been retrieved by its assigned worker.
Lastly, to clarify: it isn't `DataLoader`'s job to send anything directly to the GPU; you explicitly call `cuda()` for that.

EDIT: Don't call `cuda()` inside `Dataset`'s `__getitem__()` method; please look at @psarka's comment for the reasoning.

- Translator's note, psarka's comment:

Just a remark on the last sentence: it is probably not a good idea to call `.cuda()` in the `Dataset` object, as it will have to move each sample (rather than the batch) to the GPU separately, incurring a lot of overhead.
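A minimal sketch of the pattern the answer recommends: the `DataLoader` yields CPU tensors, and the whole batch is moved to the device with one explicit call in the training loop, rather than per sample inside `__getitem__` (the tiny `TensorDataset` here is just illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A tiny in-memory dataset; the DataLoader itself only produces CPU tensors.
data = TensorDataset(torch.randn(8, 3), torch.randint(0, 2, (8,)))
loader = DataLoader(data, batch_size=4)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for x, y in loader:
    # The explicit per-batch transfer: one .to(device) call moves the whole
    # batch at once, instead of one transfer per sample in __getitem__.
    x, y = x.to(device), y.to(device)
    # ... forward/backward pass would go here ...
```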