If `num_workers` is 2, does that mean it will put 2 batches in RAM and send 1 of them to the GPU, or does it put 3 batches in RAM and then send 1 of them to the GPU? Also, what happens if `num_workers` is set to 3 and during training there are no batches in memory for the GPU? Does the main process wait for its workers to read the batches, or does it read a single batch itself (without waiting for the workers)?
When `num_workers > 0`, only those workers retrieve data; the main process doesn't. So with `num_workers=2` you have at most 2 workers simultaneously putting data into RAM, not 3.

The `DataLoader` doesn't just return whatever happens to be available in RAM at the moment; it uses its `batch_sampler` to decide which batch to return next. Each batch is assigned to a worker, and the main process waits until the desired batch has been retrieved by its assigned worker.

Lastly, to clarify: it isn't the `DataLoader`'s job to send anything directly to the GPU; you explicitly call `cuda()` for that.
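A minimal sketch of the ordering behavior, assuming PyTorch is installed (the `RangeDataset` class is a made-up toy dataset for illustration): even with multiple workers fetching concurrently, batches arrive in the order fixed by the sampler, not in whichever order workers finish.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RangeDataset(Dataset):
    """Toy dataset whose item at index i is simply the tensor i."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        return torch.tensor(idx)

# Two worker processes fetch items, but the main process still yields
# batches in the order decided by the (sequential) batch_sampler.
loader = DataLoader(RangeDataset(8), batch_size=2, shuffle=False, num_workers=2)
batches = [b.tolist() for b in loader]
# → [[0, 1], [2, 3], [4, 5], [6, 7]], regardless of worker timing
```

If a worker assigned an earlier batch is slow, the main process blocks on it rather than skipping ahead to a batch another worker has already finished.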
EDIT: Don't call `cuda()` inside the `Dataset`'s `__getitem__()` method; please see @psarka's comment for the reasoning.
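A sketch of where the device transfer belongs, assuming PyTorch is installed (the toy `TensorDataset` stands in for real data): `__getitem__` returns CPU tensors, and the move to the GPU happens in the training loop, after the `DataLoader` has assembled the batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data: 16 samples of 4 features with binary labels.
ds = TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,)))
loader = DataLoader(ds, batch_size=4,
                    pin_memory=torch.cuda.is_available())  # pinned memory speeds host-to-GPU copies

for x, y in loader:
    # Move the batch to the device here, in the main process --
    # not inside Dataset.__getitem__().
    x, y = x.to(device), y.to(device)
    # ... forward pass, loss, backward, step ...
```

Keeping `__getitem__` on the CPU lets worker processes prepare data in parallel, while the single main process handles all GPU transfers.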