Multi-threaded read from disk?

cmo · Nov 16, 2012

Suppose I need to read many distinct, independent chunks of data from the same file saved on disk.

Is it possible to multi-thread these reads?

Related: Do all threads on the same processor use the same IO device to read from disk? If so, multi-threading would not speed up the reads at all; the threads would just be waiting in line.

(I am currently multi-threading with OpenMP.)
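For concreteness, here is a minimal sketch of what such an OpenMP read loop could look like. It assumes a POSIX system: it uses pread(), which reads at an explicit offset and does not move the shared file position, so all threads can share one descriptor without a lock. The names chunk_t, read_chunks_parallel and read_chunks_from_file are hypothetical, and whether this beats a plain serial loop depends on the storage device.

```c
#define _XOPEN_SOURCE 700          /* expose pread() on strict compilers */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one independent chunk of the file. */
typedef struct {
    off_t  offset;  /* where the chunk starts in the file */
    size_t length;  /* how many bytes to read */
    char  *dest;    /* caller-owned buffer of at least `length` bytes */
} chunk_t;

/* Read all chunks from one shared descriptor, one loop iteration per chunk.
   Returns 0 on success, -1 if any chunk hit an error or ended early. */
int read_chunks_parallel(int fd, chunk_t *chunks, int n)
{
    int failures = 0;

    /* Compiled without -fopenmp, the pragma is ignored and the loop is serial. */
    #pragma omp parallel for schedule(dynamic) reduction(+:failures)
    for (int i = 0; i < n; i++) {
        size_t done = 0;
        while (done < chunks[i].length) {
            ssize_t got = pread(fd, chunks[i].dest + done,
                                chunks[i].length - done,
                                chunks[i].offset + (off_t)done);
            if (got <= 0) {          /* error or unexpected end of file */
                failures++;
                break;               /* leaves the while loop only */
            }
            done += (size_t)got;     /* pread may return fewer bytes than asked */
        }
    }
    return failures ? -1 : 0;
}

/* Convenience wrapper: open the file read-only, read the chunks, close it. */
int read_chunks_from_file(const char *path, chunk_t *chunks, int n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = read_chunks_parallel(fd, chunks, n);
    close(fd);
    return rc;
}
```

Compile with -fopenmp (GCC/Clang) to get the parallel version. schedule(dynamic) is an assumption that suits chunks of uneven size; with equal-sized chunks the default schedule works just as well.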

Answer

Dan · Nov 16, 2012

Yes, it is possible. However:

Do all threads on the same processor use the same IO device to read from disk?

Yes: on a hard disk they all share the same read head. As an example, try copying two files in parallel rather than one after the other. The parallel copy will typically take significantly longer, because the OS's IO scheduler tries to keep the IO rate "fair", i.e. equal between the two threads/processes. As a result, the read head keeps jumping back and forth between different parts of the disk, slowing the whole process down A LOT. Reading a block of data takes very little time compared to seeking to it, so when you read two different parts of the disk at once, you spend most of the time seeking.
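To see why seeking dominates, here is a back-of-the-envelope cost model in C. The numbers used below (10 ms per seek, 100 MB/s sustained transfer, the scheduler switching files every 64 KiB, i.e. 0.0625 MB) are illustrative assumptions, not measurements, and the model ignores rotational latency, caching, readahead and request reordering by the IO scheduler. serial_ms and interleaved_ms are hypothetical helper names.

```c
/* Hypothetical cost model: every seek costs seek_ms, and once the head is
   in place data streams at mb_per_s. Everything else is ignored. */

/* Files read one after another: one seek per file, then contiguous reads. */
double serial_ms(int files, double mb_per_file, double seek_ms, double mb_per_s)
{
    double transfer_ms = files * mb_per_file / mb_per_s * 1000.0;
    return files * seek_ms + transfer_ms;
}

/* Files read "fairly" in parallel: the scheduler alternates between files
   every slice_mb, so every slice costs a seek back to the other file. */
double interleaved_ms(int files, double mb_per_file, double slice_mb,
                      double seek_ms, double mb_per_s)
{
    double slices = files * (mb_per_file / slice_mb);
    double transfer_ms = files * mb_per_file / mb_per_s * 1000.0;
    return slices * seek_ms + transfer_ms;
}
```

With two 100 MB files, serial_ms(2, 100, 10, 100) gives 2020 ms, while interleaved_ms(2, 100, 0.0625, 10, 100) gives 34000 ms: the same 2000 ms of actual transfer, buried under 3200 seeks. Larger slices shrink the penalty, which is why the exact slowdown depends on the OS and its scheduler.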

Note that all of this assumes you're using a hard disk. If you're using an SSD, parallel reads will not be slower, but they will not be faster either. Edit: according to the comments, parallel reads are actually faster on an SSD. With RAID the situation becomes more complicated and (obviously) depends on which kind of RAID you're using.

This is what it looks like on a hard disk (I've unwrapped the circular platter into a rectangle because ASCII circles are hard, and simplified the data layout to make it easier to read):

Assume the files are separated by some space on the platter like so:

|         |

A serial read looks like this (* indicates reading):

space ----->
|        *|  t
|        *|  i
|        *|  m
|        *|  e
|        *|  |
|       / |  |
|     /   |  |
|   /     |  V
|  /      |
|*        |
|*        |
|*        |
|*        |

A parallel read, on the other hand, looks like this:

|       \ |
|        *|
|       / |
|     /   |
|   /     |
|  /      |
|*        |
|  \      |
|    \    |
|     \   |
|       \ |
|        *|
|       / |
|     /   |
|   /     |
|  /      |
|*        |
|  \      |
|    \    |
|     \   |
|       \ |
|        *|

and so on.
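The two diagrams can be reproduced with a toy model in C: put file A on track 8 and file B on track 0 (hypothetical positions matching the columns of the sketch, with each file treated as sitting on a single track), generate the order in which blocks are requested, and count how often the head has to move. request_order and count_seeks are hypothetical names.

```c
/* Fill out[] (2 * blocks entries) with the track of each block request when
   two files are read one after the other (interleave == 0) or alternately,
   one block each in turn (interleave != 0). */
void request_order(int track_a, int track_b, int blocks, int interleave, int *out)
{
    for (int i = 0; i < 2 * blocks; i++) {
        if (interleave)
            out[i] = (i % 2 == 0) ? track_a : track_b;   /* A, B, A, B, ... */
        else
            out[i] = (i < blocks) ? track_a : track_b;   /* A, ..., A, B, ..., B */
    }
}

/* Count seeks (changes of track) in a request sequence; if travel is not
   NULL, also store the total head movement in tracks. */
int count_seeks(const int *tracks, int n, int *travel)
{
    int seeks = 0, dist = 0;
    for (int i = 1; i < n; i++) {
        int d = tracks[i] - tracks[i - 1];
        if (d != 0) {
            seeks++;
            dist += d < 0 ? -d : d;
        }
    }
    if (travel)
        *travel = dist;
    return seeks;
}
```

With four blocks per file, the serial order 8,8,8,8,0,0,0,0 needs 1 seek and 8 tracks of travel, while the alternating order 8,0,8,0,8,0,8,0 needs 7 seeks and 56 tracks of travel for exactly the same data.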