I have to read an 8192x8192 matrix into memory. I want to do it as fast as possible.
Right now I have this structure:
char inputFile[8192][8192*4]; // I know the numbers are at max 3 digits
int8_t matrix[8192][8192]; // Matrix to be populated
// Read entire file line by line using fgets
while (fgets (inputFile[lineNum++], MAXCOLS, fp));
//Populate the matrix in parallel,
for (t = 0; t < NUM_THREADS; t++){
pthread_create(&threads[t], NULL, ParallelRead, (void *)t);
}
In the function ParallelRead, I parse each line, call atoi, and populate the matrix. The parallelism is line-wise: thread t parses lines t, t + NUM_THREADS, t + 2 * NUM_THREADS, and so on.
On a two-core system with 2 threads, this takes:
Loading big file (fgets) : 5.79126
Preprocessing data (Parallel Read) : 4.44083
Is there a way to optimize this any further?
It's a bad idea to do it this way. Threads can get you more CPU cycles if you have enough cores, but you still have only one hard disk, so threads cannot speed up reading the file data.
They actually make it much worse. Reading data from a file is fastest when you access the file sequentially. That minimizes the number of reader head seeks, by far the most expensive operation on a disk drive. By splitting the reading across multiple threads, each reading a different part of the file, you are making the reader head constantly jump back and forth. Very, very bad for throughput.
Use only one thread to read file data. You might be able to overlap it with some computational cycles on the file data by starting a thread once a chunk of the file data is loaded.
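A minimal sketch of the single-reader approach, assuming the file holds whitespace-separated decimal integers that fit in int8_t. One sequential fread pulls the whole file into memory, and a hand-rolled digit loop stands in for atoi. The helper names read_whole_file and parse_ints are hypothetical and not from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the whole file with one sequential fread.
 * Returns a malloc'd, NUL-terminated buffer (caller frees) and stores
 * the number of bytes read in *len. Returns NULL on any failure. */
static char *read_whole_file(const char *path, size_t *len)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    buf[got] = '\0';
    *len = got;
    return buf;
}

/* Parse up to max_vals integers from buf[0..len) into out.
 * Any byte that is not a digit or '-' is treated as a separator.
 * Assumption: each number has at most 3 digits and fits in int8_t,
 * as in the question; longer digit runs are not guarded against. */
static size_t parse_ints(const char *buf, size_t len,
                         int8_t *out, size_t max_vals)
{
    assert(buf != NULL && out != NULL);
    size_t n = 0, i = 0;
    while (i < len && n < max_vals) {
        /* Skip separators (spaces, newlines, anything non-numeric). */
        while (i < len && buf[i] != '-' && (buf[i] < '0' || buf[i] > '9'))
            i++;
        if (i >= len)
            break;

        int sign = 1;
        if (buf[i] == '-') { sign = -1; i++; }

        int value = 0, digits = 0;
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {
            value = value * 10 + (buf[i] - '0');
            i++;
            digits++;
        }
        if (digits == 0)
            continue;               /* a lone '-' is ignored */
        out[n++] = (int8_t)(sign * value);
    }
    return n;
}
```

With the buffer in memory, a call such as parse_ints(buf, len, &matrix[0][0], 8192 * 8192) would fill the matrix in one pass. Whether splitting the parse across threads helps at that point is something to measure, since no disk access is involved any more.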
Do watch out for the test effect. When you re-run your program, typically after tweaking your code somewhat, it is likely that the program can find file data back in the file system cache so it doesn't have to be read from the disk. That's very fast, memory bus speed, a memory-to-memory copy. Pretty likely on your dataset since it isn't very big and easily fits in the amount of RAM a modern machine has. This does not (typically) happen on a production machine. So be sure to clear out the cache to get realistic numbers, whatever it takes on your OS.
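One way to approximate a cold cache between runs, sketched here for Linux and other systems that provide posix_fadvise: ask the kernel to drop the cached pages of the input file before timing. This is only a hint, so the kernel is free to ignore it, and the function name drop_file_cache is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose posix_fadvise and fsync */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to evict the cached pages of one file.
 * Returns 0 if the request was accepted, -1 otherwise.
 * Assumption: a POSIX system where POSIX_FADV_DONTNEED actually
 * evicts clean pages (as Linux typically does); it is advisory. */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Dirty pages cannot be dropped, so flush them first.
     * A failure here is not fatal for a file that was only read. */
    (void)fsync(fd);

    /* Offset 0 with length 0 means "the whole file". posix_fadvise
     * returns an error number directly instead of setting errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Calling drop_file_cache on the input file before each timed run should make repeated measurements closer to a first read from disk, although a reboot or an OS-level cache flush is the more reliable check.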