Reading parts of large files from drive

joe_coolish picture joe_coolish · Mar 6, 2011 · Viewed 12.9k times · Source

I'm working with large files in C# (can be up to 20%-40% of available memory) and I will only need small parts of the files to be loaded into memory at a time (like 1-2% of the file). I was thinking that using a FileStream would be the best option, but idk. I will need to give a starting point (in bytes) and a length (in bytes) and copy that region into a byte[]. Access to the file might need to be shared between threads and will be at random spots in the file (non-linear access). I also need it to be fast.

The project already has unsafe methods, so feel free to suggest things from the more dangerous side of C#

Answer

James King picture James King · Mar 6, 2011

A FileStream will allow you to seek to the portion of the file you want, no problem. It's the recommended way to do it in C#, and it's fast.

Sharing between threads: You will need to create a lock to prevent other threads from changing the FileStream position while you're trying to read from it. The simplest way to do this:

//  This really needs to be a member-level variable;
private static readonly object fsLock = new object();

//  Instantiate this in a static constructor or initialize() method
private static FileStream fs = new FileStream("myFile.txt", FileMode.Open);


public string ReadFile(int fileOffset) {

    byte[] buffer = new byte[bufferSize];

    int arrayOffset = 0;

    lock (fsLock) {
        fs.Seek(fileOffset, SeekOrigin.Begin);

        int numBytesRead = fs.Read(bytes, arrayOffset , bufferSize);

        //  Typically used if you're in a loop, reading blocks at a time
        arrayOffset += numBytesRead;
    }

    // Do what you want to the byte array and return it

}

Add try..catch statements and other code as necessary. Everywhere you access this FileStream, put a lock on the member-level variable fsLock... this will keep other methods from reading/manipulating the file pointer while you're trying to read.

Speed-wise, I think you'll find you're limited by disk access speeds, not code.

You'll have to think through all the issues about multi-threaded file access... who intializes/opens the file, who closes it, etc. There's a lot of ground to cover.