I'm working with large files in C# (can be up to 20%-40% of available memory) and I will only need small parts of the files to be loaded into memory at a time (like 1-2% of the file). I was thinking that using a FileStream would be the best option, but idk. I will need to give a starting point (in bytes) and a length (in bytes) and copy that region into a byte[]. Access to the file might need to be shared between threads and will be at random spots in the file (non-linear access). I also need it to be fast.
The project already has unsafe
methods, so feel free to suggest things from the more dangerous side of C#
A FileStream
will allow you to seek to the portion of the file you want, no problem. It's the recommended way to do it in C#, and it's fast.
Sharing between threads: You will need to create a lock to prevent other threads from changing the FileStream position while you're trying to read from it. The simplest way to do this:
// This really needs to be a member-level variable;
private static readonly object fsLock = new object();
// Instantiate this in a static constructor or initialize() method
private static FileStream fs = new FileStream("myFile.txt", FileMode.Open);
public string ReadFile(int fileOffset) {
byte[] buffer = new byte[bufferSize];
int arrayOffset = 0;
lock (fsLock) {
fs.Seek(fileOffset, SeekOrigin.Begin);
int numBytesRead = fs.Read(bytes, arrayOffset , bufferSize);
// Typically used if you're in a loop, reading blocks at a time
arrayOffset += numBytesRead;
}
// Do what you want to the byte array and return it
}
Add try..catch
statements and other code as necessary. Everywhere you access this FileStream, put a lock on the member-level variable fsLock... this will keep other methods from reading/manipulating the file pointer while you're trying to read.
Speed-wise, I think you'll find you're limited by disk access speeds, not code.
You'll have to think through all the issues about multi-threaded file access... who intializes/opens the file, who closes it, etc. There's a lot of ground to cover.