I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product for quick macros). Most files are about 300-400 KB which is fine loading. But when they go beyond 100 MB the process has a hard time (as you'd expect).
What happens is that the file is read and shoved into a RichTextBox which is then navigated - don't worry too much about this part.
The developer who wrote the initial code is simply using a StreamReader and doing
[Reader].ReadToEnd()
which could take quite a while to complete.
My task is to break this bit of code up, read it in chunks into a buffer and show a progressbar with an option to cancel it.
Some assumptions:
Now for the questions:
Are these (in your professional opinions) good ideas? I've had a few issues in the past with reading content from Streams, because it will always miss the last few bytes or something, but I'll ask another question if this is the case.
You can improve read speed by using a BufferedStream, like this:
using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
string line;
while ((line = sr.ReadLine()) != null)
{
}
}
March 2013 UPDATE
I recently wrote code for reading and processing (searching for text in) 1 GB-ish text files (much larger than the files involved here) and achieved a significant performance gain by using a producer/consumer pattern. The producer task read in lines of text using the BufferedStream
and handed them off to a separate consumer task that did the searching.
I used this as an opportunity to learn TPL Dataflow, which is very well suited for quickly coding this pattern.
Why BufferedStream is faster
A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance. A buffer can be used for either reading or writing, but never both simultaneously. The Read and Write methods of BufferedStream automatically maintain the buffer.
December 2014 UPDATE: Your Mileage May Vary
Based on the comments, FileStream should be using a BufferedStream internally. At the time this answer was first provided, I measured a significant performance boost by adding a BufferedStream. At the time I was targeting .NET 3.x on a 32-bit platform. Today, targeting .NET 4.5 on a 64-bit platform, I do not see any improvement.
Related
I came across a case where streaming a large, generated CSV file to the Response stream from an ASP.Net MVC action was very slow. Adding a BufferedStream improved performance by 100x in this instance. For more see Unbuffered Output Very Slow