Reading a file line by line in C#

Luca Spiller picture Luca Spiller · Aug 13, 2009 · Viewed 66.1k times · Source

I am trying to read some text files, where each line needs to be processed. At the moment I am just using a StreamReader, and then reading each line individually.

I am wondering whether there is a more efficient way (in terms of LoC and readability) to do this using LINQ without compromising operational efficiency. The examples I have seen involve loading the whole file into memory, and then processing it. In this case however I don't believe that would be very efficient. In the first example the files can get up to about 50k, and in the second example, not all lines of the file need to be read (sizes are typically < 10k).

You could argue that nowadays it doesn't really matter for these small files, however I believe that sort of the approach leads to inefficient code.

First example:

// Open file
using(var file = System.IO.File.OpenText(_LstFilename))
{
    // Read file
    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Ignore empty lines
        if (line.Length > 0)
        {
            // Create addon
            T addon = new T();
            addon.Load(line, _BaseDir);

            // Add to collection
            collection.Add(addon);
        }
    }
}

Second example:

// Open file
using (var file = System.IO.File.OpenText(datFile))
{
    // Compile regexs
    Regex nameRegex = new Regex("IDENTIFY (.*)");

    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Check name
        Match m = nameRegex.Match(line);
        if (m.Success)
        {
            _Name = m.Groups[1].Value;

            // Remove me when other values are read
            break;
        }
    }
}

Answer

Marc Gravell picture Marc Gravell · Aug 13, 2009

You can write a LINQ-based line reader pretty easily using an iterator block:

static IEnumerable<SomeType> ReadFrom(string file) {
    string line;
    using(var reader = File.OpenText(file)) {
        while((line = reader.ReadLine()) != null) {
            SomeType newRecord = /* parse line */
            yield return newRecord;
        }
    }
}

or to make Jon happy:

static IEnumerable<string> ReadFrom(string file) {
    string line;
    using(var reader = File.OpenText(file)) {
        while((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}
...
var typedSequence = from line in ReadFrom(path)
                    let record = ParseLine(line)
                    where record.Active // for example
                    select record.Key;

then you have ReadFrom(...) as a lazily evaluated sequence without buffering, perfect for Where etc.

Note that if you use OrderBy or the standard GroupBy, it will have to buffer the data in memory; ifyou need grouping and aggregation, "PushLINQ" has some fancy code to allow you to perform aggregations on the data but discard it (no buffering). Jon's explanation is here.