My simple requirement: Reading a huge (> a million) line test file (For this example assume it's a CSV of some sorts) and keeping a reference to the beginning of that line for faster lookup in the future (read a line, starting at X).
I tried the naive and easy way first, using a StreamWriter
and accessing the underlying BaseStream.Position
. Unfortunately that doesn't work as I intended:
Given a file containing the following
Foo
Bar
Baz
Bla
Fasel
and this very simple code
using (var sr = new StreamReader(@"C:\Temp\LineTest.txt")) {
string line;
long pos = sr.BaseStream.Position;
while ((line = sr.ReadLine()) != null) {
Console.Write("{0:d3} ", pos);
Console.WriteLine(line);
pos = sr.BaseStream.Position;
}
}
the output is:
000 Foo
025 Bar
025 Baz
025 Bla
025 Fasel
I can imagine that the stream is trying to be helpful/efficient and probably reads in (big) chunks whenever new data is necessary. For me this is bad..
The question, finally: Any way to get the (byte, char) offset while reading a file line by line without using a basic Stream and messing with \r \n \r\n and string encoding etc. manually? Not a big deal, really, I just don't like to build things that might exist already..
You could create a TextReader
wrapper, which would track the current position in the base TextReader
:
public class TrackingTextReader : TextReader
{
private TextReader _baseReader;
private int _position;
public TrackingTextReader(TextReader baseReader)
{
_baseReader = baseReader;
}
public override int Read()
{
_position++;
return _baseReader.Read();
}
public override int Peek()
{
return _baseReader.Peek();
}
public int Position
{
get { return _position; }
}
}
You could then use it as follows :
string text = @"Foo
Bar
Baz
Bla
Fasel";
using (var reader = new StringReader(text))
using (var trackingReader = new TrackingTextReader(reader))
{
string line;
while ((line = trackingReader.ReadLine()) != null)
{
Console.WriteLine("{0:d3} {1}", trackingReader.Position, line);
}
}