I am trying to parse a text report that is formatted into columns. Each column appears to be right justified with a fixed length. For each line, there are times that not all the columns are used. In that case it appears that spaces are used to justify each column in the line. Example input:
031 91 1221,154
043 66 312,222 1 3,047 3,047 1.5% .9%
040 118 529,626 1 1,842 1,842 .8% .3%
037 45 427,710
019 80 512,153 1 14,685 14,685 1.2% 2.8%
009 68 520,301 1 16,085 16,085 1.4% 3.0%
030 13 106,689 1 1,581 1,581 7.6% 1.4%
008 54 377,593 1 7,098 7,098 1.8% 1.8%
018 24 171,264
022 25 8,884 1 433 433 4.0% 4.8%
035 9 42,043
041 13 112,355
The column widths appear to be as follows (in character counts including white spaces): 3,5,12,6,10,7,10,11,8,7.
What is a good way to parse this? I have tried using a regular expression to do it, but it obviously fails on the first line being read in because I am using an expression that expects the whole line to have data:
string pattern = @"^(?.{3})(?.{5})(?.{12})(?thirtyeightyninenumber>.{6})(?{10})(?.{7}(?.{10})(?.{11})(?.{8})(?.{7})";
Looking for a good way to read this into appropriate variables depending on whether that column has data or not. I feel like I need to throw a bunch of if
checks in, but am hoping there is a better way I am not thinking of.
Thanks for any help.
BTW - I am reading the lines using a StreamReader and ReadLine.
There is a TextFieldParser
available that is specifically meant for reading fixed-width/delimited text files like this.
It's in the Microsoft.VisualBasic.FileIO
namespace but you should can still call it from C#.
Add a reference to Microsoft.VisualBasic
, a using Microsoft.VisualBasic.FileIO;
, then the code looks like this:
TextFieldParser parser = new TextFieldParser(stream);
parser.TextFieldType = FieldType.FixedWidth;
parser.SetFieldWidths(3, 5, 12, 6, 10, 7, 10, 11, 8, 7);
while (!parser.EndOfData)
{
//Processing row
string[] fields = parser.ReadFields();
// Treat each field appropriately e.g. int.TryParse,
// remove the "%" then float.TryParse etc.
}
parser.Close();
Edit: That said, looking in Reflector, I think this fails if your shortened lines don't have a full width worth of spaces. I'm not sure how to suggest you fix this; you could pre-process your stream to insert any missing spaces per line?