TextFieldParser Class

optimusprime picture optimusprime · May 16, 2013 · Viewed 8.7k times · Source

I am using the TextFieldParser Class to read comma separated value (.csv) file. Fields in this file are enclosed with double quotes like "Field1","Field2".

So, to read file, I've set the HasFieldsEnclosedInQuotes property of TextFieldParser object to true. But I get an error of MalformedLineException when any of fields contain double quote (`"+ ) in the beginning.

Example: ""Field2"with additional" Here I should see "Field2" with additional as output.

However, if " is anywhere except first position then it works fine. Like line with "Field2 "with" additional" works perfectly fine and gives me Field2 "with" additional as output.

Does any one have same issue? Is there any way I can resolve this issue?

This is my code:

Private Sub ReadTextFile(ByVal txtFilePath As String)
    Dim myReader As tfp = New Microsoft.VisualBasic.FileIO.TextFieldParser(txtFilePath)
    myReader.Delimiters = New String() {","}
    myReader.TextFieldType = FileIO.FieldType.Delimited
    myReader.HasFieldsEnclosedInQuotes = True
    myReader.TrimWhiteSpace = True
    Dim currentRow As String()
    Dim headerRow As Integer = 0

    While Not myReader.EndOfData
        Try
            currentRow = myReader.ReadFields()

            'Read Header
            If (headerRow = 0) Then
               'Do work for Header Row
                headerRow += 1
            Else
                'Do work for Data Row
            End If

        Catch ex As Exception
            Dim errorline As String = myReader.ErrorLine
        End Try
    End While

End Sub

This is my Data in csv file:

"Column1","Column2","Column3"
"Value1","Value2",""A" Block in Building 123"

Answer

Heinzi picture Heinzi · May 16, 2013

Your example ""A" Block" is malformed CSV; thus, TextFieldParser has every right to reject it. The CSV standard says:

7.  If double-quotes are used to enclose fields, then a double-quote
    appearing inside a field must be escaped by preceding it with
    another double quote.  For example:

    "aaa","b""bb","ccc"

If you encode your data correctly, i.e., ...

"Column1","Column2","Column3"
"Value1","Value2","""A"" Block in Building 123"

... TextFieldParser works fine and correctly returns "A" Block in Building 123.

So, the first step would be to tell the guy producing the CSV file to create a valid CSV file instead of something-that-looks-like-CSV-but-isn't.

If you cannot do that, you might want to make two passes through the file:

  • Fix the file by converting it into a "valid" CSV file (for example by replacing quotes not followed or preceded by a comma by two quotes).
  • Then, TextFieldParser can parse the "valid" CSV file without trouble.