Reading Comma Delimited Text File to C# DataTable, columns get truncated to 255 characters

Greg Bailey · Jun 26, 2009

We are importing from CSV to SQL. To do so, we read the CSV file and write it to a temporary .txt file that is described by a schema.ini. (I'm not yet sure exactly why we write this temporary file, but that's how the code currently works.) From there, we load a DataTable via OleDb using the following connection string (for ASCII files):

"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + sPath + ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

The problem is that fields longer than 255 characters get truncated. From what I've read online, this is a known issue: by default, text fields are truncated to 255 characters.

I set the registry values ImportMixedTypes=Majority Type and TypeGuessRows=0 under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel, hoping that my columns would no longer be interpreted as text. After that change, the temporary .txt file is written correctly from the CSV file, but when I call dataAdapter.Fill, the resulting DataTable still contains the truncated value.


Any help would be appreciated. At this point I'm not interested in using any third-party code to solve this problem; there must be a way to do it with the built-in tools.

Here is the schema entry for the column in question:

<Columns> 
    <TABLE_NAME>CommaDelimited#txt</TABLE_NAME> 
    <COLUMN_NAME>Notes</COLUMN_NAME> 
    <ORDINAL_POSITION>2</ORDINAL_POSITION> 
    <COLUMN_HASDEFAULT>false</COLUMN_HASDEFAULT> 
    <COLUMN_FLAGS>234</COLUMN_FLAGS> 
    <IS_NULLABLE>true</IS_NULLABLE> 
    <DATA_TYPE>130</DATA_TYPE> 
    <CHARACTER_MAXIMUM_LENGTH>0</CHARACTER_MAXIMUM_LENGTH> 
    <CHARACTER_OCTET_LENGTH>0</CHARACTER_OCTET_LENGTH> 
</Columns>
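For what it's worth, COLUMN_FLAGS is a bitmask of the OLE DB DBCOLUMNFLAGS values, and DATA_TYPE 130 is DBTYPE_WSTR (a Unicode string). Decoding 234 is a quick way to see what the provider thinks of the column. The sketch below is a hypothetical helper (ColumnFlagDecoder is not part of any library) that uses the bit values from the OLE DB DBCOLUMNFLAGS enumeration:

```csharp
using System;
using System.Collections.Generic;

public static class ColumnFlagDecoder
{
    // Subset of the OLE DB DBCOLUMNFLAGS bit values.
    [Flags]
    public enum DbColumnFlags
    {
        IsBookmark    = 0x1,
        MayDefer      = 0x2,
        Write         = 0x4,
        WriteUnknown  = 0x8,
        IsFixedLength = 0x10,
        IsNullable    = 0x20,
        MaybeNull     = 0x40,
        IsLong        = 0x80,
        IsRowId       = 0x100,
        IsRowVer      = 0x200
    }

    // Returns the name of every known flag bit set in a raw COLUMN_FLAGS value,
    // in ascending bit order.
    public static List<string> Decode(int columnFlags)
    {
        var names = new List<string>();
        foreach (DbColumnFlags flag in Enum.GetValues(typeof(DbColumnFlags)))
        {
            if ((columnFlags & (int)flag) != 0)
                names.Add(flag.ToString());
        }
        return names;
    }
}
```

ColumnFlagDecoder.Decode(234) yields MayDefer, WriteUnknown, IsNullable, MaybeNull and IsLong. The IsLong bit suggests the provider is already reporting Notes as a long (memo-style) column, so the declared column type alone may not explain the truncation.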

Thanks,

Greg


I also tried editing schema.ini to declare the column as Text with an explicit width, but that did not help (it was previously declared as Memo):

[CommaDelimited.txt]
Format=CSVDelimited
DecimalSymbol=.
Col1=Notes Text Width 5000
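If you want to generate schema.ini from code rather than edit it by hand, a minimal sketch might look like the following. The class and method names are hypothetical, and the example assumes the file has a header row. Note that ColN entries are numbered by the column's position in the file; the XML above lists Notes at ORDINAL_POSITION 2 while the snippet declares Col1=Notes, so it may be worth checking that the two agree.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SchemaIniWriter
{
    // Builds the schema.ini section for one delimited text file.
    // columns: (name, Jet text type) pairs in file order, e.g. ("Notes", "Memo").
    public static string Build(string fileName, IList<KeyValuePair<string, string>> columns)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + fileName + "]");
        sb.AppendLine("Format=CSVDelimited");
        sb.AppendLine("ColNameHeader=True");
        // ColN is 1-based and refers to the Nth column in the file.
        for (int i = 0; i < columns.Count; i++)
            sb.AppendLine(string.Format("Col{0}={1} {2}", i + 1, columns[i].Key, columns[i].Value));
        return sb.ToString();
    }

    // Writes schema.ini into the folder that holds the data file,
    // which is where the Jet text driver looks for it.
    public static void Write(string folder, string fileName, IList<KeyValuePair<string, string>> columns)
    {
        File.WriteAllText(Path.Combine(folder, "schema.ini"), Build(fileName, columns), Encoding.ASCII);
    }
}
```

For example, passing ("Id", "Long") and ("Notes", "Memo") (Id being a made-up first column) produces Col1=Id Long and Col2=Notes Memo under the [CommaDelimited.txt] section.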

Answer

Johnny · Jul 26, 2009

Here's a simple class that reads a delimited file and returns a DataTable (every column typed as string) without truncating values. An overload lets you supply column names when they aren't in the file. Maybe you can use it?
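The reason this approach avoids the limit is that a column created with DataColumnCollection.Add(string) is a string column whose MaxLength is -1 (no limit), so any 255-character cap would have to come from the OleDb/Jet side rather than from the DataTable itself. A small check, using a hypothetical helper and a made-up 5000-character value:

```csharp
using System;
using System.Data;

public static class NoTruncationCheck
{
    // Builds a one-column table the same way DelimitedTextReader does
    // (Columns.Add(name) gives a string column with MaxLength = -1)
    // and stores a single value of the requested length.
    public static DataTable BuildTable(int valueLength)
    {
        var table = new DataTable();
        table.Columns.Add("Notes");
        table.Rows.Add(new string('x', valueLength));
        return table;
    }
}
```

NoTruncationCheck.BuildTable(5000) returns a table whose single Notes value still has all 5000 characters.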

Imported Namespaces

using System;
using System.Text;
using System.Data;
using System.IO;

Code

/// <summary>
/// Simple class for reading delimited text files
/// </summary>
public class DelimitedTextReader
{
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter)
    {
        return ReadFile(filename, delimiter, null);
    }
    /// <summary>
    /// Read the file and return a DataTable
    /// </summary>
    /// <param name="filename">File to read</param>
    /// <param name="delimiter">Delimiting string</param>
    /// <param name="columnNames">Array of column names</param>
    /// <returns>Populated DataTable</returns>
    public static DataTable ReadFile(string filename, string delimiter, string[] columnNames)
    {
        //  Create the new table
        DataTable data = new DataTable();
        data.Locale = System.Globalization.CultureInfo.CurrentCulture;

        //  Check file
        if (!File.Exists(filename))
            throw new FileNotFoundException("File not found", filename);

        //  Process the file line by line
        string line;
        using (TextReader tr = new StreamReader(filename, Encoding.Default))
        {
            //  If column names were not passed, we'll read them from the file
            if (columnNames == null)
            {
                //  Get the first line
                line = tr.ReadLine();
                if (string.IsNullOrEmpty(line))
                    throw new IOException("Could not read column names from file.");
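                //  Split the header the same way as the data rows (StringSplitOptions.None)
                //  so an empty header field still yields a column and the counts match
                columnNames = line.Split(new string[] { delimiter }, StringSplitOptions.None);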
            }

            //  Add the columns to the data table
            foreach (string colName in columnNames)
                data.Columns.Add(colName);

            //  Read the file
            string[] columns;
            while ((line = tr.ReadLine()) != null)
            {
                columns = line.Split(new string[] { delimiter }, StringSplitOptions.None);
                //  Ensure we have the same number of columns
                if (columns.Length != columnNames.Length)
                {
                    string message = "Data row has {0} columns and {1} are defined by column names.";
                    throw new DataException(string.Format(message, columns.Length, columnNames.Length));
                }
                data.Rows.Add(columns);
            }
        }
        return data;

    }
}
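One caveat: ReadFile splits each line with String.Split, so a quoted CSV field that itself contains a comma (common in long free-text columns like Notes) would be broken into extra columns and trip the column-count check. If the input is true CSV, a quote-aware splitter such as the sketch below could stand in for the line.Split(...) calls. It follows the usual RFC 4180 conventions (fields may be wrapped in double quotes, and "" inside quotes is an escaped quote), takes a char delimiter rather than a string, and, like the original class, still reads one physical line at a time, so it does not handle line breaks inside quoted fields. CsvLineSplitter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineSplitter
{
    // Splits one line of delimited text, honouring double-quoted fields.
    // Inside quotes the delimiter is literal and "" stands for a single quote.
    // Embedded newlines are NOT supported: the caller passes one physical line.
    public static string[] SplitCsvLine(string line, char delimiter)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote => literal quote
                        i++;
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Length = 0;          // start the next field
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());      // last field (possibly empty)
        return fields.ToArray();
    }
}
```

Inside ReadFile, each line.Split(new string[] { delimiter }, ...) call could then become CsvLineSplitter.SplitCsvLine(line, delimiter[0]), assuming a single-character delimiter.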

Required Namespaces

using System;
using System.Data;
using System.Windows.Forms;
using System.Data.SqlClient;
using System.Diagnostics;

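Here's an example of calling it and uploading the result to a SQL Server database with SqlBulkCopy. This example reads a tab-delimited file; pass "," as the delimiter for a comma-delimited file: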

        //  Runs inside a Windows Forms event handler: textBox1 holds the path of the file to load
        Stopwatch sw = new Stopwatch();
        TimeSpan tsRead;
        TimeSpan tsTrunc;
        TimeSpan tsBcp;
        int rows;
        sw.Start();
        using (DataTable dt = DelimitedTextReader.ReadFile(textBox1.Text, "\t"))
        {
            tsRead = sw.Elapsed;
            sw.Reset();
            rows = dt.Rows.Count;
            string connect = @"Data Source=.;Initial Catalog=MyDB;Integrated Security=SSPI";
            using (SqlConnection cn = new SqlConnection(connect))
            using (SqlCommand cmd = new SqlCommand("TRUNCATE TABLE dbo.UploadTable", cn))
            using (SqlBulkCopy bcp = new SqlBulkCopy(cn))
            {
                cn.Open();
                sw.Start();
                cmd.ExecuteNonQuery();
                tsTrunc = sw.Elapsed;
                sw.Reset();

                sw.Start();
                bcp.DestinationTableName = "dbo.UploadTable";
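                //  Map DataTable column names to destination table columns;
                //  once any mappings are added, only the mapped columns are copied
                bcp.ColumnMappings.Add("Column A", "ColumnA");
                bcp.ColumnMappings.Add("Column D", "ColumnD");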
                bcp.WriteToServer(dt);
                tsBcp = sw.Elapsed;
                sw.Reset();
            }
        }

        string message = "File read:\t{0}\r\nTruncate:\t{1}\r\nBcp:\t{2}\r\n\r\nTotal time:\t{3}\r\nTotal rows:\t{4}";
        MessageBox.Show(string.Format(message, tsRead, tsTrunc, tsBcp, tsRead + tsTrunc + tsBcp, rows));