How to read huge CSV file with 29 million rows of data using .net

Lee · Jun 5, 2017

I have a huge .csv file (specifically a .TAB file) with 29 million rows and a file size of around 600 MB. I need to read it into an IEnumerable collection.

I have tried CsvHelper, GenericParser, and a few other solutions, but I always end up with an OutOfMemoryException.

Please suggest a way to do this.

I have tried

var deliveryPoints = new List<Point>();

using (TextReader csvreader1 = File.OpenText(@"C:\testfile\Prod\PCDP1705.TAB")) //StreamReader csvreader1 = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB"))
using (var csvR1 = new CsvReader(csvreader1, csvconfig))
{
     csvR1.Configuration.RegisterClassMap<DeliveryMap>();
     deliveryPoints = csvR1.GetRecords<Point>().ToList();
}

using (GenericParser parser = new GenericParser())
{
     parser.SetDataSource(@"C:\testfile\Prod\PCDP1705.TAB");

     parser.ColumnDelimiter = '\t';
     parser.FirstRowHasHeader = false;
     //parser.SkipStartingDataRows = 10;
     //parser.MaxBufferSize = 4096;
     //parser.MaxRows = 500;
     parser.TextQualifier = '\"';

     while (parser.Read())
     {
         var address = new Point();
         address.PostCodeID = int.Parse(parser[0]);
         address.DPS = parser[1];
         address.OrganisationFlag = parser[2];
         deliveryPoints.Add(address);
     }
}

and

var deliveryPoints = new List<Point>();

using (var csvreader = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB"))
using (var csv = new CsvReader(csvreader, csvconfig))
{
    while (csv.Read())
    {
        var address = new Point();
        address.PostCodeID = int.Parse(csv.GetField(0));
        address.DPS = csv.GetField(1);
        deliveryPoints.Add(address);
    }
}

Answer

Liero · Nov 18, 2018

The problem is that you are loading the entire file into memory. Compiling your code for x64 will raise the memory limit for your program significantly, but it is better to avoid loading the entire file into memory if you can.
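If you do go the x64 route, a minimal sketch of the project setting (assuming an SDK-style .csproj; older project formats expose the same option under the Build tab of the project properties):

```xml
<PropertyGroup>
  <!-- Target 64-bit so the process is not limited to the ~2 GB 32-bit address space -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```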

Notice that calling ToList() forces CsvReader to load the entire file into memory at once:

csvR1.GetRecords<Point>().ToList();

But this loads only one record at a time:

foreach(var record in csvR1.GetRecords<Point>())
{
    //do whatever with the single record
}

This way you can process files of practically unlimited size.
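Putting it together, a streaming version of the code from the question might look like this (a sketch assuming a recent CsvHelper version; the `Point` class and file path are taken from the question, and the count is just a stand-in for whatever per-record processing you need):

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Record type matching the fields used in the question.
public class Point
{
    public int PostCodeID { get; set; }
    public string DPS { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            Delimiter = "\t",        // .TAB file: tab-separated
            HasHeaderRecord = false,
        };

        using (var reader = new StreamReader(@"C:\testfile\Prod\PCDP1705.TAB"))
        using (var csv = new CsvReader(reader, config))
        {
            long count = 0;
            // GetRecords is lazy: each iteration materializes one record,
            // so memory use stays flat regardless of file size.
            foreach (var record in csv.GetRecords<Point>())
            {
                count++; // process the record here instead of adding it to a list
            }
            Console.WriteLine(count);
        }
    }
}
```

The key point is that nothing ever calls `ToList()`: each `Point` becomes garbage as soon as the loop moves on, so 29 million rows never exist in memory at the same time.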