I have function that saves all records from SQL table to excel worksheet using EPPlus. If I export small amount of data everything works fine, but with 200+ columns and 500 000+ rows I get OutOfMemory exception.
I'd like to modify my code in a way to be able to save 50 000 records per file.
Here is my code that works for small data:
private Task SaveAsync(string tableName)
{
return Task.Run(() =>
{
try
{
using (var conn = new SqlConnection(_connectionString))
{
using (var cmd = new SqlCommand(string.Format(DataQuery, tableName), conn))
{
cmd.CommandType = CommandType.Text;
cmd.CommandTimeout = 360;
conn.Open();
using (SqlDataReader sdr = cmd.ExecuteReader())
{
var fileName = string.Format(TargetFile, tableName);
if (File.Exists(fileName))
{
File.Delete(fileName);
}
sdr.Read();
var numberOfRecordsInTable = sdr.GetInt32(0);
sdr.NextResult();
using (ExcelPackage pck = new ExcelPackage(new FileInfo(fileName)))
{
ExcelWorksheet ws = pck.Workbook.Worksheets.Add("Results");
int count = sdr.FieldCount;
int col = 1, row = 1;
for (int i = 0; i < count; i++)
{
ws.SetValue(row, col++, sdr.GetName(i));
}
row++;
col = 1;
while (sdr.Read())
{
for (int i = 0; i < count; i++)
{
var val = sdr.GetValue(i);
ws.SetValue(row, col++, val);
}
row++;
col = 1;
}
//autosize
ws.Cells[ws.Dimension.Address].AutoFitColumns();
//autofiltr
ws.Cells[1, 1, 1, count].AutoFilter = true;
}
}
conn.Close();
}
}
}
catch (Exception e)
{
Debug.WriteLine("Error at: " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine(e);
}
});
}
and my modified code that splits records 50 000 per file:
private Task SaveAsync2(string tableName)
{
return Task.Run(() =>
{
try
{
using (var conn = new SqlConnection(_connectionString))
{
using (var cmd = new SqlCommand(string.Format(DataQuery, tableName), conn))
{
cmd.CommandType = CommandType.Text;
cmd.CommandTimeout = 360;
conn.Open();
using (SqlDataReader sdr = cmd.ExecuteReader())
{
var fileName = string.Format(TargetFile, tableName,"");
if (File.Exists(fileName))
{
File.Delete(fileName);
}
sdr.Read();
var max = sdr.GetInt32(0);
int filesCount = 1;
if (max > 50000)
{
fileName = string.Format(TargetFile, tableName, filesCount);
}
sdr.NextResult();
ExcelPackage pck = new ExcelPackage(new FileInfo(fileName));
ExcelWorksheet ws = pck.Workbook.Worksheets.Add("RESULTS");
int count = sdr.FieldCount;
int col = 1, row = 1;
for (int i = 0; i < count; i++)
{
ws.SetValue(row, col++, sdr.GetName(i));
}
row++;
col = 1;
while (sdr.Read())
{
for (int i = 0; i < count; i++)
{
var val = sdr.GetValue(i);
ws.SetValue(row, col++, val);
}
row++;
col = 1;
if (row > 50000)
{
pck.Save();
filesCount++;
fileName = string.Format(TargetFile, tableName, filesCount);
pck = new ExcelPackage(new FileInfo(fileName));
ws = pck.Workbook.Worksheets.Add("RESULTS");
count = sdr.FieldCount;
col = 1;
row = 1;
for (int i = 0; i < count; i++)
{
ws.SetValue(row, col++, sdr.GetName(i));
}
row++;
col = 1;
}
}
//autosize
ws.Cells[ws.Dimension.Address].AutoFitColumns();
//autofiltr
ws.Cells[1, 1, 1, count].AutoFilter = true;
pck.Save();
}
}
conn.Close();
}
}
catch (Exception e)
{
Debug.WriteLine("Error at: " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine(e);
}
});
}
basically this works fine, but in first version of my code I was using everything inside using
statement, when in second version I'm calling same code twice.
rows x columns
? I found information that EPPlus should handle more than million rows, but not so much columns as I have. I thinks that I can export million rows with single column, but for 200+ columns for me 50 000 rows is limit. I'm wondering if there is number (rows x columns) that will be limit to which my export will work fine. I want that export function to be universal, so when I pass datatable with 50 columns it will export for example 100 000 rows per file and for 2 columns it will export half million per file.I've run up against memory limits with EPPlus in the past, and ended up generating multiple .xlsx files as a workaround (similar to your approach). Another alternative would be to change your compiler settings to target 64-bit only (if you can get by without supporting 32-bit platforms). As I recall, EPPlus is compiled for "Any CPU", so if you can change your code to target "x64" that would likely relax the memory limitations and allow you to generate a single .xlsx file. Targeting x64 likely would have worked in my case, but I didn't think of it until after the fact, so I never had an opportunity to test.
UPDATE: I just ran a quick test using EPPlus 3.1.3 creating 500,000 rows, 70 columns each. My 32-bit application was able to generate about 119,000 rows before generating an out-of-memory exception. After switching the target to x64, it successfully generated all 500,000 rows, although it took forever. Creating the actual worksheet took just a few minutes, but ExcelPackage.SaveAs() took close to 20 minutes. RAM consumption was quite high as well (roughly 11GB of RAM). The resulting .xlsx is 220MB, which 32-bit Excel is not able to open (out of memory). The bottom line: Targeting x64 is probably not a viable solution; you'd be better off splitting the output into multiple .xlsx files.
I was tempted to delete this answer since it has turned out to be a dead end, but decided to leave it in case it helps someone else avoid this path in the future.