I have user supplied excel files that need to be converted to PDF. Using excel interop, I can do this fine with .ExportAsFixedFormat()
. My problem comes up when a workbook has millions of rows. This turns into a file that has 50k+ pages. That would be fine if the workbook had content in all of those rows. Every time one of these files shows up though, there are maybe 50 rows that have content and the rest are blank. How can I go about removing the empty rows so I can export it to a decent sized PDF?
I've tried starting at the end row and, one-by-one, using CountA
to check if the row has content and if it does, delete it. Not only does this take forever, this seems to fail after about 100k rows with the following error:
Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.
I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues)
but that includes a row if any cell has formatting (like a bg color).
I've tried using Worksheet.UsedRange
and then deleting everything after that but UsedRange
has the same problem as point two.
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
currentRowIndex = rows.Count;
bool contentFound = false;
while (!contentFound && currentRowIndex > 0)
{
currentRow = rows[currentRowIndex];
if (Application.WorksheetFunction.CountA(currentRow) == 0)
{
currentRow.Delete();
}
else
{
contentFound = true;
}
Marshal.FinalReleaseComObject(currentRow);
currentRowIndex--;
}
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
for (int i = 0; i < worksheets.Count; i++)
{
sheet = worksheets[i + 1];
rows = sheet.Rows;
lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
int startRow = lastCell.Row;
Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
range.Delete();
Marshal.FinalReleaseComObject(range);
Marshal.FinalReleaseComObject(lastCell);
Marshal.FinalReleaseComObject(rows);
Marshal.FinalReleaseComObject(sheet);
}
Do I have a problem with my code, is this an interop problem or maybe it's just a limitation on what Excel can do? Is there a better way to do what I'm attempting?
I would suggest you to get the count of rows which contain some values, using CountA (as you have tried in point 1). Then copy those rows into a new sheet and export it from there. It will be easier to copy few rows to new sheet and working on it, rather than trying to delete huge number of rows from source sheet.
For creating new sheet and copying rows you can use the following code:
excel.Worksheet tempSheet = workbook.Worksheets.Add();
tempSheet.Name = sheetName;
workbook.Save();
//create a new method for copy new rows
//as the rowindex you can pass the total no of rows you have found out using CountA
public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
{
excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
excel.Range source = (excel.Range)sourceSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
excel.Range dest = (excel.Range)destSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
source.Copy(dest);
excel.Range newRow = (excel.Range)destSheet.Rows[rowIndex+1];
newRow.Insert();
workbook.Save();
}