I am trying to extract all text data from an Excel document in C# and am having performance issues. In the following code I open the Workbook, loop over all worksheets, and loop over all cells in the used range, extracting the text from each cell as I go. The problem is, this takes 14 seconds to execute.
public class ExcelFile
{
public string Path = @"C:\test.xlsx";
private Excel.Application xl = new Excel.Application();
private Excel.Workbook WB;
public string FullText;
private Excel.Range rng;
private Dictionary<string, string> Variables;
public ExcelFile()
{
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
{
rng = CurrentWS.UsedRange;
for (int i = 1; i < rng.Count; i++)
{ FullText += rng.Cells[i].Value; }
}
WB.Close(false);
xl.Quit();
}
}
Whereas in VBA I would do something like this, which takes ~1 second:
Sub run()
Dim strText As String
For Each ws In ActiveWorkbook.Sheets
For Each c In ws.UsedRange
strText = strText & c.Text
Next c
Next ws
End Sub
Or, even faster (less than 1 second):
Sub RunFast()
Dim strText As String
Dim varCells As Variant
For Each ws In ActiveWorkbook.Sheets
varCells = ws.UsedRange
For i = 1 To UBound(varCells, 1)
For j = 1 To UBound(varCells, 2)
strText = strText & CStr(varCells(i, j))
Next j
Next i
Next ws
End Sub
Perhaps something is happening in the for loop in C# that I'm not aware of? Is it possible to load a range into an array-type object (as in my last example) to allow iteration over just the values, not the cell objects?
I use this function. The loops are only for converting to array starting at index 0, the main work is done in object[,] tmp = range.Value
.
public object[,] GetTable(int row, int col, int width, int height)
{
object[,] arr = new object[height, width];
Range c1 = (Range)Worksheet.Cells[row + 1, col + 1];
Range c2 = (Range)Worksheet.Cells[row + height, col + width];
Range range = Worksheet.get_Range(c1, c2);
object[,] tmp = range.Value;
for (int i = 0; i < height; ++i)
{
for (int j = 0; j < width; ++j)
{
arr[i, j] = tmp[i + tmp.GetLowerBound(0), j + tmp.GetLowerBound(1)];
}
}
return arr;
}