C# Excel Interop Slow when looping through cells

pwwolff picture pwwolff · Mar 5, 2017 · Viewed 7k times · Source

I am trying to extract all text data from an Excel document in C# and am having performance issues. In the following code I open the Workbook, loop over all worksheets, and loop over all cells in the used range, extracting the text from each cell as I go. The problem is, this takes 14 seconds to execute.

public class ExcelFile
{
    public string Path = @"C:\test.xlsx";
    private Excel.Application xl = new Excel.Application();
    private Excel.Workbook WB;
    public string FullText;
    private Excel.Range rng;
    private Dictionary<string, string> Variables;
    public ExcelFile()
    {
        WB = xl.Workbooks.Open(Path);
        xl.Visible = true;
        foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
        {
            rng = CurrentWS.UsedRange;
            for (int i = 1; i < rng.Count; i++)
            { FullText += rng.Cells[i].Value; }
        }
        WB.Close(false);
        xl.Quit();
    }
}

Whereas in VBA I would do something like this, which takes ~1 second:

Sub run()
    Dim strText As String
    For Each ws In ActiveWorkbook.Sheets
        For Each c In ws.UsedRange
            strText = strText & c.Text
        Next c
    Next ws
End Sub

Or, even faster (less than 1 second):

Sub RunFast()
    Dim strText As String
    Dim varCells As Variant
    For Each ws In ActiveWorkbook.Sheets
        varCells = ws.UsedRange
        For i = 1 To UBound(varCells, 1)
            For j = 1 To UBound(varCells, 2)
                strText = strText & CStr(varCells(i, j))
            Next j
        Next i
    Next ws
End Sub

Perhaps something is happening in the for loop in C# that I'm not aware of? Is it possible to load a range into an array-type object (as in my last example) to allow iteration over just the values, not the cell objects?

Answer

Anton&#237;n Lejsek picture Antonín Lejsek · Mar 5, 2017

I use this function. The loops are only for converting to array starting at index 0, the main work is done in object[,] tmp = range.Value.

public object[,] GetTable(int row, int col, int width, int height)
{
    object[,] arr = new object[height, width];

    Range c1 = (Range)Worksheet.Cells[row + 1, col + 1];
    Range c2 = (Range)Worksheet.Cells[row + height, col + width];
    Range range = Worksheet.get_Range(c1, c2);

    object[,] tmp = range.Value;

    for (int i = 0; i < height; ++i)
    {
        for (int j = 0; j < width; ++j)
        {
            arr[i, j] = tmp[i + tmp.GetLowerBound(0), j + tmp.GetLowerBound(1)];
        }
    }                 

    return arr;
}