How to get list of ONLY excel worksheet names in Excel using OLEDB; filter out non-worksheets that show up in metadata

Jim picture Jim · May 18, 2012 · Viewed 17.2k times · Source

I have an issue getting worksheet names from an Excel spreadsheet using OLEDB. The problem is that when I use GetOleDbSchemaTable, the resulting DataTable has more than just the actual worksheet names; it has extra rows for "Tables" that I can only assume are used internally by Excel.

So for example, if I have a worksheet named myWorksheet, the code below might end up with a list that contains myWorksheet$, myWorksheet$PrintTable and myWorksheet$_. Only the first myWorksheet$ record is for the actual worksheet. The others are just garbage that I don't need. When you look at them in metadata they look just like regular tables, even with the type of TABLE.

For now I just manually filtered out anything with "$_" or "$Print" in the name, but who knows what other Excel feature might make these extra records turn up in a different format.

Does anyone know the best way to get ONLY actual worksheet names, and not these internal tables that aren't worksheets? Is there something in metadata that would differentiate them?

 private ArrayList getXlsWorksheetNames(OleDb.OleDbConnection conn)
    {
        ArrayList wsList = new ArrayList();
        DataTable schemaTable;

        try
        {
            conn.Open();
            schemaTable = conn.GetOleDbSchemaTable(OleDb.OleDbSchemaGuid.Tables, null);

            foreach (DataRow row in schemaTable.Rows)
            {
                //form.appendToResultsTxt("Adding worksheet to list: " + Environment.NewLine +
                //    "Name = " + row.Field<string>("TABLE_NAME") + "," + Environment.NewLine +
                //    "Type = " + row.Field<string>("TABLE_TYPE") + "," + Environment.NewLine + Environment.NewLine);
                wsList.Add(row.Field<string>("TABLE_NAME"));
            }
            conn.Close();
        }
        catch (Exception ex)
        {
            if (this.mode == Cps2TxtUtilModes.GUI_MODE)
            {
                this.form.appendToResultsTxt(ex.ToString());
            }
            throw;
        }

        return wsList;
    }

I read through the article at this link, but they don't seem to be doing anything differently than I am, and I don't see any filtering out of extra non-worksheet tables, so Microsoft doesn't seem to have provided the right answer.

http://support.microsoft.com/kb/318452

And I've also looked around alot of StackOverflow, like at the thread from the link below which was helpful, but doesn't solve this one problem.

Using Excel OleDb to get sheet names IN SHEET ORDER

Before anyone asks, I'd also like to say that I don't really have control over what features are used in the spreadsheet, so I can't just tell them "Don't turn on filtering" or "Don't use print tables".

Any ideas are much appreciated. Thanks!

Answer

pepr picture pepr · Sep 30, 2015

The question is old, but for those who found it now, the skipping can be done as Jim found...

// skip those that do not end correctly
foreach (DataRow row in schemTable.Rows)
{
    string sheetName = row["TABLE_NAME"].ToString();
    if (!sheetName.EndsWith("$") && !sheetName.EndsWith("$'"))
        continue;
    Console.WriteLine(sheetName);
}

That is the wanted are or those that end with $ or those that end with $'.