I hate to resort to StackOverflow for something so (seemingly) basic, but I've been fighting with Microsoft for the last few hours and seem to be hitting a dead end. I am trying to read (large) Excel 2007+ spreadsheets, and Google has kindly informed me that using the OpenXml SDK is a pretty popular choice. So I gave the thing a shot, read some tutorials, checked Microsoft's own library pages, and got very little out of them all.
I am using a small test spreadsheet with just one column of numbers and one of strings; large-scale testing will come later. I've tried several implementations similar to the one I am about to post, and none of them read any data. The code below was mostly taken from another StackOverflow thread, where it seemed to have worked. Not so for me. I figured I'd have you guys check/debug/help with this version, because it's likely less broken than anything I have written today.
static void ReadExcelFileSAX(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, true))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart);
        string text;
        string rowNum;
        while (reader.Read())
        {
            if (reader.ElementType == typeof(Row))
            {
                do
                {
                    if (reader.HasAttributes)
                    {
                        rowNum = reader.Attributes.First(a => a.LocalName == "r").Value;
                        Console.Write("rowNum: " + rowNum); // We never even get here; I tested it with a breakpoint
                    }
                } while (reader.ReadNextSibling()); // Skip to the next row
                Console.ReadKey();
                break; // We just looped through all the rows, so no need to continue reading the worksheet
            }
            if (reader.ElementType == typeof(Cell))
            {
            }
            if (reader.ElementType != typeof(Worksheet)) // Don't want to skip the contents of the worksheet
                reader.Skip(); // Skip contents of any node before finding the first row.
        }
        reader.Close();
        Console.WriteLine();
        Console.ReadKey();
    }
}
And, on a side note, are there any good alternatives to the OpenXml SDK that I have somehow missed?
I think you took the wrong WorksheetPart for reading the rows. The line

workbookPart.WorksheetParts.First();

gets the first WorksheetPart of the collection, which is not necessarily the first worksheet as you see it in Microsoft Excel. So iterate through all WorksheetParts and you should see some output in your console window.
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static void ReadExcelFileSAX(string fileName)
{
    // isEditable: false, the file is only read
    using (SpreadsheetDocument spreadsheetDocument =
        SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        // Iterate through all WorksheetParts
        foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
        {
            // The reader is IDisposable, so "using" closes it even if an exception is thrown
            using (OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart))
            {
                while (reader.Read())
                {
                    if (reader.ElementType == typeof(Row))
                    {
                        do
                        {
                            if (reader.HasAttributes)
                            {
                                string rowNum = reader.Attributes.First(a => a.LocalName == "r").Value;
                                Console.WriteLine("rowNum: " + rowNum);
                            }
                        } while (reader.ReadNextSibling()); // Skip to the next row
                        break; // We just looped through all the rows, so no
                               // need to continue reading the worksheet
                    }
                    // Don't skip <worksheet> or <sheetData>: the rows are children of
                    // <sheetData>, so skipping it would skip every row
                    if (reader.ElementType != typeof(Worksheet) &&
                        reader.ElementType != typeof(SheetData))
                        reader.Skip();
                }
            }
        }
    }
}
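If you only need the sheet Excel shows first (or a sheet with a particular name), you may not have to scan every part. A minimal sketch, assuming the Open XML SDK: `Workbook.Sheets` lists the `Sheet` entries in tab order, and each entry's `Id` is the relationship id of its `WorksheetPart`, which `WorkbookPart.GetPartById` resolves. The names `SheetLookup`, `GetFirstSheetInTabOrder` and `GetSheetByName` are hypothetical, made up for this example.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper class, not part of the Open XML SDK.
static class SheetLookup
{
    // Returns the WorksheetPart of the first sheet in tab order (this may be a hidden sheet).
    // <sheets> in workbook.xml is stored in tab order, unlike WorkbookPart.WorksheetParts.
    public static WorksheetPart GetFirstSheetInTabOrder(WorkbookPart workbookPart)
    {
        Sheet firstSheet = workbookPart.Workbook.Sheets.Elements<Sheet>().First();
        // Sheet.Id is the relationship id linking the sheet entry to its part
        return (WorksheetPart)workbookPart.GetPartById(firstSheet.Id.Value);
    }

    // Looks a sheet up by its visible tab name; returns null if there is no such sheet.
    public static WorksheetPart GetSheetByName(WorkbookPart workbookPart, string sheetName)
    {
        Sheet sheet = workbookPart.Workbook.Sheets.Elements<Sheet>()
            .FirstOrDefault(s => s.Name?.Value == sheetName);
        return sheet == null ? null : (WorksheetPart)workbookPart.GetPartById(sheet.Id.Value);
    }
}
```

With this, `new OpenXmlPartReader(SheetLookup.GetFirstSheetInTabOrder(workbookPart))` would read the leftmost tab instead of whatever part happens to come first in `WorksheetParts`.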
To read all cell values, use the following function (all error handling details omitted):
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static void ReadAllCellValues(string fileName)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
        {
            using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
            {
                while (reader.Read())
                {
                    // React only to <row> start tags, and skip rows without child cells
                    if (reader.ElementType == typeof(Row) && reader.IsStartElement && reader.ReadFirstChild())
                    {
                        do
                        {
                            if (reader.ElementType == typeof(Cell))
                            {
                                Cell c = (Cell)reader.LoadCurrentElement();
                                string cellValue;
                                if (c.CellValue == null)
                                {
                                    cellValue = string.Empty; // e.g. a formatted but empty cell
                                }
                                else if (c.DataType != null && c.DataType.Value == CellValues.SharedString)
                                {
                                    // CellValue holds a zero-based index into the shared string table
                                    SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable
                                        .Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
                                    cellValue = ssi.InnerText; // also covers rich-text (multi-run) strings
                                }
                                else
                                {
                                    cellValue = c.CellValue.InnerText;
                                }
                                Console.Out.Write("{0}: {1} ", c.CellReference, cellValue);
                            }
                        } while (reader.ReadNextSibling());
                        Console.Out.WriteLine();
                    }
                }
            }
        }
    }
}
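In the code above you can see that cells with data type SharedString must be handled using the SharedStringTablePart: for such cells, the CellValue holds a zero-based index into the shared string table rather than the text itself.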
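For large sheets, calling `ElementAt(...)` on the shared string table for every cell walks the table from the start each time. A minimal sketch, assuming the Open XML SDK, that loads the shared strings into an array once per workbook and then resolves a cell's text, including cells without a value and inline strings. The names `CellText`, `LoadSharedStrings` and `GetCellText` are hypothetical helpers, not SDK API.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper class, not part of the Open XML SDK.
static class CellText
{
    // Reads the shared string table once, so later lookups are O(1) array accesses.
    public static string[] LoadSharedStrings(WorkbookPart workbookPart)
    {
        SharedStringTablePart sstPart = workbookPart.SharedStringTablePart;
        if (sstPart == null)
            return new string[0]; // workbook without any shared strings
        // InnerText concatenates all <t> runs, so rich-text items work too
        return sstPart.SharedStringTable.Elements<SharedStringItem>()
                      .Select(item => item.InnerText)
                      .ToArray();
    }

    // Returns the stored text of a cell, or "" if the cell has no value.
    public static string GetCellText(Cell cell, string[] sharedStrings)
    {
        if (cell.DataType != null && cell.DataType.Value == CellValues.InlineString)
            return cell.InlineString?.InnerText ?? "";
        if (cell.CellValue == null)
            return ""; // e.g. a formatted but empty cell
        string raw = cell.CellValue.InnerText;
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
            return sharedStrings[int.Parse(raw)]; // raw is a zero-based index
        // Numbers, booleans ("0"/"1") and dates (stored as serial numbers) are returned as-is
        return raw;
    }
}
```

In `ReadAllCellValues` you would call `CellText.LoadSharedStrings(workbookPart)` once before the `foreach` and replace the `if/else` chain with `cellValue = CellText.GetCellText(c, sharedStrings);`.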