Working with Open XML 2.0 using c# to parse large excel files. Issue I'm running into is the cell I'm parsing does not have a DataType I then check the NumberFormatId to determine if it is decimal, number or date. I'm looking for the exact NumberFormatId range for numbers/decimals vs dates. They seem to be all over the place some numbers/decimals have formats of 189,212,214,305 and dates having values of 185, 194, 278 etc. Does anyone know if the specification defines these ranges?
Edited - More Information
Below is an example of the number format of 194 from the style.xml file inside the xl folder.
The excel sheets are from different regions of the world so I'm thinking the number formats are different, but do they overlap? Will numFmtId 194 be something other than a date on different culture settings?
Below is how I'm converting c.CellValues like "40574" to dates, but the issue is how do I know if "40574" is a date and not a number?
DateTime.FromOADate(Convert.ToDouble(c.CellValue.Text));
Currently I'm doing this by checking if there is no DataType than check the CellFormat but there are issues when some of the NumberFormatId are not in my check.
private Object FormatCellValue(Cell c, SharedStringTable ssTable, CellFormats cellFormats)
{
if (c.CellValue != null)
{
// If there is no data type, this must be a string that has been formatted as a number
if (c.DataType == null)
{
CellFormat cf;
if (c.StyleIndex == null)
{
cf = cellFormats.Descendants<CellFormat>().ElementAt<CellFormat>(0);
}
else
{
cf = cellFormats.Descendants<CellFormat>().ElementAt<CellFormat>(Convert.ToInt32(c.StyleIndex.Value));
}
if ((cf.NumberFormatId >= 14 && cf.NumberFormatId <= 22) ||
(cf.NumberFormatId >= 165 && cf.NumberFormatId <= 180) ||
cf.NumberFormatId == 278 || cf.NumberFormatId == 185 || cf.NumberFormatId == 196 ||
cf.NumberFormatId == 217 || cf.NumberFormatId == 326) // Dates
{
try
{
DateTime dt;
dt = DateTime.FromOADate(Convert.ToDouble(c.CellValue.Text));
...CODE CONTINUES
Edit
In my updated post I forgot to post the value I found in the style.xml file:
<numFmt numFmtId="323" formatCode="mmm/yy;@"/>
So with this my question would be how do I get the formatCode and parse it to determine if it is a date?
Below is the output from the immediate debug window of the numberformat 323
{DocumentFormat.OpenXml.Spreadsheet.CellFormat}
base {DocumentFormat.OpenXml.OpenXmlCompositeElement}: {DocumentFormat.OpenXml.Spreadsheet.CellFormat}
Alignment: {DocumentFormat.OpenXml.Spreadsheet.Alignment}
ApplyAlignment: "1"
ApplyBorder: "1"
ApplyFill: "1"
ApplyFont: "1"
ApplyNumberFormat: "1"
ApplyProtection: "1"
BorderId: "64"
ExtensionList: null
FillId: "0"
FontId: "83"
FormatId: "37992"
LocalName: "xf"
NumberFormatId: "323"
PivotButton: null
Protection: {DocumentFormat.OpenXml.Spreadsheet.Protection}
QuotePrefix: "1"
Lists of formats ID values
Below is the list of format options (source)
ID Format Code
0 General
1 0
2 0.00
3 #,##0
4 #,##0.00
9 0%
10 0.00%
11 0.00E+00
12 # ?/?
13 # ??/??
14 d/m/yyyy
15 d-mmm-yy
16 d-mmm
17 mmm-yy
18 h:mm tt
19 h:mm:ss tt
20 H:mm
21 H:mm:ss
22 m/d/yyyy H:mm
37 #,##0 ;(#,##0)
38 #,##0 ;[Red](#,##0)
39 #,##0.00;(#,##0.00)
40 #,##0.00;[Red](#,##0.00)
45 mm:ss
46 [h]:mm:ss
47 mmss.0
48 ##0.0E+0
49 @
Hower, those list specify only several formats. According to this post: Reading dates from OpenXml Excel files, format with ID value less than 164 are built in. You can also find a longer list of formats there.
Checking formats ID values in xlsx file
For formats with greater ID values, you can find their definitions inside the file itself. In order to see them, you should open it with a zip archive browser and find styles.xml file in xl directory. Alternatively open this xlsx file with Open XML SDK 2.0 Productivity Tools and navigate to that file's /xl/styles.xml/x:StyleSheet node.
In that section, you should be able to see formats defined in your document along with ID values assigned to them. The part with formats should look similar to this:
...
<x:numFmts count="1">
<x:numFmt numFmtId="166" formatCode="yy/mm/dd;@" />
</x:numFmts>
...
Looking at formats saved here, it seems that id vlaues can be specific to a xlsx file, so probably the same ID value can be used to define different formats in two different xlsx files. However, for built-int formats they're predefined, so should be the same in all the files.
If you need any help with finding this formats in your file or additional information, let me know.
EDIT
You can also find some more information about number formats in this document: http://msdn.microsoft.com/en-us/library/documentformat.openxml.spreadsheet.numberingformat.aspx.
EDIT II
You can use this code to get a dictionary containing all the formats defined within the xlsx file:
private Dictionary<uint, String> BuildFormatMappingsFromXlsx(String fileName)
{
Dictionary<uint, String> formatMappings = new Dictionary<uint, String>();
using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileName, true))
{
var stylePart = document.WorkbookPart.WorkbookStylesPart;
var numFormatsParentNodes = stylePart.Stylesheet.ChildElements.OfType<NumberingFormats>();
foreach (var numFormatParentNode in numFormatsParentNodes)
{
var formatNodes = numFormatParentNode.ChildElements.OfType<NumberingFormat>();
foreach (var formatNode in formatNodes)
{
formatMappings.Add(formatNode.NumberFormatId.Value, formatNode.FormatCode);
}
}
}
return formatMappings;
}
If you want to check if any of those is a date, I suppose a simple way would be verifying if format code (value in the dictionary created by the method I've posted) contains mm and yy substrings.