Quirky SELECT from Excel file via OleDbDataAdapter method (C#)

SimoneF picture SimoneF · Dec 1, 2010 · Viewed 14.6k times · Source

I have got an Excel file in this form :

Column 1    Column 2    Column 3  
 data1        data2    
 data1        data2  
 data1        data2  
 data1        data2  
 data1        data2       data3  

That is, the whole Column 3 is empty except for the last row. I am accessing the Excel file via OleDbDataAdapter, returning a DataTable: here's the code.

query = "SELECT * FROM [" + query + "]";
objDT = new DataTable();
objCmdSQL = this.GetCommand();
objCmdSQL.CommandText = query;
objSQLDad = new OleDbDataAdapter(objCmdSQL);
objSQLDad.Fill(objDT);
return objDT;

The point is, in this scenario my code returns a DataTable with just Column 1 and Column 2.
My guess is that JET engine tries to infer column type by the type of the very first cell in every column; being the first value null, the whole column is ignored.
I tried to fill in zeros and this code is actually returning all three columns; this is obviously the least preferable solution because I have to process large numbers of small files.
Inverting the selection range (from, i.e. "A1:C5" to "C5:A1" ) doesn't work either. I'm looking for something more elegant.
I have already found a couple of posts discussing type mismatch (varchar cells in int columns and vice versa) but actually haven't found anything related to this one.
Thanks for reading!

edit

Weird behavior again. I have to work on mostly Excel 2003 .xls files, but since this question has been answered I thought I could test my code against Excel 2007 .xslx files. The connection string is the following:

string strConn = @"Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" + _fileName.Trim() + @";Extended Properties=""Excel 12.0;HDR=No;IMEX=1;""";

I get the "External table is not in the expected format" exception which I reckon is the standard exception when there is a version mismatch between ACE/JET and the file being opened.

The string

Provider=Microsoft.ACE.OLEDB.12.0 

means that I am using the most recent version of OLEDB, I took a quick peek around and this version is used everywhere there is need of connecting to .xlsx files.
I have tried with just a vanilla provider ( just Excel 12.0, without IMEX nor HDR ) but I get the same exception.
I am on .NET 2.0.50727 SP2, maybe time to upgrade?

Answer

Sorax picture Sorax · Dec 1, 2010

I recreated your situation and following returned the 3 columns correctly. That is, the first two columns fully populated with data and the third containing null until the last row, which had data.

string connString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\MyExcel.xls;Extended Properties=""Excel 8.0;HDR=No;IMEX=1"";";
DataTable dt = new DataTable();
OleDbConnection conn = new OleDbConnection(connString);
OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn);

adapter.Fill(dt);

Note I used the Access Database Engine(ACE) provider, which succeeded the old Joint Engine Technology(JET) provider, and my results may represent a behavior difference between the two. Of course, if you aren't already using it I suggest using the ACE provider as I believe Microsoft would too. Also, note the connection's Extended Properties:

"HDR=Yes;" indicates that the first row contains columnnames, not data. "HDR=No;" indicates the opposite.

"IMEX=1;" tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect excel sheet write access negative.

Let me know if this helps.