SQL SERVER bulk insert ignore deformed lines

Toby picture Toby · Jul 2, 2012 · Viewed 14.6k times · Source

I have to import SAP unconvered lists. These reports look quite ugly and are not that well suited for automated processing. However there is no other option. The data is borderd around minus and pipe symbols similar to the following example:

02.07.2012
--------------------
Report name
--------------------
|Header1 |Header2  |
|Value 11|Value1 2 |
|Value 21|Value2 2 | 
--------------------

I use a format file and a statement like the following:

SELECT Header1, Header2
FROM  OPENROWSET(BULK  'report.txt',
FORMATFILE='formatfile_report.xml'  ,
errorfile='rejects.txt',
firstrOW = 2,
maxerrors = 100 ) as report

Unfortunately I receive the follwing error code:

Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

The rejects txt file contains the last row from the file with just minuses in it. The rejects.txt.Error.Txt documents:

Row 21550 File Offset 3383848 ErrorFile Offset 0 - HRESULT 0x80004005

The culprit that raises the error is obviously the very last row that does not conform to the format as declared in the format file. However the ugly header does not cause much problems (at least the one at the very top).

Although I defined the maxerror attribute that very one deformed line kills the whole operation. If I manually delete the last line containing all that minuses (-) everything works fine. Since that import shall run frequently and particularly unattended that extra post-treatment is not serious solution.

Can anyone help me to get sql server to be less picky and susceptible respectively. It is good that it documents the lines that couldn't be loaded but why does it abort the whole operation? And further after one execution of a statement that caused the creation of the reject.txt no other (or the same) statement can be executed before the txt file gets deleted manually:

Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "rejects.txt" could not be opened. Operating system error code 80(The file exists.).
Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "rejects.txt.Error.Txt" could not be opened. Operating system error code 80(The file exists.).

I think that is weird behavior. Please help me to suppress it.

EDIT - FOLLOWUP: Here is the format file I use:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
   <FIELD ID="EMPTY" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100"/>
   <FIELD ID="HEADER1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100"/>
   <FIELD ID="HEADER2" xsi:type="CharTerm" TERMINATOR="|\r\n" MAX_LENGTH="100"/>
 </RECORD>
 <ROW>
   <COLUMN SOURCE="HEADER1" NAME="HEADER2" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="HEADER2" NAME="HEADER2" xsi:type="SQLNVARCHAR"/>
 </ROW>
 </BCPFORMAT>

Answer

lyrisey picture lyrisey · Jul 4, 2012

BULK INSERT is notoriously fiddly and unhelpful when it comes to handling data that doesn't meet the specifications provided.

I haven't done a lot of work with format files, but one thing you might want to consider as a replacement is using BULK INSERT to drop each line of the file into a temporary staging table with a single nvarchar(max) column.

This lets you get your data into SQL for further examination, and then you can use the various string manipulation functions to break it down into the data you want to finally insert.