I am trying to import data from a CSV file into SQL Server. The file has thousands of entries, and many of the rows contain incorrect data.
Some of the rows in the CSV file are:
"ID"|"EmpID"|"FName"|"LName"|"Gender"|"DateOfBirth"
"1"|"90043041961"|"ABCD"|"TEST"|"F"|"1848-05-05 00:00:00.000"
"1"|"10010161961"|"XYZ"|"TEST"|"F"|"1888-12-12 00:00:00.000"
.
.
..
..
....
"4"|"75101141821PPKKLL"|"LLKK"|"F"|"1925-09-09 00:00:00.000"|""
"4"|"32041401961UUYYTT"|"PPLL"|"M"|"1920-01-01 00:00:00.000"|""
.
.....
"25"|"00468132034"|"FGTT"|"OOOO"|"F"|"1922-11-11 00:00:00.000"
"25"|"00468132034"|"KKKK"|"PPPP"|"F"|"1922-11-11 00:00:00.000"
Creating the TestTable and trying to insert data (from csv file) into it:
create table TestTable
(
ID varchar(5),
EmpID varchar(25),
FName varchar(25),
LName varchar(25),
Gender varchar(5),
DateOfirthB varchar(30)
);
I am using the following script to import data from the CSV file into TestTable in SQL Server:
bulk insert TestTable
from 'C:\TestData.csv'
with
(firstrow = 2,
DATAFILETYPE='char',
FIELDTERMINATOR= '"|"',
ROWTERMINATOR = '\n',
ERRORFILE ='C:\ImportErrors.csv',
MAXERRORS = 0,
TABLOCK
);
Errors:
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 32763, column 5 (Gender).
Msg 4863, Level 16, State 1, Line 1
Bulk load data conversion error (truncation) for row 32764, column 5 (Gender).
Is there any way to skip the rows in the CSV file that cannot be inserted, for whatever reason, and insert only the ones that have the correct syntax?
Thanks
PS: I cannot use SSIS. I am only allowed to use SQL.
I deal with CSV files that I receive from different sources on a weekly basis; some of the data is nice and clean, and some of it is a nightmare. This is how I handle the CSV files I receive, and I hope it helps you. The script below only flags rows whose delimiter count differs from the header row, so you will still need to add some data validation to handle other kinds of malformed data.
SET NOCOUNT ON
GO
-- Create Staging Table
IF OBJECT_ID(N'TempDB..#ImportData', N'U') IS NOT NULL
DROP TABLE #ImportData
CREATE TABLE #ImportData(CSV NVARCHAR(MAX))
-- Insert the CSV Data
BULK INSERT #ImportData
FROM 'C:\TestData.csv'
-- Add Control Columns
ALTER TABLE #ImportData
ADD ID INT IDENTITY(1, 1)
ALTER TABLE #ImportData
ADD Malformed BIT DEFAULT(0)
-- Declare Variables
DECLARE @Deliminator NVARCHAR(5) = '|', @ID INT = 0, @DDL NVARCHAR(MAX)
DECLARE @NumberCols INT = (SELECT LEN(CSV) - LEN(REPLACE(CSV, @Deliminator, '')) FROM #ImportData WHERE ID = 1)
-- Flag Malformed Rows
UPDATE #ImportData
SET Malformed = CASE WHEN LEN(CSV) - LEN(REPLACE(CSV, @Deliminator, '')) != @NumberCols THEN 1 ELSE 0 END
-- Create Second Staging Table
IF OBJECT_ID(N'TestTable', N'U') IS NOT NULL
DROP TABLE TestTable
CREATE table TestTable
(ID varchar(4000),
EmpID varchar(4000),
FName varchar(4000),
LName varchar(4000),
Gender varchar(4000),
DateOfirthB varchar(4000));
-- Insert CSV Rows
WHILE (1 = 1)
BEGIN
    -- Pick the next well-formed row after @ID and build its INSERT statement
    SELECT TOP 1
        @ID = ID
        ,@DDL = 'INSERT INTO TestTable(ID, EmpID, FName, LName, Gender, DateOfirthB)' + CHAR(13) + CHAR(10) + REPLICATE(CHAR(9), 1)
            + 'VALUES' -- + CHAR(13) + CHAR(10) + REPLICATE(CHAR(9), 2)
            + '(' + DDL + ')'
    FROM
    (
        SELECT
            ID
            -- Escape single quotes, turn each delimiter into ',' and strip the double quotes
            ,DDL = '''' + REPLACE(REPLACE(REPLACE(CSV, '''', ''''''), @Deliminator, ''','''), '"', '') + ''''
        FROM
            #ImportData
        WHERE
            ID > 1 -- skip the header row
            AND Malformed = 0) D
    WHERE
        ID > @ID
    ORDER BY
        ID
    -- Stop once no unprocessed rows remain
    IF @@ROWCOUNT = 0 BREAK
    EXEC sp_executesql @DDL
END
-- Clean Up
IF OBJECT_ID(N'TempDB..#ImportData', N'U') IS NOT NULL
DROP TABLE #ImportData
-- View Results
SELECT * FROM dbo.TestTable
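As for the extra validation mentioned above, here is a rough sketch of what a second pass might look like. It is only an illustration, not part of the script: @Staged is a hypothetical stand-in for dbo.TestTable, loaded with two rows copied from the question, and the rules (Gender must be M or F, DateOfirthB must parse as a date) are my assumptions about what counts as valid. It also assumes SQL Server 2012 or later, since it relies on TRY_CONVERT.

```sql
-- Hypothetical stand-in for dbo.TestTable, using two sample rows from the question
DECLARE @Staged TABLE
(
    ID varchar(4000), EmpID varchar(4000), FName varchar(4000),
    LName varchar(4000), Gender varchar(4000), DateOfirthB varchar(4000)
);

INSERT INTO @Staged (ID, EmpID, FName, LName, Gender, DateOfirthB)
VALUES
    ('1', '90043041961', 'ABCD', 'TEST', 'F', '1848-05-05 00:00:00.000'),
    -- Shifted row: same delimiter count as the header, so the Malformed flag misses it
    ('4', '75101141821PPKKLL', 'LLKK', 'F', '1925-09-09 00:00:00.000', '');

-- Keep only rows that pass the assumed rules
SELECT
    ID, EmpID, FName, LName, Gender
    ,TRY_CONVERT(datetime2(3), DateOfirthB) AS DateOfBirth
FROM
    @Staged
WHERE
    Gender IN ('M', 'F')
    -- NULLIF guards against an empty string being converted to a default date
    AND TRY_CONVERT(datetime2(3), NULLIF(DateOfirthB, '')) IS NOT NULL
```

With these two sample rows, only the first should come back, because the second has a date sitting in the Gender column. In practice you would point the query at dbo.TestTable and adjust the rules to whatever your data actually requires.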