How do I format dd-mmm-yy values in flat file to smalldatetime during data import?

Adrian S picture Adrian S · Aug 25, 2011 · Viewed 11.3k times · Source

I have a flat file which is imported into SQL Server via an existing SSIS package. I need to make a change to the package to accommodate a new field in the flat file. The new field is a date field which is in the format dd-mmm-yy (e.g. 25-AUG-11). The date field in the flat file will either be empty (e.g. a space/whitespace) or populated with a date. I don’t have any control over the date format in the flat file.

I need to import the date field in the flat file into an existing SQL Server table and the target field data type is smalldatetime.

I was proposing to import the date as a string into a load table and then convert to smalldatetime when taking the data from the load table. But is there another possible way to parse the date format dd-mmm-yy to load this straight into a smalldatetime field without having to use convert to smalldatetime from the load table. I can’t quite think how to parse the date format, particularly the month. Any suggestions welcome.

Answer

user756519 picture user756519 · Aug 25, 2011

Here is an example that might give you an idea of what you can do. Ideally, in an SSIS package or in any ETL job, you should take into account that data may not be exactly what you would like it to be. You need to take appropriate steps to handle the incorrect or invalid data that might pop up now and then. That's why SSIS comes up with lots of Transformation tasks within Data Flow Task which you can make use of to clean up the data.

In your case, you can make use of Derived Column transformation or Data conversion transformation to achieve your requirements.

The example was created in SSIS 2008 R2. It shows how to read a flat file containing the dates and load into an SQL table.

File

I created a simple SQL table to import the flat file data.

Table

On the SSIS package, I have a connection manager to SQL and one for Flat file. Flat file connection is configured as shown below.

Connection

Flat file 1

Flat file 2

Flat file 3

On the SSIS package, I placed a Data Flow Task on the Control Flow tab. Inside, the Data Flow task, I have a Flat File Source, Derived Column transformation and an OLE DB Destination. Since the Flat file source and OLE DB destination are straightforward, I will leave those out here. The Derived transformation creates a new column with the expression (DT_DBDATE)SmallDate. Note that you can also use Data Conversion transformation to do the same. This new column SmallDateTimeValue should be mapped to the database column in OLE DB Destination.

Derived column

If you execute this package, it will fail because not all the values in the file are valid.

Failed

The reason why it fails in your case is because the invalid data is directly inserted into the table. In your case, the table will throw an exception making the package to fail. In this example, the package fails because the default setting on the Derived column transformation is to fail the component if there is any error. So, let's place a dummy transformation to redirect the error rows. We will Multicast transformation for this purpose. It won't really do anything. Ideally, you should redirect the error rows to another table using OLE DB Destination or other Destination component of your choice so you can analyze the data that causes the errors.

Drag the red arrow from Derived transformation and connect it to the Multicast transformation. This will popup the Configure Error Output dialog. Change the values under the column Error and Truncation from Fail component to Redirect row. This will redirect any error rows to the Multicast transformation and will not get into the tables.

Configure Error output

Now, if we execute the package, it will run successfully. Note the number of rows displayed in each direction.

Success

Here is the data that got into the table. Only 2 rows were valid. You can look at the first screenshot that showed the data in the file and you can see only 2 rows were valid.

Hope that gives you an idea to implement your requirement in the SSIS package.

Table data