Can someone please explain data mining, SSIS, BI, ETL and other related technologies?

Micah picture Micah · Jan 6, 2009 · Viewed 8.2k times · Source

I was talking with a co-worker yesterday regarding a situation where he used SSIS (or something like that) to do some really cool thing with an SSIS Package where he passed in a name like "Dr. Reginald Williams, PhD." and based on some weighting scheme the system was smart enough to figure out how to tokenize it and store it in the database as "Salutation- First Name - Last Name - Suffix". He threw out some buzzwords like BI, and SSIS, ETL, and Data mining. I really wanted more information, but didn't even know where to begin to ask.

I'm a .Net developer and thoroughly versed in C#, Vb.Net, WPF, etc..., but I have no idea what these technologies are, how to add them to my skill set, and whether or not it's something that I really should be focusing on. Any and all direction would be helpful.

Answer

keithwarren7 picture keithwarren7 · Jan 6, 2009

SSIS == SQL Server Integration Services and it is an Extract Transform and Load (ETL) tool, it is a far superior implementation of what was Data Transformation Services or DTS in SQL7, SQL2K era. It is a great tool for expressing workflow processes wherein data is moved from point A to point B (and c and d etc) and undergoes changes through that process such as consolidation to a denormalized design or data cleansing.

BI or Business Intelligence is a moniker for a entire category in the tech world and it is a great place to be right now. BI skills are very valued and hard to come by, one of the reasons this is the case is that it is hard to recreate a true BI case in a lab so teaching is almost always done in a real world situation.

From a high level, BI projects usually involve an end point of reporting. Often times as developers we are used to transactional report writing such as the details of a PO but BI can get into very broad reports that cover product sales trends over decades and deal with hundreds of millions of records. The way we design databases for applications is not ideal for this kind of reporting so other tools and technologies were invented and are used in the BI space. These are things like Cubes which you often hear called OLAP cubes. OLAP cubes usually originate from a data warehouse which is nothing more than another database - but typical warehouses contain data that came from more than one, and often dozens of other application databases. Your inventory app, purchasing app, HR app and a whole bunch of others all contain bits and pieces of data that create a complete picture of the business, a BI architect will use something like SSIS to pull the data from all these systems, massage it and store it in the data warehouse which is designed with a different kind of design better for reporting. Once it is in the warehouse he will use Analysis services to create cubes on that data and something like Reporting Services to show you reports over that data.

Edit: sorry, forgot Data Mining, it is another non-specific term that describes and concept or a process and not so much a tool. In a simple example, it is a methodical approach to identifying patterns in data. In the past a good business analysy would look through data for trends but with modern databases you are talking about datasets way too large to manually comb through - Data mining allows you to instruct the computer to comb through that data and identify patterns that are of interest.

Hope that helps