Before reading this tutorial, please go through the first article on creating an ETL project in the Business Intelligence (BI) tools of Visual Studio for SSIS.
For this tutorial we will need two data sources. I am using one flat (text) file and one Excel file so that new users can see that data can be extracted from different types of data sources.
The text file contains the columns “CustomerId, Subscription Start Date, Subscription End Date” and the Excel file contains the columns “CustomerId, Segment, CustomerIdSpace”.
Observe that both data sources contain one common column named “CustomerId”. For now, ignore the “CustomerIdSpace” column of the Excel data source.
Create a new Business Intelligence project in Visual Studio 2005 and drag a “Data Flow Task” from the toolbox onto the package.
Double-click the “Data Flow Task” and a new tab will open. Drag a “Flat File Source” and an “Excel Source” from the toolbox.
Double-click the “Flat File Source” and the “Excel Source” and create a new connection for the text file and the Excel file respectively.
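Drag two “Sort” controls from the “Data Flow Transformation” section of the toolbox, place one below each source, and connect each source to its “Sort” control.
Double-click each “Sort” control and select the “CustomerId” column as the sort column. Both inputs must be sorted on the join column because “Merge Join” requires sorted inputs.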
Now drag the “Merge Join” control from the toolbox and connect the green arrow from each “Sort” control to the “Merge Join” control. Select the type of join (in this case we have selected “Inner join”) and also select the columns that should be included in the output.
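The sort-then-merge idea behind these steps can be sketched outside SSIS. The following Python snippet is only an illustration under stated assumptions, not how SSIS implements the component: the function name `merge_join_inner`, the shortened column names and all sample values are hypothetical.

```python
from typing import Dict, Iterator, List

Row = Dict[str, str]


def merge_join_inner(left: List[Row], right: List[Row], key: str) -> Iterator[Row]:
    """Inner-join two row lists on `key` using a sort-merge strategy."""
    # Equivalent of the two "Sort" controls: order both inputs by the join key.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key has no partner yet, advance left
        elif lk > rk:
            j += 1  # right key has no partner yet, advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key with every right row in the run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}
                i += 1
            j = j_end


# Hypothetical sample rows standing in for the text file and the Excel file.
subscriptions = [
    {"CustomerId": "C2", "StartDate": "2009-01-01", "EndDate": "2009-12-31"},
    {"CustomerId": "C1", "StartDate": "2009-03-01", "EndDate": "2010-02-28"},
    {"CustomerId": "C3", "StartDate": "2009-06-01", "EndDate": "2010-05-31"},
]
segments = [
    {"CustomerId": "C4", "Segment": "Bronze"},
    {"CustomerId": "C1", "Segment": "Gold"},
    {"CustomerId": "C2", "Segment": "Silver"},
]

joined = list(merge_join_inner(subscriptions, segments, "CustomerId"))
for row in joined:
    print(row)
```

Only customers present in both inputs survive the inner join; in this sample, “C3” and “C4” are dropped because each appears in only one source.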
Here we are not going to write the result to a file. Instead, we will add a “Derived Column” control after the “Merge Join” and add a “Data Viewer”, as discussed in the previous article, to view the output.
The final snapshot of the ETL package will look like:
Merge Data from different sources in which the common column is not well formatted:
In the above example, we assumed that “CustomerId” has the same values in both sources. But what happens if the values are not identical and need some modification, for example if one source has extra spaces in the values of the “CustomerId” column?
In the Excel file we have a column named “CustomerIdSpace”, which I asked you to ignore in the previous section.
To handle this type of situation, we need to clean up the inconsistent column before sorting the data. For this we will use the “Derived Column” control from the toolbox.
As you can see in the formula editor of the “Derived Column” control, one new column named “Removed Id” is added, and its value is calculated by the expression TRIM(CustomerIdSpace). This trimmed column is then the one the Excel data is sorted and joined on.
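To see why the trim step matters, here is a small Python sketch of the same idea. It is an assumption-laden illustration rather than SSIS code: the helper `add_removed_id` and the sample values are hypothetical, and Python's `strip()` removes all leading and trailing whitespace, which is slightly broader than trimming spaces only.

```python
# Hypothetical Excel rows whose key column carries stray spaces.
segments = [
    {"CustomerIdSpace": " C1 ", "Segment": "Gold"},
    {"CustomerIdSpace": "C2  ", "Segment": "Silver"},
]
# Hypothetical clean keys coming from the text file.
subscription_ids = {"C1", "C2"}


def add_removed_id(row):
    """Return a copy of `row` with a trimmed key column, like a Derived Column."""
    return {**row, "Removed Id": row["CustomerIdSpace"].strip()}


# Matching on the raw column fails because " C1 " is not equal to "C1".
raw_matches = [r for r in segments if r["CustomerIdSpace"] in subscription_ids]

# Matching on the derived, trimmed column succeeds.
cleaned = [add_removed_id(r) for r in segments]
clean_matches = [r for r in cleaned if r["Removed Id"] in subscription_ids]

print(len(raw_matches), len(clean_matches))
```

The original column is left untouched and a new column is added beside it, which mirrors adding “Removed Id” as a new column in the Derived Column editor instead of replacing “CustomerIdSpace”.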
The final snapshot of the package is shown in the image below: