For this tutorial we will need two data sources. I am taking one flat file and another excel file so that new user can understand that extraction can be done from different types of datasources.
Text file contains “CustomerId, Subscription Start Date, Subscription End Date” and Excel file Contains “CustomerId, Segment, CustomerIdSpace” columns.
Observe that both data sources contains one common column named “CustomerId”. For a while don’t count the column “CustomerIdSpace” of Excel Data Source.
Create New Business Intelligent project in Visual Studio 2005 and Drag “Data Flow Task” from tools to package.
Double click on “Data Flow Task” and new tab will opened. Drag “Flat File Source” and “Excel source” from tool box.
Double click on “Flat File Source” and “Excel Source” and create new Connection for excel file and text file respectively.
Drag two “Sort” control from “Data Flow Transformation” section of the toolbox below both sources.
Double click on “Sort” control and select the column “CustomerId” for both “Sort” Control.
Now Drag the “Merge Join” control from tool box and drop green arrow from both “Sort” Control to the “Merge Join” control. Select the type of Join, in this case we have selected “inner join“, also select the columns which should be exported in Output .
Here, we are not going to write the result in file, but we will use “derived column” control after merge join and add “data Viewer” as discussed in previous article to view the output.
The final snapshot of the ETL package will look like:
ETL is the process in database usage, specially in Dataware house that evolves getting data from different sources (Extract) , performing manipulation operations as per business need (Transformation) and saving on destination database (loading).