SQL Server Integration Services (SSIS) is a tool that we use to perform data flow operations; i.e. extract, transform and load data. It retrieves data from just about any source. It performs various transformations on the data; e.g. convert from one type to another, convert to uppercase or lowercase, perform calculations, etc. It loads data into just about any source. It defines a workflow.
To begin with the activity to start with the SSIS packages, first, we need to install the SSIS tool in our visual studio server. After this installation VS will have the built-in add-on feature of SSIS tools. Then all the related controls will be available in VS toolbox.
What is Data Flow Task?
The data flow task is a tool which is used for retrieving the data from one or more sources, optionally change the content or structure of the data in transit, and then send the data to one or more than one destinations.
Every SSIS package has at least one task in the control flow. Adding multiple tasks to the control flow helps you direct the sequence of tasks by adding precedence constraints to connect a series of tasks from beginning to end. It also enables you to group these tasks together in containers.
Steps to create Package and data flow:
- Once the visual studio is open, now we need to create a solution based on our requirement. Since we are going to start with the integration services just move on to File -> New Project or Ctrl + Shift + N. This step opens up a pop up where one needs to select Integration Services Project and give the project name.
- After creating this new project, we can see that by default a new SSIS package is added (dtsx). Double click on package.dtsx to open the properties. Define the connection manager in this property.
- There are quite a few built-in tasks available in ToolBox. For our current package, we only need a Data Flow task. Drag and drop the Data Flow Task from the Toolbox onto the Control Flow designer.
- Right Click and rename the Data Flow task. Once you double click on task box it will navigate you to Data Flow Section, where you can design the flowchart of your data. In this data Flow section, we can add controls from Toolbox like OLE DB source, OLE DB destination and many more in between as per the requirement. In OLE DB source, we can define our custom query to pick the data from the source database. Whereas, in OLE Db destination, we can map the column derived from source to destination Db table.
Deployment: For the deployment of our packages, we have the following options available:
- Deploy to the file system
- Deploy to the package store
- Deploy to SQL Server
The simplest approach for deployment is “file system”. As SSIS package is actually just an XML file and it can simply be copied from its project location to a folder on the deployment target. We can also use the DOS COPY command to perform the copy operation. The package store is a particular folder on the file system.
We have to build our solution first. Once it is built successfully, we can either copy our project to the SQL target area or we can go to MSDB and target our Project in SSIS job scheduler by using the file System process in the scheduler.
Performance tuning points:
Fixing Source Components bottlenecks
The delivery speed by the source, for the requested data by the source component, is the main constraint of bringing the data into the data flow pipeline. The processing time required to extract the data is the fastest possible speed for the Data Flow Task, but it’s pointless to extract data without doing something with it. Any further addition of destination or subsequent transformation to the data flow pipeline will add performance overhead.
Few things in data flow design to improve that source component’s performance:
- Reduce the number of columns.
- Reduce the number of rows.
- Reduce column width.
- Use the SQL Command option instead of the Table or View option for relational sources.
- By setting the “Fast Parse” property to True for relevant columns
Fixing Destination Components bottlenecks
Sometimes in a package, the performance bottleneck is caused by the destination component. Although external factors can affect the performance of destinations, there are some recommendations for improving the data flow design that relates to destinations:
- Optimize the OLE DB destination.
- Instead of an OLE DB destination use a SQL Server destination.
- Set data types explicitly
Evaluate how efficiently SSIS can place data in buffers
To do this, you must enable the BufferSizeTuning logging event, then execute your package and examine the log.
Our SSIS Job scheduler will run periodically and execute the package to fetch the data from one db to another as per your defined logic and query in SSIS task flow. We can add the error handling also in our package flow task. Possible cause of the error many times is the data type issue and null exceptions.