Integration Services Fundamentals

This course is for business intelligence developers and database professionals responsible for moving, manipulating, and integrating data. It assumes no prior experience with Integration Services.
Course info
Rating: (1304)
Level: Beginner
Updated: Jun 6, 2011
Duration: 7h 1m
Table of contents
Introduction to Integration Services
Control Flow Basics
Control Flow Tasks
Data Flow Basics
Data Flow Transformations
Expressions
Scripts
Debugging SSIS Packages
Package Reliability
Package Execution
Description

This course explains how to develop Integration Services packages with an emphasis on the development of processes that support data warehousing. It begins by describing the overall process of package development, then describes the types of tasks that Integration Services provides and shows how these tasks can be performed sequentially or in parallel in the control flow by using precedence constraints. The course continues by reviewing the data flow components that are used for extract, transform, and load processes. In addition, the course covers the Integration Services expression language and scripting, and demonstrates how to debug packages, configure logging, manage transactions, and manage package restarts. It also describes how to automate the execution of packages. The features and demonstrations in this course focus on the SQL Server 2008 R2 release, although most topics also apply to earlier versions of Integration Services.

About the author

Stacia Misner Varga is a consultant, instructor, author, mentor, BI Partner for SQLSkills, and principal of Data Inspirations. She has specialized in Microsoft business intelligence technologies for over 10 years and is a frequent speaker at the PASS Summit, IT/Dev Connections conferences, and various SQL Saturday and Microsoft-related events worldwide.

More from the author
Getting Started with Power BI (Beginner, 3h 14m, 23 Jun 2016)
More courses by Stacia Misner Varga
Section Introduction Transcripts

Introduction to Integration Services
Hello, I'm Stacia Misner from Pluralsight, and I'll be presenting Introduction to Integration Services. In this module we'll start learning about SQL Server Integration Services, covering some basic concepts that you need to understand before beginning to develop your own packages. We'll also review the architecture of Integration Services, and we'll look at the steps involved in the package development process. Then we'll look at how to work with package projects in Business Intelligence Development Studio, also known as BIDS. And last, we'll get familiar with the package designer interface. There are lots of different business scenarios for using Integration Services, with data warehouse population being the most common. This usage of Integration Services is also known as extract, transform, and load, or ETL. Cleaning and standardizing data can be an expensive and time-intensive task that can now be automated by using Integration Services. Another common task in data warehousing is merging data from multiple data sources, even when the sources are different, whether they're in relational format or in flat files. Integration Services can also be used to automate administrative tasks. Perhaps a DBA develops an intricate database maintenance plan which can then be executed on a periodic basis. These are only a few examples of what's possible, and developers can extend the capabilities and built-in functionality of Integration Services to accommodate any type of scenario, particularly where rapid application development is one of the objectives.

Control Flow Basics
Hello, I'm Stacia Misner, and I'll be presenting Control Flow Basics. In this module we'll review how to use the control flow in Integration Services to perform tasks sequentially. In particular, we'll review the various components that we use to perform these tasks in the control flow, and then in a later module we'll explore the control flow tasks themselves in more detail. So in this module we'll start with connection managers, which we use to tell Integration Services how to connect to a server or a file. Then we'll discuss tasks at a high level. We use tasks to perform some type of action, like executing a SQL command or deleting a file. We control the sequence in which tasks are processed by using precedence constraints, so we'll look at all the various options we have for working with control flow components. And we can organize tasks into containers, both as a way to control when tasks execute and as a way to manage transactions within a package.
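
As a minimal sketch of how these pieces fit together outside of BIDS, the Microsoft.SqlServer.Dts.Runtime API can create a connection manager, two tasks, and a precedence constraint in code; the connection string, table name, and SQL statement below are invented for illustration and are not part of the course demos.

    using Microsoft.SqlServer.Dts.Runtime;

    class ControlFlowSketch
    {
        static void Main()
        {
            Package package = new Package();

            // A connection manager tells Integration Services how to reach a server or file.
            ConnectionManager conn = package.Connections.Add("OLEDB");
            conn.Name = "Warehouse";
            conn.ConnectionString = "Provider=SQLNCLI10;Data Source=(local);" +
                "Initial Catalog=DW;Integrated Security=SSPI;"; // assumed connection string

            // Two Execute SQL Tasks, added by their stock moniker.
            Executable truncateTask = package.Executables.Add("STOCK:SQLTask");
            Executable loadTask = package.Executables.Add("STOCK:SQLTask");

            TaskHost truncateHost = (TaskHost)truncateTask;
            truncateHost.Properties["Connection"].SetValue(truncateHost, conn.ID);
            truncateHost.Properties["SqlStatementSource"].SetValue(truncateHost,
                "TRUNCATE TABLE dbo.StageCustomer"); // assumed statement

            // The precedence constraint runs the load task only if the truncate task succeeds.
            PrecedenceConstraint constraint =
                package.PrecedenceConstraints.Add(truncateTask, loadTask);
            constraint.Value = DTSExecResult.Success;
        }
    }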

Control Flow Tasks
Hello, I'm Stacia Misner, and I'll be presenting Control Flow Tasks. In this module we're going to spend some more time exploring the control flow tasks that were introduced in the previous module. We'll cover everything except scripting, which will come in a separate module. Our focus in this module is simply on learning about the tasks that are available and how to configure them, rather than building complete packages that implement these tasks as part of a business intelligence solution. Our approach to learning about these tasks is to look at them by category and consider how they might be used in an ETL process for data warehousing, but don't worry, even if you're not working on a data warehouse, you'll learn what these tasks do and get ideas for how to use them. We'll be starting with data preparation tasks, which we can use to retrieve data and files for other tasks to work with. These data preparation tasks are separate from the data flow process that we'll be talking about in the next module. Then we'll discuss process communication tasks that allow us to get information from, or send information to, external processes or other Integration Services packages. Then we have SQL Server tasks to perform bulk load operations, execute SQL statements, or move database objects. We have Analysis Services tasks to help us keep cubes up to date with fresh data and also to apply data mining models to our data. And last, we have database maintenance tasks, which are used to do things like perform backups or maintain indexes.

Data Flow Basics
Hello, I'm Stacia Misner, and I'll be presenting Data Flow Basics. In this module we'll get familiar with the data flow task. We'll start by learning what the data flow task does, then we'll review the pipeline architecture that Integration Services uses when executing the data flow task. Then we'll look at the components that we use for moving data, starting with the data sources that we use to get the data. And then we'll move on to the data destinations that we can use to store the data that we retrieved from those sources. In this module our focus is purely on moving data from one place to another. In the next module we'll look at the transformations that we can use to change the data while it's in transit.

Data Flow Transformations
Hello, I'm Stacia Misner, and I'll be presenting Data Flow Transformations. In the previous module we learned about the pipeline architecture of the data flow task and focused on the beginning and end of the pipeline by reviewing each of the available data sources and data destinations. In this module we'll focus on the various transformations that are available to change the content or structure of the data before we send it to its destination. First we'll look at row transformations. These are the transformations that operate on data one row at a time. Then we'll look at rowset transformations. With these transformations, the number of rows and sometimes the number of columns change, which means that the amount of data that goes in is not necessarily the same amount of data that comes out. Of course, that's metaphorically speaking; data doesn't really flow through transformations, but instead transformations pass over data, as we learned in the previous module during the discussion of the pipeline architecture. Next we'll look at the split and join transformations. This group of transformations will take data and create multiple outputs from it, or do the opposite, taking multiple inputs to produce a single output. The Enterprise edition of SQL Server includes several business intelligence transformations for Integration Services, for data cleansing, text mining, and data mining operations. And we'll wrap up with a handful of transformations that don't fit into any other category.

Expressions
Hello, I'm Stacia Misner, and I'll be presenting Expressions. Expressions give us the ability to change the behavior of a package at runtime, either because we pass in a value from an external source or because something changes during package execution, and we can use that information to change a property of a task dynamically. To help us create expressions we can use the expression builder, a graphical user interface that shows us the variables and columns that we can reference, in addition to the functions that we can use. But to use the expression builder correctly, we need to understand the syntax of the Integration Services expression language. To make packages flexible, we can use variables to change behavior at runtime, or to store a value obtained from one task so that we can use it in another task. Once we've developed an understanding of the syntax of the expression language and how to work with variables, we're ready to put that understanding into practice by learning how to use expressions in the control flow and in the data flow.
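
As a hedged sketch of how a variable and a property expression work together, the runtime object model can attach an expression, written in the Integration Services expression language, to a task property; the variable name and SQL statement below are made up for illustration, and in BIDS the same thing is done through the Variables window and the Expression Builder.

    using Microsoft.SqlServer.Dts.Runtime;

    class ExpressionSketch
    {
        static void Main()
        {
            Package package = new Package();

            // A user variable whose value can change at runtime.
            package.Variables.Add("TableName", false, "User", "StageCustomer");

            Executable exec = package.Executables.Add("STOCK:SQLTask");
            TaskHost taskHost = (TaskHost)exec;

            // The expression uses the Integration Services expression language:
            // string literals in double quotes, @[Namespace::Name] for variables,
            // and + for concatenation. It is evaluated at runtime, so the
            // SqlStatementSource property changes whenever the variable changes.
            taskHost.SetExpression("SqlStatementSource",
                "\"TRUNCATE TABLE dbo.\" + @[User::TableName]");
        }
    }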

Scripts
Hello, I'm Stacia Misner, and I'll be presenting Scripts. Thus far in this course we've covered everything that we can make happen in a package with the out-of-the-box components. When we need to do something that isn't covered by a control flow task or a data flow component, usually we can accomplish what we need by using a script. We'll start this module with an introduction to the ways that we can use scripting in Integration Services. Then we'll look at how we configure the script task that we can use in the control flow. Then we'll learn about the Dts object, which allows us to interact with the package by reading or writing variables, firing events, or returning the result of the script back to the Integration Services engine, just to name a few examples. And last, we'll look at the script component that we can use in the data flow. This is a versatile component, because we can use it as a source, a transformation, or a destination.
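
For a sense of what the Dts object looks like in practice, here is a hedged sketch of a Script Task's Main method written in C# (one of the languages VSTA supports in SQL Server 2008 R2); the variable name and message are invented, and in a real package the variable would also need to be listed in the task's ReadOnlyVariables.

    public void Main()
    {
        // Read a package variable through the Dts object.
        string sourceFolder = Dts.Variables["User::SourceFolder"].Value.ToString();

        // Fire an information event that logging or event handlers can pick up.
        bool fireAgain = true;
        Dts.Events.FireInformation(0, "Script Task",
            "Processing files in " + sourceFolder, string.Empty, 0, ref fireAgain);

        // Return the result of the script to the Integration Services engine.
        Dts.TaskResult = (int)ScriptResults.Success;
    }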

Debugging SSIS Packages
Hello, I'm Stacia Misner, and I'll be presenting Debugging SSIS Packages. If you've ever developed code using Visual Studio, you'll recognize some of the debugging tools that Integration Services includes. Just as we can set breakpoints in Visual Studio to pause program execution, we can set breakpoints in the control flow of the package designer in BIDS to pause package execution. Breakpoints are a design-time feature, which means they'll only affect package execution when we're working with the package in BIDS. When we have paused package execution, we can view debug windows to review the current state of variables. Breakpoints can also be set within the VSTA environment when we use script tasks. In this module we'll take a look at how we use the debugging tools when working with script tasks in the control flow. Now as we'll see shortly, debugging is fairly straightforward in the control flow, but we cannot use the same techniques within the data flow. Even though we're doing the same kind of scripting when we add a script component to the data flow, breakpoints will not work in script components. But we have workarounds for this limitation that we'll review in this module. And if you have viewed the other modules in this course, then you've already seen the data viewer. The data viewer effectively becomes a breakpoint for us in the data flow, and we'll spend a bit more time exploring it in this module.
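
One such workaround, sketched below rather than taken from the course demo, is to have the script component fire information events so that its messages appear in the Progress tab or in the package log; the Input0Buffer class and the CustomerKey column are assumptions about the component's input.

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        // Breakpoints are ignored in script components, so raise an event instead.
        bool fireAgain = true;
        ComponentMetaData.FireInformation(0, "Script Component",
            "Row received with CustomerKey = " + Row.CustomerKey.ToString(),
            string.Empty, 0, ref fireAgain);
    }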

Package Reliability
Hello, I'm Stacia Misner, and I'll be presenting Package Reliability. Our Integration Services packages need to run reliably every time. In this module we'll learn how we can monitor package activity so that if things do go wrong we have a record of what happened. And we'll learn how we can enable features in our packages to respond to certain conditions or to restart a package from the first failed task rather than from the very beginning. Before we get into package restartability, though, we'll start by reviewing the steps necessary to enable package logging so that we can get some visibility into what happened during package execution. Now sometimes errors occur in the data flow, and that can bring the data flow execution to a halt, but we can use error handling in the data flow to separate data that's causing a problem from the good data. In earlier modules we saw that we can fire events in our scripts, but other package activities can also fire events, such as warnings or errors. In this module we'll see how we can configure tasks to do something in response to an event. Now a big potential problem we have when packages don't run correctly is maintaining consistency in the data. To help manage data consistency, we can enable transaction support in a package. And if a package fails, we have to restart it, and by default it starts at the beginning. That can be a painful thing to have to do if the initial tasks in the package take a long time to execute, but it was the last task in the package that failed. If we design the package in such a way that the results of the initial tasks are stored in a staging area, then we can take advantage of checkpoints to restart the package from the point of failure and skip right over everything that's already finished.
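
The checkpoint and transaction behavior described here is driven by a handful of package and task properties; the following is a minimal sketch of those properties set through the object model, with an assumed checkpoint file path, whereas in BIDS they are set in the property windows.

    using Microsoft.SqlServer.Dts.Runtime;

    class ReliabilitySketch
    {
        static void Main()
        {
            Package package = new Package();

            // Checkpoints record completed tasks so a rerun can start at the point of failure.
            package.SaveCheckpoints = true;
            package.CheckpointFileName = @"C:\SSIS\Checkpoints\LoadDW.chk"; // assumed path
            package.CheckpointUsage = DTSCheckpointUsage.IfExists;

            // Transaction support helps keep the data consistent if something fails midway.
            package.TransactionOption = DTSTransactionOption.Required;

            Executable exec = package.Executables.Add("STOCK:SQLTask");
            TaskHost taskHost = (TaskHost)exec;

            // A task must fail the package for the checkpoint file to record the failure.
            taskHost.FailPackageOnFailure = true;
        }
    }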

Package Execution
Hello, I'm Stacia Misner, and I'll be presenting Package Execution. After we've created packages, we don't want to have to open BIDS every time we want to execute them. Instead we want to automate package execution. In this module we'll learn about the different options we have for executing packages and the considerations for deciding where to store and execute our packages. We'll start the module by reviewing the utilities available for package execution. We have one utility that provides a graphical interface, making it easy to use, and it generates the arguments that we need to use with the command line utility, which is useful when we want to incorporate package execution into batch jobs. A very common way to execute packages is to create a SQL Server Agent job so that we can run the packages on a scheduled basis. As we'll see, the interface for setting up the job is very similar to the package execution graphical interface, which makes it very easy to set up. And last, we'll look at how the package storage location and the package execution location impact server resources.
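
The course focuses on the graphical and command line utilities and on SQL Server Agent; as a related sketch, a package saved to the file system can also be loaded and executed from .NET code through the runtime API, with the file path below being an assumption for illustration.

    using System;
    using Microsoft.SqlServer.Dts.Runtime;

    class ExecutionSketch
    {
        static void Main()
        {
            Application application = new Application();

            // Load a package from the file system; packages can also be stored
            // in the msdb database or the SSIS package store.
            Package package = application.LoadPackage(@"C:\SSIS\LoadDW.dtsx", null);

            // Execute the package and report the outcome.
            DTSExecResult result = package.Execute();
            Console.WriteLine("Package result: " + result);
        }
    }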