This course explains how to apply Integration Services features to build packages that support the extract, transform, and load operations of a data warehouse. It covers design patterns for staging data and for loading data into fact and dimension tables. In addition, this course describes how to enhance ETL packages with data cleansing techniques and offers insight into the buffer architecture of the data flow engine to help package developers get the best performance from packages. This course was written for SQL Server 2012 Integration Services, but most principles apply to SQL Server 2005 and later.
Stacia Misner Varga is a consultant, instructor, author, and mentor, a BI Partner for SQLSkills, and principal of Data Inspirations. She has specialized in Microsoft business intelligence technologies for over 10 years and is a frequent speaker at the PASS Summit, IT/Dev Connections conferences, and various SQL Saturday and Microsoft-related events.
Data Warehousing Packages, Part I Hi, I'm Stacia Misner. This is the first module of the Advanced Integration Services course: Data Warehousing Packages, Part 1, Designing Packages for Extract, Transform, and Load Projects. In this module I assume that you already have some experience with Integration Services packages. Now I'm going to show you how to work with packages that perform the extract, transform, and load processing for data warehouse projects. I'll start with the assumption that you may not have a strong background in data warehousing, so I'll begin with a brief introduction to explain some concepts and clarify some terminology. As part of this introduction, I'll also give you a very high-level explanation of dimensional modeling. Then I'll explain how to use the Data Profiling task, which you might find useful for your data warehousing projects before you begin the extract, transform, and load process, also known as ETL. With all of that introduction out of the way, I'll describe the big picture of ETL and show you what needs to happen at a very high level and the types of packages that need to be developed. One type is an extract, or staging, package, and I'll show you the concepts and provide a demonstration. Then in Part 2, we'll continue with load processing.
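To make the idea of profiling concrete, here is a minimal sketch, in plain Python rather than the Data Profiling task itself, of the kinds of column statistics that task produces (null ratio, distinct value count, value lengths). The column names and sample rows are hypothetical.

```python
def profile_column(rows, column):
    """Return simple profile statistics for one column of a row set."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    return {
        "null_ratio": (len(values) - len(non_null)) / len(values),
        "distinct_count": len(set(non_null)),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }

# Hypothetical sample of extracted source rows.
rows = [
    {"CustomerKey": 1, "City": "Las Vegas"},
    {"CustomerKey": 2, "City": None},
    {"CustomerKey": 3, "City": "Reno"},
    {"CustomerKey": 4, "City": "Reno"},
]

print(profile_column(rows, "City"))
# → {'null_ratio': 0.25, 'distinct_count': 2, 'min_length': 4, 'max_length': 9}
```

Statistics like these help you decide, before building the ETL, whether a source column needs cleansing, a default value for missing data, or a wider target column.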
Data Warehousing Packages, Part II Hi, I'm Stacia Misner, and this is Part 2 of Data Warehousing Packages, where we'll continue our discussion of designing packages for extract, transform, and load projects. In this module we expand on what we learned in the previous module about data warehousing and ETL. We'll start with a review of the concept of slowly changing dimensions and the choices we have before continuing on to some concrete examples of applying slowly changing dimensions in the load packages. I'll review the patterns that most dimensions follow, then move on to the load patterns for fact tables, and then I'll wrap up this module with an explanation of how to incorporate Analysis Services into the process. By the time we complete this module, I'll have shown you how everything ties together into a complete ETL solution.
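The distinction between Type 1 (overwrite) and Type 2 (preserve history) slowly changing dimensions can be sketched in a few lines of Python. This is an illustration of the pattern, not the Slowly Changing Dimension transformation itself; the dimension is held as a list of dicts, and the `BusinessKey`, `IsCurrent`, `StartDate`, and `EndDate` column names are assumptions for the example.

```python
from datetime import date

def apply_scd(dim_rows, incoming, type2_cols):
    """Apply one source row to a dimension held as a list of dicts.
    A change in any column listed in type2_cols triggers a Type 2
    response (expire the current row, insert a new one); any other
    change is handled as Type 1 (overwrite in place)."""
    current = next((r for r in dim_rows
                    if r["BusinessKey"] == incoming["BusinessKey"]
                    and r["IsCurrent"]), None)
    if current is None:
        # New member: insert as the current row.
        dim_rows.append({**incoming, "IsCurrent": True,
                         "StartDate": date.today(), "EndDate": None})
    elif any(current[c] != incoming[c] for c in type2_cols):
        # Type 2 change: expire the old row, add a new current row.
        current["IsCurrent"] = False
        current["EndDate"] = date.today()
        dim_rows.append({**incoming, "IsCurrent": True,
                         "StartDate": date.today(), "EndDate": None})
    else:
        # Type 1 change (or no change): overwrite attributes in place.
        current.update(incoming)

dim = []
apply_scd(dim, {"BusinessKey": 101, "Name": "Dana", "City": "Reno"}, ["City"])
apply_scd(dim, {"BusinessKey": 101, "Name": "Dana R.", "City": "Reno"}, ["City"])
apply_scd(dim, {"BusinessKey": 101, "Name": "Dana R.", "City": "Las Vegas"}, ["City"])
# dim now holds two rows for key 101: the expired Reno row and the
# current Las Vegas row.
```

The same expire-and-insert versus overwrite decision is what the load packages in this module implement against real dimension tables.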
Data Cleansing Hi, this is Stacia Misner, and this module is Data Cleansing: Techniques for Cleansing Data in Your ETL Processes. In this module we review the various transformations that we can use in a data flow task to cleanse data. We'll start by reviewing various scenarios that require data cleansing. Then we'll look at specific problems and solutions related to data quality in columns, and we'll do the same for data quality related to records, as well as business rules. We'll even take a quick look at Data Quality Services and the data cleansing options that we have there.
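As a simple illustration of column-level cleansing, here is a hypothetical Python sketch: trim whitespace, normalize casing, supply a default for missing values, and apply a lookup of known corrections, a stand-in for the kind of work a Derived Column or Lookup transformation (or a DQS knowledge base) does in the data flow.

```python
def cleanse_city(value, known_corrections):
    """Column-level cleansing: trim, normalize case, default missing
    values, then apply a dictionary of known corrections."""
    if value is None:
        return "Unknown"  # route missing data to a default member
    v = value.strip().title()
    return known_corrections.get(v, v)

# Hypothetical correction lookup built from previously observed bad values.
corrections = {"Lv": "Las Vegas", "Las Vagas": "Las Vegas"}

print(cleanse_city("  las vagas ", corrections))  # → Las Vegas
print(cleanse_city(None, corrections))            # → Unknown
```

Record-level and business-rule checks follow the same idea at a coarser grain: instead of fixing one column, they accept, correct, or redirect whole rows.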
Package Performance Hi, I'm Stacia Misner, and this module is Package Performance: Tuning Packages for Optimal Performance. The focus of the previous two modules of this course was the development of packages to get specific tasks done, with lots of demonstrations to show these packages in action, but in those demonstrations we were dealing with relatively small datasets, which bear little resemblance to the much larger datasets in production environments. Undoubtedly, situations will arise where we need to do some performance tuning so that packages run efficiently within a specific window of operations when moving data between environments. Before we look at the options we have for improving package performance, we need to delve more deeply into the pipeline buffer architecture of data flow tasks. Specifically, we need to be clear on how Integration Services manages data in memory and how our choice of transformations affects memory and performance. Another aspect of data flow performance is thread management, which requires us to explore two additional concepts: execution trees and execution threads. Once we understand how Integration Services uses memory and CPU threads during execution of the data flow task, we can then understand which actions help us optimize package performance. Last, we'll review our options for monitoring package behavior so we know when it might be appropriate to implement performance optimizations.
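One piece of the buffer architecture can be shown with simple arithmetic: the number of rows the engine places in a pipeline buffer is limited by both the buffer's size in bytes and a row-count cap. The sketch below assumes the SQL Server 2012 defaults of 10 MB for DefaultBufferSize and 10,000 for DefaultBufferMaxRows; it illustrates why narrow rows hit the row cap while wide rows hit the size cap.

```python
def rows_per_buffer(row_width_bytes,
                    default_buffer_size=10 * 1024 * 1024,  # 10 MB default
                    default_buffer_max_rows=10_000):
    """Estimate rows per pipeline buffer: whichever limit is reached
    first (total bytes or row count) wins."""
    return min(default_buffer_max_rows,
               default_buffer_size // row_width_bytes)

# A narrow 100-byte row is capped by DefaultBufferMaxRows...
print(rows_per_buffer(100))    # → 10000
# ...while a wide 5,000-byte row is capped by DefaultBufferSize.
print(rows_per_buffer(5000))   # → 2097
```

This is why two common tuning moves are removing unneeded columns (narrower rows, more rows per buffer) and adjusting the two buffer properties to better fit the actual row width.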