SQL Server Integration Services Design Patterns
Second Edition
Andy Leonard, Tim Mitchell, Matt Masson, Jessica Moss, Michelle Ufford
Apress
Contents

First-Edition Foreword xv
About the Authors xvii
About the Technical Reviewer xix

Chapter 1: Metadata Collection 1
About SQL Server Data Tools 1
A Peek at the Final Product 1
SQL Server Metadata Catalog 3
sys.dm_os_performance_counters 3
sys.dm_db_index_usage_stats 3
sys.dm_os_sys_info 3
sys.tables 3
sys.indexes 3
sys.partitions 4
sys.allocation_units 4
Setting Up the Central Repository 4
The Iterative Framework 6
Metadata Collection 14
Summary 26

Chapter 2: Execution Patterns 27
Building the Demonstration SSIS Package 27
Debug Execution 28
Command-Line Execution 29
Execute Package Utility 30
The SQL Server 2014 Integration Services Service 30
Integration Services Catalogs 30
Integration Server Catalog Stored Procedures 31
Scheduling SSIS Package Execution 53
Scheduling an SSIS Package 53
Scheduling a File System Package 54
Running SQL Server Agent Jobs with the Custom Execution Framework 55
Running the Custom Execution Framework with SQL Server Agent 56
Execute Package Task 57
Execution from Managed Code 58
The Demo Application 58
The frmMain Form 59
Conclusion 70

Chapter 3: Scripting Patterns 71
The Toolset 71
Should I Use Script? 72
The Script Editor 72
The Script Task 75
The Script Component 77
Script Maintenance Patterns 78
Code Reuse 78
Source Control 79
Scripting Design Patterns 79
Connection Managers and Scripting 80
Variables 82
Naming Patterns 85
Conclusion 85

Chapter 4: SQL Server Source Patterns 87
Setting Up a Source 87
Selecting a SQL Server Connection Manager and Provider 88
ADO.NET 89
ODBC 89
OLE DB 91
Creating a SQL Server Source Component 92
Writing a SQL Server Source Component Query 95
ADO.NET Data Access 95
OLE DB Data Access 96
Waste Not, Want Not 97
Data Translations 97
Source Assistant 97
Summary 99

Chapter 5: Data Correction with Data Quality Services 101
Overview of Data Quality Services 101
Using the Data Quality Client 102
Using DQS with SSIS 108
DQS Cleansing Transform 108
DQS Extensions on CodePlex 113
Cleansing Data in the Data Flow 114
Handling the Output of the DQS Cleansing Transform 114
Performance Considerations 117
Approving and Importing Cleansing Rules 121
Conclusion 123

Chapter 6: DB2 Source Patterns 125
DB2 Database Family 125
Selecting a DB2 Provider 126
Find the Database Version 126
Pick Provider Vendor 127
Connecting to a DB2 Database 127
Querying the DB2 Database 130
DB2 Source Component Parameters 131
DB2 Source Component Dynamic Queries 132
Summary 133

Chapter 7: Flat File Source Patterns 135
Flat File Sources 135
Moving to SSIS! 136
Strong-Typing the Data 138
Introducing a Data-Staging Pattern 140
Variable-Length Rows 143
Reading into a Data Flow 144
Splitting Record Types 145
Terminating the Streams 146
Header and Footer Rows 147
Consuming a Footer Row 148
Consuming a Header Row 150
Producing a Footer Row 152
Producing a Header Row 159
The Archive File Pattern 163
Summary 169

Chapter 8: Loading a PDW Region in APS 171
Massively Parallel Processing 171
APS Appliance Overview 172
Hardware Architecture 172
Software Architecture 173
Shared-Nothing Architecture 175
Clustered Columnstore Indexes 175
Loading Data 176
DWLoader vs. Integration Services 176
ETL vs. ELT 177
Data Import Pattern for PDW 178
Prerequisites 178
Preparing the Data 179
Package Overview 181
The Data Source 181
The Data Transformation 183
The Data Destination 184
Multithreading 189
Limitations 190
Summary 191

Chapter 9: XML Patterns 193
Using the XML Source 193
Dealing with Multiple Outputs 194
Making Things Easier with XSLT 200
Using a Script Component 203
Configuring the Script Component 203
Processing XML with XmlSerializer 209
Processing XML with XmlReader and LINQ to XML 210
Conclusion 212

Chapter 10: Expression Language Patterns 213
Getting to Know the Expression Language 213
What Is the Expression Language? 213
Why Use Expressions? 214
Language Essentials 215
Limitations 215
Putting the Expression Language to Work 216
Package Expressions 216
Variable Expressions 217
Connection Managers 217
Project-Level Connection Managers 219
Control Flow 219
Data Flow Expressions 222
Conclusion 226

Chapter 11: Data Warehouse Patterns 227
Incremental Loads 227
What Is an Incremental Load? 227
Why Incremental Loads? 228
The Slowly Changing Dimension 228
Incremental Loads of Fact Data 228
Incremental Loads in SSIS 228
Native SSIS Components 229
The Slowly Changing Dimension Wizard 232
The MERGE Statement 234
Change Data Capture (CDC) 237
Data Errors 242
Simple Errors 242
Missing Data 243
Coding to Allow Errors 246
Data Warehouse ETL Workflow 248
Dividing Up the Work 248
One Package = One Unit of Work 249
Conclusion 250

Chapter 12: OData Source 251
Understanding the OData Protocol 251
Data Type Mappings 252
Query Options 253
Configuring the OData Connection Manager 254
Enabling Microsoft Online Services Authentication 254
Configuring the Source Component 256
Overriding Data Types 259
Conclusion 260

Chapter 13: Slowly Changing Dimensions 261
The Slowly Changing Dimension Transform 261
Running the Wizard 262
Using the Transformations 267
Optimizing Performance 268
Third-Party SCD Components 269
Merge Pattern 270
Handling Type 1 Changes 271
Handling Type 2 Changes 272
Conclusion 272

Chapter 14: Loading the Cloud 275
Interacting with the Cloud 275
Incremental Loads to Azure SQL Database 276
Change Detection 276
New Rows (Only) 276
Building the Cloud Loader 277
Conclusion 280

Chapter 15: Logging and Reporting Patterns 281
Package Logging and Reporting 281
Setting Up Package Logging 281
Reporting on Package Logging 282
Design Pattern: Package Executions 283
Catalog Logging and Reporting 283
Setting Up Catalog Logging 283
Catalog Tables 285
Changing Logging Levels After the Fact 286
Design Patterns 287
Changing the Logging Level 287
Using the Existing Reports 289
Creating New Reports 290
Summary 291

Chapter 16: Parent-Child Patterns 293
Master Package Pattern 293
Assign the Child Package 294
Configure Parameter Binding 295
Dynamic Child Package Pattern 296
Child-to-Parent Variable Pattern 302
Conclusion 303

Chapter 17: Configuration 305
Parameters 305
Configuring Your Package Using Parameters 307
Using the Parametrize Dialog 309
Creating Visual Studio Configurations 310
Specifying Entry-Point Packages 312
Connection Managers 313
Parameter Configuration on the Server 313
Default Configuration 314
Server Environments 315
Default Parameter Values Using T-SQL 317
Package Execution Through the SSIS Catalog 317
Parameters with DTEXEC 320
Projects on the File System 320
Projects in the SSIS Catalog 321
Dynamic Configurations 322
Configuring from a Database Table 323
Setting Values Using a Script Task 326
Dynamic Package Executions 327
Conclusion 329

Chapter 18: Deployment 331
Project Deployment Model 331
SSIS Catalog 332
Deployment Methods 334
Deployment from the Command Line 335
Deployment Using Custom Code 336
Deployment Using PowerShell 337
Deployment Using SQL 338
Package Deployment Model 339
Conclusion 341

Chapter 19: Business Intelligence Markup Language 343
A Brief History of Business Intelligence Markup Language 343
Building Your First Biml File 344
Building a Basic Incremental Load SSIS Package 347
Creating Databases and Tables 347
Adding Metadata 349
Specifying a Data Flow Task 350
Adding Transforms 350
Testing the Biml 356
Using Biml as an SSIS Design Patterns Engine 360
Time for a Test 367
Conclusion 368

Chapter 20: Biml and SSIS Frameworks 369
Using Biml with an SSIS Framework 369
Adding SSIS Package Metadata to the Framework 369
Executing the Biml File 374
Generating the SSIS Command-Line 375
Summarizing 376

Appendix A: Evolution of an SSIS Framework 377
Starting in the Middle 377
Introducing SSIS Applications 387
A Note About Relationships 389
Retrieving SSIS Applications in T-SQL 392
Retrieving SSIS Applications in SSIS 396
Monitoring Execution 399
Building Application Instance Logging 399
Building Package Instance Logging 406
Building Error Logging 410
Reporting Execution Metrics 420
Conclusion 434

Index 435