David Dye. Extract, Transform, Load




Extract, Transform, Load
- Overview
- SQL Tools
- Load Considerations

Introduction
- David Dye
- derekman1@msn.com
- HTTP://WWW.SQLSAFETY.COM

Overview

ETL Overview
- Extract
  - Define the source of the data
  - Can be SQL Server, Oracle, Excel, XML, or text files
- Transform
  - Ensure the data is accurate and consistent
  - Apply business logic
  - Aggregate
- Load
  - Define the destination
  - Load the transformed data

SQL Tools

- T-SQL
- MERGE
- Bulk copy program (bcp)
- BULK INSERT
- OPENROWSET(BULK)

T-SQL
- Data Manipulation Language (DML)
- Can be used to
  - INSERT
  - UPDATE
  - DELETE
- T-SQL code can be compartmentalized using
  - Views
  - Stored procedures
  - Functions

MERGE
- Available in all editions of SQL Server beginning with SQL Server 2008
- A single T-SQL statement that can INSERT, UPDATE, and DELETE
- Syntax
  - MERGE: names the target table that rows will be inserted into, updated in, or deleted from
  - USING: names the source table that is joined back to the target
  - WHEN MATCHED: specifies the action to take when a source row and a target row satisfy the join predicate
  - WHEN NOT MATCHED: specifies the action to take when the join predicate is not satisfied
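A minimal sketch of the MERGE syntax, assuming a hypothetical target table dbo.Customer and a hypothetical source table staging.Customer that share the key CustomerID. The table and column names are illustrative and do not come from the presentation.

```sql
-- Hypothetical tables: dbo.Customer is the target, staging.Customer is the source.
-- Name and City are assumed to be NOT NULL, so <> is enough to detect changes.
MERGE dbo.Customer AS tgt
USING staging.Customer AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
    -- Row exists in both tables and has changed: update the target.
    UPDATE SET tgt.Name = src.Name,
               tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
    -- Row exists only in the source: insert it.
    INSERT (CustomerID, Name, City)
    VALUES (src.CustomerID, src.Name, src.City)
WHEN NOT MATCHED BY SOURCE THEN
    -- Row exists only in the target: remove it.
    DELETE
-- Report what happened to each row; a MERGE statement must end with a semicolon.
OUTPUT $action, inserted.CustomerID, deleted.CustomerID;
```

The WHEN NOT MATCHED BY SOURCE branch is optional; leave it out if rows missing from the source should stay in the target.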

When to Use MERGE
- MERGE treats all of its INSERTs, UPDATEs, and DELETEs as a single transaction, which is often more efficient
- This is a general statement, not a guarantee
- Validate that MERGE is more efficient than separate INSERT, UPDATE, and DELETE statements

bcp Utility
- Command-line tool to import data into or export data from SQL Server
- Import can be done from a user-specified file format
- bcp does not include any information about the data
  - Table structure
  - Data types
  - Constraints
- A format file is used to hold this metadata
- Optionally supports a format file
  - Can be used to ease importing or exporting
  - The -f switch specifies that a format file is used
  - bcp can optionally create a format file
  - When -f is used with in or out, the format file must already exist

BULK INSERT
- Options similar to bcp, but implemented as T-SQL
- Runs in the SQL Server process
- Can be executed within
  - Stored procedures
  - User-defined transactions
- Supports
  - CHECK_CONSTRAINTS
  - FIRE_TRIGGERS
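A minimal sketch of BULK INSERT inside a user-defined transaction. The table staging.Customer, the file path, and the terminators are assumptions chosen for illustration. Because the statement runs in the SQL Server process, the path is resolved on the server.

```sql
-- Hypothetical target table and file; adjust the terminators to match the file.
BEGIN TRANSACTION;

BULK INSERT staging.Customer
FROM 'C:\etl\customer.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,       -- skip a header row
    CHECK_CONSTRAINTS,         -- validate CHECK and FOREIGN KEY constraints during the load
    FIRE_TRIGGERS              -- execute INSERT triggers defined on the target
);

COMMIT TRANSACTION;
```

Without CHECK_CONSTRAINTS and FIRE_TRIGGERS, a bulk load skips CHECK and FOREIGN KEY validation and does not fire insert triggers.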

OPENROWSET Function
- Allows access to remote data sources through an OLE DB provider
  - Disabled by default
- Offers a BULK provider for imports from files
- Implemented as T-SQL and used in the FROM clause
- Supports special table hints
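A minimal sketch of the BULK provider used in a FROM clause. The table, data file, and format file are hypothetical; the format file is assumed to describe three columns named CustomerID, Name, and City.

```sql
-- Hypothetical paths and table; the format file supplies the column names and types.
INSERT INTO staging.Customer WITH (TABLOCK) (CustomerID, Name, City)
SELECT src.CustomerID, src.Name, src.City
FROM OPENROWSET(
         BULK 'C:\etl\customer.dat',
         FORMATFILE = 'C:\etl\customer.fmt'
     ) AS src;   -- OPENROWSET(BULK ...) requires a correlation name
```

Because the file is exposed as a rowset, the SELECT can filter, join, or transform rows before they reach the target. TABLOCK here is one of the table hints that can be placed on the INSERT target.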

Demonstration
- BCP
- BULK INSERT
- OPENROWSET

Load Considerations

Check Constraints
- Business logic can be incorporated in the transformation to ensure the data satisfies the constraint logic
- Constraints can be disabled during the load and re-enabled afterward
- By default, existing values are not validated when the constraint is re-enabled
  - To validate existing values, use WITH CHECK CHECK
- Once enabled, with or without WITH CHECK CHECK, the constraint validates all incoming values
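A minimal sketch of disabling and re-enabling a check constraint around a load, assuming a hypothetical table dbo.Customer with a constraint named CK_Customer_City:

```sql
-- Disable the (hypothetical) constraint before the load.
ALTER TABLE dbo.Customer NOCHECK CONSTRAINT CK_Customer_City;

-- ... load data here ...

-- Re-enable AND validate existing rows. The first CHECK belongs to WITH CHECK,
-- the second to CHECK CONSTRAINT. Fails if any existing row violates the constraint.
ALTER TABLE dbo.Customer WITH CHECK CHECK CONSTRAINT CK_Customer_City;

-- is_not_trusted = 1 means the constraint was re-enabled without validating existing rows.
SELECT name, is_disabled, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.Customer');
```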

Foreign Keys
- As with check constraints, referential integrity can be verified during the transformation
- Foreign keys can be disabled during the load and re-enabled after the load
- As with check constraints, existing values are not validated when the key is re-enabled
  - Validating them requires WITH CHECK CHECK
- Once enabled, all incoming values are validated for referential integrity
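A minimal sketch of the same pattern for a foreign key, assuming hypothetical tables dbo.SalesOrder (child) and dbo.Customer (parent) linked by a constraint named FK_SalesOrder_Customer. The orphan query is an optional pre-check added for illustration.

```sql
-- Disable the (hypothetical) foreign key before the load.
ALTER TABLE dbo.SalesOrder NOCHECK CONSTRAINT FK_SalesOrder_Customer;

-- ... load data here ...

-- Optional pre-check: list child rows whose parent is missing.
SELECT o.OrderID, o.CustomerID
FROM dbo.SalesOrder AS o
LEFT JOIN dbo.Customer AS c
    ON c.CustomerID = o.CustomerID
WHERE o.CustomerID IS NOT NULL
  AND c.CustomerID IS NULL;

-- Re-enable and validate existing rows; fails if orphans remain.
ALTER TABLE dbo.SalesOrder WITH CHECK CHECK CONSTRAINT FK_SalesOrder_Customer;
```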

Primary Keys
- Primary keys can be disabled
- Re-enabling a primary key requires rebuilding its index
- During the index rebuild ALL values will be validated
- Disabling a primary key will disable all foreign keys that reference the primary key
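A minimal sketch of disabling and rebuilding a primary key, assuming a hypothetical nonclustered primary key PK_Customer on dbo.Customer that is referenced by FK_SalesOrder_Customer. If the primary key were clustered, disabling it would make the table's data inaccessible until the rebuild, so this pattern is only shown for the nonclustered case.

```sql
-- Disable the index behind the (hypothetical, nonclustered) primary key.
-- Foreign keys that reference it are disabled as a side effect.
ALTER INDEX PK_Customer ON dbo.Customer DISABLE;

-- ... load data here ...

-- Rebuilding re-enables the primary key and validates ALL values;
-- it fails if the load introduced duplicate keys.
ALTER INDEX PK_Customer ON dbo.Customer REBUILD;

-- The referencing foreign keys stay disabled until re-enabled explicitly.
ALTER TABLE dbo.SalesOrder WITH CHECK CHECK CONSTRAINT FK_SalesOrder_Customer;
```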

Unique Constraints
- Unique constraints can be disabled
- Re-enabling a unique constraint requires rebuilding its index
- During the index rebuild ALL values will be validated
- Disabling a unique constraint will disable all foreign keys that reference the unique constraint

Indexes
- Both clustered and nonclustered indexes are maintained as part of each transaction
  - As rows are inserted, updated, and deleted, the index(es) must be updated if the key column(s) are affected
- Disabling the index(es) can speed up the load and reduce logging
- Enabling an index requires rebuilding it
  - By default the index is unavailable while it is being rebuilt
  - Enterprise edition can rebuild online
  - Uses tempdb
- Once disabled, the index(es) will not be available
  - This can dramatically affect query performance
  - Often offset by the resources saved with the indexes disabled
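A minimal sketch of disabling a nonclustered index for a load and rebuilding it afterward, assuming a hypothetical index IX_Customer_City on dbo.Customer:

```sql
-- Disable the (hypothetical) nonclustered index; queries can no longer use it.
ALTER INDEX IX_Customer_City ON dbo.Customer DISABLE;

-- ... load data here ...

-- Rebuild to re-enable. By default the index is unavailable during the rebuild.
ALTER INDEX IX_Customer_City ON dbo.Customer REBUILD;

-- Alternative on Enterprise edition: rebuild online and sort in tempdb.
-- ALTER INDEX IX_Customer_City ON dbo.Customer
--     REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);

-- Confirm that no index on the table is still disabled.
SELECT name, type_desc, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Customer');
```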

Indexes and Constraints
- Constraints
  - Foreign keys and check constraints are both constraints
  - By default an index is not created
  - MUST USE WITH CHECK CHECK to validate that existing values meet the constraint
- Indexes
  - Primary key
    - Enforces uniqueness for all values
    - Does not accept any NULL values
  - Unique constraint/index
    - SAME THING: a unique constraint is implemented as an index
    - Requires that all values be unique
    - Allows a single NULL value (ANSI allows multiple NULL values)

Locking
- Locking occurs automatically in SQL Server to ensure the ACID properties of a database
  - A: Atomicity. Each transaction is all or nothing. If one part of the transaction fails, the entire transaction fails
  - C: Consistency. Any transaction will bring the database from one valid state to another
  - I: Isolation. Ensures that concurrent execution of transactions results in the system state that would be obtained if the transactions were executed one after the other
  - D: Durability. Once a transaction is committed it will remain committed, even after a system failure
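A minimal sketch of atomicity in a load step, assuming hypothetical tables dbo.Customer and staging.Customer. Either both statements take effect or neither does. THROW requires SQL Server 2012 or later; on earlier versions RAISERROR would be used instead.

```sql
SET XACT_ABORT ON;  -- a run-time error dooms the whole transaction

BEGIN TRY
    BEGIN TRANSACTION;

    -- Step 1: copy staged rows into the (hypothetical) target.
    INSERT INTO dbo.Customer (CustomerID, Name, City)
    SELECT CustomerID, Name, City
    FROM staging.Customer;

    -- Step 2: clear the staging table.
    DELETE FROM staging.Customer;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- If either step failed, undo everything done so far.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;  -- re-raise the original error to the caller
END CATCH;
```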

Minimizing Locking
- SQL Server manages locks using lock escalation
  - Lower-level locks are generated first (page, index range, etc.)
  - Every lock requires resources, but lower-level locks increase concurrency
- Lock escalation trades many fine-grained locks for fewer coarse-grained locks
  - This reduces the resources required, but reduces concurrency
  - Ex. trading many page locks for a table lock
- The import process could use TABLOCK, which will speed up the load
  - Although the load will be faster, there is reduced concurrency
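A minimal sketch of requesting a table-level lock up front instead of waiting for escalation, assuming hypothetical tables dbo.Customer and staging.Customer and a hypothetical file path:

```sql
-- INSERT ... SELECT with a table lock on the (hypothetical) target.
INSERT INTO dbo.Customer WITH (TABLOCK) (CustomerID, Name, City)
SELECT CustomerID, Name, City
FROM staging.Customer;

-- The same hint as a BULK INSERT option.
BULK INSERT staging.Customer
FROM 'C:\etl\customer.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);
```

Other sessions are blocked from the target while the table lock is held, so this suits load windows in which the table is not being queried.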

Minimizing Logging
- Logging can be minimized by changing the recovery model
- Reduced logging speeds up the load and reduces disk I/O during the load
- The BULK_LOGGED recovery model provides minimal logging for bulk operations
- The SIMPLE recovery model ensures that the transaction log is truncated after checkpoints occur
- Changing the recovery model to SIMPLE removes the ability to restore to a point in time
- BULK_LOGGED can prevent point-in-time restore of log backups that contain bulk operations
- For a data warehouse you can quite often completely reload the database with the existing ETL solution
  - Can be considered, but is prohibitive for a VLDB
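A minimal sketch of switching the recovery model around a load, assuming a hypothetical database named SalesDW that normally runs in FULL recovery and a hypothetical backup path:

```sql
-- Switch to minimal logging for bulk operations before the load.
ALTER DATABASE SalesDW SET RECOVERY BULK_LOGGED;

-- ... run the bulk load here ...

-- Switch back to FULL recovery.
ALTER DATABASE SalesDW SET RECOVERY FULL;

-- Back up the log so point-in-time restore is available again from here on.
BACKUP LOG SalesDW TO DISK = N'C:\backup\SalesDW_afterload.trn';

-- Confirm the current recovery model.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'SalesDW';
```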

Demonstration
- Check constraints
- Foreign keys
- Disabling and re-enabling check constraints and foreign keys
- Unique constraints (indexes)
- Primary keys