Data Warehousing SQL Server 2008 R2, Denali
Delta Sport
Delta Sport is the exclusive distributor of Nike and the leading sports retail chain in the region. It is a franchise partner of the Dutch fashion brand Mexx. Delta Sport also represents the Canadian brand Aldo and the Italian brand Yamamay. A franchise agreement has been signed with Costa Coffee, one of the world's leading coffee shop chains, to expand its network across the Balkans and the former Yugoslavia.
Branimir Momčilović, Tech Leader, Delta Sport Group, Branimir.Momcilovic@DeltaSport.com, http://bug.rs
Agenda
- Introduction: Motivation & Goals
- Dimensional Modeling: Facts, Dimensions; Retail Sales, Inventory
- Tools: SQL Server 2008 R2 and services; SQL Server codename Denali
Vocabulary
- Business Intelligence (BI)
- Data Warehouse (DW)
- Data Mart (DM)
- Extract, Transform and Load (ETL)
Motivation & Goals One accurate measurement is worth more than a thousand expert opinions. Admiral Grace Hopper
Why do we need BI?
- We have mountains of data in this company, but we can't access it.
- We need to slice and dice the data every which way.
- You've got to make it easy for business people to get at the data directly.
- Just show me what is important.
- It drives me crazy to have two people present the same business metrics at a meeting, but with different numbers.
- We want people to use information to support more fact-based decision making.
Goal
- DATA: words and numbers without relationships. Example: 90 town fell dog 20 cm Tuesday rain 10 min
- INFORMATION: words and numbers with relationships. Example: On Tuesday 20 cm of rain fell in 10 min.
- KNOWLEDGE: comprises inferences derived from information. Example: Rainfall of such magnitude is likely to cause flooding and landslides.
Transaction System vs OLAP

Source of data
  OLTP (Operational System): Operational data; OLTPs are the original source of the data.
  OLAP (Data Warehouse): Consolidation data; OLAP data comes from the various OLTP databases.
Purpose of data
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support.
What the data reveals
  OLTP: A snapshot of ongoing business processes.
  OLAP: Multi-dimensional views of various kinds of business activities.
Inserts and updates
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.
Queries
  OLTP: Relatively standardized and simple queries returning relatively few records.
  OLAP: Often complex queries involving aggregations.
Processing speed
  OLTP: Typically very fast.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours.
Space requirements
  OLTP: Can be relatively small if historical data is archived.
  OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.
Database design
  OLTP: Highly normalized with many tables.
  OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas.
Backup and recovery
  OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.

source: www.rainmakerworks.com
Data Warehouse Components (diagram)
Source: Retail Cash Register Replica, Dynamics AX 2009, Other Data
  -> ETL ->
Data Base: Facts, Dimensions, Other Data
  -> PROCESS ->
Cubes (UDM), Data Mining
  -> DATA ->
Excel, Reports
Dimensional Modeling Walking on water and developing software from a specification are easy if both are frozen. Edward V Berard
Dimensional Modeling
Vocabulary: Fact Table, Dimension Tables
Properties: Simplicity, Symmetry, Extensible
Star schema (diagram):
- Retail Sales Facts: PK Id; FK Date Key, FK Time Key, FK Store Key, FK Product Key; Facts...
- Date Dimension: PK Date Key; Date Attributes...
- Time Dimension: PK Time Key; Time Attributes...
- Store Dimension: PK Store Key; Store Attributes...
- Product Dimension: PK Product Key; Product Attributes...
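The star schema on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names, the sample rows, and the amounts are hypothetical, and the Time dimension is left out to keep the sketch short. It shows a fact table whose foreign keys point at the dimension tables, and a query that enters through a dimension attribute.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);
        CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name    TEXT);
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand         TEXT);
        -- The fact table holds only keys and numeric facts.
        CREATE TABLE retail_sales_fact (
            id           INTEGER PRIMARY KEY,
            date_key     INTEGER NOT NULL REFERENCES date_dim(date_key),
            store_key    INTEGER NOT NULL REFERENCES store_dim(store_key),
            product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
            quantity     INTEGER NOT NULL,
            sales_amount REAL    NOT NULL
        );
    """)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                     [(1, "2010-01-15"), (2, "2010-02-15")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?)",
                     [(1, "Store A"), (2, "Store B")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [(1, "Nike"), (2, "Aldo")])
    conn.executemany("INSERT INTO retail_sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 2, 100.0),
                      (2, 2, 1, 1, 1, 50.0),
                      (3, 1, 2, 2, 1, 80.0)])
    return conn


def sales_by_brand(conn):
    """Join the fact table to one dimension and total the additive fact."""
    rows = conn.execute("""
        SELECT p.brand, SUM(f.sales_amount)
        FROM retail_sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        GROUP BY p.brand
        ORDER BY p.brand
    """).fetchall()
    return dict(rows)
```

Grouping by a store or date attribute instead would follow the same pattern: join the fact table to that dimension and group by the attribute.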
Dimensions
- Dimension tables are the entry points into the fact table.
- Data warehouses always need an explicit date dimension table.
- It is not uncommon to represent multiple hierarchies in a dimension table.
- You must avoid null keys in the fact table.
- Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys.
- A very large number of dimensions typically is a sign that several dimensions are not completely independent and should be combined into a single dimension.
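Two of the rules above, meaningless integer surrogate keys and no null keys in the fact table, can be sketched together. The class name, the store codes, and the choice of 0 as a reserved "unknown" key are assumptions made for this illustration, not something the deck prescribes.

```python
UNKNOWN_KEY = 0  # assumed convention: a reserved dimension row for missing values


class SurrogateKeyMap:
    """Hands out meaningless integer keys for natural (business) keys."""

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def key_for(self, natural_key):
        # A missing natural key maps to the reserved row instead of NULL,
        # so every fact row can still join to the dimension.
        if natural_key is None:
            return UNKNOWN_KEY
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next_key
            self._next_key += 1
        return self._keys[natural_key]
```

During a fact load, each source row would look up its dimension keys through such a map, so a sale with no known store still lands on the "unknown" store row rather than carrying a null key.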
Slowly Changing Dimensions
1. Overwrite the Value
2. Add a Dimension Row
3. Add a Dimension Column
4. Hybrid Techniques...
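The first two techniques can be sketched on an in-memory store dimension. The row layout (valid_from and valid_to columns), the function names, and the sample values used with them are assumptions for illustration only; the third technique (a "previous value" column) and hybrids are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreRow:
    store_key: int                   # surrogate key
    store_id: str                    # natural (business) key
    city: str
    valid_from: str
    valid_to: Optional[str] = None   # None marks the current row


def apply_type1(rows: List[StoreRow], store_id: str, new_city: str) -> None:
    """Technique 1: overwrite the value in place; the old value is lost."""
    for row in rows:
        if row.store_id == store_id:
            row.city = new_city


def apply_type2(rows: List[StoreRow], store_id: str, new_city: str,
                change_date: str) -> None:
    """Technique 2: expire the current row and add a new row with a new key."""
    next_key = max((r.store_key for r in rows), default=0) + 1
    for row in rows:
        if row.store_id == store_id and row.valid_to is None:
            row.valid_to = change_date
    rows.append(StoreRow(next_key, store_id, new_city, change_date))
```

With technique 2, fact rows loaded before the change keep pointing at the old surrogate key, so history is reported against the attribute values that were valid at the time.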
Dimensional Modeling Myths
- Dimensional models and data marts are for summary data only.
- Dimensional models and data marts are departmental, not enterprise, solutions.
- Dimensional models and data marts are not scalable.
- Dimensional models and data marts are only appropriate when there is a predictable usage pattern.
Common Pitfalls to Avoid
- Tackle a galactic multiyear project rather than pursuing more manageable, while still compelling, iterative development efforts.
- Load only summarized data into the presentation area's dimensional structures.
- Presume that the business, its requirements and analytics, and the underlying data and the supporting technology are static.
- If the users haven't accepted the data warehouse as a foundation for improved decision making, then your efforts have been exercises in futility.
Four-Step Dimensional Design Process
1. Select the business process to model
2. Declare the grain of the business process
3. Choose the dimensions that apply to each fact table row
4. Identify the numeric facts that will populate each fact table row
Retail Sales
- The first dimensional model built should be the one with the most impact; it should answer the most pressing business questions and be readily accessible for data extraction.
- Preferably you should develop dimensional models for the most atomic information captured by a business process.
- A careful grain statement determines the primary dimensionality of the fact table.
- Percentages and ratios, such as gross margin, are nonadditive. The numerator and denominator should be stored in the fact table.
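A small worked example shows why a ratio such as gross margin cannot simply be added or averaged across fact rows, and why storing its components works. The column names and the two sample rows are made up for illustration.

```python
def gross_margin_pct(sales, cost):
    """Gross margin as a fraction of sales."""
    return (sales - cost) / sales


def margin_from_totals(rows):
    """Sum the additive components first, then compute the ratio once."""
    total_sales = sum(r["sales_amount"] for r in rows)
    total_cost = sum(r["cost_amount"] for r in rows)
    return gross_margin_pct(total_sales, total_cost)


def average_of_row_margins(rows):
    """Averaging per-row ratios ignores how large each sale was."""
    margins = [gross_margin_pct(r["sales_amount"], r["cost_amount"]) for r in rows]
    return sum(margins) / len(margins)


# Hypothetical fact rows: a small high-margin sale and a large low-margin sale.
fact_rows = [
    {"sales_amount": 100, "cost_amount": 50},
    {"sales_amount": 300, "cost_amount": 240},
]
```

For these two rows the margin computed from totals is 110 / 400 = 27.5%, while the average of the row-level margins (50% and 20%) is 35%. Only the first figure is the true margin of the combined sales, which is why the fact table keeps the sales and cost amounts rather than the percentage.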
Retail Dimensions: Date, Time, Item, Size, Season, Company, Location, Transaction Type, Customer, Vendor
Inventory
- Inventory Periodic Snapshot
- Inventory Transactions
- Inventory Accumulating Snapshot
Bus Architecture
Business processes: Purchase Orders, Store Inventory, Store Sales
Common dimensions: Item, Date, Vendor, Promotion, Store
Dell's Inventory Turnover

Year   Inventory Turnover   Weeks of Inventory
1992    4.79                10.856
1993    5.16                10.078
1994    9.4                  5.532
1995    9.8                  5.306
1996   24.2                  2.149
1997   41.7                  1.247
1998   52.4                  0.992
1999   52.4                  0.992
2000   51.4                  1.012
2001   63.5                  0.819

source: http://www.themanufacturer.com
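The two measure columns in the table appear to be related by weeks of inventory = 52 / inventory turnover. That relationship is inferred from the figures (it reproduces every row to three decimals) and is not stated on the slide. A minimal sketch, with the function and variable names chosen here for illustration:

```python
WEEKS_PER_YEAR = 52

# Inventory turnover per year, copied from the table above.
turnover_by_year = {
    1992: 4.79, 1993: 5.16, 1994: 9.4, 1995: 9.8, 1996: 24.2,
    1997: 41.7, 1998: 52.4, 1999: 52.4, 2000: 51.4, 2001: 63.5,
}


def weeks_of_inventory(turnover):
    """How many weeks of stock are on hand at a given annual turnover."""
    return WEEKS_PER_YEAR / turnover


def weeks_table(turnovers):
    """Return {year: weeks of inventory rounded to three decimals}."""
    return {year: round(weeks_of_inventory(t), 3) for year, t in turnovers.items()}
```

Read this way, the table says stock on hand fell from almost eleven weeks in 1992 to under one week by 2001.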
Tools I think it's fair to say that personal computers have become the most empowering tool we've ever created. They're tools of communication, they're tools of creativity, and they can be shaped by their user. Bill Gates
Microsoft SQL Server
- Database Engine
- Replication Services
- Integration Services
- Analysis Services
- Reporting Services
Unified Dimensional Model (cube diagram)
- Items: Aldo, Costa Coffee, Nike
- Dates: 2010, Q1, Jan, Feb, Mar
- Measures: cell values shown in the diagram (25, 6, 2, 3, 1)
SQL Server 2008 R2
- Data compression
- Backup compression
- Star join query optimizations
- Partitioned table parallelism
- Change data capture
- MERGE SQL statements
- Scalable Integration Services
- Resource management
- Grouping sets
Excel
SQL Server 11 "Denali"
- Project codename Juneau: a single development environment for developing database, business intelligence (BI) and web solutions
- A new Business Intelligence Semantic Model (BISM) in Analysis Services
- Project codename Apollo: new column-store database technology aiming to provide greater query performance
- Project codename Crescent: data visualization and presentation solution
- SQL Server Data Quality Services (based on technology from Microsoft's 2008 Zoomix acquisition)
- SQL Server AlwaysOn
- Other data integration and management tools
Juneau
A single development environment for all DB-related project types, including bringing BIDS and SSMS into the same IDE. It uses the new WPF-based shell.
Apollo
Column-store indexes significantly boost query performance, by up to 100x for star join and similar queries.
Diagram: a row store (heap or B-tree) keeps whole rows together on each page; a column store keeps the values of each column (C1, C2, ...) on separate pages.
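The difference between the two layouts can be pictured with plain Python lists. This is a toy model of the idea only, not a description of how SQL Server stores or scans column-store indexes; the column names and values are made up.

```python
# Row store: each element holds one complete row.
row_store = [
    {"store_key": 1, "product_key": 10, "sales_amount": 100},
    {"store_key": 1, "product_key": 11, "sales_amount": 50},
    {"store_key": 2, "product_key": 10, "sales_amount": 80},
]


def to_column_store(rows):
    """Pivot rows into one list per column, keeping row order."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, column):
    """Has to visit every row, including the columns it does not need."""
    return sum(row[column] for row in rows)


def sum_column_store(columns, column):
    """Reads only the one column the aggregate asks for."""
    return sum(columns[column])
```

A star join query typically aggregates a few fact columns over many rows, which is the access pattern the column layout favors: the unused columns are never read.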
Integration Services in Denali
- Release of a new deployment model
- Object Impact and Data Lineage Analysis
- Usability Enhancements
- Reduced Memory Usage by the Merge and Merge Join Transformations
- SSIS: New Data Correction Component
Denali CTP Is Coming Soon The next CTP for SQL Server Code Name "Denali" is coming soon. Sign up to be notified of the next CTP release. http://www.sqlserverlaunch.com/
Resources
- SQL Server Denali Resource Center: http://msdn.microsoft.com/en-us/sqlserver/denali_resource_center.aspx
- Microsoft Business Intelligence: http://www.microsoft.com/bi/
- Ralph Kimball: http://www.ralphkimball.com/
- Bill Inmon: http://inmoninstitute.com/
Q&A A prudent question is one-half of wisdom. Francis Bacon
Session Evaluations Tell us what you think
Intelligence is quickness in seeing things as they are. George Santayana
© 2010 Microsoft Corporation. All rights reserved.