Business Intelligence 10. OLAP, KPI ETL December 2013.
Microsoft Analysis Services
Analysis Services: freely available resources
- Analysis Services Tutorial: http://technet.microsoft.com/en-us/library/ms170208.aspx
- Visual Studio solutions:
  - http://msftdbprodsamples.codeplex.com/releases/view/55330 (2012)
  - http://msftasprodsamples.codeplex.com/releases/view/34035 (2008)
- Videos: http://www.learnmicrosoftbi.com/, especially:
  - SSAS 107 - Creating Dimensions
  - SSAS 108 - Creating Hierarchies
  - SSAS 109 - Attribute Relationships
ZPR FER Zagreb Business Intelligence 2012/2013
Advanced modelling (not obligatory)
- Indirect dimensions:
  - Referenced dimensions (http://technet.microsoft.com/en-us/library/ms175669(sql.90).aspx)
  - Many to many (http://sqlcat.com/technicalnotes/archive/2008/02/11/analysisservices-should-you-use-many-to-many-dimensions.aspx)
- Measures and measure groups:
  - http://www.packtpub.com/article/measures-and-measure-groupsmicrosoft-analysis-services-part1
  - http://www.packtpub.com/article/measures-and-measure-groupsmicrosoft-analysis-services-part2
BIDS / SQL Server Data Tools
- Version 2008 uses the VS Shell from VS2008: BI Dev Studio
- Version 2012 uses the VS Shell from VS2010: SQL Server Data Tools
- New project > Analysis Services Project
- Data Sources > new data source
- Data Source Views > new data source view
Defining OLAP structures
- Define the connection, then define the Data Source View
- Dimension wizard first:
  - Define dimensions
  - Define hierarchies; hide the attributes that are hierarchy members (AttributeHierarchyVisible=false)
  - Define attribute relationships (Attribute Relationships tab). This tells the OLAP system how the attributes relate to each other (otherwise every query would have to be resolved down to the bottom level):
    - Enhances query performance
    - Enhances data storage
    - Reduces processing time
    - "The single most important thing you can do for performance"
    - Defining attribute relationships improves dimension, partition, and query processing performance
- Then, the Cube wizard
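To illustrate why attribute relationships matter, here is a minimal Python sketch. It is not Analysis Services code; the fact rows and helper names are hypothetical. Once day > month > year is known to be a chain of many-to-one relationships, year totals can be derived from the much smaller set of month totals instead of being resolved from the bottom level.

```python
# Hypothetical fact rows at day grain: (date, amount). Illustration only.
facts = [
    ("2012-12-31", 3.0),
    ("2013-11-30", 10.0),
    ("2013-12-01", 5.0),
    ("2013-12-02", 7.0),
]

def month_of(day):
    """Attribute relationship day -> month (many-to-one)."""
    return day[:7]

def year_of_month(month):
    """Attribute relationship month -> year (many-to-one)."""
    return month[:4]

def year_of_day(day):
    """Direct lookup day -> year, used only for the bottom-level comparison."""
    return day[:4]

def roll_up(totals, parent_of):
    """Sum child totals into their parent members."""
    result = {}
    for member, value in totals.items():
        parent = parent_of(member)
        result[parent] = result.get(parent, 0.0) + value
    return result

# Bottom level: totals per day.
day_totals = {}
for day, amount in facts:
    day_totals[day] = day_totals.get(day, 0.0) + amount

# With the relationships known, each level is built from the level below it.
month_totals = roll_up(day_totals, month_of)
year_totals = roll_up(month_totals, year_of_month)

# Without them, every year total has to be resolved from the bottom level.
year_totals_from_days = roll_up(day_totals, year_of_day)

print(month_totals)
print(year_totals)
```

Both routes give the same year totals; the difference is only how many rows have to be scanned to get them.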
Hierarchies
- In dimensions, data are structured into hierarchies
- One dimension can have N hierarchies
- A hierarchy can be one level deep
- E.g.:
  - One-level hierarchy (day of week)
  - Three-level hierarchy (year, month, day)
- Two types of hierarchies:
  1. Natural, e.g.: year > month > day
  2. The other kind (reporting, etc.), e.g.: Gender > Marital status > Income
- To the user navigating a hierarchy, these two types of user hierarchies look the same
- We could extract month as a separate hierarchy; what is the difference? (QuickHit#2 video)
Important dimension attribute properties
- Name: (programmatic) attribute name, usually the same as or similar to the one in the relational table
- KeyColumns: key column(s) that uniquely determine the attribute; can be composite
  - E.g. idstudent for studname
  - Warning: for monthname the key is not (month) but (year, month)
- NameColumn: the column that provides the attribute name displayed to users
  - If not assigned, KeyColumns is used
  - If the key is composite (e.g. for monthname), NameColumn must be defined
- Usage:
  - Regular (common, non-key attribute)
  - Key (the key attribute of the dimension; each dimension has exactly one)
  - Parent (used in parent-child dimensions, e.g. (superior) orgunit)
- OrderBy, OrderByAttribute: used for ordering, explained later
- AttributeHierarchyVisible: defines whether the attribute is visible outside its hierarchical position, as a separate "hierarchy"
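The warning about the month key can be checked with a small Python sketch. The rows and the helper `members` are made up for illustration and stand in for a date dimension table; they are not part of any Analysis Services API.

```python
from collections import defaultdict

# Hypothetical rows from a date dimension table: (year, month_no, month_name)
rows = [
    (2012, 12, "December"),
    (2013, 1, "January"),
    (2013, 12, "December"),
]

def members(rows, key):
    """Group rows into attribute members using the given key function."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[key(row)].append(row)
    return dict(grouped)

# Key on the month number only: December 2012 and December 2013
# collapse into a single member.
by_month = members(rows, key=lambda r: r[1])

# Composite key (year, month): every calendar month is its own member.
by_year_month = members(rows, key=lambda r: (r[0], r[1]))

print(len(by_month), "members with key (month)")
print(len(by_year_month), "members with key (year, month)")
```

With the single-column key, two different Decembers are merged, which is why the composite key is needed and why a separate name column then has to supply the caption.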
Ordering (OrderBy and OrderByAttribute)
OrderBy | OrderByAttribute | Ordered by
Key | (NA) | the key defined for this level: idemployee
Name (default) | (NA) | the name defined for this level: EmployeeFullName
AttributeKey | Last Name | the key of the attribute given in OrderByAttribute, that is Last Name (in this example the key is the Last Name value itself, but in general it can be some other value, e.g. an int)
AttributeName | Last Name | the name of the attribute given in OrderByAttribute, that is Last Name (picture: members ordered by last name)
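The effect of the ordering options can be sketched in Python. The employee records below are invented for illustration, and the three sorts only mimic ordering by key, by name, and by another attribute (Last Name); they are not Analysis Services calls.

```python
# Hypothetical members of an Employee level. Illustration only.
employees = [
    {"idemployee": 2, "full_name": "Ana Zoric", "last_name": "Zoric"},
    {"idemployee": 3, "full_name": "Marko Babic", "last_name": "Babic"},
    {"idemployee": 1, "full_name": "Ivan Horvat", "last_name": "Horvat"},
]

def ids(members):
    """Return the member keys in their current order."""
    return [m["idemployee"] for m in members]

# OrderBy = Key: order by the level's own key.
by_key = sorted(employees, key=lambda m: m["idemployee"])

# OrderBy = Name: order by the displayed name.
by_name = sorted(employees, key=lambda m: m["full_name"])

# OrderBy = AttributeKey with OrderByAttribute = Last Name:
# order by a different attribute of the same member.
by_last_name = sorted(employees, key=lambda m: m["last_name"])

print(ids(by_key))
print(ids(by_name))
print(ids(by_last_name))
```

The same three members come out in three different orders, which is the whole point of choosing OrderBy deliberately.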
Attribute relationships (obligatory!)
* Employee is a type 2 SCD, so LastName (and the full name) is fixed: when it changes, a new record is created
Relationship types:
- Rigid: unchangeable (better performance: if we define the relationship as rigid, AS can keep previous aggregations when updating cubes)
  - E.g. date > month > year
  - Denoted by a black arrow
- Flexible (default): can change over time
  - E.g. student > phone, address, etc.
  - Denoted by a gray arrow
Member Properties
- Each member can have "properties"
- Typically there are multiple attributes at one level: we show one and define the others as properties
- Display depends on the client tool (usually a tooltip)
When a processing error occurs
1. Examine "View Details" (shown above - 1)
2. Copy and paste the SELECT queries into Management Studio and execute them to gain insight into the problem
Creating cubes
- Start the wizard:
  - Select the table from the Data Source View
  - Select measures
- Cube dimensions are dimension instances within cubes:
  - They inherit properties from the shared dimensions
  - Some properties can be overridden locally (within the cube):
    - Examine the dimension properties (from the Cube Structure tab)
    - E.g. change the dimension name
- Note that the wizard made three instances (aliases) of the date dimension, one for each of the three dates
Partitions
- Advantages (in large cubes):
  - Query speedup (especially in 2005)
  - Can significantly reduce processing times
- Sources can be different tables (or views), or one table fragmented by an SQL statement
- Partitions must be disjoint (otherwise there can be duplicates!)
  - Pay attention to the "special" records (unknown, etc.)
- Typically partitioned by time, e.g.:
  - Partition 1900-2000
  - Partition 2001-2010
  - Partition 2011
- Guidelines:
  - No more than 2000 partitions
  - No more than 20M rows per partition
- For each partition, we can define different:
  - Aggregations
  - Storage (disk)
  - Storage model (ROLAP, MOLAP, HOLAP)
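The disjointness requirement can be checked before the cube is processed. Below is a minimal Python sketch under the assumption that partitions are described by inclusive year ranges, as in the example above; the function names are hypothetical and a year of `None` stands in for a "special" (unknown) record.

```python
from itertools import combinations

# Year ranges (inclusive) of the example partitions from the slide.
partitions = {
    "Partition 1900-2000": (1900, 2000),
    "Partition 2001-2010": (2001, 2010),
    "Partition 2011": (2011, 2011),
}

def overlapping_pairs(partitions):
    """Return pairs of partition names whose year ranges intersect."""
    pairs = []
    for (name_a, (lo_a, hi_a)), (name_b, (lo_b, hi_b)) in combinations(partitions.items(), 2):
        if lo_a <= hi_b and lo_b <= hi_a:
            pairs.append((name_a, name_b))
    return pairs

def partitions_for(year, partitions):
    """Return the partitions a row with the given year falls into.

    An empty list means the row would be lost, more than one entry
    means it would be counted more than once.
    """
    if year is None:  # "special" record with an unknown date
        return []
    return [name for name, (lo, hi) in partitions.items() if lo <= year <= hi]

print(overlapping_pairs(partitions))
print(partitions_for(2005, partitions))
print(partitions_for(None, partitions))
```

The last call shows why the "special" records deserve attention: a row with an unknown date matches none of the time-based partitions unless one of them is defined to include it.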
Aggregations
- For each dimension attribute:
  - Default: the aggregation wizards use default rules: a key attribute is considered for all aggregations; non-key attributes are treated as unrestricted
  - Full: the attribute is considered for all aggregations (careful!)
  - None: the attribute is NOT considered for any aggregation
  - Unrestricted: no restrictions are placed on the aggregation wizards
- One-third rule (30% rule): aggregate only if the number of rows in the aggregate table will be less than one third of the number of rows in the "base table"
- Aggregations can be limited by:
  - Storage (disk)
  - Until a given % performance gain is reached
  - Until I click Stop
  - No aggregations (makes sense for < 500k)
- Guideline: start with 20%, then use Usage Based Optimization
- More at: Best Practices Article http://technet.microsoft.com/en-gb/library/cc966399.aspx
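The one-third rule can be written down as a simple check. The sketch below is in Python and rests on one stated assumption: the size of an aggregate is estimated as the product of the attribute cardinalities, capped at the number of fact rows. That estimate, the function names and the numbers are illustrative and are not how Analysis Services itself sizes aggregations.

```python
import math

def estimated_aggregate_rows(cardinalities, fact_rows):
    """Upper-bound estimate: product of cardinalities, capped by the fact row count."""
    return min(math.prod(cardinalities), fact_rows)

def passes_one_third_rule(cardinalities, fact_rows):
    """True if the estimated aggregate is smaller than one third of the base table."""
    return estimated_aggregate_rows(cardinalities, fact_rows) * 3 < fact_rows

fact_rows = 1_000_000  # hypothetical base table size

# Year (10 members) x product category (20 members): at most 200 rows.
print(passes_one_third_rule([10, 20], fact_rows))      # True

# Day (3650 members) x product (5000 members): as large as the base table.
print(passes_one_third_rule([3650, 5000], fact_rows))  # False
```

An aggregate that is nearly as large as the base table saves little at query time and still has to be stored and processed, which is what the rule guards against.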
Finally, browse the data
- After the project is deployed and processed
- E.g. with MS Excel (screenshots: steps 1-5)
6th Homework
Define the OLAP structure for your warehouse (Northwind):
- In every dimension, define one or more natural hierarchies (usually geographic)
- Be sure to define the parent-child hierarchy in the Employee dimension
- You don't have to define measures for which "AVG" is used (because that would require an MDX calculated member)
- It is not necessary to define aggregations
- Define two partitions: records from the last (current) year and the others (older)
- Connect using Excel and solve one assignment of your choice from the 5th homework
- Submit the VS solution and the xls
5 points
Deadline: 13.1.2014 at 9:00
Key Performance Indicators
- KPI - Key Performance Indicator
- KSI - Key Success Indicator
- Once an organization has determined its vision and goals, it needs to measure its performance in approaching these goals
- Key Performance Indicators (KPI) are financial and non-financial measures or metrics used to help an organization define and evaluate how successful it is, typically in terms of making progress towards its long-term organizational goals.
KPI
- KPI = f(x_1, x_2, ..., x_n)
- A very simple concept, but defining the function is not trivial
- Although a large number of KPIs already exist, each organization (and market) has its own characteristics
- Presented graphically in BI tools
- In a broader (and more complex) sense: Balanced Scorecard (Kaplan, Norton)
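As a concrete instance of KPI = f(x_1, ..., x_n), here is a minimal Python sketch with n = 2. The choice of inputs (actual and target sales), the thresholds and the three-valued status are assumptions made for illustration; a real KPI needs a function agreed on by the organization.

```python
def kpi_value(actual_sales, target_sales):
    """Hypothetical KPI: share of the sales target that was reached."""
    return actual_sales / target_sales

def kpi_status(value, warn=0.8, ok=1.0):
    """Map the KPI value to a status for graphical display.

    Returns 1 (good), 0 (warning) or -1 (bad); the thresholds are assumed.
    """
    if value >= ok:
        return 1
    if value >= warn:
        return 0
    return -1

value = kpi_value(actual_sales=90, target_sales=100)
print(value, kpi_status(value))
```

A BI tool would typically render the status as a traffic light or gauge rather than a number.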
Similar concepts
- KRI = Key Result Indicator
- PI = Performance Indicator
- (Figure labels: car speed, transmission speed, revs/min, consumption, temperature)
KPI features
From extensive analysis* and from discussions with over 1,500 participants:
1. Nonfinancial measures (not expressed in dollars, yen, pounds, euros, etc.)
2. Measured frequently (e.g., daily or 24/7)
3. Acted on by the CEO and senior management team
4. Understanding of the measure and the corrective action required by all staff
5. Ties responsibility to the individual or team
6. Significant impact (e.g., affects most of the core critical success factors [CSFs] and more than one BSC perspective)
7. Positive impact (e.g., affects all other performance measures in a positive way)
* David Parmenter: Key performance indicators: developing, implementing, and using winning KPIs, ISBN 0470095881, 9780470095881
KPI design recommendations
SMART:
- Specific: It has to be clear what exactly the KPI measures. There has to be one widely accepted definition of the KPI, so that different users interpret it the same way and, as a result, come to the same and correct conclusions on which they can act.
- Measurable: The KPI has to be measurable in order to define a standard, budget or norm, to measure the actual value, and to compare the actual value with the budgeted value.
- Achievable: Every KPI has to be measurable in order to define a standard value for it. For the acceptance of KPIs and Performance Management in general within the organization, it is really important that this norm is achievable. Nothing is more discouraging than striving for a goal that you will never reach.
- Relevant: The KPI must give more insight into the performance of the organization in achieving its strategy. If a KPI does not measure a part of the strategy, acting on it doesn't affect the organization's performance. An irrelevant KPI is therefore useless.
- Time phased: A KPI only has meaning if one knows the time dimension in which it is realized.
Less is more: do not overdo it with the number of KPIs, e.g. each manager has five to eight KPIs to consider.
Problems
- KPIs can prove expensive or difficult to measure for some organizations, e.g. staff morale may be impossible to quantify with a number
- Once a KPI is created, it becomes difficult to change, because yearly comparisons can be lost
- If a KPI is based exclusively on in-house practices, it may be extremely difficult for an organization to use it for comparisons with other similar organizations
Existing KPI databases
- There are free sources of KPIs, e.g. http://www.epmreview.com/kpi-library.html
- Books, e.g. The Big Book Of KPI's by Eric Peterson