Riadh Ben Messaoud
1. The Big Picture 2. Data Warehouse Philosophy 3. Data Warehouse Concepts 4. Warehousing Applications 5. Warehouse Schema Design 6. Business Intelligence Reporting 7. On-Line Analytical Processing 8. OLAP Applications 9. Data Warehouse Implementation 10. Warehousing Software Data Warehouses & OLAP 2
1. The Big Picture 2. Data Warehouse Philosophy 3. Data Warehouse Concepts 4. Warehousing Applications 5. Warehouse Schema Design 6. Business Intelligence Reporting 7. On-Line Analytical Processing 8. OLAP Applications 9. Data Warehouse Implementation 10. Warehousing Software Data Warehouses & OLAP 3
What is OLAP? On-Line Analytical Processing is not a definition It gives no help in deciding if a product is an OLAP tool or not! Since late 1994, many vendors claim to have OLAP compliant products It is not possible to rely on the vendors own description Membership of the OLAP council is not a good indicator Data Warehouses & OLAP 4
What is OLAP? Researchers were forced to create their own definition It had to be simple, memorable and product-independent The FASMI test is one of the most converging definition efforts for detecting OLAP compliance It defines the characteristics of an OLAP application in a specific way FASMI Data Warehouses & OLAP 5
FAST The system is targeted to deliver responses to users within about 5 seconds Simplest analysis ~ no more than 1 second Most complicated analysis ~ no more than 20 seconds End-users assume that a process has failed if results are not received within 30 seconds Unless the system warns that the report will take longer time, the user will hit Alt+Ctrl+Delete Data Warehouses & OLAP 6
FAST The OLAP response speed is not easy to achieve especially when on-the-fly and ad hoc calculations are required Vendors resort to many techniques to achieve this goal: Specialized forms of data storage, Extensive pre-calculations, Specific hardware requirements. Data Warehouses & OLAP 7
FAST None of the existent products is fully optimized an area of developing technology The full pre-calculation approach fails with large and sparse data Doing everything on-the-fly is much too slow with large data According to surveys, slow query response is consistently the most often-cited technical problem with OLAP product Data Warehouses & OLAP 8
ANALYSIS The system can cope with any business logic and statistical analysis relevant for the application and the user In some OLAP product some preprogramming may be needed Without having to program, it is necessary to allow the user to define new ad hoc calculations Data Warehouses & OLAP 9
ANALYSIS Analysis could include specific features like: Time series analysis, Cost allocations, Currency translation, Goal seeking, Ad hoc multidimensional structural changes, Non-procedural modeling, Exception alerting, Data mining. These capabilities differ between products, depending on their target markets Data Warehouses & OLAP 10
SHARED The system implements all the security requirements for confidentiality If multiple write access is needed, concurrent update locking at an appropriate level should be implemented The system should be able to handle multiple updates in a timely and secure manner This is a major area of weakness in many OLAP products assuming that OLAP applications will be read-only Data Warehouses & OLAP 11
MULTIDIMENSIONAL Is the key requirement for all OLAP applications The system must provide a multidimensional conceptual view including: Full support for hierarchies Multiple hierarchies This is the most logical way to analyze businesses and organizations Data Warehouses & OLAP 12
INFORMATION Is all of the data and derived information needed, wherever it is and however much is relevant for the application The capacity if handling data differ between OLAP products The largest OLAP products can hold at least a thousand times as much as the smallest Many considerations must be taking: Data duplication, RAM required, disk space utilization, performance, integration with DWs Data Warehouses & OLAP 13
The FASMI test is a reasonable and understandable definition of the goals OLAP is meant to achieve Researches encourage users and vendors to adopt this definition, which we hope will avoid the controversies of previous attempts Data Warehouses & OLAP 14
The Codd rules In 1993, Codd et al. published a white paper Providing OLAP to User-Analysts: An IT Mandate Codd was very well known as a respected database researcher from the 1960s till the late 1980s He is credited with being the inventor of the relational database model in 1969 Unfortunately, his OLAP rules proved to be controversial due to being vendor- Data Warehouses & OLAP 15
The Codd rules The OLAP white paper included 12 rules, which are now well known They were followed by another 6 rules in 1995 Codd restructured the rules into four groups, calling them features Basic Features Special Features Reporting Features Dimension Control Data Warehouses & OLAP 16
The Codd rules Basic Features 1. Multidimensional Conceptual View Few would argue with this feature Codd believes this to be the central core of OLAP Codd included slice and dice as part of this requirement Data Warehouses & OLAP 17
The Codd rules Basic Features 2. Intuitive Data Manipulation Data manipulation through direct actions on cells in the view Without recourse to menus or multiple actions, we assume that this is by using a mouse Many products fail on this, because they do not necessarily support double clicking or drag and drop Data Warehouses & OLAP 18
The Codd rules Basic Features 3. Accessibility: OLAP as a Mediator OLAP engines are considered as middleware, sitting between heterogeneous data sources and an OLAP front-end Most products can achieve this, but often with more data staging and batching than vendors like to admit Data Warehouses & OLAP 19
The Codd rules Basic Features 4. Batch Extraction vs Interpretive This rule effectively required that products offer both their own staging database for OLAP data as well as offering live access to external data Only a minority of OLAP products properly comply with it Data Warehouses & OLAP 20
The Codd rules Basic Features 5. OLAP Analysis Models Codd required that OLAP products should support all four analysis models : Categorical: parameterized static reporting ~ All OLAP tools Exegetical: slicing and dicing with drill down ~ All OLAP tools Contemplative: «what if?» analysis ~ Most OLAP tools Formulaic: goal seeking models ~ Very few OLAP tools Data Warehouses & OLAP 21
The Codd rules Basic Features 6. Client/Server Architecture The OLAP server component of an OLAP product should be sufficiently intelligent that various clients could be attached with minimum effort and programming for integration Relatively few OLAP products are qualified for this test A very tough test What the Web would deliver on this issue? Data Warehouses & OLAP 22
The Codd rules Basic Features 7. Transparency This test, dealing with openness, is also a tough but valid one A spreadsheet user should be able to get full values from an OLAP engine and not even be aware of where the data comes from OLAP products must allow live access to heterogeneous data sources from a full function spreadsheet add-in, with the OLAP server engine in between A very few products that do fully comply with Data Warehouses & OLAP 23
The Codd rules Basic Features 8. Multi-User Support OLAP tools must provide concurrent access (retrieval and update), integrity and security Many OLAP applications are still read-only However, almost all vendors claim compliance!!! Data Warehouses & OLAP 24
The Codd rules Special Features 9. Treatment of Non-Normalized Data Refers to the integration between an OLAP engine and denormalized source data Any data updates performed in the OLAP environment should not be allowed to alter stored denormalized data in feeder systems Data changes should not be allowed in what are normally regarded as calculated cells within the OLAP database Data Warehouses & OLAP 25
The Codd rules Special Features 10. Storing OLAP Results: Keeping them Separate from Source Data This is really an implementation rather than a product issue But few would disagree with it Read-write OLAP applications should not be implemented directly on live transaction data OLAP data changes should be kept distinct from transaction data The method of data write-back used in Microsoft Data Warehouses & OLAP 26
The Codd rules Special Features 11. Extraction of Missing Values All missing values are cast in the uniform representation defined by the Relational Model Missing values are to be distinguished from zero values A few OLAP tools do break this rule Data Warehouses & OLAP 27
The Codd rules Special Features 12. Treatment of Missing Values All missing values are to be ignored by the OLAP analyzer regardless of their source This is an almost inevitable consequence of how multidimensional engines treat all data Data Warehouses & OLAP 28
The Codd rules Reporting Features 13. Flexible Reporting The dimensions can be laid out in any way that the user requires in reports Most products are capable of this in their formal report writers It is preferable that analysis and reporting facilities be combined in one module Data Warehouses & OLAP 29
The Codd rules Reporting Features 14. Uniform Reporting Performance Reporting performance be not significantly degraded by increasing the number of dimensions or database size There are differences between products The principal factor that affects performance is the degree to which the calculations are performed in advance and where live calculations are done Data Warehouses & OLAP 30
The Codd rules Reporting Features 15. Automatic Adjustment of Physical Level OLAP system must adjust its physical schema automatically to adapt to the type of model, data volumes and sparsity Most vendors fall far short of this noble ideal Since 1996, users can benefit from it in Microsoft Analysis Services Data Warehouses & OLAP 31
The Codd rules Dimension Control 16. Generic Dimensionality Each dimension must be equivalent in both its structure and operational capabilities This has proven to be one of the most controversial Codd s rules With a strictly purist interpretation, few products fully comply If you are buying a product for a specific application, you may safely ignore the rule Data Warehouses & OLAP 32
The Codd rules Dimension Control 17. Unlimited Dimensions & Aggregation Levels Technically, no product can possibly comply with this feature There is no such thing as an unlimited entity on a limited computer Few applications need more than about eight or ten dimensions Few hierarchies have more than about six consolidation levels Data Warehouses & OLAP 33
The Codd rules Dimension Control 18. Unrestricted Cross-dimensional Operations All forms of calculation must be allowed across all dimensions, not just the measures dimension Many products which use only relational storage are weak in this area These types of calculations are important if you are doing complex calculations Data Warehouses & OLAP 34
OLAP Milestones Data Warehouses & OLAP 35
OLAP Milestones Data Warehouses & OLAP 36
OLAP Milestones Data Warehouses & OLAP 37
OLAP Milestones Data Warehouses & OLAP 38
OLAP Milestones Data Warehouses & OLAP 39
OLAP Milestones Data Warehouses & OLAP 40
OLAP Milestones Data Warehouses & OLAP 41
OLAP Milestones Data Warehouses & OLAP 42
OLAP Milestones Data Warehouses & OLAP 43
OLAP Milestones Data Warehouses & OLAP 44
OLAP Milestones Data Warehouses & OLAP 45
OLAP Milestones Data Warehouses & OLAP 46
OLAP Milestones Data Warehouses & OLAP 47
OLAP Milestones Data Warehouses & OLAP 48