Trivadis White Paper

Comparison of Data Modeling Methods for a Core Data Warehouse

Dani Schnider, Adriano Martino, Maren Eschermann

June 2014
Table of Contents

1. Introduction
2. Aspects of Data Warehouse Core Modeling
2.1 Modeling Strategy
2.2 Data Integration and ETL
2.3 Historization
2.4 Lifecycle Management
2.5 Data Delivery to Data Marts
3. Case Study
3.1 CRM Source System
3.2 Claims Source System
3.3 Business Requirements
3.4 Change Scenario
4. Dimensional Core Modeling
4.1 Modeling Strategy
4.2 Data Integration and ETL
4.3 Historization
4.4 Lifecycle Management
4.5 Data Delivery to Data Marts
5. Relational Core Modeling
5.1 Modeling Strategy
5.2 Data Integration and ETL
5.3 Historization
5.4 Lifecycle Management
5.5 Data Delivery to Data Marts
6. Data Vault Modeling
6.1 Modeling Strategy
6.2 Data Integration and ETL
6.3 Historization
6.4 Lifecycle Management
6.5 Data Delivery to Data Marts
7. Conclusion
7.1 Comparison Matrix
7.2 Recommendations

Sources and Links

[1] Lawrence Corr: Agile Data Warehouse Design: Collaborative Dimensional Modeling from Whiteboard to Star Schema, DecisionOne Press, 2011
[2] Ralph Kimball, Margy Ross: The Data Warehouse Toolkit, Second Edition, Wiley & Sons, Inc., 2002
[3] Symmetry Corporation: Getting Started with ADAPT, OLAP Database Design
[4] C. Jordan, D. Schnider, J. Wehner, P. Welker: Data Warehousing mit Oracle Business Intelligence in der Praxis, Hanser Verlag, 2011
[5] Hans Hultgren: Modeling the Agile Data Warehouse with Data Vault, Brighton Hamilton, 2012
[6] Data Vault Challenge Workshop, Trivadis Blog, October 2013

[email protected]. Info-Tel Date Page 2 / 21
This white paper describes some common data modeling methods used for the Core data model in Data Warehouses. The purpose of this document is to give an overview of the different approaches and to compare their benefits and issues. The comparison and recommendations at the end of the document should help to decide which method is appropriate for the requirements of a particular Data Warehouse project.

1. Introduction

For a Data Warehouse it is recommended to implement a central Core layer. Its purpose is the integration of data from multiple source systems and the historization of the data. The Core is the only source for the Data Marts that are used for BI applications such as reporting platforms or OLAP tools. The Data Marts are usually designed as a dimensional data model with dimensions and facts. This model is either implemented as a star schema or as a multidimensional cube.

But what kind of data model is appropriate for the Core layer? There is no single best approach that suits all Data Warehouses, but different modeling techniques are commonly used. Each method described in this paper has its advantages, depending on the requirements and the modeling strategy.

[Figure: Data Warehouse architecture with Source Systems, Staging Area, Cleansing Area, Core, Data Marts, BI Platform and Metadata]

The following chapters compare some of the commonly used Core data modeling methods:

- Dimensional Core Modeling with dimensions and fact tables
- Relational Core Modeling with master data versioning
- Data Vault Modeling with Hubs, Satellites and Links

Of course, additional methods or combinations of the different approaches are possible, too. We explain only the approaches that we use in customer projects and with which we have practical experience. The data models for other layers of a Data Warehouse (e.g. Staging Area, Cleansing Area and Data Marts) are not in scope of this white paper.
2. Aspects of Data Warehouse Core Modeling

The following aspects are important for each Core modeling approach. When comparing the different methods, we will focus on these topics and assess for each method how easy or complex it is to support these requirements.

2.1 Modeling Strategy

The Core data model can either be derived from the known analytical requirements (which sometimes are very specific requirements for reports) or from the data models of the source systems. These two strategies are known as Reporting-Driven and Data-Driven Analysis (see reference [1]). In an ideal world, the Core layer design is driven neither by specific source systems nor by currently known report requirements, but by the definition of your business and your overall analytical requirements.

With a pure data-driven strategy, the Core model must be changed whenever the source system changes. With a pure reporting-driven strategy, you will not be able to fulfill new requirements without changing the Core data model. The data model of the Core should ideally reflect the business processes in the best possible way. This usually calls for a combination of the two strategies: for a stable and flexible Core data model, the current source systems as well as the known analytical requirements should be considered when designing the Core data model.

2.2 Data Integration and ETL

Data that is delivered from one or more source systems to the Data Warehouse must be integrated before it can be stored in the Core. Data integration is usually performed in ETL processes in the Staging and Cleansing Area to transform the data into a form that can be loaded into the Core data model. The transformations for data cleansing and harmonization can be very complex, but they are largely independent of the Core modeling approach; they are related more to the OLTP data models and the data quality of the source data. Therefore, these tasks are not in scope of this document.
However, the complexity of integrating data from different source systems into a single Core model, and of the conversion steps between the OLTP data models and the Core target data model, primarily depends on the modeling approach. This is one of the criteria to be considered when comparing the different modeling methods.

2.3 Historization

Most analytical requirements need historical data. Comparisons of current data with data of the previous year, quarter or month are quite common to detect trends, and they are necessary for planning and predictions. Additional reasons for storing historical data in the Core are traceability of data changes, repeatability of user queries, and point-in-time views of historical data.

The impact of these requirements is that data in the Core must be stored in a way that supports change tracking and data versioning. Existing data must not be deleted or overwritten; instead, new rows have to be created for changes to existing source records. The behavior can be different for master data (dimensions) and transactional data (facts). Depending on the modeling approach, this can be implemented in different ways.
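The versioning behavior described here (never overwriting data, but closing the current row and appending a new one) can be sketched as follows. This is a minimal illustration, not an implementation from the paper; the table layout and column names (key, valid_from, valid_to) are illustrative assumptions.

```python
from datetime import date

def apply_change(versions, key, new_attrs, load_date):
    """Versioning sketch: instead of overwriting, close the currently
    valid row (set valid_to) and append a new row for the new state."""
    current = [v for v in versions
               if v["key"] == key and v["valid_to"] is None]
    if current:
        if current[0]["attrs"] == new_attrs:
            return versions                    # no change detected
        current[0]["valid_to"] = load_date     # close the old version
    versions.append({"key": key, "attrs": new_attrs,
                     "valid_from": load_date, "valid_to": None})
    return versions

history = []
apply_change(history, "C1", {"city": "Bern"}, date(2014, 1, 1))
apply_change(history, "C1", {"city": "Basel"}, date(2014, 3, 1))
# history now holds two versions of client C1: the closed one and the current one.
```

The same change-detection and close/append pattern underlies both SCD2 dimensions and the version tables discussed later; only the table structures differ.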
2.4 Lifecycle Management

New analytical requirements and changes in the source systems result in corresponding adaptations of the Data Warehouse. Robustness against changes in the analytical layer and/or the source systems is a crucial requirement, especially when the business requires agility. For volatile layers (Staging Area, Cleansing Area) it is easy to drop and recreate the changed tables. Even Data Marts can be rebuilt from scratch as long as the required data is available in the Core. But for the Core data model, the handling of change requests is a potential issue. New tables or attributes must be included in the existing data model, and the historical data has to be migrated or enhanced because of the changes.

All these situations, as well as combinations of them, occur in every Data Warehouse project on a regular basis. Some change requests are just extensions of the data model and therefore easy to implement; other changes require a redesign and migration of the existing structures and are therefore more complex. The choice of the Core modeling approach can have an impact on how easy or complex it is to handle structure changes in a Data Warehouse.

2.5 Data Delivery to Data Marts

All persistent Data Marts of a Data Warehouse are loaded from the central Core. This implies that the Core must contain all relevant information for the dimensions and fact tables of the Data Marts, including the historical data that is required by the users. If the Data Marts are implemented just as a view layer on the Core ("Virtual Data Marts"), this precondition is obvious. In an ideal Data Warehouse it is always possible to rebuild a Data Mart from scratch, or to easily create a completely new Data Mart. The Core data model must be flexible enough to support initial and incremental loads of the dimension tables (SCD1 and SCD2) as well as the fact tables of a dimensional Data Mart.
This is possible if the Core is able to store a full history of all source data changes (see the historization requirements above). Depending on the approach chosen for historization, the loading steps for dimensions and facts can be more or less complex. If a Data Warehouse architecture has a persistent Data Mart layer, load processes are required to load data from the Core layer into a Data Mart. The complexity of these load processes depends on the modeling approach of the Core layer, but also on some other design decisions.
3. Case Study

To compare the different modeling approaches in this white paper, we use a fictitious case study adapted from a real example of an insurance company. The Data Warehouse contains data from two source systems and delivers information to a dimensional Data Mart based on known business requirements. During the project lifecycle, one of the source systems is changed in a way that affects the Core data model of the Data Warehouse. The OLTP data models of the source systems and the business requirements are described in the following sections.

3.1 CRM Source System

One of the source systems for the Data Warehouse is a Customer Relationship Management (CRM) system. It contains information about the customers and their contracts with the insurance company. Each customer can have many contracts, but each contract is defined for exactly one customer. A contract can contain one or more products. For each product, a contract item is created in the CRM system. There are three possible events in the CRM system for a contract item: acquisition, renewal and termination of a contract item. The data model of the CRM system contains four tables and is displayed in the following entity-relationship diagram:

[Figure: Entity-relationship diagram of the CRM system]

3.2 Claims Source System

The second source system is the Claims system that manages incidents and claims of the insurance company. A claim is the declaration to the insurance. An incident is the reason for a claim. One claim can have only one incident. Services are provided under the coverage of the insurance (lawyer, towing of a car, car repair, etc.). Each claim can generate one or more services. The entity-relationship diagram of the Claims system has the following structure:

[Figure: Entity-relationship diagram of the Claims system]
3.3 Business Requirements

Several user groups will work with Business Intelligence applications on a Data Mart. Each group has its own requirements:

- Employees controlling claims want reports giving statistics about the claims and incidents. They need to analyze the cost of coverage and the number of claims
  - by place of incident
  - by type of incident
  - by time
  - by product
- Managers want a global view. They need to analyze the cost of coverage and the generated income
  - by client (country, city, age, gender)
  - by product
  - by time
- Marketing personnel need to analyze the number of acquisitions, renewals and contract terminations as well as the number of contracts
  - by client (country, city, age, gender)
  - by product
  - by time

3.4 Change Scenario

Source system changes are typical when maintaining a Data Warehouse, which cannot be robust against all kinds of changes. In our case study, the Claims system is subject to change because the underlying business process has been changed. The cardinality of the relation between claim and incident has been adapted, i.e. one claim can now contain multiple incidents. A further impact of this process change is that services are now related to incidents, not to claims anymore. One or more services can be attached to an incident.
4. Dimensional Core Modeling

Dimensional modeling is a set of techniques and concepts for Data Warehouse design proposed by Ralph Kimball (see [2]). It is oriented around understandability and performance and uses the concepts of facts (measures) and dimensions (context). Most reporting tools and frameworks require a dimensional model, and business users intuitively understand a dimensional model and are able to formulate queries against it.

A dimensional model is a conceptual model rather than a logical/physical data model, since it can be mapped to any physical form, e.g. a relational or a multidimensional database. If a dimensional model is implemented in a relational database, the most commonly used schema types are the star and the snowflake schema. The latter normalizes the dimension tables; in a star schema the dimension tables are denormalized. The transition from a dimensional to a logical/physical data model is straightforward and should follow well-defined guidelines. To make the different modeling approaches of this white paper comparable, the dimensional model is presented as a star schema. Nevertheless, it is highly recommended to use a dimensional model for discussions with business users, documented in a common notation like the ADAPT notation (see [3]).

4.1 Modeling Strategy

A dimensional model is driven by known and well-defined analytical needs. The following star schema (referenced below as star schema 1) fulfills all report requirements of the case study:

[Figure: Star schema 1]

The star schema contains less information than the original source data model. It has no information about a single claim or contract, nor does it contain any information about services, since the required reports do not need this information. When designing a star schema it is a valid approach to simplify, reduce information and precalculate certain measures, e.g.
counting business entities, in order to derive a model that is easy to understand, answers the given analytical questions and is optimized for performance. The latter is especially important when the Data Mart layer is only a thin view layer or when the warehouse architecture has no Data Mart layer at all.
The drawback is that new, previously unconsidered analytical questions cannot be answered:

- The model does not contain any information about the service type. Analyzing costs per service type would require sourcing additional tables and a modification of the model.
- The model cannot answer questions like "what is the average number of products in a contract", since the star schema does not contain information about single contracts. The same is true for claims.
- The model cannot answer questions like "what is the average price of a product", since the star schema does not contain information about the price of a single product.

Since interesting analytical discoveries often result in even more and new analytical questions, the above star schema, although fulfilling the given analytical needs, might turn out to be too restrictive in the long run. Therefore, the source systems should be considered as well if it is conceivable that additional requirements will have to be implemented.

[Figure: Star schema 2]

This model (referenced below as star schema 2) is able to answer many more analytical questions, but it still has limitations and drawbacks compared with star schema 1:

- Higher degree of complexity: additional dimension tables, more complex fact tables
- Larger fact tables due to the higher granularity
- Querying the number of contracts or claims requires a count distinct operation
- The relations between contract and product and between claim, service and incident are only implicitly contained in the fact table. Questions like "how often did a price change during the last three years" can be answered, but the model is far from optimal for this kind of analytics due to the query complexity.
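Why the number of contracts needs a count distinct operation in star schema 2 can be shown with a tiny sketch. The fact rows below are hypothetical item-grain records invented for illustration: several rows can belong to one contract, so a plain row count overstates the number of contracts.

```python
# Hypothetical fact rows at contract-item granularity: one row per
# contract item, so a single contract appears multiple times.
fact_rows = [
    {"contract_id": "K1", "product": "P1"},
    {"contract_id": "K1", "product": "P2"},
    {"contract_id": "K2", "product": "P1"},
]

n_items = len(fact_rows)                                  # counts items, not contracts
n_contracts = len({r["contract_id"] for r in fact_rows})  # distinct count over the key
```

In SQL terms this corresponds to COUNT(*) versus COUNT(DISTINCT contract_id); the distinct variant is typically more expensive, which is one of the costs of the higher granularity.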
Designing a dimensional model is always a difficult tradeoff between simplicity and analytical flexibility. The more complex the source systems are, the more tables and relations have to be considered and the more fact tables and combinations of dimensions are possible; the more information is lost in the transition from an entity-relationship model to a dimensional model (because the modeling options of a dimensional model are restricted), the more analytical questions may arise and the more difficult it will be to anticipate them. Consequently, the gap between the analytical capabilities of the model and the future analytical needs will become larger with increasing complexity of the business and the source systems.

4.2 Data Integration and ETL

The complexity of the ETL processes into a (dimensional) Core model is determined by several factors, such as the number of source systems and their complexity as well as the complexity of the integration and business transformation steps. Loading dimension tables is usually less complex than loading fact tables, since fewer source tables have to be considered:

- Loading the product dimension table is easy since only one source table is used.
- Loading the client dimension table is more complex since data from two source systems has to be integrated.
- Loading fact table 1 of star schema 1 requires
  - reading all contract items and counting those which have been acquired, renewed or terminated during the period to be loaded; the linkage to client and product is straightforward, but the age has to be calculated,
  - reading all contracts and counting all active contracts,
  - reading all covered claims where the corresponding incident occurred during the period to be loaded, and aggregating the costs of the corresponding services.

Very often the ETL complexity for fact tables is rather high and usually requires the transformation and integration of numerous source tables.
Since a star schema is structured completely differently from a typical OLTP data model, the data model restructuring is highly complex and often requires multiple steps and intermediate tables to cope with that complexity and to achieve the necessary load performance.

4.3 Historization

Kimball introduced the concept of slowly changing dimensions (SCD). To deal with the various historization requirements, different SCD types may be implemented. The most common types are SCD1 (overwrite old with new data) and SCD2 (a new record is created for each change). SCD2 dimension tables have two additional timestamps indicating the validity of a record. In our example, the client and product dimensions contain SCD2 attributes and therefore contain additional attributes for the validity range.

The historization concept in a dimensional model is strongly interrelated with its key concept. Whereas the two other modeling approaches assign a surrogate key to each entity, in a star/snowflake schema a surrogate key is assigned to each new version of an entity. Since the surrogate key therefore already carries the validity information, fact and dimension tables can be joined without considering any timestamp attributes, resulting in join operations that are more intuitive for users and can be handled more efficiently by the database.
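The key concept described above (one surrogate key per version, so that fact/dimension joins need no timestamps) can be sketched as follows. The dimension rows, column names and dates are illustrative assumptions, not taken from the case study.

```python
from datetime import date

# Illustrative SCD2 product dimension: each version of product P1 has
# its own surrogate key (sid) and a validity range.
dim_product = [
    {"sid": 1, "product_id": "P1", "price_band": "low",
     "valid_from": date(2013, 1, 1), "valid_to": date(2014, 5, 31)},
    {"sid": 2, "product_id": "P1", "price_band": "high",
     "valid_from": date(2014, 6, 1), "valid_to": date(9999, 12, 31)},
]

def lookup_sid(dim, business_key, event_date):
    """At fact load time, resolve the surrogate key of the version valid
    at the event date. Afterwards, fact and dimension rows join on sid
    alone, without any timestamp comparison."""
    for row in dim:
        if (row["product_id"] == business_key
                and row["valid_from"] <= event_date <= row["valid_to"]):
            return row["sid"]
    raise KeyError(f"no valid version for {business_key} at {event_date}")

fact_row = {"product_sid": lookup_sid(dim_product, "P1", date(2014, 7, 15))}
```

The timestamp comparison happens exactly once, during the fact load; every later query joins on the surrogate key only, which is what makes the joins simple and efficient.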
Having a different primary key for each version has one drawback: the so-called ripple effect. Each new version of a parent triggers new versions in all of its children, which might lead to an explosion of child elements. Example: If a product sub-category is assigned to a new product category, it gets a new primary key. Therefore, all products in this sub-category have to reference this new primary key, which in turn results in creating a new version record with a new primary key for each affected product.

4.4 Lifecycle Management

Star schemas 1 and 2 are both based on the fact that there is a 1:1 relation between incident and claim; however, this is not enforced by either of the two models (e.g. by database constraints). Both schemas can remain unchanged if it is possible either to allocate the overall claim costs to multiple incidents or to define one single incident as the primary incident. The latter solution results in losing the information about the other incidents of a claim. Of course, both solutions would require adapting the ETL processes, implementing the allocation process or the determination of the primary incident based on specified business logic. Neither solution would require any data migration, because the model remains unchanged.

If none of the above approaches is possible, a 1:m relation between the fact table and the dimension has to be realized, which is not straightforward in a dimensional model. One possible modeling option is a bridge table, which results in a more complex model and more complex ETL processes. The revised star schema 1 looks quite different; all affected or new tables are marked in the model below. The revised fact table now contains many more records (one record for each claim) due to the higher granularity. Data migration is not possible, since the original star schema did not contain any claim data.

[Figure: Revised star schema 1 with bridge table]
4.5 Data Delivery to Data Marts

Data Warehouse architectures with a dimensional Core do not necessarily require a persistent Data Mart layer. A separate Data Mart layer may be required if a multidimensional database is in place or if Data Marts with a lower granularity than the data in the Core layer are required. In the latter case the Data Mart layer may consist only of views, which can be materialized if the performance is not satisfactory. Complex ETL processes from the Core to the Data Marts are not necessary since both layers follow the same modeling paradigm.
5. Relational Core Modeling

Relational data modeling is typically used for operational systems (OLTP systems) that store their data in relational databases. But it can also be used for Data Warehouses: most structured DWH databases are implemented with relational database systems. The term "relational" to describe the modeling approach of a Core data model is a bit misleading, because all of the methods described in this paper use relational databases. Even the term "3NF model" (for a data model in 3rd normal form) does not clearly define this approach: when a dimensional model is implemented as a snowflake schema, it is in 3rd normal form, too.

The following data model is a suggestion for a relational Core model for our case study. The different aspects of modeling strategy, data integration, historization, lifecycle management and data delivery to Data Marts are described in the next sections.

[Figure: Relational Core model for the case study]

A relational Core data model is not necessarily a 3NF data model. In our case study, the code tables for incident type and service type are denormalized in the Core.

5.1 Modeling Strategy

A relational Core data model is often used for Data Warehouses when the user requirements for reporting or other BI applications are not yet defined, or when the same information is required for different purposes. For example, the age of a client is currently used as a dimension in the dimensional model of the previous chapter. But it might also be used as a fact in another Data Mart, for example to derive the average age of clients per claim. A relational Core model allows more flexibility in the usage of the same data in different Data Marts.
The modeling strategy for a relational Core model is often data-driven or a combination of data-driven and reporting-driven. The Core data model is derived from the source systems, but known business requirements should be considered where possible. Our example Core model is derived from the data models of the two source systems. The known business requirements are considered in the Core model, i.e. it is easily possible to deliver the required information to a dimensional Data Mart with the facts and dimensions described in the previous chapter.

5.2 Data Integration and ETL

An important characteristic of a Core data model is that data from different sources is integrated. The model has to be subject-oriented, not a one-to-one copy of the source tables. Similar data is stored in one place in the Core. An example in our case study is the Address table: it contains the customer addresses of the CRM system, but also the places of incident of the Claims system.

The complexity of the ETL processes depends on the transformation rules between the source systems and the Core data model. If the Core model is very similar to the data model of the source system (this is usually the case if only one main source system is used), the transformations are typically easy. If multiple source systems have to be integrated into one common Core data model, the complexity increases.

The ETL processes for Master Data Versioning (see next section) are not trivial, but they are easy to implement because the logic is always the same, independent of the data contents. They can easily be generated or implemented with the operations of an adequate ETL tool. An established method is to implement the (complex) data integration transformations between the Staging Area and the Cleansing Area. In this case, the Cleansing Area has the same structure as the Core data model, but without historization.
The ETL processes between the Cleansing Area and the Core are then only used for Master Data Versioning and the assignment of surrogate keys.

5.3 Historization

For the flexibility to load both SCD1 and SCD2 dimensions in the Data Marts, master data (i.e. the descriptive data used in dimensions) must be versioned in the Core data model. Data versioning could be realized with SCD2 as in the dimensional model, but for relationships between entities this can lead to many different versions when existing data of a referred entity is changed (see the ripple effect described in chapter 4.3). With the concept of Master Data Versioning, this ripple effect can be avoided. For each entity in the relational data model, two tables are defined:

- The head table contains the identification (business key or primary key of the source system) and all attributes that either cannot change during their lifecycle or where only the current state is relevant (SCD1). These can also be relationships to other entities, e.g. the relationship from a contract to a client.
- The version table contains all attributes and relationships that can change during their lifecycle and where the change history is required in at least one Data Mart (SCD2). Each version has a validity range (valid from / valid to) and refers to the entity in the corresponding head table.

In our example, the relationship between a contract and a client cannot change and is therefore stored in the head table of the contract. The relationship between client and address can change during the lifecycle of the client; thus, the foreign key column to the address is stored in the version table of the client.
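The head/version split can be sketched with a small routine that distributes the attributes of an incoming source record over the two tables. The attribute classification for the Client entity shown here is a hypothetical example; which attributes are static and which are versioned is a design decision per entity.

```python
from datetime import date

# Hypothetical attribute classification for the Client entity:
HEAD_ATTRS = {"client_id", "birth_date"}    # static, or only current state relevant
VERSION_ATTRS = {"name", "address_id"}      # change history required (SCD2-like)

def split_record(source_row, load_date):
    """Master Data Versioning sketch: static attributes go into the head
    row; changeable attributes go into a version row with a validity
    range. Note that the foreign key address_id references the head table
    of the Address entity, never a particular address version."""
    head = {k: source_row[k] for k in HEAD_ATTRS}
    version = {k: source_row[k] for k in VERSION_ATTRS}
    version.update({"client_id": source_row["client_id"],
                    "valid_from": load_date, "valid_to": None})
    return head, version

head, version = split_record(
    {"client_id": "C7", "birth_date": date(1980, 5, 2),
     "name": "A. Muster", "address_id": "A3"},
    date(2014, 6, 1))
```

Because the dynamic foreign key lives in the version table and points at the entity key of the address, a new address version never forces a new client version; this is exactly how the ripple effect is avoided.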
A version table is only defined when at least one attribute or relationship requires change history tracking. This is not the case for addresses (we are interested only in current addresses) or for incidents (an incident happens once and is not changed afterwards). The separation of head and version tables is required to store an independent change history for each entity. Although a foreign key can be static (stored in the head table) or dynamic (stored in the version table), it always refers to the head table of the corresponding entity, not to a particular version. This allows a high flexibility for data historization in the Core and prevents the ripple effect described above.

An important difference between Master Data Versioning and Slowly Changing Dimensions Type 2 is the behavior of key references: A reference to an SCD2 dimension always relates to the particular version that is valid at the event time (or load time) of a fact. Relationships between entities in a relational Core model refer to the entity key (i.e. the primary key of the head table), never to a particular version. The version key is implemented for technical reasons only and is never referred to. The decision which version is relevant is made at query time or when a Data Mart is loaded. See chapter 5.5 for details.

5.4 Lifecycle Management

When the relational Core model is derived from the OLTP data models, structure changes of a source system require modifications of the Core model. In our example, the Claims source system is changed so that a claim can contain multiple incidents, and a service is attached to an incident, not directly to the claim anymore. Because the existing relationship between incidents and claims in the Core already allows storing multiple incidents per claim, no model changes are required for this part. But the relationship from services to claims must be changed into a relationship to the incident head table.
Let's assume that a service can be transferred from one incident to another within the same claim. In this case, the foreign key column to the referred incident must be stored in the version table of the service.

Adding new entities or attributes to a relational Core data model is usually not very complex. But when existing structures, especially relationships, are changed, the effort for data migration in the Core may be high. In our example, it is simple to add a new foreign key column to the service version table. But what contents must be filled in for the historical data records? For each service, we can identify the corresponding claim. If only one incident exists for this claim (and this was guaranteed in the initial data model of the source system), the foreign key to the correct incident can be determined unambiguously. All current and previous versions of all services must be updated with the correct incident before the foreign key column and the relationship between service and claim are eliminated.
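The migration step described above can be sketched as follows. All names (services, incident_per_claim) and values are invented for illustration; in practice this would be an UPDATE statement joining the service versions to the old claim-incident relationship.

```python
# All versions of all services still carry the old claim foreign key.
service_versions = [
    {"service_id": "S1", "claim_id": "CL1", "valid_to": "2014-03-01"},
    {"service_id": "S1", "claim_id": "CL1", "valid_to": None},   # current version
]

# In the old model each claim had exactly one incident, so the mapping
# claim -> incident is unambiguous for all historical data.
incident_per_claim = {"CL1": "I9"}

# Migration: derive the new incident foreign key for every version,
# then drop the obsolete claim relationship.
for row in service_versions:
    row["incident_id"] = incident_per_claim[row["claim_id"]]
    del row["claim_id"]
```

The sketch also shows why the migration breaks down if the old system had already allowed multiple incidents per claim: the dictionary lookup would no longer be unambiguous, which is exactly the situation discussed next.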
Unfortunately, not all model changes allow a complete and accurate data migration. If the previous Claims system had already allowed multiple incidents for one claim, a definite assignment of an incident to each service would not be possible. If a new attribute is added to the system, no information about its contents is available for historical data. In such situations, default values or singletons must be assigned, and the business users must be aware that the required information is only available for claims created after the release date of the changed source system.

5.5 Data Delivery to Data Marts

To load the (dimensional) Data Marts from a relational Core model, the data must be transformed into dimensions and facts. For example, we have to load a Claim dimension with information about claims, incidents, services and products. Before an appropriate ETL process to load this dimension table can be implemented, we have to define the granularity of the dimension, i.e. the detail level of the data. In the Core data model, the most detailed level is the service. If the granularity of the Data Mart dimension is a claim, incidents and services can only be stored in an aggregated form. For example, one main service must be identified, or the number of services is stored in the dimension table. Another option is to omit the information about services in the Claim dimension and to define a separate Service dimension. The relationship between claims and services is then only available through the fact table.

To avoid complex ETL processes for loading the Data Marts, it is recommended to hide the Master Data Versioning behind a History View Layer on the Core tables. For each pair of head and version tables, the following views are created (or better: generated):

- The Current View contains all business attributes of the head and version tables, but shows only the current state of the data. This is implemented with a join of the head and version tables and a filter on the latest version.
Optionally, a context-driven view can be implemented that allows a query date to be defined with a parameter or context variable. Current views are used to load SCD1 dimensions or for incremental loads of SCD2 dimensions.

The Version View contains all business attributes of head and version table and the validity range (valid from / valid to) of all existing versions. Version views are used for initial loads of SCD2 dimensions. When only one version view is involved to load a dimension table, this is straightforward. But when two or more version views must be combined to load one SCD2 dimension, an additional intersection view is required.

The Intersection View determines all intermediate versions based on the combinations of the validity ranges of all version views. A possible implementation is described in [4]. This view is used for initial loads of SCD2 dimensions with data from multiple head and version tables.

With the combination of Master Data Versioning and a History View Layer, the complexity of loading a dimensional Data Mart can be reduced. But if the relational Core data model is very different from the required dimensions and facts in a Data Mart, the additional transformations required to load the Data Mart can be rather complex.
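As a minimal sketch of the Current View idea, the following example builds a head/version table pair and a generated view in SQLite (via Python). The table and column names (customer_h, customer_v, customer_cv) are illustrative assumptions, not the physical model of the case study.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Head table: the stable identity of the customer
CREATE TABLE customer_h (customer_id INTEGER PRIMARY KEY, customer_ref TEXT);
-- Version table: one row per state of the versioned attributes
CREATE TABLE customer_v (
    customer_id INTEGER REFERENCES customer_h,
    version     INTEGER,
    city        TEXT,
    valid_from  TEXT,
    valid_to    TEXT,
    PRIMARY KEY (customer_id, version)
);
-- Current View: join of head and version table, filtered on the latest version
CREATE VIEW customer_cv AS
SELECT h.customer_id, h.customer_ref, v.city
FROM customer_h h
JOIN customer_v v ON v.customer_id = h.customer_id
WHERE v.valid_to = '9999-12-31';

INSERT INTO customer_h VALUES (1, 'C-1001');
INSERT INTO customer_v VALUES (1, 1, 'Zurich', '2014-01-01', '2014-03-31');
INSERT INTO customer_v VALUES (1, 2, 'Basel',  '2014-04-01', '9999-12-31');
""")
# The view hides the versioning and shows only the current state of each customer
print(con.execute("SELECT customer_ref, city FROM customer_cv").fetchall())
# -> [('C-1001', 'Basel')]
```

A Version View would expose valid_from and valid_to for all rows instead of filtering; the intersection view described in [4] then combines the validity ranges of several such views.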
6. Data Vault Modeling

Data Vault modeling, invented by Dan Linstedt in the 1990s, is used to model the enterprise Data Warehouse Core layer. This approach is suitable for multi-source environments that need to adapt quickly to changes. The model consists of a complete denormalization (the "Unified Decomposition" described by Hans Hultgren, see [5]) into three base constructs:

A Hub represents a core business concept, identified by its business key. This key is related to a business entity, not to a particular source system.

A Link is a natural business relationship between core business entities.

A Satellite holds the contextual information around keys and relations. It contains the descriptive information of a business entity.

This split allows your Enterprise Data Warehouse to adapt more quickly to business needs, source changes or new business rules, though with the disadvantage of an increasing number of structures in the Core Data Warehouse.

6.1 Modeling Strategy

In Data Vault, discussions with the business are mandatory to identify the core business concepts and the processes used for Hubs and Links. Interpretations of the source systems are possible, but sometimes source systems do not reflect the business reality. The core of the model has to be independent from the sources to guarantee the longevity of the enterprise Core Data Warehouse and to minimize dependencies on the source systems.

The Hub structure contains the core business concepts of a functional area. In the design above, we can find seven Hubs; each Hub contains a business key and an internal DWH surrogate key. The business key is the enterprise-wide identifier, the key used by the business to identify a single instance of each business concept. The surrogate key is used for references within the Data Vault model. Transactions and events have to be modeled as independent Hubs; they are core concepts of the business.
In our example, the customer business key would probably be CUSTOMER_REF_NUMBER, as long as this identifier is understood by the business and known as the unique identifier.
Links are associations of Hub surrogate keys. There is no limitation on the number of referenced Hubs, as long as the association is a business relationship. The physical entity contains the surrogate keys of the linked Hubs, each in a separate column, and a surrogate key for the association itself. The Link concept offers a lot of possibilities: the structure is independent and can help to define new relations or meet new needs.

In our example, LNK_Event_UnitOfWork represents the natural association for an event. An event will always have a customer, a contract and a product. For Claim, it has to be verified with the business that a claim will always have a customer, a contract and a product. The same applies to Service: a service will always have an incident, a claim, a product and a customer; again, this has to be verified with the business. If information at the incident level is needed for a claim, a new Link has to be created between those two entities. Otherwise a SELECT DISTINCT on the Link would be necessary to get the correct granularity, and this is not acceptable. Again, the Data Vault model allows you to create Links without impacting the rest of your model. Here is a simplified view (without Satellites) of the Data Vault model:

The Satellites contain all the information describing the business keys or the associations. There can be many Satellites for one Hub or one Link; the idea is to group attributes by frequency of change and/or by source. Each Satellite entity contains a surrogate key referencing a business key in a Hub or a Link. The particularity of the Satellite is that the load date is part of the Satellite key: each time an attribute change is loaded, a new row is inserted with the load date of the Satellite insertion.

In our example, there are two Satellites for the customer Hub because we group attributes by frequency of change.
In the Satellite SAT_CURR_Customer, we have name, gender and birthdate, which will most likely not change and for which the business only needs the current version of the record. The Satellite SAT_HIST_Customer contains the attributes whose changes have to be tracked by the DWH, for example the customer's place of residence.
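The Hub/Link/Satellite split described above can be sketched as a set of table structures. The following SQLite example (via Python) is a simplified assumption of what such a physical model might look like; all table and column names are illustrative, not the white paper's actual model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hub: business key plus internal DWH surrogate key
CREATE TABLE hub_customer (
    customer_sid INTEGER PRIMARY KEY,   -- DWH surrogate key
    customer_ref TEXT NOT NULL UNIQUE   -- business key (CUSTOMER_REF_NUMBER)
);
CREATE TABLE hub_contract (
    contract_sid INTEGER PRIMARY KEY,
    contract_no  TEXT NOT NULL UNIQUE
);
-- Link: association of Hub surrogate keys, with its own surrogate key
CREATE TABLE lnk_customer_contract (
    link_sid     INTEGER PRIMARY KEY,
    customer_sid INTEGER NOT NULL REFERENCES hub_customer,
    contract_sid INTEGER NOT NULL REFERENCES hub_contract,
    UNIQUE (customer_sid, contract_sid)  -- each association appears only once
);
-- Satellite: descriptive context; the load date is part of the key
CREATE TABLE sat_hist_customer (
    customer_sid INTEGER NOT NULL REFERENCES hub_customer,
    load_date    TEXT NOT NULL,
    city         TEXT,
    PRIMARY KEY (customer_sid, load_date)
);
""")
print([r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```

Note how descriptive attributes live only in the Satellite: adding a second Satellite (e.g. for a new source system) would not touch the Hub or the Link.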
6.2 Data Integration and ETL

Data integration of multiple source systems into one DWH is always hard work. For Data Vault, the strategy is to find the core business concepts and the natural business keys in the source systems and then verify that they conform to the business point of view. The same approach is used for the relations. The flexibility of the Data Vault model comes from the fact that objects can be added without a complete redesign of the existing structure, and that different Satellites can be kept per source system. This strategy allows tracking differences and data quality within the Core DWH, but postpones the choice of the attribute source to the Data Mart layer.

For example, if a new source system that also contains customer entities has to be loaded into the Data Warehouse, a new Satellite with different attributes can be added to the model. A simple report comparing the two Satellites will show the quality issues to the business. For quality problems during the integration, a quality Satellite can be added, containing the details of the integration issues. For example, assume there is a customer that exists in the Claim system but not in the CRM. As the core concept is the same, a step in the ETL is added to load the Claim customer key into the Hub. The first load source of this key will be the Claim system, and an entry will appear in SAT_Customer_Quality stating that this key is not in the CRM system.

The ETL jobs to load a Data Vault typically run in two steps: In a first step, all Hubs are loaded in parallel. In a second step, all Links and Satellites are loaded in parallel.
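The Hub load rule can be sketched as follows (SQLite via Python; the staging and target names are assumptions). The important point is the lookup on the business key, which makes the load idempotent: rerunning it, or running it for a second source, never creates duplicate business keys, and the audit attributes keep the first load.

```python
import sqlite3
from datetime import date

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_customer (customer_ref TEXT);
CREATE TABLE hub_customer (
    customer_sid  INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_ref  TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,     -- first load date (audit)
    record_source TEXT NOT NULL      -- first load source (audit)
);
""")

def load_hub(source_name: str) -> None:
    # Insert with a lookup on the business key: only unknown keys are added
    con.execute("""
        INSERT INTO hub_customer (customer_ref, load_date, record_source)
        SELECT DISTINCT s.customer_ref, ?, ?
        FROM stg_customer s
        WHERE NOT EXISTS (SELECT 1 FROM hub_customer h
                          WHERE h.customer_ref = s.customer_ref)
    """, (date.today().isoformat(), source_name))

con.executemany("INSERT INTO stg_customer VALUES (?)", [("C-1001",), ("C-1002",)])
load_hub("CRM")
load_hub("CRM")  # rerun: no duplicates, audit attributes keep the first load
print(con.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0])  # -> 2
```

A Link load follows the same pattern, with the lookup on the combination of Hub surrogate keys instead of a single business key.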
The individual ETL operations for each type of object are:

Hubs: The business key must appear only once in the Hub; insert with a lookup on the business key.

Links: The association must appear only once in the Link; insert with a lookup on the association of surrogate keys.

Satellites: Loading a Satellite is the most complex operation; it depends on the historization mode. Either the changed attributes are simply updated, or a new version must be inserted.

Every object (Hub, Link, Satellite) in the Data Vault methodology stores two auditing attributes: the first load date (i.e. the date when the specific row first appeared) and the first load source (i.e. the source where the specific row first appeared). They provide audit information at a very detailed level.

6.3 Historization

In the Data Vault model, historization is located in the Satellites. It looks like SCD1 or SCD2 dimensions, except that the primary key of the Satellite consists of the surrogate key of the Hub and the load date of the new record. This historization approach also applies to facts (facts are Hubs and have Satellites). As already explained, it is possible to have multiple Satellites per Hub or Link, grouping the attributes by frequency of change. The multiple-Satellite strategy can become complicated when several Satellites are historized: the difficulty is to regenerate time slices across multiple historizations. In this case, a possible solution is the point-in-time (PIT) table, a structure placed between the Hubs/Links and the historized Satellites that stores the time slice reconciliation and the keys. This reconciliation is heavy: every time there is a change in one of the Satellites, a new record is created in the point-in-time table linking each Satellite (surrogate key plus load date of each Satellite) to the current time slice.
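A minimal sketch of the historized Satellite load (SQLite via Python; names are assumptions): the primary key is the Hub surrogate key plus the load date, and delta detection ensures a new row is only inserted when a tracked attribute actually changed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE sat_hist_customer (
    customer_sid INTEGER,
    load_date    TEXT,
    city         TEXT,
    PRIMARY KEY (customer_sid, load_date)   -- load date is part of the key
)
""")

def load_satellite(customer_sid: int, city: str, load_date: str) -> None:
    # Delta detection: compare against the latest version of this key
    current = con.execute("""
        SELECT city FROM sat_hist_customer
        WHERE customer_sid = ? ORDER BY load_date DESC LIMIT 1
    """, (customer_sid,)).fetchone()
    if current is None or current[0] != city:
        con.execute("INSERT INTO sat_hist_customer VALUES (?, ?, ?)",
                    (customer_sid, load_date, city))

load_satellite(1, "Zurich", "2014-01-01")
load_satellite(1, "Zurich", "2014-02-01")   # unchanged: no new version
load_satellite(1, "Basel",  "2014-03-01")   # changed: new version inserted
print(con.execute("SELECT COUNT(*) FROM sat_hist_customer").fetchone()[0])  # -> 2
```

With several historized Satellites per Hub, the point-in-time table would additionally record, for each time slice, the load date of the then-current row of each Satellite.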
6.4 Lifecycle Management

The Unified Decomposition offers the possibility to adapt, change or create tables without touching the core structure of your Data Warehouse. Based on the insurance case, it is possible to describe some practical lifecycle management scenarios:

New attributes for the customer:
o Add a Satellite with the new attributes: this method is very agile; there is no impact on the rest of the Data Vault model.
o Add the attributes to an existing Satellite: this method is heavier, but still the impact is limited to the attributes in the same Satellite; the change does not affect the Hubs or the Links.

New Hub/Link/Satellite:
o This change is transparent for the Data Vault; there is no impact on the core.

Our change scenario (see chapter 3.4) will have no impact on the Data Vault structure for the service analysis; this is a benefit of the many-to-many relationships. In business language, a service will always have an incident and a claim. The only change is that we have to add a new Link between Claim and Incident, and this change does not impact other objects of the model.

6.5 Data Delivery to Data Marts

Generally, the Data Vault model maps to a Data Mart star model this way:

Links become foreign keys in the fact tables.

The event Hubs provide the granularity of the fact table, while the other Hubs become dimensions.

The Satellites become facts or dimension attributes, depending on the attached Hub (event or not).

Data delivery from a Core Data Vault model can cause performance issues because of the many joins, some of which must be outer joins when not all records are present in a Hub's Satellite. If the model contains point-in-time tables, the queries to reconsolidate the historized dimensions or facts through multiple SCD2 Satellites are very complex and, again, increase the number of joins in the queries.
But as seen in the previous chapters, all the information is physically stored in the Core Data Warehouse, so every requirement can be served from it.
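To illustrate the delivery of a dimension from a Data Vault Core, the following sketch (SQLite via Python; names are assumptions) joins a Hub with the current row of its historized Satellite, which corresponds to an SCD1-style dimension load:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_customer (customer_sid INTEGER PRIMARY KEY, customer_ref TEXT);
CREATE TABLE sat_hist_customer (
    customer_sid INTEGER, load_date TEXT, city TEXT,
    PRIMARY KEY (customer_sid, load_date)
);
INSERT INTO hub_customer VALUES (1, 'C-1001');
INSERT INTO sat_hist_customer VALUES (1, '2014-01-01', 'Zurich');
INSERT INTO sat_hist_customer VALUES (1, '2014-03-01', 'Basel');
""")
# The Hub provides the dimension grain, the newest Satellite row the attributes
rows = con.execute("""
    SELECT h.customer_ref, s.city
    FROM hub_customer h
    JOIN sat_hist_customer s ON s.customer_sid = h.customer_sid
    WHERE s.load_date = (SELECT MAX(load_date) FROM sat_hist_customer
                         WHERE customer_sid = h.customer_sid)
""").fetchall()
print(rows)  # -> [('C-1001', 'Basel')]
```

An SCD2 dimension load would instead read all Satellite versions and, with multiple historized Satellites, would have to go through the point-in-time table, which is where the complexity mentioned above arises.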
7. Conclusion

7.1 Comparison Matrix

| Topic | Dimensional | Relational | Data Vault |
|---|---|---|---|
| Complexity | | | |
| Overall architecture complexity | Simple and easy to understand | Separation of static and dynamic attributes and relationships | Requires in-depth understanding of Hubs, Links and Satellites |
| Intuitive and understandable data model | Low number of tables, data model near end user layer (Data Marts) | High number of tables due to head and version tables | Very high number of tables due to complete denormalization |
| Data Integration and ETL | | | |
| Integration of multiple source systems | Transformation rules for integration must be implemented in ETL processes | Transformation rules for integration must be implemented in ETL processes | Separate Satellite for each source system with common business key in Hub reduces complexity |
| Complexity of ETL processes to load Core | Transformations between OLTP models and dimensional Core can be complex | Transformation rules relatively simple when data model is similar to source system | Simple standard ETL rules to load Hubs, Links and Satellites |
| Historization | | | |
| Handling of data versioning (SCD2) of attributes | Easy delta detection between source system and dimension table | Easy delta detection between source system and version table | Easy delta detection between source system and Satellites; PIT tables for multiple Satellites |
| Handling of data versioning (SCD2) of relationships | "Ripple effect" for changes on higher hierarchy level | Same logic for relationships as for attributes | Quite simple: located in the Satellite of the Link; PIT tables for SCD support of Data Marts |
| Lifecycle Management | | | |
| Robustness against source data model changes | Source system changes often have an impact on the Core data model | Source system changes require table changes in Core | Existing tables are not affected, only new Satellites required |
| Robustness against new analytical requirements | New requirements typically have an impact on the Core data model; refactoring of existing tables required in many situations | Model changes required only if required data is not yet available in Core; easy to change model, but historical data must be migrated in some situations | No changes in Data Vault model, only data delivery to Data Marts must be adapted; existing tables are not affected, only new Satellites required |
| Data Delivery to Data Marts | | | |
| Initial / incremental load of SCD1 dimension | Very easy because of similar structure in Core and Data Mart | Easy by joining current views of History View Layer | Easy by reading current version of each Satellite |
| Initial / incremental load of SCD2 dimension | Very easy because of similar structure in Core and Data Mart | For initial loads, intersection views are required | Complex if PIT tables are required for Hubs with multiple Satellites |
| Initial / incremental load of fact table | Very easy because of similar structure in Core and Data Mart | Easy because facts are related to event date and have no version table | Incremental load of large fact tables can be a performance challenge (see [6]) |
7.2 Recommendations

Each Data Warehouse is different. Not only the known (or unknown) analytical requirements and the number, complexity and structure of the source systems have an impact on the architecture and design of a Data Warehouse. Other aspects such as industry sector, company culture, the technologies and tools in use, data volumes and the available knowledge in IT and business departments can also influence the recommended design method.

As the comparison matrix above shows, none of the data modeling methods described in this white paper fits every situation. Each method has its benefits and issues. The following recommendations should give some guidelines as to which method is appropriate in which situation.

Dimensional modeling is strongly recommended for Data Marts. But a dimensional model is also very appropriate for the Core data model of a Data Warehouse if the analytical needs are known and well defined. A great advantage of a dimensional model is its understandability for business users. Data integration and ETL processes to load a dimensional Core are usually more complex compared to the other modeling methods, but in return the processes to load the Data Marts are much easier than in the other approaches. A recommended architecture pattern with a dimensional Core is to realize the Data Mart layer not as a physical layer, but only as a set of views on the Core model. The project team must be aware that new analytical needs usually require the adaptation of the Core data model, with all its consequences.

Relational modeling is useful for the Core layer if the analytical requirements are diffuse or not yet defined, or if the purpose of the Data Warehouse is the delivery of information to different kinds of BI applications and other target systems.
In combination with Master Data Versioning, a relational Core gives flexible support for data historization and allows delivering current and historical data to the Data Marts. The ETL processes to load a relational Core are usually easier than for a dimensional Core, but the data delivery to the Data Marts can be more complex, especially if the Core model is derived from the data models of the source systems. A relational Core is recommended if the source system structures are relatively stable and if a highly integrated Core data model is required.

Data Vault modeling is a powerful approach for Data Warehouses with many source systems and regular structure changes in these systems. Adding new data to a Data Vault model is quite simple and fast, but the price for this is a complex data model that is harder to understand than a dimensional or relational data model. Instead of a full integration, data from different sources is assigned to common business keys but stored in its original form. This allows simpler ETL processes to load data into the Core, but postpones the integration effort to downstream processes. Hence data delivery to the dimensional Data Marts is highly complex and costly. Data Vault modeling is a recommended method especially for agile project environments with short incremental release cycles.

Fundamental to all modeling approaches is that the method is accepted by the DWH development team and that all architects and developers understand the concepts, benefits and issues of the chosen modeling approach.
Extensibility of Oracle BI Applications
Extensibility of Oracle BI Applications The Value of Oracle s BI Analytic Applications with Non-ERP Sources A White Paper by Guident Written - April 2009 Revised - February 2010 Guident Technologies, Inc.
Business Intelligence. 1. Introduction September, 2013.
Business Intelligence 1. Introduction September, 2013. The content of the first lecture Introduction to data warehousing and business intelligence Star join 2 Data hierarchy Strategical data Operational
Building an Effective Data Warehouse Architecture James Serra
Building an Effective Data Warehouse Architecture James Serra Global Sponsors: About Me Business Intelligence Consultant, in IT for 28 years Owner of Serra Consulting Services, specializing in end-to-end
Data Testing on Business Intelligence & Data Warehouse Projects
Data Testing on Business Intelligence & Data Warehouse Projects Karen N. Johnson 1 Construct of a Data Warehouse A brief look at core components of a warehouse. From the left, these three boxes represent
Week 3 lecture slides
Week 3 lecture slides Topics Data Warehouses Online Analytical Processing Introduction to Data Cubes Textbook reference: Chapter 3 Data Warehouses A data warehouse is a collection of data specifically
How To Write A Diagram
Data Model ing Essentials Third Edition Graeme C. Simsion and Graham C. Witt MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF ELSEVIER AMSTERDAM BOSTON LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE
Lection 3-4 WAREHOUSING
Lection 3-4 DATA WAREHOUSING Learning Objectives Understand d the basic definitions iti and concepts of data warehouses Understand data warehousing architectures Describe the processes used in developing
Data Warehousing Concepts
Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis
Microsoft. Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server
Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server Length : 5 Days Audience(s) : IT Professionals Level : 300 Technology : Microsoft SQL Server 2014 Delivery Method : Instructor-led
Integrating SAP and non-sap data for comprehensive Business Intelligence
WHITE PAPER Integrating SAP and non-sap data for comprehensive Business Intelligence www.barc.de/en Business Application Research Center 2 Integrating SAP and non-sap data Authors Timm Grosser Senior Analyst
Implementing Oracle BI Applications during an ERP Upgrade
1 Implementing Oracle BI Applications during an ERP Upgrade Jamal Syed Table of Contents TABLE OF CONTENTS... 2 Executive Summary... 3 Planning an ERP Upgrade?... 4 A Need for Speed... 6 Impact of data
ETL-EXTRACT, TRANSFORM & LOAD TESTING
ETL-EXTRACT, TRANSFORM & LOAD TESTING Rajesh Popli Manager (Quality), Nagarro Software Pvt. Ltd., Gurgaon, INDIA [email protected] ABSTRACT Data is most important part in any organization. Data
<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server
Extending Hyperion BI with the Oracle BI Server Mark Ostroff Sr. BI Solutions Consultant Agenda Hyperion BI versus Hyperion BI with OBI Server Benefits of using Hyperion BI with the
Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning
Course Outline: Course: Implementing a Data with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning Duration: 5.00 Day(s)/ 40 hrs Overview: This 5-day instructor-led course describes
Data warehouse and Business Intelligence Collateral
Data warehouse and Business Intelligence Collateral Page 1 of 12 DATA WAREHOUSE AND BUSINESS INTELLIGENCE COLLATERAL Brains for the corporate brawn: In the current scenario of the business world, the competition
Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments.
Application Of Business Intelligence In Agriculture 2020 System to Improve Efficiency And Support Decision Making in Investments Anuraj Gupta Department of Electronics and Communication Oriental Institute
Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)
Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463) Course Description Data warehousing is a solution organizations use to centralize business data for reporting and analysis. This five-day
The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.
The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, [email protected] ABSTRACT Health Care Statistics on a state level is a
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 OVERVIEW About this Course Data warehousing is a solution organizations use to centralize business data for reporting and analysis.
Moving Large Data at a Blinding Speed for Critical Business Intelligence. A competitive advantage
Moving Large Data at a Blinding Speed for Critical Business Intelligence A competitive advantage Intelligent Data In Real Time How do you detect and stop a Money Laundering transaction just about to take
MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus
MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus I. Contact Information Professor: Joseph Morabito, Ph.D. Office: Babbio 419 Office Hours: By Appt. Phone: 201-216-5304 Email: [email protected]
IST722 Data Warehousing
IST722 Data Warehousing Components of the Data Warehouse Michael A. Fudge, Jr. Recall: Inmon s CIF The CIF is a reference architecture Understanding the Diagram The CIF is a reference architecture CIF
Implementing a Data Warehouse with Microsoft SQL Server 2012
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 Length: Audience(s): 5 Days Level: 200 IT Professionals Technology: Microsoft SQL Server 2012 Type: Delivery Method: Course Instructor-led
Part 22. Data Warehousing
Part 22 Data Warehousing The Decision Support System (DSS) Tools to assist decision-making Used at all levels in the organization Sometimes focused on a single area Sometimes focused on a single problem
Speeding ETL Processing in Data Warehouses White Paper
Speeding ETL Processing in Data Warehouses White Paper 020607dmxwpADM High-Performance Aggregations and Joins for Faster Data Warehouse Processing Data Processing Challenges... 1 Joins and Aggregates are
Establish and maintain Center of Excellence (CoE) around Data Architecture
Senior BI Data Architect - Bensenville, IL The Company s Information Management Team is comprised of highly technical resources with diverse backgrounds in data warehouse development & support, business
SimCorp Solution Guide
SimCorp Solution Guide Data Warehouse Manager For all your reporting and analytics tasks, you need a central data repository regardless of source. SimCorp s Data Warehouse Manager gives you a comprehensive,
Multidimensional Modeling - Stocks
Bases de Dados e Data Warehouse 06 BDDW 2006/2007 Notice! Author " João Moura Pires ([email protected])! This material can be freely used for personal or academic purposes without any previous authorization
Chapter 3 - Data Replication and Materialized Integration
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 [email protected] Chapter 3 - Data Replication and Materialized Integration Motivation Replication:
