Designing Data Warehouses with Object Process Methodology Roman Feldman


Research Thesis

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science

Roman Feldman

Submitted to the Senate of the Technion, Israel Institute of Technology
Sivan 5766, Haifa, May 2006

The research thesis was done under the supervision of Prof. Dov Dori, from the Faculty of Industrial Engineering and Management at the Technion, and Dr. Arnon Sturm, from the Department of Information Systems Engineering at Ben Gurion University of the Negev.

The generous financial help of the Technion is gratefully acknowledged.

I would like to acknowledge the effort of my advisors, Prof. Dov Dori and Dr. Arnon Sturm, and to thank them for the patience and trust they gave me. I would like to express my keen thanks to the following undergraduate students: Michal Hashavit, Inbal Bar-Noy, and Eyal Tzadka, who, by means of their software projects, helped implement the ideas and concepts presented in this thesis. I would also like to thank my spouse, Marina, who enriches me with her endless support.

Table of Contents

Abstract
Symbols and Abbreviations List
1. Introduction
2. Research Motivation and Goals
3. Transforming an Operational System Model to a Data Warehouse Model: A Survey of Techniques
   3.1. Data warehouse construction from structural (data) models of operational systems
        From E/R to dimensional fact model (ERDFM)
        From XML DTD to dimensional fact model (XDDFM)
        From SERM to star schema (SERM)
        From E/R to fact schema (ERFS)
        From E/R to various dimensional models (GER)
        From E/R to ME/R (MER)
        Schema transformation (ST)
   3.2. Data warehouse construction from business process models
        From semantic object model to Star schema (SOM)
        From rule based customer relationship management to Star Schema (RBCRM)
   3.3. Evaluating techniques for data warehouse model construction from an operational system model
        Evaluation criteria
        Techniques evaluation
4. Object-Process Methodology
   4.1. OPM Highlights
   4.2. Benefits of OPM for Data Warehouse Construction
5. An OPM-Based Method for Transformation of Operational System Model to Data Warehouse Model
   5.1. Guidelines for Operational System Specification
   5.2. A Purchasing Organization Case Study
   5.3. Transforming the OPM system specification into a data warehouse model
   5.4. Utilizing Semantic Features of OPM
        Process and Object characterization
        Inheritance
        Object States
        Instantiation
6. Implementation and Integration into OPCAT
7. Method Evaluation
   7.1. Feature-Based Analysis
   7.2. A Catalog Trading Company Case Study
   7.3. Real Life Case Study
   7.4. Data Warehouse Expert Analysis
   7.5. Limitations of the Proposed Method
   7.6. Evaluation Summary
8. Constructing Data Warehouse from Non-OPM Operational System Specification
9. Summary
References

List of Figures

Figure 1: A Star Schema of a Sales DW cube
Figure 2: A Snowflake Schema of the same Sales DW Cube
Figure 3: The System Diagram (SD), the top-level OPD describing a Goods Purchasing process
Figure 4: Zooming into the Goods Purchasing process
Figure 5: Zooming into the Full Vendor Selection process
Figure 6: Unfolding of the Purchasing Decision object
Figure 7: A Snowflake schema of the Purchasing Decision, drawn from the Goods Purchasing process
Figure 8: A Snowflake schema of the Purchase Orders, drawn from the Goods Purchasing process
Figure 9: A Snowflake schema of the Purchase Orders, drawn from the Order Issuing subprocess
Figure 10: An example of an object characterized by a process
Figure 11: Example of applying inheritance
Figure 12: Graphical User Interface of the OPM 2 DW tool
Figure 13: A top-level OPD describing a Customer Handling process in a trading organization
Figure 14: Zooming into the Customer Handling Process
Figure 15: Zooming into the Customer-Salesman Negotiating process
Figure 16: Zooming into the Final Decision Handling process
Figure 17: Unfolded OPDs of the Catalog Trading System objects
Figure 18: A Snowflake Schema of the Bonus Wages File Entries, drawn from the System Diagram
Figure 19: A Snowflake Schema of the Customer Requests for Proposals, drawn from the System Diagram
Figure 20: A Snowflake Schema of the Customer Decisions For Proposals, drawn from the Customer Decision Making subprocess
Figure 21: A top-level OPD describing a Breakdown Maintenance process in SAP R/3 concept
Figure 22: Zooming into the Breakdown Maintenance process
Figure 23: Zooming into the Problem Solving process
Figure 24: Unfoldings of the major objects: Equipment and Notification
Figure 25: Schematic representation of the SAP BW Notifications Cube
Figure 26: Notifications Cube, drawn from the Breakdown Maintenance process

List of Tables

Table 1: Comparison between data warehouse generation techniques
Table 2: OPM Elements
Table 3: Intermediate conversion stages for Purchasing Decision

Abstract

Data warehouse modeling is a complicated task, which involves knowledge of business processes, as well as familiarity with the structure and behavior of operational information systems. Several modeling techniques have been suggested that utilize the structural or behavioral model of the operational system in order to construct a data warehouse conceptual model. In this thesis, we present a feature-based technique for evaluating the existing methods of creating a data warehouse from an operational system, and analyze the methods according to this technique. Our analysis indicates that these methods are limited in their applicability to modeling large-scale systems, as they require acquaintance with the business processes and the ability to select relevant transactional entities. In addition, they usually disregard the process perspective and require multiple unassisted manual actions, such as discovering measures and relevant dimensional entities.

To overcome the limitations of existing techniques, we propose OPM-based Data Warehouse Construction (ODWC), a method based on Object-Process Methodology (OPM) for constructing a data warehouse model out of an operational system specification. OPM was the modeling method of choice primarily because it unifies all system aspects within a single view, which enables the integration of both the business perspective and the system data structure. The method uses both the structural and behavioral aspects of the underlying operational system to create a multidimensional conceptual data warehouse model. Utilizing the semantic features of OPM, we present two case studies to demonstrate our method. We construct a software tool that implements our method and allows us to apply it to large systems.

Consequently, we compare the ODWC method to existing methods of data warehouse creation. The proposed method was evaluated by four means: (1) a feature-based evaluation; (2) case studies that showed the feasibility of the method; (3) a real life case study; and (4) a data warehouse expert evaluation. The evaluation we performed shows that the ODWC method is the most suitable for the task of transforming an operational system specification into a data warehouse model for the following reasons:

(1) OPM's scaling mechanisms allow presenting the operational system and the supported business processes at varying levels of abstraction. This feature aids selection of the business process to be analyzed, and allows creating cubes at different summation levels.
(2) OPM allows distinction of the business objects relevant to the business functionality.
(3) OPM enables clear identification of the outcomes of a business process.

Symbols and Abbreviations List

Code     Full Name
CASE     Computer Aided Software Engineering
COS      Conceptual Object Schema
CWM      Common Warehouse Metamodel
DFD      Data Flow Diagrams
DTD      XML Document Type Definition
DW       Data Warehouse
ERD      Entity Relationship Diagram
ERDFM    Transformation Technique: From E/R to dimensional fact model
ERFS     Transformation Technique: From E/R to fact schema
ERP      Enterprise Resource Planning
GER      Transformation Technique: From E/R to various dimensional models
MER      Multidimensional Entity Relationship model
OLAP     Online Analytical Processing
OLTP     Online Transactional Processing
OMG      Object Management Group
OPCAT    OPM CASE Tool
OPM      Object Process Methodology
RBCRM    Transformation Technique: From rule-based customer relationship management to Star Schema
SERM     Structured Entity Relationship Model
SOM      Semantic Object Model
ST       Transformation Technique: Schema Transformation
UML      Unified Modeling Language
XML      Extensible Markup Language

1. Introduction

Data warehousing (DW) [22, 33] is a rapidly growing area within the field of information systems that focuses on enabling business intelligence. Many companies that invested resources in a new generation of operational information systems during the 1990s are trying to gain added value from the vast amounts of information stored in their systems by applying different tools and techniques, jointly dubbed Business Intelligence [12, 16], for information analysis, decision support, and strategic planning. These techniques and tools analyze massive amounts of data, which are continuously gathered during the operation of large, complex information systems, in order to produce the information needed for decision-making processes while providing the end user with autonomy and flexibility in browsing, summarizing, crosscutting, and analyzing the operational data. The requirements for these analytical tools have led to data warehousing, a new approach to data analysis built on top of multidimensional databases [4, 22, 33] that go beyond normalized relational databases.

Off-the-shelf data warehousing products supply solid physical data models and an On-Line Analytical Processing (OLAP) foundation. Yet, conceptual and logical design of the data warehouse remains a complicated task, which is assigned to the system engineers and analysts of the organizations that own and run these data warehouses.

Data warehouse design usually follows the Star schema diagram [22, 32]. A star schema consists of a fact table and a single de-normalized dimension table for each dimension of a data model. The dimension tables can be normalized to create a Snowflake schema, which can then support attribute hierarchies. Both the Star and the Snowflake schemata present data as a single cube, which is the basic data warehouse structure that allows users to analyze the performance of business processes.

Figure 1 presents a star schema of a sales cube suggested by Chaudhuri and Dayal [4]. The cube analyzes sales quantities and revenues by products, customers, retailers, locations, orders, and dates. The underlined attributes are the primary keys of the tables and the arrows show the navigation direction. The end user can produce a report on that cube over any subset of these characteristics (or their attributes, such as the State of a City). Figure 2 shows the same example in a snowflake schema, where the Time, Product, and Address attributes were normalized to create State, Country, Month, Year, Category, and Type tables.

A snowflake schema decreases database size and allows more master data sharing between different cubes, but is more costly in terms of the number of joins in a query. Snowflaking is argued to be undesirable [22], since it adds complexity to the schema and requires extra joins. The best solutions are believed to balance the two schemas [29], with dimension tables only partially normalized. Some industrial models [34] indeed limit snowflaking by restricting the number of transitive joins needed to reach the attributes that may be used as aggregation factors for the facts. In the example of Figure 2, SAP [34] will not allow aggregation of facts on Types or Countries, and the Time dimension would not be snowflaked anyway.

[Figure 1: A Star Schema of a Sales DW cube. The fact table (OrderNo, ProdNo, CustomerNo, RetailerNo, DateKey, CityName, Quantity, TotalPrice) is linked to the Product, Customer, Retailer, Order, Date, and City dimension tables.]
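The trade-off between the two schemata can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module and a cut-down version of the sales cube: only the Product and City dimensions are kept, the sample rows are invented, and the table names StateDim and CitySnow are hypothetical names chosen for this illustration rather than names taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Star schema: one fact table plus de-normalized dimension tables.
conn.executescript("""
    CREATE TABLE City (CityName TEXT PRIMARY KEY, State TEXT, Country TEXT);
    CREATE TABLE Product (ProdNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
    CREATE TABLE FactTable (
        OrderNo INTEGER,
        ProdNo INTEGER REFERENCES Product(ProdNo),
        CityName TEXT REFERENCES City(CityName),
        Quantity INTEGER,
        TotalPrice REAL
    );
""")
conn.executemany("INSERT INTO City VALUES (?, ?, ?)", [
    ("Seattle", "WA", "USA"), ("Spokane", "WA", "USA"), ("Portland", "OR", "USA")])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
    (1, "Pen", "Office"), (2, "Desk", "Furniture")])
conn.executemany("INSERT INTO FactTable VALUES (?, ?, ?, ?, ?)", [
    (100, 1, "Seattle", 10, 20.0),
    (101, 1, "Spokane", 5, 10.0),
    (102, 2, "Portland", 1, 150.0),
    (103, 2, "Seattle", 2, 300.0)])

# Star: State is an attribute of the City dimension, so one join suffices.
by_state = conn.execute("""
    SELECT c.State, SUM(f.Quantity), SUM(f.TotalPrice)
    FROM FactTable f JOIN City c ON f.CityName = c.CityName
    GROUP BY c.State ORDER BY c.State
""").fetchall()

# Snowflake: normalize City into CitySnow -> StateDim (hypothetical names).
conn.executescript("""
    CREATE TABLE StateDim (State TEXT PRIMARY KEY, Country TEXT);
    INSERT INTO StateDim SELECT DISTINCT State, Country FROM City;
    CREATE TABLE CitySnow (CityName TEXT PRIMARY KEY,
                           State TEXT REFERENCES StateDim(State));
    INSERT INTO CitySnow SELECT CityName, State FROM City;
""")

# Snowflake: reaching Country now takes two transitive joins.
by_country = conn.execute("""
    SELECT s.Country, SUM(f.Quantity)
    FROM FactTable f
    JOIN CitySnow c ON f.CityName = c.CityName
    JOIN StateDim s ON c.State = s.State
    GROUP BY s.Country
""").fetchall()

print(by_state)
print(by_country)
```

In the snowflaked variant the Country value is stored once per state instead of once per city, which is the size saving mentioned above, while the query needs one more join to reach it.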

[Figure 2: A Snowflake Schema of the same Sales DW Cube. The dimension tables of Figure 1 are normalized into additional Category, Type, Month, Year, State, and Country tables.]

Besides the star/snowflake schema, several conceptual data warehouse models have been developed to support the design of data warehouses. While these approaches assist in data warehouse modeling, they address implementation issues only partially, implying that the physical database design might lead to the Star or Snowflake architectures.

Golfarelli, Maio and Rizzi [13, 14] suggested the Dimensional Fact Model. This model allows the distinction between fact attributes (quantitative measures in the middle of the cube), dimensions (attributes related directly to the fact entity), hierarchies (attributes of dimensions which can serve as aggregation factors for the facts), and non-dimension attributes (descriptive attributes of dimensions which can be presented in reports, but cannot be used for aggregation). It also allows restricting the aggregation functions that may be applied between fact attributes and dimensions. This is useful for non-additive attributes such as inventory levels, which cannot be summed across months, although averages, standard deviations, etc. can reasonably be calculated along the time dimension.

The popularity of Entity Relationship Diagrams [5] led to two models based on extensions to the Entity Relationship Model: StarER by Tryfona, Busborg and Christiansen [39] and the Multidimensional E/R (ME/R) model by Sapia, Blaschka, Hofling and Dinter [36]. Both use notations close to the standard E/R ones to represent multidimensional databases. ME/R allows distinction of cubes, their attributes (facts), dimensions and hierarchies, while StarER also addresses restrictions of aggregation functions by classifying the quantitative attributes into stock (the state at a specific point in time), flow (the cumulative effect over a period of time, like a change in stock) and value-per-unit (measured for a fixed time, but unlike stock has a unit context). StarER also allows many-to-many relationships between facts and dimensions and inside the dimensions, without addressing their problematic implementation, as fact aggregation can become ambiguous in these cases.

Object-oriented data warehouse design was introduced in the framework of the Object Relational View (ORV) approach [18]. The ORV model can be obtained by a transformation of the Snowflake schema. Arguably, ORV can lead to better system performance through object links instead of the more expensive joins. UML [40] extensions for data warehouse conceptual modeling have also been proposed [27, 38]. OMG [3] suggested an approach for metadata interchange, called the Common Warehouse Metamodel (CWM) [31], which is capable of expressing major multidimensional properties [27].

Most data warehouse modeling techniques refer mainly to the user requirements [22] and advocate building a new multidimensional model regardless of existing models of the underlying operational systems. This approach raises several problems [29]:

1. User requirements are unpredictable and subject to change over time, so using them yields an unstable basis for design. When utilizing the operational system schema as the design basis, one may have better chances of foreseeing future analysis requirements, such as the need for additional dimensions.

2. Incorrect designs are possible if the designer does not understand the underlying relationships among the various data types.
3. Premature aggregation causes loss of information that limits the options to analyze the data at fine-grained resolutions.

These drawbacks of relying on user requirements as a basis for data warehouse design have motivated studies that use operational system models as a basis for modeling, designing, and constructing data warehouses. Since data warehouses depend on the underlying operational systems as their data supply sources, the latter become important for the conceptual and logical design of the data warehouse. Therefore, most of these methods utilize the structural operational system models.

The fundamental role of data warehousing is business performance measurement and redesign support [20, 24]. In this context, several other techniques were offered to create data warehouse models out of business process models. Only one known technique [2] creates a data warehouse from relatively generic business process models, and it can be applied to various content areas. Another transformation technique [21] is content-specific and is applied to Rule-Based Customer Relationship Management (CRM) systems. List et al. [24] offered a specific data warehouse model aimed at analyzing business workflow performance and using a workflow management system as the data source. However, this approach can create infeasible data warehouse models, which cannot be obtained by loading data from operational sources, as it does not account for the data structure of the operational systems.

The need for a generic data warehouse construction method that is based on both the structural and the procedural parts of the conceptual model of the underlying operational system remains unfulfilled. This thesis explicitly addresses this need by providing a comprehensive method for transforming an operational system specification into a data warehouse model.

The work is organized as follows. Chapter 2 presents the research motivation and goals. Chapter 3 presents the existing methods for transforming operational system models into data warehouse models, while Chapter 4 introduces the Object-Process Methodology (OPM) and explains why it was selected for the task at hand. Chapter 5 describes our OPM-based method for transforming an operational system model into a data warehouse model, while Chapter 6 describes the integration of the method into the OPM CASE tool, OPCAT. Chapter 7 deals with the evaluation of the proposed method. Chapter 8 discusses the applicability of our method to data warehouse construction from non-OPM operational sources. Finally, Chapter 9 summarizes the contribution of this research and discusses future research plans.

2. Research Motivation and Goals

While various techniques for transforming an operational system specification into a data warehouse model have been suggested, they require a lot of manual work, offer little assistance for discovering facts and identifying relevant dimensional attributes, and hardly address the process perspective.

The main objective of this research is to formulate a method for constructing a data warehouse model from an operational system model that overcomes the aforementioned shortcomings of existing techniques. Examining candidate modeling approaches for the source operational model from which the DW schema can be extracted, we found that the Object-Process Methodology (OPM) holds the greatest potential to be enhanced for that task, due to its scaling (abstraction/refinement) mechanisms and its integration of both the business process and the data structure perspectives.

The research goals are achieved in the following steps:

1. Developing a framework for comparing and evaluating techniques for the transformation of an operational system model to a data warehouse model,
2. Providing a method and a corresponding algorithm for the transformation that is based on Object-Process Methodology and capitalizes on its relevant advantages,
3. Implementing the proposed method and integrating it into the OPM CASE tool, and
4. Evaluating the suitability of the new method for the transformation task.

3. Transforming an Operational System Model to a Data Warehouse Model: A Survey of Techniques

In this chapter we survey existing techniques for transforming an operational system specification to a data warehouse model [7]. We classify these techniques into two categories according to their origin, namely, the structural (data) models of operational systems (presented in Section 3.1) and the business process models (presented in Section 3.2).

3.1. Data warehouse construction from structural (data) models of operational systems

Techniques that analyze the data models of the underlying operational sources to produce a data warehouse model are presented here.

From E/R to dimensional fact model (ERDFM)

Golfarelli, Maio, and Rizzi [13, 14] suggested a transformation technique based on their Dimensional Fact conceptual model. The construction of a Dimensional Fact schema involves the following stages:

1. Selecting facts (entities or n-ary relationships between entities) from the operational system's ER [5] schema.
2. Building an attribute tree for each fact by advancing along many-to-one relationships or functional dependencies, such that the fact serves as the root of its attribute tree.
3. Reducing (manually) the attribute trees by means of pruning (excluding attributes from the scheme by dropping sub-trees) and grafting (excluding attributes from the tree, but preserving their children) attributes that are irrelevant for the information analysis expected from the data warehouse.
4. Choosing dimensions among the child vertices of the root.
5. Defining fact attributes and describing how they can be calculated from the attributes of the ER scheme. The fact attributes are the remaining children of the root (after removing dimensions and their sub-trees), as well as additional attributes, calculated by applying arithmetic functions to the quantitative attributes of the root entity and its neighbors.
6. Defining hierarchies and identifying non-dimensional attributes. Hierarchies are arrangements of attributes in dimensions along trees. Non-dimensional attributes are attributes that will not be used for aggregation, but only for informative purposes.

Most of the stages of this method have to be performed manually. Building the attribute trees (stage 2) is automated. Defining the fact attributes (stage 5) can also be automated under the assumption that the fact attributes are exactly the children of the root that remain in the attribute tree after removing the dimensions.

From XML DTD to dimensional fact model (XDDFM)

Golfarelli, Rizzi, and Vrdoljak [17] define a similar approach for handling XML sources of data instead of the Entity-Relationship source. They assume that the XML source is constrained by a Document Type Definition (DTD) to remove uncertainty about the structure of the data. Given the DTD, their procedure consists of the following stages:

1. Simplifying the DTD and creating from it a DTD graph, which plays the role of the ER schema.
2. Choosing some of the graph vertices as facts.

3. Automatically building an attribute tree from the DTD graph, considering the chosen facts.
4. Rearranging the attribute tree.
5. Defining dimensions and measures.

Note that, disregarding the challenge of building the attribute tree from a DTD source, this approach is a subset of the previous one.

From SERM to star schema (SERM)

Boehnlein and Ende [1] suggested a different approach, which originated from the Structured Entity Relationship Model (SERM). SERM [37] differs from ER mainly in its visualization of the dependencies between data object types. The left-hand side of a SER-diagram consists of the basic data object types (without dependencies), and each (directed) edge in the graph indicates that the right-hand-side entity depends on the left-hand-side entity. As a result, the created scheme is a directed acyclic graph. The right-hand side of the graph contains the most dependent and transactional entities. The left-hand side contains the most structural entities.

To begin the derivation process, the designer identifies business measures, which can be either the quantitative attributes of the right-hand-side entities or derived measures involving quantitative attributes of one or more entities. Then, the following procedure is carried out:

1. A closure of existence prerequisites, C_EX, is defined as the set of all the entities which are (directly or transitively) the existence prerequisites of the measures' entities.
2. The designer uses C_EX to discover chains of dependencies, which serve as candidate dimensions and dimensional hierarchies. The eventual choice of dimensions and dimensional hierarchies is performed manually, and a time dimension is necessarily added.
3. The designer identifies integrity constraints along the dimension hierarchies. This stage introduces restrictions on the possible aggregation functions of the facts along the dimensional attributes.

This procedure is done manually. In addition, the authors argue that "the automatic creation of multidimensional tables is not practical because of varying requirements and creative scope." They also argue that they improved the work of Golfarelli, Maio, and Rizzi [13] by providing an easier option to locate potential measures, since the relevant entities are concentrated on the right-hand side of the SERM diagram.

From E/R to fact schema (ERFS)

Husemann, Lechtenborger and Vossen [19] suggested an approach that takes as input the identification of relevant entities and their attributes from the ER scheme and a free-text formulation of the expected multidimensional queries. The procedure is as follows.

1. Classification of the chosen attributes into dimensional (qualitative) and measurable (quantitative), and indication of whether each attribute is mandatory or optional. Derived attributes, which are not part of the operational schema, may appear here, depending on the multidimensional requirements.
2. Identification of functional dependencies from dimension levels to measures. Dimensions are recognized as the terminal functional dependencies between entities and measures. Fact schemata are established, combining all the measures sharing the same terminal dimension levels. This is a semiautomatic stage, since functional dependencies are determined manually.
3. Dimensional hierarchy design is performed in accordance with the functional dependencies between different dimensional attributes. Based on the analysis requirements, dimensional attributes are classified into property attributes and dimension levels. Dimension levels can be used for summarization and aggregation of facts, while property attributes may be informative in a query, but cannot be summarized upon.
4. Finally, summarizability constraints are introduced. Each measure and dimension level is assigned to one of four possible groups of restriction levels: {SUM, AVG, MIN, MAX, STDDEV, VAR, COUNT}, {AVG, MIN, MAX, STDDEV, VAR, COUNT}, {COUNT}, and {}.

Here again, most of the design is performed manually. Automatic processing may be introduced in stage 2 (identifying dimensions) and part of stage 3 (dimensional hierarchy design, not including classification of attributes into property attributes and dimension levels), once functional dependencies have been manually determined.

From E/R to various dimensional models (GER)

Moody and Kortink [29] suggested the following technique to derive dimensional models from operational ER schemata:

1. Classifying entities in the ER schema into transaction entities (describing events at a point in time or containing measurements that can be summarized), component entities (directly related to the transaction entities via a one-to-many relationship) and classification entities (entities related to component entities by a chain of one-to-many relationships).
2. Identifying possible hierarchies as maximal sequences of entities connected by one-to-many relationships in the same direction.
3. Producing dimensional models using two operators:

   a. Hierarchy collapse: higher-level entities are collapsed into lower-level entities, creating de-normalization.
   b. Aggregation: transaction entities are aggregated by their values in some of the key attributes to create new transaction entities. This operator results in loss of information and reduces data size.

   Several dimensional models are possible as an outcome, due to the various applications of the two operators:

   a. Flat schema: collapsing all possible entities down all the to-many relationships.
   b. Terraced schema: collapsing entities until they reach a transactional entity.
   c. Star schema: collapsing hierarchically related classification entities into component entities (which creates dimension tables), forming fact tables by combination of keys of component entities, and aggregation of numerical attributes within transaction entities by key attributes (dimensions).
   d. Constellation schemas: combination of several Star schemas with hierarchically linked fact tables (like Sale and Sale Item).
   e. Galaxy schemas: collection of Star schemas/constellations with shared dimensions.
   f. Snowflake schemas: similar to the Star schema, but without applying the Hierarchy Collapse operator. These schemas can be created by normalizing the dimension tables of a Star schema.
   g. Star Cluster: a compromise between a Star schema and a Snowflake schema. All dimensional hierarchies in a Snowflake schema are collapsed, except entities that act as higher hierarchical levels for two or more dimensions, to avoid collapse of the same entity into two entities.

4. Refining the model by:
   a. Combining fact tables with the same primary keys (dimensions) to reduce the number of Stars.
   b. Combining related dimension tables by the combination of their keys to reduce the number of dimension tables (this is a manual stage due to the need to identify related dimension tables).
   c. Dealing with many-to-many relationships by disposing of them, converting them into one-to-many relationships, or including them in the resulting structure.
   d. Handling supertype/subtype relationships by adding classification entities to distinguish between subtypes (like Vehicle and Vehicle Type).

Even though the authors did not discuss automating the process, some of the stages can be automated. These include stage 2, stage 3 if the user chooses beforehand which dimensional model is desirable, and part of stage 4.

From E/R to ME/R (MER)

A method by Phipps and Davis [30] automatically generates a candidate MER [36] schema for every entity with numeric fields in the source ER schema. After building a fact node and a time dimension, they recursively examine the relationships of the entities to add levels in a hierarchical manner. The recursive examination stops as soon as a many-to-many relationship is found. This results in a bulk of candidate schemas, which have to be evaluated and improved manually by the following stages:

1. Eliminating unnecessary candidate schemas using known user queries.
2. Inspecting whether measures in each fact node are indeed measures or attributes (and moving them into the dimensions, if necessary).
3. Defining the necessary granularity of time/date information.

4. Defining whether additional calculated fields are necessary.
5. Attempting to merge schemata and eliminate fields.
6. Defining whether additional data (not originating from the On Line Transactional Processing system) is necessary.

Although the authors claim that their method makes headway in automating DW conceptual schema generation, this method forces the designer to examine at least O(q · n) different query matches, where q is the number of known queries and n is the number of entities in the OLTP. This results in a lot of manual work that utilizes little of the knowledge about the OLTP architecture. Moreover, the method will not find some of the possible multidimensional schemata, since a factless fact schema (i.e., a schema without numerical fields in its fact table) cannot be created by this method.

Schema transformation (ST)

Marotta and Ruggia [25] offer a complementary method, which assumes that the designer already knows the conceptual DW schema she wants to reach, and suggest a toolbox of 14 possible transformations. The designer should then choose the appropriate transformations in order to convert the ER schema to the desired conceptual DW schema. The transformation also produces a trace graph between the input operational ER entities and the output DW entities. The decisions regarding which transformations should be applied are made manually.
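To make the automatic step of the MER technique described above more concrete, the following is a minimal sketch of candidate generation. It is not the algorithm of Phipps and Davis; the input encoding (a dictionary of numeric fields per entity and a dictionary of outgoing relationships tagged "N:1" or "N:M"), the function names, and the sample entities are all assumptions made for illustration. It also interprets "stops at a many-to-many relationship" as simply not traversing such a relationship.

```python
def candidate_schemas(entities, relationships):
    """Build one candidate schema per entity that has numeric fields.

    entities:      {entity name: [numeric field names]}
    relationships: {entity name: [(neighbor, "N:1" or "N:M"), ...]}
    """
    candidates = {}
    for fact, numeric_fields in entities.items():
        if not numeric_fields:
            # No numeric fields, so no fact node: this is why factless
            # fact schemata are never produced.
            continue
        dimensions = {"Time": []}  # a time dimension is always added
        for neighbor, cardinality in relationships.get(fact, []):
            if cardinality == "N:M":
                continue  # assumed reading: do not cross many-to-many links
            dimensions[neighbor] = _levels(neighbor, relationships, {fact, neighbor})
        candidates[fact] = {"measures": list(numeric_fields),
                            "dimensions": dimensions}
    return candidates


def _levels(entity, relationships, seen):
    """Recursively collect higher hierarchy levels reachable via N:1 links."""
    levels = []
    for neighbor, cardinality in relationships.get(entity, []):
        if cardinality == "N:M" or neighbor in seen:
            continue
        seen.add(neighbor)
        levels.append(neighbor)
        levels.extend(_levels(neighbor, relationships, seen))
    return levels


# Hypothetical OLTP model used only for this example.
entities = {
    "OrderLine": ["Quantity", "TotalPrice"],
    "Product": ["UnitPrice"],
    "Customer": [],
    "City": [],
    "Category": [],
}
relationships = {
    "OrderLine": [("Product", "N:1"), ("Customer", "N:1")],
    "Product": [("Category", "N:1"), ("Supplier", "N:M")],
    "Customer": [("City", "N:1")],
}

candidates = candidate_schemas(entities, relationships)
for fact, schema in candidates.items():
    print(fact, schema)
```

In this example Product becomes a candidate fact only because it carries a numeric UnitPrice field, which is the kind of candidate the manual stages are meant to weed out, while Customer can never become a fact node regardless of the known queries.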

3.2. Data warehouse construction from business process models

Techniques that analyze the business process models to produce a data warehouse model are surveyed here. For each technique, we elaborate on its models and procedures.

3.2.1 From semantic object model to Star schema (SOM)

A method by Boehnlein and Ende [2] constructs data warehouse structures using Semantic Object Model business process models. Semantic Object Model (SOM) is a methodology for business engineering, which can be used for analysis and design of business systems, as proposed by Ferstl and Sinz [10]. In this method, data warehouse construction is performed as an extension of the construction of Business Process Models according to the SOM methodology. Therefore, three preliminary stages are needed before the identification of data warehouse structures can begin. The preliminary stages are the following: (a) determination of services and goals of the system; (b) analysis of the business processes; and (c) derivation of the conceptual object scheme.

Stages (a) and (b) are done manually and are partially performed externally to the organization. Their contributions to data warehouse construction are (a) identifying goals and services of the system; and (b) marking off the examined process. At the end of stage (b), Interaction and Task-Event schemata are produced for the marked-off process.

Stage (c) constructs a Conceptual Object Schema (COS), which is a conceptual data model in the aforementioned SERM notation. Then, attributes that are relevant for analytical purposes must be assigned to objects in the COS. The assignment is performed manually and requires the participation of an end user.

The rest of the process is the identification of data warehouse structures, which consists of the following stages:

1. Identification of metrics: Every object in the COS that serves as an enforcing transaction (service exchange) in the Interaction Schema can become a fact table in the Star schema. Facts (measures) must then be added, and this is performed manually.
2. Like stage 1 of SERM, a closure of existence prerequisites C_EX is defined as the set of all the entities that are (directly or transitively) the existence prerequisites of the conceptual entity of the metrics.
3. Using C_EX, the designer (manually) identifies possible dimensions and includes them in a Star schema according to their relevance.
4. A time dimension is added.
5. Aggregation constraints are considered, and metrics are classified as additive, semi-additive, or non-additive, according to their dimensions.

The authors noted that their model includes a feedback loop to previous stages for model refinement. In case of a change, all the stages must be repeated.

The entire procedure is performed manually, except for the identification of the existence prerequisites closure (C_EX). Although the business model serves as the baseline, the designer should be fluent in the business processes, since some of the data warehouse creation activities are to be performed during the business process model creation (stages a-c). In addition, the derivation of data warehouse structures from business process models may lead to incompatibility between the modeled data warehouse metadata and the information offered by the operational data sources. This may lead either to an extension of the operational data sources or to a reconsideration of the data warehouse design.
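The closure of existence prerequisites used in stages 2 and 3 above, the one step identified as automatable, amounts to a reachability computation over the existence-dependency graph. The following is a minimal sketch under stated assumptions: the dependency graph is given as a plain dictionary, and the entity names and the function name `existence_closure` are hypothetical, not taken from the SOM or SERM literature.

```python
from collections import deque


def existence_closure(metric_entity, prerequisites):
    """Return every entity that metric_entity depends on for its existence,
    directly or transitively (the metric entity itself is excluded)."""
    closure = set()
    queue = deque(prerequisites.get(metric_entity, ()))
    while queue:
        entity = queue.popleft()
        if entity in closure or entity == metric_entity:
            continue  # already handled, or a cycle back to the start
        closure.add(entity)
        queue.extend(prerequisites.get(entity, ()))
    return closure


# Hypothetical existence dependencies: entity -> entities it cannot exist without
prerequisites = {
    "SalesOrderItem": ["SalesOrder", "Product"],
    "SalesOrder": ["Customer"],
    "Product": ["ProductGroup"],
    "Customer": ["Region"],
}

c_ex = existence_closure("SalesOrderItem", prerequisites)
print(sorted(c_ex))
```

The resulting set is only a pool of dimension candidates; deciding which of them are relevant for analysis remains the manual part of stage 3.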

3.2.2 From rule based customer relationship management to Star Schema (RBCRM)

An algorithm by Kim, Lee, Lee and Chun [21] derives data warehouse structures from their own rule-based Customer Relationship Management (CRM) campaigns. A campaign rule in rule-based CRM defines sales campaigns and is supposed to include predefined rule measures (how the success of a campaign action is to be measured), actions (promotion actions along with their arguments and measures) and terms (calculated figures with links to relevant objects). All of these are formally defined. The derivation algorithm comprises three main stages:

1. Dimension tables are created by identifying all the relevant terms (database fields) from the campaign definition and classifying them into dimensions according to the key fields of their database tables.
2. Fact tables are created by utilizing the predefined rule measures.
3. Indexing considerations and query definitions.

The almost full automation proposed in this method relies totally on the formal definitions of rule-based CRM, including definitions of rule terms, campaign actions and campaign rules. These formal definitions include predefinition of campaign measures, which then serve as facts. As this method is clearly content- and approach-specific, we exclude it from the comparison of methods that follows.

In the following section we evaluate the aforementioned techniques according to a set of predefined criteria.

3.3. Evaluating techniques for data warehouse model construction from an operational system model

In this section, we evaluate the surveyed techniques according to a set of criteria that examine the efficiency, expressiveness, and accessibility of the data warehouse design process. We first define the criteria and then evaluate the techniques based on these criteria.

3.3.1 Evaluation criteria

The set of criteria and their desired values and importance are as follows.

1. Process automation: As the techniques aim to create a multidimensional schema from the underlying operational one, it is important that the transformation process be as automatic as possible. Providing the designer with an automatic design tool makes it possible to make the transformation procedure transparent, while significantly shortening the data warehouse design process. The process automation consists of the following stages: fact definition, initial model creation, dimensions definition, model refinement, and summarizability constraints. The importance of this criterion is high, since automation increases the utilization of the method.

2. Applicability to large-scale systems: Large-scale heterogeneous operational systems provide the most difficult challenge in data warehouse modeling. Utilization of the operational system structure specification seems to make it easier to construct an enterprise data warehouse. However, the multiplicity of the manual actions and their complexity make the methods applicable only to small-scale systems. In real-world enterprise operational systems (especially in ERP systems), which may include up to tens of thousands of entities, browsing the operational systems' ERDs to discover relevant entities and considering them as candidates for facts or summarized attributes becomes unrealistic. The importance of this criterion is high, as most data warehouses are based on large heterogeneous domains.

3. Required business processes knowledge: Since DW design is most commonly done by experts other than those who performed the operational system specification, it is desirable that the transformation process require as little knowledge as possible about the business processes analyzed and specified within the operational system specification. The importance of this criterion is medium-high, since less required knowledge means shorter training times.

4. Handling business process analysis: The importance of a DW lies in its ability to analyze the business processes. The extent to which the method enables business process analysis is important, so the importance of this criterion is high. Indeed, business process analysis is the main motivation behind numerous data warehousing projects.

5. Mapping relevant dimensional attributes: The ability to pick up relevant dimensional attributes, namely, the attributes upon which the data is to be analyzed using the data warehouse. The importance of this criterion is medium-low, since producing cubes with unnecessary dimensional attributes may lead to unneeded growth in database sizes, but will not harm the user's analysis capabilities.

6. Compliance with a modeling standard or formalism: Does the method assure compliance of the resulting data warehouse model with a formal standard? The importance of this criterion is low, as currently there is no common standard.

7. Data transformation from the operational system to the data warehouse: The designed data warehouse is to be populated with data using the operational data sources. To ease the Extract, Transform and Load (ETL) processes, it is important that the resulting data warehouse structures be as close as possible to the data sources. It is desirable not to use data that does not exist in the operational data sources and to minimize the amount of calculation needed to load the operational data into the data warehouse. The importance of this criterion is high, since this transformation is essential before any data analysis can be performed.

3.3.2 Techniques evaluation

In this section we evaluate the techniques described in Sections 3.1 and 3.2 using the criteria listed in Section 3.3.1.

1. Process automation: Most of the methods examined in the previous section provide means for creating a semi-automatic translation into a data warehouse specification. We split the schema creation process into several stages and evaluate the extent to which each technique supports automation within each one of the stages.
   a. Facts definition: In ERDFM, XDDFM, SERM, ERFS, and GER, relevant facts (measurements) are selected manually by choosing the relevant entities and attributes. In SOM, it is up to the designer to identify possible metrics after selecting entities. In MER, a candidate schema is created for every numerical field, which automates this stage by leaving the manual work to later stages.
   b. Initial model creation: Initial model creation refers to the identification of fact tables (entities) and the names of all the dimensions without addressing dimensional hierarchies and attributes. ERDFM and XDDFM automate this stage as part of creating an initial attribute tree. In ERFS, this stage is semiautomatic. It includes manual determination of functional dependencies between entities and then automatic establishment of dimensions using terminal functional dependencies, in which facts take part. In SERM and SOM, this stage is manual. In GER this stage is not performed, as the selection of the target model in the next stage yields the dimension architecture. MER automates this stage as part of the automatic creation of candidate schemata.
   c. Dimensions (hierarchies) definition: In XDDFM, SERM, ERFS, GER and SOM the definition of the internal dimensional structure is done manually. ERDFM, ERFS and GER utilize functional dependencies/existence prerequisites to assist the designer in performing this task. MER keeps its automatic candidate creation policy all the way through, but then requires a lengthy query suitability analysis to select the proper schemes.
   d. Refinement: All the techniques allow the designer to perform a manual scheme refinement.
   e. Summarizability constraints: ERDFM, XDDFM, SERM and SOM allow manual addition of summarizability constraints (definitions of which aggregation functions on a subset of dimensional attributes are impossible). ERFS allows classification of possible aggregations into four predefined categories. GER and MER do not address summarizability constraints.

2. Applicability to large-scale systems: Identifying hierarchies by functional dependencies or many-to-one relations can result in hundreds of functional dependencies or paired entities with different levels of relevance to the analysis. Most of the methods do not provide any assistance in this task. SERM and SOM are slightly better than other methods in this matter, as they use Structured Entity Relationship schemata, which make it easier to select fact entities by concentrating on the most dependent entities on the right-hand side of the diagram. Unfortunately, such a topological ordering is not enough to assist in the creation of a data warehouse specification from a large-scale system, because the number of transactional entities is also high.

3. Required business process knowledge: Although most of the methods (ERDFM, XDDFM, SERM, ERFS, GER, MER) utilize structural models of the operational systems, one needs to be acquainted with the business processes to be able to use them successfully, since such knowledge is needed to select relevant transactional entities, make decisions in dimensional design, and define summarizability constraints. The methods do not utilize the information about the functionality of the underlying operational system. In SOM, data warehouse modeling is integrated into business process design, so the data warehouse modeler can get help in understanding business processes.

4. Handling business process analysis: A user's initiation of a data warehousing project is typically motivated by "I would like to analyze my customer request handling process, to discover flaws and exceptional cases" rather than "I need a system to present the sales orders." This means that the process, rather than the data, is to be analyzed. Construction of data warehouse models from structural (data) models (ERDFM, XDDFM, SERM, ERFS, GER, MER), however, lacks the process perspective. These methods use the data of the system, and not its operational semantics, which is expressed through processes. Disregarding the process perspective hampers the ability of the model to be used throughout the conceptual design, as the modeling technique can only express the last stage of the solution, namely, the actual structure of the metadata in the data warehouse, rather than the needs and the solution path. SOM, however, does achieve this goal by identifying data warehouse structures in the context of business processes and their data flow.

5. Mapping relevant dimensional attributes: The methods presented above use functional dependencies or one-to-many relationships to initially select dimensional attributes, and leave the final decision on whether to keep a selected attribute or entity in the structure to the designer's judgment. This is due to the inability of ER diagrams to map this relevance by means other than structural relations. However, a given entity might be structurally related to many other entities, some of which are irrelevant to data analysis.

6. Compliance with a modeling standard or formalism: Lehner, Albrecht and Wedekind [23] suggested multidimensional normal form and generalized multidimensional normal form as guidelines for good schema design. These normal forms aim to assure summarizability of property attributes and validity of analytical computations. The results of the ERFS method comply with the generalized multidimensional normal form, as proved in [19]. Other methods do not address this issue.

7. Data transformation from the operational system to the data warehouse: None of the compared methods considers Extract, Transform and Load issues directly. However, most of the methods derive the data warehouse entities from the operational sources' E/R diagrams, implying that the resulting structures are reachable by simple transformations of the operational sources. ERDFM, GER, MER, ERFS and SERM fall into this category, as they use the operational entities in their multidimensional models or apply simple manipulations (such as entity collapse along a 1-N relationship). XDDFM faces semi-structured XML data sources, which may complicate the ETL process. SOM utilizes business process models, and is not based on actual information offered by the operational sources.

Based on these criteria and their importance, a comparison between the various data warehouse generation techniques is presented in Table 1, where each technique is rated as high (H), average (A), low (L), very low (VL), or irrelevant (I). Note that the process automation criterion was divided into five sub-criteria.

To summarize, we surveyed and evaluated techniques for the transformation of an operational system model to a data warehouse model. The existing techniques provide a toolbox, rather than a heuristic, to derive a data warehouse structure from existing operational systems' data models. Most of the methods (ERDFM, XDDFM, ERFS, and GER) provide no assistance in discovering facts (measures). SERM and SOM provide only a topological sorting of the SER scheme by existence dependencies, and general guidance according to which possible measures are to be found on the right-hand side of the SER diagram. MER simply generates a fact schema from every numerical field, which is not practical due to the multiplicity of created candidate schemes. Identification of relevant dimensional attributes does not help either, as the existing methods use all the entities related to the facts by a series of many-to-one relationships. These two reasons (problematic identification of facts


More information
