Optimizing the Data Warehouse Design by Hierarchical Denormalizing
|
|
|
- Ada Ferguson
- 10 years ago
- Views:
Transcription
1 Optimizing the Data Warehouse Design by Hierarchical Denormalizing Morteza Zaker, Somnuk Phon-Amnuaisuk, Su-Cheng Haw Faculty of Information Technology, Multimedia University, Malaysia Somnuk. Abstract: Data normalization and denormalization processes are common in database design community as these processes have a great impact on the underlying performance. Current data warehouse queries involve a set of aggregations and joining operations. Thus, normalization process is not a good choice as many relations need to be merged in order to answer queries involving aggregation. On the other hand, denormalization process engages a lot of administrative task. This task takes into account the documentation structure of the denormalization assessments, data validation, schedule of migrating of data and so on. In this paper, we show that the mentioned justifications can not be convincible reasons, under certain circumstances, to ignore the effects of denormalization. Until now denormalization techniques have been introduced for various types of database design. One of the techniques is hierarchical denormalization. Our experimental results indicate that the query response time is significantly decreased when the schema is deployed by hierarchical denormalization on a large dataset with multibillion records. Thus, we suggest that hierarchical denormalization could be considered as a fundamental method to enhance query processing performance. Key Words: Data warehouse, Normalization, Hierarchical denormalization,query processing 1 Introduction Data Warehouse (DW) is the foundation for Decision Support Systems (DSS) with large collection of information that can be accessed through an On-line Analytical Processing (OLAP) application. This large database stores current and historical data from several external data sources [1, 2]. The queries built on DW system are usually complex and include some join operations that incur high computational overhead. They generally consist of multi-dimensional grouping and aggregation operations. In other words, queries used in OLAP are much more complex than those used in traditional applications. The large size of DWs and the complexity of OLAP queries severely increase the execution cost of the queries and have a critical effect on the performance and productivity of DSS. [3, 4] At present, the majority of database software solutions for real-world applications are based on a normalized logical data model. Normalization is easy to implement; however, it has some limitations in supporting business application requirements [5]. denormalization is low degree of support for potential frequently update. Indeed, data warehouses entail fairly less data updates and mostly data are retrieved only in most transactions [2]. In other words, applying denormalization strategies is best suited to a data warehouses system due to infrequent updating. There are multiple ways to construct denormalization relationships for a database, such as Pre-Joined Tables, Report Tables, Mirror Tables, Split Tables, Combined Tables, Redundant Data, Repeating Groups, Derivable Data and Hierarchies[7, 8]. The focus of the paper is on utilizing Hierarchical denormalization to optimise the performance in DW. Although, designing, representing and traversing hierarchies are complex as compared to the normalized relationship, the main approach to reduce the query response time is by integrating and summarizing the data[9, 10]. Hierarchical denormalization is particularly useful in dealing with the growth of star schemas that can be found in most data warehouse implementations [5, 10]. Date [6] supports the fact that denormalization speeds up data retrieval, but one disadvantage of The open issue at this point is, according to the conventional wisdom, all relational database ISSN: ISBN:
2 designs should be based on a normalized logical data model [11]. Beyond this, even though the idea of improving the system by reducing table joins is not new, the decision for denormalization is hardly practical because it engages a lot of administrative devotion. This devotion takes the documentation structure of the denormalization assessments, data validation, schedules for migrating of data, and so on. On the contrary, there are some research studies which have indicated that denormalization may give better performance and a more flexible data structure for users[5, 10]. Altough normalization can organize data into well-balanced structure to optimize data accessability, there are still some deficiencies which decrease system performance[12, 13, 14]. It is also obvious that there are great diversities between the focus in database design of the academician and the IT practitioner. Certainly, denormalization is a process that has not attracted interest in the academic world but is a reasonable strategy in practical database community. In this paper, we demonstrate that: (i) The data model and structure in a data warehouse which involves many joins operation, is strongly suggested to be considered as a candidate to be denormalized by hierarchical technique. (ii) Although data maintenance in a normalized structure is easier as compared to denormalized structure, query processing time in the structure provided by hierarchical denormalizing is extraordinarily less than normalized structure. (iii) Query processing performance is significantly affected by building Bitmap Index on columns which is involved by denormolized implementation. In this paper, we give an overview of related works and background studies on normalization, denormalization and hierarchical denormalization in section 2. Section 3 defines a case study and performance methodology on a set of queries to compare the performances of normalized and denormalized table structures. Section 4 discusses the experimental results. Lastly, Section 5 concludes the paper. 2 Background 2.1 Normalization Normalization is a technique first mentioned by Codd [15] and has been deployed by Date[6] for determining the optimal logical design to simplify the relational design of an integrated database, based on the groupings of entities to make better performance in storage operations. While an entirely normalized database design decreases the inconsistencies, it can cause other kind of difficulties such as insufficient system response time for data storage and referential integrity problems due to the disintegration of natural data objects into relational tables [16]. 2.2 Denormalization Some pertinent explanations exist for denormalizing of a relational design to improve its performance. Realizing of these explanations will assist to identify of systems and entities as denormalization candidates. These explanations need to be considered to help designers to reach the mentioned identification. These explanations are: (i) critical queries which involve more than one entity. (ii) Frequent times of queries which must to be processed in on-line environments. (iii) A lot of calculations which need to be applied to single or many columns upon queries can be efficiently answered; (v) entities which need to be extracted in different ways during the same time and (iiv) to be aware about relational database which can brings better performance and enhanced access options that may increase the possibility for denormalization. The most OLAP processes within the data warehouse is to extract summarized and aggregated data, such as sums, averages, and trends to access aggregated and time series data with immediate display. The components which are best suited for denormalization in a data warehouse include: multidimensional analysis in a complex hierarchy, aggregation, and complicated calculations [5, 17, 18]. 2.3 Hierarchies From a technical point of view, a parent-child relationship refers to a hierarchy where a child has only one parent. A hierarchy is a collection of levels which can be drill-down. Drill-down refers to traversing of a hierarchy from the top levels of aggregation to the lower levels of detail [9, 10, 18]. To illustrate what is the structure of hierarchy, we show an example by [10]. Each member in a hierarchy is known as a node. The topmost node in a hierarchy is called the root node while the bottommost nodes are known as leaf nodes. A parent node is a node that has children, and a child node is a node which belongs to a parent. Apparent (except ISSN: ISBN:
3 activities, a business computer application may have to join several tables. Reducing the number of table joins will improve system response time. Figure 1: Hierarchy (balanced tree) a root node) may also be a child, and a child (except a leaf node) may also be a parent. Figure 1 shows such a hierarchy. Hierarchies are essential aspect of DWs. Thus, supporting different kind of hierarchies in the dimensional data, and allowing more flexibility in defining the hierarchies can enable a wider range of business scenarios to be modeled [19]. We outline several types of hierarchies in the data warehouse environment as follows [10]. 1. Balanced tree structure: In this structure, hierarchy has a consistent number of levels and each level can be named. Each child have one parent at the level immediately above it. 2. Variable depth tree structure: In this structure, the number of levels is inconsistent (such as a bill of materials) and each level cannot be named. 3. Ragged tree structure: This hierarchy has a maximum number of levels, each of which can be named and each child can have a parent at any level (not necessarily immediately above it). 4. Complex tree structure: In this hierarchy, a child may have multiple parents. 5. Multiple tree structures for the same leaf node [10]. 3 Related Works Bock and Schrage [16] have indicated that a number of factors affecting system response time are related to ineffective use of database management system tuning, insufficient hardware platforms, poor application programming techniques, and poor conceptual and physical database design. In their studies they focused on the effects of multiple table joins on the system response time. In order to construct an object view that managers need to extract data for their trade Hanus [11] has outlined the advantages of normalization process, such as, easing the designs process and physical implementation, reduces data redundancy and protects data from update and delete anomalies. He also showed that entities, attributes, and relations can easily be modified without restructing the entire table. Moreover, since the tables are smaller with fewer numbers of bytes, less physical storage is required. Beyond this, however, a join operation must be accomplished. In such cases, it might be mandatory to denormalize that data. Nevertheless, he agreed that denormalization must be used with care by understanding of how the data will be used. He has also shown that denormalization can be used to improve system response time without redundant data storage or incurring difficulty of data maintenance. Sanders and Shin [17] have presented a methodology for the assessment of denormalization effects, using relational algebra operations and query trees. They believed that denormalization is an intermediate step between logical and physical modeling, which is involved with analyzing the functionality of the design regarding to the applications requirements criteria. Nevertheless, their methodology is limited due to the lack of sufficient comparison of computation costs between the normalized and denormalized data structures. Shin and Sanders [20] have discussed the effects of denormalization on relational database system performance with using denormalization strategies as a database design methodology. They have presented four common denormalization strategies and evaluate their effects with illustrating the conditions where which strategies are most effective. They have concluded that denormalization methods may receive positive effects for database performance such as Data warehouse. 4 Methodology In order to compare efficiency of denormalization and normalization processes and analysis the performance of these data modeling, we build a series of queries on some columns for evaluation. In our dataset, there are 4 tables; Fact, D1, D2 and D1D2. Fact, D1 and D2 tables have approximately 1.7 billion of records and D1D2 table (a combination of D1 and D2 tables) ISSN: ISBN:
4 Figure 2: Schema 1 with normalized design Figure 3: Schema 2 with denormalized design has approximately 3.36 billion records. These records were randomly generated using PL/SQL Block by Oracle11G tools. These tables can be categorized into two database schemas, Schema 1 and Schema 2 which are portrayed in Figure 2 and Figure 3 respectively. In Schema 1, the tables were applied by normalization modeling where D1 table is connected to the D2 table by one-to-many relationship and similarly, D2 table is also connected to the Fact table by one-to-many relationship. In Schema 2, D1D2 table is directly connected to the Fact table by one to many relationship. The D1D2 table is implemented by hierarchical technique. All attributes, except the keys(pk), of the dimensions are associated by Bitmap index; Schema 1 are contained by 5 indexed columns while Schema 2 are contained by 3 indexed attributes. Schema 1 has fact data which is chained with huge amount of data stored in D2 and D1 dimensions (shown in Figure 2) while Schema 2 contains the fact table and one dimension table implemented by hierarchy technique (shown in Figure 3). 4.1 Query Set The Set Query Benchmark has been used for frequentquery application as much like as Star-Schema in the data warehouse design [22, 23]. The queries of the Set Query Benchmark have been designed on based business analysis missions. In order to evaluate the time required for answering different query types including range, aggregation and join queries; we implemented the three (out of six) queries adopted from Set Query Benchmark[22, 23]. Briefly, we describe all of our selected SQL queries used for our performance measurements as indicated in Table 1. Basically, for each query, we used suffix A to represent query on Schema 1 and suffix B to represent query on Schema 2. Table 1: Descriptions for Set Queries Query Description Set-Query 1 Schema 1 Q1A: One-Dimensional query which SELECT count (*) FROM D2 involve only one column at a WHERE D2-Name = abcdefgh time in the WHERE clause.[21] Q1B: Schema 2 SELECT count (*) FROM D1D2 WHERE D1D2-name = abcdefgh ; Query Description Set-Query 2 Q2A: SELECT D2-name FROM D2 WHERE (D2-agg between and Schema 1 or D2-agg between and or D2-agg between and or D2-agg between and or D2-agg between and ) Q2B: SELECT D1D2-name FROM D1D2 WHERE (D2-agg between and Schema 2 or D1D2-agg between and or D1D2-agg between and or D1D2-agg between and or D1D2-agg between and ) Query Description Set-Query 3 Q3A: Schema 1 SELECT D2-name, count (*) FROM D2 GROUP BY D2-name. Q3B: Schema 2 SELECT D1D2-name, count (*) FROM D2 GROUP BY D1D2-name. ISSN: ISBN:
5 Query Description Set-Query 4 Q4A: Schema 1 SELECT * FROM D1, D2 WHERE D1.D1-name= abcefgh and D1.D1-id=D2.D1-id Q4B: Schema 2 SELECT * FROM D1D2 WHERE D1D2-name= abcefgh Table 2: Information about the test system CPU Pentium 4 (2.6 GHZ) Disk 7200 RPM, 500 GB Memory 1 GB Database Oracle11G Query Description Set-Query 5 Schema 1 Q5A: SELECT D1. D1.Name, sum( fact1.f1*d2. D2-Agg) FROM D1,D2, fact WHERE D1. D1.id = D2. D2-Id and D2. D2-Id = Fact. D2-Id Group by D1. D1-name Q5B: SELECT D1D2. D1D2-NAME,sum(T.jam) FROM(SELECT D1D2. D1D2-ID Id,( Schema 2 D1D2. D1-PARENT-ID pid,fact.f1 * D1D2. D1D2-AGG ) as jam FROM D1D2, fact WHERE D1D2. D1D2-ID = fact. D2-ID ) T,D1D2 WHERE D1D2. D1D2-ID = T.pid GROUP BY D1D2. D1D2-NAME Query Description Set-Query 6 Schema 1 Schema Experimental Setup Q6A: SELECT D1. D1-NAME,D2. D2-NAME, D2. D2-AGG FROM D1,D2 WHERE D1. D1-ID = D2. D1-ID Q6B: SELECT D1D2. D1D2-NAME D1, connect by root D1D2. D1D2-NAME, D1D2. D1D2-AGG D2 FROM D1D2 WHERE level > connect by prior D1D2. D1D2-ID = D1D2. D1-PARENT - ID We performed our tests on the Microsoft Windows Server 2003 machine with Oracle11G database systems. Table 2 shows some basic information about the test machines and the disk system. To make sure the full disk access time is accounted for we disable all unnecessary services in the system and keep the same condition for each query. To avoid inaccuracy, all queries are run 4 consecutive times to give an average elapsed time. 5 Results and Discussions 5.1 Query Response Time In this section, we show and discuss on the time required to answer the queries. These timing measurements directly reflect the performance of normalizing or denormalizing methods. A summary of all the timing measurements on several kinds of queries is shown in Table 3. Table 3: Query response time (per seconds) First Schema(QXA) (Normalized) Second Schema(QXB) (Denormalized) Set-Query One-dimensional Set-Query Set-Query Set-Query Multi-dimensional Set-Query Set-Query One-Dimension Queries Firstly, we examine the performance on count queries (Set-Query 1). When the Schema is deployed by denormalizaion, it takes slightly more time to execute the queries. The time needed to answer higher amount of count queries is dominated by the time needed to answer the number of rows in the dimension. For example, Q1B applied on dimension with 3.6 billion records and Q1A applied on dimension with 1.7 billion records; we expect Q1B to take about twice as much time as Q1A. However from the results in Table 3, this estimation is not accurate, presumably because the initialization of the tablespace for Q1B and Q1A is easily associated with the Bitmap index techniques. This observation is accessed, because the average time used by normalized schema to read in the data blocks is nearly 0.020s (20 ms) and in denormalized schema is about 31 ms. ISSN: ISBN:
6 Next, we focus for Set-Query 2. We see that the time required by both of the schemas rises significantly. It is necessary to understand the retrieval time to compute how many pages or how many disk sectors are accessed for the retrieval operation. The query response time required to fetch data for Q2A and Q2B has the same doing as each other. The number of records by these queries that has to be selected are uniformly scattered among rows 100,000 and 100,000,000. Here, we also expect Q2B to take about twice as much time as Q2A. However, this estimation is not accurate. Consequently, the elapsed time of both schemas that is needed to answer the queries which are executed within a range of predicates is affected by the distribution of data; not by the normalization or denormalization conditions. Another query that can be a main way to promise the indexing effects is Set-Query 3. In Q3A and Q3B, we see that the response time of Q3B is almost twice as much time as Q3A. On the other hand, the required time to answer these queries is extremely more than the other one-dimensional queries (Set-Query1). This is because to execute this type of query, the optimizer will not make use of any indexes. Rather, it will prefer to do a full table scan. Since there is an abnormal growth of data, table scan will be needed to increase physical disk reads to avoid insufficient memory allocation. As the result, this does not scale very well as data volumes increase. Even though, there is one implementation of Bitmap indexe (FastBit) which can support these queries directly [13, 14, 21], but Oracle 11G has not utilize this method of implementation. In summary, Figure 4 shows the query elapse time for one-dimensional queries which were applied on the two schemas. This figure shows that although the query retrieval time on the Schema 1 (which has been designed by normalization method) is faster than Schema 2 (denormalized schema), query performance can be enormously enhanced by using of index techniques especially Bitmap index technique. 5.3 Multi-Dimension Queries In Set-Query 4, the denormalized solution is impressive since no table joins are required. We see that the denormalized schema query remains unchanged, but the normalized schema structure results in a query that joins the D1 and D2 tables. This resulted in slower system response time and performance will degrade as the table size increases. This query shows that if the First Normal Form tables be transformed by denormalization method, the query response time Figure 4: queries Query elapse times for one dimension can be extraordinary decreased as the number of join operations are reduced. Data aggregation is a frequent operation in data warehouse and OLAP environments. Data is either aggregated on the fly or pre-computed and stored as materialized views [18]. Since, data aggregation and data computation can really grow to be a complex process through time; having a good and well-define architecture to support dynamic business environments have more long term benefits with data aggregation and computation. In Set-Query 5, we see that the query response time of Q5B (denormolized schema) is excessively less than the Q5A. Thus, we claim that one of the component in a data warehouse that are good candidates for denormalization are aggregation, and complicated calculations. A flexible architecture leads to potential growth and flexibility lean towards exponential growth. The database software which supports the flexible architecture such as hierarchical methods in data warehouses need to use in state-of-the-arts components to reply complex aggregation needs. It should also be able to support all kinds of reports by using modern operators. These operators should extend the flexibility of hierarchical queries. Oracle does show promises in hyper-computational abilities to process more complex structures by up-to-date operators which other database software might not be able to. To the best of our knowledge and experience there are many operators in oracle11g which can promise the high flexibility in query writing. Provably, we see in Set-Query6 that using oracle operators can enhance speed of data extraction from hierarchical table. Query elapsed time in Q6B has been enormously ISSN: ISBN:
7 In this paper, we have presented an experimental examination of denormalization with essential guidelines to be applied in a data warehouse design. We have also shown set queries for the measurement of denormalization effects using hierarchical implementation. In order to elucidate the effects of hierarchical denormalization in further detail, we have invested endeavors in preparing a real test environment for measuring query retrieval times, system performance, ease of utilization and bitmap index effects. These experiments compared aggregation and computation costs between normalized and hierarchical denormalized data structures. Most probably hierarchical denormalization can improve query performance by reducing the query response times when the data structure in Data warehouse is involved by many joins operations. Consequently, these experimental results should possibly be considered as a general guideline which can be applicable to the majority database designs. Moreover hierarchical denormalization is recommended as a fundamental phase in a data warehouse data modeling which is rarely dependant applications requirements since the data warehouse is infrequently updated. Acknowledgements: Thanks God. Figure 5: Query elapse times for multi dimension queries which are involved by join operations decreased as comparing Q6A. Despite the complexity of query scripting, Oracle has long provided specific support for querying hierarchical data. This support comes in the form of the START WITH, CONNECT BY ISCYCLE pseudocolumn and so on which assist the designers to easily query hierarchical data. Figure 5 shows the query elapse time for multidimensional hierarchical queries which have been applied on first and second schemas. This figure shows that using hierarchical denormalization method can improve system response time when the queries are unanticipated ad hoc queries. 6 Conclusion References: [1] S. Chaudhuri and U. Dayal, An Overview of Data Warehousing and OLAP Technology. ACM SIGMOD RECORD, 1997 [2] R. Kimball and L. Reeves and M. Ross, The Data Warehouse Toolkit. John Wiley and Sons, NEW YORK, 2002 [3] W. H. Inmon, Building the Data Warehouse. John Wiley and Sons, 2005 [4] C. S. Park and M. H. Kim and Y. J. Lee, Rewriting olap queries using materialized views and dimension hierarchies in data warehouses. In Data Engineering, Proceedings. 17th International Conference on. [5] S. K. Shin and G. L. Sanders, Denormalization strategies for data retrieval from data warehouses. Decis. Support Syst. Oct. 2006, pp DOI= [6] C. J. Date, An Introduction to Database Systems, Addison-Wesley Longman Publishing Co., Inc, 2003 [7] C. S. Mullins, Database Administration: The Complete Guide to Practices and Procedures. Addison-Wesley, Paperback, June 2002, 736 pages, ISBN [8] C. Zaniolo and S. Ceri and C. Faloutsos and R. T. Snodgrass and V. S. Subrahmanian and R. Zicari,Advanced Database Systems. Morgan Kaufmann Publishers Inc [9] W. T. Joy Mundy, The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset. John Wiley and Sons, NEW YORK, [10] I. Claudia and N. Galemmo Mastering Data Warehouse Design -Relational And Dimensional. John Wiley and Sons, 2003, ISBN: [11] M. Hanus, To normalize or denormalize, that is the. question. In Proceedings of Computer Measurement Group s 1993 International Conference, pp [12] C. J. Date, The normal is so...interesting. Database Programming and Design. 1997, pp ISSN: ISBN:
8 [13] J. C. Westland, Economic incentives for database normalization. Inf. Process. Manage. Jan. 1992, pp DOI= W [14] D. Menninger, Breaking all the rules: an insider s guide to practical normalization. Data Based Advis. (Jan. 1995), pp [15] E. F Codd, The Relational Model for Database Management. In: R. Rustin (ed.): Database Systems, Prentice Hall and IBM Research Report RJ 987, 1972, pp [16] D. B. Bock and J. F. Schrage, Denormalization guidelines for base and transaction tables. SIGCSE Bull.(Dec. 2002), pp DOI= [17] G. Sanders and S. Shin, Denormalization Effects on Performance of RDBMS. In Proceedings of the 34th Annual Hawaii international Conference on System Sciences ( Hicss-34)-Volume 3 - Volume 3 (January 03-06, 2001). HICSS. IEEE Computer Society, Washington, DC, [18] C. Adamson, Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance. John Wiley and Sons, 2006, ISBN: [19] R. Strohm.Oracle Database Concepts 11g. Oracle, Redwood City,CA [20] S. K. Shin and G. L. Sanders, Denormalization strategies for data retrieval from data warehouses. Decis. Support Syst.(Oct. 2006), PP DOI= [21] E. E-O Neil and P. P-O Neil, Bitmap index design choices and their performance implications. Database Engineering and Applications Symposium. IDEAS th International, pp [22] P. O Neil, The Set Query Benchmark. In The Benchmark Handbook For Database and Transaction Processing Benchmarks. Jim Gray, Editor, Morgan Kaufmann, [23] P. ONeil and E. ONeil, Database Principles, Programming, and Performance. 2nd Ed. Morgan Kaufmann Publishers ISSN: ISBN:
Morteza Zaker ( Student ID :1061608853 ) [email protected]
Data Warehouse Design Considerations 1. Introduction Data warehouse (DW) is a database containing data from multiple operational systems that has been consolidated, integrated, aggregated and structured
INTERNATIONAL JOURNAL OF COMPUTERS AND COMMUNICATIONS Issue 2, Volume 2, 2008
1 An Adequate Design for Large Data Warehouse Systems: Bitmap index versus B-tree index Morteza Zaker, Somnuk Phon-Amnuaisuk, Su-Cheng Haw Faculty of Information Technologhy Multimedia University, Malaysia
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
Data Warehouse Snowflake Design and Performance Considerations in Business Analytics
Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker
BUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
When to consider OLAP?
When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: [email protected] Abstract: Do you need an OLAP
PartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE [email protected] 2
A Brief Tutorial on Database Queries, Data Mining, and OLAP
A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,
Using the column oriented NoSQL model for implementing big data warehouses
Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 469 Using the column oriented NoSQL model for implementing big data warehouses Khaled. Dehdouh 1, Fadila. Bentayeb 1, Omar. Boussaid 1, and Nadia
Data Mining and Database Systems: Where is the Intersection?
Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: [email protected] 1 Introduction The promise of decision support systems is to exploit enterprise
Data Warehouse design
Data Warehouse design Design of Enterprise Systems University of Pavia 11/11/2013-1- Data Warehouse design DATA MODELLING - 2- Data Modelling Important premise Data warehouses typically reside on a RDBMS
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
Optimizing the Performance of Your Longview Application
Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not
Data Warehousing Systems: Foundations and Architectures
Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository
Data Warehousing Concepts
Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis
Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov
Unlock your data for fast insights: dimensionless modeling with in-memory column store By Vadim Orlov I. DIMENSIONAL MODEL Dimensional modeling (also known as star or snowflake schema) was pioneered by
QuickDB Yet YetAnother Database Management System?
QuickDB Yet YetAnother Database Management System? Radim Bača, Peter Chovanec, Michal Krátký, and Petr Lukáš Radim Bača, Peter Chovanec, Michal Krátký, and Petr Lukáš Department of Computer Science, FEECS,
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0
SQL Server Technical Article Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0 Writer: Eric N. Hanson Technical Reviewer: Susan Price Published: November 2010 Applies to:
Data Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second
MS SQL Performance (Tuning) Best Practices:
MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware
A Design and implementation of a data warehouse for research administration universities
A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon
IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance
Data Sheet IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance Overview Multidimensional analysis is a powerful means of extracting maximum value from your corporate
The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.
The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, [email protected] ABSTRACT Health Care Statistics on a state level is a
Chapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
Dimensional Modeling for Data Warehouse
Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or
DATABASE MANAGEMENT SYSTEM
REVIEW ARTICLE DATABASE MANAGEMENT SYSTEM Sweta Singh Assistant Professor, Faculty of Management Studies, BHU, Varanasi, India E-mail: [email protected] ABSTRACT Today, more than at any previous
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions. A Technical Whitepaper from Sybase, Inc.
PowerDesigner WarehouseArchitect The Model for Data Warehousing Solutions A Technical Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Data Warehouse Modeling.....................................4
Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
www.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 28
Data Warehousing - Essential Element To Support Decision- Making Process In Industries Ashima Bhasin 1, Mr Manoj Kumar 2 1 Computer Science Engineering Department, 2 Associate Professor, CSE Abstract SGT
SAS BI Course Content; Introduction to DWH / BI Concepts
SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console
ISM 318: Database Systems. Objectives. Database. Dr. Hamid R. Nemati
ISM 318: Database Systems Dr. Hamid R. Nemati Department of Information Systems Operations Management Bryan School of Business Economics Objectives Underst the basics of data databases Underst characteristics
Data Warehouses & OLAP
Riadh Ben Messaoud 1. The Big Picture 2. Data Warehouse Philosophy 3. Data Warehouse Concepts 4. Warehousing Applications 5. Warehouse Schema Design 6. Business Intelligence Reporting 7. On-Line Analytical
Data Warehouse: Introduction
Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,
LEARNING SOLUTIONS website milner.com/learning email [email protected] phone 800 875 5042
Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300
Data Warehousing and Data Mining
Data Warehousing and Data Mining Part I: Data Warehousing Gao Cong [email protected] Slides adapted from Man Lung Yiu and Torben Bach Pedersen Course Structure Business intelligence: Extract knowledge
Data Warehousing: Data Models and OLAP operations. By Kishore Jaladi [email protected]
Data Warehousing: Data Models and OLAP operations By Kishore Jaladi [email protected] Topics Covered 1. Understanding the term Data Warehousing 2. Three-tier Decision Support Systems 3. Approaches
New Approach of Computing Data Cubes in Data Warehousing
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of
ETL-EXTRACT, TRANSFORM & LOAD TESTING
ETL-EXTRACT, TRANSFORM & LOAD TESTING Rajesh Popli Manager (Quality), Nagarro Software Pvt. Ltd., Gurgaon, INDIA [email protected] ABSTRACT Data is most important part in any organization. Data
E-government Supported by Data warehouse Techniques for Higher education: Case study Malaysian universities
E-government Supported by Data warehouse Techniques for Higher education: Case study Malaysian universities Mohammed Abdulameer Mohammed Universiti Utara Malaysia Kedah, Malaysia Ahmed Rasol Hasson Babylon
Metadata Technique with E-government for Malaysian Universities
www.ijcsi.org 234 Metadata Technique with E-government for Malaysian Universities Mohammed Mohammed 1, Ahmed Hasson 2 1 Faculty of Information and Communication Technology Universiti Teknikal Malaysia
Whitepaper. Innovations in Business Intelligence Database Technology. www.sisense.com
Whitepaper Innovations in Business Intelligence Database Technology The State of Database Technology in 2015 Database technology has seen rapid developments in the past two decades. Online Analytical Processing
MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012
MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Description: This five-day instructor-led course teaches students how to design and implement a BI infrastructure. The
Real Life Performance of In-Memory Database Systems for BI
D1 Solutions AG a Netcetera Company Real Life Performance of In-Memory Database Systems for BI 10th European TDWI Conference Munich, June 2010 10th European TDWI Conference Munich, June 2010 Authors: Dr.
Data Warehousing with Oracle
Data Warehousing with Oracle Comprehensive Concepts Overview, Insight, Recommendations, Best Practices and a whole lot more. By Tariq Farooq A BrainSurface Presentation What is a Data Warehouse? Designed
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and
Business Intelligence and Column-Oriented Databases
Page 12 of 344 Business Intelligence and Column-Oriented Databases Kornelije Rabuzin Faculty of Organization and Informatics University of Zagreb Pavlinska 2, 42000 [email protected] Nikola Modrušan
Deriving Business Intelligence from Unstructured Data
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 9 (2013), pp. 971-976 International Research Publications House http://www. irphouse.com /ijict.htm Deriving
University of Gaziantep, Department of Business Administration
University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.
Designing a Dimensional Model
Designing a Dimensional Model Erik Veerman Atlanta MDF member SQL Server MVP, Microsoft MCT Mentor, Solid Quality Learning Definitions Data Warehousing A subject-oriented, integrated, time-variant, and
Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:
Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Healthcare Big Data Exploration in Real-Time
Healthcare Big Data Exploration in Real-Time Muaz A Mian A Project Submitted in partial fulfillment of the requirements for degree of Masters of Science in Computer Science and Systems University of Washington
Data warehousing with PostgreSQL
Data warehousing with PostgreSQL Gabriele Bartolini http://www.2ndquadrant.it/ European PostgreSQL Day 2009 6 November, ParisTech Telecom, Paris, France Audience
Microsoft Data Warehouse in Depth
Microsoft Data Warehouse in Depth 1 P a g e Duration What s new Why attend Who should attend Course format and prerequisites 4 days The course materials have been refreshed to align with the second edition
5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2
Class Announcements TIM 50 - Business Information Systems Lecture 15 Database Assignment 2 posted Due Tuesday 5/26 UC Santa Cruz May 19, 2015 Database: Collection of related files containing records on
Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner
24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India
CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB
CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB Badal K. Kothari 1, Prof. Ashok R. Patel 2 1 Research Scholar, Mewar University, Chittorgadh, Rajasthan, India 2 Department of Computer
Efficient Integration of Data Mining Techniques in Database Management Systems
Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France
Dimensional Modeling and E-R Modeling In. Joseph M. Firestone, Ph.D. White Paper No. Eight. June 22, 1998
1 of 9 5/24/02 3:47 PM Dimensional Modeling and E-R Modeling In The Data Warehouse By Joseph M. Firestone, Ph.D. White Paper No. Eight June 22, 1998 Introduction Dimensional Modeling (DM) is a favorite
Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies
Review Data Warehousing CPS 216 Advanced Database Systems Data warehousing: integrating data for OLAP OLAP versus OLTP Warehousing versus mediation Warehouse maintenance Warehouse data as materialized
MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus
MIS636 AWS Data Warehousing and Business Intelligence Course Syllabus I. Contact Information Professor: Joseph Morabito, Ph.D. Office: Babbio 419 Office Hours: By Appt. Phone: 201-216-5304 Email: [email protected]
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
CS2032 Data warehousing and Data Mining Unit II Page 1
UNIT II BUSINESS ANALYSIS Reporting Query tools and Applications The data warehouse is accessed using an end-user query and reporting tool from Business Objects. Business Objects provides several tools
low-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources
Paper 3510-2015 SAS Visual Analytics: Emerging Trend in Institutional Research Sivakumar Jaganathan, Thulasi Kumar University of Connecticut
Paper 3510-2015 SAS Visual Analytics: Emerging Trend in Institutional Research Sivakumar Jaganathan, Thulasi Kumar University of Connecticut ABSTRACT Institutional research and effectiveness offices at
Deductive Data Warehouses and Aggregate (Derived) Tables
Deductive Data Warehouses and Aggregate (Derived) Tables Kornelije Rabuzin, Mirko Malekovic, Mirko Cubrilo Faculty of Organization and Informatics University of Zagreb Varazdin, Croatia {kornelije.rabuzin,
A Model-based Software Architecture for XML Data and Metadata Integration in Data Warehouse Systems
Proceedings of the Postgraduate Annual Research Seminar 2005 68 A Model-based Software Architecture for XML and Metadata Integration in Warehouse Systems Abstract Wan Mohd Haffiz Mohd Nasir, Shamsul Sahibuddin
OLAP, Knowledge Discovery from Database, Social Security Fund, Oracle Warehouse Builder, Oracle Discoverer.
ABSTRACT Mohamed Salah GOUIDER 1, Amine FARHAT 2 BESTMOD Laboratory Institut Supérieur de Gestion 41, rue de la liberté, cite Bouchoucha Bardo, 2000, Tunis, TUNISIA [email protected] 1, [email protected]
A COMPARATIVE STUDY BETWEEN THE PERFORMANCE OF RELATIONAL & OBJECT ORIENTED DATABASE IN DATA WAREHOUSING
A COMPARATIVE STUDY BETWEEN THE PERFORMANCE OF RELATIONAL & OBJECT ORIENTED DATABASE IN DATA WAREHOUSING Dr. (Mrs Pushpa Suri) 1 and Mrs Meenakshi Sharma 2 1 Associate professor, Department of Computer
OLAP and OLTP. AMIT KUMAR BINDAL Associate Professor M M U MULLANA
OLAP and OLTP AMIT KUMAR BINDAL Associate Professor Databases Databases are developed on the IDEA that DATA is one of the critical materials of the Information Age Information, which is created by data,
Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.
Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services By Ajay Goyal Consultant Scalability Experts, Inc. June 2009 Recommendations presented in this document should be thoroughly
Fluency With Information Technology CSE100/IMT100
Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999
The Cubetree Storage Organization
The Cubetree Storage Organization Nick Roussopoulos & Yannis Kotidis Advanced Communication Technology, Inc. Silver Spring, MD 20905 Tel: 301-384-3759 Fax: 301-384-3679 {nick,kotidis}@act-us.com 1. Introduction
Database Design Patterns. Winter 2006-2007 Lecture 24
Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each
Jet Data Manager 2012 User Guide
Jet Data Manager 2012 User Guide Welcome This documentation provides descriptions of the concepts and features of the Jet Data Manager and how to use with them. With the Jet Data Manager you can transform
Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries
Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters
Optimizing Your Data Warehouse Design for Superior Performance
Optimizing Your Data Warehouse Design for Superior Performance Lester Knutsen, President and Principal Database Consultant Advanced DataTools Corporation Session 2100A The Problem The database is too complex
Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole
Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
Data Warehousing. Jens Teubner, TU Dortmund [email protected]. Winter 2015/16. Jens Teubner Data Warehousing Winter 2015/16 1
Jens Teubner Data Warehousing Winter 2015/16 1 Data Warehousing Jens Teubner, TU Dortmund [email protected] Winter 2015/16 Jens Teubner Data Warehousing Winter 2015/16 13 Part II Overview
Building Cubes and Analyzing Data using Oracle OLAP 11g
Building Cubes and Analyzing Data using Oracle OLAP 11g Collaborate '08 Session 219 Chris Claterbos [email protected] Vlamis Software Solutions, Inc. 816-729-1034 http://www.vlamis.com Copyright 2007,
