Data Management in Forecasting Systems: Case Study Performance Problems and Preliminary Results
|
|
- Jemimah Lloyd
- 8 years ago
- Views:
Transcription
1 Data Management in Forecasting Systems: Case Study Performance Problems and Preliminary Results Haitang Feng 1,2, Nicolas Lumineau 1, Mohand-Saïd Hacid 1, and Richard Domps 2 1 Université de Lyon, CNRS Université Lyon 1, LIRIS, UMR5205, F-69622, France {firstname.lastname}@liris.cnrs.fr 2 Anticipeo, 4 bis impasse Courteline, 94800, Villejuif, France {firstname.lastname}@anticipeo.com Résumé. La production de prévisionnels de ventes est un enjeu important dans la mise en place de stratégies commerciales pour les entreprises. Les applications permettant de calculer des prévisionnels, tels que Anticipeo dans notre cas, exploitent de grandes quantités de données issues des réalisations passées et des modèles statistiques souvent complexes. Les performances de ces applications dépendent de leur capacité à traiter et visualiser avec des latences acceptables pour un usage via le web des masses de données mises à jour périodiquement. La spécificité des données et les choix stratégiques de l entreprise font qu une migration vers des entrepôts de données n est pas toujours possible. Dans cet article, nous proposons une caractérisation des applications dédiées au calcul de prévisionnels et nous offrons une feuille de route permettant d optimiser un système de gestion de bases de données aux niveaux physique et logique pour réduire les latences liées à l accès aux données. Nous montrons que certaines solutions théoriques ne sont pas applicables dans le contexte du calcul de prévisionnels et que dans le cas concret de l application Anticipeo des gains réels ont été obtenus. Keywords: Query optimization, performance, DBMS Tuning, OLAP 1 Introduction Forecasting systems are based on historical information and use specific methods (typically statistical models) to derive decisional information and present them in an easily exploitable way. These systems are an important issue in an industrial context because strategic decisions partly depend on forecasting. The features of a good forecasting system are: economy, availability, plausibility, simplicity and accuracy[1]. Therefore, a forecasting system has several singularities that the well-known solutions of data warehousing[7] do not consider. These singularities concerns the limited life duration of data and its non-reusability in future forecasting. Moreover, data updates could occur which require to update derived analyses[8]. Finally, its role in the business intelligence implies data access latency has to be acceptable. Our concrete case study is Anticipeo 3. Currently Anticipeo application has a number of performance limits. Performance issues exist both in the forecasts calculation stage and in the exploration stage on which data are visualized and updates are performed. This paper mainly focuses on the exploration stage performed via a web browser. The contributions of this paper are the following: we first characterize forecasting systems w.r.t. requirements, data and queries. Then, key elements at the logical and physical level of the database have been investigated in order to provide some recommendations regarding optimization strategies. These recommendations could also be useful for other forecasting applications. Finally, preliminary solutions are evaluated with the focus on the benefits we got and the bounds of existing solutions stemming from the data warehouse domain in this specific case. The paper is organized as follows : Section 2 presents the features of Anticipeo and forecasting systems in general, in terms of process, data and queries, Section 3 describes some related works and the optimization solutions we consider. In Section 4, we show the experimental results. We conclude in Section 5. 2 Forecasting Application Features In this section, we discuss the case of Anticipeo with more details and then define generic features of forecasting systems. Research partially supported by the French Agency ANRT ( and Anticipeo ( 3 Anticipeo provides software and service for the forecasting and monitoring of sales activities.
2 Processing approach used in Anticipeo Anticipeo is one of the forecasting systems which uses statistical models to derive forecasting sales from historical sales. It helps decision makers or executives to adapt their business plan to future trends of the market. Every month, customers provide Anticipeo with their new achieved sales. Anticipeo integrates the data into the system, calculates forecasting data and prepares information for analysis. Customers can then navigate the forecasting sales via a devoted interface. If they perform some modifications on the forecasts, the visualization generator will be relaunched to produce updated presentations. Anticipeo uses relational databases to store information. Issues regarding the manipulation of very large tables at the calculation stage and at the visualization stage constitute the main problems that we tackle. To achieve a reliable result 4, the calculation engine deals with each sale individually with a corresponding statistical model (e.g., standard, seasonal, occasional, etc.). The treatment relies on 36-month historical data in addition to tens of parameters (e.g., weighting coefficient, serial correlation, seasonal coefficient, etc.) used in statistical models. As a consequence, the calculation stage is quite time-consuming. Regarding the exploration part, results are displayed in hierarchies. To show the sales trend, we display both 36-month historical data and 24-month predictive data. This means 60-month aggregations should be generated. In addition, we need some meta-data to understand/interpret the displayed data. As the number of sales could be very high, the tables used to display the results can easily reach gigabyte scale. Data in Anticipeo To analyze the results, three dimensions are defined: Customer, Product and Time. Since the sales are grouped by month, the time dimension is implicit. Only two dimensions are explored: Customer and Product. Each of these two dimensions is composed of several hierarchies such as geographical distribution for the customer dimension. Regarding the navigation of sales, materialized views containing all information about customer, product, hierarchy, purchase date and sales volume are created (called MV example in the rest of this paper) to guarantee a quick data access. The main schema of MV example is shown in Table 1. Attribute Name Attribute Type sales key customer key product key customer name varchar(25) product name varchar(25) hierarchy key parent hierarchy key dimension char(2) hierarchy number tinyint(3) hierarchy level tinyint(3) hierarchy name varchar(25) achieved sale 35 decimal(18,8)... (achieved sales) decimal(18,8) achieved sale 0 decimal(18,8) forecasting sale 1 decimal(18,8)... (forecasting sales) decimal(18,8) forecasting sale 24 decimal(18,8) fiscal year sales cumulation 1 decimal(18,8)... (fiscal year sales cumulation) decimal(18,8) fiscal year sales cumulation 4 decimal(18,8) Table 1. Schema for one of the materialized views used for efficient data display Queries in Anticipeo We gather and analyze the workload of users and we notice that there are in total four categories of typical queries performed by users. Most of the time, users require information on sales at different levels and according to different hierarchies. The data have been precomputed and stored in materialized views such as MV example, so that the response to queries is fast. The first category of queries consists of simple data retrieval from materialized views. In some cases, users need to modify forecasting results. The modification can occur on any level of any given hierarchy. When it happens, the system distributes the modification over the sales level and recomputes all the superior levels of hierarchies. Of course, the heavy work is performed by resorting to batch tasks. 4 Here, reliable stands for acceptable quality and acceptable approximation.
3 However, at the time a modification occurs, the system should provide users with updated information for the level of hierarchy where users perform the modification. Two other categories of typical queries are respectively updates on sales level and construction of a certain level of a certain hierarchy on the fly. Other manipulations on the interface consist of retrieving data across hierarchies. Unfortunately, unlike the simple browsing in the first case, it is almost impossible to build all the aggregates across hierarchies, because the number of combinations is tremendously high even if we consider only two hierarchies from two different dimensions at a time. So, the last category of queries consists of calculation of across-hierarchy information on the fly. Generic features of forecasting systems Sales forecasting systems often use multidimensional design structure to analyze resulting data so as to support better business decision-making. Queries on the data cubes are not only simple data retrieval, but also updates on part of the data cube. Compared to other information systems, the design of forecasting systems should comply with some requirements: Limited life duration: Forecasting calculation is periodically triggered. Once the period passed, the forecasts are no longer exploitable. For example, the weather or the traffic forecasts predicted for yesterday are not necessarily useful today. Non-reusable: The forecasting is a global calculation and it is almost impossible to reuse the former results to reduce future forecasting calculation. This means partial recomputation of precomputed results is not easy. Updates on built data cubes: As forecasting information results from a computation process, there is often no need for updates. However, in some cases, the system needs some corrections for some unpredictable situations: e.g., the impact of a car crash for a traffic forecasting system. Timeliness: Forecasting systems should provide just in time responses for users. Decision makers devise corresponding plans of purchasing, manufacturying, logistics, etc., according to sales forecasting. The delay between achieved data entering and new data forecasting should be as short as possible. 3 Optimization Guidelines Here, the problem we face is a performance problem with regards to the exploration of a multidimensional database using a relational database. Since the application is already operational, we should consider solutions that require less effort for their implementation. 3.1 Hardware and Operating System The first question we consider is whether our application is working in the appropriate environment. The main technical characteristics of the server in use are: two Intel Quad core Xeon-based 2.4GHz, 16GB RAM and one SAS disk of 500 GB. The operating system is a 64-bit Linux Debian system using EXT3 file system. Our focus in this audit is to inspect the hardware for three criteria: CPU, memory and I/O. 3.2 DBMS Configuration Anticipeo uses MySQL to implement and manage forecasting data. We would like to know if the DBMS is well tuned to support the workloads of the application. Since InnoDB is the main storage engine used here, the main MySQL system variables 5 manipulated during the tuning are innodb buffer pool size, innodb log file size, query cache size, innodb flush log at trx commit, key buffer size, etc. Some blind tuning has been done based on existing experimental results on different web services 6. The actual configuration is 8 GB, 800 MB, 64 MB, 0, and 512 MB, respectively. Additional benchmarking will be discussed in the following to determine whether the adopted configuration is efficient. If not, we will propose an optimal configuration. 3.3 Additional Materialized views The visualization of achieved and forecasting results by considering different dimensions is a typical Data Warehouse problem (using relational databases). In this domain, one of the most used solutions is to select useful intermediate results and store them as materialized views. Many approaches have been proposed for the selection of materialized views. The main idea is to use the greedy approaches[4, 6, 10]. These solutions pre-process the most beneficial intermediate results in a limited-space hypothesis to avoid complex computations so as to enhance data access. Extensions of these solutions consider also the maintenance cost[2] or large scale databases[5, 9], or make the set of materialized views dynamic according to the workload[3]. They have already been proved to bring significant improvements to data access. For this, we implemented the basic Greedy Algorithm. 5 See for variable definition 6
4 3.4 Database Design This application works on a large materialized view for result visualization. The advantage is obvious: we can omit time-consuming joins over tables containing millions of rows. However, it creates other problems, e.g., for queries which need to make several joins on this view to derive aggregations for superior levels. Our idea is to find a medium solution that can both avoid costly joins on different tables and reduce the time of self join of this table as well. The star schema or snowflake schema[11] (normalized version of a star schema) could be a solution. The first test is to simply break MV example into two, otherwise, we create two materialized views instead of one: one view contains only the hierarchical information and the other one contains all remaining information. The second step is to build a real star schema on existing data to study the impact on performance issues. 4 Experimental Results All the following evaluations were carried out in the environment presented in Section 3.1 and Observations on Current Implementation of the Application First of all, some experiments on different databases have been conducted to estimate the execution time of the application for both the calculation and the navigation processes. The information of different databases is given in Table 2 with the execution time on the calculation part and a query evaluation which refers to information retrieval across hierarchies. DB size Sales number MV size Calculation time(ct) CT/sale Query evaluation time(qet) QET/sale DB1 55.8GB GB 89.23min 8.02ms 18.28s 0.027ms DB2 61.7GB GB min 6.18ms 21.89s 0.021ms DB3 68.9GB GB min 6.37ms 43.31s 0.031ms DB4 81.7GB GB min 6.37ms 41.65s 0.020ms Table 2. Experimental databases characteristics and results The result shows that the calculation time and the query evaluation time is polynomial in respect to the sales number. This conclusion helps us to estimate the execution on a new database if we know the number of sales. The previous results also show one brake of the enterprise: in the case of a 55-GB database, it takes more than 18 seconds to answer an online user query. If the application considers larger databases, the response time can hardly be acceptable by the user. 4.2 Diagnosis of Latency Provenance Two diagnoses are performed in this part to determine the latency provenance. Program level: We first conduct an analysis on time distribution. For the mathematical calculation, time spent on the application server represents 30% of the total time and the remaining 70% is used to access the database. During the navigation, the part of the time spent on the application server represents less than 10% of the total time. Hardware level: In a second time, we are interested in better knowing the system behavior. We use SAR, one of Linux performance monitoring tools to collect and analyze system activity information. CPU is idle during on average 89.42% of time. Memory is normally used without swapping activities (0% during all the process time). Disk I/O also shows a normal activity with rare occurrences of device saturation according to the average number of read and written sectors. The diagnosis shows that the latency observed on the application is not due to hardware, neither to software. The performance issues of the database might be the source of the performance problem. 4.3 DBMS configuration We set values above and below the current settings of main variables presented in 3.2 to run some tests. The result is, for most of variables, the current value is already the optimal one. Only two variables innodb buffer pool size and innodb log file size could do better if defined as 12 GB and 1 GB, respectively. We then run a complete test with new DBMS settings on the whole system and we got a better performance of 7.32% with innodb log file size set to 1 GB. However, the current value of innodb buffer pool size is already optimal for a machine that serves at the same time as a database and a web server. So, the system is running with an almost optimal configuration.
5 Number of selected views Total evaluating cost Total gain no view materialized /4 tuples materialized % 1/2 tuples materialized % 3/4 tuples materialized % all views materialized % Table 3. Result of theoretical gain of implementing Greedy Algorithm 4.4 Selection of Materialized Views After the implementation of the classical greedy algorithm on a 55-GB database, the theoretical benefit can be estimated (see Table 3). We have done some adaptation during the implementation. We have noticed that the cardinality of nodes (# tuples of views) is highly skewed among the nodes in the lattice. Instead of limiting the number of selected views, we limit the number of tuples to be materialized. The result reveals a significant interest in precomputing materialized view candidates for this application. With only one quarter of materialized tuples, more precisely, rows, approximately 87.91% performance improvement can be achieved. Unfortunately, MySQL, the DBMS used in this case, supports neither materialized views nor automatic query rewriting by materialized views. Our results show the possibility of performance improvement if we had used another DBMS that supports automatic query rewriting. 4.5 Database Schema Modification We instantiate every query type presented in Section 2 and evaluate them on both the actual data schema and the new data schemas. Creating two separate materialized views Table 4 shows the results using the actual schema and two separate materialized views. We have an interesting improvement: an average gain of 8.08%, 4.03%, 17.13% and 12.09% respectively for the four query types. Regarding the response time of a data retrieval from precomputed data, there is some degradation Response time 1st try 2nd try 3rd try 4th try 5th try Average Gain Query type 1 Before After % Query type 2 Before After % Query type 3 Before After % Query type 4 Before After % Table 4. Evaluation of slow queries on actual and new data schema using two materialized views (in seconds) because an extra join has to be added to put the hierarchical information and the rest together. However, the evaluation time is still on the scale of tens of milliseconds. So, this newly created difference could be ignored for a web service. Using star schema In a second time, we make some more important changes on the data schema. We consider a star schema and implement data in fact table(s) and dimension tables. In this case, there are three dimensions Customer, Product and Time. Fact table stores foreign keys of the dimension tables and sales measures, i.e. turnover, quantity and price. This means that the former implicit time dimension becomes explicit. Due to some technical difficulties, we have to build these tables on two steps: create two dimension tables for customer and product and one fact table within the implicit time dimension and then create the time dimension table and the final fact table. We have conducted benchmarking on the two schemas and we have obtained some unexpected results shown Table 5 (here, DIM stands for dimension). The percentage of gain in comparison to the actual data schema shows that all other queries got a significant improvement using the star schema without time dimension instead of the one with time dimension except the query type 1. The particularity of this application is at least 36 month of sales are displayed in the interface to show the sales trend. In this case, the time dimension is rather an aggregation criteria than a condition of
6 Response time 1st try 2nd try 3rd try 4th try 5th try Average Gain Actual schema Query type 1 Without time DIM % With time DIM % Actual schema Query type 2 Without time DIM % With time DIM % Actual schema Query type 3 Without time DIM % With time DIM % Actual schema Query type 4 Without time DIM % With time DIM % Table 5. Evaluation of slow queries on actual and star data schemas with and without time dimension (in seconds) selection, which makes the query take more time to execute. For the query type 1, we define the month to be manipulated. This condition gives the query optimizer a chance to eliminate many tuples before entering the heavy part of the execution tree. So our conclusion is to design the star schema with adaption (here with only two dimensions). 5 Conclusion and Future Work We have presented four main optimization approaches that we consider in order to improve the performance of a forecasting system: hardware, DBMS configuration, DB design and selection of materialized views. In our context, we consider a real world application, Anticipeo, and we prove that with our solutions, the system presents a better performance. Further research work will include indexing strategies and parallelization. So far, we have not considered the improvement of the calculation performance. As mentioned in the introduction, former forecasting results are not used for future calculations. This means, at time t i, the database DB is at state S i. Once new forecasting information at time t i+δ has been derived, the database DB changes to S i+δ, which is completely different from S i. There could be some repeating intermediate results during the calculation and these intermediate results should be kept and used for future needs. We will also investigate this issue. References 1. T. K. BAN. Features of a good forecasting system, Jan E. Baralis, S. Paraboschi, and E. Teniente. Materialized views selection in a multidimensional database. In VLDB, pages , S. Biren, R. Karthik, and R. Vijay. A hybrid approach for data warehouse view selection. International Journal of Data Warehousing and Mining(IJDWM), 2:1 37, H. Gupta. Selection of views to materialize in a data warehouse. In ICDT, pages , H. Gupta and I. S. Mumick. Selection of views to materialize under a maintenance cost constraint. In ICDT, pages , V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD Conference, pages , W. H. Inmon. Building the Data Warehouse (3rd Edition). Wiley, R. Kimball and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition). Wiley, Y. Kotidis and N. Roussopoulos. Dynamat: A dynamic view management system for data warehouses. In SIGMOD Conference, pages , A. Shukla, P. Deshpande, and J. F. Naughton. Materialized view selection for multidimensional datasets. In VLDB, pages , P. Vassiliadis and T. K. Sellis. A survey of logical models for olap databases. SIGMOD Record, 28(4):64 69, 1999.
Index Selection Techniques in Data Warehouse Systems
Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES
More informationIndexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 sirirut@cs.ou.edu gruenwal@cs.ou.edu Abstract Recently,
More informationPredictor Business Model - The Impact of Aggregate Modification on Raw Tuples
Data management in forecasting systems : optimization and maintenance Haitang Feng To cite this version: Haitang Feng. Data management in forecasting systems : optimization and maintenance. Other [cs.oh].
More informationMS SQL Performance (Tuning) Best Practices:
MS SQL Performance (Tuning) Best Practices: 1. Don t share the SQL server hardware with other services If other workloads are running on the same server where SQL Server is running, memory and other hardware
More informationDimensional Modeling for Data Warehouse
Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or
More informationPerformance Counters. Microsoft SQL. Technical Data Sheet. Overview:
Performance Counters Technical Data Sheet Microsoft SQL Overview: Key Features and Benefits: Key Definitions: Performance counters are used by the Operations Management Architecture (OMA) to collect data
More informationVirtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
More informationInvestigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses
Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses Thiago Luís Lopes Siqueira Ricardo Rodrigues Ciferri Valéria Cesário Times Cristina Dutra de
More informationData Warehouse Snowflake Design and Performance Considerations in Business Analytics
Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker
More informationBenchmarking Cassandra on Violin
Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract
More informationCopyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1
Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics
More informationDATA WAREHOUSING AND OLAP TECHNOLOGY
DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are
More informationAlexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
More informationMethodology Framework for Analysis and Design of Business Intelligence Systems
Applied Mathematical Sciences, Vol. 7, 2013, no. 31, 1523-1528 HIKARI Ltd, www.m-hikari.com Methodology Framework for Analysis and Design of Business Intelligence Systems Martin Závodný Department of Information
More informationA Design and implementation of a data warehouse for research administration universities
A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon
More informationDBMS / Business Intelligence, Business Intelligence / DBMS
DBMS / Business Intelligence, Business Intelligence / DBMS Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the
More informationHybrid OLAP, An Introduction
Hybrid OLAP, An Introduction Richard Doherty SAS Institute European HQ Agenda Hybrid OLAP overview Building your data model Architectural decisions Metadata creation Report definition Hybrid OLAP overview
More informationThe Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.
The Design and the Implementation of an HEALTH CARE STATISTICS DATA WAREHOUSE Dr. Sreèko Natek, assistant professor, Nova Vizija, srecko@vizija.si ABSTRACT Health Care Statistics on a state level is a
More informationData Warehousing Concepts
Data Warehousing Concepts JB Software and Consulting Inc 1333 McDermott Drive, Suite 200 Allen, TX 75013. [[[[[ DATA WAREHOUSING What is a Data Warehouse? Decision Support Systems (DSS), provides an analysis
More informationThe Cubetree Storage Organization
The Cubetree Storage Organization Nick Roussopoulos & Yannis Kotidis Advanced Communication Technology, Inc. Silver Spring, MD 20905 Tel: 301-384-3759 Fax: 301-384-3679 {nick,kotidis}@act-us.com 1. Introduction
More informationAutomatic Selection of Bitmap Join Indexes in Data Warehouses
Automatic Selection of Bitmap Join Indexes in Data Warehouses Kamel Aouiche, Jérôme Darmont, Omar Boussaïd, Fadila Bentayeb To cite this version: Kamel Aouiche, Jérôme Darmont, Omar Boussaïd, Fadila Bentayeb
More informationUsing the column oriented NoSQL model for implementing big data warehouses
Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 469 Using the column oriented NoSQL model for implementing big data warehouses Khaled. Dehdouh 1, Fadila. Bentayeb 1, Omar. Boussaid 1, and Nadia
More informationBenchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
More informationEnterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.
Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services By Ajay Goyal Consultant Scalability Experts, Inc. June 2009 Recommendations presented in this document should be thoroughly
More informationBUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
More informationReview. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies
Review Data Warehousing CPS 216 Advanced Database Systems Data warehousing: integrating data for OLAP OLAP versus OLTP Warehousing versus mediation Warehouse maintenance Warehouse data as materialized
More informationNew Approach of Computing Data Cubes in Data Warehousing
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of
More informationA Dynamic Load Balancing Strategy for Parallel Datacube Computation
A Dynamic Load Balancing Strategy for Parallel Datacube Computation Seigo Muto Institute of Industrial Science, University of Tokyo 7-22-1 Roppongi, Minato-ku, Tokyo, 106-8558 Japan +81-3-3402-6231 ext.
More informationSQL Server Analysis Services Complete Practical & Real-time Training
A Unit of Sequelgate Innovative Technologies Pvt. Ltd. ISO Certified Training Institute Microsoft Certified Partner SQL Server Analysis Services Complete Practical & Real-time Training Mode: Practical,
More informationApplication Tool for Experiments on SQL Server 2005 Transactions
Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 2006 30 Application Tool for Experiments on SQL Server 2005 Transactions ŞERBAN
More informationPerformance And Scalability In Oracle9i And SQL Server 2000
Performance And Scalability In Oracle9i And SQL Server 2000 Presented By : Phathisile Sibanda Supervisor : John Ebden 1 Presentation Overview Project Objectives Motivation -Why performance & Scalability
More informationSAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform
SAS Grid Manager Testing and Benchmarking Best Practices for SAS Intelligence Platform INTRODUCTION Grid computing offers optimization of applications that analyze enormous amounts of data as well as load
More informationCognos Performance Troubleshooting
Cognos Performance Troubleshooting Presenters James Salmon Marketing Manager James.Salmon@budgetingsolutions.co.uk Andy Ellis Senior BI Consultant Andy.Ellis@budgetingsolutions.co.uk Want to ask a question?
More informationlow-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
More informationData Warehouse: Introduction
Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationIn-Memory Data Management for Enterprise Applications
In-Memory Data Management for Enterprise Applications Jens Krueger Senior Researcher and Chair Representative Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University
More informationOracle BI Suite Enterprise Edition
Oracle BI Suite Enterprise Edition Optimising BI EE using Oracle OLAP and Essbase Antony Heljula Technical Architect Peak Indicators Limited Agenda Overview When Do You Need a Cube Engine? Example Problem
More informationInformation management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse
Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load
More informationRackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationSQL Server Business Intelligence on HP ProLiant DL785 Server
SQL Server Business Intelligence on HP ProLiant DL785 Server By Ajay Goyal www.scalabilityexperts.com Mike Fitzner Hewlett Packard www.hp.com Recommendations presented in this document should be thoroughly
More informationPerformance Modeling and Analysis of a Database Server with Write-Heavy Workload
Performance Modeling and Analysis of a Database Server with Write-Heavy Workload Manfred Dellkrantz, Maria Kihl 2, and Anders Robertsson Department of Automatic Control, Lund University 2 Department of
More informationA Critical Review of Data Warehouse
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 95-103 Research India Publications http://www.ripublication.com A Critical Review of Data Warehouse Sachin
More informationCHAPTER 5: BUSINESS ANALYTICS
Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse
More informationData Warehousing. Read chapter 13 of Riguzzi et al Sistemi Informativi. Slides derived from those by Hector Garcia-Molina
Data Warehousing Read chapter 13 of Riguzzi et al Sistemi Informativi Slides derived from those by Hector Garcia-Molina What is a Warehouse? Collection of diverse data subject oriented aimed at executive,
More informationUniversity of Gaziantep, Department of Business Administration
University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.
More informationDBMS / Business Intelligence, SQL Server
DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.
More informationBringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the
More informationImprove Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations
www.ijcsi.org 202 Improve Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations Muhammad Saqib 1, Muhammad Arshad 2, Mumtaz Ali 3, Nafees Ur Rehman 4, Zahid
More informationNews and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
More informationMicrosoft Dynamics NAV 2013 R2 Sizing Guidelines for On-Premises Single Tenant Deployments
Microsoft Dynamics NAV 2013 R2 Sizing Guidelines for On-Premises Single Tenant Deployments July 2014 White Paper Page 1 Contents 3 Sizing Recommendations Summary 3 Workloads used in the tests 3 Transactional
More informationPRACTICAL DATA MINING IN A LARGE UTILITY COMPANY
QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,
More informationDWEB: A Data Warehouse Engineering Benchmark
DWEB: A Data Warehouse Engineering Benchmark Jérôme Darmont, Fadila Bentayeb, and Omar Boussaïd ERIC, University of Lyon 2, 5 av. Pierre Mendès-France, 69676 Bron Cedex, France {jdarmont, boussaid, bentayeb}@eric.univ-lyon2.fr
More informationHigh performance ETL Benchmark
High performance ETL Benchmark Author: Dhananjay Patil Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 07/02/04 Email: erg@evaltech.com Abstract: The IBM server iseries
More informationwww.ijreat.org Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (www.prdg.org) 1
Data Warehouse Security Akanksha 1, Akansha Rakheja 2, Ajay Singh 3 1,2,3 Information Technology (IT), Dronacharya College of Engineering, Gurgaon, Haryana, India Abstract Data Warehouses (DW) manage crucial
More informationMAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents
More informationColumnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0
SQL Server Technical Article Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0 Writer: Eric N. Hanson Technical Reviewer: Susan Price Published: November 2010 Applies to:
More informationEfficient Integration of Data Mining Techniques in Database Management Systems
Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France
More informationData Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second
More informationCHAPTER 4: BUSINESS ANALYTICS
Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the
More informationDelivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
More informationDatabase Economic Cost Optimization for Cloud Computing Adam Conrad Department of Computer Science Brown University Providence, RI
Database Economic Cost Optimization for Cloud Computing Adam Conrad Department of Computer Science Brown University Providence, RI ABSTRACT As the demand for cheaper commodity machines and large-scale
More informationUnlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov
Unlock your data for fast insights: dimensionless modeling with in-memory column store By Vadim Orlov I. DIMENSIONAL MODEL Dimensional modeling (also known as star or snowflake schema) was pioneered by
More informationBW-EML SAP Standard Application Benchmark
BW-EML SAP Standard Application Benchmark Heiko Gerwens and Tobias Kutning (&) SAP SE, Walldorf, Germany tobas.kutning@sap.com Abstract. The focus of this presentation is on the latest addition to the
More informationORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process
ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced
More informationDatabase Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
More informationOracle Database 11g Comparison Chart
Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix
More informationChapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
More information2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000
2074 : Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 Introduction This course provides students with the knowledge and skills necessary to design, implement, and deploy OLAP
More informationDevelopment of a distributed recommender system using the Hadoop Framework
Development of a distributed recommender system using the Hadoop Framework Raja Chiky, Renata Ghisloti, Zakia Kazi Aoul LISITE-ISEP 28 rue Notre Dame Des Champs 75006 Paris firstname.lastname@isep.fr Abstract.
More informationIn-memory databases and innovations in Business Intelligence
Database Systems Journal vol. VI, no. 1/2015 59 In-memory databases and innovations in Business Intelligence Ruxandra BĂBEANU, Marian CIOBANU University of Economic Studies, Bucharest, Romania babeanu.ruxandra@gmail.com,
More informationOptimizing the Performance of Your Longview Application
Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not
More informationOLAP Visualization Operator for Complex Data
OLAP Visualization Operator for Complex Data Sabine Loudcher and Omar Boussaid ERIC laboratory, University of Lyon (University Lyon 2) 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France Tel.: +33-4-78772320,
More informationM2074 - Designing and Implementing OLAP Solutions Using Microsoft SQL Server 2000 5 Day Course
Module 1: Introduction to Data Warehousing and OLAP Introducing Data Warehousing Defining OLAP Solutions Understanding Data Warehouse Design Understanding OLAP Models Applying OLAP Cubes At the end of
More informationPerformance evaluation of Web Information Retrieval Systems and its application to e-business
Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,
More informationDesigning and Using Views To Improve Performance of Aggregate Queries
Designing and Using Views To Improve Performance of Aggregate Queries Foto Afrati 1, Rada Chirkova 2, Shalu Gupta 2, and Charles Loftis 2 1 Computer Science Division, National Technical University of Athens,
More informationRecommendations for Performance Benchmarking
Recommendations for Performance Benchmarking Shikhar Puri Abstract Performance benchmarking of applications is increasingly becoming essential before deployment. This paper covers recommendations and best
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationPostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor
PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor The research leading to these results has received funding from the European Union's Seventh Framework
More informationSpeeding ETL Processing in Data Warehouses White Paper
Speeding ETL Processing in Data Warehouses White Paper 020607dmxwpADM High-Performance Aggregations and Joins for Faster Data Warehouse Processing Data Processing Challenges... 1 Joins and Aggregates are
More informationData Warehousing and OLAP Technology for Knowledge Discovery
542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories
More informationSAS BI Course Content; Introduction to DWH / BI Concepts
SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console
More informationBENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
More informationOptimizing Your Data Warehouse Design for Superior Performance
Optimizing Your Data Warehouse Design for Superior Performance Lester Knutsen, President and Principal Database Consultant Advanced DataTools Corporation Session 2100A The Problem The database is too complex
More informationTowards the Next Generation of Data Warehouse Personalization System A Survey and a Comparative Study
www.ijcsi.org 561 Towards the Next Generation of Data Warehouse Personalization System A Survey and a Comparative Study Saida Aissi 1, Mohamed Salah Gouider 2 Bestmod Laboratory, University of Tunis, High
More informationOLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
More informationData Warehousing With DB2 for z/os... Again!
Data Warehousing With DB2 for z/os... Again! By Willie Favero Decision support has always been in DB2 s genetic makeup; it s just been a bit recessive for a while. It s been evolving over time, so suggesting
More informationSizing Logical Data in a Data Warehouse A Consistent and Auditable Approach
2006 ISMA Conference 1 Sizing Logical Data in a Data Warehouse A Consistent and Auditable Approach Priya Lobo CFPS Satyam Computer Services Ltd. 69, Railway Parallel Road, Kumarapark West, Bangalore 560020,
More informationGraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
More informationAn Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
More informationDx and Microsoft: A Case Study in Data Aggregation
The 7 th Balkan Conference on Operational Research BACOR 05 Constanta, May 2005, Romania DATA WAREHOUSE MANAGEMENT SYSTEM A CASE STUDY DARKO KRULJ Trizon Group, Belgrade, Serbia and Montenegro. MILUTIN
More informationCollege of Engineering, Technology, and Computer Science
College of Engineering, Technology, and Computer Science Design and Implementation of Cloud-based Data Warehousing In partial fulfillment of the requirements for the Degree of Master of Science in Technology
More informationFluency With Information Technology CSE100/IMT100
Fluency With Information Technology CSE100/IMT100 ),7 Larry Snyder & Mel Oyler, Instructors Ariel Kemp, Isaac Kunen, Gerome Miklau & Sean Squires, Teaching Assistants University of Washington, Autumn 1999
More informationUnderstanding Data Locality in VMware Virtual SAN
Understanding Data Locality in VMware Virtual SAN July 2014 Edition T E C H N I C A L M A R K E T I N G D O C U M E N T A T I O N Table of Contents Introduction... 2 Virtual SAN Design Goals... 3 Data
More informationThe Methodology Behind the Dell SQL Server Advisor Tool
The Methodology Behind the Dell SQL Server Advisor Tool Database Solutions Engineering By Phani MV Dell Product Group October 2009 Executive Summary The Dell SQL Server Advisor is intended to perform capacity
More informationData Warehousing. Paper 133-25
Paper 133-25 The Power of Hybrid OLAP in a Multidimensional World Ann Weinberger, SAS Institute Inc., Cary, NC Matthias Ender, SAS Institute Inc., Cary, NC ABSTRACT Version 8 of the SAS System brings powerful
More informationICOM 6005 Database Management Systems Design. Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001
ICOM 6005 Database Management Systems Design Dr. Manuel Rodríguez Martínez Electrical and Computer Engineering Department Lecture 2 August 23, 2001 Readings Read Chapter 1 of text book ICOM 6005 Dr. Manuel
More informationOptimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,
More information