# Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Save this PDF as:

Size: px
Start display at page:

Download "Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL"

## Transcription

2 Then the above query will give 350 rows. For these type of analysis the data mining algorithms should require storied as primary key and the remaining columns as non-key. It means that the data mining algorithms require the result in horizontal layout. Also, the data set in horizontal format can be analysed more efficient than data set in vertical format. R Storeid Salesday Salesamt Billno 1 Monday Tuesday Monday Tuesday Tuesday Tuesday Monday Tuesday Wednesday R V Storeid Salesday Salesamt 1 Monday Tuesday Monday Tuesday Tuesday Wednesday 2000 R H Storeid Salesmon Salestue Saleswed Null Null 3 Null Fig.1: Example of Horizontal Aggregation. 3. LITERATURE SURVEY The programming of the clustering algorithm with SQL queries is explored in [2], which shows that the horizontal layout of the data set enables simpler SQL queries. The SPJ method proved that the horizontal aggregations can be evaluated with relational algebra operators, exploiting outer joins, showing this work is connected to traditional query optimization [1]. The optimization of SQL queries with outer joins can be possible. The outer join operations are optimized by reordering the operations and using transformation rules [3]. In the context of data cube computations, the importance of producing an aggregation table with a cross-tabulation of aggregated values is identified in [4]. For the classification algorithms, the scalability of algorithms to large databases can be achieved by observing the most algorithms are driven by a set of sufficient statistics that are significantly smaller than the data. To unpivot the table that produce several rows in the vertical layout and the decision trees are computed are proposed in [5]. The unpivot operator produces many rows in the table with attribute-value pairs for each input row and it is an inverse process of horizontal aggregations. Different SQL primitive operators for transforming data sets for data mining were introduced in [6]; the most similar needed for horizontal aggregation is an operator to transpose a table, based on the chosen column. The TRANSPOSE operator [7] is same as the unpivot operator, that produce several rows for one input row. The important difference between the PIVOT and TRANSPOSE operators is that the TRANSPOSE operator allows two or more columns to be transposed in the same query, and also reducing the number of table scans. Therefore both the UNPIVOT and TRANSPOSE is complementary to each other with respect to horizontal aggregations. Horizontal aggregations are related to horizontal percentage aggregations in [8]. The differences between both approaches are that percentage aggregations require aggregating at the two grouping levels and also require dividing numbers and need taking care of numerical issues (e.g. dividing by zero). Horizontal percentage aggregations provide a starting point to extend standard aggregations to return results in horizontal form. Optimizing vertical percentage queries with different groupings in each term seems similar to association mining using bottom-up search. Reducing the number of comparisons needed to compute horizontal percentage aggregations may lead to changing the algorithm to parse and evaluate a set of aggregations when they are combined with "case" statements with disjoint conditions. A set of percentage queries on the same table may be efficiently evaluated using shared summaries. Horizontal aggregation is first proposed on the [9], in which only the SPJ method and CASE method are proposed. Also in [10] another method called PIVOT method is proposed in addition to SPJ and CASE. In this, the SQL code for evaluating horizontal aggregation is done after the vertical aggregation is performed. But in this article, the horizontal aggregation is performed directly from the table. 4. HORIZONTAL AGGREGATION A new class of aggregations called horizontal aggregations are used to obtain result in horizontal layout. Since the existing SQL aggregation produce tables in vertical layout. The SQL code generation for horizontal aggregation is discussed follows. 4.1 SQL Code Generation To obtain horizontal aggregation, a small syntax extension to aggregate functions called in a select statement. This query will produce a table with y+1 column. The data are grouped based on the unique combination of values n 1,n 2,.,n y, and one aggregated value per group. To execute this query the query optimizer takes three input parameters. 1) Input table, R 33

3 2) list of grouping attributes n 1,n 2,.,n t 3)attribute to aggregate(x). The aim of horizontal aggregation is to transpose aggregate attribute X by a column subset of n 1,n 2,.,n t. Assume that such subset is m 1,m 2,,m v, where v<y. Or the GROUP BY list can be partitioned into two sub lists: one list contain n 1,n 2,.,n t and another list contain m 1,m 2,,m v to transpose aggregated values. So, in a horizontal aggregation there are four input parameters to generate SQL code. 1) Input table F, 2) The list of GROUP BY attribute n 1,n 2,.,n t. 3) The attribute to aggregate(x). 4) The list of transposed columns m 1,m 2,,m v 4.2 Horizontal Evaluation Methods The existing three methods are used to evaluate horizontal aggregations. They are SPJ method, CASE method and PIVOT method. The SPJ method is based on relational operations in SQL. The CASE method uses the CASE construct in SQL. The PIVOT method uses the pivot built-in operator, which transforms rows to columns. Select list of transposing columns Tables R 1,R 2,.R l contain the individual aggregations for each combination of m 1,m 2,..,m V. { n 1,n 2,.,n t } become the key of R I. SQL code for generating R I, SELECT n 1,n 2,.,n t, Agg(X) WHERE m 1 =v 1I AND AND m vi GROUP BY n 1,n 2,.,n t The final SQL code for SPJ method SELECT R 0.n 1,R 0.n 2,,R 0.n t, R 1.X, R 2.X,, R l.x 0 LEFT OUTER JOIN R 1 ON R 0.n 1 = R 1.n 1 and and R 0.n t = R 1.n t LEFT OUTER JOIN R 2 ON R 0.n 1 = R 2.n 1 and and R 0.n t = R 2.n t SPJ Apply left outer joins CASE Execute CASE statements Compute R H Fig.2: Main steps of methods based on F SPJ Method The SPJ method is based on the relational operators such as select, join and project. The idea is to create the aggregated result for each values of the attributes that to be transposed and the result is stored in individual tables say R i (i=1,..p),for each distinct value of m 1,m 2,,m v. These tables are then left outer joined with the table R 0,that contain the distinct values of n 1,n 2,.n t. Left outer join is used to deal with missing information. This operation is performed by taking all tuples in the left relation that did not match with any tuple in the right relation and set the attributes from the right relation to null. Horizontal aggregation queries can be evaluated by aggregating data directly from R based on the distinct values of grouping attributes. So the distinct values of m1,m2,.m v are to identified, that define the matching boolean expression for result columns. These results are then transposed to produce R H. Agg() is the standard vertical aggregation that has aggregate attribute as argument. The SQL code for generating R 0 : SELECT DISTINCT n 1,n 2,.,n t PIVOT p pivoting values Table R 0 identifies the number of rows in the final result and it become the primary key. LEFT OUTER JOIN R l ON R 0.n 1 = R d.n 1 and and R 0.n t = R l.n t ; CASE Method In this method, the CASE construct available in SQL is used. The case statement returns a value selected from a set of values based on Boolean expressions. Horizontal aggregation queries can be evaluated by aggregating data directly from R based on the distinct values of grouping attributes. So the distinct values of m 1,m 2,.m v are to identified, that define the matching boolean expression for result columns. These results are then transposed to produce R H. If there are no qualifying rows for the specific aggregate group then result must set to null as in the case of SPJ method. Agg() in the SQL code is the standard SQL aggregation that contain the case statement as the argument. SQL Code for getting each distinct values of transposing columns, SELECT DISTINCT m 1,m 2,,m v ; SQL Code for getting R H, SELECT n 1,n 2,.,n t,agg(case WHEN m 1 = v 11 and and m v = v v1 THEN X ELSE null END)...,Agg(CASE WHEN m 1 = v 1l and and m v = v vl THEN X ELSE null END) 34

4 GROUP BY n 1,n 2,.,n t ; PIVOT Method This method uses the built-in PIVOT operator in a Commercial DBMS, which transforms rows to columns. This operator can perform transposition, so it can be used for evaluating the horizontal aggregation. This method needs to determine how many columns are required to store the transposed table and it is combined with the GROUP BY clause. Horizontal aggregation queries can be evaluated by aggregating data directly from R based on the distinct values of grouping attributes. So the distinct values of m 1,m 2,.m v are to identified, that define the matching boolean expression for result columns. These results are then transposed to produce R H. If there are no qualifying rows for the specific aggregate group then result must set to null as in the case of SPJ method. SQL code for getting the distinct values of transposing columns, SELECT DISTINCT m 1 ; SQL code for generating R H SELECT n 1,n 2,.,n t,v 1, v 2,,v l INTO R H FROM ( SELECT n 1,n 2,.,n t,m 1, X ) R u PIVOT( Agg (X) FOR m 1 in (v 1, v 2,., v l ) ) AS P; 4.3 Advantages The proposed horizontal aggregation has several advantages. By using the horizontal aggregation the manual work in preparation of data is reduced because there is no need to write the long SQL code by the data mining practitioner. Another advantage is, the SQL queries are written by person who has well knowledge about DBMS. So it is well efficient than the queries written by an end user. Next advantage is that the datasets are created in less time. By using the most efficient method that is, the CASE method reduces the time and space needed for processing the SQL query. 5. EXPERIMENTAL EVALUATION The SQL code generator was programmed in the C#.NET language and the SQL server 2005 was used that provide the build-in operators in SQL. The CASE, PIVOT operators are available in the SQL language implementation by the DBMS. The horizontal aggregation methods are evaluated with a real data set and a synthetic data set. The real data set came from the UCI Machine Learning Repository. This data set contained a collection of records from the US Census. The synthetic data set was generated as follows. The attributes are generated whose cardinalities reflect a typical transaction table from a data warehouse. Analyse SQL queries having only one horizontal aggregation that have different grouping and horizontalization columns. Each method is executed with these datasets and return the average time in milliseconds. Different column combinations are taken to get different d and n. 5.1 Analysis The time complexity and the space complexity of each method are analysed. Consider that N= R, n= R H and l is the dataset dimensionality. In the SPJ method, the p tables are to generated for each distinct values of m 1,..,m v. These tables are then joined with the table R 0. So the time complexity of SPJ method is high because of generating p tables and the left outer join operation. Also the space is needed for storing p tables and for R 0 table. And as the dimensionality increases, the space complexity also increases. In the CASE method the time complexity and space complexity is less compared to SPJ method. The v comparisons are needed to get the aggregated result and there is no intermediate tables are needed to get the aggregated result. So the space complexity is less. As the comparisons are done in parallel, the time required for computing the case statement is less. So the time complexity of CASE method is less. In the PIVOT method, the parallel step does not appear when evaluating the query. So the time required for executing the PIVOT method is more compared to the CASE method. In the PIVOT method the space is needed for the intermediate table that stores the grouping attributes and the aggregate attribute. From this table the values of the transposing columns are taken. Due to this table, the space complexity is more compared to the CASE method. By analysing the three methods, the SPJ method is the slowest method. Also the CASE method is more efficient than the PIVOT method in terms of time complexity. The time complexity of each method grows as the n grows. Also, the high variation is occurring at the SPJ method. Time taken for executing each method is shown in table 1. The most important difference between CASE and PIVOT method is that the CASE method has a parallel step which allows evaluating aggregations in parallel. This parallelism does not appear when PIVOT operator evaluates the query from R. SPJ method takes more time than PIVOT and CASE method because it has more operations to be performed. Tables are to be generated for each distinct combination of transposing columns. Also, the table R 0 to be generated that contain distinct values of the grouping columns. Then left outer join is to be performed between these tables. SPJ method also requires more space than CASE and PIVOT. From these observations it is found that the CASE method is more efficient than CASE and PIVOT method. Table 1. Comparison of three horizontal evaluation methods Number of records Dimensionality Time in milliseconds Method CASE PIVOT 35

5 SPJ CASE PIVOT SPJ CASE PIVOT SPJ Fig 3: Time complexity of three horizontal aggregation methods 6. CONCLUSION A new class of aggregate functions in SQL called horizontal aggregation is used that helps for making datasets for the data mining projects. These functions are used for creating data sets in horizontal layout, because most of the data mining algorithms require datasets in horizontal layout. Mainly, the existing SQL aggregations return results in one column per aggregated group. But in horizontal aggregation, it returns a set of numbers instead of a single number for each group. Three query evaluation methods are proposed. The first method focuses on the relational operators in SQL. The second method focuses on the SQL case construct. The third method focuses on the PIVOT built-in operator in a commercial DBMS. These three methods are then analysed using the large dataset and observe the time and space complexity of each method. By analysing, it is observed that the CASE method is the most efficient method. Horizontal aggregation produces tables with fewer rows but with more columns. So the traditional query optimization techniques are inappropriate for the new class of aggregations. So the next plan is to develop the most appropriate query optimization technique for the horizontal aggregation. 7. REFERENCES [1] H. Garcia-Molina, J.D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 1st edition, [2] C.Ordonez Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 18(2): , [3] C. Galindo-Legaria and A. Rosenthal. Outer join simplification and reordering for query optimization. ACM TODS, 22(1):43.73, [4] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross- tab and subtotal. In ICDE Conference, pages , [5] G. Graefe, U. Fayyad, and S. Chaudhuri. On the efficient gathering of sufficient statistics for classification from large SQL databases. In proc.acm KDD Conference, pages , [6] J. Clear, D. Dunn, B. Harvey, M.L. Heytens, and P. Lohman. Non-stop SQL/MX primitives for knowledge discovery. In ACM KDD Conference, pages , [7] C. Ordonez. Vertical and horizontal percentage aggregations. In Proc.ACM SIGMOD Conference, pages , [8] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages , [9] C. Ordonez. Horizontal aggregations for building tabular data sets. In Proc. ACM SIGMOD Data Mining and Knowledge Discovery Workshop, pages 35.42, [10] C. Ordonez, Zhibo Chen. Horizontal aggregations in SQL to prepare Data Sets for Data Mining Analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE), IJCA TM : 36

### Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

### CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS

CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department

### Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

### Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

### Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis

Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and

### Horizontal Aggregations Based Data Sets for Data Mining Analysis: A Review

Horizontal Aggregations Based Data Sets for Data Mining Analysis: A Review 1 Mr.Gaurav J.Sawale and 2 Prof. Dr.S. R.Gupta 1 Department of computer science & engineering, PRMIT&R, Badnera, Amravati, Maharashtra,

### International Journal of Advanced Research in Computer Science and Software Engineering

Volume 2, Issue 8, August 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Aggregations

### Improving Analysis Of Data Mining By Creating Dataset Using Sql Aggregations

International Refereed Journal of Engineering and Science (IRJES) ISSN (Online) 2319-183X, (Print) 2319-1821 Volume 1, Issue 3 (November 2012), PP.28-33 Improving Analysis Of Data Mining By Creating Dataset

### Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets

Vol.3, Issue.4, Jul - Aug. 2013 pp-1861-1871 ISSN: 2249-6645 Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets B. Susrutha 1, J. Vamsi Nath 2, T. Bharath Manohar 3, I. Shalini

### International Journal of Advanced Research in Computer Science and Software Engineering

Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach

### Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL

International Journal of Computer Applications in Engineering Sciences [VOL III, ISSUE I, MARCH 2013] [ISSN: 2231-4946] Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in

### Prepare and Optimize Data Sets for Data Mining Analysis

Prepare and Optimize Data Sets for Data Mining Analysis Bhaskar. N 1 1 Assistant Professor, KCG College of Technology Abstract Getting ready a data set for examination is usually the tedious errand in

### Data sets preparing for Data mining analysis by SQL Horizontal Aggregation

Data sets preparing for Data mining analysis by SQL Horizontal Aggregation V.Nikitha 1, P.Jhansi 2, K.Neelima 3, D.Anusha 4 Department Of IT, G.Pullaiah College of Engineering and Technology. Kurnool JNTU

### Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

1 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis Carlos Ordonez, Zhibo Chen University of Houston Houston, TX 77204, USA Abstract Preparing a data set for analysis is generally

### Optimization in SQL by Horizontal Aggregation

International Journal of Engineering and Technical Research (IJETR) Optimization in SQL by Horizontal Aggregation Krupali R. Dhawale, M.S.Gayathri Abstract Data mining is similar to discover a new idea.

### Data Mining and Database Systems: Where is the Intersection?

Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: surajitc@microsoft.com 1 Introduction The promise of decision support systems is to exploit enterprise

### Role of Materialized View Maintenance with PIVOT and UNPIVOT Operators A.N.M. Bazlur Rashid, M. S. Islam

2009 IEEE International Advance Computing Conference (IACC 2009) Patiala, India, 6 7 March 2009 Role of Materialized View Maintenance with PIVOT and UNPIVOT Operators A.N.M. Bazlur Rashid, M. S. Islam

### Programming the K-means Clustering Algorithm in SQL

Programming the K-means Clustering Algorithm in SQL Carlos Ordonez Teradata, NCR San Diego, CA 92127, USA carlos.ordonez@teradata-ncr.com ABSTRACT Using SQL has not been considered an efficient and feasible

### Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters

### A Brief Tutorial on Database Queries, Data Mining, and OLAP

A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)

### CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing

CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday

### A Dynamic Load Balancing Strategy for Parallel Datacube Computation

A Dynamic Load Balancing Strategy for Parallel Datacube Computation Seigo Muto Institute of Industrial Science, University of Tokyo 7-22-1 Roppongi, Minato-ku, Tokyo, 106-8558 Japan +81-3-3402-6231 ext.

### DATA MINING ANALYSIS TO DRAW UP DATA SETS BASED ON AGGREGATIONS

DATA MINING ANALYSIS TO DRAW UP DATA SETS BASED ON AGGREGATIONS Paparao Golusu 1, Nagul Shaik 2 1 M. Tech Scholar (CSE), Nalanda Institute of Tech, (NIT), Siddharth Nagar, Guntur, A.P, (India) 2 Assistant

### CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes

CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices

### Efficient Integration of Data Mining Techniques in Database Management Systems

Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

### Indexing Techniques for Data Warehouses Queries. Abstract

Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 sirirut@cs.ou.edu gruenwal@cs.ou.edu Abstract Recently,

### Mauro Sousa Marta Mattoso Nelson Ebecken. and these techniques often repeatedly scan the. entire set. A solution that has been used for a

Data Mining on Parallel Database Systems Mauro Sousa Marta Mattoso Nelson Ebecken COPPEèUFRJ - Federal University of Rio de Janeiro P.O. Box 68511, Rio de Janeiro, RJ, Brazil, 21945-970 Fax: +55 21 2906626

### AMIS 7640 Data Mining for Business Intelligence

The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2013, Session

### Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data

Volume 39 No10, February 2012 Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data Rajesh V Argiddi Assit Prof Department Of Computer Science and Engineering,

### Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

### CubeView: A System for Traffic Data Visualization

CUBEVIEW: A SYSTEM FOR TRAFFIC DATA VISUALIZATION 1 CubeView: A System for Traffic Data Visualization S. Shekhar, C.T. Lu, R. Liu, C. Zhou Computer Science Department, University of Minnesota 200 Union

### Integrating Pattern Mining in Relational Databases

Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a

### Building Data Cubes and Mining Them. Jelena Jovanovic Email: jeljov@fon.bg.ac.yu

Building Data Cubes and Mining Them Jelena Jovanovic Email: jeljov@fon.bg.ac.yu KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the

### Analysis of Query Optimization Techniques in Databases

Analysis of Query Optimization Techniques in Databases Jyoti Mor M. Tech Student, CSE Dept. Indu Kashyap Assistant Professor, CSE Dept. R. K. Rathy, PhD. Professor, CSE Department ABSTRACT Query optimization

### PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

### Scalable Classification over SQL Databases

Scalable Classification over SQL Databases Surajit Chaudhuri Usama Fayyad Jeff Bernhardt Microsoft Research Redmond, WA 98052, USA Email: {surajitc,fayyad, jeffbern}@microsoft.com Abstract We identify

### Fig. 1 A typical Knowledge Discovery process [2]

Volume 4, Issue 7, July 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Review on Clustering

### International Journal of Innovative Research in Computer and Communication Engineering

FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

### OLTP Compared With OLAP

OLTP Compared With OLAP On Line Transaction Processing OLTP Maintains a database that is an accurate model of some realworld enterprise. Supports day-to-day operations. Characteristics: Short simple transactions

### KEYWORD SEARCH IN RELATIONAL DATABASES

KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to

### TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

### Data Warehousing and Decision Support. Introduction. Three Complementary Trends. Chapter 23, Part A

Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

### Abstract. Keywords: Data Warehouse, Views, Fragmentation, Performance benefit

Optimizing Partition-Selection Scheme for Warehouse Aggregate Views * C.I. Ezeife School of Computer Science University of Windsor Windsor, Ontario Canada N9B 3P4 cezeife@cs.uwindsor.ca Tel: (519) 253-3000

### Microsoft Excel 2010 Pivot Tables

Microsoft Excel 2010 Pivot Tables Email: training@health.ufl.edu Web Page: http://training.health.ufl.edu Microsoft Excel 2010: Pivot Tables 1.5 hours Topics include data groupings, pivot tables, pivot

### Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

### Search and Information Retrieval

Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

SQL SERVER 2008 R2 /2012 (TSQL/SSIS/ SSRS/ SSAS BI Developer TRAINING) Module: I T-SQL Programming and Database Design An Overview of SQL Server 2008 R2 / 2012 Available Features and Tools New Capabilities

### DATA VERIFICATION IN ETL PROCESSES

KNOWLEDGE ENGINEERING: PRINCIPLES AND TECHNIQUES Proceedings of the International Conference on Knowledge Engineering, Principles and Techniques, KEPT2007 Cluj-Napoca (Romania), June 6 8, 2007, pp. 282

### MOC 20461C: Querying Microsoft SQL Server. Course Overview

MOC 20461C: Querying Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to query Microsoft SQL Server. Students will learn about T-SQL querying, SQL Server

### MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

### ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced

### International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

### The Study on Data Warehouse Design and Usage

International Journal of Scientific and Research Publications, Volume 3, Issue 3, March 2013 1 The Study on Data Warehouse Design and Usage Mr. Dishek Mankad 1, Mr. Preyash Dholakia 2 1 M.C.A., B.R.Patel

### Designing an Object Relational Data Warehousing System: Project ORDAWA * (Extended Abstract)

Designing an Object Relational Data Warehousing System: Project ORDAWA * (Extended Abstract) Johann Eder 1, Heinz Frank 1, Tadeusz Morzy 2, Robert Wrembel 2, Maciej Zakrzewicz 2 1 Institut für Informatik

### AN APPROACH TO ANTICIPATE MISSING ITEMS IN SHOPPING CARTS

AN APPROACH TO ANTICIPATE MISSING ITEMS IN SHOPPING CARTS Maddela Pradeep 1, V. Nagi Reddy 2 1 M.Tech Scholar(CSE), 2 Assistant Professor, Nalanda Institute Of Technology(NIT), Siddharth Nagar, Guntur,

### An Algorithm to Evaluate Iceberg Queries for Improving The Query Performance

INTERNATIONAL OPEN ACCESS JOURNAL ISSN: 2249-6645 OF MODERN ENGINEERING RESEARCH (IJMER) An Algorithm to Evaluate Iceberg Queries for Improving The Query Performance M.Laxmaiah 1, A.Govardhan 2 1 Department

### Turkish Journal of Engineering, Science and Technology

Turkish Journal of Engineering, Science and Technology 03 (2014) 106-110 Turkish Journal of Engineering, Science and Technology journal homepage: www.tujest.com Integrating Data Warehouse with OLAP Server

### Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

### MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

### Dynamic Data in terms of Data Mining Streams

International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

### An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

### New Approach of Computing Data Cubes in Data Warehousing

International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of

### AMIS 7640 Data Mining for Business Intelligence

The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2014, Session

### INDEXING BIOMEDICAL STREAMS IN DATA MANAGEMENT SYSTEM 1. INTRODUCTION

JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 9/2005, ISSN 1642-6037 Michał WIDERA *, Janusz WRÓBEL *, Adam MATONIA *, Michał JEŻEWSKI **,Krzysztof HOROBA *, Tomasz KUPKA * centralized monitoring,

### Relational Division and SQL

Relational Division and SQL Soulé 1 Example Relations and Queries As a motivating example, consider the following two relations: Taken(,Course) which contains the courses that each student has completed,

### The basic data mining algorithms introduced may be enhanced in a number of ways.

DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

### Various Data-Mining Techniques for Big Data

Various Data-Mining Techniques for Big Data Manisha R. Thakare Mtech (CSE) B.D. College of Engineering, Sevagram, Wardha, India S. W. Mohod Assistant Professor, Department of C.E. B.D. College of EngineeringWardha,

### Integration of Data Mining Systems using Sequence Process

Integration of Data Mining Systems using Sequence Process Ram Prasad Chakraborty, Assistant Professor, Computer Science & Engineering, Amrapali Group of Institutes, Haldwani, India. ---------------------------------------------------------------------***----------------------------------------------------------------------

### Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

### Advances in Natural and Applied Sciences

AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Clustering Algorithm Based On Hadoop for Big Data 1 Jayalatchumy D. and

### Application Tool for Experiments on SQL Server 2005 Transactions

Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 2006 30 Application Tool for Experiments on SQL Server 2005 Transactions ŞERBAN

### Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

or 2008 Five Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students

### Oracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.

Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse

### SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24 Data Federation Administration Tool Guide Content 1 What's new in the.... 5 2 Introduction to administration

### Graphical Web based Tool for Generating Query from Star Schema

Graphical Web based Tool for Generating Query from Star Schema Mohammed Anbar a, Ku Ruhana Ku-Mahamud b a College of Arts and Sciences Universiti Utara Malaysia, 0600 Sintok, Kedah, Malaysia Tel: 604-2449604

### MapReduce for Data Warehouses

MapReduce for Data Warehouses Data Warehouses: Hadoop and Relational Databases In an enterprise setting, a data warehouse serves as a vast repository of data, holding everything from sales transactions

### DATA WAREHOUSING AND OLAP TECHNOLOGY

DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are

### IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH

IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria

### A Database Perspective on Knowledge Discovery

Tomasz Imielinski and Heikki Mannila A Database Perspective on Knowledge Discovery The concept of data mining as a querying process and the first steps toward efficient development of knowledge discovery

### Index Selection Techniques in Data Warehouse Systems

Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES

### An Architecture for Using Tertiary Storage in a Data Warehouse

An Architecture for Using Tertiary Storage in a Data Warehouse Theodore Johnson Database Research Dept. AT&T Labs - Research johnsont@research.att.com Motivation AT&T has huge data warehouses. Data from

### Infrastructures for big data

Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

### Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

### Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil

### The Role of Metadata for Effective Data Warehouse

ISSN: 1991-8941 The Role of Metadata for Effective Data Warehouse Murtadha M. Hamad Alaa Abdulqahar Jihad University of Anbar - College of computer Abstract: Metadata efficient method for managing Data

### Discretization and grouping: preprocessing steps for Data Mining

Discretization and grouping: preprocessing steps for Data Mining PetrBerka 1 andivanbruha 2 1 LaboratoryofIntelligentSystems Prague University of Economic W. Churchill Sq. 4, Prague CZ 13067, Czech Republic

### A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-

### A Lightweight Solution to the Educational Data Mining Challenge

A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

### Cluster analysis and Association analysis for the same data

Cluster analysis and Association analysis for the same data Huaiguo Fu Telecommunications Software & Systems Group Waterford Institute of Technology Waterford, Ireland hfu@tssg.org Abstract: Both cluster

### Decision Support. Chapter 23. Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1

Decision Support Chapter 23 Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful

### Enhanced data mining analysis in higher educational system using rough set theory

African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review

### A Critical Review of Data Warehouse

Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 95-103 Research India Publications http://www.ripublication.com A Critical Review of Data Warehouse Sachin

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### OLAP Visualization Operator for Complex Data

OLAP Visualization Operator for Complex Data Sabine Loudcher and Omar Boussaid ERIC laboratory, University of Lyon (University Lyon 2) 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France Tel.: +33-4-78772320,

### ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search

Project for Michael Pitts Course TCSS 702A University of Washington Tacoma Institute of Technology ALIAS: A Tool for Disambiguating Authors in Microsoft Academic Search Under supervision of : Dr. Senjuti

### Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

### Evaluation of view maintenance with complex joins in a data warehouse environment (HS-IDA-MD-02-301)

Evaluation of view maintenance with complex joins in a data warehouse environment (HS-IDA-MD-02-301) Kjartan Asthorsson (kjarri@kjarri.net) Department of Computer Science Högskolan i Skövde, Box 408 SE-54128

### Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski