Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL
|
|
|
- Alexia Walton
- 10 years ago
- Views:
Transcription
1 International Journal of Computer Applications in Engineering Sciences [VOL III, ISSUE I, MARCH 2013] [ISSN: ] Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL Mohd Abdul Samad 1, Md. Riazur Rahman 2, Syed Zahed 3, Mohd Abdul Fattah 4 Nizam Institute of Engineering & Technology Hyderabad, Andhra Pradesh, India 1 [email protected] 2 [email protected] 3 [email protected] 4 [email protected] Abstract Data mining is widely used domain for extracting trends or patterns from historical data. In a relational database, especially with normalized tables, a significant effort is required to prepare a summary data set that can be used as input for a data mining algorithm. In general, a significant manual effort is required to build data sets, where a horizontal layout is required. The existing aggregations performed through SQL functions such as MIN, MAX, COUNT, SUM, AVG return a single value output which is not suitable for making datasets meant for data mining. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. The result of the queries is the data which is suitable for data mining operations. It does mean that this paper achieves horizontal aggregations through some constructs builds that includes SQL queries as well. We propose three fundamental methods to evaluate horizontal aggregations: CASE: Exploiting the programming phase construct; SPJ: Based on standard relational algebra operators (SPJ Queries); PIVOT: Using the PIVOT operator, which is offered by some DBMSs. Keywords Aggregations, SQL, data sets and data mining I. INTRODUCTION In a relational database, especially with normalized tables, a significant effort is required to prepare a summary data set that can be used as input for a data mining or statistical algorithm. Most algorithms require as input a data set with a horizontal layout, with several Records and one variable or dimension per column. That is the case with models like clustering, classification, regression and PCA; consult. Each research discipline uses different terminology to describe the data set. In data mining the common terms are point-dimension. Statistics literature generally uses observation-variable. Machine learning research uses instance-feature. This article introduces a new class of aggregate functions that can be used to build data sets in a horizontal layout (denormalized with aggregations), automating SQL query writing and extending SQL capabilities. We show evaluating horizontal aggregations is a challenging and interesting problem and we introduced alternative methods and optimizations for their efficient evaluation. A. Motivation As mentioned above, building a suitable data set for data mining purposes is a time- consuming task. This task generally requires writing long SQL statements or customizing SQL Code if it is automatically generated by some tool. There are two main ingredients in such SQL code: joins and aggregations; we focus on the second one. The most widely-known aggregation is the sum of a column over groups of rows. Some other aggregations return the average, maximum, minimum or row count over groups of rows. There exist many aggregations functions and operators in SQL. Unfortunately, all these aggregations have limitations to build data sets for data mining purposes. The main reason is that, in general, data sets that are stored in a relational database (or a data warehouse) come from On- Line Transaction Processing (OLTP) systems where database schemas are highly normalized. But data mining, statistical or machine learning algorithms generally require aggregated data in summarized form. Based on current available functions and clauses in SQL, a significant effort is required to compute aggregations when they are desired in a cross tabular (Horizontal) form, suitable to be used by a data mining algorithm. Such effort is due to the amount and complexity of SQL code that needs to be written, optimized and tested. There are further practical reasons to return aggregation results in a horizontal (crosstabular) layout. Standard aggregations are hard to interpret when there are many result rows, especially when grouping attributes have high cardinalities. To perform analysis of exported tables into spreadsheets it may be more convenient to have aggregations on the same group in one row (e.g. to produce graphs or to compare data sets with repetitive information). OLAP tools generate SQL code to transpose results (sometimes called PIVOT). Transposition can be more efficient if there are mechanisms combining aggregation and 46 P a g e
2 Samad et. al. transposition together. With such limitations in mind, we propose a new class of aggregate functions that aggregate numeric expressions and transpose results to produce a data set with a horizontal layout. Functions belonging to this class are called horizontal aggregations. Horizontal aggregations represent an extended form of traditional SQL aggregations, which return a set of values in a horizontal layout (somewhat similar to a multidimensional vector), instead of a single value per row. This article explains how to evaluate and optimize horizontal aggregations generating standard SQL code. The proposed horizontal aggregations can be used to generate data sets for the purpose of data mining analysis. II. DEFINITIONS This section defines the table that will be used to explain SQL queries throughout this work. In order to present definitions and concepts in an intuitive manner, we present our definitions in OLAP terms. Let F be a table having a simple primary key Represented by an integer, p discrete attributes and one numeric attribute: F(K,D1,..., Dp,A). Our definitions can be easily generalized to multiple numeric attributes. In OLAP terms, F is a fact table with one column used as primary key, p dimensions and one measure column passed to standard SQL aggregations. That is, table F will be manipulated as a cube with p dimensions [4]. Subsets of dimension columns are used to group rows to aggregate the measure column.f are assumed to have a star schema to simplify exposition. Column K will not be used to compute aggregations. Dimension lookup tables will be based on simple foreign keys. That is, one dimension column Dj will be a foreign key linked to a lookup table that has Dj as primary key. Input table F size is called N (not to be confused with n, the size of the answer set). That is, F = N. Table F represents a temporary table or a view based on a star join query on several tables. We now explain tables FV (vertical) and FH (horizontal) that are used throughout the article. Consider a standard SQL aggregation (e.g. sum ())with the GROUP BY clause, which returns results in a vertical layout. Assume there are j + k GROUP BY columns and the aggregated attribute is A. The results are stored on table FV having j + k columns making up the primary key and A as a non-key attribute. Table FV has a vertical layout. The goal of a horizontal aggregation is to transform FV into a table FH with a horizontal layout having n rows and j+d columns, where each of the d columns represents a unique combination of the k grouping columns. Table FV maybe more efficient than FH to handle sparse matrices (having many zeroes), but some DBMSs like SQLServer [1] can handle sparse columns in a horizontal layout. The n rows represent records for analysis and the d columns represent dimensions or features for analysis. Therefore, n is data set size and d is dimensionality. In other words, each aggregated column represents a numeric variable as defined in statistics research or a numeric feature as typically defined in machine learning research. A. Example Figure 1 gives an example showing the input table F, a traditional vertical sum ( ) aggregation stored in FVand a horizontal aggregation stored in FH. The basic SQL aggregation query is: SELECT D1, D2, sum (A) GROUP BY D1, D2 ORDER BY D1, D2; Notice table FV has only five rows because D1=3and D2=Y do not appear together. Also, the first row in FV has null in A following SQL evaluation semantics. On the other hand, table FH has three rows and two (d = 2) non-key columns, effectively storing six aggregated values. In FH it is necessary to populate the last row with null. Therefore, nulls may come from F or may be introduced by the horizontal layout. TABLE F K D 1 D 2 A 1 3 X Y Y Y X X Null 7 3 X X 7 TABLE FV D 1 D 2 A 1 X Null 1 Y 10 2 X 8 2 Y 6 3 X 17 TABLE FH D 1 D 2 A 1 X Null 1 Y 10 2 X 8 2 Y 6 3 X 17 Fig. 1: Example of F, FV and FH. 47 P a g e
3 Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL III. HORIZONTAL AGGREGATIONS We introduce a new class of aggregations that have similar behavior to SQL standard aggregations, but which produce tables with a horizontal layout. In contrast, we call standard SQL aggregations vertical aggregations since they produce tables with a vertical layout. Horizontal aggregations just require a small syntax extension to aggregate functions called in a SELECT statement. Alternatively, horizontal aggregations can be used to generate SQL code from a data mining tool to build data sets for data mining analysis. We start by explaining how to automatically generate SQL code. A. SQL Code generation Our main goal is to define a template to generate SQL code Combining aggregation and transposition (pivoting).a second goal is to extend the SELECT statement with a clause that combines transposition with aggregation. Consider the following GROUP BY query in standard SQL that takes a subset L1,...,Lm From D1..Dp: SELECT L1...Lm, sum (A) GROUP BY L1... Lm; This aggregation query will produce a wide table with m+1 columns (automatically determined), with one group for each unique combination of values L1,..., Lm and one aggregated value per group (sum(a) in this case). In order to evaluate this query the query optimizer takes three input parameters: (1) the imputable F, (2) the list of grouping columns L1,..., Lm, (3) the column to aggregate (A). The basic goal of a horizontal aggregation is to transpose (pivot) the aggregated column A by a column subset of L1,...,Lm; for simplicity assume such subset is R1,..., Rkwhere k < m. In other words, we partition the GROUP BY list into two subsists: one list to produce each group (j columns L1... Lj) and another list (kcolumns R1,...,Rk) to transpose aggregated values, where {L1,..., Lj} {R1... Rk} =. Each distinct combination of {R1,..., Rk} will automatically produce an output column. In particular, if k = 1 then there are πr1 (F) columns (i.e. each value in R1becomes a column storing one aggregation).therefore, in a horizontal aggregation there are four input parameters to generate SQL code: 1) The input table F, 2) The list of GROUP BY columns L1,..., Lj, 3) The column to aggregate (A), 4) The list of transposing columns R1,...,Rk. Horizontal aggregations preserve evaluation semantics of standard (vertical) SQL aggregations. The main difference will be returning a table with a horizontal layout, possibly having extra nulls. B. Proposed Syntax in Extended SQL We now turn our attention to a small syntax extension to the SELECT statement, which allows understanding our proposal in an intuitive manner. We must point out the proposed extension represents nonstandard SQL because the columns in the output table are not known when the query is parsed. We assume F does not change while a horizontal aggregation is evaluated because new values may create new result columns. Conceptually, we extend standard SQL aggregate functions with a transposing BY clause followed by a list of columns (i.e. R1,..., Rk), to produce a horizontal set of numbers instead of one number. Our proposed syntax is as follows. SELECT L1,..,Lj, H(A BY R1,...,Rk) GROUP BY L1... Lj; We believe the subgroup columns R1...Rk should be a parameter associated to the aggregation itself. That is why they appear inside the parenthesis as arguments, but alternative syntax definitions are feasible. In the context of our work, H() represents some SQL aggregation (e.g. sum(),count(), min(),max(), avg()). The function H() must have at least one argument represented by A, followed by a list of columns. The result rows are determined by columns L1... Lj in the GROUP BY clause if present. Result columns are determined by all potential combinations of columns R1...Rk, where k=1 is the default. Also, {L1,..., Lj} {R1,..., Rk} = Ø We intend to preserve standard SQL evaluation semantics as much as possible. Our goal is to develop sound and efficient evaluation mechanisms. Thus we propose the following rules. (1) the GROUP BY clause is optional, like a vertical aggregation. That is, the list L1,..., Lj may be empty. When the GROUPBY clause is not present then there is only one result row. Equivalently, rows can be grouped by a constant value (e.g. L1 = 0) to always include a GROUP BY clause in code generation. (2) When the clause GROUP BY is present there should not be a HAVING clause that may produce cross tabulation of the same group (i.e. multiple rows with aggregated values per group). (3) the transposing BY clause is optional. When BY is not present then a horizontal aggregation reduces to a vertical aggregation. (4) When the BY clause is present the list R1,...,Rk is required, where k = 1 is the default. (5) horizontal aggregations can be combined with vertical aggregations or other horizontal aggregations on the same query, provided all use the same GROUP BY columns {L1,..., Lj}.(6) As long as F does not change during query processing horizontal aggregations can be freely combined. Such restriction requires locking [6], which we will explain later. (7) the argument to aggregate represented by A is required; A can be a 48 P a g e
4 Samad et. al. column name or an arithmetic expression. In the particular case of count( ) A can be the DISTINCT keyword followed by the list of columns. (8) When H() is used more than once, in different terms, it should be used with different sets of BY columns C. SQL Code Generation: Query Evaluation Methods We propose three methods to evaluate horizontal aggregations. The first method relies only on relational operations. That is, only doing select, project, join and aggregation queries; we call it the SPJ method. The second form relies on the SQL case construct; we call it the CASE method. Each table has an index on its primary key for efficient join processing. We do not consider additional indexing mechanisms to accelerate query evaluation. The third method uses the built-in PIVOT operator, which transforms rows to columns (e.g. transposing). D. SPJ method The SPJ method is interesting from a theoretical point of view because it is based on relational operators only. The basic idea is to create one table with a vertical aggregation for each result column, and then join all those tables to produce FH. We aggregate from F into d projected tables with d Select- Project-Join-Aggregation queries (selection, projection, join, aggregation). Each table FI corresponds to one sub grouping combination and has. {L1,..., Lj} as primary key and an aggregation on A as the only non-key column. It is necessary to introduce an additional table F0 that will be outer joined with projected tables to get a complete result set. Let FV be a table containing the vertical aggregation, based on L1... Lj, R1... Rk. Let V ( )represent the corresponding vertical aggregation for H ( ). The statement to compute FV gets a cube: INSERT INTO FV SELECT L1... Lj, R1,..., Rk, V (A) GROUP BY L1... Lj, R1,..., Rk; Table F0 defines the number of result rows, and builds the primary key. F0 is populated so that itcontains every existing combination of L1,..., Lj.Table F0 has {L1,..., Lj} as primary key and it does not have any non-key column. INSERT INTO F0 SELECT DISTINCT L1,..., Lj FROM {F FV }; In the following discussion I _ {1, 2... d}: we use h to make writing clear, mainly to define Boolean expressions. We need to get all distinct combinations of sub grouping columns R1... Rk, to create the name of dimension columns, to get d, the number of dimensions, and to generate the Boolean expressions for WHERE clauses. Each WHERE clause consists of a conjunction of k equalities based on R1...Rk. SELECT DISTINCT R1,...,Rk FROM {F FV }; Tables F1,..., Fd contain individual aggregations for each combination of R1,..., Rk. The primary key of table FI is {L1,..., Lj}. INSERT INTO FI SELECT L1,..., Lj, V (A) FROM {F FV} WHERE R1 = v1i AND AND Rk = vki GROUP BY L1,..., Lj; Then each table FI aggregates only those rows that correspond to the Ith unique combination of R1,..., Rk, given by the WHERE clause. A possible optimization is synchronizing table scans to compute the d tables in one pass. Finally, to get FH we need d left outer joins with the d + 1 tables so that all individual aggregations are properly assembled as a set of d dimensions for each group. Outer joins set result columns to null for missing combinations for the given group. In general, nulls should be the default value for groups with missing combinations. We believe it would be incorrect to set the result to zero or some other number by default if there is no qualifying rows. Such approach should be considered on a per-case basis. SELECT F0.L1, F0.L2.... F0.Lj, F1.A, F2.A... Fd.A 0 LEFT OUTER JOIN F1 ON F0.L1 = F1.L1 and.... and F0.Lj =F1.Lj LEFT OUTER JOIN F2 ON F0.L1 = F2.L1 and... and F0.Lj = F2.Lj... LEFT OUTER JOIN Fd ON F0.L1 = Fd.L1 and... and F0.Lj = Fd.Lj; This statement may look complex, but it is easy to see that each left outer join is based on the same columns L1... Lj. To avoid ambiguity in column references, L1... Lj are qualified with F0. Result column I is qualified with table FI. Since F0 has n rows each left outer join produces a partial table with n rows and one additional column. Then at the end, FH will have n rows and d aggregation columns. The statement above is equivalent to an update-based strategy. Table FH can be initialized inserting n rows with key L1... Lj and nulls on the d dimension aggregation columns. Then FH is iteratively updated from FI joining on L1... Lj. This strategy basically incurs twice I/O doing updates instead of insertion. Reordering the d projected tables to join cannot accelerate processing because each partial table has n rows. 49 P a g e
5 Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL E. CASE method For this method we use the case programming construct available in SQL. The case statement returns a value selected from a set of values based on Boolean expressions. From a relational database theory point of view this is equivalent to doing a simple projection/aggregation query where each non key value is given by a function that returns a number based on some conjunction of conditions. Horizontal aggregation queries are evaluated by aggregating from F and transposing rows at the same time to produce FH. First, we need to get the unique combinations of R1... Rk that define the matching Boolean expression for result columns. The SQL code to compute horizontal aggregations directly from F is as follows. Observe V () is a standard (vertical) SQL aggregation that has a case statement as argument. Horizontal aggregations need to set the result to null when there are no qualifying rows for the specific horizontal group to be consistent with the SPJ method and also with the extended relational model [2]. SELECT DISTINCT R1..Rk ; SELECT L1... Lj, V (CASE WHEN R1 = v11 and... and Rk = vk1 THEN A ELSE null END).., V (CASE WHEN R1 = v1d and... and Rk = vkd THEN A ELSE null END) GROUP BY L1, L2... Lj; F. PIVOT method We consider the PIVOT operator which is a built in operator in a commercial DBMS. Since this operator can perform transposition it can help evaluating horizontal aggregations. The PIVOT method internally needs to determine how many columns are needed to store the transposed table and it can be combined with the GROUPBY clause. The basic syntax to exploit the PIVOT operator to compute a horizontal aggregation assuming one BY column for the right key columns (i.e. k = 1) is as follows: SELECT DISTINCT R1 ; /* produces v1,..., vd */ SELECTL1, L2,..., Lj, v1, v2,..., vd INTO FH FROM ( SELECT L1, L2,...,Lj, R1, A ) Ft PIVOT (V (A) FOR R1 in (v1, v2,..., vd) ) AS P; Example of Generated SQL Queries We now show actual SQL code for our small example. This SQL code produces FH in Figure 1. The SPJ method code is as follows: /* SPJ method */ INSERT INTO F0 SELECT DISTINCT D1 ; INSERT INTO F1 SELECT D1, sum (A) AS A WHERE D2= X GROUP BY D1; INSERT INTO F2 SELECT D1, sum (A) AS A WHERE D2= Y GROUP BY D1; SELECT F0.D1,F1.A AS D2_X,F2.A AS D2_Y 0 LEFT OUTER JOIN F1 on F0.D1=F1.D1 LEFT OUTER JOIN F2 on F0.D1=F2.D1; The CASE method code is as follows: /* CASE method */ SELECT D1, SUM (CASE WHEN D2= X THEN A ELSE null END) as D2_X, SUM (CASE WHEN D2= Y THEN A ELSE null END) as D2_Y GROUP BY D1; The PIVOT method SQL is as follows: /* PIVOT method */ SELECT D1, [X] as D2_X, [Y] as D2_Y FROM (SELECT D1, D2, A ) as p PIVOT ( SUM (A) FOR D2 IN ([X], [Y]) ) as pvt; IV. ADVANTAGES The SQL code reduces manual work in the data preparation phase in a data mining project. The SQL code is automatically generated it is likely to be more efficient than SQL code written by an end user. The data sets can be created in less time. The data set can be created entirely inside the DBMS. 50 P a g e
6 Samad et. al. Fig. 2 shows steps on all methods based on input table As can be seen in fig. 2, for all methods such as SPJ, CASE and PIVOT steps are given. For every method the procedure starts with SELECT query. Afterwards, corresponding operator through underlying construct is applied. Then horizontal aggregation is computed. [1] C. Ordonez, Data Set Preprocessing and Transformation in a Database System, Intelligent Data Analysis, vol. 15, no. 4, pp [2] 631, [3] C. Ordonez, Statistical Model Computation with UDFs, IEEE Trans. Knowledge and Data Eng., vol. 22, no. 12, pp , [4] Dec [5] C. Ordonez and S. Pitchaimalai, Bayesian Classifiers Programmed in SQL, IEEE Trans. Knowledge and Data Eng., vol. 22, no. 1, pp , Jan [6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, first ed. Morgan Kaufmann, [7] C. Ordonez, Integrating K-Means Clustering with a Relational DBMS Using SQL, IEEE Trans. Knowledge and Data Eng., vol.18, no. 2, pp , Feb [8] J.A. Blakeley, V. Rao, I. Kunen, A. Prout, M. Henaire, and C.Kleinerman..NET database programmability and extensibility in Microsoft SQL Server. In Proc. ACM SIGMOD Conference, pages , [9] E.F. Codd. Extending the database relational model to capture more meaning. ACM TODS, 4(4): , [10] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages , [11] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Datacube: A relational aggregation operator generalizing group by, crosstab and subtotal. In ICDE Conference, pages , [12] H. Wang, C. Zaniolo, and C.R. Luo. ATLaS: A small but complete SQL extension for data mining and data streams. In Proc. VLDB Conference. Fig. 3 shows steps on all methods based on table containing results of vertical aggregations V. CONCLUSION In this paper we extended three aggregate functions such as CASE, SPJ and PIVOT. These are known as horizontal aggregations. We have achieved it by writing underlying constructs for each operator. When they are used, internally the corresponding construct gets executed and the resultant data set is meant for OLAP(Online Analytical Processing). Vertical aggregations such as SUM, MIN, MAX, COUNT, and AVG return a single value output. However, that output can t be used for data mining operations. In order to prepare real world datasets that are very much suitable for data mining operations, we explored horizontal aggregations by developing constructs in the form of operators such as CASE, SPJ and PIVOT. Instead of single value, the horizontal aggregations return a set of values in the form of a row. The result resembles a multidimensional vector. We have implemented SPJ using standard relational query operations. The CASE construct is developed extending SQL CASE. The PIVOT makes use of built in operator provided by RDBMS for pivoting data. To evaluate these operators, we have developed a web based prototype application and results reveal that the proposed horizontal aggregations are capable of preparing data sets for real world data mining operations. REFERENCES 51 P a g e
Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner
24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India
International Journal of Advanced Research in Computer Science and Software Engineering
Volume 2, Issue 8, August 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Aggregations
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets
International Journal of Advanced Research in Computer Science and Software Engineering
Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach
CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS
CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
1 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis Carlos Ordonez, Zhibo Chen University of Houston Houston, TX 77204, USA Abstract Preparing a data set for analysis is generally
Data sets preparing for Data mining analysis by SQL Horizontal Aggregation
Data sets preparing for Data mining analysis by SQL Horizontal Aggregation V.Nikitha 1, P.Jhansi 2, K.Neelima 3, D.Anusha 4 Department Of IT, G.Pullaiah College of Engineering and Technology. Kurnool JNTU
Role of Materialized View Maintenance with PIVOT and UNPIVOT Operators A.N.M. Bazlur Rashid, M. S. Islam
2009 IEEE International Advance Computing Conference (IACC 2009) Patiala, India, 6 7 March 2009 Role of Materialized View Maintenance with PIVOT and UNPIVOT Operators A.N.M. Bazlur Rashid, M. S. Islam
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday
Data Mining and Database Systems: Where is the Intersection?
Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: [email protected] 1 Introduction The promise of decision support systems is to exploit enterprise
A Brief Tutorial on Database Queries, Data Mining, and OLAP
A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)
PartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE [email protected] 2
OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
Data Warehouse Snowflake Design and Performance Considerations in Business Analytics
Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker
LearnFromGuru Polish your knowledge
SQL SERVER 2008 R2 /2012 (TSQL/SSIS/ SSRS/ SSAS BI Developer TRAINING) Module: I T-SQL Programming and Database Design An Overview of SQL Server 2008 R2 / 2012 Available Features and Tools New Capabilities
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1
Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics
DATA WAREHOUSING AND OLAP TECHNOLOGY
DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are
LEARNING SOLUTIONS website milner.com/learning email [email protected] phone 800 875 5042
Course 20467A: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Length: 5 Days Published: December 21, 2012 Language(s): English Audience(s): IT Professionals Overview Level: 300
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification
Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Outline More Complex SQL Retrieval Queries
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
OLAP Systems and Multidimensional Expressions I
OLAP Systems and Multidimensional Expressions I Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master
New Approach of Computing Data Cubes in Data Warehousing
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of
DATA MINING ANALYSIS TO DRAW UP DATA SETS BASED ON AGGREGATIONS
DATA MINING ANALYSIS TO DRAW UP DATA SETS BASED ON AGGREGATIONS Paparao Golusu 1, Nagul Shaik 2 1 M. Tech Scholar (CSE), Nalanda Institute of Tech, (NIT), Siddharth Nagar, Guntur, A.P, (India) 2 Assistant
ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process
ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced
MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012
MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Description: This five-day instructor-led course teaches students how to design and implement a BI infrastructure. The
SQL Server 2008 Core Skills. Gary Young 2011
SQL Server 2008 Core Skills Gary Young 2011 Confucius I hear and I forget I see and I remember I do and I understand Core Skills Syllabus Theory of relational databases SQL Server tools Getting help Data
Application of Data Warehouse and Data Mining. in Construction Management
Application of Data Warehouse and Data Mining in Construction Management Jianping ZHANG 1 ([email protected]) Tianyi MA 1 ([email protected]) Qiping SHEN 2 ([email protected])
A Critical Review of Data Warehouse
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 95-103 Research India Publications http://www.ripublication.com A Critical Review of Data Warehouse Sachin
Efficient Integration of Data Mining Techniques in Database Management Systems
Efficient Integration of Data Mining Techniques in Database Management Systems Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX
ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database
Graphical Web based Tool for Generating Query from Star Schema
Graphical Web based Tool for Generating Query from Star Schema Mohammed Anbar a, Ku Ruhana Ku-Mahamud b a College of Arts and Sciences Universiti Utara Malaysia, 0600 Sintok, Kedah, Malaysia Tel: 604-2449604
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
DBMS / Business Intelligence, SQL Server
DBMS / Business Intelligence, SQL Server Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the needs of IT professionals.
Programming the K-means Clustering Algorithm in SQL
Programming the K-means Clustering Algorithm in SQL Carlos Ordonez Teradata, NCR San Diego, CA 92127, USA [email protected] ABSTRACT Using SQL has not been considered an efficient and feasible
Turkish Journal of Engineering, Science and Technology
Turkish Journal of Engineering, Science and Technology 03 (2014) 106-110 Turkish Journal of Engineering, Science and Technology journal homepage: www.tujest.com Integrating Data Warehouse with OLAP Server
Capturing Database Transformations for Big Data Analytics
Capturing Database Transformations for Big Data Analytics David Sergio Matusevich University of Houston Organization Introduction Classification Program Case Study Conclusions Extending ER Models to Capture
University of Gaziantep, Department of Business Administration
University of Gaziantep, Department of Business Administration The extensive use of information technology enables organizations to collect huge amounts of data about almost every aspect of their businesses.
CHAPTER 5: BUSINESS ANALYTICS
Chapter 5: Business Analytics CHAPTER 5: BUSINESS ANALYTICS Objectives The objectives are: Describe Business Analytics. Explain the terminology associated with Business Analytics. Describe the data warehouse
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: [email protected] November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
Optimizing Your Data Warehouse Design for Superior Performance
Optimizing Your Data Warehouse Design for Superior Performance Lester Knutsen, President and Principal Database Consultant Advanced DataTools Corporation Session 2100A The Problem The database is too complex
Instant SQL Programming
Instant SQL Programming Joe Celko Wrox Press Ltd. INSTANT Table of Contents Introduction 1 What Can SQL Do for Me? 2 Who Should Use This Book? 2 How To Use This Book 3 What You Should Know 3 Conventions
CHAPTER 4: BUSINESS ANALYTICS
Chapter 4: Business Analytics CHAPTER 4: BUSINESS ANALYTICS Objectives Introduction The objectives are: Describe Business Analytics Explain the terminology associated with Business Analytics Describe the
Microsoft Excel 2010 Pivot Tables
Microsoft Excel 2010 Pivot Tables Email: [email protected] Web Page: http://training.health.ufl.edu Microsoft Excel 2010: Pivot Tables 1.5 hours Topics include data groupings, pivot tables, pivot
Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Creating BI solutions with BISM Tabular. Written By: Dan Clark
Creating BI solutions with BISM Tabular Written By: Dan Clark CONTENTS PAGE 3 INTRODUCTION PAGE 4 PAGE 5 PAGE 7 PAGE 8 PAGE 9 PAGE 9 PAGE 11 PAGE 12 PAGE 13 PAGE 14 PAGE 17 SSAS TABULAR MODE TABULAR MODELING
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Content Problems of managing data resources in a traditional file environment Capabilities and value of a database management
When to consider OLAP?
When to consider OLAP? Author: Prakash Kewalramani Organization: Evaltech, Inc. Evaltech Research Group, Data Warehousing Practice. Date: 03/10/08 Email: [email protected] Abstract: Do you need an OLAP
Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries
Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters
Algorithms and Optimizations for Big Data Analytics: Cubes. Carlos Ordonez University of Houston USA
Algorithms and Optimizations for Big Data Analytics: Cubes Carlos Ordonez University of Houston USA Goals of talk State of the art in large-scale analytics, including big data Contrast SQL/UDFs and MapReduce
BUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
Building Data Cubes and Mining Them. Jelena Jovanovic Email: [email protected]
Building Data Cubes and Mining Them Jelena Jovanovic Email: [email protected] KDD Process KDD is an overall process of discovering useful knowledge from data. Data mining is a particular step in the
Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days
Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Designing Business Intelligence Solutions with Microsoft SQL Server 2012
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS
A STUDY ON DATA MINING INVESTIGATING ITS METHODS, APPROACHES AND APPLICATIONS Mrs. Jyoti Nawade 1, Dr. Balaji D 2, Mr. Pravin Nawade 3 1 Lecturer, JSPM S Bhivrabai Sawant Polytechnic, Pune (India) 2 Assistant
Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
Customer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project
European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project Janet Delve, University of Portsmouth Kuldar Aas, National Archives of Estonia Rainer Schmidt, Austrian Institute
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Course 103402 MIS. Foundations of Business Intelligence
Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:
DATABASE DESIGN AND IMPLEMENTATION II SAULT COLLEGE OF APPLIED ARTS AND TECHNOLOGY SAULT STE. MARIE, ONTARIO. Sault College
-1- SAULT COLLEGE OF APPLIED ARTS AND TECHNOLOGY SAULT STE. MARIE, ONTARIO Sault College COURSE OUTLINE COURSE TITLE: CODE NO. : SEMESTER: 4 PROGRAM: PROGRAMMER (2090)/PROGRAMMER ANALYST (2091) AUTHOR:
1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing
1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing 2. What is a Data warehouse a. A database application
SSIS Training: Introduction to SQL Server Integration Services Duration: 3 days
SSIS Training: Introduction to SQL Server Integration Services Duration: 3 days SSIS Training Prerequisites All SSIS training attendees should have prior experience working with SQL Server. Hands-on/Lecture
Automatic Annotation Wrapper Generation and Mining Web Database Search Result
Automatic Annotation Wrapper Generation and Mining Web Database Search Result V.Yogam 1, K.Umamaheswari 2 1 PG student, ME Software Engineering, Anna University (BIT campus), Trichy, Tamil nadu, India
The basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
SAS BI Course Content; Introduction to DWH / BI Concepts
SAS BI Course Content; Introduction to DWH / BI Concepts SAS Web Report Studio 4.2 SAS EG 4.2 SAS Information Delivery Portal 4.2 SAS Data Integration Studio 4.2 SAS BI Dashboard 4.2 SAS Management Console
Course 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
DATA CUBES E0 261. Jayant Haritsa Computer Science and Automation Indian Institute of Science. JAN 2014 Slide 1 DATA CUBES
E0 261 Jayant Haritsa Computer Science and Automation Indian Institute of Science JAN 2014 Slide 1 Introduction Increasingly, organizations are analyzing historical data to identify useful patterns and
SQL Server An Overview
SQL Server An Overview SQL Server Microsoft SQL Server is designed to work effectively in a number of environments: As a two-tier or multi-tier client/server database system As a desktop database system
LVQ Plug-In Algorithm for SQL Server
LVQ Plug-In Algorithm for SQL Server Licínia Pedro Monteiro Instituto Superior Técnico [email protected] I. Executive Summary In this Resume we describe a new functionality implemented
Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives
Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT Learning Objectives Describe how the problems of managing data resources in a traditional file environment are solved
Tracking System for GPS Devices and Mining of Spatial Data
Tracking System for GPS Devices and Mining of Spatial Data AIDA ALISPAHIC, DZENANA DONKO Department for Computer Science and Informatics Faculty of Electrical Engineering, University of Sarajevo Zmaja
SQL Server. 2012 for developers. murach's TRAINING & REFERENCE. Bryan Syverson. Mike Murach & Associates, Inc. Joel Murach
TRAINING & REFERENCE murach's SQL Server 2012 for developers Bryan Syverson Joel Murach Mike Murach & Associates, Inc. 4340 N. Knoll Ave. Fresno, CA 93722 www.murach.com [email protected] Expanded
KEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
Business Intelligence, Analytics & Reporting: Glossary of Terms
Business Intelligence, Analytics & Reporting: Glossary of Terms A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Ad-hoc analytics Ad-hoc analytics is the process by which a user can create a new report
1. INTRODUCTION TO RDBMS
Oracle For Beginners Page: 1 1. INTRODUCTION TO RDBMS What is DBMS? Data Models Relational database management system (RDBMS) Relational Algebra Structured query language (SQL) What Is DBMS? Data is one
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
Data Mining: A Preprocessing Engine
Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,
Unit -3. Learning Objective. Demand for Online analytical processing Major features and functions OLAP models and implementation considerations
Unit -3 Learning Objective Demand for Online analytical processing Major features and functions OLAP models and implementation considerations Demand of On Line Analytical Processing Need for multidimensional
Dynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days
or 2008 Five Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
IMPLEMENTATION OF DATA WAREHOUSE SAP BW IN THE PRODUCTION COMPANY. Maria Kowal, Galina Setlak
174 No:13 Intelligent Information and Engineering Systems IMPLEMENTATION OF DATA WAREHOUSE SAP BW IN THE PRODUCTION COMPANY Maria Kowal, Galina Setlak Abstract: in this paper the implementation of Data
College information system research based on data mining
2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei
