An Algorithm to Evaluate Iceberg Queries for Improving The Query Performance
|
|
|
- Conrad Mosley
- 10 years ago
- Views:
Transcription
1 INTERNATIONAL OPEN ACCESS JOURNAL ISSN: OF MODERN ENGINEERING RESEARCH (IJMER) An Algorithm to Evaluate Iceberg Queries for Improving The Query Performance M.Laxmaiah 1, A.Govardhan 2 1 Department of Computer Science and Engineering, CMR Engineering College, Kandlakoya (V) Medchal (M), Hyderabad, Telangana, India , [email protected] 2 School of Information Technology, Jawaharlal Nehru Technological University, Hyderabad, Telangana, India , [email protected] ABSTRACT: In recent years, the Iceberg query evaluation is attracted more number of researchers, scientists, and decision makers. The reason behind this is demand of scalability and efficiency. Always researchers try to find the best methods of computation which takes limited computing resources for large databases. The Iceberg Query (IB) analysis showing that the IB queries are consuming more time than association rule formation from the datasets. So researchers are working continuously to solve the problem of the IB query evaluation in the domain of data warehousing, data mining, and information retrieval systems. As the result, many novel ideas and techniques have been generated in IB query evaluation process. In this paper, we proposed a novel algorithm to evaluate iceberg queries in order to improve performance of iceberg queries. Keywords: Data warehouse, Iceberg query, Threshold, Sanpling I. INTRODUCTION The rapid increase of the data warehouse (DW) repositories causes evolution special type of queries.these queries have the feature of the very large input when compared to the output. In recent times IB queries are identified as an important query. These are belongs to many categories of different domain of applications [1]. These applications can be found in web mining, data mining [2], and information retrieval systems [12]. Queries later have been extended to data cubes[9,10]. IB queries are executed on very large databases of several petabyte in size. The main important methods of course count algorithm includes: For every target array of counters are maintained in memory. For large datasets it cannot scale well for user specified threshold value T on the sorted relation R on the disk. The Sampling and Bucket Counting are the building blocks for the proposed algorithm Sampling - This technique [15] samples a small number of records from the relation R, aggregates and retrieves the records that relatively go above the T. Bucket counting - Rather than allocating a counter for each value, a counter is allocated for a group of different values. High frequency is needed to split the values into groups. These building blocks generates False Positive Values that are considered as candidates for the final result but do not exceed the T. Sampling also causes False Negatives Values that should be in the final result but are not found as candidates. The common goal of all the algorithms [8] is to reduce the number of false positives. They can generally be described using these stages: Compute a candidate set f using sampling. Execute bucket counting over R with some dependency on f. Select the values that exceed the threshold using another scan of R. 1.1 THE PROPERTIES OF IB QUERY ALGORITHMS The choice of an algorithm for solving the IB query depends on its properties. The desired properties of the IB query algorithms are: Number of passes: The number of passes required over the data must be less so that the evaluation time of the query is also becomes less. Determinism: The running time of the algorithm can be deterministic or randomized and it can be deterministic in nature. Accuracy: This represents the error of the output of the algorithm. Some algorithms produce exact result 66 PAGE
2 (i.e. zero error) and some algorithms produce approximate results with bounded or unbounded error. Sensitivity to the data order: Some algorithms are sensitive to the order of the data and that may produce very inaccurate result for some orders. Memory requirement: When memory is consumed, small memory requirement is preferred. Small requirements are preferable. Also, we can derive a tight upper bound on this requirement for some algorithms. Management: Ease of tuning the hard to manage large parameters of the algorithm. Disk space: The extendable disk space required by an algorithm. Data distribution: Some algorithms are designed for certain distributions and they may fail for other distributions. Parallelization property: Some algorithms are inherently highly parallelizable. In contrast, other algorithms are hard to parallelize. Materialization: Some algorithms must work on materialized relations, which may incur additional disk space and increase the overall cost of the algorithm. 1.2 DISTRIBUTED DATABASE SYSTEMS In recent years the demands for distributed database systems (DDSs) have been increased. The basic reason for this is decreasing of hardware cost and increasing of software implementation cost. Some of the important advantages presented by distributed systems are reliability, availability, easily maintainable, and cheap. Some of these advantages raised other difficulties in distributed systems (DSs).The main difficulty is to process the queries capably; where the data needed to processes the queries are stored at multiple locations. Retrieving data from multiple locations involves transmission of data through networks. The challenging part in DS is to design an efficient query processing techniques to decrease the overall cost including communication cost. The query optimization difficulty can be categorized in to two main approaches: Reduce the transfer cost across the network by decreasing the data transfer rate across it. Use parallel processing method in order to reduce query response time. 1.3 CENTRALIZED DATABASE SYSTEMS The storage space and processing time are the major constraints in centralized database [18].Conventional data cubes compute the complete group-by partitions for every combination of possible group attributes. The major problem with the cube operator is the computation and storage of large size data. The major drawback of the cube computation [3, 4 5, and 6] is it has exponential number of dimensions. The more number of the algorithms are proposed by researchers to handle the space complexity problem are performing pre computation of group by operation on subset attributes or by using online aggregation. Computer analysts are often responsible for computing more number of attribute values by performing aggregate operations on larger databases. An IB query is a unique type aggregation query which computes aggregate values to a user specified T (Threshold) value. The users can extract highly important aggregate values that often carry more significant information for their business. the general representation of an ib query on a relation s (d 1, d 2 d n ) is specified below: select d i, d j d m, agg (*) from r group by d i, d j d m having agg (*) > = t // agg represents aggregation functions Where d i, d j d m are representing a subset of attributes in Relation r and these are treated as aggregate attributes. The greater than or equal to (>=) symbol is used to represent a comparison predicate. The main focus is on IB query with aggregation function like count, max, min, avg, sum and rank contains a property called as an anti-monotone property. The IB queries have in a fascinating an anti-monotone property for all its aggregate functions and predicates. Suppose the count of a main group is below Threshold values T then the count of any super group also can be below Threshold value T. With increase in datasets size the efficiency of the IB Query goes down drastically in other words IB Queries fail to scale to large datasets and this has been created necessity to discover new type of efficient query evaluation methods in order to process them easily. An IB query is used for extracting data from a particular database under some specified conditional parameter. Consider the following query, SELECT A, B CONT (*) FROM R GROUP BY COUNT (*) >=2 The above listed query used to retrieve count of A and B from relation R using some conditional parameters, which specifies the count of A, B greater than 2 should be selected. The above listed types of queries are called IB queries. An IB query reduces the time of fetching data from large databases. Priority is given to IB Queries because of the aggregate function used with the Query. The use of IB queries becomes so frequent in new database management systems, because of their precise data extraction in a limited time. 67 PAGE
3 The remaining paper is organized as follows; section 2 presents the brief over view of the related work. Section 3 presents evaluation of iceberg queries followed by description of random 1 Bit algorithm in section 4. The Section 5 concludes the paper. II. RELATED WORK In the recent past, the compressed BMP index technique is proposed for evaluation of IB queries [12]. The BMP index is built based on bitmap vector which consists of attribute values. The approach is gaining more popularity for column and row oriented databases [13]. In this approach, they are exposed the property of BMP index and invented efficient evaluation strategy of eliminating the scanning and processing of bigger database tables. This approach definitely speeds up IB query evaluation. They finished that compressed Bitmap indexing technique is more capable than existing algorithms of IB Query evaluation. The IB queries are special type of aggregate queries which computes aggregate functions over columns based on user defined threshold value T. These IB queries results give very useful business information to decision makers. These queries can give an opportunity to minimize its processing and execution time [14, 15]. III. ICEBERG QUERY EVALUATION The Iceberg Query Evaluation Fong et al firstly proposed probabilistic algorithms and followed by it, hybrid and bucket algorithm are developed for evaluation of IB queries efficiently [16, 17 ]. The column oriented databases are growing vertically to accommodate required space for user data, whereas row oriented databases are arranging the data in row wise [6, 7]. The column oriented databases are more powerful over row oriented databases. The sampling and course counting algorithms are basic building blocks for evaluation of IB queries and the query size is measured in order to predict IB results. This reduces their query execution time and memory size [9] but the results produced false positives and false negatives. To avoid the problems in above algorithms, hybrid strategies have been developed to achieve optimized query execution. Bin He et al [16] proposed a method to evaluate IB queries by reducing CPU and memory requirements greatly by using compressed bitmap indexing technique. A bitmap for an attribute is represented as m x n matrix. The m is a row that represents number of tuples in the database and n is the number of columns that denotes the different attribute values. The presence of an attribute in the database is represented as 1 and 0 otherwise. The bitmap original vectors are stored in the existing free memory by using compressed bitmap indexing technique. The partition algorithms, Jinuk Bae[18] et al are proposed a partition algorithm that selects the candidates by logically partitioning the given relation R and delay the partitioning to use more efficiently and buckets are filled with all useful candidates.it is observed that this algorithm is suffering from memory size, data order and data distribution. IV. THE RANDOM BIT OPERATION (RBO) ALGORITHM This algorithm takes two bitmap vectors according to count of 1 Bit values and priority Queue is created with help of the vectors. Based on the priority value, two vectors are picked up and performed AND operation between them. Algorithm: Random 1bit operation in the BMP vector. Input: Random1bitoperator (BMP vector BMP1, BMP vector BMP2, pos n) Output: IB_Results Step 1: Select BMP1, BMP2 Step 2: Select n Step 3: Scan for n bits on BMP1, BMP2 Step 4: Set n value Step 5: Execute AND operation R t BMP1 AND BMP2: n Step 6: Reassign BMP1 and BMP2 BMP1 BMP1 XOR R t ; BMP2 BMP2 XOR R t Step 7: if (Rt_1 bitcount > W1) R t IB _Results Step 8: End Figure 4.1: Algorithm for Random 1 bit calculation The proposed method also uses a bitwise AND action between the BMP vectors. The values of the BMP vector are also changed by performing XOR operation on the BMP vectors with the result of AND operation. 68 PAGE
4 R t = P AND Q; P = P MINUS R t ; Q = Q MINUS R t In above, R t is a resultant vector values and P and Q Bitmap vectors. As specified above, the method uses a random 1 bit operation before performing AND operation on Bitmap vectors. The Bitmap vectors are retrieved from the above method. Firstly, pick the n arbitrary 1 bit from the two vectors, the W1 contains n values which are related bitmap vectors (threshold value) and the count 1 bit threshold. But, if W1 1 bit count is greater than many vectors, those vectors whose 1bit count is less can be rejected from BMP vectors. No AND operation is performed on them. Relevant vectors are the vectors towards the random assignment are possible in order to perform AND operations.w1 is the minimum n random value, since if the vectors that are not in the range of W1are not considered as relevant vectors. These irrelevant vectors are pruned from BMP indices. Consider the example Q Q P q 1[ ] p 1 : [01011] p 2 :[ ] q 2 : [ q 2 [11001] p 3 : [ ] q 3 : [ q 3 : [000110] Priority queue is produced as, P Q p1 :[ ] q1: [ ] q 2 : [ q 2 : [11001] p2 :[01011] p 3 :[ ] 0110 q 3 : [000110] W 1 = 2; In the above example p 3 and q 3 are removed from the BMP indices, since they are not satisfying condition specified as W 1. The priority queue p 2 and q 1 are measured as MSB vectors and primarily AND operation is performed on those two vectors. In above, 3 is the random n bit value for the vectors p 1 and q1. Therefore AND operation can be represented as below, Q p 2 P1 :[ ] q 1 : [ ] P p 2 :[01011] q 2 : [11001] p 3 P3 :[ ] q 3 : [000110] Table 4.1: Attributes with their corresponding bit positions 69 PAGE
5 If R t has 1 bit count greater than the W 1, then it is added to the IB results. The R t is used for performing XOR operation on both P and Q attributes. If several vectors possess 1 bit count than the T after XOR operation, then that vector is added to the BMP table. The AND operation is performed again on the resultant Table.The same process is repeated until any one of the vector become empty. By the observation from the above example it can be stated that, with use of random 1bit calculation method the AND operations are drastically reduced. V. CONCLUSION In this research work, we presented an algorithm to evaluate iceberg queries using bitmap indices. The proposed approach is dynamic in nature and flexible enough to reduce the vector values with respect to dimensions. The outcome of the current research is the efficient Iceberg Query Evaluation algorithm that is capable of handling massive datasets. Empirical results imply the practical benefits of the proposed algorithm are giving better performance in evaluating iceberg queries. REFERENCES [1] M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J.D.Ullman, Computing IB Queries Efficiently. In Gupta et al. [GSW98], pp [2] S. Chaudhuri and U. Dayal, An overview of data warehousing and OLAP technology, ACM SIGMOD Record, 26(1):65 74, March [3] J. Han, J. Pei, G. Dong, K. Wang, Efficient Computation of IB Cubes with Complex Measures,SIGMOD 01 [4] K.Ross and D. Srivastava, Fast computation of sparse datacubes, VLDB 97. [5] K.S. Beyer and R. Ramakrishna, Bottom-Up Computation of Sparse and IB CUBES, Proceedings of ACM SIGMOD International Conference Management of Data, pp , [6] Papadis, D, Kalnis P, Zhang, J and Tao Y Efficient OLAP Operations in spatial data warehouses, Proceedings of the 7th International Symposium on Advances in Spatial and Temporal Databases, Springer, pp , [7] S. Chaudhuri and L. Gravano, Evaluating Top-k Selection Queries. In Atkinson et al. [AOV+99], pp [8] F. Deiege and T.B. Pedersen, Position List Word Aligned Hybride: Optimizing Space and Performance for Compressed BMPs, Proceedings of International Conference Extending Database Technology (EDBT), pp , [9] Amr El-Helw, Kenneth A.Ross, Bishwaranjan Bhattacharjee, ChristianA. Lang, and George Mihaila, Column-oriented query processing for row stores, In Proceedings of the International Workshop on Data Warehousing and OLAP, pages 67 74, [10] H.E. Blok, Database Optimization Aspects for Information Retrieval, Ph.D. thesis, University of Twente, Enschede, The Netherlands, April 2002, ISBN X. [11] R.S. Choenni, M.L. Kersten, A. Saad, and J. van den Akker, A Framework for Multi-Query Optimization, Report CS-R9638,CWI, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands, [12] M.-C. Wu and A. P. Buchmann, Encoded BMP indexing for Data Warehouses. In Proceedings of ICDE 1998, pages , IEEE Computer Society, [13] P. Flajolet and G.N. Martin, Probabilistic counting algorithms for database applications. Journal of Computer System Sciences, 31: , [14] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. and Uthurusam, R. (Eds.), Advances in Knowledge Discovery and Data Mining, The AAAI Press/the MIT Press, [15] K.Stockinger, D. Duellmann, W. Hoschek, and E. Schikuta, Improving the performance of high energy physics analysis through BMP indices. In Proceedings of DEXA 2000, September [16] K.-L. Wu and P. Yu, Range-based BMP indexing for high cardinality attributes with skew, Technical Report RC 20449, Watson Research Division, Yorktown Heights, New York, May [17] P.E.O Neil and D.Quass, Improved Query Performance with Variant Indexes. In SIGMOD Conference, pp.38 49, [18] Jinuk Bae and Sukho Lee, Partitioning Algorithms for the Computation of Average IB Queries, Springer-Verlag, ISBN: , pp , PAGE
6 1 Mr.M.Laxmaiah is a Research Scholar in Jawaharlal Nehru Technological University, Kukatpally, Hyderabad. He is currently working as Professor of Department of Computer Science and Engineering in CMR Engineering College, Kandlakoya (V), Medchal (M), Hyderabad, Telangana, India. He has 18 Years of experience in Teaching and 5 Years of experience in Research field. He has 12 research Publications at International Journals and Conferences. His areas of interest include Database Management Systems, Cloud Computing, Compiler Design and Data warehousing & Data Mining. 2 Dr.A.Govardhan did his BE in Computer Science and Engineering from Osmania University College of Engineering, Hyderabad in 1992.M.Tech from Jawaharlal Nehru University, Delhi in 1994 and PhD from Jawaharlal Nehru Technological University, Hyderabad in 2003.He is presently Director of School of Information Technology, JNTUH, Kukatpally, Hyderabad. He has awarded many awards. He has guided more than 130 M.Tech projects and number of MCA and B.Tech projects. He has 200+ research publications at International / National Journals and Conferences. His areas of interest include Databases, Data Warehousing &Data Mining, Information Retrieval, Computer Networks, Image processing and Object Oriented Technologies. 71 PAGE
Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries
Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries Kale Sarika Prakash 1, P. M. Joe Prathap 2 1 Research Scholar, Department of Computer Science and Engineering, St. Peters
Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS
CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department
PartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE [email protected] 2
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor
Indexing Techniques for Data Warehouses Queries. Abstract
Indexing Techniques for Data Warehouses Queries Sirirut Vanichayobon Le Gruenwald The University of Oklahoma School of Computer Science Norman, OK, 739 [email protected] [email protected] Abstract Recently,
International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 11, November 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
Data Mining and Database Systems: Where is the Intersection?
Data Mining and Database Systems: Where is the Intersection? Surajit Chaudhuri Microsoft Research Email: [email protected] 1 Introduction The promise of decision support systems is to exploit enterprise
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Bitmap Index as Effective Indexing for Low Cardinality Column in Data Warehouse
Bitmap Index as Effective Indexing for Low Cardinality Column in Data Warehouse Zainab Qays Abdulhadi* * Ministry of Higher Education & Scientific Research Baghdad, Iraq Zhang Zuping Hamed Ibrahim Housien**
DATA WAREHOUSING AND OLAP TECHNOLOGY
DATA WAREHOUSING AND OLAP TECHNOLOGY Manya Sethi MCA Final Year Amity University, Uttar Pradesh Under Guidance of Ms. Shruti Nagpal Abstract DATA WAREHOUSING and Online Analytical Processing (OLAP) are
Evaluation of Bitmap Index Compression using Data Pump in Oracle Database
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 3, Ver. III (May-Jun. 2014), PP 43-48 Evaluation of Bitmap Index Compression using Data Pump in Oracle
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations
Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets
Turkish Journal of Engineering, Science and Technology
Turkish Journal of Engineering, Science and Technology 03 (2014) 106-110 Turkish Journal of Engineering, Science and Technology journal homepage: www.tujest.com Integrating Data Warehouse with OLAP Server
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
How To Encrypt Data With A Power Of N On A K Disk
Towards High Security and Fault Tolerant Dispersed Storage System with Optimized Information Dispersal Algorithm I Hrishikesh Lahkar, II Manjunath C R I,II Jain University, School of Engineering and Technology,
Survey On: Nearest Neighbour Search With Keywords In Spatial Databases
Survey On: Nearest Neighbour Search With Keywords In Spatial Databases SayaliBorse 1, Prof. P. M. Chawan 2, Prof. VishwanathChikaraddi 3, Prof. Manish Jansari 4 P.G. Student, Dept. of Computer Engineering&
International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518
International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,
Multi-dimensional index structures Part I: motivation
Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for
KEYWORD SEARCH IN RELATIONAL DATABASES
KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to
Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
Data Warehouse: Introduction
Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of Base and Mining Group of base and data mining group,
New Approach of Computing Data Cubes in Data Warehousing
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 14 (2014), pp. 1411-1417 International Research Publications House http://www. irphouse.com New Approach of
QOS Based Web Service Ranking Using Fuzzy C-means Clusters
Research Journal of Applied Sciences, Engineering and Technology 10(9): 1045-1050, 2015 ISSN: 2040-7459; e-issn: 2040-7467 Maxwell Scientific Organization, 2015 Submitted: March 19, 2015 Accepted: April
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis
Query Optimization Approach in SQL to prepare Data Sets for Data Mining Analysis Rajesh Reddy Muley 1, Sravani Achanta 2, Prof.S.V.Achutha Rao 3 1 pursuing M.Tech(CSE), Vikas College of Engineering and
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
The Cubetree Storage Organization
The Cubetree Storage Organization Nick Roussopoulos & Yannis Kotidis Advanced Communication Technology, Inc. Silver Spring, MD 20905 Tel: 301-384-3759 Fax: 301-384-3679 {nick,kotidis}@act-us.com 1. Introduction
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Search Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
SQL Server 2012 Performance White Paper
Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.
Data Mining Analytics for Business Intelligence and Decision Support
Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Enhanced Boosted Trees Technique for Customer Churn Prediction Model
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction
Application of Predictive Analytics for Better Alignment of Business and IT
Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD [email protected] July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker
Topics in basic DBMS course
Topics in basic DBMS course Database design Transaction processing Relational query languages (SQL), calculus, and algebra DBMS APIs Database tuning (physical database design) Basic query processing (ch
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 29-1
Slide 29-1 Chapter 29 Overview of Data Warehousing and OLAP Chapter 29 Outline Purpose of Data Warehousing Introduction, Definitions, and Terminology Comparison with Traditional Databases Characteristics
Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets
Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Pramod S. Reader, Information Technology, M.P.Christian College of Engineering, Bhilai,C.G. INDIA.
EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM
INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM Macha Arun 1, B.Ravi Kumar 2 1 M.Tech Student, Dept of CSE, Holy Mary
Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
2009 Oracle Corporation 1
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,
Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner
24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi
Mining an Online Auctions Data Warehouse
Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
A Survey on Association Rule Mining in Market Basket Analysis
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey
Clustering Data Streams
Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin [email protected] [email protected] [email protected] Introduction: Data mining is the science of extracting
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
ENHANCEMENTS TO SQL SERVER COLUMN STORES. Anuhya Mallempati #2610771
ENHANCEMENTS TO SQL SERVER COLUMN STORES Anuhya Mallempati #2610771 CONTENTS Abstract Introduction Column store indexes Batch mode processing Other Enhancements Conclusion ABSTRACT SQL server introduced
Data Warehousing and OLAP Technology for Knowledge Discovery
542 Data Warehousing and OLAP Technology for Knowledge Discovery Aparajita Suman Abstract Since time immemorial, libraries have been generating services using the knowledge stored in various repositories
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,
MUTI-KEYWORD SEARCH WITH PRESERVING PRIVACY OVER ENCRYPTED DATA IN THE CLOUD
MUTI-KEYWORD SEARCH WITH PRESERVING PRIVACY OVER ENCRYPTED DATA IN THE CLOUD A.Shanthi 1, M. Purushotham Reddy 2, G.Rama Subba Reddy 3 1 M.tech Scholar (CSE), 2 Asst.professor, Dept. of CSE, Vignana Bharathi
A Design and implementation of a data warehouse for research administration universities
A Design and implementation of a data warehouse for research administration universities André Flory 1, Pierre Soupirot 2, and Anne Tchounikine 3 1 CRI : Centre de Ressources Informatiques INSA de Lyon
A Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
Whitepaper. Innovations in Business Intelligence Database Technology. www.sisense.com
Whitepaper Innovations in Business Intelligence Database Technology The State of Database Technology in 2015 Database technology has seen rapid developments in the past two decades. Online Analytical Processing
Review. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies
Review Data Warehousing CPS 216 Advanced Database Systems Data warehousing: integrating data for OLAP OLAP versus OLTP Warehousing versus mediation Warehouse maintenance Warehouse data as materialized
Fast Contextual Preference Scoring of Database Tuples
Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation
Dimensional Modeling for Data Warehouse
Modeling for Data Warehouse Umashanker Sharma, Anjana Gosain GGS, Indraprastha University, Delhi Abstract Many surveys indicate that a significant percentage of DWs fail to meet business objectives or
Domain Classification of Technical Terms Using the Web
Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using
A Time Efficient Algorithm for Web Log Analysis
A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,
PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor
PostgreSQL Business Intelligence & Performance Simon Riggs CTO, 2ndQuadrant PostgreSQL Major Contributor The research leading to these results has received funding from the European Union's Seventh Framework
Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.
Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Previous Lecture 13 Indexes for Multimedia Data 13.1
CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB
CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB Badal K. Kothari 1, Prof. Ashok R. Patel 2 1 Research Scholar, Mewar University, Chittorgadh, Rajasthan, India 2 Department of Computer
NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE
www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
An Application of Visual Cryptography To Financial Documents
An Application of Visual Cryptography To Financial Documents L. W. Hawkes, A. Yasinsac, C. Cline Security and Assurance in Information Technology Laboratory Computer Science Department Florida State University
In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller
In-Memory Databases Algorithms and Data Structures on Modern Hardware Martin Faust David Schwalb Jens Krüger Jürgen Müller The Free Lunch Is Over 2 Number of transistors per CPU increases Clock frequency
CS54100: Database Systems
CS54100: Database Systems Date Warehousing: Current, Future? 20 April 2012 Prof. Chris Clifton Data Warehousing: Goals OLAP vs OLTP On Line Analytical Processing (vs. Transaction) Optimize for read, not
Enhanced data mining analysis in higher educational system using rough set theory
African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review
Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in SQL
International Journal of Computer Applications in Engineering Sciences [VOL III, ISSUE I, MARCH 2013] [ISSN: 2231-4946] Creation of Datasets for Data Mining Analysis by Using Horizontal Aggregation in
Jagir Singh, Greeshma, P Singh University of Northern Virginia. Abstract
224 Business Intelligence Journal July DATA WAREHOUSING Ofori Boateng, PhD Professor, University of Northern Virginia BMGT531 1900- SU 2011 Business Intelligence Project Jagir Singh, Greeshma, P Singh
DISTRIBUTION OF DATA SERVICES FOR CORPORATE APPLICATIONS IN CLOUD SYSTEM
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE DISTRIBUTION OF DATA SERVICES FOR CORPORATE APPLICATIONS IN CLOUD SYSTEM Itishree Boitai 1, S.Rajeshwar 2 1 M.Tech Student, Dept of
IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS
Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS
CHAPTER - 5 CONCLUSIONS / IMP. FINDINGS In today's scenario data warehouse plays a crucial role in order to perform important operations. Different indexing techniques has been used and analyzed using
Rethinking SIMD Vectorization for In-Memory Databases
SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest
American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar
A Comprehensive Study on Data Warehouse, OLAP and OLTP Technology Namrata 1, Dr. Saket Bihari Singh 2 Research scholar (PhD), Professor Computer Science, Magadh University, Gaya, Bihar Abstract: Data warehouse
