Improved Data mining approach to find Frequent Itemset Using Support count table

Ramratan Ahirwal 1, Neelesh Kumar Kori 2 and Dr. Y. K. Jain 3

1 Samrat Ashok Technological Institute, Vidisha (M.P.), India
2 Samrat Ashok Technological Institute, Vidisha (M.P.), India
3 Samrat Ashok Technological Institute, Vidisha (M.P.), India

Abstract: Mining frequent itemsets has been widely studied over the last decade. Past research focused on mining frequent itemsets from static databases. In many new applications, mining time series and data streams is now an important task. Over the last decade there have been mainly two kinds of algorithms for frequent pattern mining. One is Apriori, based on generating and testing candidates; the other is FP-growth, based on divide and conquer, which has been widely used in static data mining. With the new requirements of data mining, however, frequent pattern mining is no longer restricted to that scenario. In this paper we focus on a new mining algorithm that finds frequent patterns in a single scan of the database, with no candidate generation. To achieve this goal, our algorithm employs one table that retains the support count of each itemset. For a static database the table is virtual, meaning that it is generated whenever frequent itemsets are required, and it may also be useful for time series databases. Our algorithm is therefore suitable for static as well as dynamic data mining. Results show that the algorithm is useful in today's data mining environment.

Keywords: Apriori, Association Rule, Frequent Pattern, Data Mining

1. INTRODUCTION

Mining data streams is a very important research topic and has recently attracted a lot of attention, because in many cases data is generated by external sources so rapidly that it may become impossible to store it and analyze it offline. Moreover, in some cases streams of data must be analyzed in real time to provide information about trends, outlier values or regularities that must be signaled as soon as possible. The need for online
computation is a notable challenge with respect to classical data mining algorithms [1], [2]. Important application fields for stream mining are as diverse as financial applications, network monitoring, security problems, telecommunication networks, Web applications, sensor networks, analysis of atmospheric data, etc.

Innovations in computer science have made it possible to acquire and store enormous amounts of data digitally in databases, currently gigabytes or terabytes in a single database and even more in the future. Many fields and systems of human activity have become increasingly dependent on collected, stored, and processed information. However, the abundance of the collected data makes it laborious to find the information essential for a specific purpose.

Data mining is the analysis of (often large) observational datasets, drawn from databases, data warehouses or other large repositories and often incomplete, noisy, ambiguous or random, to find unsuspected relationships and to summarize the data in ways that are both understandable and useful to the data owner. It comprises data extraction, cleaning and transformation, analysis, and other modelling steps, and automatically discovers the patterns and interesting knowledge hidden in large amounts of data, which helps us make decisions based on a wealth of data. In software development, the challenge lies in how to collect, analyze, and mine the useful information hidden in the data produced by communication among developers and by staff interaction with managers, and then use that knowledge to make decisions.

Colleges currently use database technology to manage their libraries. Its main purpose is to facilitate the procurement of books, cataloging, and circulation management. To better satisfy the needs of readers, we must explore those needs and proactively provide the information readers require. Most current library evaluation techniques focus
on frequencies and aggregate measures; these statistics hide underlying patterns. Discovering these patterns is the key to understanding the use of library services [3]. Data mining is applied to library operations [4].

Volume 1, Issue 2 July-August 2012 Page 195

With the fast development of technology and the growing requirements of users, the dynamic elements in data mining are becoming more important, including dynamic databases and knowledge bases, users' interestingness, and data varying with time and space. In order to solve problems such as low effectiveness, high randomness and hard implementation in dynamic mining, more research on dynamic data mining has been done. In [5][6], an evolutionary immune mechanism was proposed based on the fact that the elements involved in the domains could be modeled as the ones in immune models. It focused on how to utilize the relationship between antigens and antibodies in dynamic data mining such as

incremental mining. However, the immune mechanism alone and its associated algorithm run effectively only in incremental situations rather than in others. Its performance and function have to be improved when used in more complex and dynamic environments such as the Web. We provide here an overview of executing data mining services and association rules. The rest of this paper is arranged as follows: Section 2 introduces data mining and KDD; Section 3 presents the literature review; Section 4 describes the proposed work; Section 5 gives the result analysis of the proposed algorithm; Section 6 presents the conclusion and outlook.

2. DATA MINING AND KDD

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Several algorithms have been devised for this [5]; the process is shown in Figure 1. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

At an abstract level, the KDD field is concerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easily) into other
forms that might be more compact (for example, a short report), more abstract (for example, an approximation or model of the process that generated the data), or more useful (for example, a predictive model for estimating the value of future cases). At the core of the process is the application of specific data-mining methods for pattern discovery and extraction.

The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-care industry, it is common for specialists to periodically analyze current trends and changes in health-care data, say, on a quarterly basis. The specialists then provide a report detailing the analysis to the sponsoring health-care organization; this report becomes the basis for future decision making and planning for health-care management. In a totally different type of application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloging such geologic objects of interest as impact craters. Be it science, marketing, finance, health care, retail, or any other field, the classical approach to data analysis relies fundamentally on one or more analysts becoming intimately familiar with the data and serving as an interface between the data and the users and products. For these (and many other) applications, this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is completely impractical in many domains.

Databases are increasing in size in two ways: (1) the number $N$ of records or objects in the database and (2) the number $d$ of fields or attributes to an object.

Figure 1: Data Mining Algorithm

Databases containing on the order of $N = 10^9$ objects are becoming increasingly common, for example, in the astronomical sciences. Similarly, the number of fields $d$ can easily be on the order of $10^2$ or even $10^3$, for example, in medical diagnostic applications.
Who could be expected to digest millions of records, each having tens or hundreds of fields? We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially. The need to scale up human analysis capabilities to handle the large number of bytes that we can collect is both economic and scientific. Businesses use data to gain competitive advantage, increase efficiency, and provide more valuable services. Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Because computers have enabled humans to gather more data than we can digest, it is only natural to turn to computational techniques to help us

unearth meaningful patterns and structure from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital information era made a fact of life for all of us: data overload.

3. LITERATURE REVIEW

In 2011, Jinwei Wang et al. [12] proposed, to overcome the shortcomings and deficiencies of existing interpolation techniques for missing data, an interpolation technique for missing context data based on Time-Space Relationship and Association Rule Mining (TSRARM). It performs spatial and time series analysis on sensor data and generates strong association rules to interpolate missing data. A simulation experiment verifies the rationality and efficiency of TSRARM through the acquisition of temperature sensor data.

In 2011, M. Chaudhary et al. [13] proposed a new and more optimized algorithm for online rule generation. The advantage of this algorithm is that the graph it generates has fewer edges than the lattice used in the existing algorithm. The proposed algorithm also generates all the essential rules, and no rule is missed. The use of non-redundant association rules helps significantly in reducing irrelevant noise in the data mining process. This graph-theoretic approach, called the adjacency lattice, is crucial for online mining of data. The adjacency lattice can be stored either in main memory or in secondary memory. The idea of the adjacency lattice is to pre-store a number of large itemsets in a special format, which reduces the disk I/O required to answer a query.

In 2011, Fu et al. [14] observed that real-time monitoring data mining has become a necessary means of improving operational efficiency, economic safety and fault detection in power plants. Based on the data mining algorithm of interactive association rules, and taking full advantage of the association characteristics of real-time test-point data during steam turbine operation, the principle of mining quantitative association rules in parameters is
put forward for the real-time monitoring data of steam turbines. Analysis of the practical operating results of a steam turbine with the data mining method based on interactive rules shows that the method can supervise steam turbine operation and condition monitoring, and can provide a model reference and decision support for fault diagnosis and condition-based maintenance.

In 2011, Xin et al. [15] used association rule learning to process statistical data on the private economy and analyzed the results to improve the quality of those statistics. The article also provides some exploratory comments and suggestions about the application of association rule mining in private economy statistics.

4. PROPOSED WORK AND ALGORITHM

Frequent itemset mining was introduced in [2] by Agrawal and Srikant. To facilitate our discussion, we give the formal definitions as follows. Let $I = \{i_1, i_2, i_3, \ldots, i_m\}$ be a set of items. An itemset $X$ is a subset of $I$. $X$ is called a $k$-itemset if $|X| = k$, where $k$ is the size (or length) of the itemset. A transaction $T$ is a pair $(tid, X)$, where $tid$ is a unique identifier of a transaction and $X$ is an itemset. A transaction $(tid, X)$ is said to contain an itemset $Y$ iff $Y \subseteq X$. A dataset $D$ is a set of transactions. Given a dataset $D$, the support of an itemset $X$, denoted $Supp(X)$, is the fraction of transactions in $D$ that contain $X$. An itemset $X$ is frequent if $Supp(X)$ is no less than a given threshold $S_0$. An important property of frequent itemsets, called the Apriori property, is that every nonempty subset of a frequent itemset must also be frequent. The problem of finding frequent itemsets can be specified as: given a dataset $D$ and a support threshold $S_0$, find every itemset whose support in $D$ is no less than $S_0$. It is clear that the Apriori algorithm needs at most $l + 1$ scans of database $D$ if the maximum size of a frequent itemset is $l$. In the context of data streams, to avoid disk access, previous studies focus on finding the
approximation of frequent itemsets with a bound on space complexity. When mining frequent itemsets in static databases, all the frequent itemsets and their support counts derived from the original database are retained. When transactions are added or expire, the support counts of the frequent itemsets contained in them are recomputed. By reusing the retained frequent itemsets and their counts, the number of candidate itemsets generated during the mining process can be reduced. However, rescanning the original database is still required, because non-frequent itemsets can become frequent after the database is updated. Therefore these methods cannot work without seeing the entire database and cannot be applied to data streams.

In our approach we introduce a new method that requires only a single scan of database $D$ to count the support of each itemset; no candidate generation or pruning is required to find the frequent itemsets. Our algorithm therefore reduces disk access time and finds the frequent itemsets directly by using the support count table. The method is applicable to static databases as well as to dynamic databases, provided the table is created at the initial stage.

4.1 Support Count Table

As stated previously, every itemset $X$ of a transaction $T$ is a subset of $I$ ($X \subseteq I$), and a set of such transactions is the database $D$. So in database $D$ every transaction itemset $X$ is an element of $2^I \setminus \{\emptyset\}$, where $2^I$ is the power set of $I$. The power set of $I$ contains all the subsets of $I$ that may appear as transaction itemsets in the transaction database $D$, except the empty set $\emptyset$. Hence our algorithm employs one table, named the support count table. That table

is assumed to be virtual and is created when required to find the frequent itemsets. The size of the table is $(2^{|I|} - 1) \times 2$: one row for each nonempty itemset and two fields, the itemset and its support count. In this table we enter the frequency count of each itemset observed in the transaction database. The frequency count of an itemset is the number of occurrences of that itemset in the transactional database $D$. The table is generated and may be stored in cache memory until the frequent itemsets have been found. The generated table may be used for a stationary database as well as for a time series database. The layout of the table is given in Table 1.

4.2 Entries in the Support Count Table

The support count table may be used to find frequent itemsets from static datasets as well as from streaming datasets, where a windowing concept is used. For a static database the table may be created when we want to analyze the database, by a single scan of the database that makes an entry in the table for every transaction. Initially the support count of every itemset in the table is set to zero. If the database $D$ is static (fixed), we update the table by scanning $D$ once: for each transaction itemset $X$ in $D$, we find the corresponding itemset in the table and increment its count. In this way we make an entry for each transaction $T$. The table may then be retained in memory until the observation is complete, so that added or expired transactions only require an update of the table.

If the database $D$ is a random or streaming database, the table may be even more useful, because every incoming or expired transaction only requires updating the table by incrementing or decrementing the count of the corresponding itemset. The table can be stored efficiently and used to find the frequent itemsets or association rules. In this approach we are not required to keep the database on disk; it is only necessary to save the table and
use it whenever necessary to find frequent itemsets.

Table 1: Support count table ST

No.             Itemset (A)     Support count (S_count)
1
...
2^|I| - 1

For example, let $I = \{i_1, i_2, i_3, i_4\}$ be the set of items. The itemsets that may be generated from $I$ are $\{i_1\}, \{i_2\}, \{i_3\}, \ldots, \{i_1, i_2, i_3, i_4\}$. Every transaction itemset $X$ that may occur in database $D$ is then a subset of $I$ and equal to one of these itemsets. The table is created initially as given below.

Table 2: Initial support count table for $I = \{i_1, i_2, i_3, i_4\}$

No.   Itemset (A)       Support count (S_count)
1     {i1}              0
2     {i2}              0
3     {i3}              0
4     {i4}              0
5     {i1,i2}           0
6     {i1,i3}           0
7     {i1,i4}           0
8     {i2,i3}           0
9     {i2,i4}           0
10    {i3,i4}           0
11    {i1,i2,i3}        0
12    {i1,i2,i4}        0
13    {i1,i3,i4}        0
14    {i2,i3,i4}        0
15    {i1,i2,i3,i4}     0

4.3 Proposed Method to Find Frequent Itemsets

The proposed method may be used for static as well as streaming databases to find frequent itemsets. It employs the support count table, which requires scanning the database only once to make the entry for each transaction. The table retains this information until the observation is complete or the frequent itemsets have been found. When transactions are added to the dataset or expire from it, the table is updated at the same time. The updated support count table holds the frequency count of each itemset. To find the frequent itemsets for any threshold value we scan the table, not the database. Whereas Apriori requires $l + 1$ scans of the dataset and generates candidates to find the frequent sets, our approach needs only a single scan of the database and no candidate generation.

The table holds the frequency count of every itemset, but not its total support count. The frequency count of an itemset is the number of transactions in $D$ whose itemset is exactly that itemset, so to decide whether an itemset is frequent we must find its total support count. The total support count of an itemset is the count of the
occurrences of all items of that itemset together in the transactions of $D$, that is, the number of transactions that contain the itemset. In our scheme this total count is calculated by scanning the table. The total support count found is compared with the threshold $S_0$; if the count is no less than the threshold, the itemset is included in the frequent set. This procedure is repeated for every itemset to find the frequent ones.

Algorithm: To find frequent itemsets
Input: A database D and the support threshold S_0
Output: frequent itemsets F_itemset
Method

Step 1:   Scan the transaction database D and update the support count table ST as given in Section 4.2; F_itemset = { }
Step 2:   for (i = 1; i < 2^|I|; i++)      // for each itemset A_i in ST repeat the steps
                                           // 2^|I| gives the total number of elements in the power set of I
              T_count = 0;                 // total count
Step 3:       for (j = 1; j < 2^|I|; j++)  // repeat step 3 to find the total count
Step 3.1:         if A_i ⊆ A_j then T_count = T_count + S_count(j)
Step 4:       if (T_count ≥ S_0) then F_itemset = F_itemset ∪ A_i
Step 5:   Go to step 2
Step 6:   End

To better explain our algorithm, we now consider an example. Let I = {10, 20, 30, 40} be the set of four items, and let the threshold be 2. The total number of transactions in D is 15. The transactions of D are given below:

tid   Transaction
1     {10}
2     {10,20}
3     {30,40}
4     {10,20,30,40}
5     {10,30}
6     {10,30}
7     {30,40}
8     {20,30,40}
9     {20,30,40}
10    {10,20,30}
11    {20,30}
12    {40}
13    {20,30}
14    {10,20,30}
15    {10}

Step 1: Scanning the database yields the support count table given in Table 3.
Step 2: To find the frequent itemsets we use the support count table given below.

Table 3: Frequency count for the above example

No.   Itemset (A)       Support count (S_count)
1     {10}              2
2     {20}              0
3     {30}              0
4     {40}              1
5     {10,20}           1
6     {10,30}           2
7     {10,40}           0
8     {20,30}           2
9     {20,40}           0
10    {30,40}           2
11    {10,20,30}        2
12    {10,20,40}        0
13    {10,30,40}        0
14    {20,30,40}        2
15    {10,20,30,40}     1

To check whether the itemset {10} is frequent, we obtain its total support count by scanning the support count table: the rows whose itemsets contain {10} have counts 2, 1, 2, 0, 2, 0, 0 and 1, so the total support of {10} is 8. This value is compared with the threshold value 2; since the total count is not less than the threshold, {10} is a frequent itemset and is included in F_itemset. This process is repeated for every itemset. In this way we obtain every frequent itemset using the support count table. The frequent itemsets for the given dataset are:

F_itemset = {{10}, {20}, {30}, {40}, {10,20}, {10,30}, {20,30}, {20,40}, {30,40}, {10,20,30}, {20,30,40}}

5. RESULT ANALYSIS

To study the performance of our
proposed algorithm, we have done several experiments. The experimental environment is an Intel Core processor running the Windows XP operating system. The algorithm is implemented in Java using NetBeans 7.1. The parameters have the following meanings: D is the transaction database, I is the number of items in the transactions, and S_0 is the minimum support. Table 4 shows the execution time in seconds when I = 5, the transactional database D scales up from 50 to 1000 transactions, and the minimum support S scales up from 2 to 8. We see from the table

that when we scale up the minimum support along a row, the execution time decreases linearly, and when we scale up the database D, the time increases, though not linearly.

[Table 4: Execution time (s) when D scales up from 50 to 1000 and S scales up from 2 to 8. Headers: No. of transactions; different minimum support (S).]

[Figure 2: Execution time (s), MIN support (S_0 = 2). Axis label: Execution Time in Sec.]

Figure 2 shows that the algorithm execution time (for minimum support S_0 = 2, I = 5) increases almost linearly with increasing dataset size. It can be concluded that our algorithm scales well.

[Figure 3: Execution time (s), Transaction database (D = 200).]

[Figure 4: Comparison of execution time (s) for MIN support (S_0 = 2) with the algorithm given in reference [16].]

Figure 4 shows the comparison of the execution time of our proposed algorithm with S_0 = 2 as the database D scales up from 50 to 175 transactions. The comparison shows that our approach performs somewhat better than the method proposed in reference [16].

To examine the scalability of our algorithm further, we increased the dataset D from 1000 to 6000 transactions with the same parameters (minimum support S_0 = 2, I = 5); the result is given in Figure 5.

[Figure 5: Scale-up: Number of transactions. Axis label: No. of Transactions.]

6. CONCLUSION AND OUTLOOK

Data mining is the exploration of knowledge from large sets of data generated as a result of various data processing activities. Frequent pattern mining is a very important task in data mining. Previous approaches to generating frequent sets generally adopt candidate generation and pruning techniques. In this paper we present an algorithm that is useful in data mining and knowledge discovery without candidate generation; our approach reduces disk access time and finds the frequent itemsets directly by using the support count table. The proposed method works well with static datasets by using the support count table, as well as for mining streams, which requires fast, real-time
processing in order to keep up with the high data arrival rate, and where mining results are expected to be available within a short response time. We also validate the algorithm for static datasets through the corresponding graph results.
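
As an illustration of the support count table method summarised above, the following is a minimal Python sketch, not the authors' Java implementation. The class name `SupportCountTable` and its methods are hypothetical names chosen for this example. It also assumes a sparse table that stores only itemsets actually observed (rows with count zero are simply absent), and it uses the "no less than" comparison from the definition of a frequent itemset. The data is the 15-transaction example from Section 4.

```python
from collections import Counter
from itertools import combinations


class SupportCountTable:
    """Hypothetical sketch: maps each distinct transaction itemset to the
    number of transactions exactly equal to it (its frequency count)."""

    def __init__(self):
        # Assumption: sparse storage, only observed itemsets get a row.
        self.counts = Counter()

    def add(self, transaction):
        """Incoming transaction: increment the row of its itemset."""
        self.counts[frozenset(transaction)] += 1

    def expire(self, transaction):
        """Expired transaction: decrement the row of its itemset."""
        key = frozenset(transaction)
        if self.counts[key] == 0:
            raise KeyError("transaction is not in the table")
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]

    def total_support(self, itemset):
        """Sum the frequency counts of all rows that contain the itemset."""
        target = frozenset(itemset)
        return sum(c for row, c in self.counts.items() if target <= row)

    def frequent_itemsets(self, items, threshold):
        """Scan the table (not the database) for every nonempty subset of
        items and keep those whose total support is >= threshold."""
        ordered = sorted(items)
        result = {}
        for k in range(1, len(ordered) + 1):
            for itemset in combinations(ordered, k):
                support = self.total_support(itemset)
                if support >= threshold:
                    result[itemset] = support
        return result


# The 15 transactions of the worked example, with threshold 2.
transactions = [
    {10}, {10, 20}, {30, 40}, {10, 20, 30, 40}, {10, 30},
    {10, 30}, {30, 40}, {20, 30, 40}, {20, 30, 40}, {10, 20, 30},
    {20, 30}, {40}, {20, 30}, {10, 20, 30}, {10},
]

table = SupportCountTable()
for t in transactions:          # the single scan of the database
    table.add(t)

frequent = table.frequent_itemsets({10, 20, 30, 40}, threshold=2)
print(table.total_support({10}))   # 8, matching the worked example
print(len(frequent))               # 11 frequent itemsets
```

The sketch enumerates all $2^{|I|} - 1$ nonempty itemsets and, for each, scans the stored rows, so its cost grows exponentially with the number of items; it is meant only to make the table lookups concrete for a small $I$ such as the four-item example.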

In this paper we improve performance by working without candidate generation. The experiments indicate that the algorithm is faster and somewhat more efficient than the itemset mining algorithm used for comparison.

REFERENCES

[1] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, Mining data streams: A review, ACM SIGMOD Record, vol. 34, no. 1, 2005.
[2] C. C. Aggarwal, Data Streams: Models and Algorithms, Springer, 2007.
[3] S. Nicholson, The Bibliomining Process: Data Warehousing and Data Mining for Library Decision-Making, Information Technology and Libraries, 22(4), 2003.
[4] Jiann-Cherng Shieh and Yung-Shun Lin, Bibliomining User Behaviors in the Library, Journal of Educational Media & Library Sciences, 44(1):36-60, 2006.
[5] Yiqing Qin, Bingru Yang, Guangmei Xu, et al., Research on Evolutionary Immune Mechanism in KDD, In: Proceedings of Intelligent Systems and Knowledge Engineering 2007 (ISKE2007), Chengdu, China, October 2007.
[6] B. R. Yang, Knowledge discovery based on inner mechanism: construction, realization and application, USA: Elliott & Fitzpatrick Inc., 2004.
[7] Binesh Nair and Amiya Kumar Tripathy, Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions, Journal of Emerging Trends in Computing and Information Sciences, Volume 2, No. 7, July 2011.
[8] E. Ramaraj and N. Venkatesan, Bit Stream Mask-Search Algorithm in Frequent Itemset Mining, European Journal of Scientific Research, Vol. 27, No. 2, 2009.
[9] Shilpa and Sunita Parashar, Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data, International Journal of Computer Applications, Volume 31, No. 1, October 2011.
[10] G. Cormode and M. Hadjieleftheriou, Finding frequent items in data streams, In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, 2008.
[11] D. Y. Chiu, Y. H. Wu, and A. L. Chen, Efficient frequent sequence mining by a dynamic strategy switching algorithm, The International Journal on Very Large Data
Bases (VLDB Journal), 18(1), 2009.
[12] Jinwei Wang and Haitao Li, An Interpolation Approach for Missing Context Data Based on the Time-Space Relationship and Association Rule Mining, Multimedia Information Networking and Security (MINES), IEEE, 2011.
[13] M. Chaudhary, A. Rana, and G. Dubey, Online Mining of data to generate association rule mining in large databases, Recent Trends in Information Systems (ReTIS), 2011 International Conference on, IEEE, Dec. 2011.
[14] Fu Jun, Yuan Wen-hua, Tang Wei-xin, and Peng Yu, Study on Monitoring Data Mining of Steam Turbine Based on Interactive Association Rules, Computer Distributed Control and Intelligent Environmental Monitoring (CDCIEM), IEEE, 2011.
[15] Xin Jinguo and Wei Tingting, The application of association rules mining in data processing of private economy statistics, E-Business and E-Government (ICEE), IEEE, 2011.
[16] Weimin Ouyang and Qinhua Huang, Discovery Algorithm for mining both Direct and Indirect weighted Association Rules, International Conference on Artificial Intelligence and Computational Intelligence, IEEE, 2009.

AUTHORS

Mr. Ram Ratan Ahirwal received his B.E. (First) degree in Computer Science & Engineering from GEC Bhopal, University RGPV Bhopal, in 2002. In August 2003 he joined Samrat Ashok Technological Institute, Vidisha (M.P.), as a lecturer in the Computer Science & Engineering Department, and he completed his M.Tech. degree (with honours) as a sponsored candidate in CSE from SATI (Engg. College), Vidisha, University RGPV Bhopal (M.P.), India, in 2009. Currently he is working as an assistant professor in the CSE Department, SATI Vidisha. He has more than 12 publications in various refereed international journals and international conferences to his credit. His areas of interest are data mining, image processing, computer networks, network security and natural language processing.

Neelesh Kumar Kori received his B.E. (First) degree in Information Technology from UIT, BU Bhopal (M.P.), India, in 2008, and he is currently pursuing an M.Tech. from SATI Vidisha (M.P.),
India, in Computer Science & Engineering.

Dr. Y. K. Jain is Head of the CSE Department, SATI (Degree) Engg. College, Vidisha (M.P.), India. He has publications in various refereed international journals and international conferences to his credit. His areas of interest are image processing and computer networks.


More information

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING M.Gnanavel 1 & Dr.E.R.Naganathan 2 1. Research Scholar, SCSVMV University, Kanchipuram,Tamil Nadu,India. 2. Professor

More information

A Survey on Association Rule Mining in Market Basket Analysis

A Survey on Association Rule Mining in Market Basket Analysis International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

More information

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),

More information

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH

A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH 205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

More information

Clinic + - A Clinical Decision Support System Using Association Rule Mining

Clinic + - A Clinical Decision Support System Using Association Rule Mining Clinic + - A Clinical Decision Support System Using Association Rule Mining Sangeetha Santhosh, Mercelin Francis M.Tech Student, Dept. of CSE., Marian Engineering College, Kerala University, Trivandrum,

More information

Mining the Most Interesting Web Access Associations

Mining the Most Interesting Web Access Associations Mining the Most Interesting Web Access Associations Li Shen, Ling Cheng, James Ford, Fillia Makedon, Vasileios Megalooikonomou, Tilmann Steinberg The Dartmouth Experimental Visualization Laboratory (DEVLAB)

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical

More information

Comparison of Data Mining Techniques for Money Laundering Detection System

Comparison of Data Mining Techniques for Money Laundering Detection System Comparison of Data Mining Techniques for Money Laundering Detection System Rafał Dreżewski, Grzegorz Dziuban, Łukasz Hernik, Michał Pączek AGH University of Science and Technology, Department of Computer

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

KNOWLEDGE DISCOVERY and SAMPLING TECHNIQUES with DATA MINING for IDENTIFYING TRENDS in DATA SETS

KNOWLEDGE DISCOVERY and SAMPLING TECHNIQUES with DATA MINING for IDENTIFYING TRENDS in DATA SETS KNOWLEDGE DISCOVERY and SAMPLING TECHNIQUES with DATA MINING for IDENTIFYING TRENDS in DATA SETS Prof. Punam V. Khandar, *2 Prof. Sugandha V. Dani Dept. of M.C.A., Priyadarshini College of Engg., Nagpur,

More information

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI Yudho Giri Sucahyo, Ph.D, CISA (yudho@cs.ui.ac.id) Faculty of Computer Science, University of Indonesia Objectives

More information

Email Spam Detection Using Customized SimHash Function

Email Spam Detection Using Customized SimHash Function International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,

More information

An Empirical Study of Application of Data Mining Techniques in Library System

An Empirical Study of Application of Data Mining Techniques in Library System An Empirical Study of Application of Data Mining Techniques in Library System Veepu Uppal Department of Computer Science and Engineering, Manav Rachna College of Engineering, Faridabad, India Gunjan Chindwani

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Dual Mechanism to Detect DDOS Attack Priyanka Dembla, Chander Diwaker 2 1 Research Scholar, 2 Assistant Professor

Dual Mechanism to Detect DDOS Attack Priyanka Dembla, Chander Diwaker 2 1 Research Scholar, 2 Assistant Professor International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Engineering, Business and Enterprise

More information

Social Innovation through Utilization of Big Data

Social Innovation through Utilization of Big Data Social Innovation through Utilization of Big Data Hitachi Review Vol. 62 (2013), No. 7 384 Shuntaro Hitomi Keiro Muro OVERVIEW: The analysis and utilization of large amounts of actual operational data

More information

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil

More information

Discovery of Maximal Frequent Item Sets using Subset Creation

Discovery of Maximal Frequent Item Sets using Subset Creation Discovery of Maximal Frequent Item Sets using Subset Creation Jnanamurthy HK, Vishesh HV, Vishruth Jain, Preetham Kumar, Radhika M. Pai Department of Information and Communication Technology Manipal Institute

More information

Optimization of ETL Work Flow in Data Warehouse

Optimization of ETL Work Flow in Data Warehouse Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. Sivaganesh07@gmail.com P Srinivasu

More information

Healthcare Big Data Exploration in Real-Time

Healthcare Big Data Exploration in Real-Time Healthcare Big Data Exploration in Real-Time Muaz A Mian A Project Submitted in partial fulfillment of the requirements for degree of Masters of Science in Computer Science and Systems University of Washington

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Abstract. 1. Introduction

Abstract. 1. Introduction A REVIEW-LOAD BALANCING OF WEB SERVER SYSTEM USING SERVICE QUEUE LENGTH Brajendra Kumar, M.Tech (Scholor) LNCT,Bhopal 1; Dr. Vineet Richhariya, HOD(CSE)LNCT Bhopal 2 Abstract In this paper, we describe

More information

Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch

Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global

More information

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps Yu Su, Yi Wang, Gagan Agrawal The Ohio State University Motivation HPC Trends Huge performance gap CPU: extremely fast for generating

More information

Privacy Preserving Outsourcing for Frequent Itemset Mining

Privacy Preserving Outsourcing for Frequent Itemset Mining Privacy Preserving Outsourcing for Frequent Itemset Mining M. Arunadevi 1, R. Anuradha 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College, Coimbatore, India 1 Assistant

More information

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment Edmond H. Wu,MichaelK.Ng, Andy M. Yip,andTonyF.Chan Department of Mathematics, The University of Hong Kong Pokfulam Road,

More information

AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP

AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP Asst.Prof Mr. M.I Peter Shiyam,M.E * Department of Computer Science and Engineering, DMI Engineering college, Aralvaimozhi.

More information

DATA PREPARATION FOR DATA MINING

DATA PREPARATION FOR DATA MINING Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI

More information

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner 24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

More information

Diagnosis of Students Online Learning Portfolios

Diagnosis of Students Online Learning Portfolios Diagnosis of Students Online Learning Portfolios Chien-Ming Chen 1, Chao-Yi Li 2, Te-Yi Chan 3, Bin-Shyan Jong 4, and Tsong-Wuu Lin 5 Abstract - Online learning is different from the instruction provided

More information

Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm

Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm 1 Sanjeev Rao, 2 Priyanka Gupta 1,2 Dept. of CSE, RIMT-MAEC, Mandi Gobindgarh, Punjab, india Abstract In this paper we

More information

CAS CS 565, Data Mining

CAS CS 565, Data Mining CAS CS 565, Data Mining Course logistics Course webpage: http://www.cs.bu.edu/~evimaria/cs565-10.html Schedule: Mon Wed, 4-5:30 Instructor: Evimaria Terzi, evimaria@cs.bu.edu Office hours: Mon 2:30-4pm,

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules

Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules M.Sangeetha 1, P. Anishprabu 2, S. Shanmathi 3 Department of Computer Science and Engineering SriGuru Institute of Technology

More information

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control

Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Data Mining for Manufacturing: Preventive Maintenance, Failure Prediction, Quality Control Andre BERGMANN Salzgitter Mannesmann Forschung GmbH; Duisburg, Germany Phone: +49 203 9993154, Fax: +49 203 9993234;

More information

International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS

International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, chittiprolumounika@gmail.com; Third C.

More information

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-

More information

Multi-table Association Rules Hiding

Multi-table Association Rules Hiding Multi-table Association Rules Hiding Shyue-Liang Wang 1 and Tzung-Pei Hong 2 1 Department of Information Management 2 Department of Computer Science and Information Engineering National University of Kaohsiung

More information

A Hybrid Data Mining Approach for Analysis of Patient Behaviors in RFID Environments

A Hybrid Data Mining Approach for Analysis of Patient Behaviors in RFID Environments A Hybrid Data Mining Approach for Analysis of Patient Behaviors in RFID Environments incent S. Tseng 1, Eric Hsueh-Chan Lu 1, Chia-Ming Tsai 1, and Chun-Hung Wang 1 Department of Computer Science and Information

More information

Data Mining & Data Stream Mining Open Source Tools

Data Mining & Data Stream Mining Open Source Tools Data Mining & Data Stream Mining Open Source Tools Darshana Parikh, Priyanka Tirkha Student M.Tech, Dept. of CSE, Sri Balaji College Of Engg. & Tech, Jaipur, Rajasthan, India Assistant Professor, Dept.

More information

Building a Database to Predict Customer Needs

Building a Database to Predict Customer Needs INFORMATION TECHNOLOGY TopicalNet, Inc (formerly Continuum Software, Inc.) Building a Database to Predict Customer Needs Since the early 1990s, organizations have used data warehouses and data-mining tools

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 3, Issue 5 (August 2012), PP. 56-61 A Fast and Efficient Method to Find the Conditional

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan

A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan , pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak

More information

A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers

A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers , pp.155-164 http://dx.doi.org/10.14257/ijunesst.2015.8.1.14 A RFID Data-Cleaning Algorithm Based on Communication Information among RFID Readers Yunhua Gu, Bao Gao, Jin Wang, Mingshu Yin and Junyong Zhang

More information

Statistical Learning Theory Meets Big Data

Statistical Learning Theory Meets Big Data Statistical Learning Theory Meets Big Data Randomized algorithms for frequent itemsets Eli Upfal Brown University Data, data, data In God we trust, all others (must) bring data Prof. W.E. Deming, Statistician,

More information

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES

CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES International Journal of Scientific and Research Publications, Volume 4, Issue 4, April 2014 1 CHURN PREDICTION IN MOBILE TELECOM SYSTEM USING DATA MINING TECHNIQUES DR. M.BALASUBRAMANIAN *, M.SELVARANI

More information

E-Banking Integrated Data Utilization Platform WINBANK Case Study

E-Banking Integrated Data Utilization Platform WINBANK Case Study E-Banking Integrated Data Utilization Platform WINBANK Case Study Vasilis Aggelis Senior Business Analyst, PIRAEUSBANK SA, aggelisv@winbank.gr Abstract we all are living in information society. Companies

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool.

Comparative Analysis of EM Clustering Algorithm and Density Based Clustering Algorithm Using WEKA tool. International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 9, Issue 8 (January 2014), PP. 19-24 Comparative Analysis of EM Clustering Algorithm

More information

AN IMPROVED PRIVACY PRESERVING ALGORITHM USING ASSOCIATION RULE MINING(27-32) AN IMPROVED PRIVACY PRESERVING ALGORITHM USING ASSOCIATION RULE MINING

AN IMPROVED PRIVACY PRESERVING ALGORITHM USING ASSOCIATION RULE MINING(27-32) AN IMPROVED PRIVACY PRESERVING ALGORITHM USING ASSOCIATION RULE MINING AN IMPROVED PRIVACY PRESERVING ALGORITHM USING ASSOCIATION RULE MINING Ravindra Kumar Tiwari Ph.D Scholar, Computer Sc. AISECT University, Bhopal Abstract-The recent advancement in data mining technology

More information

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network

Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Chapter 2 The Research on Fault Diagnosis of Building Electrical System Based on RBF Neural Network Qian Wu, Yahui Wang, Long Zhang and Li Shen Abstract Building electrical system fault diagnosis is the

More information

Understanding the Value of In-Memory in the IT Landscape

Understanding the Value of In-Memory in the IT Landscape February 2012 Understing the Value of In-Memory in Sponsored by QlikView Contents The Many Faces of In-Memory 1 The Meaning of In-Memory 2 The Data Analysis Value Chain Your Goals 3 Mapping Vendors to

More information

Framework model on enterprise information system based on Internet of things

Framework model on enterprise information system based on Internet of things International Journal of Intelligent Information Systems 2014; 3(6): 55-59 Published online December 22, 2014 (http://www.sciencepublishinggroup.com/j/ijiis) doi: 10.11648/j.ijiis.20140306.11 ISSN: 2328-7675

More information

DATA MINING AND WAREHOUSING CONCEPTS

DATA MINING AND WAREHOUSING CONCEPTS CHAPTER 1 DATA MINING AND WAREHOUSING CONCEPTS 1.1 INTRODUCTION The past couple of decades have seen a dramatic increase in the amount of information or data being stored in electronic format. This accumulation

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset

An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset P.Abinaya 1, Dr. (Mrs) D.Suganyadevi 2 M.Phil. Scholar 1, Department of Computer Science,STC,Pollachi

More information

Business Lead Generation for Online Real Estate Services: A Case Study

Business Lead Generation for Online Real Estate Services: A Case Study Business Lead Generation for Online Real Estate Services: A Case Study Md. Abdur Rahman, Xinghui Zhao, Maria Gabriella Mosquera, Qigang Gao and Vlado Keselj Faculty Of Computer Science Dalhousie University

More information
