Preference Mining and Data Stream Mining. Sandra de Amo IT4BI Data Mining Advanced Topics


1 Preference Mining and Data Stream Mining Sandra de Amo IT4BI Data Mining Advanced Topics

2 Mining Contextual Object Preferences Mining Data Streams 5/14/13 MASTER IT4BI - UNIV-TOURS

3 Our Agenda
Seminar 1
Preference Mining: two different problems, Label Ranking Mining (a group of users) and Preference Object Mining (one unique user).
Label Ranking can be solved by a set of binary classifiers.
Preference Object Mining is not a classification task!
Preference Object Mining: non-contextual and contextual.
An algorithm for non-contextual Preference Object Mining.
Seminar 2
Mining Contextual Object Preferences.
Mining Data Streams: main challenges.
The VFDT algorithm (or Hoeffding Decision Tree algorithm).

4 Contextual and non-contextual Preferences
User preferences may or may not depend on the user's context.
Non-contextual preferences: lower prices are preferred to higher ones; hotels located in the city center are preferred to hotels located far away from it.
Contextual preferences: if I travel with my family, I prefer staying in a hotel in a calm neighborhood; if I travel with my friends, I prefer staying in a hotel not very far from the seashore and near nice bars and cafes.

5 Two techniques for Mining Contextual Preferences
ProfMiner algorithm (DaWaK 2012). Input: a set of pairs of tuples. Output: a set of preference rules.
CPrefMiner algorithm (ICTAI). Input: a set of pairs of tuples. Output: a Bayesian preference network.

6 In this Seminar
1) We will present ProfMiner, which adapts known algorithms for Association Rule Mining (Apriori, Eclat) to the preference mining scenario.
2) The other technique, CPrefMiner, adapts the Bayesian Network technique to the preference mining scenario.
3) In this seminar we will focus only on the first technique: ProfMiner.

7 The preference data
[Figure: a set of clickable tags (Drama, Steven Spielberg, War, Action, Johnny Depp, James Cameron, Tom Hanks, Thriller, Leonardo DiCaprio) and two example selections: (Action, Tom Hanks, War) and (Action, Steven Spielberg, War).]

8 The preference data
Notation: A: Action, B: Tom Hanks, C: Steven Spielberg, D: War, E: Leonardo DiCaprio.

9 The preference data

10 Objective
Given a set of pairs of transactions (provided by the user), find rules that decide the user's preference over any pair of transactions.
In the example, a transaction corresponds to a collection of films having some common features. For instance, transaction t1 = (A, C, D) corresponds to the collection of films directed by Spielberg whose genre contains "Action" and "War".

11 The Mining Problem: formalization
Items (tags).
Itemset (or transaction) = a set of items.
A preference bituple is a pair (t1, t2), where t1 and t2 are itemsets.
A preference database is a finite set of preference bituples provided by the user by clicking on tags.
Contextual preference rules:
Syntax: i+ > i- | X, where i+ and i- are distinct items, X is an itemset (the rule context), and neither i+ nor i- appears in X.
Semantics: a preference rule r induces a preference order >r between transactions: t1 >r t2 if t1 and t2 both contain X, t1 contains i+ and not i-, and t2 contains i- and not i+.
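The semantics of a rule can be sketched in a few lines of Python. This is only an illustration: the `Rule` type and the `prefers` function are hypothetical names, not part of the seminar material, and transactions are modeled as frozensets of one-letter items.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    """A contextual preference rule (hypothetical representation)."""
    preferred: str            # the item i+
    dispreferred: str         # the item i-
    context: FrozenSet[str]   # the itemset X


def prefers(rule: Rule, t1: FrozenSet[str], t2: FrozenSet[str]) -> bool:
    """Return True when t1 >r t2 under the semantics stated above."""
    both_contain_context = rule.context <= t1 and rule.context <= t2
    t1_has_only_preferred = rule.preferred in t1 and rule.dispreferred not in t1
    t2_has_only_dispreferred = rule.dispreferred in t2 and rule.preferred not in t2
    return both_contain_context and t1_has_only_preferred and t2_has_only_dispreferred


# Rule with preferred item D, dispreferred item E and context {A}.
r = Rule("D", "E", frozenset("A"))
t1, t2, t3 = frozenset("ACD"), frozenset("ABCE"), frozenset("ADE")
print(prefers(r, t1, t2))  # True: t1 has D but not E, t2 has E but not D
print(prefers(r, t1, t3))  # False: t3 contains both D and E
```

Because the condition requires i+ to be absent from the second transaction and i- to be absent from the first, a rule can never prefer t1 to t2 and t2 to t1 at the same time.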

12 Example
t1 = A C D, t2 = A B C E, r: D > E | A.
Then t1 >r t2: t1 is preferred to t2 according to rule r.

13 Satisfaction and Contradiction
Let t1 and t2 be transactions and r a preference rule.
We say that the bituple (t1, t2) satisfies r if t1 >r t2.
We say that the bituple (t1, t2) contradicts r if t2 >r t1.
Example: t1 = A C D, t2 = A B C E, t3 = A D E, r: D > E | A.
(t1, t2) satisfies r, since t1 >r t2.
(t2, t1) contradicts r, since t1 >r t2.
(t1, t3) neither satisfies nor contradicts r, since t3 contains both D and E.

14 Utility measures for preference rules
Support of a rule r with respect to a set of preference bituples P: Sup(r, P) = percentage of bituples in P satisfying r.
Confidence of a rule r with respect to a set of preference bituples P: Conf(r, P) = percentage of bituples satisfying r among those bituples of P that satisfy or contradict r.
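A minimal sketch of both measures follows. The function names and the five-bituple database are made up for illustration (they are not the database of the seminar example); a rule is represented as a tuple (preferred item, dispreferred item, context).

```python
def prefers(rule, t1, t2):
    """True when t1 is preferred to t2 according to the rule."""
    i_plus, i_minus, context = rule
    return (context <= t1 and context <= t2
            and i_plus in t1 and i_minus not in t1
            and i_minus in t2 and i_plus not in t2)


def support(rule, db):
    """Fraction of bituples in db that satisfy the rule."""
    satisfied = sum(prefers(rule, t, u) for t, u in db)
    return satisfied / len(db)


def confidence(rule, db):
    """Fraction of satisfying bituples among those that satisfy or contradict."""
    satisfied = sum(prefers(rule, t, u) for t, u in db)
    contradicted = sum(prefers(rule, u, t) for t, u in db)
    decided = satisfied + contradicted
    return satisfied / decided if decided else 0.0


rule = ("D", "E", frozenset("A"))
db = [
    (frozenset("ACD"), frozenset("ABCE")),  # satisfies the rule
    (frozenset("AD"), frozenset("AE")),     # satisfies the rule
    (frozenset("AE"), frozenset("ABD")),    # contradicts the rule
    (frozenset("BC"), frozenset("CD")),     # context A is missing
    (frozenset("ADE"), frozenset("AB")),    # first itemset has both D and E
]
print(support(rule, db))     # 2 of 5 bituples satisfy the rule
print(confidence(rule, db))  # 2 of the 3 bituples that take a side
```

Returning 0.0 for a rule that no bituple satisfies or contradicts is a convention chosen here; the slides do not say how that case is handled.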

15 Example
r: D > E | A.
Sup(r, P) = 2/5 = 40% (r is satisfied by p1 and p2).
Conf(r, P) = 2/2 = 100%.

16 Minimality
If Y ⊆ X then sup(i+ > i- | Y, P) ≥ sup(i+ > i- | X, P).
A rule i+ > i- | X is said to be minimal with respect to a preference database P if there is no Y ⊂ X such that sup(i+ > i- | Y, P) = sup(i+ > i- | X, P) and conf(i+ > i- | Y, P) = conf(i+ > i- | X, P).
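The minimality test can be sketched by enumerating every proper subset of the context and comparing support and confidence exactly. The helper names and the tiny databases are hypothetical; exact fractions are used so that the equality tests are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations


def prefers(rule, t1, t2):
    i_plus, i_minus, context = rule
    return (context <= t1 and context <= t2
            and i_plus in t1 and i_minus not in t1
            and i_minus in t2 and i_plus not in t2)


def sup_conf(rule, db):
    """Return (support, confidence) as exact fractions."""
    sat = sum(prefers(rule, t, u) for t, u in db)
    con = sum(prefers(rule, u, t) for t, u in db)
    conf = Fraction(sat, sat + con) if sat + con else Fraction(0)
    return Fraction(sat, len(db)), conf


def is_minimal(rule, db):
    """True if no proper sub-context yields the same support and confidence."""
    i_plus, i_minus, context = rule
    target = sup_conf(rule, db)
    for size in range(len(context)):  # sizes 0 .. |X|-1: proper subsets only
        for subset in combinations(sorted(context), size):
            if sup_conf((i_plus, i_minus, frozenset(subset)), db) == target:
                return False
    return True


db = [(frozenset("ACD"), frozenset("ABCE")), (frozenset("AD"), frozenset("AE"))]
db2 = db + [(frozenset("E"), frozenset("D"))]
rule = ("D", "E", frozenset("A"))
print(is_minimal(rule, db))   # False: the empty context does just as well
print(is_minimal(rule, db2))  # True: without context A the confidence drops
```

The enumeration is exponential in the size of the context, which is acceptable for a sketch but is exactly what the pruning properties on the next slide are meant to avoid.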

17 Important Properties (Anti-monotonicity)
If Y ⊆ X and sup(i+ > i- | Y, P) < N then sup(i+ > i- | X, P) < N, since sup(i+ > i- | Y, P) ≥ sup(i+ > i- | X, P). So, if a rule i+ > i- | Y has a bad support, all rules derived from it by enlarging its context will also have a bad support.
If Y ⊆ X and i+ > i- | Y is not minimal then i+ > i- | X is not minimal. So, if a rule i+ > i- | Y is not minimal, none of the rules derived from it by enlarging its context will be minimal.

18 Mining Problem (1)
Input: a preference database P; σ: a minimal support threshold (0 < σ ≤ 1); κ: a minimal confidence threshold (0 < κ ≤ 1).
Output: all minimal preference rules r with support ≥ σ and confidence ≥ κ.
ContPrefMiner: an adaptation of the Apriori algorithm for mining association rules (any association rule mining algorithm can be used).

19 Algorithm ContPrefMiner

20 Problems to solve
How can the set of rules returned by ContPrefMiner be used to predict the user's preference between two transactions t1 and t2?
Each preference rule gives us an opinion about transactions t1 and t2 (or maybe no opinion at all!).
Opinions (when they exist) may be contradictory.
An ordering of transactions obtained from a specific rule may not be transitive.
The set of rules can be too large.

21 So, how to define a preference order from a set of preference rules?
What does it mean that "two transactions t1, t2 are comparable by a set S of preference rules"?
An authority policy: the best rule decides!
t1, t2 are comparable by S if t1 >r t2 and r = the best preference rule in S.
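The authority policy can be sketched as a scan over the rules from best to worst. Two assumptions are made here and are not taken from the slides: the rules arrive already sorted best first (the ranking itself is defined on the next slide), and "the best rule" is read as the best rule that has an opinion on the pair. The function names are hypothetical.

```python
def prefers(rule, t1, t2):
    i_plus, i_minus, context = rule
    return (context <= t1 and context <= t2
            and i_plus in t1 and i_minus not in t1
            and i_minus in t2 and i_plus not in t2)


def compare(ranked_rules, t1, t2):
    """Return 1 if t1 is preferred, -1 if t2 is preferred, 0 if incomparable.

    ranked_rules is assumed to be sorted best first; the first rule that
    has an opinion on the pair decides, and all later rules are ignored.
    """
    for rule in ranked_rules:
        if prefers(rule, t1, t2):
            return 1
        if prefers(rule, t2, t1):
            return -1
    return 0


rules = [("D", "E", frozenset("A")),  # best rule
         ("C", "B", frozenset())]     # second-best rule
t1, t2 = frozenset("ABD"), frozenset("ACE")
# The two rules disagree on (t1, t2); the better one wins.
print(compare(rules, t1, t2))                          # 1
print(compare(rules, frozenset("A"), frozenset("F")))  # 0: no rule applies
```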

22 How to rank the preference rules?
This is a strict total order on the set of preference rules: irreflexive, transitive and total (any two rules can be compared with each other).

23 Example: minsup = 0.2, minconf = 0.6

24 How to evaluate a preference order provided by a set of preference rules?
S = set of preference rules, P = preference database.
Precision(S, P) = percentage of bituples (t, u) in P with t >S u among those bituples which are comparable by S.
Recall(S, P) = percentage of bituples (t, u) in P with t >S u.
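Both measures follow directly from the definitions once a comparison function is available. The sketch below assumes that the rule set is given sorted best first and that the best rule with an opinion decides; the names and the four-bituple database are hypothetical.

```python
def prefers(rule, t1, t2):
    i_plus, i_minus, context = rule
    return (context <= t1 and context <= t2
            and i_plus in t1 and i_minus not in t1
            and i_minus in t2 and i_plus not in t2)


def compare(ranked_rules, t, u):
    """1 if t >S u, -1 if u >S t, 0 if the pair is not comparable by S."""
    for rule in ranked_rules:
        if prefers(rule, t, u):
            return 1
        if prefers(rule, u, t):
            return -1
    return 0


def precision_recall(ranked_rules, db):
    verdicts = [compare(ranked_rules, t, u) for t, u in db]
    correct = verdicts.count(1)                  # bituples with t >S u
    comparable = len(verdicts) - verdicts.count(0)
    precision = correct / comparable if comparable else 0.0
    recall = correct / len(db) if db else 0.0
    return precision, recall


rules = [("D", "E", frozenset("A")), ("C", "B", frozenset())]
db = [
    (frozenset("ABD"), frozenset("ACE")),  # predicted correctly
    (frozenset("C"), frozenset("B")),      # predicted correctly
    (frozenset("B"), frozenset("C")),      # predicted the wrong way round
    (frozenset("A"), frozenset("F")),      # not comparable
]
precision, recall = precision_recall(rules, db)
print(precision, recall)  # 2 correct out of 3 comparable, 2 out of 4 overall
```

Incomparable bituples lower the recall but not the precision, which is why the two measures can move in opposite directions when the rule set shrinks.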

25 Mining Problem (2)
Input: a preference database P, a set of preference rules S, an integer k > 0.
Output: a subset R of S maximizing the precision and such that |R| ≤ k.
This problem is NP-complete! (No polynomial-time algorithm is known so far.)
Our solution: the algorithm ProfMiner, a heuristic approach; the solution is not exact.

26 General Idea
R := ∅ (initialized as the empty set; R will be the subset of rules returned by ProfMiner).
At each iteration:
R := R ∪ {r0}, where r0 = the best rule of S;
P := P - {(t, u) | (t, u) is covered by some rule of R};
S := set of rules in S which are satisfied by at least k pairs of transactions of P.
Repeat until S is empty.
The parameter k controls the size of the set R returned by ProfMiner.
The set of rules returned = the user profile.
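The loop above can be sketched as follows. This is not the authors' implementation: it assumes that S is given as a list sorted best first, that the selected rule leaves S once it has been added to R, and that a bituple is "covered" by a rule when it satisfies or contradicts it. All names and the example data are hypothetical.

```python
def prefers(rule, t1, t2):
    i_plus, i_minus, context = rule
    return (context <= t1 and context <= t2
            and i_plus in t1 and i_minus not in t1
            and i_minus in t2 and i_plus not in t2)


def satisfied_count(rule, db):
    return sum(prefers(rule, t, u) for t, u in db)


def prof_miner(ranked_rules, db, k):
    """Greedy selection of a profile R from the ranked rule list S."""
    candidates = list(ranked_rules)  # S, best rule first (assumption)
    remaining = list(db)             # P
    profile = []                     # R
    while candidates:
        best = candidates.pop(0)
        profile.append(best)
        # Drop the bituples covered by the newly selected rule
        # (covered = satisfied or contradicted; an assumption).
        remaining = [(t, u) for t, u in remaining
                     if not (prefers(best, t, u) or prefers(best, u, t))]
        # Keep only rules still satisfied by at least k remaining bituples.
        candidates = [r for r in candidates
                      if satisfied_count(r, remaining) >= k]
    return profile


r1 = ("D", "E", frozenset("A"))
r2 = ("D", "E", frozenset())   # redundant once r1 has been selected
r3 = ("B", "C", frozenset())
db = [(frozenset("AD"), frozenset("AE")),
      (frozenset("ACD"), frozenset("AE")),
      (frozenset("B"), frozenset("C"))]
print(prof_miner([r1, r2, r3], db, k=1))  # keeps r1 and r3
print(prof_miner([r1, r2, r3], db, k=2))  # keeps r1 only
```

In this toy run a larger k yields a smaller profile, which matches the role of k described on the slide.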

27 Algorithm ProfMiner

28 Example (k = 1)
Candidate rules: r1, r2, r3, r4, r5, r6, r7, r8, r9, r10.
Result: R = {r1, r3, r4, r9}.

29 Experimental Results
Three different preference databases about movies (imdb.com and MovieLens): P301, P3000, P30000.
Attributes: Genre, Actor, Director, Year, Language.
ContPrefMiner: executed with minsup = 0.001 and minconf = 0.5.
CPU Intel 3 GHz, 1 GB of RAM, Windows XP.

30 Discussion
ProfMiner drastically reduces the set of rules returned by ContPrefMiner.
The number of rules returned decreases as k increases.
Even for k = 1 there is an important reduction in the number of rules returned by the algorithm:
P301: from 5319 to 108
P3000: from 4833 to 432
P30000: from 4913 to 925

31 How the number of rules varies with k

32 Reduction of the Profile
Let R_k be the profile returned for k.
Q_k = reduction coefficient for R_k: Q_k = (|R_1| - |R_k|) / |R_1|.

33 Precision versus Q

34 Recall versus Q

35 Some Ongoing Research
Other techniques to improve the recall (many pairs of transactions cannot be compared by ProfMiner): Ranging Voting.
Other techniques to improve the precision: replace the preference database by a fuzzy preference matrix P, where position (i, j) of P contains a number d, 0 ≤ d ≤ 1, standing for how much the user prefers object i to object j.

36 Mining Data Streams HOEFFDING DECISION TREES FOR ONLINE CLASSIFICATION and FOR BIG DATA CLASSIFICATION

37 Characteristics of Data Streams
Continuous flow of data.
Examples: network traffic, sensor data, call center records.

38 Challenges
Infinite length, concept drift, concept evolution, feature evolution.

39 Infinite Length
It is impractical to store and use all historical data: this would require infinite storage and infinite running time.

40 Concept-Drift
[Figure: a data chunk with positive and negative instances, the previous hyperplane, the current hyperplane, and the instances that are victims of concept drift.]

41 Concept-Evolution
[Figure: the (x, y) feature space divided by the thresholds x1, y1 and y2 into regions A, B, C and D, shown before and after a novel class (marked X) appears.]
Classification rules:
R1. if (x > x1 and y < y2) or (x < x1 and y < y1) then class = +
R2. if (x > x1 and y > y2) or (x < x1 and y > y1) then class = -
Existing classification models misclassify novel class instances.

42 Dynamic Features
Why do new features appear?
Infinite data stream: normally the global feature set is unknown, so new features may appear.
Concept drift: as the concept drifts, new features may appear.
Concept evolution: a new type of class normally comes with a new set of features.
Different chunks may have different feature sets.

43 Batch versus Stream Learning Settings
Batch setting:
Training data are available at any time.
One can scan the data at any time and as often as one desires.
The time needed to create the model is not an important issue, since models are created offline.
The amount of memory required to create the model is not a problematic issue.
Stream setting:
Only one example is processed at a time, and it is inspected at most once.
Only a very limited amount of memory may be used.
The learning process must be accomplished in a limited amount of time: algorithms must be linear in the number of examples.
The learning algorithm must be capable of working in real time.
The learned model must be ready to be used at any point.

44 In this seminar
We will present the VFDT method (Very Fast Decision Tree learning) (Domingos & Hulten).
The algorithm does not handle concept drift.
The algorithm is focused on: learning from infinite data sets (or very, very big data sets); learning with a very small amount of memory; learning in real time; classification.
The VFDT method can be generalized to other mining tasks.
The VFDT method has been extended to the CVFDT algorithm to deal with a concept-drifting scenario.

45 Past Research
Scaling up decision tree learning: SPRINT (1996), RAINFOREST (2000). These systems perform batch learning of decision trees from large data sources in limited memory by performing multiple passes over the data and using external storage. Such operations are not suitable for high-speed stream processing.
Incremental systems designed to work in a single pass: ID5R (1989), ITI (1997). Systems like these were considered for data streams, but in some cases they require more effort to update the model incrementally than to rebuild it from scratch. ITI: all the previous training data must be retained in order to revisit decisions, which is not suitable for large data sources.

46 General idea of the VFDT Method
Tuples are not stored! As a tuple enters the system, essential information (sufficient statistics) is extracted from it and the tuple is discarded.
The decision tree is built incrementally.
As a tuple arrives, its sufficient statistics are used to update the statistics stored at the leaves of the decision tree built so far.
After a chunk of n tuples has reached a leaf l, a decision is made on whether the leaf l will be split and which attribute will be used in the splitting process.

47 Sufficient Statistics at time t
Attributes: A1, A2, A3.
Dom(A1) = {A, B}, Dom(A2) = {C, D, E}, Dom(A3) = {F, G, H, I}.
Example of a stored statistic: the number of times the value C for attribute A2 has been seen up to instant t.
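A minimal sketch of how a leaf might keep such counts while discarding the tuples themselves is given below. The class name `LeafStats`, the class labels "yes"/"no" and the three-tuple stream are assumptions for illustration; counts are kept per (attribute, value, class) because a split criterion such as information gain needs the class distribution for every attribute value.

```python
from collections import Counter


class LeafStats:
    """Sufficient statistics kept at one leaf (hypothetical structure)."""

    def __init__(self):
        self.n = 0                      # tuples seen at this leaf
        self.class_counts = Counter()   # label -> count
        self.counts = Counter()         # (attribute, value, label) -> count

    def update(self, example, label):
        """Fold one tuple into the counters; the tuple itself is not stored."""
        self.n += 1
        self.class_counts[label] += 1
        for attribute, value in example.items():
            self.counts[(attribute, value, label)] += 1

    def value_count(self, attribute, value):
        """Number of times `value` was seen for `attribute`, over all classes."""
        return sum(c for (a, v, _), c in self.counts.items()
                   if a == attribute and v == value)


stats = LeafStats()
stream = [
    ({"A1": "A", "A2": "C", "A3": "F"}, "yes"),
    ({"A1": "B", "A2": "C", "A3": "G"}, "no"),
    ({"A1": "A", "A2": "D", "A3": "F"}, "yes"),
]
for example, label in stream:
    stats.update(example, label)

print(stats.value_count("A2", "C"))  # value C of attribute A2 seen twice
```

Memory use depends on the number of attributes, values and classes, not on the number of tuples seen, which is what makes the approach viable on an unbounded stream.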

48 When to decide to split a leaf and how to split?
To split or not to split, that is the question!
In the batch scenario:
The attribute to test at a node is chosen by comparing all the available attributes and choosing the best one according to some heuristic criterion G (for instance, the information gain).
The decision to split or not to split:
Compute G(X) for each attribute X1, ..., Xn.
Compute G(X0): the gain obtained by not splitting the leaf l = Entropy(l).
Order the attributes (X0, X1, ..., Xn) according to G (in decreasing order): 1Best, 2Best, ...
X0 = 1Best? If so, do not split the leaf.
If X0 ≠ 1Best: split the leaf using the attribute 1Best.

49 When to decide to split a leaf and how to split it?
In the stream scenario:
The attribute to test at a node is chosen by comparing all the available attributes and choosing the best one according to some heuristic criterion G.
The decision to split or not to split:
Compute G(X) for each attribute X1, ..., Xn.
Compute G(X0) = gain obtained by considering the majority class (without splitting the leaf).
Order the attributes (X0, X1, ..., Xn) according to G (in decreasing order): 1Best, 2Best, ...
X0 = 1Best? If so, do not split the leaf.
Otherwise: can one have some guarantee that G(2Best) will not come too close to G(1Best) in the future, so that 1Best would no longer be considered the best choice?
If one can have such a guarantee: split the leaf using the attribute 1Best.

50 General Problem
Let X be a random variable (in our case, X = G(1Best) - G(2Best)).
X ranges from 0 to R.
Exercise: if X = G(1Best) - G(2Best), where G = information gain (difference between the entropy before and the entropy after the splitting), show that R = log_2 c, where c = number of classes.
The true mean of X after an infinite set of independent observations is r.
The estimated mean of X after n independent observations is e.
We would like to affirm (with a degree of confidence 1 - δ) that r and e are very close if n is sufficiently large.
Are r, e, δ and n related?

51 The Hoeffding Bound
|r - e| ≤ ε, where ε is given by
$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$
ε is called the Hoeffding bound.
The Hoeffding bound states that, with probability 1 - δ, the true mean of a random variable of range R will not differ from the estimated mean after n independent observations by more than ε.
Introduced in: W. Hoeffding, Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association (1963).
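The bound used by VFDT, ε = sqrt(R² ln(1/δ) / (2n)), is easy to evaluate numerically. The function name below is hypothetical; the values R = 1 and δ = 10^-7 are the ones quoted in the experiments of this seminar, and the sample sizes are chosen only for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that |true mean - estimated mean| <= epsilon
    holds with probability 1 - delta after n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


# Two classes, so R = log2(2) = 1; delta = 1e-7.
for n in (200, 800, 3200):
    print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Since ε shrinks as 1/sqrt(n), each fourfold increase in the number of observations halves the bound.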

52 The Hoeffding Bound
Problem: how many examples do we have to observe in order to decide between attributes 1Best and 2Best and be confident that G(1Best) and G(2Best) will remain reasonably distant from each other in the future?
For instance, suppose that after 10 observations X_10 = G(1Best) - G(2Best) = 0.3, and let ε = 0.1.
In the future X_fut ≥ 0, and the maximum change between X_10 and X_fut is 0.1: |X_10 - X_fut| ≤ 0.1.
Thus X_fut ≥ 0.2: we are sure (with probability 1 - δ) that G(1Best) - G(2Best) will be at least 0.2 in the future (an acceptable difference).
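The question "how many examples" can be answered by solving the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) for n, which gives n ≥ R² ln(1/δ) / (2 ε²). The sketch below uses a hypothetical function name and, as an assumption for illustration, the setting R = 1 and δ = 10^-7 together with the target ε = 0.1 from the example.

```python
import math


def examples_needed(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * epsilon ** 2))


n = examples_needed(1.0, 1e-7, 0.1)
print(n)  # observations needed before epsilon drops to 0.1
```

Halving the target ε multiplies the required number of observations by four, so demanding very tight guarantees quickly becomes expensive.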

53 Experiments
The split confidence: δ = 10^-7 (so 1 - δ is very, very high).
Number of classes = 2, so R = log_2 2 = 1.

54 The VFDT Algorithm
Input:
S: an infinite sequence of examples (a data stream) over attributes X1, ..., Xm
G: a split evaluation function
δ: one minus the desired probability of choosing the right attribute at any given node
n_min: grace period
τ: tie-breaking limit
Output: a decision tree HT
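How δ and τ enter the split decision at a leaf can be sketched as below. This is a simplified reading of the VFDT test, not the full algorithm: the gains are passed in as a ready-made dictionary, the key "X0" stands for the null attribute (no split), and the function names are hypothetical. In VFDT the test would be run only once every n_min examples at a leaf (the grace period).

```python
import math


def hoeffding_bound(value_range, delta, n):
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_split(gains, n, delta, value_range, tau, null_attribute="X0"):
    """Return the attribute to split on, or None to keep the leaf as it is.

    gains maps each attribute (including the null attribute) to its G value,
    computed from the n examples seen at the leaf.
    """
    ranked = sorted(gains, key=gains.get, reverse=True)
    best, second = ranked[0], ranked[1]
    if best == null_attribute:
        return None  # not splitting is currently the best option
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the lead of the best attribute exceeds epsilon, or when
    # epsilon is already below tau (the candidates are effectively tied).
    if gains[best] - gains[second] > eps or eps < tau:
        return best
    return None


clear = {"X0": 0.0, "X1": 0.35, "X2": 0.05}
close = {"X0": 0.0, "X1": 0.35, "X2": 0.34}
print(choose_split(clear, 1000, 1e-7, 1.0, 0.05))  # X1: the lead exceeds epsilon
print(choose_split(close, 1000, 1e-7, 1.0, 0.05))  # None: wait for more examples
print(choose_split(close, 5000, 1e-7, 1.0, 0.05))  # X1: tie broken by tau
```

The tie-breaking limit prevents a leaf from waiting indefinitely when two attributes are almost equally good, since in that case the choice between them matters little.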

55

56 Experimental Results: Accuracy x Training Instances processed
Synthetic data: 50 numeric attributes, 2 classes.
Grace period = 200 tuples.
Each tree (with and without grace period) was allowed 10 hours to grow.
[Plot: accuracy versus training instances processed, one curve with grace period and one without.]

57 Experimental Results: Accuracy x Training Instances processed
[Plot: accuracy versus training instances processed, with grace period.]

58 Experimental Results: Training Instances processed x Training Time
[Plot: training instances processed versus training time, one curve with grace period and one without.]

59 Experimental Results: Accuracy x Training Time
[Plot: accuracy versus training time, one curve with grace period and one without.]

60 References
A. Giacometti, A. Soulet, S. de Amo, H. Li: Mining Contextual Preference Rules for Building User Profiles. DaWaK, Lecture Notes in Computer Science, Volume 7448, 2012, pp
What you can find here related to this seminar: details on the algorithm ProfMiner for Preference Contextual Object Mining.
S. de Amo, M. L. Bueno, G. Alves: CPrefMiner: An Algorithm for Mining User Contextual Preferences based on Bayesian Networks. ICTAI, IEEE 24th International Conference on (Volume 1), pages
What you can find here related to this seminar: details on the algorithm CPrefMiner for Preference Contextual Object Mining. This article presents another approach based on Bayesian Networks and Genetic Programming.

61 References
Albert Bifet, Richard Kirkby: Data Stream Mining: A Practical Approach. August 2009.
What you can find here related to this seminar: a complete survey on Data Stream Mining and the MOA tool.
Domingos, Hulten: Mining High-Speed Data Streams. KDD 2000, Proceedings of the 6th ACM SIGKDD international conference on Knowledge discovery and data mining, Pages
What you can find here related to this seminar: details on the VFDT algorithm.
Domingos, Hulten: Mining Time-Changing Data Streams. KDD '01, Proceedings of the 7th ACM SIGKDD international conference on Knowledge discovery and data mining, Pages
What you can find here related to this seminar: details on the CVFDT algorithm, for mining decision trees with concept drift.

62 References
Domingos, Hulten: Mining Complex Models from Arbitrarily Large Databases in Constant Time. SIGKDD 2002, Edmonton, Alberta, Canada.
What you can find here related to this seminar: this article proposes a general scaling-up method that is applicable to essentially any induction algorithm based on discrete search.
Ruoming Jin, Gagan Agrawal: Efficient decision tree construction on streaming data. In Knowledge Discovery and Data Mining, pages ,
What you can find here related to this seminar: another approach for decision tree construction on streaming data. It uses a different and more accurate bound than the Hoeffding bound presented in this seminar.


More information

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification

Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati tnpatil2@gmail.com, ss_sherekar@rediffmail.com

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Chapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

New Ensemble Methods For Evolving Data Streams

New Ensemble Methods For Evolving Data Streams New Ensemble Methods For Evolving Data Streams Albert Bifet UPC-Barcelona Tech Barcelona, Catalonia abifet@lsi.upc.edu Richard Kirkby University of Waikato Hamilton, New Zealand rkirkby@cs.waikato.ac.nz

More information

Data Mining Algorithms Part 1. Dejan Sarka

Data Mining Algorithms Part 1. Dejan Sarka Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses

More information

Mining High-Speed Data Streams

Mining High-Speed Data Streams Mining High-Speed Data Streams Pedro Domingos Dept. of Computer Science & Engineering University of Washington Box 352350 Seattle, WA 98195-2350, U.S.A. pedrod@cs.washington.edu Geoff Hulten Dept. of Computer

More information

SYNTASA DATA SCIENCE SERVICES

SYNTASA DATA SCIENCE SERVICES SYNTASA DATA SCIENCE SERVICES A 3 : Advanced Attribution Analysis A Data Science Approach Joseph A. Marr, Ph.D. Oscar O. Olmedo, Ph.D. Kirk D. Borne, Ph.D. February 11, 2015 The content and the concepts

More information

Inductive Learning in Less Than One Sequential Data Scan

Inductive Learning in Less Than One Sequential Data Scan Inductive Learning in Less Than One Sequential Data Scan Wei Fan, Haixun Wang, and Philip S. Yu IBM T.J.Watson Research Hawthorne, NY 10532 weifan,haixun,psyu @us.ibm.com Shaw-Hwa Lo Statistics Department,

More information

Numerical Matrix Analysis

Numerical Matrix Analysis Numerical Matrix Analysis Lecture Notes #10 Conditioning and / Peter Blomgren, blomgren.peter@gmail.com Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research

More information

Philosophies and Advances in Scaling Mining Algorithms to Large Databases

Philosophies and Advances in Scaling Mining Algorithms to Large Databases Philosophies and Advances in Scaling Mining Algorithms to Large Databases Paul Bradley Apollo Data Technologies paul@apollodatatech.com Raghu Ramakrishnan UW-Madison raghu@cs.wisc.edu Johannes Gehrke Cornell

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Discretization and grouping: preprocessing steps for Data Mining

Discretization and grouping: preprocessing steps for Data Mining Discretization and grouping: preprocessing steps for Data Mining PetrBerka 1 andivanbruha 2 1 LaboratoryofIntelligentSystems Prague University of Economic W. Churchill Sq. 4, Prague CZ 13067, Czech Republic

More information

1 Review of Newton Polynomials

1 Review of Newton Polynomials cs: introduction to numerical analysis 0/0/0 Lecture 8: Polynomial Interpolation: Using Newton Polynomials and Error Analysis Instructor: Professor Amos Ron Scribes: Giordano Fusco, Mark Cowlishaw, Nathanael

More information

Enhancing Quality of Data using Data Mining Method

Enhancing Quality of Data using Data Mining Method JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2, ISSN 25-967 WWW.JOURNALOFCOMPUTING.ORG 9 Enhancing Quality of Data using Data Mining Method Fatemeh Ghorbanpour A., Mir M. Pedram, Kambiz Badie, Mohammad

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Introduction to Learning & Decision Trees

Introduction to Learning & Decision Trees Artificial Intelligence: Representation and Problem Solving 5-38 April 0, 2007 Introduction to Learning & Decision Trees Learning and Decision Trees to learning What is learning? - more than just memorizing

More information

A Review of Online Decision Tree Learning Algorithms

A Review of Online Decision Tree Learning Algorithms A Review of Online Decision Tree Learning Algorithms Johns Hopkins University Department of Computer Science Corbin Rosset June 17, 2015 Abstract This paper summarizes the most impactful literature of

More information

Efficient Decision Tree Construction for Mining Time-Varying Data Streams

Efficient Decision Tree Construction for Mining Time-Varying Data Streams Efficient Decision Tree Construction for Mining Time-Varying Data Streams ingying Tao and M. Tamer Özsu University of Waterloo Waterloo, Ontario, Canada {y3tao, tozsu}@cs.uwaterloo.ca Abstract Mining streaming

More information

Data Mining: Foundation, Techniques and Applications

Data Mining: Foundation, Techniques and Applications Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing

More information

New Matrix Approach to Improve Apriori Algorithm

New Matrix Approach to Improve Apriori Algorithm New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate

More information

Lecture 10: Regression Trees

Lecture 10: Regression Trees Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

More information

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Pramod S. Reader, Information Technology, M.P.Christian College of Engineering, Bhilai,C.G. INDIA.

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

Lecture 6 Online and streaming algorithms for clustering

Lecture 6 Online and streaming algorithms for clustering CSE 291: Unsupervised learning Spring 2008 Lecture 6 Online and streaming algorithms for clustering 6.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

More information

Some Research Challenges for Big Data Analytics of Intelligent Security

Some Research Challenges for Big Data Analytics of Intelligent Security Some Research Challenges for Big Data Analytics of Intelligent Security Yuh-Jong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,

More information

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type. Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

A Survey on Association Rule Mining in Market Basket Analysis

A Survey on Association Rule Mining in Market Basket Analysis International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey

More information

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014

Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014 Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

More information

Mining the Most Interesting Web Access Associations

Mining the Most Interesting Web Access Associations Mining the Most Interesting Web Access Associations Li Shen, Ling Cheng, James Ford, Fillia Makedon, Vasileios Megalooikonomou, Tilmann Steinberg The Dartmouth Experimental Visualization Laboratory (DEVLAB)

More information

Mining Multi Level Association Rules Using Fuzzy Logic

Mining Multi Level Association Rules Using Fuzzy Logic Mining Multi Level Association Rules Using Fuzzy Logic Usha Rani 1, R Vijaya Praash 2, Dr. A. Govardhan 3 1 Research Scholar, JNTU, Hyderabad 2 Dept. Of Computer Science & Engineering, SR Engineering College,

More information

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

More information

Simple and efficient online algorithms for real world applications

Simple and efficient online algorithms for real world applications Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRA-Lab,

More information

Bisecting K-Means for Clustering Web Log data

Bisecting K-Means for Clustering Web Log data Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier G.T. Prasanna Kumari Associate Professor, Dept of Computer Science and Engineering, Gokula Krishna College of Engg, Sullurpet-524121,

More information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information

Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Considering Currency in Decision Trees in the Context of Big Data

Considering Currency in Decision Trees in the Context of Big Data Considering Currency in Decision Trees in the Context of Big Data Completed Research Paper Diana Hristova Department of Management Information Systems University of Regensburg Universitätsstraße 31 93053

More information

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, kanak.saxena@gmail.com D.S. Rajpoot Registrar,

More information

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases

A Fast and Efficient Method to Find the Conditional Functional Dependencies in Databases International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 3, Issue 5 (August 2012), PP. 56-61 A Fast and Efficient Method to Find the Conditional

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Distributed forests for MapReduce-based machine learning

Distributed forests for MapReduce-based machine learning Distributed forests for MapReduce-based machine learning Ryoji Wakayama, Ryuei Murata, Akisato Kimura, Takayoshi Yamashita, Yuji Yamauchi, Hironobu Fujiyoshi Chubu University, Japan. NTT Communication

More information

PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE

PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE

More information

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:

More information

The Graphical Method: An Example

The Graphical Method: An Example The Graphical Method: An Example Consider the following linear program: Maximize 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2 0, where, for ease of reference,

More information

SPMF: a Java Open-Source Pattern Mining Library

SPMF: a Java Open-Source Pattern Mining Library Journal of Machine Learning Research 1 (2014) 1-5 Submitted 4/12; Published 10/14 SPMF: a Java Open-Source Pattern Mining Library Philippe Fournier-Viger philippe.fournier-viger@umoncton.ca Department

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

Using Adaptive Random Trees (ART) for optimal scorecard segmentation

Using Adaptive Random Trees (ART) for optimal scorecard segmentation A FAIR ISAAC WHITE PAPER Using Adaptive Random Trees (ART) for optimal scorecard segmentation By Chris Ralph Analytic Science Director April 2006 Summary Segmented systems of models are widely recognized

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang

Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental

More information

Implementation of Data Mining Techniques to Perform Market Analysis

Implementation of Data Mining Techniques to Perform Market Analysis Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

A Game Theoretical Framework for Adversarial Learning

A Game Theoretical Framework for Adversarial Learning A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,

More information

Improving Apriori Algorithm to get better performance with Cloud Computing

Improving Apriori Algorithm to get better performance with Cloud Computing Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become

More information

Building an Iris Plant Data Classifier Using Neural Network Associative Classification

Building an Iris Plant Data Classifier Using Neural Network Associative Classification Building an Iris Plant Data Classifier Using Neural Network Associative Classification Ms.Prachitee Shekhawat 1, Prof. Sheetal S. Dhande 2 1,2 Sipna s College of Engineering and Technology, Amravati, Maharashtra,

More information