Preference Mining and Data Stream Mining. Sandra de Amo IT4BI Data Mining Advanced Topics

Transcription

1 Preference Mining and Data Stream Mining Sandra de Amo IT4BI Data Mining Advanced Topics

2 Mining Contextual Object Preferences Mining Data Streams 5/14/13 MASTER IT4BI - UNIV-TOURS

3 Our Agenda
Seminar 1
- Preference Mining: two different problems
  - Label Ranking Mining (a group of users)
  - Preference Object Mining (one unique user)
- Label Ranking can be solved by a set of binary classifiers
- Preference Object Mining is not a classification task!
- Preference Object Mining: non-contextual and contextual
- An algorithm for non-contextual Preference Object Mining
Seminar 2
- Mining Contextual Object Preferences
- Mining Data Streams: main challenges
- The VFDT algorithm (or Hoeffding Decision Tree algorithm)

4 Contextual and non-contextual Preferences
User preferences may or may not depend on the user's context:
- Non-contextual preferences: lower prices are preferred to higher ones. Hotels located in the city center are preferred to hotels located far away from the city center.
- Contextual preferences: if I travel with my family, I prefer staying in a hotel in a calm neighborhood. If I travel with my friends, I prefer staying in a hotel not very far from the seashore and near nice bars and cafes.

5 Two techniques for Mining Contextual Preferences
- ProfMiner algorithm (DaWaK 2012). Input: a set of pairs of tuples. Output: a set of preference rules.
- CPrefMiner algorithm (ICTAI). Input: a set of pairs of tuples. Output: a Bayesian preference network.

6 In this Seminar
1) We will present ProfMiner, which adapts known association rule mining algorithms (Apriori, Eclat) to the preference mining scenario.
2) The other technique, CPrefMiner, adapts the Bayesian network technique to the preference mining scenario.
3) In this seminar we will focus only on the first technique: ProfMiner.

7 The preference data
[Figure: cloud of movie tags (Drama, Steve Spielberg, War, Action, Johnny Depp, James Cameron, Tom Hanks, Thriller, Leonardo di Caprio) and two tag selections: (Action, Tom Hanks, War) and (Action, Steve Spielberg, War)]

8 The preference data
Notation: A = Action, B = Tom Hanks, C = Steve Spielberg, D = War, E = Leonardo di Caprio.

9 The preference data

10 Objective
Given a set of pairs of transactions (provided by the user), find rules that decide the user's preference over any pair of transactions.
In the example:
- A transaction corresponds to a collection of films having some common features.
- For instance, transaction t1 = (A, C, D) corresponds to the collection of films directed by Spielberg whose genres include "Action" and "War".

11 The Mining Problem: formalization
- Items (tags)
- Itemset (or transaction) = a set of items
- A preference bituple is a pair (t1, t2), where t1 and t2 are itemsets.
- A preference database is a finite set of preference bituples provided by the user by clicking on tags.
- Contextual preference rules:
  - Syntax: i+ > i- | X, where i+ and i- are distinct items, X is an itemset (the rule context), and neither i+ nor i- appears in X.
  - Semantics: a preference rule r induces a preference order >r between transactions: t1 >r t2 if t1 and t2 both contain X, t1 contains i+ and not i-, and t2 contains i- and not i+.

12 Example
t1 = ACD, t2 = ABCE, r: D > E | A
Both transactions contain the context A, t1 contains D and not E, and t2 contains E and not D.
Then t1 >r t2: t1 is preferred to t2 according to rule r.

13 Satisfaction and Contradiction
Let t1 and t2 be transactions and r a preference rule.
- The bituple (t1, t2) satisfies r if t1 >r t2.
- The bituple (t1, t2) contradicts r if t2 >r t1.
Example: t1 = ACD, t2 = ABCE, t3 = ADE, r: D > E | A
- (t1, t2) satisfies r, since t1 >r t2.
- (t2, t1) contradicts r, since t1 >r t2.
- (t1, t3) neither satisfies nor contradicts r, since t3 contains both D and E.
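The semantics of slides 11 to 13 can be sketched in a few lines of Python. This is only an illustration: the names `Rule`, `prefers`, `satisfies` and `contradicts` are hypothetical and do not come from the slides, and transactions are modelled as frozen sets of one-letter tags.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Rule(NamedTuple):
    """A contextual preference rule i+ > i- | X (field names are illustrative)."""
    preferred: str            # i+
    dispreferred: str         # i-
    context: FrozenSet[str]   # X


Transaction = FrozenSet[str]


def prefers(rule: Rule, t1: Transaction, t2: Transaction) -> bool:
    """True if t1 >r t2: both contain X, t1 has i+ but not i-, t2 has i- but not i+."""
    return (rule.context <= t1 and rule.context <= t2
            and rule.preferred in t1 and rule.dispreferred not in t1
            and rule.dispreferred in t2 and rule.preferred not in t2)


def satisfies(rule: Rule, bituple: Tuple[Transaction, Transaction]) -> bool:
    t1, t2 = bituple
    return prefers(rule, t1, t2)


def contradicts(rule: Rule, bituple: Tuple[Transaction, Transaction]) -> bool:
    t1, t2 = bituple
    return prefers(rule, t2, t1)


# The example of slide 13: r is D > E with context A.
r = Rule("D", "E", frozenset("A"))
t1, t2, t3 = frozenset("ACD"), frozenset("ABCE"), frozenset("ADE")
print(satisfies(r, (t1, t2)), contradicts(r, (t2, t1)))
print(satisfies(r, (t1, t3)), contradicts(r, (t1, t3)))
```

A bituple can satisfy a rule, contradict it, or do neither, which is why the two predicates are kept separate rather than returning a single boolean.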

14 Utility measures for preference rules
- Support of a rule r with respect to a set of preference bituples P: Sup(r,P) = percentage of bituples in P satisfying r.
- Confidence of a rule r with respect to a set of preference bituples P: Conf(r,P) = percentage of bituples satisfying r among the bituples in P that satisfy or contradict r.

15 Example
r: D > E | A
Sup(r,P) = 2/5 (supported by p1 and p2)
Conf(r,P) = 2/2 = 100%
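A minimal sketch of the two measures follows. The preference database shown on the slide is not reproduced in this transcript, so the five bituples below are a hypothetical stand-in, chosen only so that the rule D > E with context A is satisfied by two of them and contradicted by none. The function names are likewise assumptions.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    preferred: str            # i+
    dispreferred: str         # i-
    context: FrozenSet[str]   # X


Bituple = Tuple[FrozenSet[str], FrozenSet[str]]


def prefers(rule: Rule, t1: FrozenSet[str], t2: FrozenSet[str]) -> bool:
    """True if t1 >r t2 under the rule semantics of slide 11."""
    return (rule.context <= t1 and rule.context <= t2
            and rule.preferred in t1 and rule.dispreferred not in t1
            and rule.dispreferred in t2 and rule.preferred not in t2)


def support(rule: Rule, database: List[Bituple]) -> float:
    """Fraction of bituples in the database that satisfy the rule."""
    satisfied = sum(1 for t, u in database if prefers(rule, t, u))
    return satisfied / len(database) if database else 0.0


def confidence(rule: Rule, database: List[Bituple]) -> float:
    """Fraction of satisfying bituples among those that satisfy or contradict."""
    satisfied = sum(1 for t, u in database if prefers(rule, t, u))
    contradicted = sum(1 for t, u in database if prefers(rule, u, t))
    covered = satisfied + contradicted
    return satisfied / covered if covered else 0.0


# Hypothetical database (NOT the one from the slide).
P = [
    (frozenset("ACD"), frozenset("ABCE")),  # p1: satisfies r
    (frozenset("AD"), frozenset("AE")),     # p2: satisfies r
    (frozenset("ACD"), frozenset("ADE")),   # p3: second transaction has D and E
    (frozenset("BC"), frozenset("BE")),     # p4: context A is missing
    (frozenset("AB"), frozenset("AC")),     # p5: neither D nor E appears
]
r = Rule("D", "E", frozenset("A"))
print(support(r, P), confidence(r, P))
```

Returning 0.0 when no bituple is covered is a design choice of this sketch; the slides do not say how confidence is defined in that case.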

16 Minimality
If Y ⊆ X then sup(i+ > i- | Y, P) ≥ sup(i+ > i- | X, P).
A rule i+ > i- | X is said to be minimal with respect to a preference database P if there is no Y ⊂ X such that:
sup(i+ > i- | Y, P) = sup(i+ > i- | X, P) and conf(i+ > i- | Y, P) = conf(i+ > i- | X, P)

17 Important Properties (Anti-monotonicity)
- If Y ⊆ X and sup(i+ > i- | Y, P) ≤ N, then sup(i+ > i- | X, P) ≤ N, since sup(i+ > i- | Y, P) ≥ sup(i+ > i- | X, P). So, if a rule i+ > i- | Y has bad support, all rules derived from it by enlarging its context will also have bad support.
- If Y ⊆ X and i+ > i- | Y is not minimal, then i+ > i- | X is not minimal. So, if a rule i+ > i- | Y is not minimal, no rule derived from it by enlarging its context will be minimal.

18 Mining Problem (1)
Input:
- A preference database P
- σ: a minimum support threshold (0 < σ ≤ 1)
- κ: a minimum confidence threshold (0 < κ ≤ 1)
Output: all minimal preference rules r with support ≥ σ and confidence ≥ κ
ContPrefMiner: an adaptation of the Apriori algorithm for mining association rules (any association rule mining algorithm can be used).

19 Algorithm ContPrefMiner

20 Problems to solve:
How can the set of rules returned by ContPrefMiner be used to predict the user's preference between two transactions t1 and t2?
- Each preference rule gives us an opinion about transactions t1 and t2 (or maybe no opinion at all!).
- Opinions (when they exist) may be contradictory.
- An ordering of transactions obtained from a specific rule may not be transitive.
- The set of rules can be too large.

21 So, how to define a preference order from a set of preference rules?
What does it mean that "two transactions t1, t2 are comparable by a set S of preference rules"?
An authority policy: the best rule decides!
t1, t2 are comparable by S if t1 >r t2, where r is the best preference rule in S.

22 How to rank the preference rules?
This is a total order on the set of preference rules: irreflexive, transitive and total (all rules can be compared with each other).

23 Example: minsup = 0.2, minconf = 0.6

24 How to evaluate a preference order provided by a set of preference rules?
S = set of preference rules, P = preference database
- Precision(S,P) = percentage of bituples (t,u) in P with t >S u among those bituples that are comparable by S.
- Recall(S,P) = percentage of bituples (t,u) in P with t >S u.
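The two measures can be sketched as follows, under assumptions that the slides leave open. The rules are passed already sorted from best to worst (the ranking criterion itself is not reproduced here), and the "best rule" for a pair is taken to be the first rule in that list that either satisfies or contradicts it. The database and the second rule are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    preferred: str            # i+
    dispreferred: str         # i-
    context: FrozenSet[str]   # X


Bituple = Tuple[FrozenSet[str], FrozenSet[str]]


def prefers(rule: Rule, t1: FrozenSet[str], t2: FrozenSet[str]) -> bool:
    return (rule.context <= t1 and rule.context <= t2
            and rule.preferred in t1 and rule.dispreferred not in t1
            and rule.dispreferred in t2 and rule.preferred not in t2)


def compare(ranked_rules: List[Rule], t: FrozenSet[str], u: FrozenSet[str]) -> int:
    """+1 if t >S u, -1 if u >S t, 0 if the pair is not comparable by S.

    Assumption: the first rule (in rank order) with an opinion decides.
    """
    for rule in ranked_rules:
        if prefers(rule, t, u):
            return 1
        if prefers(rule, u, t):
            return -1
    return 0


def precision_recall(ranked_rules: List[Rule],
                     database: List[Bituple]) -> Tuple[float, float]:
    verdicts = [compare(ranked_rules, t, u) for t, u in database]
    correct = sum(1 for v in verdicts if v == 1)
    comparable = sum(1 for v in verdicts if v != 0)
    precision = correct / comparable if comparable else 0.0
    recall = correct / len(database) if database else 0.0
    return precision, recall


# Hypothetical rules (best first) and database.
S = [Rule("D", "E", frozenset("A")), Rule("C", "B", frozenset())]
P = [
    (frozenset("ACD"), frozenset("ABCE")),  # decided correctly by the first rule
    (frozenset("AD"), frozenset("AE")),     # decided correctly by the first rule
    (frozenset("ACD"), frozenset("ADE")),   # not comparable
    (frozenset("BC"), frozenset("BE")),     # not comparable
    (frozenset("AB"), frozenset("AC")),     # second rule gives the wrong answer
]
precision, recall = precision_recall(S, P)
print(precision, recall)
```

Precision only penalises wrong answers, while recall also penalises pairs on which the rule set has no opinion, which is why recall can never exceed precision here.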

25 Mining Problem (2)
Input: a preference database P, a set of preference rules S, and an integer k > 0.
Output: a subset R of S maximizing the precision and such that |R| ≤ k.
This problem is NP-complete! (No polynomial-time algorithm is known so far.)
Our solution: the ProfMiner algorithm, a heuristic approach; the solution is not exact.

26 General Idea
R := ∅ (R, initialized as the empty set, will be the subset of rules returned by ProfMiner)
At each iteration:
- R := R ∪ {r0}, where r0 is the best rule of S
- P := P - {(t,u) | (t,u) is covered by some rule of R}
- S := the set of rules in S that are satisfied by at least k pairs of transactions of P
Repeat until S is empty.
- The parameter k controls the size of the set R returned by ProfMiner.
- The set of rules returned is the user profile.
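The greedy loop described above can be sketched as follows. This is not the authors' implementation: it assumes the candidate rules arrive already sorted from best to worst, and it reads "covered" as "satisfied or contradicted by the rule". Both choices, along with all names and the example data, are assumptions of this sketch.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    preferred: str            # i+
    dispreferred: str         # i-
    context: FrozenSet[str]   # X


Bituple = Tuple[FrozenSet[str], FrozenSet[str]]


def prefers(rule: Rule, t1: FrozenSet[str], t2: FrozenSet[str]) -> bool:
    return (rule.context <= t1 and rule.context <= t2
            and rule.preferred in t1 and rule.dispreferred not in t1
            and rule.dispreferred in t2 and rule.preferred not in t2)


def covers(rule: Rule, pair: Bituple) -> bool:
    """Assumption: a pair is covered if the rule satisfies or contradicts it."""
    t, u = pair
    return prefers(rule, t, u) or prefers(rule, u, t)


def greedy_profile(ranked_rules: List[Rule], database: List[Bituple],
                   k: int) -> List[Rule]:
    """Greedy selection following the 'General Idea' slide."""
    candidates = list(ranked_rules)   # S, best rule first
    remaining = list(database)        # P
    profile: List[Rule] = []          # R
    while candidates:
        best = candidates[0]
        profile.append(best)
        # Drop the pairs that the newly selected rule already covers.
        remaining = [p for p in remaining if not covers(best, p)]
        # Keep only rules still satisfied by at least k remaining pairs.
        candidates = [r for r in candidates[1:]
                      if sum(prefers(r, t, u) for t, u in remaining) >= k]
    return profile


# Hypothetical rules (best first) and database.
r1 = Rule("D", "E", frozenset("A"))
r2 = Rule("C", "B", frozenset())
r3 = Rule("D", "E", frozenset("AC"))
P = [
    (frozenset("ACD"), frozenset("ABCE")),
    (frozenset("AD"), frozenset("AE")),
    (frozenset("ACD"), frozenset("ADE")),
    (frozenset("BC"), frozenset("BE")),
    (frozenset("AC"), frozenset("AB")),
]
print(greedy_profile([r1, r3, r2], P, k=1))
print(greedy_profile([r1, r3, r2], P, k=2))
```

In this toy run, r3 is dropped because r1 already covers the only pair it satisfies, and raising k from 1 to 2 also removes r2, which mirrors the slide's remark that k controls the size of the returned profile.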

27 Algorithm ProfMiner

28 Example (k = 1)
Candidate rules: r1, r2, r3, r4, r5, r6, r7, r8, r9, r10
Result: R = {r1, r3, r4, r9}

29 Experimental Results
- Three different preference databases about movies (imdb.com and MovieLens): P301, P3000, P30000
- Attributes: Genre, Actor, Director, Year, Language
- ContPrefMiner: executed with minsup = 0.001 and minconf = 0.5
- CPU Intel 3 GHz, 1 GB of RAM, Windows XP

30 Discussion
ProfMiner drastically reduces the set of rules returned by ContPrefMiner.
The number of rules returned decreases as k increases.
Even for k = 1 there is an important reduction in the number of rules returned by the algorithm:
- P301: from 5319 to 108
- P3000: from 4833 to 432
- P30000: from 4913 to 925

31 How the number of rules varies with k

32 Reduction of the Profile
Let R_k be the profile returned for k, and Q_k the reduction coefficient for R_k:
Q_k = (|R_1| - |R_k|) / |R_1|

33 Precision versus Q

34 Recall versus Q

35 Some Ongoing Research
Other techniques to improve the recall (many pairs of transactions cannot be compared by ProfMiner):
- Ranging Voting
Other techniques to improve the precision:
- Replace the preference database by a fuzzy preference matrix P.
- Position (i,j) of P contains a number d, 0 ≤ d ≤ 1, standing for how much the user prefers object i to object j.

36 Mining Data Streams HOEFFDING DECISION TREES FOR ONLINE CLASSIFICATION and FOR BIG DATA CLASSIFICATION

37 Characteristics of Data Streams
Continuous flow of data
Examples:
- Network traffic
- Sensor data
- Call center records

38 Challenges
- Infinite length
- Concept drift
- Concept evolution
- Feature evolution

39 Infinite Length
It is impractical to store and use all historical data:
- It requires infinite storage
- And infinite running time

40 Concept-Drift
[Figure: a data chunk with positive and negative instances, the previous and the current separating hyperplanes, and the instances that are victims of concept drift]

41 Concept-Evolution
[Figure: two plots of the (x, y) plane divided by the thresholds x1, y1 and y2 into regions A, B, C and D; in the second plot a cluster of novel-class instances (marked X) appears]
Classification rules:
R1. if (x > x1 and y < y2) or (x < x1 and y < y1) then class = +
R2. if (x > x1 and y > y2) or (x < x1 and y > y1) then class = -
Existing classification models misclassify novel class instances.

42 Dynamic Features
Why do new features evolve?
- Infinite data stream: normally the global feature set is unknown, and new features may appear.
- Concept drift: as concepts drift, new features may appear.
- Concept evolution: a new type of class normally holds a new set of features.
Different chunks may have different feature sets.

43 Batch versus Stream Learning Settings
Batch setting:
- Training data are available at any time.
- One can scan the data at any time and as often as one desires.
- The time needed to create the model is not an important issue, since models are created offline.
- The memory needed to create the model is not a problematic issue.
Stream setting:
- Only one example is processed at a time, and each example is inspected at most once.
- Only a very limited amount of memory is used.
- Learning must be accomplished in a limited amount of time: algorithms must be linear in the number of examples.
- The learning algorithm must be capable of working in real time.
- The learned model must be ready to be used at any point.

44 In this seminar
We will present the VFDT method (Very Fast Decision Tree learning) of Domingos and Hulten.
The algorithm does not handle concept drift.
The algorithm is focused on:
- Learning from infinite datasets (or very, very big datasets)
- Learning with a very small amount of memory
- Learning in real time
- Classification
The VFDT method can be generalized to other mining tasks.
VFDT has been extended to the CVFDT algorithm to deal with concept-drifting scenarios.

45 Past Research
Scaling up decision tree learning:
- SPRINT (1996), RAINFOREST (2000)
- These perform batch learning of decision trees from large data sources in limited memory by making multiple passes over the data and using external storage.
- Such operations are not suitable for high-speed stream processing.
Incremental systems designed to work in a single pass:
- ID5R (1989), ITI (1997)
- Systems like these were considered for data streams.
- But in some cases these methods require more effort to update the model incrementally than to rebuild it from scratch.
- ITI: all the previous training data must be retained in order to revisit decisions, which is not suitable for large data sources.

46 General idea of the VFDT Method
- Tuples are not stored! As a tuple enters the system, essential information (sufficient statistics) is extracted from it and the tuple is discarded.
- The decision tree is built incrementally.
- As a tuple arrives, its sufficient statistics are used to update the statistics stored at the leaves of the decision tree built so far.
- After a chunk of n tuples has reached a leaf l, a decision is made on whether leaf l will be split and which attribute will be used for the split.

47 Sufficient Statistics at time t
Attributes: A1, A2, A3, with Dom(A1) = {A, B}, Dom(A2) = {C, D, E}, Dom(A3) = {F, G, H, I}
Example of a stored statistic: the number of times the value C for attribute A2 has been seen up to instant t.
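A minimal sketch of what a leaf might store follows, assuming that the sufficient statistics are counts per (attribute, value, class) triple, as is usual for Hoeffding trees, and that G is the information gain. The class name `LeafStats`, its methods and the four-example stream are hypothetical.

```python
from collections import defaultdict
from math import log2
from typing import Dict, Iterable


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (in bits) of a class-count distribution."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


class LeafStats:
    """Counts kept at a leaf; the examples themselves are never stored."""

    def __init__(self) -> None:
        self.n = 0
        self.class_counts = defaultdict(int)
        self.counts = defaultdict(int)  # (attribute, value, label) -> count

    def update(self, example: Dict[str, str], label: str) -> None:
        """Fold one example into the counts, after which it can be discarded."""
        self.n += 1
        self.class_counts[label] += 1
        for attribute, value in example.items():
            self.counts[(attribute, value, label)] += 1

    def value_count(self, attribute: str, value: str) -> int:
        """Number of times `value` has been seen for `attribute` so far."""
        return sum(c for (a, v, _), c in self.counts.items()
                   if a == attribute and v == value)

    def info_gain(self, attribute: str) -> float:
        """Information gain of splitting on `attribute`, from counts only."""
        per_value = defaultdict(lambda: defaultdict(int))
        for (a, v, label), c in self.counts.items():
            if a == attribute:
                per_value[v][label] += c
        before = entropy(self.class_counts.values())
        after = sum(sum(by_class.values()) / self.n * entropy(by_class.values())
                    for by_class in per_value.values())
        return before - after


# Hypothetical stream over the attributes of the slide.
stream = [
    ({"A1": "A", "A2": "C", "A3": "F"}, "+"),
    ({"A1": "A", "A2": "C", "A3": "G"}, "+"),
    ({"A1": "B", "A2": "D", "A3": "F"}, "-"),
    ({"A1": "B", "A2": "C", "A3": "H"}, "-"),
]
leaf = LeafStats()
for example, label in stream:
    leaf.update(example, label)
print(leaf.value_count("A2", "C"), leaf.info_gain("A1"), leaf.info_gain("A2"))
```

Memory use depends on the number of attributes, values and classes rather than on the number of examples seen, which is what makes the approach viable for unbounded streams.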

48 When to decide to split a leaf and how to split?
To split or not to split, that is the question!
In the batch scenario:
- The attribute to test at a node is chosen by comparing all the available attributes and choosing the best one according to some heuristic criterion G (for instance, the information gain).
- The decision to split or not to split:
  - Compute G(X) for each attribute X1, ..., Xn.
  - Compute G(X0): the gain obtained by not splitting the leaf l = Entropy(l).
  - Order the attributes (X0, X1, ..., Xn) according to G (in decreasing order): 1Best, 2Best, ...
  - X0 = 1Best? If so, do not split the leaf.
  - If X0 ≠ 1Best, split the leaf using the attribute 1Best.

49 When to decide to split a leaf and how to split it?
In the stream scenario:
The attribute to test at a node is chosen by comparing all the available attributes and choosing the best one according to some heuristic criterion G.
The decision to split or not to split:
- Compute G(X) for each attribute X1, ..., Xn.
- Compute G(X0) = the gain obtained by considering the majority class (without splitting the leaf).
- Order the attributes (X0, X1, ..., Xn) according to G (in decreasing order): 1Best, 2Best, ...
- X0 = 1Best? If so, do not split the leaf.
- Can one have some guarantee that G(2Best) will not come too close to G(1Best) in the future, so that 1Best would no longer be considered the best choice?
- If one can have such a guarantee, split the leaf using the attribute 1Best.

50 General Problem
Let X be a random variable (in our case, X = G(1Best) - G(2Best)).
X ranges from 0 to R.
Exercise: if X = G(1Best) - G(2Best), where G is the information gain (the difference between the entropy before and the entropy after the split), show that R = log2 c, where c is the number of classes.
The true mean of X after an infinite set of independent observations is r.
The estimated mean of X after n independent observations is e.
We would like to affirm (with a degree of confidence 1 - δ) that r and e are very close if n is sufficiently large.
Are r, e, δ and n related?

51 The Hoeffding Bound
|r - e| ≤ ε, where ε is given by
$$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$$
ε is called the Hoeffding bound.
The Hoeffding bound states that, with probability 1 - δ, the true mean of a random variable of range R will not differ from the mean estimated after n independent observations by more than ε.
Introduced in: W. Hoeffding, Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association (1963).
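For reference, the Hoeffding bound used by VFDT (Domingos and Hulten) is ε = sqrt(R² ln(1/δ) / (2n)). The short sketch below simply evaluates it; the function name and the sample values of n are illustrative, not taken from the slides.

```python
from math import log, sqrt


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that |true mean - estimated mean| <= epsilon
    holds with probability 1 - delta after n independent observations."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))


# The bound shrinks as 1 / sqrt(n): four times more data halves epsilon.
for n in (100, 400, 1600):
    print(n, hoeffding_bound(1.0, 1e-7, n))
```

Note that the bound does not depend on the distribution of the observations, only on the range R, which is why it can be applied to the gain difference without modelling the stream.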

52 The Hoeffding Bound
Problem: how many examples do we have to observe in order to decide between attributes 1Best and 2Best AND be confident that G(1Best) and G(2Best) will remain reasonably distant from each other in the future?
For instance:
- Suppose that after 10 observations X_10 = G(1Best) - G(2Best) = 0.3, and let ε = 0.1.
- In the future, X_fut can differ from X_10 by at most ε: |X_10 - X_fut| ≤ 0.1.
- Thus X_fut ≥ 0.2.
We are sure (with probability 1 - δ) that G(1Best) - G(2Best) will be at least 0.2 in the future (an acceptable difference).

53 Experiments
Split confidence: δ = 10^-7 (so 1 - δ is very, very high)
Number of classes = 2, so R = log2 2 = 1
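As a worked illustration of these settings, the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) can be solved for n to see how many examples a leaf needs before a given ε is reached. The target ε = 0.1 is borrowed from the example of slide 52 and is an assumption here, not a value reported for the experiments.

```latex
\epsilon = \sqrt{\frac{R^{2}\,\ln(1/\delta)}{2n}}
\quad\Longrightarrow\quad
n = \frac{R^{2}\,\ln(1/\delta)}{2\,\epsilon^{2}}

% With R = 1, \delta = 10^{-7} and \epsilon = 0.1:
n = \frac{\ln(10^{7})}{2 \cdot (0.1)^{2}}
  \approx \frac{16.118}{0.02}
  \approx 805.9
```

So roughly 806 examples at a leaf are enough for ε to drop to 0.1, even with such a demanding δ, because n grows only logarithmically in 1/δ.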

54 The VFDT Algorithm
Input:
- S: an infinite sequence of examples (a data stream) over attributes X1, ..., Xm
- G: a split evaluation function
- δ: one minus the desired probability of choosing the right attribute at any given node
- n_min: grace period
- τ: tie-breaking limit
Output: a decision tree HT
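The way these parameters interact at a single leaf can be sketched as follows. This is a simplified reading of the split test described by Domingos and Hulten, not the full algorithm: the gains are assumed to be already computed from the leaf's sufficient statistics, and the names `choose_split` and `NO_SPLIT` are hypothetical.

```python
from math import log, sqrt
from typing import Dict, Optional

NO_SPLIT = "X0"  # pseudo-attribute standing for "do not split the leaf"


def choose_split(gains: Dict[str, float], n: int, delta: float, tau: float,
                 n_min: int, value_range: float = 1.0) -> Optional[str]:
    """Return the attribute to split on, or None to keep the leaf as it is.

    gains maps every attribute (including NO_SPLIT) to its current G value,
    and n is the number of examples seen at the leaf so far.
    """
    # Grace period: only re-evaluate the leaf every n_min examples.
    if n == 0 or n % n_min != 0 or len(gains) < 2:
        return None
    ranked = sorted(gains, key=gains.get, reverse=True)
    best, second = ranked[0], ranked[1]
    epsilon = sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))
    gap = gains[best] - gains[second]
    # Split if the best attribute is a clear winner, or if the two leaders
    # are so close that waiting longer is pointless (tie-breaking with tau).
    if best != NO_SPLIT and (gap > epsilon or epsilon < tau):
        return best
    return None


clear = {NO_SPLIT: 0.0, "X1": 0.30, "X2": 0.05}
close = {NO_SPLIT: 0.0, "X1": 0.30, "X2": 0.28}
print(choose_split(clear, 1000, 1e-7, 0.05, 200))   # clear winner
print(choose_split(close, 1000, 1e-7, 0.05, 200))   # too close, keep waiting
print(choose_split(close, 10000, 1e-7, 0.05, 200))  # tie broken by tau
```

The grace period trades a little responsiveness for much less computation, since recomputing G after every single example rarely changes the decision.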

55

56 Experimental Results: Accuracy x Training Instances processed
- Synthetic data: 50 numeric attributes, 2 classes
- Grace period = 200 tuples
- Each tree (with and without grace period) was allowed 10 hours to grow
[Figure: accuracy versus training instances processed, with and without grace period]

57 Experimental Results: Accuracy x Training Instances processed
[Figure: accuracy versus training instances processed, with grace period]

58 Experimental Results: Training Instances processed x Training Time
[Figure: training instances processed versus training time, with and without grace period]

59 Experimental Results: Accuracy x Training Time
[Figure: accuracy versus training time, with and without grace period]

60 References
- A. Giacometti, A. Soulet, S. de Amo, H. Li: Mining Contextual Preference Rules for Building User Profiles. DaWaK, Lecture Notes in Computer Science, Volume 7448, 2012. What you can find here related to this seminar: details on the ProfMiner algorithm for contextual preference object mining.
- S. de Amo, M. L. Bueno, G. Alves: CPrefMiner: An Algorithm for Mining User Contextual Preferences based on Bayesian Networks. ICTAI, IEEE 24th International Conference, Volume 1. What you can find here related to this seminar: details on the CPrefMiner algorithm for contextual preference object mining. This article presents another approach based on Bayesian networks and genetic programming.

61 References
- Albert Bifet, Richard Kirkby: Data Stream Mining: A Practical Approach. August 2009. What you can find here related to this seminar: a complete survey on data stream mining and the MOA tool.
- Domingos, Hulten: Mining High-Speed Data Streams. KDD 2000, Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. What you can find here related to this seminar: details on the VFDT algorithm.
- Domingos, Hulten: Mining Time-Changing Data Streams. KDD '01, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. What you can find here related to this seminar: details on the CVFDT algorithm, for mining decision trees with concept drift.

62 References
- Domingos, Hulten: Mining Complex Models from Arbitrarily Large Databases in Constant Time. SIGKDD 2002, Edmonton, Alberta, Canada. What you can find here related to this seminar: this article proposes a general scaling-up method that is applicable to essentially any induction algorithm based on discrete search.
- Ruoming Jin, Gagan Agrawal: Efficient decision tree construction on streaming data. In Knowledge Discovery and Data Mining. What you can find here related to this seminar: another approach for decision tree construction on streaming data, which uses a different and more accurate bound than the Hoeffding bound presented in this seminar.