Lift-based search for significant dependencies in dense data sets

W. Hämäläinen
Department of Computer Science, University of Helsinki, Finland
whamalai@cs.helsinki.fi

StReBio '09 (KDD '09)
1 Problem

Find a good set of rules X → A which express positive dependence also in the future data!

R = {A_1, ..., A_k} = the set of all attributes, where each A_i ∈ R is binary (binarized); X ⊆ R and A ∈ R \ X.

1. P(XA) > P(X)P(A) (positive dependence)
2. the dependence is genuine (holds in the future data): statistical significance tests, cross-validation
3. redundant rules are pruned
1.1 Positive dependence

Lift: $\gamma(X,A) = \frac{P(XA)}{P(X)P(A)} = \frac{P(A \mid X)}{P(A)} > 1$

If the rule also has high confidence, cf = P(A|X) > P(A) (in the future data), it suits prediction.

Independence rules, where P(A|X) = P(A), are trivial (useless for predicting A). Negative dependencies, P(A|X) < P(A), are harmful for predicting A.
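A minimal sketch of these two quantities computed from absolute frequencies (the function names and toy counts are illustrative, not from the talk):

```python
def lift(n, m_x, m_a, m_xa):
    """gamma(X,A) = P(XA) / (P(X)P(A)), probabilities estimated from counts."""
    return (m_xa / n) / ((m_x / n) * (m_a / n))

def confidence(m_x, m_xa):
    """cf = P(A|X) = m(XA) / m(X)."""
    return m_xa / m_x

# Toy example: n = 100 rows, m(X) = m(A) = m(XA) = 20
print(lift(100, 20, 20, 20))   # 5.0 -> strong positive dependence
print(confidence(20, 20))      # 1.0
```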
If cf is low, the rule can still be important for predictive models, e.g. it reveals (undesired) dependencies between variables, and it is always useful for descriptive purposes.

Traditional frequency-based methods often find independence rules or even negative dependency rules in dense data sets!
Example: Most general significant rules in Chess

[figure: enumeration tree of the most general significant rules in the Chess data, with node frequencies and markings for minimal (MIN) and redundant (R) sets; omitted]
1.2 Pruning rules

The number of rules can be too large!
- computational burden (time & space requirements)
- the user cannot scan through all rules
- simple rules avoid over-fitting (Occam's Razor principle)

Search only non-redundant rules!
Redundancy (classically)

Depends on the goodness measure M; there are several definitions! A rule or set is redundant if it contains useless attributes (which at most decrease the goodness).

If M is increasing:
- Set X is redundant if there is Y ⊊ X such that M(Y) ≥ M(X).
- Rule X → A is redundant if there is Y ⊊ X such that M(Y → A) ≥ M(X → A).
Redundancy (here)

Definition 1. Set X is redundant if there is Y ⊊ X such that M(bestRule(X)) ≤ M(bestRule(Y)). Rule X \ {A} → A is redundant if there is Y ⊊ X such that M(X \ {A} → A) ≤ M(Y \ {A} → A).

bestRule(X) = argmax_{A ∈ X} M(X \ {A} → A) (the best rule which can be constructed from X).

E.g. BC → A can be redundant with respect to B → A or C → A.
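A minimal sketch of bestRule(X) under this definition; the count and measure callbacks and all names are illustrative assumptions, not StatApriori's actual interface:

```python
def best_rule(x, n, count, measure):
    """argmax over A in X of M(X \\ {A} -> A).

    x: tuple of attributes; count(S) returns the absolute frequency m(S);
    measure(n, m_x, m_a, m_xa) is any goodness measure M (e.g. a z-score).
    """
    best, best_value = None, float("-inf")
    for a in x:
        antecedent = tuple(sorted(set(x) - {a}))
        value = measure(n, count(antecedent), count((a,)), count(tuple(sorted(x))))
        if value > best_value:
            best, best_value = (antecedent, a), value
    return best, best_value
```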
Why this definition?

- The best rules under this definition are also among the best classically non-redundant rules!
- computationally fast & memory friendly
- significant rules are often permutations of each other
- the algorithm can be applied to the classical definition, but that is computationally more difficult (not tested yet)
1.3 Statistical significance

Idea: if X → A expresses positive dependence in the sample data, what is the probability that it has occurred by chance (i.e. that X and A were actually independent)?

Let m(XA) = n·P(XA) (the absolute frequency).

p-value = the probability that XA occurs at least m(XA) times in a data set r, |r| = n, if P(XA) = P(X)P(A) (independence).

If p is very low, X → A is likely to be genuine.
How to estimate p?

Binomial probability:

$$p = \sum_{i=m(XA)}^{n} \binom{n}{i} \big(P(X)P(A)\big)^{i} \big(1 - P(X)P(A)\big)^{n-i}$$

= the probability that XA occurs at least m(XA) times in the whole data of size n.
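A sketch of this exact binomial tail probability via scipy's survival function (assuming scipy is available; the toy counts are the same as before):

```python
from scipy.stats import binom

def binomial_p_value(n, m_x, m_a, m_xa):
    p0 = (m_x / n) * (m_a / n)        # P(X)P(A): expected P(XA) under independence
    return binom.sf(m_xa - 1, n, p0)  # sf(k) = P(K > k), so P(K >= m(XA)) = sf(m(XA) - 1)

print(binomial_p_value(100, 20, 20, 20))  # very small (~1e-8) -> hardly a chance artefact
```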
Alternatively (not suitable):

$$p_2 = \sum_{i=m(XA)}^{m(X)} \binom{m(X)}{i} P(A)^{i} \big(1 - P(A)\big)^{m(X)-i}$$

= the probability that A occurs at least m(XA) times on the rows where X is true. The problem: rules with different X cannot be compared!
z-score

The exact p is computationally difficult! It can be estimated by the z-score:

$$z(X,A) = \frac{m(XA) - nP(X)P(A)}{\sqrt{nP(X)P(A)\big(1 - P(X)P(A)\big)}} = \frac{\sqrt{nP(XA)}\,\big(\gamma(X,A) - 1\big)}{\sqrt{\gamma(X,A) - P(XA)}}$$

Now p ≈ 1 − Φ(z(X,A)), where Φ is the standard normal cumulative distribution function.
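A sketch of the z-score and its normal approximation of p (assuming scipy for Φ; names are illustrative). With the toy counts used above it reproduces the z = 8.2 seen later in the simulation:

```python
from math import sqrt
from scipy.stats import norm

def z_score(n, m_x, m_a, m_xa):
    expected = n * (m_x / n) * (m_a / n)              # n * P(X) * P(A)
    variance = expected * (1 - (m_x / n) * (m_a / n)) # binomial variance
    return (m_xa - expected) / sqrt(variance)

z = z_score(100, 20, 20, 20)
print(z)            # 8.16... ~ 8.2
print(norm.sf(z))   # p ~ 1 - Phi(z)
```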
Using the z-score

- z can be used as a ranking function as such!
- z is a monotonically increasing function of m(XA) and γ: it suits branch-and-bound search
- works well when the expected counts m(X)P(A) are sufficiently large (e.g. ≥ 5); when m(X)P(A) is small, z is over-optimistic
- other functions might work better for search purposes; the measure function should be a monotonically increasing or decreasing function of m(XA) and γ(X,A)
2 Searching significant rules

All possible attribute sets can be listed by an enumeration tree:

[figure: enumeration tree over the example attributes with node frequencies; omitted]
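A minimal sketch of such an enumeration tree: each node extends its parent set only with attributes that come later in a fixed order, so every attribute set is listed exactly once (the generator below is illustrative; the real algorithm materializes and prunes the tree):

```python
def enumerate_sets(attributes, prefix=()):
    """Yield all non-empty attribute sets in enumeration-tree (depth-first) order."""
    for i, a in enumerate(attributes):
        node = prefix + (a,)
        yield node
        # children may only add attributes that follow a in the fixed order
        yield from enumerate_sets(attributes[i + 1:], node)

print(list(enumerate_sets(("A", "B", "C"))))
# [('A',), ('A','B'), ('A','B','C'), ('A','C'), ('B',), ('B','C'), ('C',)]
```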
2.1 How to traverse the tree?

Given set X, we want an upper bound for M(bestRule(XQ)):

- m(XQ) ≤ m(X) always
- γ(bestRule(XQ)) ≤ 1/P(A_min), where P(A_min) = min{P(A_i) | A_i ∈ XQ}, because

$$\gamma(XQ \setminus A_i \rightarrow A_i) = \frac{P(XQ)}{P(XQ \setminus A_i)\,P(A_i)} \le \frac{1}{P(A_i)} \le \frac{1}{P(A_{min})}$$
Using the upper bound U(M(bestRule(XQ))): when min{P(A_i) | A_i ∈ XQ} = min{P(A_j) | A_j ∈ X}, we have U(M(bestRule(XQ))) ≤ U(M(bestRule(X))).

1. If the upper bound U(M(bestRule(XQ))) < min_M, all rules of XQ are insignificant.
2. If U(M(bestRule(XQ))) ≤ max{M(bestRule(Y)) | Y ⊆ X}, all rules of XQ are redundant.
3. If bestRule(X) has the maximal lift P(A_min)^{-1}, it is minimal, and all more specific rules will be redundant.
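A branch-and-bound sketch of conditions 1 and 2 with the z-score as M: the most optimistic rule under X has m(XA) = m(X) and γ = 1/P(A_min). All names are assumptions for illustration; with the toy values n = 100, m(X) = 20, P(A_min) = 0.2 the bound equals the best achievable z = 8.2.

```python
from math import sqrt

def upper_bound_z(n, m_x, p_a_min):
    """U(M(bestRule(X))) for M = z: assume m(XA) = m(X) and gamma = 1/P(A_min)."""
    fr = m_x / n
    gamma = 1.0 / p_a_min
    # z = sqrt(n * P(XA)) * (gamma - 1) / sqrt(gamma - P(XA))
    return sqrt(n * fr) * (gamma - 1) / sqrt(gamma - fr)

def can_prune(n, m_x, p_a_min, min_m, best_found):
    u = upper_bound_z(n, m_x, p_a_min)
    return u < min_m or u <= best_found  # condition 1 or condition 2

print(upper_bound_z(100, 20, 0.2))  # 8.16... ~ 8.2
```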
Property PS = potentially significant: PS(X) holds iff U(M(bestRule(X))) ≥ min_M.

The property is monotonic if we traverse the tree in a certain order! Meaning: if even one of Y's parents is not PS, or is minimal, then Y (and its children) cannot be both non-redundant and PS, so Y can be pruned.
Traversal order

- attributes are in descending order
- search top-down, from right to left
- both the frequencies and the maximum lifts can only decrease
- parent sets X always have better upper bounds than their children XQ have!
Relations of PS sets

t_i = sets under A_i; t_ij = sets under A_iA_j; P(A_i) ≤ ... ≤ P(A_{j-1}) ≤ P(A_j)

[figure: containment relations among the collections t_i, t_ij, ..., t_j; omitted]
Frequency counting

- the data itself can be used to initialize the tree
- later, frequencies can be counted from the tree (no need to check the original data anymore)
Frequency tree for the data

[figure: frequency tree built from the example data, with attribute-labelled nodes and their counts; omitted]
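A sketch of this idea with a plain prefix tree: rows are inserted in a canonical attribute order, and m(X) for any set X can afterwards be read from the tree alone. The dict-based node layout is an assumption for illustration, not the paper's data structure.

```python
def build_tree(rows):
    """Insert each row (an iterable of attribute names) as a sorted path."""
    root = {"count": 0, "children": {}}
    for row in rows:
        root["count"] += 1
        node = root
        for a in sorted(row):
            node = node["children"].setdefault(a, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def count(node, itemset):
    """m(X): rows whose attribute set contains itemset (a sorted tuple)."""
    if not itemset:
        return node["count"]
    total = 0
    for a, child in node["children"].items():
        if a == itemset[0]:
            total += count(child, itemset[1:])
        elif a < itemset[0]:          # itemset[0] may still occur deeper
            total += count(child, itemset)
    return total

tree = build_tree([("A", "B"), ("A", "B", "C"), ("B",), ("C",)])
print(count(tree, ("A", "B")))   # 2
print(count(tree, ("B",)))       # 3
```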
Pruning attributes

- checking all 2-sets can prune out low-frequency attributes
- the maximal U(γ) values are decreased
- attribute A can be pruned if for all A_i ≠ A: M(m(AA_i), min{P(A), P(A_i)}^{-1}) < min_M
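A sketch of this initial pass, reusing the upper_bound_z bound from the earlier sketch (pair_counts and the other names are illustrative):

```python
def attribute_prunable(a, attributes, n, counts, pair_counts, min_m):
    """True if no pair {a, a_i} can reach min_m even in the best case."""
    for a_i in attributes:
        if a_i == a:
            continue
        p_min = min(counts[a], counts[a_i]) / n
        m_pair = pair_counts.get(frozenset((a, a_i)), 0)
        if upper_bound_z(n, m_pair, p_min) >= min_m:
            return False    # this pair may still produce a significant rule
    return True
```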
3 Simulation

The search is simulated step by step on the example data, starting from its frequency tree (as shown earlier). Only the recoverable notes of the step figures are kept below; the tree snapshots themselves are omitted.

Simulation step 1: m(FA_i) = 1 for all A_i ≠ F.
Simulation step 2: a set is added with z = 8.2 and marked MIN (minimal).
Simulation step 3: a set is added with z = 2.1.
Simulation step 4: one candidate set is not created!
Simulation step 5: a set is added with z = 2.9.
Simulation step 6: a set is removed.
Simulation step 7: one set is removed and one is added (z = 3.8, z = 1.2); two rules with z = 3.8 are found.
Simulation step 8: two sets are added; both have z < 0.
Simulation step 9: both new sets have z = 2.7.
Simulation step 10: one set is added and then removed (fr = 0); another is added with z = 2.3.
Simulation step 11: a set is removed; a rule with antecedent A is found (z = 4.4).
Simulation step 12: z = 2.3.
Simulation: final result

The best rules found (the attribute labels were only in the omitted figure):

z = 8.2, cf = 1.0, fr = 0.20, γ = 5.0
z = 4.4, cf = 1.0, fr = 0.20, γ = 2.5 (the rule with antecedent A)
z = 3.8, cf = 0.5, fr = 0.15, γ = 2.5
z = 3.8, cf = 0.5, fr = 0.15, γ = 2.5
4 Experiments: Goals

- Quality of rules compared to traditional methods: what can we gain when min_fr is not used?
- Performance: how fast is it? How complex data sets can we handle?
Proportions of useful and harmful rules

A rule is
- at least slightly useful, if it expresses a positive dependency in the test data
- useful, if it expresses a clear positive dependency (requirement: z ≥ 1)
- at least slightly harmful, if it expresses a negative dependency in the test data
- harmful, if it expresses a clear negative dependency (requirement: z ≤ −1)
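A sketch of this labelling as code, assuming z is computed on the test data (the thresholds are the slide's |z| ≥ 1; the label strings are illustrative):

```python
def rule_quality(z_test):
    """Label a rule by the sign and strength of its dependence in the test data."""
    if z_test >= 1:
        return "useful"
    if z_test > 0:
        return "slightly useful"
    if z_test <= -1:
        return "harmful"
    if z_test < 0:
        return "slightly harmful"
    return "independent"    # z == 0: no dependence either way
```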
Data sets

Biological + medical + Chess as a pathological case.

Set        n      k
Heart      17     23
HeartNeg   17     46
(Garden    1340   2372)
Plants     1088   70
Mushroom   416    120
Chess      2130   76
Results

[figure: bar chart of the proportions of slightly useful, useful, slightly harmful, and harmful rules on Heart, HeartNeg, Plants, Mushroom, and Chess, for confidence thresholds cf = 0.9, 0.6, and 0.0 (each with a variant b); omitted]
Observations

- Selecting rules with U(ln p) improves the results (vs. z).
- Using min_cf in the search can distort the results: if a parent has a higher z but too low cf, the set is not pruned as redundant.
- If min_cf is not used in the search (only in the end), the number of rules can be too small, but this is often still the better approach (smaller prediction error in the test sets).
Comparison to traditional frequency-based search

Search with as low a min_fr as possible + pruning with different measure functions.

Set        min_fr
Heart      0.0
HeartNeg   0.32
Plants     0.12
Mushroom   0.22
Chess      0.7
Results

[figure: proportions of useful and harmful rules when cf = 0.9, comparing the measure functions χ², J, z, and fr on Heart, HeartNeg, Plants, Mushroom, and Chess; omitted]
Results

[figure: proportions of useful and harmful rules when cf = 0.6, comparing the measure functions χ², J, z, and fr on Heart, HeartNeg, Plants, Mushroom, and Chess; omitted]
5 Conclusions

- Both DeepBlue and StatApriori are useful when nothing else works (dense data)!
- They find genuine dependencies without minimum frequencies or other restrictions → interesting new information.
- DeepBlue can solve problems which are infeasible with traditional approaches... but the newest version of StatApriori is even faster.
- Useful theoretical properties; these may apply to searching general association rules.
6 Future research

- non-redundant rules when the consequent is taken into account + comparison
- negative dependencies X → ¬A
- rules between sets, X → Y with |Y| > 1 (general association rules)
- new application areas (do you have interesting data?)
Are you interested in collecting biodiversity data?

- the goal is to collect a large database of naturally occurring plant combinations
- location information can be interesting for geographical data mining
- just reading and extracting data (plant communities and associations) from texts helps
- technical support (a collecting system) is also welcome!

Contact Wilhelmiina!