Factor Models for Gender Prediction Based on E-commerce Data

1 Factor Models for Gender Prediction Based on E-commerce Data
Data Mining Competition, PAKDD 2015, Ho Chi Minh City, Vietnam

2 Outline
    Hierarchical Basket Model
        Tree Encoding
        Factorization Machine
    Modeling Autocorrelation
    Sequential Block Voting
    Results & Implementation

4 Product Hierarchy
    u1, , , A01/B01/C01/D01/
    u2, , , A02/B02/C02/D02/;A02/B02/C03/D03/;
    u3, , , A01/B01/C01/D02/;A01/B04/C05/D98/;
[Figure: category tree with nodes A01; B01, B04; C01, C02, C05; D01, D02, D06, D22, D45, D98, D21, D89, D15]

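5 Path Encoding
    u3, , , A01/B01/C01/D02/;A01/B04/C05/D98/;
[Figure: category tree with nodes A01; B01, B04; C01, C02, C05; D01, D02, D06, D22, D45, D98, D21, D89, D15]
Each session is encoded as a vector of category counts, with consecutive blocks of entries for the hierarchy levels A, B, and D:
$x_i = \{2, 0, \dots, 1, 0, 0, 1, 0, 1, \dots, 1, 0, \dots\}$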
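The path encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the competition code: the function name `encode_basket` and the short `vocabulary` list are assumptions made for the example, and a real vocabulary would cover every category ID in the data.

```python
from collections import Counter

def encode_basket(basket, vocabulary):
    """Count how often each category ID occurs across all product paths of a session."""
    counts = Counter()
    for path in basket.strip(";").split(";"):
        if not path:
            continue  # skip empty fragments, e.g. for an empty basket
        for category in path.strip("/").split("/"):
            counts[category] += 1
    # One vector entry per vocabulary item, in vocabulary order.
    return [counts.get(category, 0) for category in vocabulary]

# Hypothetical, truncated vocabulary (assumption for illustration only).
vocabulary = ["A01", "A02", "B01", "B02", "B03", "B04", "B05",
              "C01", "C05", "D02", "D98"]

# Basket of session u3 from the slide.
basket_u3 = "A01/B01/C01/D02/;A01/B04/C05/D98/;"
x_u3 = encode_basket(basket_u3, vocabulary)
print(x_u3)
```

Because both products of u3 sit under A01, the A01 entry is 2, which matches the leading 2 in the vector on the slide.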

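6 Factorization Machine
FM model of order d = 2:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
$w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$ are the model parameters.
$k \in \mathbb{N}$ is the size (dimensionality) of the latent space.
The model has one feature vector $v_i$ for each variable $x_i$.
[Rendle, TIST 2012]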
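To make the order-2 FM equation concrete, the sketch below evaluates it in plain Python on a toy input. The parameter values are made up for illustration; `fm_predict` follows the double sum literally, and `fm_predict_fast` uses the equivalent linear-time rewriting of the pairwise term described in the cited Rendle paper.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def fm_predict(x, w0, w, V):
    """Order-2 FM score: bias + linear part + all pairwise interactions."""
    p = len(x)
    y = w0 + dot(w, x)
    for j in range(p):
        for j2 in range(j + 1, p):
            y += x[j] * x[j2] * dot(V[j], V[j2])
    return y

def fm_predict_fast(x, w0, w, V):
    """Same score, computed in O(p * k) per latent dimension f:
    0.5 * ((sum_j v_jf x_j)^2 - sum_j (v_jf x_j)^2)."""
    y = w0 + dot(w, x)
    for f in range(len(V[0])):
        terms = [V[j][f] * x[j] for j in range(len(x))]
        y += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return y

# Toy parameters (assumed values, p = 3 variables, k = 2 latent factors).
x = [1, 0, 1]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]

score_naive = fm_predict(x, w0, w, V)
score_fast = fm_predict_fast(x, w0, w, V)
print(score_naive, score_fast)
```

Only the pair (1, 3) is active in this toy input, so the interaction term reduces to the single inner product of the first and third rows of V.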

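10 Linear Part
An FM model of order d = 2:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
Example input with blocks of entries for the levels A, B, and D, where the ones mark A02, B11, and D55:
$x_i = (0, 1, \dots, 0, 0, \dots, 1, \dots, 0, 0, \dots, 1, \dots, 0)$
The linear part combines the per-category evidence additively:
$p(\text{female} \mid x_i) \sim p(\text{female} \mid A02) + p(\text{female} \mid B11) + p(\text{female} \mid D55)$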

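13 Pairwise Interactions
An FM model of order d = 2:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
Example input with blocks of entries for the levels A, B, and D, where the ones mark A02, B11, D55, and D95:
$x_i = (0, 1, \dots, 0, 0, \dots, 1, \dots, 0, 0, \dots, 1, \dots, 1, \dots, 0)$
Example: the rows of $V \in \mathbb{R}^{p \times k}$ for $j = D55$ and $j' = D95$ are latent vectors over factors such as "Summer" and "Swimming".
[Figure: matrix V with the rows for D55 and D95 highlighted; entries not recoverable]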

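18 Factoring Joint Probabilities
[Figure: autocorrelation plotted against lag]
We can factorize the joint probability by conditioning on features that describe the related samples:
$$p(y_0, \dots, y_n \mid x_0, \dots, x_n) := \prod_{i=0}^{n} p(y_i \mid x_i^r, x_i)$$
Here $x_i^r$ denotes the relational features of sample $i$.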

19 Relational Features
    u3, , , A01/B01/C05/D11/
    u4, , , A02/B01/C01/D02/;A05/B04/C05/D98/;
    u5, , , A05/B04/C05/D98/;
    u6, , , A04/B03/C06/D22/;A05/B14/C45/D68/;
    u7, , , A01/B01/C01/D03/;A01/B04/C05/D78/;
Counts of level-A categories around u5 at lag 1 (nonzero entries: A2, A4, A5):
$x^{a_1} = [0, 1, 0, 1, 2, \dots]$
Counts of level-A categories around u5 at lags 1 to 2 (nonzero entries: A1, A2, A4, A5):
$x^{a_{1:2}} = [3, 1, 0, 1, 2, \dots]$
Combining different lags and categories, we can describe the sample neighborhood with:
$x_{u5} = [x^{a_1}, x^{a_{1:2}}, x^{b_{1:3}}, x^{d_1}]$
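The two example vectors for u5 can be reproduced with a short Python sketch. It assumes that a lag window counts the top-level categories of the neighbors on both sides of the sample and excludes the sample's own basket; that reading is an assumption, chosen because it yields the numbers on the slide. The function names are hypothetical.

```python
def top_level_counts(basket, categories):
    """Count the level-A category of every product path in one basket."""
    index = {name: pos for pos, name in enumerate(categories)}
    counts = [0] * len(categories)
    for path in basket.strip(";").split(";"):
        if path:
            top = path.split("/")[0]
            if top in index:
                counts[index[top]] += 1
    return counts

def lag_features(baskets, i, max_lag, categories):
    """Sum the category counts of all neighbors within max_lag positions of sample i."""
    total = [0] * len(categories)
    for lag in range(1, max_lag + 1):
        for j in (i - lag, i + lag):
            if 0 <= j < len(baskets):
                neighbor = top_level_counts(baskets[j], categories)
                total = [t + c for t, c in zip(total, neighbor)]
    return total

# Baskets of u3..u7 as listed on the slide, in their sequential order.
baskets = [
    "A01/B01/C05/D11/",
    "A02/B01/C01/D02/;A05/B04/C05/D98/;",
    "A05/B04/C05/D98/;",
    "A04/B03/C06/D22/;A05/B14/C45/D68/;",
    "A01/B01/C01/D03/;A01/B04/C05/D78/;",
]
categories = ["A01", "A02", "A03", "A04", "A05"]

u5 = 2  # position of u5 in the list above
x_a1 = lag_features(baskets, u5, 1, categories)
x_a12 = lag_features(baskets, u5, 2, categories)
print(x_a1, x_a12)
```

The same function applied to level-B or level-D categories with other window sizes would give the remaining parts of the concatenated neighborhood vector.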

23 Identifying Sequential Blocks
    u1, , , A01/B01/C01/D01/
    u2, , , A02/B02/C02/D02/;A02/B02/C03/D03/;
    u3, , , A02/B02/C02/D02/;A02/B02/C03/D04/;
Block assignment (the comparison operator between the two end times is missing in the source and is shown as "?"):
    blockid[:] ← 0
    count ← 0
    for i ← 1, n do
        if endtime(i) ? endtime(i-1) then
            count++
        end if
        blockid[i] ← count
    end for
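The block assignment loop translates directly into Python. Since the comparison between consecutive end times is not legible in the source, the sketch takes it as a parameter; the default `operator.ne` (a new block starts whenever the end time changes) is purely an assumption, as are the function name and the sample end times.

```python
import operator

def assign_block_ids(end_times, starts_new_block=operator.ne):
    """Assign a running block ID to each sample in sequence order.

    starts_new_block(current_end, previous_end) decides whether sample i
    opens a new block. The default (inequality) is an assumption; the
    comparison used on the slide is not recoverable.
    """
    block_ids = [0] * len(end_times)
    count = 0
    for i in range(1, len(end_times)):
        if starts_new_block(end_times[i], end_times[i - 1]):
            count += 1
        block_ids[i] = count
    return block_ids

# Hypothetical end times (arbitrary units) for six consecutive sessions.
end_times = [10, 10, 10, 25, 25, 40]
block_ids = assign_block_ids(end_times)
print(block_ids)

# A different rule can be swapped in, e.g. "end time went backwards".
block_ids_lt = assign_block_ids([5, 7, 9, 2, 4, 1], starts_new_block=operator.lt)
print(block_ids_lt)
```

Passing the rule as a predicate keeps the loop identical to the slide while leaving the unknown comparison explicit.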

24 [Figure: number of wrong labels in a block plotted against block size]

25 Block-based Voting
    if blocksize(i) ≥ 10 AND (median(i) ≤ 6 OR median(i) ≥ 9) then
        if median(i) ≥ 9 then
            predict female
        else if median(i) ≤ 6 then
            predict male
        end if
    else    (per-sample threshold)
        if y_i ≥ 82 then
            predict female
        else
            predict male
        end if
    end if
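A minimal Python sketch of the voting rule follows. Several details are assumptions rather than facts from the slide: the comparison operators are read as "at least" and "at most", scores are taken to be predicted probabilities of "female", and the thresholds in the usage example (0.6, 0.9, 0.82) are one possible reading of the printed values 6, 9, and 82. The function name `block_vote` is hypothetical.

```python
from collections import defaultdict
from statistics import median

def block_vote(scores, block_ids, min_size, low, high, threshold):
    """Label every sample, using the block median when the block is large and decisive."""
    by_block = defaultdict(list)
    for score, block in zip(scores, block_ids):
        by_block[block].append(score)
    block_median = {block: median(values) for block, values in by_block.items()}

    predictions = []
    for score, block in zip(scores, block_ids):
        m = block_median[block]
        decisive = m <= low or m >= high
        if len(by_block[block]) >= min_size and decisive:
            # Whole block gets the label implied by its median score.
            predictions.append("female" if m >= high else "male")
        else:
            # Fall back to the per-sample threshold.
            predictions.append("female" if score >= threshold else "male")
    return predictions

# Hypothetical scores: one large, decisive block and one small block.
scores = [0.95] * 9 + [0.50] + [0.85, 0.30]
block_ids = [0] * 10 + [1, 1]
labels = block_vote(scores, block_ids,
                    min_size=10, low=0.6, high=0.9, threshold=0.82)
print(labels)
```

In this toy input the outlier score 0.50 in the large block is overruled by the block median, while the two samples in the small block are decided individually.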

27 Results & Implementation
                        Score    Place
    Final Result
    Full Competition
Source Code:
Factorization Machine Implementation:
