Factor Models for Gender Prediction Based on E-commerce Data
Data Mining Competition, PAKDD 2015, Ho Chi Minh City, Vietnam
Outline
- Hierarchical Basket Model
  - Tree Encoding
  - Factorization Machine
- Modeling Autocorrelation
- Sequential Block Voting
- Results & Implementation
Product Hierarchy

u1, 2014-11-13, 2014-11-14, A01/B01/C01/D01/
u2, 2014-11-14, 2014-11-15, A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, 2014-11-14, 2014-11-16, A01/B01/C01/D02/;A01/B04/C05/D98/;

[Figure: category tree with root A01, second-level nodes B01 and B04, third-level nodes C01, C02 and C05, and leaves D01, D02, D06, D22, D45, D98, D21, D89, D15.]
Path Encoding

u3, 2014-11-14, 2014-11-16, A01/B01/C01/D02/;A01/B04/C05/D98/;

[Figure: category tree over the nodes A01; B01, B04; C01, C02, C05; D01, D02, D06, D22, D45, D98, D21, D89, D15.]

$x_i = (\underbrace{2, 0, \dots}_{A},\ \underbrace{1, 0, 0, 1}_{B},\ \underbrace{0, 1, \dots, 1, 0, \dots}_{D})$
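
The encoding above can be sketched in a few lines of Python. This is a minimal illustration, assuming each entry of the vector counts how often a category node occurs across the paths of one basket; the function name `encode_basket` and the truncated `vocabulary` are hypothetical and not taken from the competition code.

```python
from collections import Counter

def encode_basket(paths, vocabulary):
    """Count how often each category node occurs in a basket of product paths."""
    counts = Counter(
        node
        for path in paths
        for node in path.strip("/").split("/")
    )
    # Counter returns 0 for nodes that never occur in the basket.
    return [counts[node] for node in vocabulary]

# Hypothetical, truncated vocabulary; a real one would list every node of the tree.
vocabulary = ["A01", "A02", "B01", "B02", "B03", "B04", "D01", "D02", "D98"]

basket_u3 = "A01/B01/C01/D02/;A01/B04/C05/D98/;"
paths = [p for p in basket_u3.split(";") if p]
x_u3 = encode_basket(paths, vocabulary)
print(x_u3)
```

A01 occurs in both paths of u3, which is why the first entry is 2 rather than 1.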
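Factorization Machine

FM model of order d = 2:

$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$

- $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$, $V \in \mathbb{R}^{p \times k}$ are the model parameters
- $k \in \mathbb{N}$ is the size (dimensionality) of the latent space
- the model has one feature vector $v_i$ for each variable $x_i$

[Rendle, TIST 2012]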
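
To make the model equation concrete, here is a minimal NumPy sketch of the order-2 FM prediction. It is not the fastFM implementation; the function names and the random toy parameters are illustrative assumptions. The first function evaluates the double sum literally, the second uses the linear-time reformulation of the pairwise term described in Rendle's paper.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Evaluate the order-2 FM equation literally, with the double sum over j < j'."""
    p = len(x)
    y = w0 + float(np.dot(w, x))
    for j in range(p):
        for jp in range(j + 1, p):
            y += x[j] * x[jp] * float(np.dot(V[j], V[jp]))
    return y

def fm_predict_fast(x, w0, w, V):
    """Same value in O(p * k): 0.5 * sum_f [(sum_j v_jf x_j)^2 - sum_j v_jf^2 x_j^2]."""
    xv = V.T @ x                   # shape (k,): sum_j v_jf * x_j
    x2v2 = (V ** 2).T @ (x ** 2)   # shape (k,): sum_j v_jf^2 * x_j^2
    return w0 + float(np.dot(w, x)) + 0.5 * float(np.sum(xv ** 2 - x2v2))

# Toy parameters (made up): p = 6 features, k = 3 latent dimensions.
rng = np.random.default_rng(0)
p, k = 6, 3
x = np.array([2.0, 0.0, 1.0, 0.0, 1.0, 1.0])
w0, w, V = 0.1, rng.normal(size=p), rng.normal(size=(p, k))

assert np.isclose(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

Because the encoded baskets are sparse, a practical implementation would additionally skip the zero entries of `x`.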
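Linear Part

FM model of order d = 2:

$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$

$x_i = (\underbrace{0, 1, \dots, 0}_{A},\ \underbrace{0, \dots, 1, \dots, 0}_{B},\ \underbrace{0, \dots, 1, \dots, 0}_{D})$, with the nonzero entries at A02, B11 and D55

$p(\text{female} \mid x_i) \sim p(\text{female} \mid A02) + p(\text{female} \mid B11) + p(\text{female} \mid D55)$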
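
For a sample that contains the single path A02/B11/D55, only three entries of $x_i$ are nonzero, each equal to 1, so the linear part of the model reduces to the bias plus three weights. Indexing $w$ by category name is notation introduced here for illustration only:

```latex
w_0 + \sum_{j=1}^{p} w_j x_j \;=\; w_0 + w_{A02} + w_{B11} + w_{D55}
```

Each weight acts as a per-category contribution to the score, which is the sense in which the slide relates p(female | x_i) to the three per-category terms.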
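Pairwise Interactions

FM model of order d = 2:

$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$

$x_i = (\underbrace{0, 1, \dots, 0}_{A},\ \underbrace{0, \dots, 1, \dots, 0}_{B},\ \underbrace{0, \dots, 1, \dots, 1, \dots, 0}_{D})$, with the nonzero entries at A02, B11, D55 and D95

Example: [Figure: the rows of $V \in \mathbb{R}^{p \times k}$ for $j = D55$ and $j' = D95$, with the latent dimensions labeled "Summer" and "Swimming".]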
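
The role of the latent vectors can be illustrated with a toy example. All numbers below are made up, and reading the two latent dimensions as "Summer" and "Swimming" only mirrors the slide's illustrative labels; a fitted model gives no such guarantee. The sketch shows that the weight of the pairwise term for two categories is the inner product of their rows in V.

```python
import numpy as np

# Hypothetical latent vectors with k = 2 (axes: "Summer", "Swimming").
V = {
    "D55": np.array([0.9, 0.8]),
    "D95": np.array([0.7, 0.9]),
    "D01": np.array([-0.6, 0.1]),  # an unrelated category, for contrast
}

def interaction_weight(a, b):
    """Weight of the x_a * x_b term in the FM model: <v_a, v_b>."""
    return float(np.dot(V[a], V[b]))

similar = interaction_weight("D55", "D95")      # 0.9*0.7 + 0.8*0.9
dissimilar = interaction_weight("D55", "D01")   # 0.9*(-0.6) + 0.8*0.1
print(similar, dissimilar)
```

Categories whose latent vectors point in similar directions get a large positive interaction weight, even if the pair was rarely observed together in training.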
Factoring Joint Probabilities

[Figure: autocorrelation (y-axis, 0.80 to 1.00) against lag (x-axis, -20 to 20).]

We can factorize the joint probability by conditioning on features that describe the related samples:

$p(y_0, \dots, y_n \mid x_0, \dots, x_n) := \prod_{i=0}^{n} p(y_i \mid x_i^r, x_i)$
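
The figure on this slide plots autocorrelation against lag, which motivates conditioning on neighboring samples. The slide does not say which series or estimator was used; as a minimal sketch, assuming the standard sample autocorrelation of a label sequence in file order, it could be computed as follows (function name and toy labels are illustrative):

```python
import numpy as np

def autocorrelation(y, lag):
    """Sample autocorrelation of a 1-D sequence at a non-negative lag.

    Assumes the sequence is not constant; otherwise the denominator is zero.
    """
    y = np.asarray(y, dtype=float)
    if lag == 0:
        return 1.0
    centered = y - y.mean()
    denom = float(np.dot(centered, centered))
    return float(np.dot(centered[:-lag], centered[lag:])) / denom

# Toy labels (1 = female, 0 = male) that arrive in runs, so neighbors tend to agree.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print([autocorrelation(labels, lag) for lag in range(4)])
```

A positive value at small lags means that the labels of nearby samples carry information about the label of the current one.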
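Relational Features

u3, 2014-11-13, 2014-11-14, A01/B01/C05/D11/
u4, 2014-11-14, 2014-11-16, A02/B01/C01/D02/;A05/B04/C05/D98/;
u5, 2014-11-14, 2014-11-16, A05/B04/C05/D98/;
u6, 2014-11-14, 2014-11-16, A04/B03/C06/D22/;A05/B14/C45/D68/;
u7, 2014-11-14, 2014-11-16, A01/B01/C01/D03/;A01/B04/C05/D78/;

$x_{a1} = [0, 1, 0, 1, 2, \dots]$ (nonzero entries: A2, A4, A5)
$x_{a1:2} = [3, 1, 0, 1, 2, \dots]$ (nonzero entries: A1, A2, A4, A5)

For the sample u5, $x_{a1}$ counts the level-A categories of the samples one position before and after it (u4 and u6), and $x_{a1:2}$ those of the samples up to two positions away (u3, u4, u6, u7).

Combining different lags and categories, we can describe the sample neighborhood with:

$x_{u5} = [x_{a1}, x_{a1:2}, x_{b1:3}, x_{d1}]$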
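
The two count vectors on this slide can be reproduced with a short sketch. It assumes that a relational feature counts the categories of one hierarchy level over all samples within a given number of positions before and after the current sample, excluding the sample itself; this reading is inferred from the numbers on the slide, and the function name `relational_counts` is hypothetical.

```python
from collections import Counter

def relational_counts(baskets, i, max_lag, level, vocabulary):
    """Count level-`level` categories in the samples within `max_lag` positions of sample i."""
    counts = Counter()
    for j in range(max(0, i - max_lag), min(len(baskets), i + max_lag + 1)):
        if j == i:
            continue  # the sample itself is described by x_i, not by its neighborhood
        for path in baskets[j]:
            counts[path.strip("/").split("/")[level]] += 1
    return [counts[c] for c in vocabulary]

baskets = [
    ["A01/B01/C05/D11/"],                      # u3
    ["A02/B01/C01/D02/", "A05/B04/C05/D98/"],  # u4
    ["A05/B04/C05/D98/"],                      # u5
    ["A04/B03/C06/D22/", "A05/B14/C45/D68/"],  # u6
    ["A01/B01/C01/D03/", "A01/B04/C05/D78/"],  # u7
]
vocab_a = ["A01", "A02", "A03", "A04", "A05"]  # truncated level-A vocabulary

x_a1 = relational_counts(baskets, i=2, max_lag=1, level=0, vocabulary=vocab_a)
x_a12 = relational_counts(baskets, i=2, max_lag=2, level=0, vocabulary=vocab_a)
print(x_a1, x_a12)
```

Concatenating such vectors for several lags and hierarchy levels yields the neighborhood description used as additional input to the model.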
Identifying Sequential Blocks

u1, 2014-11-13, 2014-11-14, A01/B01/C01/D01/
u2, 2014-11-14, 2014-11-15, A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, 2014-11-14, 2014-11-16, A02/B02/C02/D02/;A02/B02/C03/D04/;

blockid[:] ← 0
count ← 0
for i ← 1, n do
    if endtime(i) … endtime(i-1) then
        count++
    end if
    blockid[i] ← count
end for
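
The loop above translates directly into Python. The comparison between consecutive end times is not legible on the slide, so this sketch takes it as a parameter; the default (`!=`, a new block starts whenever the end time changes) is an assumption, and `operator.lt` would instead start a new block whenever the end time decreases. The function name and toy timestamps are illustrative.

```python
import operator

def block_ids(end_times, starts_new_block=operator.ne):
    """Assign a block id to every sample by scanning consecutive end times.

    `starts_new_block(current, previous)` decides whether sample i opens a new
    block; the default is an assumption, not the rule confirmed by the slides.
    """
    ids = [0] * len(end_times)
    count = 0
    for i in range(1, len(end_times)):
        if starts_new_block(end_times[i], end_times[i - 1]):
            count += 1
        ids[i] = count
    return ids

# ISO-formatted dates compare correctly as strings.
end_times = ["2014-11-14", "2014-11-14", "2014-11-15", "2014-11-16", "2014-11-16"]
ids = block_ids(end_times)
print(ids)
```

Keeping the boundary rule as an argument makes it easy to check how sensitive the resulting blocks are to that choice.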
[Figure: number of wrong labels in a block (y-axis, 0 to 8) against block size (x-axis, 0 to 160).]
Block-based Voting

if blocksize(i) ≥ 10 AND (median(i) ≤ 0.6 OR median(i) ≥ 0.9) then
    if median(i) ≥ 0.9 then
        predict female
    else if median(i) ≤ 0.6 then
        predict male
    end if
else    (per-sample threshold)
    if ŷ_i ≥ 0.82 then
        predict female
    else
        predict male
    end if
end if
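
The voting rule can be sketched as a small Python function. The comparison operators and decimal points are not legible on the slide, so the thresholds used here (block size at least 10, block median at most 0.6 or at least 0.9, per-sample cut-off 0.82) are an assumed reading, and the scores are assumed to be predicted probabilities of "female". The function name `block_vote` is hypothetical.

```python
from statistics import median

def block_vote(scores, ids, min_size=10, low=0.6, high=0.9, threshold=0.82):
    """Predict a label per sample, letting large, clear-cut blocks vote as a whole."""
    members = {}
    for score, b in zip(scores, ids):
        members.setdefault(b, []).append(score)
    block_median = {b: median(s) for b, s in members.items()}

    predictions = []
    for score, b in zip(scores, ids):
        m = block_median[b]
        if len(members[b]) >= min_size and (m <= low or m >= high):
            # Block vote: every sample in the block gets the block's label.
            predictions.append("female" if m >= high else "male")
        else:
            # Fallback: per-sample threshold.
            predictions.append("female" if score >= threshold else "male")
    return predictions

# Toy data: one block of ten samples with a single outlier, one block of two.
scores = [0.95] * 9 + [0.5] + [0.85, 0.5]
ids = [0] * 10 + [1, 1]
print(block_vote(scores, ids))
```

In the toy data, the outlier with score 0.5 in the large block is overruled by the block median, while the two samples of the small block are classified individually.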
Results & Implementation

              Score       Place
Final Result  0.84067348  7

Full competition source code: https://github.com/ibayer/pakdd2015_competition
Factorization Machine implementation: https://github.com/ibayer/fastfm