Constrained Clustering of Territories in the Context of Car Insurance

1 Constrained Clustering of Territories in the Context of Car Insurance
Samuel Perreault and Jean-Philippe Le Cavalier, Laval University, July 2014
Perreault & Le Cavalier (ULaval) Constrained Clustering July / 31

2 Introduction / Motivation
Context: car insurance in Ontario.
Important factor used: location of the residence.
"One of the concerns from a public policy perspective is that if a territory is based on a small geographical area, even though densely populated, socio-economic factors may be influencing loss costs. In addition, drivers may operate their vehicles all over the city, so narrowly defined territories may not be logical." (FSCO Auto Bulletin No. A-01/05)
New regulations imposed by the Financial Services Commission of Ontario:
1. a territory must have a minimal exposure of 7,500 cars over a three-year span;
2. there must be no more than 55 territories in Ontario, of which no more than 10 may cover part of Greater Toronto;
3. all territories must be contiguous.
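The three regulatory constraints can be checked mechanically for any candidate map. The following is a minimal sketch, not the authors' code: the function names, the adjacency-dictionary representation, and the toy exposures are all assumptions made for illustration. Contiguity is tested with a breadth-first search restricted to the members of a group.

```python
from collections import deque

def is_contiguous(members, adjacency):
    """Return True if the territories in `members` form one connected block."""
    members = set(members)
    if not members:
        return False
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for n in adjacency[t]:
            if n in members and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == members

def check_constraints(groups, exposure, adjacency, toronto,
                      min_exposure=7500, max_groups=55, max_toronto=10):
    """List the violated constraints for a candidate map (empty list = feasible)."""
    problems = []
    if len(groups) > max_groups:
        problems.append("too many territories")
    # A group counts toward the Toronto limit if it covers any Toronto territory.
    n_toronto = sum(1 for g in groups if any(t in toronto for t in g))
    if n_toronto > max_toronto:
        problems.append("too many territories in Toronto")
    for k, g in enumerate(groups):
        if sum(exposure[t] for t in g) < min_exposure:
            problems.append(f"group {k}: exposure below minimum")
        if not is_contiguous(g, adjacency):
            problems.append(f"group {k}: not contiguous")
    return problems

# Toy map (hypothetical): a chain a - b - c - d with made-up exposures.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
exposure = {"a": 5000, "b": 4000, "c": 3000, "d": 6000}
feasible = check_constraints([["a", "b"], ["c", "d"]], exposure, adjacency, toronto={"a"})
broken = check_constraints([["a", "c"], ["b", "d"]], exposure, adjacency, toronto={"a"})
print(feasible)
print(broken)
```

In the second toy map both groups meet the exposure minimum but neither is contiguous, which is the situation the third constraint rules out.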

3 Introduction / Motivation
The challenge: group the initial territories in such a way that the resulting territories are as homogeneous as possible.
The goal is to minimize the intra-group variation
$$D := \sum_i w_i (\lambda_i - y_i)^2,$$
where
$w_i$ is the exposure associated with territory i;
$\lambda_i$ is the initial premium associated with territory i;
$y_i$ is the adjusted (final) premium associated with territory i.
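As a small worked example of the criterion D, the sketch below computes the exposure-weighted sum of squared differences between initial and adjusted premiums. It assumes that the adjusted premium of a territory is the exposure-weighted mean of the initial premiums in its group; the slides do not state this explicitly, so treat it as an assumption. The function names and the toy numbers are illustrative only.

```python
def weighted_mean(values, weights):
    """Exposure-weighted mean; assumes the total weight is positive."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def deviance(groups, premium, exposure):
    """D = sum_i w_i * (lambda_i - y_i)^2, with y_i taken as the group mean (assumption)."""
    total = 0.0
    for g in groups:
        y = weighted_mean([premium[t] for t in g], [exposure[t] for t in g])
        total += sum(exposure[t] * (premium[t] - y) ** 2 for t in g)
    return total

# Made-up premiums and exposures for four territories.
premium = {"a": 100.0, "b": 120.0, "c": 200.0, "d": 220.0}
exposure = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 2.0}

d_similar = deviance([["a", "b"], ["c", "d"]], premium, exposure)
d_mixed = deviance([["a", "c"], ["b", "d"]], premium, exposure)
print(d_similar, d_mixed)
```

Grouping territories with similar premiums gives a much smaller D than mixing cheap and expensive territories, which is the sense in which a low D means homogeneous groups.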

4 Section 2: A typical k-means algorithm

5 A typical k-means algorithm
Iterative steps [Steinley, 2006], for K groups:
1. Initialization: formation of the initial groups to which the algorithm will be applied repeatedly. Many methods exist:
   - randomly select K points to be the initial centroids;
   - randomly assign each point to one of the K groups (Forgy method);
   - use a hierarchical method to form K groups.
2. The centroid (mean) of each group is computed.
3. Each point is compared to every centroid and moved to the group whose centroid is closest (optimization).
4. Steps 2 and 3 are repeated until no point can be moved between groups.
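The iterative steps above can be sketched for one-dimensional data as follows. This is an illustrative implementation, not the authors' code: it takes the initial centroids as an argument (corresponding to the first initialization method), and the function name and data are made up.

```python
def kmeans_1d(points, init_centroids, max_iter=100):
    """Plain k-means on numbers; stops when no point changes group."""
    centroids = list(init_centroids)
    k = len(centroids)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 3: move each point to the group with the closest centroid.
        new_labels = [min(range(k), key=lambda j: abs(p - centroids[j]))
                      for p in points]
        if new_labels == labels:      # Step 4: nothing moved, so stop.
            break
        labels = new_labels
        # Step 2: recompute the mean of each non-empty group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]

good = kmeans_1d(points, [1.0, 11.0, 21.0])
poor = kmeans_1d(points, [1.0, 2.0, 3.0])
print(good)
print(poor)
```

The second call starts from three centroids in the same cluster and stops at a partition that is stable but clearly worse. This sensitivity to initialization is one reason why such algorithms are usually run from many starting points.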

6-11 A typical k-means algorithm: basic example
[Figure sequence omitted: points coloured by group over successive iterations. Group means shown on the slides are listed below; values lost in extraction are marked as not recovered.]
Slide 7: Red mean = 14.1, Green mean = 15.4, Blue mean = (not recovered)
Slide 8: Red mean = 7, Green mean = 15.5, Blue mean = (not recovered)
Slide 9: Red mean = 5.8, Green mean = (not recovered), Blue mean = (not recovered)
Slide 10: Red mean = 4.5, Green mean = 14.1, Blue mean = (not recovered)
Slide 11: Red mean = 2.67, Green mean = (not recovered), Blue mean = (not recovered)

12 Section 3: The modified k-means algorithm

13 The modified k-means algorithm
Modification 1: initialization
Problem: the contiguity constraint rules out all of the initialization methods presented.
Modification:
1. Choose (randomly or not) some territory to be the "center" (as opposed to centroid) of a group.
2. Verify that the minimal exposure constraint is satisfied. If it is, go to step 3; if not, randomly choose an unassigned neighbor of this territory/group, merge it, and repeat step 2.
3. Repeat steps 1 and 2 until the K centers have been chosen.
4. Iteratively assign each unassigned territory adjacent to a group to that group. If an unassigned territory is adjacent to more than one group, assign it randomly to one of them.
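The initialization described above can be sketched on an adjacency graph as follows. This is a hedged illustration under several assumptions: territories are keys of an adjacency dictionary, centers are always chosen at random, the expansion step assigns one territory at a time in random order, and the function raises an error when a seed group cannot reach the minimal exposure (the slides do not say how that case is handled). All names and the ring-shaped toy map are hypothetical.

```python
import random
from collections import deque

def is_contiguous(members, adjacency):
    """Breadth-first check that `members` form one connected block."""
    members = set(members)
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for n in adjacency[t]:
            if n in members and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == members

def initialize(adjacency, exposure, k, min_exposure, seed=0):
    """Return {territory: group index} built by seeding and then expanding."""
    rng = random.Random(seed)
    group_of = {}
    for g in range(k):
        # Step 1: pick an unassigned territory as the center of group g.
        free = sorted(t for t in adjacency if t not in group_of)
        if not free:
            raise ValueError("ran out of territories before choosing k centers")
        center = rng.choice(free)
        group_of[center] = g
        total = exposure[center]
        # Step 2: absorb random unassigned neighbors until exposure suffices.
        while total < min_exposure:
            frontier = sorted({n for t, gg in group_of.items() if gg == g
                               for n in adjacency[t] if n not in group_of})
            if not frontier:
                raise ValueError("seed group cannot reach the minimal exposure")
            pick = rng.choice(frontier)
            group_of[pick] = g
            total += exposure[pick]
    # Step 4: expansion; ties between adjacent groups are broken at random.
    while len(group_of) < len(adjacency):
        candidates = sorted(t for t in adjacency if t not in group_of
                            and any(n in group_of for n in adjacency[t]))
        if not candidates:
            raise ValueError("map is not connected")
        t = rng.choice(candidates)
        neighbours = sorted({group_of[n] for n in adjacency[t] if n in group_of})
        group_of[t] = rng.choice(neighbours)
    return group_of

# Toy map: 8 territories arranged in a ring, one unit of exposure each.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
unit_exposure = {i: 1 for i in ring}
print(initialize(ring, unit_exposure, k=2, min_exposure=2, seed=1))
```

Because every territory joins a group it is adjacent to, each group stays contiguous by construction.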

14-34 The modified k-means algorithm: Example 1
[Figure sequence omitted: initialization by random selection (slides 14-28), followed by initialization by expansion (slides 29-34).]
Slide 34: D = 510,771,107

35-38 The modified k-means algorithm: Example 2
[Figure sequence omitted.]
Slide 38: D = 532,487,853

39 The modified k-means algorithm
Modification 2: reassignment (optimization)
Problem: during the reassignment step of the k-means algorithm, a territory is placed in the group with the closest centroid. In the present context, such a reassignment would often violate the contiguity constraint.
Modification: the only "moveable" territories are the ones on the borders. Moreover, the groups to which such a territory can belong are restricted to the groups adjacent to it and, of course, the group to which it currently belongs.
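One sweep of this restricted reassignment might look like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: a border territory moves to the adjacent group whose exposure-weighted mean premium is closest, and the move is accepted only if the group it leaves stays non-empty, contiguous, and above the minimal exposure. Function names and the toy chain of territories are hypothetical, and exposures are assumed positive.

```python
from collections import deque

def is_contiguous(members, adjacency):
    """Breadth-first check that `members` form one connected block."""
    members = set(members)
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for n in adjacency[t]:
            if n in members and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == members

def group_mean(members, premium, exposure):
    """Exposure-weighted mean premium of a group (the group 'centroid')."""
    w = sum(exposure[t] for t in members)
    return sum(exposure[t] * premium[t] for t in members) / w

def reassign_once(group_of, adjacency, premium, exposure, min_exposure):
    """One sweep over all territories; returns the number of moves made."""
    moves = 0
    for t in sorted(group_of):
        home = group_of[t]
        options = {group_of[n] for n in adjacency[t]} | {home}
        if len(options) == 1:
            continue                  # interior territory: not moveable
        members = {g: [u for u in group_of if group_of[u] == g] for g in options}
        best = min(sorted(options),
                   key=lambda g: abs(premium[t] - group_mean(members[g], premium, exposure)))
        if best == home:
            continue
        rest = [u for u in members[home] if u != t]
        # Accept the move only if the donor group remains valid.
        if (rest and is_contiguous(rest, adjacency)
                and sum(exposure[u] for u in rest) >= min_exposure):
            group_of[t] = best
            moves += 1
    return moves

# Toy chain a - b - c - d - e - f with made-up premiums.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}
premium = {"a": 100, "b": 105, "c": 110, "d": 190, "e": 200, "f": 210}
exposure = {t: 1 for t in chain}
group_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}

first = reassign_once(group_of, chain, premium, exposure, min_exposure=2)
second = reassign_once(group_of, chain, premium, exposure, min_exposure=2)
print(first, second, group_of)
```

In the toy chain, territory c starts in the expensive group, moves to the cheap group on the first sweep, and the second sweep makes no further move.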

40-78 The modified k-means algorithm: Optimization
[Figure sequence omitted: successive reassignments of border territories.]
Slide 78: D = 69,066,748

79-81 The modified k-means algorithm: Example 2 (cont'd)
[Figure sequence omitted.]
Slide 81: D = 108,263,053

82 Section 4: Some other considerations and an additional step

83 Some other considerations and an additional step: Complications
Complication due to the minimal exposure constraint: sometimes a group shrinks until it cannot "give" any more territories, because switching a territory would violate the minimal exposure constraint. It may be advantageous to merge this (potentially negligible) group with a neighbor and add a group somewhere else.
Similarity between adjacent groups: sometimes the map is stable and two adjacent groups are quite similar, while some other groups are very heterogeneous, i.e. formed by territories that differ from one another. In that case, it would be advantageous to merge the two similar groups and split the heterogeneous one.

84-87 Some other considerations and an additional step
[Figures omitted: complication due to the minimal exposure constraint (slides 84-85); similarity between adjacent groups (slides 86-87).]

88-89 Some other considerations and an additional step
Additional step: resolving problems with small groups
1. Define a counter $c \leftarrow 0$ and let $c_{max}$ be the maximum number of consecutive iterations allowed without any change in the map;
2. Save the map in its current form together with its total deviance, say $D_0$;
3. If $c = c_{max}$, stop the algorithm; if not, choose an exposure threshold, say $e_s$, determining which groups are considered small;
4. Merge each small group, that is, each group with an exposure less than $e_s$, with its most similar neighbor;
5. Apply the modified k-means algorithm to a single group, restricting the database to that group and producing only two groups within it; do the same for each of the remaining groups and make the most profitable move; optimize the map; repeat until there are k groups;
6. Compute the new deviance, say $D_1$; if $D_1 < D_0$, let $c \leftarrow 0$ and go to step 2; otherwise, let $c \leftarrow c + 1$, replace the current map by the one saved in step 2 and go to step 3.
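The control flow of steps 1 to 6 can be sketched as a loop that keeps the best map seen so far and stops after c_max consecutive failures. This is only a skeleton: the merging, splitting, threshold choice, and deviance computation are passed in as callables, and their names and signatures are hypothetical placeholders, not functions from the slides.

```python
import copy

def additional_step(current_map, deviance, merge_small, split_until_k,
                    choose_threshold, k, c_max):
    """Accept a rebuilt map only when it lowers the deviance."""
    c = 0                                           # step 1
    saved = copy.deepcopy(current_map)              # step 2
    d0 = deviance(saved)
    while c < c_max:                                # step 3
        e_s = choose_threshold()
        candidate = merge_small(copy.deepcopy(saved), e_s)   # step 4
        candidate = split_until_k(candidate, k)              # step 5
        d1 = deviance(candidate)                             # step 6
        if d1 < d0:
            saved, d0, c = candidate, d1, 0         # improvement: save and reset
        else:
            c += 1                                  # failure: keep the saved map
    return saved, d0

# Stub demonstration: a "map" is just a number equal to its own deviance,
# and each rebuild returns the next value from a made-up sequence.
outcomes = iter([59, 59, 47, 58, 45, 50, 50, 50])
best_map, best_d = additional_step(
    current_map=69,
    deviance=lambda m: m,
    merge_small=lambda m, e_s: m,
    split_until_k=lambda m, k: next(outcomes),
    choose_threshold=lambda: 10_000,
    k=55,
    c_max=3,
)
print(best_map, best_d)
```

With the stub sequence, the loop accepts 59, 47 and 45, rejects the other candidates, and stops after three consecutive rejections.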

90-135 Some other considerations and an additional step: Additional step
[Figure sequence omitted: maps at successive stages of the additional step. The threshold e_s, the number of groups available, and the deviance D shown on the slides are summarized below.]
Slides 91-94: e_s = 10,408; # of groups available: 0, 1, 0, 0; D = 69,066,748 at the start and 59,634,652 at the end.
Slide 95: e_s = 11,177; # of groups available: 0; D = 59,634,652.
Slides 96-99: e_s = 17,203; # of groups available: 0, 1, 0, 0; D = 59,634,652 at the start and at the end.
Slides 100-114: e_s = 25,985; # of groups available rises from 0 to 3 and falls back to 0; D = 59,634,652 at the start and 47,877,278 at the end.
Slides 115-133: e_s = 24,692; # of groups available rises from 0 to 4 and falls back to 0; D = 47,877,278 at the start and 58,198,865 at the end.
Slide 134: e_s = 21,284; # of groups available: 0; D = 47,877,278.
Slide 135: e_s = 16,726; # of groups available: 0; D = 45,342,835.

136 Some other considerations and an additional step: Potential improvement
One could merge adjacent territories that are similar.
Partly from [MacQueen, 1967]: part of a "k-means algorithm" allowing k to vary. The idea is to compare the means. Let $c_{min}$ be the maximal distance between two groups for them to be considered similar. Two groups are merged if the distance between their means is less than $c_{min}$, implying that k diminishes.
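A minimal sketch of this merging rule is given below. It assumes that each group is summarized by its mean and its exposure, that only groups sharing a border may be merged, that the closest qualifying pair is merged first, and that the merged group takes the exposure-weighted mean; these details are assumptions for illustration, as the slide only states the threshold rule. All names and numbers are hypothetical.

```python
def merge_similar_groups(stats, borders, c_min):
    """Merge adjacent groups whose means differ by less than c_min.

    stats:   {group: (mean, exposure)}
    borders: set of frozenset({g, h}) for each pair of adjacent groups
    """
    stats = dict(stats)
    borders = set(borders)
    while True:
        close = [(abs(stats[g][0] - stats[h][0]), g, h)
                 for g, h in (sorted(p) for p in borders)
                 if abs(stats[g][0] - stats[h][0]) < c_min]
        if not close:
            return stats, borders
        _, g, h = min(close)                      # closest pair first
        (mg, wg), (mh, wh) = stats[g], stats.pop(h)
        stats[g] = ((mg * wg + mh * wh) / (wg + wh), wg + wh)
        # Borders of h now belong to g; the g-h border itself disappears.
        borders = {frozenset(g if x == h else x for x in p) for p in borders}
        borders = {p for p in borders if len(p) == 2}

stats = {"A": (100.0, 10.0), "B": (104.0, 30.0), "C": (150.0, 20.0)}
borders = {frozenset({"A", "B"}), frozenset({"B", "C"})}
merged_stats, merged_borders = merge_similar_groups(stats, borders, c_min=10.0)
print(merged_stats, merged_borders)
```

Here groups A and B are merged because their means differ by 4, while C stays separate, so k drops from 3 to 2.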

137 Section 5: Dealing with the constraint specific to Toronto

138 Dealing with the constraint specific to Toronto
Separate application of the algorithm + final optimization
Separate application of the algorithm: in order to deal with the restriction on the Toronto area (10 final groups), the database is split into two disjoint databases.
Final optimization on the entire database, with an additional condition: no new group can enter the Toronto area. A group already in Toronto is allowed to expand outside of it.
Alternative: the algorithm could have been constructed so that no separate application was necessary. (Worth it?)

139 Section 6: Application and results

140 Application and results: The database
Ontario: 524 territories, of which 102 are in the Toronto area.
Minimum exposure: 0
Maximum exposure: 23,736
Reminder of the constraints:
Maximum number of territories allowed: 55
Maximum number of territories allowed in Toronto: 10
Minimum exposure required in (final) territories: 7,500
Note: exposures were collected over a three-year span.

141 Application and results: Results
[Table partly lost in extraction.]
Columns: Without the add. step (Toronto, Else, Total); With the add. step (Toronto, Else, Total).
Num. sim.: 30,000 30,000-2, (remaining values not recovered)
Min. dev.: (values not recovered)
Note: the deviances are given in millions.

142 Section 7: Conclusion

143 Conclusion: The method
The goal: get the lowest intra-group variation attainable.
Basis: the k-means algorithm, which is designed to minimize intra-group variation.
Modifications:
- Initialization
- Reassignment
- Additional step

144 Conclusion: Disadvantage and solution
Disadvantage: the method is very lengthy.
- Performing one simulation is time-consuming (and the database could be bigger!).
- Lots of simulations are needed (really?).
Solution proposed: the second additional step presented could be the solution. It could
- reduce the number of simulations needed;
- improve the result.

145-146 References
Forgy, E. W. (1965). Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics, 21.
Lance, G. N. and Williams, W. T. (1967). A general theory of classificatory sorting strategies II. Clustering systems. The Computer Journal, 10(3).
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2).
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, page 14. California, USA.
Steinley, D. (2006). K-means clustering: a half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59(1):1-34.


More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda Clustering 1/46 Agenda Introduction Distance K-nearest neighbors Hierarchical clustering Quick reference 2/46 1 Introduction It seems logical that in a new situation we should act in a similar way as in

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set

W6.B.1. FAQs CS535 BIG DATA W6.B.3. 4. If the distance of the point is additionally less than the tight distance T 2, remove it from the original set http://wwwcscolostateedu/~cs535 W6B W6B2 CS535 BIG DAA FAQs Please prepare for the last minute rush Store your output files safely Partial score will be given for the output from less than 50GB input Computer

More information

Cluster Analysis using R

Cluster Analysis using R Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other

More information

Statistical Databases and Registers with some datamining

Statistical Databases and Registers with some datamining Unsupervised learning - Statistical Databases and Registers with some datamining a course in Survey Methodology and O cial Statistics Pages in the book: 501-528 Department of Statistics Stockholm University

More information
