# Hierarchical Clustering on Principal Components (HCPC)

1 Hierarchical Clustering on Principal Components (HCPC). Guillaume LE RAY and Quentin MOLTO, students of AGROCAMPUS OUEST majoring in applied statistics. [Figure: dendrogram of European cities cut into three clusters; vertical axis: height.]

2 Context. R is free, open-source software for statistics (1875 packages). FactoMineR is an R package, developed at Agrocampus Ouest, dedicated to factorial analysis. The aim is to create a tool complementary to this package, dedicated to clustering, especially after a factorial analysis. It offers a wide range of choices and uses, results, and graphical representations.

3 Clustering and factorial analysis. Factorial analysis and hierarchical clustering are highly complementary tools for exploring data. Removing the last factors of a factorial analysis removes noise and makes the clustering more robust. Reference: Escofier and Pagès, Analyses factorielles simples et multiples, 4th edition.

4 Program structure. Factorial analysis (PCA, MCA, MFA) → clustering (Ward, Euclidean) → partition → K-means.

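5 Statistical methods (1). Hierarchical clustering uses the function agnes with the Euclidean distance and the Ward criterion, $\Delta(i,j) = d^2(i,j)\,\dfrac{m_i m_j}{m_i + m_j}$, where $m_i$ and $m_j$ are the weights of clusters $i$ and $j$. The suggested level to cut the tree is based on the intra-cluster inertia: partitions are compared with $Q = \dfrac{I_{n+1} - I_n}{I_{n+1}}$, where $I_n$ is the intra-cluster inertia of the partition into $n$ clusters. The maximum number of clusters considered is the number of individuals divided by 2. [Figure: inertia plotted against the number of clusters.]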
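As an illustration of the Ward criterion on this slide, here is a minimal Python sketch. It is not the agnes or FactoMineR implementation. It assumes that every individual has weight 1, so that a cluster's weight is its size, and that d²(i,j) is the squared Euclidean distance between the cluster centers. The function names (`centroid`, `inertia`, `ward`) and the data are hypothetical. The sketch also checks a standard property: the criterion equals the increase in intra-cluster inertia caused by merging the two clusters.

```python
# Sketch of the Ward criterion: d^2(i,j) * (m_i * m_j) / (m_i + m_j).
# Assumptions (not from the slides): unit weight per individual,
# d^2 = squared Euclidean distance between cluster centers.

def centroid(points):
    """Center of gravity of a list of equal-weight points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def inertia(points):
    """Sum of squared distances from each point to the cluster center."""
    g = centroid(points)
    return sum(sq_dist(p, g) for p in points)

def ward(cluster_i, cluster_j):
    """Ward criterion between two clusters given as lists of points."""
    m_i, m_j = len(cluster_i), len(cluster_j)
    d2 = sq_dist(centroid(cluster_i), centroid(cluster_j))
    return d2 * (m_i * m_j) / (m_i + m_j)

# Hypothetical data: two small clusters in the plane.
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[5.0, 4.0], [7.0, 4.0], [6.0, 7.0]]

delta = ward(a, b)
increase = inertia(a + b) - inertia(a) - inertia(b)
print(delta, increase)
```

At each step of the hierarchy, the two clusters with the smallest criterion are merged, which is why the tree heights can be read as inertia gains.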

6 Statistical methods (2). Cutting the tree gives a non-optimal partition, which is then improved by K-means started from the cluster centers.
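The consolidation step can be sketched as follows in Python. This is a plain K-means loop started from the centers of an existing partition, written under assumptions that are not in the slides: equal weights, squared Euclidean distance, and a hypothetical function name `consolidate` with a hypothetical `max_iter` parameter. It is not the code used by the package.

```python
# Sketch of K-means consolidation started from the centers of the
# partition obtained by cutting the tree (assumed equal weights).

def centroid(points):
    """Center of gravity of a list of equal-weight points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def consolidate(points, labels, max_iter=10):
    """Reassign points to the nearest center until the partition is stable."""
    clusters = sorted(set(labels))
    # Initial centers come from the tree cut, not from random seeds.
    centers = {
        c: centroid([p for p, l in zip(points, labels) if l == c])
        for c in clusters
    }
    labels = list(labels)
    for _ in range(max_iter):
        new_labels = [
            min(clusters, key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no individual changed cluster
        labels = new_labels
        for c in clusters:
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return labels

# Hypothetical one-dimensional example: the point 4.0 starts in the
# second cluster although it is closer to the center of the first.
points = [[0.0], [1.0], [2.0], [4.0], [9.0], [10.0]]
tree_cut = [0, 0, 0, 1, 1, 1]
result = consolidate(points, tree_cut)
print(result)
```

Starting from the tree cut rather than from random centers keeps the consolidated partition close to the hierarchy while reducing the intra-cluster inertia.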

8 Statistical methods (3): cluster description. Description by individuals: use real individuals to characterise the clusters. Description by variables: give the list of variables typical of each cluster. Description by axes: as in factorial analysis.
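One possible reading of the description by individuals is to pick, for each cluster, the real individuals closest to the cluster center. The Python sketch below illustrates that reading only. The selection rule, the function name `closest_to_center`, the parameter `n`, and the data are assumptions for illustration, not the package's documented behaviour.

```python
# Sketch: characterise each cluster by its n individuals closest to
# the cluster center (an assumed selection rule, equal weights).

def centroid(points):
    """Center of gravity of a list of equal-weight points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest_to_center(points, labels, names, n=2):
    """Return, per cluster, the names of the n points nearest its center."""
    result = {}
    for c in sorted(set(labels)):
        members = [
            (name, p)
            for name, p, l in zip(names, points, labels)
            if l == c
        ]
        g = centroid([p for _, p in members])
        ranked = sorted(members, key=lambda m: sq_dist(m[1], g))
        result[c] = [name for name, _ in ranked[:n]]
    return result

# Hypothetical individuals and coordinates on two components.
names = ["a", "b", "c", "d", "e", "f"]
points = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0],
          [10.0, 10.0], [10.0, 12.0], [10.0, 17.0]]
labels = [0, 0, 0, 1, 1, 1]

typical = closest_to_center(points, labels, names, n=2)
print(typical)
```

Using real individuals rather than the center itself makes a cluster easier to interpret, since the center is an average that may not correspond to any observed individual.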

9 Dataset presentation

10 Factorial analysis. [Figures: variables factor map showing the twelve months, and individuals factor map showing the cities; axes: Dim 1 (82.9%), Dim 2 (15.4%).]

11 Clustering. Click to cut the tree. Options: the inertia gain indicates the suggested cutting level; sort the individuals as on the first component. [Figure: dendrogram of the cities.]

12 Clustering. Colored rectangles are drawn around the clusters (function rect). The same color is kept for each cluster in the following graphs. Options: cut the tree automatically at the suggested level, or cut at the level giving a chosen number of clusters. [Figure: dendrogram of the cities with a rectangle around each cluster.]

13 Factor map and clusters. Options: draw other axes; remove the names or the centers. [Figure: factor map of the cities colored by cluster (clusters 1 to 3); axes: Dim 1 (82.9%), Dim 2 (15.4%).]

14 Factor map, clusters, and tree. Options: draw only a part of the tree; draw other axes; remove the names; change the height. [Figure: tree drawn above the factor map of the cities, colored by cluster (clusters 1 to 3); axes: Dim 1 (82.9%), height.]

15 Cluster description (1): by individuals. Option: the number of individuals for each cluster (here 2). [Figure: factor map of the cities colored by cluster (clusters 1 to 3); axes: Dim 1 (82.9%), Dim 2 (15.4%).]

17 Cluster description (2): by individuals. Option: the number of individuals for each cluster (here 2). [Figure: factor map of the cities colored by cluster (clusters 1 to 3); axes: Dim 1 (82.9%), Dim 2 (15.4%).]

18 Cluster description (3): by variables. This is the output of catdes: it describes each cluster by the variables (the mean in the category and the v.test). Option: the p-value threshold (here 0.05).
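To make the v.test more concrete, the Python sketch below uses one common definition of the test value for a quantitative variable: the difference between the cluster mean and the overall mean, divided by the standard deviation of the cluster mean under random sampling without replacement. The use of the variance with denominator N, and the function name `v_test`, are assumptions. Whether catdes uses exactly this estimator should be checked against the package documentation.

```python
# Sketch of a test value (v.test) for a quantitative variable:
#   v = (mean_q - mean) / sqrt( s^2 / n_q * (N - n_q) / (N - 1) )
# Assumption: s^2 is the overall variance computed with denominator N.
from math import sqrt

def v_test(values, labels, cluster):
    """Test value comparing a cluster mean with the overall mean."""
    N = len(values)
    overall_mean = sum(values) / N
    variance = sum((x - overall_mean) ** 2 for x in values) / N
    members = [x for x, l in zip(values, labels) if l == cluster]
    n_q = len(members)
    cluster_mean = sum(members) / n_q
    # Standard deviation of the mean of n_q values drawn without
    # replacement among the N individuals.
    std_error = sqrt(variance / n_q * (N - n_q) / (N - 1))
    return (cluster_mean - overall_mean) / std_error

# Hypothetical variable measured on six individuals in two clusters.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]

v_low = v_test(values, labels, 0)
v_high = v_test(values, labels, 1)
print(v_low, v_high)
```

Under this definition, an absolute value above roughly 1.96 corresponds to a p-value below 0.05 in a normal approximation, which matches the threshold quoted on the slide. A positive value indicates a cluster mean above the overall mean.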

19 Cluster description (3): by axes. This is the output of catdes: it describes each cluster by the axes (the mean in the category and the v.test). Option: the p-value threshold (here 0.05).

20 Conclusion. This function was presented with a PCA, but it also accepts MCA and MFA results, a quantitative dataset directly (non-scaled PCA), and a continuous variable to divide into categories. [Figure: a normal distribution divided into 3 clusters.]

21 Function plot.catdes. It is a graphical representation of the desc.var results. Option: show only the quantitative variables, only the qualitative variables, or all. [Figure: v.test values of the months and of the dimensions (Dim.1, Dim.3) for clusters 1 to 3.]

### Tutorial on Exploratory Data Analysis

Julie Josse, François Husson, Sébastien Lê (julie.josse at agrocampus-ouest.fr, francois.husson at agrocampus-ouest.fr), Applied Mathematics Department, Agrocampus Ouest.

### Journal of Statistical Software

FactoMineR: An R Package for Multivariate Analysis. Sébastien Lê (Agrocampus Rennes), Julie Josse (Agrocampus Rennes). Journal of Statistical Software, March 2008, Volume 25, Issue 1. http://www.jstatsoft.org/

