Hierarchical Clustering on Principal Components (HCPC)

Similar documents
Technical Report Agrocampus

EuroVelo. The European cycle route network. EuroVelo, a project of the European Cyclists Federation

Gene Regulation, Epigenetics and Genome Stability. InternatIonal PhD Programme. In mainz, germany. Helsinki Tallinn. Riga. Moscow Vilnius Minsk.

Case Igglo. Mikko Ranin

Office Rents map EUROPE, MIDDLE EAST AND AFRICA. Accelerating success.

The History of the EU Speakers Conference. The history of the EU Speakers Conference includes four different stages.

ESIF Financial Instruments

PORTO ANTICO CONGRESS CENTER AQUARIUM PALAZZO DUCALE INTERNATIONAL EXHIBITION

HSE HR Circular 005/ th February 2010.

Indian E-Retail Congress 2013

DHL Global Energy Conference 2015 Outsourcing logistics Enhancing innovation or increasing risk?

Europe s Most Dynamic Cities. City Momentum Index March 2015

Telefónica Global Solutions. Channel Partner Program Expand your limits through our global scale

Guide. Axis Webinar User Guide

Group 1 Group 2 Group 3 Group 4

START OFF AT The MOST STRATEGIC LOCATION

European office market recovery continues but at varying speeds

How To Teach Spain

Opportunities for Action in Industrial Goods. The Price Is Right: Optimizing Industrial Companies Pricing of Services

Tutorial on Exploratory Data Analysis

UK/Europe Global Fares Guide

SWISS FOOTBALL LEAGUE (Änderungen vorbehalten / sous réserve de modification) (Version: )

olivier labarrère aka

The shift in equity shares trading pattern in terms of Number of Trades and Turnover

First half results presentation. October 2006

European office sector recovery continuing Divergence in speed and strength remains

Compared assessment of selected environmental indicators of photovoltaic electricity in OECD cities

Journal of Statistical Software

at the pace of business Leadership development In-house programs available! The Leadership Express Series Ottawa, ON

Maintenance & reliability management. H. Rozelot - Workshop Maintenance & Reliability SOLEIL- November 9 & 10

Discover our combined power

Guide. Axis Webinar. User guide

Opportunities for Action. Achieving Success in Business Process Outsourcing and Offshoring

UK Europe. Global Fares Guide Effective November 2004

Miami KeyCenter Empowering you to do more

business opportunities, Paris excellence

Opportunities for Action in Industrial Goods. Winning by Understanding the Full Customer Experience

Organizing Your Congress in Europe with European Know-How

2015 City RepTrak The World s Most Reputable Cities

EMEA Office MarketView

Freight Forwarders: Thinking Outside the Box

Digital Infrastructure and Economic Development. An Impact Assessment of Facebook s Data Center in Northern Sweden executive summary

What Makes Cities Successful Randstad on the World Stage

Finland Market Potential to Soder Airlines

UK/Europe Global Fares Guide New Zealand Edition

European office sector recovery gains momentum

Oslo, 23 February th quarter and preliminary annual results 2009

The Data Center of the Future: Creating New Jobs in Europe

Opportunities for Action in Consumer Markets. To Spend or Not to Spend: A New Approach to Advertising and Promotions

PRODUCTS AND SERVICES TO MAKE YOUR TRIP MEMORABLE. European Driving Guide

Household Energy Price Index for Europe. February 5th January Prices Just Released

Education at a Glance. OECD Indicators. Annex: UOE Data Collection Sources

Denied Boarding Eligibility

1st Prize th - 10th Prizes 1700х5= nd Prize th - 15th Prizes 1200х5= rd Prize th -20th Prizes 700х5=3500

How CPG manufacturers and retailers can collaborate to create offers that will make a difference. Implications of the Winning with Digital Study

Best Execution Policy for transactions in Financial Instruments

SRM How to maximize vendor value and opportunity

EMEA Rents map RETAIL Accelerating success.

Fact sheet DTZ Fair Value Index TM methodology

Opportunities for Action. Shared Services in Operations and IT: Additional Complexity or Real Synergies?

Interoute Virtual Data Centre. Hands on cloud control.

Human Resources Specialty Practice.

Aviation is crucial for Finland

Helsinki Metropolitan Area Adaptation Strategy Susanna Kankaanpää Helsinki Region Environmental Services Authority

ACADEMIC GUIDE SMSc SN

Professional Clients Mai A. Introduction. Helaba Best Execution Policy

Opportunities for Action in Financial Services. Sales Force Effectiveness: Moving Up the Middle and Managing New Prospects

Thermal performance analysis of two DualSun installa6ons. DualSun Thermal performance study of DualSun installa=ons by Transénergie


Voice & VAS for Smart People

EMEA Office MarketView

Interoute Virtual Data Centre. Hands on cloud control.

Denied Boarding Eligibility

Goodbye Spokesperson, Hello Steward

EXPERIENCE MORE FOR LESS

New inspirations for the Winter season at Nice-Côte d Azur airport

2002 PhD degree in management sciences, Université Louis Pasteur, Strasbourg I Thesis on «Transaction costs and options markets efficiency»

LANDWELL. Solicitors. Life Sciences Unit

seeing the whole picture HAY GROUP JOB EVALUATION MANAGER

Mercer Cost of Living Survey Worldwide Rankings, 2009 (including rental accommodation costs)

The Case for Central London Real Estate. Is the Recent Price Correction a Bubble or Here to Stay?

Vontobel Trading Venues

Clustering. Data Mining. Abraham Otero. Data Mining. Agenda

board solutions litigation support services Executive compensation

Greater than the Sum of its Parts: Professionalizing the Supervisory Board

Opportunities for Action in Operations. Working Capital Productivity: The Overlooked Measure of Business Performance Improvement

Transcription:

Clustering on Principal Components (HCPC) height 0 10 20 30 40 50 60 70 cluster 1 10 Kiev cluster 3 5 Moscow 0 Helsinki Minsk Krakow cluster 2 Oslo Copenhagen Prague Budapest Sarajevo Sofia Madrid Rome Athens Berlin Brussels Paris Lisbon -5 Reykjavik Stockholm Amsterdam London -10 Dublin -15-20 -10 0 10 20 30 LE RAY Guillaume MOLTO Quentin Students of AGROCAMPUS OUEST majored in applied statistics 1

Context R: A free, opensource software for statistics (1875 packages). FactoMineR: a R package, developped in Agrocampus- Ouest, dedicated to factorial analysis. The aim is to create a complementary tool to this package, dedicated to, especially after a factorial analysis. Wide range of choices and uses, results, and graphical representations. 2

Clustering and factorial analysis Factorial analysis and hierarchical are very complementary tools to explore data. Removing the last factors of a factorial analysis remove noise and makes the robuster. Analyses factorielles simples et multiples 4 éme édition, Escofier,Pagès 2008 3

Program structure Factorial analysis Factorial analysis PCA, MCA, MFA Clustering Ward, Euclidean partition K-means 4

Statistic methods (1) Factorial analysis : Function agnes Euclidean distance Ward criterion=d²(i,j)x(mi.mj)/(mi+mj) Suggested level to cut the tree: Intra-cluster inertia Partition comparison: Q=(I n+1 - I n )/I n+1 Max = nb of individuals/2 Inertia 0 2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 Nb of clusters 5

Statistic methods (2) Factorial analysis Non optimal partition K means with the cluster centers 6

Statistic methods (2) Factorial analysis Non optimal partition K means with the cluster centers 7

Statistic methods (3) Clusters description Description by individuals: Use real individuals to caracterise clusters. Factorial analysis Description by variables: Give list of typical variable of clusters. Description by axes: Like in factorial analysis. 8

Dataset presentation 9

Factorial Analysis Dimension 2 (15.4%) -1.0-0.5 0.0 0.5 1.0 June July May August September April October November March February December January Factorial analysis -1.0-0.5 0.0 0.5 1.0 Dimension 1 (82.9%) Dim 2 (15.4%) -4-3 -2-1 0 1 2 3 Moscow Kiev Budapest Minsk Krakow Prague Sofia Helsinki Oslo Sarajevo Stockholm Copenhagen Berlin Paris Amsterdam Brussels London Dublin Reykjavik Madrid Rome Lisbon Athens -5 0 5 Dim 1 (82.9%) 10

0 10 20 30 40 50 60 70 Moscow Minsk Helsinki Oslo Clustering Stockholm Kiev Krakow Reykjavik Copenhagen Click to cut the tree Prague Sarajevo Sofia Berlin Budapest Dublin London Amsterdam Brussels Paris Madrid Rome Lisbon Athens 0 20 40 60 80 Option: inertia gain suggested level of cutting. Sort the individuals as on the first component. Factorial analysis 11

Clustering Factorial analysis 0 10 20 30 40 50 60 70 Moscow Minsk Helsinki Oslo Stockholm Kiev Krakow Reykjavik Copenhagen Prague Sarajevo Sofia Berlin Budapest Dublin London Amsterdam Brussels Paris Madrid Rome Lisbon Colored rectangles are drawn around the clusters. We keep the same color for each cluster in the next graphs (function rect). Athens Options: cut automatically the tree at the suggested level, Cut at level with a choosen number of clusters. 12

Factor map and clusters Factorial analysis -15-10 -5 0 5 10 Dim 2 (15.4%) cluster 1 cluster 2 cluster 3 Moscow Kiev Krakow Sofia Minsk Prague Oslo Berlin Sarajevo Helsinki Stockholm Copenhagen Paris Reykjavik Amsterdam Dublin Budapest Brussels London Madrid Rome Lisbon Athens -20-10 0 10 20 30 Dim 1 (82.9%) Options: Draw other axes, Remove the names, the centers. 13

Factor map, clusters, and tree Factorial analysis height 0 10 20 30 40 50 60 70 cluster 1 cluster 2 cluster 3 Moscow Elsinki Minsk Kiev Oslo Stockholm Krakow Prague Sofia Budapest Berlin Sarajevo Paris Copenhagen London Brussels Amsterdam Reykjavik Dublin Madrid Rome Lisbon -20-10 0 10 20 30 Dim 1 (82.9%) Athens 0-5 -10-15 Options: Draw only a part of the tree, Draw other axes, Remove the names Change the height 5 10 14

Cluster description (1) By individuals factorial analysis Option: the number of individuals for each cluster (here 2) Dim 2 (15.4%) -15-10 -5 0 5 10 cluster 1 cluster 2 cluster 3 Moscow Kiev Minsk Krakow Sofia Prague Sarajevo Helsinki Berlin Oslo Stockholm Copenhagen Paris Amsterdam Brussels London Reykjavik Budapest Dublin Madrid Rome Lisbon Athens -20-10 0 10 20 30 Dim 1 (82.9%) 15

Cluster description (1) By individuals factorial analysis Option: the number of individuals for each cluster (here 2) Dim 2 (15.4%) -15-10 -5 0 5 10 cluster 1 cluster 2 cluster 3 Moscow Kiev Minsk Krakow Sofia Prague Sarajevo Helsinki Berlin Oslo Stockholm Copenhagen Paris Amsterdam Brussels London Reykjavik Budapest Dublin Madrid Rome Lisbon Athens -20-10 0 10 20 30 Dim 1 (82.9%) 16

Cluster description (2) By individuals factorial analysis Option: the number of individuals for each cluster (here 2) Dim 2 (15.4%) -15-10 -5 0 5 10 cluster 1 cluster 2 cluster 3 Moscow Kiev Minsk Krakow Sofia Prague Sarajevo Helsinki Berlin Oslo Stockholm Copenhagen Paris Amsterdam Brussels London Reykjavik Budapest Dublin Madrid Rome Lisbon Athens -20-10 0 10 20 30 Dim 1 (82.9%) 17

Cluster description (3) By variables This is the result of a catdes, it describes the different clusters by the variables (the mean in the category, the v.test ) factorial analysis Option: the p.value (here 0.05). 18

Cluster description (3) By axes factorial analysis This is the result of a catdes, it describes the different clusters by the axes (the mean in the category, the v.test ) Option: the p.value (here 0.05). 19

Conclusion This function was presented with a PCA, but it also acepts: MCA and MFA results, directly a quantitative dataset (nonscaled PCA), a continuous variables to divide into modalities. A normal distribution divided in 3 clusters 20

Function plot.catdes cluster 1 cluster 3 cluster 1 cluster 2 cluster 3 v.test -4-2 0 2 4 Mars Février Novembre Octobre Décembre Janvier Avril Septembre Août Mai Juillet Juin v.test -4-2 0 2 4 Mai Janvier Décembre Juin Février Mars Avril Juillet Novembre Août Octobre Septembre v.test -4-2 0 2 4 Dim.1 v.test -4-2 0 2 4 Dim.3 v.test -4-2 0 2 4 Dim.1 It is a graphical representation of the desc.var results Option: show only the quantitative, qualitative variables or all 21