IHEID - IMAS 2008-09 Introduction to data processing Benoît Vulliet Tuesday 10th February 2009 / 3:30-5:30pm Aubert room
Introduction: a reminder of how this course came into being (it was created by Christian Corminboeuf)
This presentation consists of two sections:
1/ Concepts of databases
2/ Presenting information in a diagram
1/ Concepts of databases
A practical exercise: when we analyse the marks achieved by 2,500 students, 16,000 marks in all, we want to:
- find the overall average
- compare the averages for men and women
- break down the results by number of years of experience
How can we do this?
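Once the marks are stored one per row, each of these three questions becomes a simple grouping and averaging step. A minimal Python sketch, assuming a hypothetical handful of records (the names, genders and marks are invented for illustration, not taken from the 2,500-student data set):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, one row per mark: (first_name, gender, years_of_experience, mark).
records = [
    ("Kofi", "M", 6, 4.0),
    ("Kofi", "M", 6, 5.0),
    ("Awa", "F", 3, 5.5),
    ("Awa", "F", 3, 6.0),
    ("Pierre", "M", 2, 3.0),
    ("Pierre", "M", 2, 4.5),
]

def average_by(rows, column):
    """Group the marks by the value in `column` and return the average of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[3])  # index 3 holds the mark
    return {key: mean(marks) for key, marks in groups.items()}

overall = mean(row[3] for row in records)   # average over all marks
by_gender = average_by(records, 1)          # men vs women
by_experience = average_by(records, 2)      # one average per number of years

print(round(overall, 2), by_gender, by_experience)
```

Averaging over marks gives more weight to students with more marks; averaging per student first would give a different overall figure.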
1/ Concepts of databases
We could use a word processor, such as Word, and create a table:
First name: Kofi; Years of experience: 6; Country of origin: Togo; Years of experience: 3; Marks obtained: 4, 5, 6, 5, 5.5, ...
First name: Pierre; Years of experience: 2; Country of origin: Switzerland; Years of experience: 5; Marks obtained: 3, 5, 3, 4, 4.5, etc.
Unusable: there is no structure.
1/ Concepts of databases
We could use a spreadsheet, such as Excel, which has a precise structure; here, however, all of each person's marks are placed in the same cell.
1/ Concepts of databases
We could put one mark per cell: a workable solution, despite the redundant information.
1/ Concepts of databases
We could put one mark per cell: this is a possible solution, despite the empty cells.
What other solutions exist besides Excel?
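The two one-mark-per-cell layouts trade empty cells (one column per mark) against redundancy (presumably one row per mark, with the person's details repeated on each row). A small Python sketch, assuming that reading of the slides; Kofi's marks come from the Word example, while "Awa" and her data are invented for illustration:

```python
# Hypothetical data: Kofi's marks are from the slide example, Awa is invented.
marks = {"Kofi": [4, 5, 6, 5, 5.5], "Awa": [4, 5.5]}
country = {"Kofi": "Togo", "Awa": "Senegal"}

# Layout 1 ("wide"): one row per student, one column per mark.
width = max(len(m) for m in marks.values())
wide = [[name, country[name]] + m + [None] * (width - len(m))
        for name, m in marks.items()]
empty_cells = sum(row.count(None) for row in wide)  # cells left blank

# Layout 2 ("long"): one row per mark; the student's details repeat on every row.
long_rows = [(name, country[name], mark) for name, m in marks.items() for mark in m]
redundant_copies = len(long_rows) - len(marks)  # extra copies of each student's details

print(empty_cells, redundant_copies)
```

Neither layout removes the problem; a database avoids it by splitting students and marks into separate, linked tables.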
1/ Concepts of databases
What does "database" mean? A definition: "One or a series of disk or memory files that enable the permanent or temporary storage of, and access to, structured information."
The following terms are generally used:
- DBMS: Database Management System, the software that manages a database
- SGBD: Système de Gestion de Base de Données (French), Sistema de Gestión de Base de Datos (Spanish)
1/ Concepts of databases
Analysis phase: questions to ask
- How can the data gathered be grouped?
- What links exist between the data items gathered?
- How can two groups of information be linked?
- Which IT tool should be chosen to manage this type of data?
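In a relational database, the answer to "how can two groups of information be linked?" is usually a shared key. A minimal sketch using Python's built-in `sqlite3` module; the table names, column names and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throw-away in-memory database
conn.executescript("""
CREATE TABLE students (
    id               INTEGER PRIMARY KEY,
    first_name       TEXT,
    gender           TEXT,
    years_experience INTEGER
);
-- Each mark points back to its student through student_id (the link between the two groups).
CREATE TABLE marks (
    student_id INTEGER REFERENCES students(id),
    mark       REAL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)",
                 [(1, "Kofi", "M", 6), (2, "Awa", "F", 3)])
conn.executemany("INSERT INTO marks VALUES (?, ?)",
                 [(1, 4.0), (1, 5.0), (1, 6.0), (2, 5.5), (2, 6.0)])

# Following the link (a JOIN) compares men and women without copying student details onto each mark.
rows = conn.execute("""
    SELECT s.gender, AVG(m.mark), COUNT(*)
    FROM marks AS m JOIN students AS s ON s.id = m.student_id
    GROUP BY s.gender
    ORDER BY s.gender
""").fetchall()
print(rows)
```

Each student's details are stored once, however many marks they have. Note that SQLite only enforces the `REFERENCES` constraint when `PRAGMA foreign_keys = ON` is set.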
1/ Concepts of databases
The main DBMS tools available on the market include:
- Access: PCs only (Windows XP, Vista)
- FileMaker: PC and Macintosh
- MySQL: multi-platform, open-source software
- 4th Dimension (4D): PC and Macintosh, arguably the most powerful of these, free for academics
1/ Concepts of databases
Understanding the language of databases
- Information belonging to the same group is gathered in a TABLE (or FILE).
- In a table, each item of data is stored in a FIELD (a COLUMN); each entry, such as one student, forms a RECORD (a ROW).
- Fields can be of various types: ALPHANUMERIC, WHOLE NUMBER (INTEGER), REAL NUMBER, BOOLEAN, IMAGE, TIME, DATE, etc.
- Data are displayed on screen through FORMS (also called LAYOUTS). These can be ENTRY, QUERY, LIST or OUTPUT forms.
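To make the vocabulary concrete: a table is declared as a list of typed fields, and each inserted entry is one record. A minimal sketch with Python's `sqlite3` module; the field names and values are hypothetical, and because SQLite has only five storage classes (NULL, INTEGER, REAL, TEXT, BLOB), booleans, dates and images are mapped onto them, whereas tools such as 4D offer dedicated types:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One TABLE; each column is a FIELD with a declared type (field names are hypothetical).
conn.execute("""
CREATE TABLE students (
    first_name  TEXT,     -- alphanumeric
    years_exp   INTEGER,  -- whole number
    average     REAL,     -- real number
    is_alumnus  INTEGER,  -- boolean, stored as 0 or 1
    photo       BLOB,     -- image, stored as raw bytes
    enrolled_on TEXT      -- date, stored as ISO-8601 text
)
""")
# Each INSERT adds one RECORD (row).
conn.execute("INSERT INTO students VALUES (?, ?, ?, ?, ?, ?)",
             ("Kofi", 6, 4.9, True, b"\x89PNG", "2008-09-15"))

# typeof() reports how SQLite actually stored each value.
types = conn.execute("""
    SELECT typeof(first_name), typeof(years_exp), typeof(average),
           typeof(is_alumnus), typeof(photo), typeof(enrolled_on)
    FROM students
""").fetchone()
print(types)
```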
1/ Concepts of databases
An example created with 4D:
1/ Concepts of databases
Example of an OUTPUT or LIST form (4D):
1/ Concepts of databases
Example of a QUERY or ENTRY form:
1/ Concepts of databases
Example of an analysis (4D):
1/ Concepts of databases
Correlation between the marks obtained and professional experience:
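The slide does not say how the correlation was measured; a common choice is Pearson's coefficient r, which runs from -1 to +1 (values near +1 mean marks rise with experience). A minimal Python sketch, assuming Pearson's r and invented experience/mark pairs:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two sequences of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: years of experience and average mark for five students.
experience = [1, 2, 3, 5, 8]
average_mark = [4.0, 4.5, 4.5, 5.0, 5.5]
r = pearson(experience, average_mark)
print(round(r, 3))
```

A high r shows that the two move together; it does not by itself show that experience causes better marks.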
1/ Concepts of databases
An example of data management with Excel
END OF SECTION ONE
2/ Presenting information in a diagram
Problems with presentation and titles
Where is the IUED? Where is Guangdong? Teacher and jury?
Where? What is the VNU-DDS?
?
Title of the photo? Think about what photocopying will do to your image!
Colour printing didn't help!
Thanks, EQI! (Photos taken by an EQI architect.)
TAKE CARE WHEN PHOTOCOPYING!
TAKE CARE WHEN PHOTOCOPYING! ILLEGIBLE
No comment!
2/ Presenting information in a diagram
LEARN TO BETTER... ANALYSE INFORMATION! SUMMARISE DATA!
IN ORDER TO IMPROVE... DECISION-MAKING! COMMUNICATION!
2/ Presenting information in a diagram
Four stages
1) SORT THE INFORMATION
2) EMPHASISE THE DIFFERENCES
3) USE A SINGLE FRAME OF REFERENCE
4) SIMPLIFY TO MAKE IT EASIER TO SEE
THE TEN COMMANDMENTS!
1) Sort the information
2) Summarise, simplify
3) Put it in context
4) Show differences (by enlarging them)
5) Group together things that belong together
6) Compare what is comparable
7) Have a single frame of reference
8) Densify the information
9) Use a map if necessary
10) Give clear titles
Example 1 (Initiation in data analysis, Jean de Lagarde, Dunod, 1983, pp. 38-39)
Data by NEIGHBOURHOOD: publicity spending (P), number of visits (v), sales volume (V)
A statistician's response:
- Linear regression of sales on visits: V = 0.757 v + 56.04
- Linear regression of publicity on visits: P = -3.214 v + 188.2
- Linear regression of sales on visits and publicity: V = 1.020 v + 0.082 P + 40.67
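Each of these equations is a least-squares fit. A minimal Python sketch of the one-variable case (V = a v + b); the visit and sales figures below are invented for illustration, since the neighbourhood table itself is not reproduced in these slides, so the coefficients will not match the slide's 0.757 and 56.04:

```python
def least_squares(xs, ys):
    """Slope a and intercept b of the line y = a*x + b that minimises the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    b = my - a * mx  # the fitted line passes through the point of means
    return a, b

# Hypothetical visits (v) and sales volumes (V) for four neighbourhoods.
visits = [20, 25, 30, 35]
sales = [70, 75, 78, 82]
a, b = least_squares(visits, sales)
print(f"V = {a:.3f} v + {b:.2f}")
```

The statistician's answer is precise but hard to read at a glance, which is what the following visual steps address.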
STEP 1: VISUALISE THE VALUES
[Figure: publicity spending, sales volume and number of visits]
STEP 2: ADAPT THE SCALE
[Figure: three bar charts for neighbourhoods A to H with adapted scales: publicity spending (axis 75 to 115), sales volume (axis 74 to 82), number of visits (axis 23 to 33)]
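Adapting the scale means letting the axis follow the data range instead of starting at zero. A minimal Python sketch; the 10 % padding rule and the sales figures are hypothetical choices, not taken from the slides:

```python
def axis_range(values, margin=0.1):
    """Axis limits that hug the data instead of starting at zero.

    `margin` (a hypothetical choice) pads each end by a fraction of the data range.
    """
    lo, hi = min(values), max(values)
    pad = (hi - lo) * margin or 1.0  # fall back to 1 unit if all values are equal
    return lo - pad, hi + pad

# Hypothetical sales volumes for neighbourhoods A to H.
sales = [78, 80, 74, 82, 76, 79, 75, 81]
print(axis_range(sales))
```

Because the bars no longer start at zero, the axis values must stay visible so the reader can judge the real size of the differences.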
STEP 3: SORT THE VALUES
[Figure: publicity spending, sales volume and number of visits]
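Sorting only helps if the same order is applied to every chart, so that each neighbourhood keeps the same position across the three series. A minimal Python sketch, assuming the neighbourhoods are ranked by sales volume (the slides do not say which variable drives the order) and using invented figures:

```python
# Hypothetical figures per neighbourhood (the slide's own table is not reproduced).
data = {
    "A": {"publicity": 100, "visits": 30, "sales": 79},
    "B": {"publicity": 90, "visits": 28, "sales": 77},
    "C": {"publicity": 80, "visits": 25, "sales": 75},
    "D": {"publicity": 110, "visits": 33, "sales": 82},
}

# Sort the neighbourhoods once, by sales volume, from highest to lowest...
order = sorted(data, key=lambda name: data[name]["sales"], reverse=True)

# ...then apply that same order to every series so the charts stay aligned.
for name in order:
    row = data[name]
    print(name, row["publicity"], row["visits"], row["sales"])
```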
STEP 4: ENLARGE THE DIFFERENCES
[Figure: neighbourhoods in the order H, A, G, B, E, D, C, F; publicity spending (axis 75 to 115, mean m = 95), number of visits (axis 23 to 33, mean m = 29), sales volume (axis 74 to 84, mean m = 78)]
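Enlarging the differences amounts to drawing each value relative to the mean m and magnifying the gap. A minimal Python sketch with text bars; the sales figures and the magnification factor are hypothetical:

```python
from statistics import mean

# Hypothetical sales volumes per neighbourhood (not the slide's data).
sales = {"A": 79, "B": 77, "C": 75, "D": 82}

m = mean(sales.values())                        # reference line, like "m = 78" on the slide
deviations = {n: v - m for n, v in sales.items()}

# Draw only the deviations, magnified by a scale factor, so small gaps become visible.
SCALE = 2  # hypothetical magnification: characters per unit of deviation
for name, d in sorted(deviations.items(), key=lambda item: item[1], reverse=True):
    bar = ("+" if d > 0 else "-") * round(abs(d) * SCALE)
    print(f"{name} {d:+6.2f} {bar}")
```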
Example of information processing (Initiation in data analysis, Jean de Lagarde, Dunod, 1983, pp. 38-39)
[Figure: data by neighbourhood: publicity spending (P), number of visits (v), sales volume (V)]
Example 2: ENLARGE THE DIFFERENCES
1) UNSORTED RAW DATA (%; chart axis from 0 to 60)
Overall (all cities)   46.4
Reims                  48.1
Amiens                 48.3
Lille                  48.1
Nice                   55.0
Marseille              51.7
Lyon                   48.1
Clermont-Ferrand       48.1
Dijon                  45.8
Poitiers               46.5
Pau                    40.0
Toulouse               47.9
Bordeaux               38.6
Nantes                 45.1
Brest                  41.2
Rouen                  47.7
Le Havre               41.4
2) SORTED RAW DATA (%; chart axis from 0 to 60)
Overall (all cities)   46.4
Nice                   55.0
Marseille              51.7
Amiens                 48.3
Reims                  48.1
Lille                  48.1
Lyon                   48.1
Clermont-Ferrand       48.1
Toulouse               47.9
Rouen                  47.7
Poitiers               46.5
Dijon                  45.8
Nantes                 45.1
Le Havre               41.4
Brest                  41.2
Pau                    40.0
Bordeaux               38.6
3) SORTED DATA WITH A CHANGE OF SCALE AND THE AVERAGE HIGHLIGHTED
[Figure: cities in descending order (Nice, Marseille, Amiens, Reims, Lille, Lyon, Clermont-Ferrand, Toulouse, Rouen, Poitiers, Dijon, Nantes, Le Havre, Brest, Pau, Bordeaux); axis starting at 36.8 %, average of 46.4 % highlighted, maximum 55.0 %]
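The sorting and average-highlighting steps can be reproduced from the city values on these slides. A minimal Python sketch; the values are read off the charts (with the city names spelled in full), and the reference is the slides' overall ("ENSEMBLE") value of 46.4 %, used as given rather than recomputed, since the slides do not say how it was obtained:

```python
# Values in % read from the slides; the reference is the slides' overall ("ENSEMBLE") value.
OVERALL = 46.4
cities = {
    "Reims": 48.1, "Amiens": 48.3, "Lille": 48.1, "Nice": 55.0,
    "Marseille": 51.7, "Lyon": 48.1, "Clermont-Ferrand": 48.1, "Dijon": 45.8,
    "Poitiers": 46.5, "Pau": 40.0, "Toulouse": 47.9, "Bordeaux": 38.6,
    "Nantes": 45.1, "Brest": 41.2, "Rouen": 47.7, "Le Havre": 41.4,
}

# Sort from highest to lowest. Python's sort is stable, so the four cities
# tied at 48.1 keep their original order (Reims, Lille, Lyon, Clermont-Ferrand).
ranked = sorted(cities.items(), key=lambda item: item[1], reverse=True)

# Show each city relative to the overall value instead of relative to zero.
for city, value in ranked:
    side = "above" if value > OVERALL else "at or below"
    print(f"{city:<17} {value:5.1f} %  {value - OVERALL:+5.1f}  {side}")

above = [city for city, value in ranked if value > OVERALL]
print(len(above), "of", len(cities), "cities are above the overall value")
```

The stable sort reproduces the order of the sorted chart exactly, including the tied cities.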
4) A MAP SHOWS THE GEOGRAPHICAL DISTRIBUTION OF THE DATA
Examples of diagram creation
In the Excel document "exemples_graph.xls", see the three examples:
- HOUSING
- LIVESTOCK
- COMPANIES
"A lack of clarity is helpful when the message is poor."
Read for pleasure... see to remember!
END OF SECTION TWO
Thank you for your attention
IHEID - IMAS 2008-09 / 10.02.09 / BV