STATISTICAL LABORATORY, USING R FOR BASIC STATISTICAL ANALYSIS




Manuela Cattelan
Based on material prepared by Prof. Mario Romanazzi.

1 ABC OF R

1.1 ARITHMETIC AND LOGICAL OPERATORS. VARIABLES AND ASSIGNMENT OPERATOR

When performing elementary computations, R works just like a pocket calculator. The arithmetic operators of addition, subtraction, multiplication, division and power are

ADDITION   SUBTRACTION   MULTIPLICATION   DIVISION   POWER
+          -             *                /          ^

When necessary, (round) parentheses can be used to force a given order of the computations.

> 1 + 2 + 3/3
[1] 4
> (1 + 2 + 3)/3
[1] 2
> 2 * (10 - 20)^3
[1] -2000
> (0.2 * 10 + 0.3 * 12 + 0.5 * 7)/2
[1] 4.55

In most cases it is convenient to store constants or intermediate results in the computer memory for later use in the same session. The corresponding areas of the computer memory are identified by names chosen by the user. The assignment operator <- is typed with the < key followed by the - key.

> a <- 100
> b <- a * (5 - 2)/5
> b
[1] 60

The operators for binary comparisons are

LOWER   LOWER OR EQUAL   GREATER   GREATER OR EQUAL   EQUAL   NOT EQUAL
<       <=               >         >=                 ==      !=

The logical operators ! (NOT), & (AND) and | (OR) are used to form complex logical expressions. Within R, the logical constants TRUE and FALSE indicate whether a given logical expression is true or false.

> neg <- -1
> pos <- 10
> neg <= 0
[1] TRUE
> neg * pos < 0
[1] TRUE
> neg < 0 & pos > 0
[1] TRUE
> pos >= 0 | pos < 5
[1] TRUE

1.2 MATHEMATICAL FUNCTIONS

The usual mathematical functions are available. Some common functions are listed below.

FUNCTION                                R NAME
Absolute value                          abs
Square root                             sqrt
Logarithm (natural, base e)             log
Logarithm (base 10)                     log10
Exponential                             exp
Trigonometric: sine, cosine, tangent    sin, cos, tan
Factorial                               factorial
Binomial coefficient                    choose

The following results follow directly from the definitions of the functions.

> sqrt(100)
[1] 10
> log10(10000)
[1] 4
> 10^(log10(10000))
[1] 10000
> sin(pi/2)
[1] 1
> cos(pi/2)
[1] 6.123032e-17
> factorial(5)
[1] 120
> factorial(10)
[1] 3628800
> choose(10, 2)
[1] 45

The value returned by cos(pi/2) is not exactly 0 because pi is stored with finite (floating-point) precision: the result is zero up to rounding error.

1.3 USER DEFINED FUNCTIONS

The user can define new functions. For example, the following code defines two functions that compute the length of the circumference and the area of a circle from the length of its radius.

> circ_l <- function(x) 2 * pi * x
> circ_a <- function(x) pi * x^2
> circ_l(1)
[1] 6.283185
> circ_a(0.5)
[1] 0.7853982

1.4 DATA STRUCTURES: VECTORS

The most important data structure is the vector, an ordered collection of $n \ge 1$ items of the same type (numerical, character, logical). Note that a scalar is simply a numerical vector with one element. There are several ways to define vectors. The most general one is the c function (c stands for concatenate). The length function gives the size (number of components) of a vector.

> itp_age <- c(71, 74, 63, 71, 66, 63, 82, 58, 74, 79, 81)
> itp_name <- c("E. De Nicola", "L. Einaudi", "G. Gronchi", "A. Segni",
+   "G. Saragat", "G. Leone", "S. Pertini", "F. Cossiga", "O. L. Scalfaro",
+   "C. A. Ciampi", "G. Napolitano")
> usp_age <- c(61, 63, 44, 55, 56, 61, 53, 70, 65, 47, 55, 47)
> usp_name <- c("H. S. Truman", "D. D. Eisenhower", "J. F. Kennedy",
+   "L. B. Johnson", "R. Nixon", "G. Ford", "J. Carter", "R. Reagan",
+   "G. Bush", "B. Clinton", "G. W. Bush", "B. Obama")
> length(itp_age)
[1] 11
> length(itp_age) == length(itp_name)
[1] TRUE

Let x denote the vector $(x_1, x_2, \ldots, x_n)$. Special vectors can be defined with the colon operator : ($x_{i+1} - x_i = 1$, $i = 1, \ldots, n-1$), or with the functions seq ($x_{i+1} - x_i = c$, $i = 1, \ldots, n-1$, where $c$ is a constant) and rep ($x_i = c$, $i = 1, \ldots, n$, where $c$ is a constant); seq and rep are abbreviations of sequence and repeat, respectively.

> -2:10
[1] -2 -1 0 1 2 3 4 5 6 7 8 9 10
> -2.5:10
[1] -2.5 -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
> seq(1, 20, 2)
[1] 1 3 5 7 9 11 13 15 17 19
> seq(-5, 5, 0.5)
[1] -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
[16] 2.5 3.0 3.5 4.0 4.5 5.0
> rep(0, 5)
[1] 0 0 0 0 0
> c(-0.5, 1:4, rep(5, 3))
[1] -0.5 1.0 2.0 3.0 4.0 5.0 5.0 5.0

1.5 VECTOR OPERATIONS

Vector components or subvectors are selected either by their positions within the vector or by a property they satisfy. In both cases, the subsetting operator [ ] is used.

> itp_name[1]
[1] "E. De Nicola"
> itp_name[1:3]
[1] "E. De Nicola" "L. Einaudi" "G. Gronchi"
> itp_name[c(1, length(itp_name))]
[1] "E. De Nicola" "G. Napolitano"
> itp_name[itp_age < 60]
[1] "F. Cossiga"
> usp_name[usp_age > 40 & usp_age < 50]
[1] "J. F. Kennedy" "B. Clinton" "B. Obama"

In general, mathematical functions transform vectors component-wise. Moreover, arithmetic operations involving several vectors should use vectors of the same length: otherwise R recycles the shorter vector, issuing a warning only when the longer length is not a multiple of the shorter one. Combining a vector with a scalar is the simplest case of recycling.

> v1 <- -2:5
> v2 <- 6:13
> abs(v1)
[1] 2 1 0 1 2 3 4 5
> v1 + v2
[1] 4 6 8 10 12 14 16 18
> 2 * v1 - v2
[1] -10 -9 -8 -7 -6 -5 -4 -3

> 2 * v1 - 1
[1] -5 -3 -1 1 3 5 7 9

To arrange the vector components from minimum to maximum, use the sort function. The functions min, max, which.min and which.max return the minimum and maximum entries and their positions within the vector.

> sort(itp_age)
[1] 58 63 63 66 71 71 74 74 79 81 82
> c(min(itp_age), max(itp_age))
[1] 58 82
> c(which.min(itp_age), which.max(itp_age))
[1] 8 7
> sort(c("Carla", "Francesco", "Paola", "Matteo", "Maria"))
[1] "Carla" "Francesco" "Maria" "Matteo" "Paola"

2 BASIC STATISTICS

We use the Presidents' age data to illustrate how to produce a statistical report with R. The following table lists some basic functions (both analytical and graphical). Note that they all take the data vector as their basic argument.

STATISTICAL FUNCTION                                   R NAME
Sample size                                            length
Frequency table                                        table
Stem-and-leaf                                          stem
Order statistic                                        sort
Basic location statistics                              summary
Quantile                                               quantile
Median                                                 median
Mean                                                   mean
Variance (unbiased version)                            var
Standard deviation (square root of unbiased variance)  sd
Box-plot                                               boxplot
Histogram                                              hist

The stem-and-leaf display shows the general features of the distribution: range, location, dispersion, shape and possible outliers. It is mainly useful with small sample sizes, as in the present case.

> stem(itp_age, scale = 0.5)

  The decimal point is 1 digit(s) to the right of the |

  5 | 8
  6 | 336
  7 | 11449
  8 | 12

> stem(usp_age)

  The decimal point is 1 digit(s) to the right of the |

  4 | 477
  5 | 3556
  6 | 1135
  7 | 0

When applied to a numerical vector, the summary function gives basic statistics to evaluate location: minimum and maximum values, quartiles and mean. To compute empirical (sample) quantiles, R uses by default an interpolation algorithm that is more refined than the one described in the textbook (intended for hand computation), so with small samples the two sets of results can differ. The range and the interquartile range are easily obtained from these results. Another dispersion statistic is the standard deviation, to be used together with the sample mean.

> summary(itp_age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  58.00   64.50   71.00   71.09   76.50   82.00
> summary(usp_age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  44.00   51.50   55.50   56.42   61.50   70.00
> sd(itp_age)
[1] 7.905119
> sd(usp_age)
[1] 7.925314

A very useful graphical comparison of the two samples is given by paired box-plots. Whereas the stem-and-leaf displays the full order statistic, the box-plot displays only the quartiles and the extreme values.

> boxplot(itp_age, usp_age, horizontal = TRUE, xlab = "Presidents' age (years)",
+   names = c("Italy", "US"), col = "lavender", main = "Italy vs US, 1945-2010")
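The remark above on sample quantiles can be checked directly: the type argument of quantile selects one of nine definitions, and type = 7 is R's default (the one used by summary). The textbook's hand-computation rule is not reproduced in this handout, so types 2 and 6 below are shown only as examples of alternative rules, not as a claim about which one the textbook uses.

```r
# Presidents' ages, as defined in Section 1.4
itp_age <- c(71, 74, 63, 71, 66, 63, 82, 58, 74, 79, 81)
usp_age <- c(61, 63, 44, 55, 56, 61, 53, 70, 65, 47, 55, 47)

# Quartiles under three of R's nine definitions (type = 7 is the default)
probs <- c(0.25, 0.5, 0.75)
types <- c(2, 6, 7)
q_itp <- sapply(types, function(ty) quantile(itp_age, probs, type = ty))
q_usp <- sapply(types, function(ty) quantile(usp_age, probs, type = ty))
colnames(q_itp) <- paste0("type", types)
colnames(q_usp) <- paste0("type", types)
print(q_itp)
print(q_usp)

# Interquartile range with the default definition
iqr_itp <- IQR(itp_age)
iqr_usp <- IQR(usp_age)
c(italy = iqr_itp, us = iqr_usp)
```

With these data, types 2 and 6 give 63 and 79 as the quartiles of itp_age, against 64.5 and 76.5 with the default rule, while the medians coincide.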

[Figure: Italy vs US, 1945-2010. Horizontal box-plots of Presidents' age (years) for Italy and US.]

Here the most important feature is a shift of the Italian distribution to the right of the US one. The statistical tendency is that Italian Presidents are older than US Presidents: the difference between the medians is 71 - 55.5 = 15.5 years. The dispersion does not seem very different (compare, e.g., the standard deviations and the IQRs).

3 EXPLORATION OF DATA WITH FREQUENCY TABLES AND DISTRIBUTIONAL PLOTS

The file regioni.txt contains the required information and can be used to simulate the collection of the data. First, we read the file into R.

> regio <- read.table("http://venus.unive.it/romanaz/statistics/data/regioni.txt",
+   header = TRUE, na.strings = "NA")
> str(regio)
'data.frame':   20 obs. of  7 variables:
 $ name   : Factor w/ 20 levels "Abruzzo","Basilicata",..: 12 19 8 9 17 20 6 5 16 18 ...
 $ area   : Factor w/ 3 levels "C","N","S": 2 2 2 2 2 2 2 2 1 1 ...
 $ capo   : Factor w/ 20 levels "Ancona","Aosta",..: 17 2 8 10 18 20 19 4 7 13 ...
 $ totsurf: int  25399 3263 5421 23861 13607 18379 7844 22123 22997 8456 ...
 $ coast  : int  NA NA 346 NA NA 156 110 130 573 NA ...
 $ res09  : int  4432571 127065 1615064 9742676 1018657 4885548 1230936 4337979 3707818 894222 ...
 $ car    : num  623 2104 467 558 555 ...

Since R 4.0.0, read.table no longer converts text columns to factors by default (stringsAsFactors = FALSE), so with a recent version of R, str reports name, area and capo as chr rather than Factor.

> regio
                    name area            capo totsurf coast   res09    car
1               Piemonte    N          Torino   25399    NA 4432571  623.3
2          Valle d'aosta    N           Aosta    3263    NA  127065 2104.3
3                Liguria    N          Genova    5421   346 1615064  467.3
4              Lombardia    N          Milano   23861    NA 9742676  558.5
5    Trentino-Alto Adige    N          Trento   13607    NA 1018657  555.0
6                 Veneto    N         Venezia   18379   156 4885548  422.6
7  Friuli-Venezia Giulia    N         Trieste    7844   110 1230936  525.9
8         Emilia-Romagna    N         Bologna   22123   130 4337979  534.7
9                Toscana    C         Firenze   22997   573 3707818  541.9
10                Umbria    C         Perugia    8456    NA  894222  689.6
11                Marche    C          Ancona    9694   172 1569578  616.2
12                 Lazio    C            Roma   17208   357 5626710  699.7
13               Abruzzo    S        L'Aquila   10799   124 1334675  696.2
14                Molise    S      Campobasso    4438    34  320795  658.5
15              Campania    S          Napoli   13595   461 5812962  568.1
16                Puglia    S            Bari   19363   830 4079702  560.7
17            Basilicata    S         Potenza    9992    59  590601  702.7
18              Calabria    S Reggio Calabria   15080   710 2008709  609.5
19               Sicilia    S         Palermo   25707  1425 5037799  594.4
20              Sardegna    S        Cagliari   24089  1636 1671001  657.2

The structure is a data table with N = 20 rows (the regions, i.e. the statistical units) and 7 columns (the variables):
1. name, name of the region (identifier)
2. area, geographical area (stratification variable)
3. capo, regional capital (categorical variable)
4. totsurf, total surface area of the region (numerical variable, km²)
5. coast, total coast length of the region (numerical variable, km; NA for regions without a coastline)
6. res09, total number of residents as of 1/1/2009 (numerical variable, count)
7. car, number of cars per 1000 residents (numerical variable)

We start data exploration with the study of the stratification variable, geographical area. The frequency distribution is obtained with the R function table, and the corresponding graphical display with barplot.

> N <- dim(regio)[1]
> abs_f <- table(regio$area)
> abs_f

 C  N  S
 4  8  8
> rel_f <- 100 * (table(regio$area)/N)
> rel_f

 C  N  S
20 40 40
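The relative frequencies can also be obtained without dividing by N explicitly: prop.table divides each count of a table by the table total. A minimal sketch on a hypothetical vector, area_demo, that reproduces the counts of regio$area (8 N, 4 C, 8 S), so that it runs without downloading the data file:

```r
# Hypothetical stand-in for regio$area, with the same counts (8 N, 4 C, 8 S)
area_demo <- rep(c("N", "C", "S"), times = c(8, 4, 8))

abs_demo <- table(area_demo)             # absolute frequencies, levels sorted: C, N, S
rel_demo <- 100 * prop.table(abs_demo)   # each count divided by the total, in %
print(abs_demo)
print(rel_demo)
sum(rel_demo)                            # relative frequencies add up to 100
```

With the real data, 100 * prop.table(table(regio$area)) gives the same percentages as rel_f.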

> barplot(rel_f, xlab = "Geographical Area", ylab = "Frequency (%)",
+   main = "Distribution of Regions According to Geographical Area",
+   col = "lavender")

[Figure: Distribution of Regions According to Geographical Area. Bar plot of Frequency (%) for the geographical areas C, N and S.]

Population density is the ratio between the number of residents of a geographical area and its surface area. Here the population density of each region (residents per km²) is obtained by dividing res09 by totsurf.

> dens <- regio$res09/regio$totsurf
> dens
 [1] 174.51754  38.94116 297.92732 408.30963  74.86272 265.82230 156.92708
 [8] 196.08457 161.23051 105.75000 161.91232 326.98222 123.59246  72.28369
[15] 427.58088 210.69576  59.10739 133.20351 195.96993  69.36780
> stem(dens)

  The decimal point is 2 digit(s) to the right of the |

  0 | 46777
  1 | 1236667
  2 | 0017
  3 | 03
  4 | 13

Note that the decimal point does not appear in the stem-and-leaf display, but the legend supplies the necessary information. Here only the hundreds and tens digits are retained and the data are rounded accordingly, so, after rounding, the stems correspond to the classes [0, 100), [100, 200), etc. The distribution is unimodal, with the highest frequency in the second class, and right-skewed: most regions lie in the lower classes, with a long tail toward high densities.

To obtain the joint distribution of geographical area and density, we use the table function again. The variable dens is first divided into classes with the function cut, to avoid an uninformative proliferation of entries in the frequency table.

> table(regio$area, cut(dens, breaks = c(0, 150, 300, 450), include.lowest = TRUE))

    [0,150] (150,300] (300,450]
  C       1         2         1
  N       2         5         1
  S       5         2         1

The result suggests that northern and central regions concentrate in the middle density class, whereas southern regions concentrate in the lowest class.

4 THE UNIFORM DISTRIBUTION

Is there any pattern in the statistical distribution of the decimal digits of real numbers? As an example, we use the first 49 decimal digits of π and e:

π = 3.1415926535897932384626433832795028841971693993751
e = 2.7182818284590452353602874713526624977572470937000

(For e, the string is the value rounded to 49 decimal places, so its last four digits, 7000, differ from the true digits 6999.)

The appropriate tools are the frequency distribution and the stem-and-leaf plot of the digits. In the code, the index -c(1, 2) drops the integer part and the decimal point.

> dig_pi <- "3.1415926535897932384626433832795028841971693993751"
> dig_pi <- unlist(strsplit(dig_pi, split = ""))[-c(1, 2)]
> dig_pi
 [1] "1" "4" "1" "5" "9" "2" "6" "5" "3" "5" "8" "9" "7" "9" "3" "2" "3" "8" "4"
[20] "6" "2" "6" "4" "3" "3" "8" "3" "2" "7" "9" "5" "0" "2" "8" "8" "4" "1" "9"
[39] "7" "1" "6" "9" "3" "9" "9" "3" "7" "5" "1"
> table(as.numeric(dig_pi))

0 1 2 3 4 5 6 7 8 9
1 5 5 8 4 5 4 4 5 8
> stem(as.numeric(dig_pi))

  The decimal point is at the |

  0 | 0
  1 | 00000
  2 | 00000
  3 | 00000000
  4 | 0000
  5 | 00000
  6 | 0000
  7 | 0000
  8 | 00000
  9 | 00000000
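The splitting and tabulating steps above are repeated for e in the next listing. As a sketch, they can be wrapped in a small helper; digit_table is a hypothetical name, not an R function. Converting the digits to a factor with levels 0 to 9 makes any digit that never occurs appear with count 0, and the second row gives the count expected under a uniform distribution on 0, 1, ..., 9, that is 49/10 = 4.9 for each digit.

```r
# Hypothetical helper: observed frequencies of the decimal digits of a number
# written as a string (assumed to contain a decimal point), together with the
# frequencies expected under a uniform distribution on 0, 1, ..., 9.
digit_table <- function(s) {
  chars <- unlist(strsplit(s, split = ""))
  # drop everything up to and including the decimal point
  decimals <- chars[-seq_len(match(".", chars))]
  # a factor with levels 0-9 keeps digits that never occur (count 0)
  observed <- table(factor(decimals, levels = as.character(0:9)))
  expected <- rep(length(decimals) / 10, 10)
  out <- rbind(observed = as.vector(observed), expected = expected)
  colnames(out) <- 0:9
  out
}

tab_pi <- digit_table("3.1415926535897932384626433832795028841971693993751")
tab_e  <- digit_table("2.7182818284590452353602874713526624977572470937000")
print(tab_pi)
print(tab_e)
```

Unlike the fixed index -c(1, 2), locating the decimal point with match also works for numbers whose integer part has more than one digit.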

> dig_e <- "2.7182818284590452353602874713526624977572470937000"
> dig_e <- unlist(strsplit(dig_e, split = ""))[-c(1, 2)]
> dig_e
 [1] "7" "1" "8" "2" "8" "1" "8" "2" "8" "4" "5" "9" "0" "4" "5" "2" "3" "5" "3"
[20] "6" "0" "2" "8" "7" "4" "7" "1" "3" "5" "2" "6" "6" "2" "4" "9" "7" "7" "5"
[39] "7" "2" "4" "7" "0" "9" "3" "7" "0" "0" "0"
> table(as.numeric(dig_e))

0 1 2 3 4 5 6 7 8 9
6 3 7 4 5 5 3 8 5 3
> stem(as.numeric(dig_e))

  The decimal point is at the |

  0 | 000000
  1 | 000
  2 | 0000000
  3 | 0000
  4 | 00000
  5 | 00000
  6 | 000
  7 | 00000000
  8 | 00000
  9 | 000

In both cases the distribution exhibits neither a clear mode, as in the unimodal situation, nor a clear increasing or decreasing pattern. A theoretical model could be a uniform distribution on the integers 0, 1, 2, ..., 9, giving equal weight 1/10 to all digits. The discrepancies between the observed distributions and this model are plausibly due to the small sample size (with 49 digits, the expected frequency of each digit is only 4.9).

5 ITALIAN VS NON ITALIAN AGE DISTRIBUTION

The age classes have different widths, hence density must be used instead of frequency. The density of a class is its relative frequency divided by the class width (e.g. 13.9/15 = 0.93). The results are shown in the table below.

                         Italian Residents                  Non Italian Residents
                         Veneto            Italy            Veneto            Italy
Age Class   Width   Freq. %  Dens. %  Freq. %  Dens. %  Freq. %  Dens. %  Freq. %  Dens. %
[0, 15)       15     13.9     0.93     14.1     0.94     21.0     1.40     19.1     1.27
[15, 30)      15     15.7     1.05     16.8     1.12     26.6     1.77     25.2     1.68
[30, 45)      15     25.5     1.70     24.0     1.60     38.4     2.56     38.7     2.58
[45, 65)      20     25.8     1.29     25.3     1.27     12.7     0.64     15.0     0.75
[65, 80)      15     14.1     0.94     14.6     0.97      1.2     0.08      1.8     0.12
[80, 110)     30      5.1     0.17      5.1     0.17      0.2     0         0.3     0

As expected, there are no major differences between Veneto and Italy, for either Italian or non-Italian residents. The age distribution of Italian residents is unimodal, with the density peak in the class [30, 45). In contrast, the highest frequency is in the class [45, 65), because that class is wider (20 years instead of 15).
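The density columns of the table are the relative frequencies divided by the class widths, in % per year of age. A minimal sketch for the two Italy columns, with the frequencies copied from the table (the variable names are illustrative); it also recomputes the share of residents aged 65 or over. For the last class of non-Italian residents the exact ratio is 0.3/30 = 0.01, which the table reports as 0.

```r
# Age classes [0,15), [15,30), [30,45), [45,65), [65,80), [80,110)
lower <- c(0, 15, 30, 45, 65, 80)
upper <- c(15, 30, 45, 65, 80, 110)
width <- upper - lower

# Relative frequencies (%) for Italy, copied from the table
freq_it  <- c(14.1, 16.8, 24.0, 25.3, 14.6, 5.1)   # Italian residents
freq_nit <- c(19.1, 25.2, 38.7, 15.0, 1.8, 0.3)    # non-Italian residents

# Density (% per year of age) = relative frequency / class width
dens_it  <- freq_it / width
dens_nit <- freq_nit / width
print(round(rbind(width, dens_it, dens_nit), 2))

# Share of residents aged 65 or over (classes starting at 65 or later)
old_share <- c(italian = sum(freq_it[lower >= 65]),
               non_italian = sum(freq_nit[lower >= 65]))
print(old_share)
```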

The age distribution of non-Italian residents is also unimodal, with the density peak in the class [30, 45), but here the concentration of the data in the modal class is much higher (for Italy, the density is 2.58 against 1.60). The statistical tendency is clear: non-Italian residents are younger. With reference to Italy, 19.7% of Italian residents are aged 65 or over (14.6% + 5.1%), against 2.1% of non-Italian residents (1.8% + 0.3%). What is the explanation of this finding? Do you expect it to be a permanent feature, or to change in the future?

The four histograms are shown in the figure. Each one is drawn with rect, from the class limits and the densities in the table, on top of an empty plot (type = "n") that only sets up the axes and labels.

> layout(matrix(1:4, 2, 2))
> plot(0, 0, type = "n", xlab = "Age (Years)", ylab = "Density (%)",
+   main = "Veneto (Italian Res.)", xlim = c(0, 115), ylim = c(0, 2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30, 45, 65, 80, 110),
+   c(0.93, 1.05, 1.7, 1.29, 0.94, 0.17), col = "lavender")
> plot(0, 0, type = "n", xlab = "Age (Years)", ylab = "Density (%)",
+   main = "Italy (Italian Res.)", xlim = c(0, 115), ylim = c(0, 2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30, 45, 65, 80, 110),
+   c(0.94, 1.12, 1.6, 1.27, 0.97, 0.17), col = "lavender")
> plot(0, 0, type = "n", xlab = "Age (Years)", ylab = "Density (%)",
+   main = "Veneto (Non Italian Res.)", xlim = c(0, 115), ylim = c(0, 2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30, 45, 65, 80, 110),
+   c(1.4, 1.77, 2.56, 0.64, 0.08, 0), col = "lavender")
> plot(0, 0, type = "n", xlab = "Age (Years)", ylab = "Density (%)",
+   main = "Italy (Non Italian Res.)", xlim = c(0, 115), ylim = c(0, 2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30, 45, 65, 80, 110),
+   c(1.27, 1.68, 2.58, 0.75, 0.12, 0), col = "lavender")

[Figure: four histograms of Density (%) against Age (Years). Top row: Veneto (Italian Res.), Veneto (Non Italian Res.); bottom row: Italy (Italian Res.), Italy (Non Italian Res.).]
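The four plot/rect pairs in the histogram code differ only in the densities and the panel title, so a small function can draw each panel. This is a sketch, not the original code: draw_age_hist and dens_list are hypothetical names, and the helper also returns the total area of the bars, which for a density histogram should equal 100% up to the rounding in the table.

```r
# Hypothetical helper: draws one histogram with unequal class widths from
# the class limits and the density (%) of each class, and returns its area.
draw_age_hist <- function(dens, main,
                          lower = c(0, 15, 30, 45, 65, 80),
                          upper = c(15, 30, 45, 65, 80, 110)) {
  # type = "n" sets up axes and labels without drawing any point
  plot(0, 0, type = "n", xlab = "Age (Years)", ylab = "Density (%)",
       main = main, xlim = c(0, 115), ylim = c(0, 2.6))
  rect(lower, 0, upper, dens, col = "lavender")
  invisible(sum(dens * (upper - lower)))   # total area = total frequency (%)
}

# Densities (%) copied from the table, in the panel order of the original code
dens_list <- list(
  "Veneto (Italian Res.)"     = c(0.93, 1.05, 1.70, 1.29, 0.94, 0.17),
  "Italy (Italian Res.)"      = c(0.94, 1.12, 1.60, 1.27, 0.97, 0.17),
  "Veneto (Non Italian Res.)" = c(1.40, 1.77, 2.56, 0.64, 0.08, 0),
  "Italy (Non Italian Res.)"  = c(1.27, 1.68, 2.58, 0.75, 0.12, 0)
)

layout(matrix(1:4, 2, 2))   # 2 x 2 panels, filled column by column
areas <- mapply(draw_age_hist, dens_list, names(dens_list))
print(round(areas, 1))
```

Each area is close to 100, a quick check that the densities were computed with the right class widths.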