STATISTICAL LABORATORY, USING R FOR BASIC STATISTICAL ANALYSIS
Manuela Cattelan
(Based on material prepared by Prof. Mario Romanazzi)

1 ABC OF R

1.1 ARITHMETIC AND LOGICAL OPERATORS. VARIABLES AND ASSIGNMENT OPERATOR

R works just like a pocket calculator when performing elementary computations. The arithmetic operators of addition, subtraction, multiplication, division and power are

ADDITION  SUBTRACTION  MULTIPLICATION  DIVISION  POWER
+         -            *               /         ^

When necessary, (round) parentheses can be used to force a given order of the computations.

> /3
[1] 4
> ( )/3
[1] 2
> 2 * (10-20)^3
[1] -2000
> (0.2 * * * 7)/2
[1] 4.55

In most cases it is convenient to store constants or intermediate results in the computer memory for later use in the same session. The corresponding areas of the computer memory are identified by names chosen by the user. The assignment operator <- is obtained by typing the < and - keys on the keyboard.

> a <- 100
> b <- a * (5-2)/5
> b
[1] 60

The logical operators ! (NOT), & (AND) and | (OR) are used to form complex logical expressions. Within R, the logical constants TRUE and FALSE show whether a given logical expression is true or false.

The operators for binary comparisons are
LOWER  LOWER OR EQUAL  GREATER  GREATER OR EQUAL  EQUAL  NOT EQUAL
<      <=              >        >=                ==     !=

> neg <- -1
> pos <- 10
> neg <= 0
[1] TRUE
> neg * pos < 0
[1] TRUE
> neg < 0 & pos > 0
[1] TRUE
> pos >= 0 | pos < 5
[1] TRUE

1.2 MATHEMATICAL FUNCTIONS

The usual mathematical functions are available. Some common functions are listed below.

FUNCTION                                R NAME
Absolute value                          abs
Square root                             sqrt
Logarithm (natural, base e)             log
Logarithm (base 10)                     log10
Exponential                             exp
Trigonometric: sine, cosine, tangent    sin, cos, tan
Factorial                               factorial
Binomial coefficient                    choose

The following results follow directly from the definitions of the functions. Note that cos(pi/2) is returned as a number of order 1e-17 rather than exactly 0, because pi is stored with finite precision.

> sqrt(100)
[1] 10
> log10(10000)
[1] 4
> 10^(log10(10000))
[1] 10000
> sin(pi/2)
[1] 1
> cos(pi/2)
[1] 6.123234e-17
> factorial(5)
[1] 120
> factorial(10)
[1] 3628800
> choose(10, 2)
[1] 45

1.3 USER DEFINED FUNCTIONS

The user can define specific functions. For example, the following code defines two functions that compute the circumference and the area of a circle from its radius.

> circ_l <- function(x) 2 * pi * x
> circ_a <- function(x) pi * x^2
> circ_l(1)
[1] 6.283185
> circ_a(0.5)
[1] 0.7853982

1.4 DATA STRUCTURES: VECTORS

The most important data structure is the vector, an ordered collection of $n \geq 1$ items of the same type (numerical, alphanumeric, logical). Note that a scalar is a numerical vector with just one element. There are several ways to define vectors. The most general one is the c function (c means concatenate). The length function gives the size (number of components) of a vector.

> itp_age <- c(71, 74, 63, 71, 66, 63, 82, 58, 74, 79, 81)
> itp_name <- c("E. De Nicola", "L. Einaudi", "G. Gronchi", "A. Segni",
+     "G. Saragat", "G. Leone", "S. Pertini", "F. Cossiga", "O. L. Scalfaro",
+     "C. A. Ciampi", "G. Napolitano")
> usp_age <- c(61, 63, 44, 55, 56, 61, 53, 70, 65, 47, 55, 47)
> usp_name <- c("H. S. Truman", "D. D. Eisenhower", "J. F. Kennedy",
+     "L. B. Johnson", "R. Nixon", "G. Ford", "J. Carter", "R. Reagan",
+     "G. Bush", "B. Clinton", "G. W. Bush", "B. Obama")
> length(itp_age)
[1] 11
> length(itp_age) == length(itp_name)
[1] TRUE

Let $x$ denote the vector $(x_1, x_2, \ldots, x_n)$. Special vectors can be defined with the colon operator : ($x_{i+1} - x_i = 1$, $i = 1, \ldots, n-1$), the seq function ($x_{i+1} - x_i = c$, $i = 1, \ldots, n-1$, where $c$ is a constant) and the rep function ($x_i = c$, $i = 1, \ldots, n$, where $c$ is a constant). The names seq and rep are abbreviations of sequence and repeat, respectively.

> -2:10
 [1] -2 -1  0  1  2  3  4  5  6  7  8  9 10
> -2.5:10
 [1] -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5
> seq(1, 20, 2)
 [1]  1  3  5  7  9 11 13 15 17 19
> seq(-5, 5, 0.5)
 [1] -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0
[16]  2.5  3.0  3.5  4.0  4.5  5.0
> rep(0, 5)
[1] 0 0 0 0 0
> c(-0.5, 1:4, rep(5, 3))
[1] -0.5  1.0  2.0  3.0  4.0  5.0  5.0  5.0

1.5 VECTOR OPERATIONS

Vector components or subvectors are obtained by giving either their positions within the vector or a property they satisfy. In both cases, the subsetting operator [] is used.

> itp_name[1]
[1] "E. De Nicola"
> itp_name[1:3]
[1] "E. De Nicola" "L. Einaudi"   "G. Gronchi"
> itp_name[c(1, length(itp_name))]
[1] "E. De Nicola"  "G. Napolitano"
> itp_name[itp_age < 60]
[1] "F. Cossiga"
> usp_name[usp_age > 40 & usp_age < 50]
[1] "J. F. Kennedy" "B. Clinton"    "B. Obama"

In general, the transformation of vectors by mathematical functions is performed component-wise. Arithmetic operations involving several vectors are also performed component-wise, so the vectors should have the same length (a shorter vector is recycled, as happens with a scalar).

> v1 <- -2:5
> v2 <- 6:13
> abs(v1)
[1] 2 1 0 1 2 3 4 5
> v1 + v2
[1]  4  6  8 10 12 14 16 18
> 2 * v1 - v2
[1] -10  -9  -8  -7  -6  -5  -4  -3
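The component-wise rule can be checked on small vectors. The following is a minimal sketch that reuses v1 and v2 from the session above; the vector `short` is a hypothetical name introduced only for illustration. When the lengths differ, R recycles the shorter vector, and it issues a warning if the longer length is not a multiple of the shorter one.

```r
v1 <- -2:5
v2 <- 6:13

# Same length: the sum is computed component by component
sum_same <- v1 + v2

# A scalar is a vector of length 1, so it is recycled along v1
scaled <- 2 * v1 - 1

# Hypothetical shorter vector: recycled four times (8 is a multiple of 2)
short <- c(10, 20)
recycled <- v1 + short

print(sum_same)
print(scaled)
print(recycled)
```

Recycling is convenient with scalars, but with vectors of unequal length it can hide mistakes, so comparing length() of the operands first is a reasonable habit.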
> 2 * v1 - 1
[1] -5 -3 -1  1  3  5  7  9

To arrange the vector components from minimum to maximum, the sort function can be used. The functions min, max, which.min and which.max return the minimum and maximum entries and their positions within the vector.

> sort(itp_age)
 [1] 58 63 63 66 71 71 74 74 79 81 82
> c(min(itp_age), max(itp_age))
[1] 58 82
> c(which.min(itp_age), which.max(itp_age))
[1] 8 7
> sort(c("Carla", "Francesco", "Paola", "Matteo", "Maria"))
[1] "Carla"     "Francesco" "Maria"     "Matteo"    "Paola"

2 BASIC STATISTICS

We use the Presidents' age data to illustrate how to produce a statistical report with R. The following table lists some basic functions (both analytical and graphical). Note that they all take the data vector as their basic argument.

STATISTICAL FUNCTION                     R NAME
Sample size                              length
Frequency table                          table
Stem-and-leaf                            stem
Order statistic                          sort
Basic location statistics                summary
Quantile                                 quantile
Median                                   median
Mean                                     mean
Variance (unbiased version)              var
Standard deviation (unbiased version)    sd
Box-plot                                 boxplot
Histogram                                hist

The stem-and-leaf display shows the general features of the distribution: range, location, dispersion, shape, possible outliers. It is mainly useful with small sample sizes, as in the present case.

> stem(itp_age, scale = 0.5)

  The decimal point is 1 digit(s) to the right of the |

> stem(usp_age)

  The decimal point is 1 digit(s) to the right of the |

The summary function, when applied to a numerical vector, gives basic statistics to evaluate location: minimum and maximum values, quartiles and mean. In computing empirical (sample) quantiles, R employs a more refined interpolation algorithm than that described in the textbook (to be used with hand computations). With small samples, discrepancies between the two may be observed. The range and the interquartile range are easily evaluated from these results. Another dispersion statistic is the standard deviation, to be used together with the sample mean.

> summary(itp_age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  58.00   64.50   71.00   71.09   76.50   82.00
> summary(usp_age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  44.00   51.50   55.50   56.42   61.50   70.00
> sd(itp_age)
[1] 7.905119
> sd(usp_age)
[1] 7.925314

A very useful graphical comparison of the two samples is given by the paired box-plot. Whereas the stem-and-leaf displays the full order statistic, the box-plot displays only the quartiles and the extreme statistics.

> boxplot(itp_age, usp_age, horizontal = TRUE, xlab = "Presidents' age (years)",
+     names = c("Italy", "US"), col = "lavender", main = "Italy vs US, ")
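The numbers behind the paired box-plot can be inspected directly. The sketch below is only illustrative: it compares the default sample quantiles of quantile() (type 7, linear interpolation) with Tukey's hinges returned by fivenum(), which is what boxplot() draws as the edges of the box. It does not reproduce the textbook's hand method, which is not specified here. The object names q_usp, h_usp, med_diff, iqr_both and sd_both are introduced for this example.

```r
itp_age <- c(71, 74, 63, 71, 66, 63, 82, 58, 74, 79, 81)
usp_age <- c(61, 63, 44, 55, 56, 61, 53, 70, 65, 47, 55, 47)

# Default sample quantiles (type = 7), the same rule used by summary()
q_usp <- quantile(usp_age, probs = c(0.25, 0.5, 0.75))

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
h_usp <- fivenum(usp_age)

# Location shift and dispersion of the two samples
med_diff <- median(itp_age) - median(usp_age)
iqr_both <- c(Italy = IQR(itp_age), US = IQR(usp_age))
sd_both  <- c(Italy = sd(itp_age), US = sd(usp_age))

print(q_usp)
print(h_usp)
print(med_diff)
print(round(rbind(IQR = iqr_both, SD = sd_both), 2))
```

With only 12 observations the quartiles and the hinges of the US sample do not coincide, which is the kind of small-sample discrepancy mentioned above.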
[Figure: paired horizontal box-plots, title "Italy vs US, "; groups: Italy, US; x-axis: Presidents' age (years)]

Here, the most important feature is a location shift to the right of Italy with respect to the US. The statistical tendency is that Italian Presidents are older than US Presidents: the difference between the medians is about 15.5 years. The dispersion does not seem very different (e.g., compare the standard deviations and the IQRs).

3 EXPLORATION OF DATA WITH FREQUENCY TABLES AND DISTRIBUTIONAL PLOTS

The file regioni.txt contains the required information and can be used to simulate the collection of the data. First of all, we read the file into R.

> regio <- read.table("
+     header = TRUE, na.strings = "NA")
> str(regio)
'data.frame': 20 obs. of 7 variables:
 $ name   : Factor w/ 20 levels "Abruzzo","Basilicata",..:
 $ area   : Factor w/ 3 levels "C","N","S":
 $ capo   : Factor w/ 20 levels "Ancona","Aosta",..:
 $ totsurf: int
 $ coast  : int NA NA 346 NA NA NA...
 $ res09  : int
 $ car    : num
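Since the location of regioni.txt is not available here, the reading step can be tried on a small inline excerpt. The sketch below is an illustration under stated assumptions: `txt` and `toy` are hypothetical names, and the excerpt holds only four regions and four of the seven columns, with the coast values that appear in this section. The argument stringsAsFactors = TRUE is set explicitly because, since R 4.0.0, read.table returns character columns by default instead of the factors shown by str above.

```r
# Toy excerpt typed inline (four regions, four of the seven columns)
txt <- "name area capo coast
Piemonte N Torino NA
Liguria N Genova 346
Lombardia N Milano NA
Umbria C Perugia NA"

# text = reads from a character string instead of a file or URL
toy <- read.table(text = txt, header = TRUE, na.strings = "NA",
                  stringsAsFactors = TRUE)
str(toy)

# Regions without a coast are stored as NA and must be handled explicitly
n_missing  <- sum(is.na(toy$coast))
mean_coast <- mean(toy$coast, na.rm = TRUE)
print(c(n_missing = n_missing, mean_coast = mean_coast))
```

Most summary functions return NA when the input contains missing values, so na.rm = TRUE (or an explicit is.na check) is usually needed for a variable like coast.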
> regio
   name                  area capo            totsurf coast res09 car
1  Piemonte              N    Torino                  NA
2  Valle d'Aosta         N    Aosta           3263    NA
3  Liguria               N    Genova
4  Lombardia             N    Milano                  NA
5  Trentino-Alto Adige   N    Trento                  NA
6  Veneto                N    Venezia
7  Friuli-Venezia Giulia N    Trieste
8  Emilia-Romagna        N    Bologna
9  Toscana               C    Firenze
10 Umbria                C    Perugia         8456    NA
11 Marche                C    Ancona
12 Lazio                 C    Roma
13 Abruzzo               S    L'Aquila
14 Molise                S    Campobasso
15 Campania              S    Napoli
16 Puglia                S    Bari
17 Basilicata            S    Potenza
18 Calabria              S    Reggio Calabria
19 Sicilia               S    Palermo
20 Sardegna              S    Cagliari

The structure is a data table with N = 20 rows (the regions, i.e. the statistical units) and 7 columns (the variables):

1. name, name of region (identifier)
2. area, geographical area (stratification v.)
3. capo, region capital (categorical v.)
4. totsurf, region total surface area (numerical v., km^2)
5. coast, region total coast length (numerical v., km)
6. res09, total number of residents, as of 1/1/2009 (numerical v., count)
7. car, number of cars per 1000 residents (numerical v.)

We start the data exploration with the stratification variable, geographical area. The frequency distribution is obtained with the R function table and the corresponding graphical display with barplot.

> N <- dim(regio)[1]
> abs_f <- table(regio$area)
> abs_f

 C  N  S
 4  8  8
> rel_f <- 100 * (table(regio$area)/N)
> rel_f

 C  N  S
20 40 40
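The same percentages can be obtained without dividing by N explicitly. This is a minimal sketch, assuming the area labels are rebuilt from the listing above (8 northern, 4 central and 8 southern regions); the vector `area` stands in for regio$area, and prop.table is a base R function that converts counts into proportions.

```r
# Area labels in the order of the listing: 8 N, 4 C, 8 S
area <- factor(rep(c("N", "C", "S"), times = c(8, 4, 8)))

abs_f <- table(area)                 # absolute frequencies
rel_f <- 100 * prop.table(abs_f)     # relative frequencies, in percent

print(abs_f)
print(rel_f)
```

Using prop.table avoids keeping a separate sample-size variable in step with the data, for instance after removing rows.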
> barplot(rel_f, xlab = "Geographical Area", ylab = "Frequency (%)",
+     main = "Distribution of Regions According to Geographical Area",
+     col = "lavender")

[Figure: bar plot "Distribution of Regions According to Geographical Area"; x-axis: Geographical Area (C, N, S); y-axis: Frequency (%)]

The population density is the ratio between the total number of residents of a geographical area and its surface area. Here, the population density of each region (residents per km^2) is obtained by dividing res09 by totsurf.

> dens <- regio$res09/regio$totsurf
> dens
 [1]
 [8]
[15]
> stem(dens)

  The decimal point is 2 digit(s) to the right of the |
Note that the decimal point does not appear in the stem-and-leaf display, but the legend supplies the necessary information. In the present case, only the hundreds and tens digits are retained and the data are rounded accordingly. The stems are the classes (0, 100), [100, 200), etc. The distribution is unimodal and skewed, with most regions in the lower classes and a longer tail towards high densities; the highest frequency is in the second class.

To obtain the joint distribution of geographical area and density, we use the table function again. Note that the variable dens is first divided into classes with the function cut, to avoid an uninformative proliferation of entries in the frequency table.

> table(regio$area, cut(dens, breaks = c(0, 150, 300, 450), include.lowest = TRUE))

    [0,150] (150,300] (300,450]
  C
  N
  S

The result suggests that northern and central regions concentrate in the middle density class, southern regions in the lowest class.

4 THE UNIFORM DISTRIBUTION

Is there any pattern in the statistical distribution of the decimal digits of real numbers? As an example, we use the first 49 decimal digits of π and e. The appropriate tools are the frequency distribution or the stem-and-leaf plot of the data.

> dig_pi <- "3.1415926535897932384626433832795028841971693993751"
> dig_pi <- unlist(strsplit(dig_pi, split = ""))[-c(1, 2)]
> dig_pi
 [1] "1" "4" "1" "5" "9" "2" "6" "5" "3" "5" "8" "9" "7" "9" "3" "2" "3" "8" "4"
[20] "6" "2" "6" "4" "3" "3" "8" "3" "2" "7" "9" "5" "0" "2" "8" "8" "4" "1" "9"
[39] "7" "1" "6" "9" "3" "9" "9" "3" "7" "5" "1"
> table(as.numeric(dig_pi))

 0  1  2  3  4  5  6  7  8  9
 1  5  5  8  4  5  4  4  5  8
> stem(as.numeric(dig_pi))

  The decimal point is at the |
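A slightly more defensive way to tabulate the digits is sketched below. It assumes that dig_pi holds the string rebuilt from the 49 digits printed above, preceded by "3."; the names `digits`, `freq` and `rel` are introduced for this example. Declaring the ten possible digits as factor levels guarantees that a digit that never occurs would still appear in the table, with count 0.

```r
dig_pi <- "3.1415926535897932384626433832795028841971693993751"

# Drop the integer part and the decimal point, then split into single digits
digits <- as.integer(strsplit(sub("^[0-9]+\\.", "", dig_pi), "")[[1]])

# Explicit levels 0..9 keep every digit in the table, even with zero count
freq <- table(factor(digits, levels = 0:9))
rel  <- round(100 * prop.table(freq), 1)

print(freq)
print(rel)
```

The relative frequencies in `rel` are percentages of the 49 digits, so they can be read directly against any reference distribution on the digits 0 to 9.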
> dig_e <- "2.7182818284590452353602874713526624977572470937000"
> dig_e <- unlist(strsplit(dig_e, split = ""))[-c(1, 2)]
> dig_e
 [1] "7" "1" "8" "2" "8" "1" "8" "2" "8" "4" "5" "9" "0" "4" "5" "2" "3" "5" "3"
[20] "6" "0" "2" "8" "7" "4" "7" "1" "3" "5" "2" "6" "6" "2" "4" "9" "7" "7" "5"
[39] "7" "2" "4" "7" "0" "9" "3" "7" "0" "0" "0"
> table(as.numeric(dig_e))

 0  1  2  3  4  5  6  7  8  9
 6  3  7  4  5  5  3  8  5  3
> stem(as.numeric(dig_e))

  The decimal point is at the |

In both cases the distribution exhibits neither a clear mode, as in the unimodal situation, nor a clear increasing or decreasing pattern. A theoretical model could be a uniform distribution on the integers 0, 1, 2, ..., 9, giving equal weight 1/10 to all digits. The discrepancies between the observed distribution and this model are plausibly due to the small sample size.

5 ITALIAN VS NON ITALIAN AGE DISTRIBUTION

The age classes have different widths, hence density must be used, not frequency. We show the results in the table below.

                    Italian Residents                   Non Italian Residents
                    Veneto            Italy             Veneto            Italy
Age Class   Width   Freq. %  Dens. %  Freq. %  Dens. %  Freq. %  Dens. %  Freq. %  Dens. %
(0, 15)     15               0.93              0.94              1.40              1.27
[15, 30)    15               1.05              1.12              1.77              1.68
[30, 45)    15               1.70              1.60              2.56              2.58
[45, 65)    20               1.29              1.27              0.64              0.75
[65, 80)    15               0.94              0.97              0.08              0.12
[80, 110)   30               0.17              0.17              0.00              0.00

As expected, there are no major differences between Veneto and Italy, for either Italian or non-Italian residents. The age distribution of Italian residents is unimodal, with the density peak in the class [30, 45). In contrast, the highest frequency is in the class [45, 65), which is wider (20 years against 15).
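The difference between the modal class by density and the modal class by frequency can be verified numerically. The sketch below assumes the densities (percent per year of age) of Italian residents in Italy are the bar heights used in the plotting code of this section; frequencies are recovered as density times class width, so they are affected by the rounding of the densities. The names `breaks`, `width`, `dens_it`, `freq_it` and `classes` are introduced for this example.

```r
breaks  <- c(0, 15, 30, 45, 65, 80, 110)       # class limits (years)
width   <- diff(breaks)                        # 15 15 15 20 15 30
dens_it <- c(0.94, 1.12, 1.60, 1.27, 0.97, 0.17)

# Frequency (%) = density (% per year) x class width (years)
freq_it <- dens_it * width
classes <- paste0("[", head(breaks, -1), ", ", tail(breaks, -1), ")")
names(freq_it) <- classes

modal_by_density   <- classes[which.max(dens_it)]
modal_by_frequency <- classes[which.max(freq_it)]
share_65_plus      <- sum(freq_it[5:6])

print(round(freq_it, 2))
print(c(density = modal_by_density, frequency = modal_by_frequency))
print(share_65_plus)
```

The share of residents aged 65 or more obtained this way is about 19.7%, and the total is close to, but not exactly, 100% because the densities are rounded to two decimals.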
The age distribution of non-Italian residents is still unimodal, with the density peak in the class [30, 45), but here the concentration of the data in the modal class is much higher (for Italy, the density is 2.58 against 1.60). The statistical tendency is clear: non-Italian residents are younger. With reference to Italy, 19.7% of Italian residents are 65 or older, against 2.1% of non-Italian residents. What is the explanation of this finding? Do you expect it to be a permanent feature or to change in the future? The four histograms are shown in the figure.

> layout(matrix(1:4, 2, 2))
> plot(0, 0, type = "p", xlab = "Age (Years)", ylab = "Density (%)",
+     main = "Veneto (Italian Res.)", xlim = c(0, 115), ylim = c(0,
+         2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30,
+     45, 65, 80, 110), c(0.93, 1.05, 1.7, 1.29, 0.94, 0.17), col = "lavender")
> plot(0, 0, type = "p", xlab = "Age (Years)", ylab = "Density (%)",
+     main = "Italy (Italian Res.)", xlim = c(0, 115), ylim = c(0,
+         2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30,
+     45, 65, 80, 110), c(0.94, 1.12, 1.6, 1.27, 0.97, 0.17), col = "lavender")
> plot(0, 0, type = "p", xlab = "Age (Years)", ylab = "Density (%)",
+     main = "Veneto (Non Italian Res.)", xlim = c(0, 115), ylim = c(0,
+         2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30,
+     45, 65, 80, 110), c(1.4, 1.77, 2.56, 0.64, 0.08, 0), col = "lavender")
> plot(0, 0, type = "p", xlab = "Age (Years)", ylab = "Density (%)",
+     main = "Italy (Non Italian Res.)", xlim = c(0, 115), ylim = c(0,
+         2.6))
> rect(c(0, 15, 30, 45, 65, 80), c(0, 0, 0, 0, 0, 0), c(15, 30,
+     45, 65, 80, 110), c(1.27, 1.68, 2.58, 0.75, 0.12, 0), col = "lavender")
[Figure: four density histograms in a 2 x 2 layout. Top row: Veneto (Italian Res.), Veneto (Non Italian Res.); bottom row: Italy (Italian Res.), Italy (Non Italian Res.). x-axis: Age (Years); y-axis: Density (%)]
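The four plot and rect calls differ only in the title and in the bar heights, so they can be wrapped in a small helper. The following is a sketch under stated assumptions, not the author's code: draw_density_hist is a hypothetical function name, the bar heights are copied from the calls above, and type = "n" is used instead of type = "p" so that no point is drawn at the origin. The helper also returns the total area of the bars, which should be close to 100 for a density histogram expressed in percent.

```r
breaks <- c(0, 15, 30, 45, 65, 80, 110)
dens <- list(
  "Veneto (Italian Res.)"     = c(0.93, 1.05, 1.70, 1.29, 0.94, 0.17),
  "Italy (Italian Res.)"      = c(0.94, 1.12, 1.60, 1.27, 0.97, 0.17),
  "Veneto (Non Italian Res.)" = c(1.40, 1.77, 2.56, 0.64, 0.08, 0),
  "Italy (Non Italian Res.)"  = c(1.27, 1.68, 2.58, 0.75, 0.12, 0)
)

# Hypothetical helper: one density histogram from class limits and heights
draw_density_hist <- function(d, breaks, main, ymax = 2.6) {
  stopifnot(length(d) == length(breaks) - 1)
  # Empty frame with the same axes as in the calls above
  plot(0, 0, type = "n", xlab = "Age (Years)", ylab = "Density (%)",
       main = main, xlim = c(0, 115), ylim = c(0, ymax))
  # One rectangle per class: base on the x-axis, height equal to the density
  rect(head(breaks, -1), 0, tail(breaks, -1), d, col = "lavender")
  # Total area (%), returned invisibly as a consistency check
  invisible(sum(d * diff(breaks)))
}

layout(matrix(1:4, 2, 2))   # panels are filled column by column
areas <- sapply(names(dens),
                function(nm) draw_density_hist(dens[[nm]], breaks, nm))
print(round(areas, 2))
```

Because layout(matrix(1:4, 2, 2)) fills the panels by column, the order of the list reproduces the arrangement of the figure: Veneto in the top row, Italy in the bottom row.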
Gruppo Intesa Network
Gruppo Intesa Network Branches in Italy broken down by Bank and by Region (Up-dated as at December 2000) Ambroveneto (1) Cariplo BCI Carime (2) FriulAdria Cariparma Other Group banks (3) Piemonte 37 85
Mathematics programmes of study: key stage 4. National curriculum in England
Mathematics programmes of study: key stage 4 National curriculum in England July 2014 Contents Purpose of study 3 Aims 3 Information and communication technology (ICT) 4 Spoken language 4 Working mathematically
How Does My TI-84 Do That
How Does My TI-84 Do That A guide to using the TI-84 for statistics Austin Peay State University Clarksville, Tennessee How Does My TI-84 Do That A guide to using the TI-84 for statistics Table of Contents
MATLAB Basics MATLAB numbers and numeric formats
MATLAB Basics MATLAB numbers and numeric formats All numerical variables are stored in MATLAB in double precision floating-point form. (In fact it is possible to force some variables to be of other types
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Dongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
R: A self-learn tutorial
R: A self-learn tutorial 1 Introduction R is a software language for carrying out complicated (and simple) statistical analyses. It includes routines for data summary and exploration, graphical presentation
FX 115 MS Training guide. FX 115 MS Calculator. Applicable activities. Quick Reference Guide (inside the calculator cover)
Tools FX 115 MS Calculator Handouts Other materials Applicable activities Quick Reference Guide (inside the calculator cover) Key Points/ Overview Advanced scientific calculator Two line display VPAM to
Core Maths C2. Revision Notes
Core Maths C Revision Notes November 0 Core Maths C Algebra... Polnomials: +,,,.... Factorising... Long division... Remainder theorem... Factor theorem... 4 Choosing a suitable factor... 5 Cubic equations...
Geostatistics Exploratory Analysis
Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras [email protected]
HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Mathematics Revision Guides Histograms, Cumulative Frequency and Box Plots Page 1 of 25 M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Lecture 2. Summarizing the Sample
Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting
EXAM #1 (Example) Instructor: Ela Jackiewicz. Relax and good luck!
STP 231 EXAM #1 (Example) Instructor: Ela Jackiewicz Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned.
R Language Fundamentals
R Language Fundamentals Data Types and Basic Maniuplation Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Where did R come from? Overview Atomic Vectors Subsetting
Milan: a rich region Province of Milan = 3.7 million inhabitants Italy s richest urban agglomeration & one of the wealthy OECD metro regions
The case of Milan Milan: a rich region Province of Milan = 3.7 million inhabitants Italy s richest urban agglomeration & one of the wealthy OECD metro regions 35 34.08 Milan Province 33 31 Highest GDP
Diagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
International College of Economics and Finance Syllabus Probability Theory and Introductory Statistics
International College of Economics and Finance Syllabus Probability Theory and Introductory Statistics Lecturer: Mikhail Zhitlukhin. 1. Course description Probability Theory and Introductory Statistics
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
Descriptive statistics parameters: Measures of centrality
Descriptive statistics parameters: Measures of centrality Contents Definitions... 3 Classification of descriptive statistics parameters... 4 More about central tendency estimators... 5 Relationship between
BIOL 933 Lab 6 Fall 2015. Data Transformation
BIOL 933 Lab 6 Fall 2015 Data Transformation Transformations in R General overview Log transformation Power transformation The pitfalls of interpreting interactions in transformed data Transformations
Display Format To change the exponential display format, press the [MODE] key 3 times.
Tools FX 300 MS Calculator Overhead OH 300 MS Handouts Other materials Applicable activities Activities for the Classroom FX-300 Scientific Calculator Quick Reference Guide (inside the calculator cover)
2. Filling Data Gaps, Data validation & Descriptive Statistics
2. Filling Data Gaps, Data validation & Descriptive Statistics Dr. Prasad Modak Background Data collected from field may suffer from these problems Data may contain gaps ( = no readings during this period)
