Chapter 1 Exploratory data analysis

Xavier Gendre, M2 SE
X. Gendre (M2 SE) Data Mining 1 / 33

cars$dist
summary(cars$dist)

p <- (0:100)/100
qp <- quantile(cars$dist, probs = p)
plot(p, qp, type = "l", col = "red")
[Figure: qp against p, the empirical quantile function of cars$dist, with the first quartile, the median and the third quartile marked]
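By default, quantile() uses the "type 7" definition: with the sample sorted, q(p) interpolates linearly between the order statistics at positions floor(h) and floor(h) + 1, where h = (n - 1)p + 1. The helper type7_quantile() below is hypothetical, a minimal sketch of that rule checked against quantile() on the same grid p.

```r
# Hypothetical helper: linear-interpolation ("type 7") sample quantile.
type7_quantile <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1      # fractional position in the sorted sample
  lo <- floor(h)
  hi <- pmin(lo + 1, n)     # stay inside the sample when p = 1
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

p <- (0:100)/100
qp_manual <- type7_quantile(cars$dist, p)
qp <- quantile(cars$dist, probs = p)
all.equal(qp_manual, unname(qp))   # compare with R's built-in version
```

The median is the special case p = 0.5, so type7_quantile(cars$dist, 0.5) agrees with median(cars$dist).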

boxplot(cars$dist, horizontal = TRUE, col = "orange")
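boxplot() delegates its computations to boxplot.stats(): the box runs between Tukey's hinges from fivenum(), and, with the default range = 1.5, points lying more than 1.5 times the hinge spread beyond the box are drawn as outliers. The sketch below reproduces that rule by hand and compares it with boxplot.stats(); note that the hinges can differ slightly from the quartiles returned by quantile().

```r
x <- cars$dist
h <- fivenum(x)                      # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]                # hinge spread
lower_fence <- h[2] - 1.5 * spread
upper_fence <- h[4] + 1.5 * spread
manual_out <- x[x < lower_fence | x > upper_fence]

bs <- boxplot.stats(x)               # the statistics boxplot() draws
manual_out
bs$out
```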

hist(cars$dist, breaks = 20 * (-1:7), freq = FALSE, col = "orange", ylab = "")
[Figure: Histogram of cars$dist]
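With freq = FALSE, hist() draws densities rather than counts: each bar height is count / (n × bin width), so the bar areas add up to 1. A quick check using plot = FALSE, which returns the breaks, counts and densities without drawing anything:

```r
h <- hist(cars$dist, breaks = 20 * (-1:7), plot = FALSE)
h$breaks                              # -20, 0, 20, ..., 140
h$counts                              # number of cars in each bin
sum(h$density * diff(h$breaks))       # total area of the bars
```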

plot(cars$speed, cars$dist, pch = 4)
[Figure: scatter plot of cars$dist against cars$speed]

cor(cars$speed, cars$dist)
a <- cov(cars$speed, cars$dist)/var(cars$speed)   # least-squares slope
b <- mean(cars$dist) - a * mean(cars$speed)       # least-squares intercept
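a and b are the ordinary least-squares slope and intercept of dist on speed, so they should agree with lm(); in simple linear regression the R² of the fit also equals the squared correlation. A short cross-check:

```r
a <- cov(cars$speed, cars$dist) / var(cars$speed)   # slope
b <- mean(cars$dist) - a * mean(cars$speed)         # intercept
fit <- lm(dist ~ speed, data = cars)
coef(fit)                                           # (Intercept), speed
c(b, a)
summary(fit)$r.squared
cor(cars$speed, cars$dist)^2
```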

abline(b, a, col = "red")   # intercept b, slope a
[Figure: the scatter plot of cars$dist against cars$speed with the fitted line in red]

Grades data
X: grades of nine students (Benny, Bobby, Brandy, Coby, Daisy, Emily, Judy, Marty, Sandy) in Math, Phys, Fr and Eng.
Xbar <- scale(X, scale = FALSE)     # centre each column
Sigma <- (t(Xbar) %*% Xbar)/9       # covariance matrix with divisor n = 9
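Sigma uses the divisor n, whereas cov() divides by n - 1, so the two differ by the factor (n - 1)/n. The grade values are not reproduced in this transcript, so this minimal sketch uses the built-in USArrests data purely as an assumed stand-in for X; crossprod(Xbar) is an equivalent way to write t(Xbar) %*% Xbar.

```r
X <- as.matrix(USArrests)             # stand-in data (assumption), 50 x 4
n <- nrow(X)
Xbar <- scale(X, scale = FALSE)       # centre each column
Sigma <- crossprod(Xbar) / n          # same as t(Xbar) %*% Xbar / n
Sigma_cov <- cov(X) * (n - 1) / n     # rescale cov(), which divides by n - 1
all.equal(Sigma, Sigma_cov, check.attributes = FALSE)
```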

boxplot(Xbar, col = "orange")
[Figure: boxplots of the centred grades in Math, Phys, Fr and Eng]

dg <- eigen(Sigma)
dg$values
cumsum(dg$values)/sum(dg$values)   # cumulative share of the total variance
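The eigenvalues of Sigma are the variances of the principal components. As a sanity check, again on USArrests as an assumed stand-in for the grades, they match prcomp(), which works from the SVD of the centred data and reports standard deviations computed with divisor n - 1:

```r
X <- as.matrix(USArrests)                 # stand-in data (assumption)
n <- nrow(X)
Xbar <- scale(X, scale = FALSE)
Sigma <- crossprod(Xbar) / n
dg <- eigen(Sigma, symmetric = TRUE)
cumsum(dg$values) / sum(dg$values)        # cumulative share of total variance

pc <- prcomp(X)                           # centred, unscaled PCA
pc$sdev^2 * (n - 1) / n                   # rescaled to the divisor n
```

The eigenvalues also sum to the trace of Sigma, i.e. the total variance.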

plot(1:4, dg$values, type = "b", col = "orange")
[Figure: dg$values against 1:4]

Skyrim bows data
X: Weight, Value, Damage and Speed of 14 weapons: Long Bow, Hunting Bow, Orcish Bow, Nord Hero Bow, Dwarven Bow, Elven Bow, Glass Bow, Ebony Bow, Daedric Bow, Dragonbone Bow, Crossbow, Enhanced Crossbow, Dwarven Crossbow, Enhanced Dwarven Crossbow.

Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar)/nrow(X)
M <- diag(1/diag(Sigma))     # metric M: inverse of each variable's variance

boxplot(Xbar, col = "orange")
boxplot(Xbar %*% sqrt(M), col = "blue")   # centred columns rescaled to unit variance
[Figure: boxplots of Weight, Value, Damage and Speed, centred (orange) and centred then scaled (blue)]

Mhalf <- diag(1/sqrt(diag(Sigma)))                     # M^(1/2)
dg <- eigen(Mhalf %*% Sigma %*% Mhalf)                 # PCA of the correlation matrix
dg$vectors <- diag(sqrt(diag(Sigma))) %*% dg$vectors   # M-orthonormal principal axes
dg$values
cumsum(dg$values)/sum(dg$values)
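Conjugating Sigma by Mhalf yields the correlation matrix, so this is a PCA of standardised variables whose eigenvalues sum to the number of variables, and the rescaled eigenvectors are orthonormal for the metric M. The sketch below checks these facts, plus agreement with prcomp(scale. = TRUE) up to the sign of each axis and the n versus n - 1 convention. USArrests is an assumed stand-in for the bows data.

```r
X <- as.matrix(USArrests)                     # stand-in data (assumption)
n <- nrow(X)
Xbar <- scale(X, scale = FALSE)
Sigma <- crossprod(Xbar) / n
M <- diag(1 / diag(Sigma))                    # metric: inverse variances
Mhalf <- diag(1 / sqrt(diag(Sigma)))          # M^(1/2)

R <- Mhalf %*% Sigma %*% Mhalf                # correlation matrix
dg <- eigen(R, symmetric = TRUE)
sum(dg$values)                                # equals ncol(X)

# Rescale the eigenvectors so that t(U) %*% M %*% U is the identity
U <- diag(sqrt(diag(Sigma))) %*% dg$vectors
C <- Xbar %*% M %*% U                         # coordinates of the individuals

# prcomp() standardises with divisor n - 1 and may flip the sign of an axis
pc <- prcomp(X, scale. = TRUE)
max(abs(abs(C) - abs(pc$x) * sqrt(n / (n - 1))))
```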

plot(1:4, dg$values, type = "b", col = "orange")
[Figure: dg$values against 1:4]

Grades data
C <- Xbar %*% dg$vectors                        # principal components (coordinates of the students)
cos2 <- rowSums(C[, 1:2]^2)/rowSums(C^2)        # quality of representation on the first plane
rbind(C[, 1]^2/(nrow(X) * dg$values[1]),        # contributions to axis 1
      C[, 2]^2/(nrow(X) * dg$values[2]))        # contributions to axis 2
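Because the columns of dg$vectors are orthonormal, rowSums(C^2) is each individual's squared distance to the centroid, so cos2 lies between 0 and 1; and since the squares in column k of C add up to n times the k-th eigenvalue, the contributions to each axis sum to 1. A minimal check on USArrests (an assumed stand-in for the grades), computing all contributions at once with sweep():

```r
X <- as.matrix(USArrests)                        # stand-in data (assumption)
n <- nrow(X)
Xbar <- scale(X, scale = FALSE)
Sigma <- crossprod(Xbar) / n
dg <- eigen(Sigma, symmetric = TRUE)

C <- Xbar %*% dg$vectors                         # principal components
cos2 <- rowSums(C[, 1:2]^2) / rowSums(C^2)       # quality on the first plane
contrib <- sweep(C^2, 2, n * dg$values, "/")     # contribution of each individual to each axis
colSums(contrib)                                 # one total per axis
head(round(cos2, 3))
```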

plot(C[, 1:2], pch = 2, cex = cos2, col = "orange", xlab = "", ylab = "")
[Figure: the students (Benny, Bobby, Brandy, Coby, Daisy, Emily, Judy, Marty, Sandy) on the first principal plane, symbol size proportional to cos2]

Skyrim bows data
C <- Xbar %*% M %*% dg$vectors
cos2 <- rowSums(C[, 1:2]^2)/rowSums(C^2)

plot(C[, 1:2], pch = 2, cex = cos2, col = "orange", xlab = "", ylab = "")
[Figure: the bows and crossbows on the first principal plane, symbol size proportional to cos2]

Grades data
Rho <- diag(1/sqrt(diag(Sigma))) %*% dg$vectors %*% diag(sqrt(dg$values))   # variable/component correlations
Rho[, 1:2]
rowSums(Rho[, 1:2]^2)   # quality of Math, Phys, Fr and Eng on the first plane
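Each entry of Rho is the correlation between an original variable and a principal component, which is why the points of the correlation circle lie inside the unit circle and why the squares along a row add up to 1 once all components are kept. A short check, once more with USArrests as an assumed stand-in for the grades:

```r
X <- as.matrix(USArrests)                        # stand-in data (assumption)
n <- nrow(X)
Xbar <- scale(X, scale = FALSE)
Sigma <- crossprod(Xbar) / n
dg <- eigen(Sigma, symmetric = TRUE)
C <- Xbar %*% dg$vectors

Rho <- diag(1 / sqrt(diag(Sigma))) %*% dg$vectors %*% diag(sqrt(dg$values))
cor(X, C)                                        # same matrix, computed directly
rowSums(Rho^2)                                   # all components kept
```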

DrawCorCircle(Rho)
[Figure: correlation circle of Math, Phys, Fr and Eng]

DrawBiplot(C, Rho)
[Figure: biplot of the students and of the variables Math, Phys, Fr and Eng]

Skyrim bows data
Rho <- diag(1/sqrt(diag(Sigma))) %*% dg$vectors %*% diag(sqrt(dg$values))
rownames(Rho) <- colnames(X)   # Weight, Value, Damage, Speed
Rho[, 1:2]
rowSums(Rho[, 1:2]^2)

DrawCorCircle(Rho)
[Figure: correlation circle of Weight, Value, Damage and Speed]

DrawBiplot(C, Rho)
[Figure: biplot of the bows and crossbows and of the variables Weight, Value, Damage and Speed]

Bourdieu data
T: contingency table whose rows are EAG, SAG, PT, PLCS, CM, EMP, OUV and OTH and whose columns are DR, SCE, LET, SC, MD, PH, PD and IUT, with row and column totals.

n <- sum(T)                  # total count (assumed: n is used but not defined on the slide)
D1 <- diag(1/rowSums(T))
P1 <- D1 %*% T               # row profiles
D2 <- diag(1/colSums(T))
P2 <- D2 %*% t(T)            # column profiles
PCA of row profiles:
M1half <- diag(sqrt(n/colSums(T)))
M1halfInv <- diag(sqrt(colSums(T)/n))
dg <- eigen(M1half %*% t(P1) %*% t(P2) %*% M1halfInv)
dg$values <- dg$values[2:8]                       # Avoid the trivial eigenvalue
dg$vectors <- (M1halfInv %*% dg$vectors)[, 2:8]   # Idem
C1 <- P1 %*% (n * D2) %*% dg$vectors
Transition formula:
C2 <- P2 %*% C1 %*% diag(1/sqrt(dg$values))
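Apart from the trivial eigenvalue 1, the eigenvalues obtained this way should coincide with the squared singular values of the standardised residuals (T_ij - E_ij)/sqrt(r_i c_j), where r and c are the row and column totals and E_ij = r_i c_j / n, and they should add up to the chi-squared statistic divided by n. The sketch below checks this with the SVD route swapped in for comparison, on the built-in 8 × 8 occupationalStatus table used only as an assumed stand-in for the Bourdieu data. The table is named Tab rather than T so as not to mask R's shorthand for TRUE, and symmetric = TRUE is passed to eigen() because the matrix is symmetric in exact arithmetic.

```r
Tab <- unclass(occupationalStatus)       # 8 x 8 counts, stand-in data (assumption)
n <- sum(Tab)
rs <- rowSums(Tab)
cs <- colSums(Tab)
E <- outer(rs, cs) / n                   # expected counts under independence
chi2 <- sum((Tab - E)^2 / E)

# Route used on the slide
P1 <- diag(1 / rs) %*% Tab               # row profiles
P2 <- diag(1 / cs) %*% t(Tab)            # column profiles
M1half <- diag(sqrt(n / cs))
M1halfInv <- diag(sqrt(cs / n))
ev <- eigen(M1half %*% t(P1) %*% t(P2) %*% M1halfInv, symmetric = TRUE)$values

# SVD of the standardised residuals
S <- (Tab - E) / sqrt(outer(rs, cs))
sv <- svd(S)$d

ev[1]                                    # trivial eigenvalue
rbind(eigen = ev[2:8], svd = sv[1:7]^2)
```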

# Representation quality of the x values (rows EAG, SAG, PT, PLCS, CM, EMP, OUV, OTH)
rowSums(C1[, 1:2]^2)/rowSums(C1^2)
# Representation quality of the y values (columns DR, SCE, LET, SC, MD, PH, PD, IUT)
rowSums(C2[, 1:2]^2)/rowSums(C2^2)
cumsum(dg$values)/sum(dg$values)

plot(dg$values, type = "b", col = "orange", xlab = "", ylab = "")
[Figure: the seven non-trivial eigenvalues dg$values]

DrawCA(C1, C2)
[Figure: correspondence analysis map of the row categories (EAG, SAG, PT, PLCS, CM, EMP, OUV, OTH) and the column categories (DR, SCE, LET, SC, MD, PH, PD, IUT)]



More information

Information Technology Services will be updating the mark sense test scoring hardware and software on Monday, May 18, 2015. We will continue to score

Information Technology Services will be updating the mark sense test scoring hardware and software on Monday, May 18, 2015. We will continue to score Information Technology Services will be updating the mark sense test scoring hardware and software on Monday, May 18, 2015. We will continue to score all Spring term exams utilizing the current hardware

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

More information

# For usage of the functions, it is necessary to install the "survival" and the "penalized" package.

# For usage of the functions, it is necessary to install the survival and the penalized package. ###################################################################### ### R-script for the manuscript ### ### ### ### Survival models with preclustered ### ### gene groups as covariates ### ### ### ###

More information

Understanding, Identifying & Analyzing Box & Whisker Plots

Understanding, Identifying & Analyzing Box & Whisker Plots Understanding, Identifying & Analyzing Box & Whisker Plots CCSS: 6.SP.4, 8.SP.1 VA SOLs: A.10 Box and Whisker Plots Lower Extreme Lower Quartile Median Upper Quartile Upper Extreme The inter quartile range

More information

Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific

Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific Contemporary Mathematics Online Math 1030 Sample Exam I Chapters 12-14 No Time Limit No Scratch Paper Calculator Allowed: Scientific Name: The point value of each problem is in the left-hand margin. You

More information

2: Frequency Distributions

2: Frequency Distributions 2: Frequency Distributions Stem-and-Leaf Plots (Stemplots) The stem-and-leaf plot (stemplot) is an excellent way to begin an analysis. Consider this small data set: 218 426 53 116 309 504 281 270 246 523

More information

STAT355 - Probability & Statistics

STAT355 - Probability & Statistics STAT355 - Probability & Statistics Instructor: Kofi Placid Adragni Fall 2011 Chap 1 - Overview and Descriptive Statistics 1.1 Populations, Samples, and Processes 1.2 Pictorial and Tabular Methods in Descriptive

More information

Stats on the TI 83 and TI 84 Calculator

Stats on the TI 83 and TI 84 Calculator Stats on the TI 83 and TI 84 Calculator Entering the sample values STAT button Left bracket { Right bracket } Store (STO) List L1 Comma Enter Example: Sample data are {5, 10, 15, 20} 1. Press 2 ND and

More information

4. Continuous Random Variables, the Pareto and Normal Distributions

4. Continuous Random Variables, the Pareto and Normal Distributions 4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random

More information

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures

Visualizing Data. Contents. 1 Visualizing Data. Anthony Tanbakuchi Department of Mathematics Pima Community College. Introductory Statistics Lectures Introductory Statistics Lectures Visualizing Data Descriptive Statistics I Department of Mathematics Pima Community College Redistribution of this material is prohibited without written permission of the

More information

Computation of the Aggregate Claim Amount Distribution Using R and actuar. Vincent Goulet, Ph.D.

Computation of the Aggregate Claim Amount Distribution Using R and actuar. Vincent Goulet, Ph.D. Computation of the Aggregate Claim Amount Distribution Using R and actuar Vincent Goulet, Ph.D. Actuarial Risk Modeling Process 1 Model costs at the individual level Modeling of loss distributions 2 Aggregate

More information

Consolidation of Grade 3 EQAO Questions Data Management & Probability

Consolidation of Grade 3 EQAO Questions Data Management & Probability Consolidation of Grade 3 EQAO Questions Data Management & Probability Compiled by Devika William-Yu (SE2 Math Coach) GRADE THREE EQAO QUESTIONS: Data Management and Probability Overall Expectations DV1

More information

270107 - MD - Data Mining

270107 - MD - Data Mining Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of

More information

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

CSCI-599 DATA MINING AND STATISTICAL INFERENCE CSCI-599 DATA MINING AND STATISTICAL INFERENCE Course Information Course ID and title: CSCI-599 Data Mining and Statistical Inference Semester and day/time/location: Spring 2013/ Mon/Wed 3:30-4:50pm Instructor:

More information

ESSENTIAL REVISION QUESTIONS. MathsWatch Higher. Book

ESSENTIAL REVISION QUESTIONS. MathsWatch Higher. Book GCSE Mathematics ESSENTIAL REVISION QUESTIONS MathsWatch Higher Book with answers to all questions www.mathswatch.com enquiries to info@mathswatch.com 1T) a) 4t + 7t b) 4t 7t c) 6y + 2w 5y d) 6y 3t e)

More information

Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data

Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data Data Preparation Part 1: Exploratory Data Analysis & Data Cleaning, Missing Data CAS Predictive Modeling Seminar Louise Francis Francis Analytics and Actuarial Data Mining, Inc. www.data-mines.com Louise.francis@data-mines.cm

More information

Pr(X = x) = f(x) = λe λx

Pr(X = x) = f(x) = λe λx Old Business - variance/std. dev. of binomial distribution - mid-term (day, policies) - class strategies (problems, etc.) - exponential distributions New Business - Central Limit Theorem, standard error

More information

Instruction Manual for SPC for MS Excel V3.0

Instruction Manual for SPC for MS Excel V3.0 Frequency Business Process Improvement 281-304-9504 20314 Lakeland Falls www.spcforexcel.com Cypress, TX 77433 Instruction Manual for SPC for MS Excel V3.0 35 30 25 LSL=60 Nominal=70 Capability Analysis

More information