Chapter 1 Exploratory data analysis
Transcription
1 Chapter 1 Exploratory data analysis Xavier Gendre M2 SE X. Gendre (M2 SE) Data Mining 1 / 33
2
cars$dist
## [1]
## [18]
## [35]
summary(cars$dist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
3
p <- (0:100)/100
qp <- quantile(cars$dist, probs = p)
plot(p, qp, type = "l", col = "red")
[Figure: quantile curve of qp against p, with the first quartile, median and third quartile marked]
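As a quick check of the quantile curve on this slide, the sketch below reads the three quartiles off `quantile()` and compares them with `median()` and `summary()`. It uses only base R and the built-in `cars` data; the names `q`, `s` and `med_ok` are illustrative and do not come from the slides.

```r
# Quartiles of the stopping distances in the built-in 'cars' data set
q <- quantile(cars$dist, probs = c(0.25, 0.5, 0.75))

# The 50% quantile is the sample median
med_ok <- isTRUE(all.equal(unname(q[2]), median(cars$dist)))

# summary() reports the same three quartiles alongside min, mean and max
s <- summary(cars$dist)
print(q)
print(s)
```

These are the three levels marked on the quantile curve at p = 0.25, 0.5 and 0.75.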
4
boxplot(cars$dist, horizontal = TRUE, col = "orange")
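The numbers behind a box plot can be inspected without drawing it. The following sketch, assuming the default whisker coefficient of 1.5, recomputes the box edges and the individually drawn points by hand and compares them with `boxplot.stats()`. The names `hinges`, `fence` and `flagged` are illustrative.

```r
x <- cars$dist
bs <- boxplot.stats(x)        # the statistics that boxplot() draws

# Box edges and centre line: lower hinge, median, upper hinge
hinges <- fivenum(x)[2:4]

# Points further than 1.5 box lengths from the box are drawn individually
fence <- 1.5 * (hinges[3] - hinges[1])
flagged <- x[x < hinges[1] - fence | x > hinges[3] + fence]

print(bs$stats)   # whisker end, hinge, median, hinge, whisker end
print(flagged)
```

Note that the hinges returned by `fivenum()` can differ slightly from the quartiles returned by `quantile()`, because the two functions use different interpolation rules.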
5
hist(cars$dist, breaks = 20 * (-1:7), freq = FALSE, col = "orange", ylab = "")
[Figure: "Histogram of cars$dist"; x-axis: cars$dist]
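With `freq = FALSE` the bar heights are densities rather than counts, so the bar areas sum to one. A minimal sketch of that property, reusing the breaks from the slide and suppressing the plot (the names `h`, `areas` and `total_area` are illustrative):

```r
# Same classes as on the slide: width 20, from -20 to 140
h <- hist(cars$dist, breaks = 20 * (-1:7), plot = FALSE)

# Area of each bar = density * class width
areas <- h$density * diff(h$breaks)
total_area <- sum(areas)

print(h$counts)
print(total_area)
```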
6
plot(cars$speed, cars$dist, pch = 4)
[Figure: scatter plot; x-axis: cars$speed, y-axis: cars$dist]
7
cor(cars$speed, cars$dist)
## [1]
a <- cov(cars$speed, cars$dist)/var(cars$speed)
b <- mean(cars$dist) - a * mean(cars$speed)
8
abline(b, a, col = "red")
[Figure: scatter plot of cars$dist against cars$speed with the fitted line in red]
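The slope `a` and intercept `b` computed on the previous slide are the ordinary least-squares coefficients. The sketch below checks this against `lm()`, which the slides do not use; `fit`, `same_line` and `a_from_cor` are illustrative names.

```r
# Slope and intercept as on the slides
a <- cov(cars$speed, cars$dist) / var(cars$speed)
b <- mean(cars$dist) - a * mean(cars$speed)

# Least-squares fit of dist on speed; coef() returns (intercept, slope)
fit <- lm(dist ~ speed, data = cars)
same_line <- isTRUE(all.equal(unname(coef(fit)), c(b, a)))

# The slope can also be written as correlation * sd(y) / sd(x)
a_from_cor <- cor(cars$speed, cars$dist) * sd(cars$dist) / sd(cars$speed)

print(c(intercept = b, slope = a))
```

Calling `abline(fit, col = "red")` on an open scatter plot would therefore draw the same line as `abline(b, a, col = "red")`.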
9 Grades data
X
##        Math Phys Fr Eng
## Benny
## Bobby
## Brandy
## Coby
## Daisy
## Emily
## Judy
## Marty
## Sandy
Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar)/9
##      Math Phys Fr Eng
## Math
## Phys
## Fr
## Eng
10
boxplot(Xbar, col = "orange")
[Figure: box plots of the centered Math, Phys, Fr and Eng columns]
11
dg <- eigen(Sigma)
dg$values
## [1]
cumsum(dg$values)/sum(dg$values)
## [1]
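The eigenvalues of `Sigma` are the variances carried by the principal components. Since the grade values are not preserved in this transcript, the sketch below runs the same steps on simulated data (an assumption, not the course data) and compares the result with `prcomp()`, which divides by n - 1 instead of n. Passing `symmetric = TRUE` to `eigen()` is a precaution added here, not something the slides do.

```r
set.seed(1)
# Simulated stand-in for the grades table: 9 individuals, 4 variables
X <- matrix(rnorm(9 * 4), nrow = 9, ncol = 4)
n <- nrow(X)

Xbar <- scale(X, scale = FALSE)       # center each column
Sigma <- (t(Xbar) %*% Xbar) / n       # covariance matrix with divisor n
dg <- eigen(Sigma, symmetric = TRUE)

# Share of the total variance explained by the first k components
explained <- cumsum(dg$values) / sum(dg$values)

# prcomp() uses the divisor n - 1, hence the rescaling
pc <- prcomp(X)
print(rbind(eigen = dg$values, prcomp = pc$sdev^2 * (n - 1) / n))
print(explained)
```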
12
plot(1:4, dg$values, type = "b", col = "orange")
[Figure: scree plot; x-axis: 1:4, y-axis: dg$values]
13 Skyrim bows data
X
##                           Weight Value Damage Speed
## Long Bow
## Hunting Bow
## Orcish Bow
## Nord Hero Bow
## Dwarven Bow
## Elven Bow
## Glass Bow
## Ebony Bow
## Daedric Bow
## Dragonbone Bow
## Crossbow
## Enhanced Crossbow
## Dwarven Crossbow
## Enhanced Dwarven Crossbow
14
Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar)/nrow(X)
##        Weight Value Damage Speed
## Weight
## Value
## Damage
## Speed
M <- diag(1/diag(Sigma))
##        Weight Value Damage Speed
## Weight
## Value
## Damage
## Speed
15
boxplot(Xbar, col = "orange")
boxplot(Xbar %*% sqrt(M), col = "blue")
[Figure: two sets of box plots of Weight, Value, Damage and Speed, before (orange) and after (blue) rescaling]
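Multiplying the centered data by `sqrt(M)` divides each column by its standard deviation, which is why the blue box plots share a common scale. A minimal sketch on simulated columns with very different spreads (an assumption standing in for the bows table, whose values are not preserved here):

```r
set.seed(2)
# Simulated stand-in: 14 rows, three columns on very different scales
X <- cbind(rnorm(14, sd = 1), rnorm(14, sd = 50), rnorm(14, sd = 0.1))

Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar) / nrow(X)
M <- diag(1 / diag(Sigma))            # inverse variances on the diagonal

# sqrt() acts elementwise; M is diagonal, so this is its matrix square root
Z <- Xbar %*% sqrt(M)

# Columns of Z are centered, so the mean square is the variance (divisor n)
col_var <- colMeans(Z^2)
print(col_var)
```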
16
Mhalf <- diag(1/sqrt(diag(Sigma)))
dg <- eigen(Mhalf %*% Sigma %*% Mhalf)
dg$vectors <- diag(sqrt(diag(Sigma))) %*% dg$vectors
dg$values
## [1]
cumsum(dg$values)/sum(dg$values)
## [1]
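The matrix `Mhalf %*% Sigma %*% Mhalf` diagonalized on this slide is the correlation matrix of the variables, so its eigenvalues sum to the number of variables. The sketch below checks both facts on simulated data (an assumption; the bows values are not available in this transcript).

```r
set.seed(3)
X <- matrix(rnorm(14 * 4), nrow = 14, ncol = 4)   # simulated stand-in

Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar) / nrow(X)
Mhalf <- diag(1 / sqrt(diag(Sigma)))

# Rescaling the covariance matrix on both sides gives the correlation matrix
R <- Mhalf %*% Sigma %*% Mhalf
dg <- eigen(R, symmetric = TRUE)

print(round(R, 3))
print(sum(dg$values))   # trace of R, i.e. the number of variables
```

The choice of divisor (n or n - 1) cancels out in `R`, which is why the comparison with `cor(X)` holds exactly.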
17
plot(1:4, dg$values, type = "b", col = "orange")
[Figure: scree plot; x-axis: 1:4, y-axis: dg$values]
18 Grades data
C <- Xbar %*% dg$vectors
cos2 <- rowSums(C[, 1:2]^2)/rowSums(C^2)
## Benny Bobby Brandy Coby Daisy Emily Judy Marty Sandy
rbind(C[, 1]^2/(nrow(X) * dg$values[1]), C[, 2]^2/(nrow(X) * dg$values[2]))
##      Benny Bobby Brandy Coby Daisy Emily Judy Marty
## [1,]
## [2,]
##      Sandy
## [1,]
## [2,]
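The two quantities on this slide have simple sanity checks: each `cos2` value lies between 0 and 1 (it is the share of an individual's squared distance to the origin captured by the first two axes), and the contributions of all individuals to a given axis sum to 1. A sketch on simulated data, since the grades themselves are not preserved here; `contrib` is an illustrative name.

```r
set.seed(4)
X <- matrix(rnorm(9 * 4), nrow = 9, ncol = 4)     # simulated stand-in
n <- nrow(X)

Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar) / n
dg <- eigen(Sigma, symmetric = TRUE)

# Coordinates of the individuals on the principal axes
C <- Xbar %*% dg$vectors

# Quality of representation on the first principal plane
cos2 <- rowSums(C[, 1:2]^2) / rowSums(C^2)

# Contribution of each individual to axes 1 and 2 (one row per axis)
contrib <- rbind(C[, 1]^2 / (n * dg$values[1]),
                 C[, 2]^2 / (n * dg$values[2]))

print(round(cos2, 3))
print(rowSums(contrib))
```

The contributions sum to 1 because the squared coordinates on axis k add up to n times the k-th eigenvalue when `Sigma` uses the divisor n.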
19
plot(C[, 1:2], pch = 2, cex = cos2, col = "orange", xlab = "", ylab = "")
[Figure: individuals on the first two principal axes, symbol size given by cos2; labels: Emily, Marty, Daisy, Benny, Bobby, Coby, Sandy, Brandy, Judy]
20 Skyrim bows data
C <- Xbar %*% M %*% dg$vectors
cos2 <- rowSums(C[, 1:2]^2)/rowSums(C^2)
## Long Bow          Hunting Bow
## Orcish Bow        Nord Hero Bow
## Dwarven Bow       Elven Bow
## Glass Bow         Ebony Bow
## Daedric Bow       Dragonbone Bow
## Crossbow          Enhanced Crossbow
## Dwarven Crossbow  Enhanced Dwarven Crossbow
21
plot(C[, 1:2], pch = 2, cex = cos2, col = "orange", xlab = "", ylab = "")
[Figure: bows on the first two principal axes, symbol size given by cos2; labels: Daedric Bow, Dragonbone Bow, Ebony Bow, Glass Bow, Elven Bow, Dwarven Bow, Orcish Bow, Nord Hero Bow, Hunting Bow, Long Bow, Enhanced Dwarven Crossbow, Crossbow, Enhanced Crossbow]
22 Grades data
Rho <- diag(1/sqrt(diag(Sigma))) %*% dg$vectors %*% diag(sqrt(dg$values))
Rho[, 1:2]
##      [,1] [,2]
## Math
## Phys
## Fr
## Eng
rowSums(Rho[, 1:2]^2)
## Math Phys Fr Eng
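Each entry of `Rho` is the correlation between an original variable and a principal component, which is what the correlation circle displays. The sketch below, on simulated data rather than the grades table, compares the formula from the slide with a direct call to `cor()` and checks that each row of squared correlations sums to 1 over all components.

```r
set.seed(5)
X <- matrix(rnorm(9 * 4), nrow = 9, ncol = 4)     # simulated stand-in
n <- nrow(X)

Xbar <- scale(X, scale = FALSE)
Sigma <- (t(Xbar) %*% Xbar) / n
dg <- eigen(Sigma, symmetric = TRUE)
C <- Xbar %*% dg$vectors

# Formula from the slide: eigenvectors rescaled by sd of component / sd of variable
Rho <- diag(1 / sqrt(diag(Sigma))) %*% dg$vectors %*% diag(sqrt(dg$values))

# Direct computation: correlations between variables and component scores
direct <- cor(X, C)

print(round(Rho[, 1:2], 3))
print(rowSums(Rho[, 1:2]^2))   # closeness of each variable to the unit circle
```

A variable whose value of `rowSums(Rho[, 1:2]^2)` is near 1 lies close to the unit circle and is well represented on the first principal plane.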
23
DrawCorCircle(Rho)
[Figure: correlation circle with the variables Phys, Math, Eng and Fr]
24
DrawBiplot(C, Rho)
[Figure: biplot of the variables (Math, Phys, Eng, Fr) and the individuals (Marty, Emily, Coby, Daisy, Bobby, Benny, Sandy, Brandy, Judy)]
25 Skyrim bows data
Rho <- diag(1/sqrt(diag(Sigma))) %*% dg$vectors %*% diag(sqrt(dg$values))
rownames(Rho) <- colnames(X)
Rho[, 1:2]
##        [,1] [,2]
## Weight
## Value
## Damage
## Speed
rowSums(Rho[, 1:2]^2)
## Weight Value Damage Speed
26
DrawCorCircle(Rho)
[Figure: correlation circle with the variables Value, Weight, Damage and Speed]
27
DrawBiplot(C, Rho)
[Figure: biplot of the variables (Value, Weight, Damage, Speed) and the bows (Ebony Bow, Glass Bow, Elven Bow, Dwarven Bow, Orcish Bow, Dragonbone Bow, Nord Hero Bow, Hunting Bow, Long Bow, Daedric Bow, Crossbow, Enhanced Crossbow, Enhanced Dwarven Crossbow)]
28 Bourdieu data
##       DR SCE LET SC MD PH PD IUT Total
## EAG
## SAG
## PT
## PLCS
## CM
## EMP
## OUV
## OTH
## Total
29
D1 <- diag(1/rowSums(T))
P1 <- D1 %*% T
D2 <- diag(1/colSums(T))
P2 <- D2 %*% t(T)
PCA of row profiles:
M1half <- diag(sqrt(n/colSums(T)))
M1halfInv <- diag(sqrt(colSums(T)/n))
dg <- eigen(M1half %*% t(P1) %*% t(P2) %*% M1halfInv)
dg$values <- dg$values[2:8] # Drop the trivial eigenvalue
dg$vectors <- (M1halfInv %*% dg$vectors)[, 2:8] # Likewise for its eigenvector
C1 <- P1 %*% (n * D2) %*% dg$vectors
Transition formula:
C2 <- P2 %*% C1 %*% diag(1/sqrt(dg$values))
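Two properties explain the steps above: the largest eigenvalue is always 1 (the "trivial" one that gets dropped), and the remaining eigenvalues sum to the chi-square statistic of the table divided by its grand total. The sketch below illustrates both on a small hypothetical contingency table, not the Bourdieu data. It assumes that `n` on the slide is the grand total of the table, names the table `Tab` so that R's `T` shorthand for `TRUE` is not masked, and diagonalizes an algebraically equivalent, explicitly symmetric form of the slide's matrix.

```r
# Hypothetical 3 x 4 contingency table (NOT the Bourdieu data)
Tab <- matrix(c(20, 10,  5, 15,
                 8, 12, 25,  5,
                 6,  9, 14, 21), nrow = 3, byrow = TRUE)
n <- sum(Tab)                                  # assumed meaning of n on the slide

P1 <- diag(1 / rowSums(Tab)) %*% Tab           # row profiles
P2 <- diag(1 / colSums(Tab)) %*% t(Tab)        # column profiles

# crossprod(S) equals M1half %*% t(P1) %*% t(P2) %*% M1halfInv from the slide,
# written so that the matrix is exactly symmetric
S <- diag(1 / sqrt(rowSums(Tab))) %*% Tab %*% diag(1 / sqrt(colSums(Tab)))
ev <- eigen(crossprod(S), symmetric = TRUE)$values

inertia <- sum(ev[-1])                         # eigenvalues after the trivial one
chi2 <- unname(chisq.test(Tab)$statistic)

print(ev)
print(c(inertia = inertia, chi2_over_n = chi2 / n))
```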
30
# Representation quality of the x values
rowSums(C1[, 1:2]^2)/rowSums(C1^2)
## EAG SAG PT PLCS CM EMP OUV OTH
# Representation quality of the y values
rowSums(C2[, 1:2]^2)/rowSums(C2^2)
## DR SCE LET SC MD PH PD IUT
cumsum(dg$values)/sum(dg$values)
## [1]
31
plot(dg$values, type = "b", col = "orange", xlab = "", ylab = "")
32
DrawCA(C1, C2)
[Figure: correspondence analysis map with the row categories (SAG, OUV, OTH, EMP, CM, PT, PLCS, EAG) and the column categories (LET, PD, SC, DR, SCE, MD, IUT, PH)]
33 Bibliography
- The Elements of Statistical Learning: Data Mining, Inference and Prediction, J. Friedman, T. Hastie and R. Tibshirani (2009)
- Principal Component Analysis, I.T. Jolliffe (2002)
- Wiki Stat