# EXPLORING SPATIAL PATTERNS IN YOUR DATA

## Transcription

1 EXPLORING SPATIAL PATTERNS IN YOUR DATA

2 OBJECTIVES

- Learn how to examine your data using the Geostatistical Analysis tools in ArcMap.
- Learn how to use descriptive statistics in ArcMap and GeoDa to analyze data.
- Be able to identify Geostatistical Analysis tools that can be used for further analysis.

3 WHY EXPLORE YOUR DATA? It allows you to better select an appropriate tool to analyze your data. If you skip exploring your data, you may miss key information about it that may lead to incorrect conclusions and decisions.

4 GEODA VS. ARCMAP

- GeoDa: free, open-source, simple software designed specifically for statistical analysis.
- ArcMap: proprietary GIS software that can perform statistical analysis along with hundreds of other analyses.

5 GEODA VS. ARCMAP With ArcMap you can view several data layers at once. In GeoDa, you view only one data layer. Some tools are found in both programs, while some are found in only one.

6 EXPLORE THE LOCATION OF YOUR DATA

7 EXPLORE THE LOCATION OF YOUR DATA

Explore:

- the size of the study area
- the mean center
- the median center
- the direction in which the data are oriented

You will see where data are clustered relative to the rest of the data.

8 MEAN CENTER The geographic center for a set of features. Constructed from the average x and y values for the input feature centroids (middle points, if input features are polygons).
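As a rough illustration of the definition above, the mean center can be sketched in a few lines of Python. This is a minimal sketch, not ArcMap's implementation; the function name `mean_center` and the sample coordinates are hypothetical.

```python
def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) centroid tuples."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


# Hypothetical centroid coordinates, for illustration only.
sample_points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
center = mean_center(sample_points)
```

For polygon inputs, the points passed in would be the polygon centroids.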

9 MEDIAN CENTER Median Center is robust to outliers. It uses an iterative algorithm to find the point that minimizes the total travel distance from it to all other features in the dataset. At each step $t$ of the algorithm, a candidate Median Center $(X_t, Y_t)$ is found and refined until it represents the location that minimizes the Euclidean distance $d$ to all features $i$ in the dataset.
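One common way to carry out this kind of iterative refinement is Weiszfeld's algorithm. The sketch below assumes that approach; the slides do not say which algorithm ArcMap uses, and the names `median_center` and `total_distance` are hypothetical.

```python
import math


def total_distance(points, location):
    """Sum of Euclidean distances from a location to every feature."""
    return sum(math.dist(p, location) for p in points)


def median_center(points, tolerance=1e-9, max_iterations=1000):
    """Approximate the point minimizing total Euclidean distance (Weiszfeld iteration)."""
    # Use the mean center as the first candidate (X_0, Y_0).
    x = sum(px for px, _ in points) / len(points)
    y = sum(py for _, py in points) / len(points)
    for _ in range(max_iterations):
        weight_sum = x_sum = y_sum = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Candidate coincides with a feature; skip it to avoid dividing by zero.
                continue
            weight = 1.0 / d
            weight_sum += weight
            x_sum += weight * px
            y_sum += weight * py
        if weight_sum == 0.0:
            break
        new_x, new_y = x_sum / weight_sum, y_sum / weight_sum
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tolerance:
            break
    return x, y
```

With one distant outlier in the input, the result stays near the main cluster, while the mean center is pulled toward the outlier.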

10 DIRECTION DISTRIBUTION (STANDARD DEVIATIONAL ELLIPSE) Standard deviational ellipses summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends. The ellipse allows you to see if the distribution of features is elongated and hence has a particular orientation. When the underlying spatial pattern of features is concentrated in the center with fewer features toward the periphery (a spatial normal distribution):

- a one standard deviation ellipse polygon will cover approximately 68 percent of the features;
- two standard deviations will contain approximately 95 percent of the features;
- three standard deviations will cover approximately 99 percent of the features.
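The ellipse's center, axis lengths, and orientation can be approximated from the covariance of the coordinates. The sketch below is a covariance-based variant chosen for illustration; it is an assumption, not ArcMap's exact formula, and the function name `deviational_ellipse` is hypothetical. The angle is measured counterclockwise from the x-axis.

```python
import math


def deviational_ellipse(points):
    """Return (center, major_sd, minor_sd, angle_deg) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Population variances and covariance of the coordinates.
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix give the variance along each axis.
    half_trace = (var_x + var_y) / 2.0
    spread = math.hypot((var_x - var_y) / 2.0, cov_xy)
    major_sd = math.sqrt(half_trace + spread)
    minor_sd = math.sqrt(max(half_trace - spread, 0.0))
    # Orientation of the major axis, counterclockwise from the x-axis.
    angle_deg = math.degrees(0.5 * math.atan2(2.0 * cov_xy, var_x - var_y))
    return (mean_x, mean_y), major_sd, minor_sd, angle_deg
```

A large ratio of `major_sd` to `minor_sd` indicates an elongated distribution with a clear orientation.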

12 EXPLORE THE VALUES OF YOUR DATA

13 NORMAL DISTRIBUTION

Some analysis tools assume a normal distribution, in which:

- the mean and median are similar;
- the data are symmetrical.

14 DATA FREQUENCY USING HISTOGRAMS

15 DATA DISTRIBUTION USING A QQ PLOT A normal QQ plot shows the relationship of your data to a normal distribution line.

[Figure: QQ plot panels labeled "A normally distributed dataset", "Many characteristics of a normal dataset", and "Not normal"]
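The points behind a normal QQ plot can be computed by pairing each sorted data value with the matching quantile of a standard normal distribution. This is a minimal sketch; the plotting position `(i - 0.5) / n` is an assumed convention (ArcMap and GeoDa may use a different one), and the function name `normal_qq_points` is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Return (theoretical quantile, standardized data value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()
    pairs = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # Assumed plotting position.
        theoretical = standard_normal.inv_cdf(p)
        standardized = (value - mu) / sigma
        pairs.append((theoretical, standardized))
    return pairs
```

If the data are close to normal, the pairs fall near the line y = x; tail points that drift away from the line suggest skew or outliers.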

16 BOX PLOT Displays the median and the interquartile range (IQR), which spans the 25th to 75th percentiles. Hinge = a multiple of the interquartile range.

17 MAPS

For examining data values and frequencies:

- Quantile map
- Natural breaks map
- Equal interval map

For finding outliers:

- Percentile map
- Box map
- Standard deviation map

18 QUANTILE MAP Displays the distribution of values in categories with an equal number of observations in each category.

19 EQUAL INTERVAL MAP Sets the value ranges in each category equal in size. The entire range of data values is divided equally into however many categories have been chosen.
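The difference between the quantile and equal interval schemes can be sketched as follows. The function names and sample values are hypothetical, and real GIS software handles ties and rounding in its own way.

```python
def equal_interval_breaks(values, k):
    """Upper class limits when the data range is split into k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_classes(values, k):
    """Split the ranked observations into k groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


# Hypothetical data with one extreme value.
sample_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
interval_breaks = equal_interval_breaks(sample_values, 3)
quantile_groups = quantile_classes(sample_values, 5)
```

With the extreme value 100 in the data, the equal interval scheme puts nine of the ten observations in its first class, while the quantile scheme still places two observations in every class.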

20 NATURAL BREAKS MAP Seeks to reduce the variance within classes and maximize the variance between classes

21 OTHER EXPLORATORY METHODS

- Scatter plot (2 variables)
- Parallel coordinate plot: a pattern of lines is drawn that connects the coordinates of each observation across the variables on parallel x-axes.

22 DETECT OUTLIERS

23 OUTLIERS Outliers can reveal mistakes, unusual occurrences, and shift points in data patterns (a valley in a mountain range). You should use more than one method to find outliers because some techniques will only highlight data values near the two ends of your range.

24 PERCENTILE MAP Groups ranked data into 6 categories. The lowest and highest 1% are potential outliers.

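25 BOX MAP

- Groups data into 4 categories, plus 2 outlier categories, one at each end.
- Values are flagged as outliers if they lie more than 1.5 (or, more strictly, 3) times the interquartile range (IQR) below the first quartile or above the third quartile.
- Detects outliers with more certainty than a percentile map.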
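The outlier rule can be sketched with Python's standard library. The quartile method (`"inclusive"`) is an assumption, since software packages compute quartiles in slightly different ways, and the function names are hypothetical.

```python
from statistics import quantiles


def box_fences(values, hinge=1.5):
    """Return the (lower, upper) fences: quartiles extended by hinge * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - hinge * iqr, q3 + hinge * iqr


def find_outliers(values, hinge=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = box_fences(values, hinge)
    return [v for v in values if v < lower or v > upper]
```

Passing `hinge=3` applies the stricter rule, which flags only the most extreme values.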

26 STANDARD DEVIATION MAP Displays data 3 standard deviations above and below the mean. As a parametric map, it is sensitive to outliers.

27 SEMIVARIOGRAM CLOUD When points closer together have greater differences in their values, this may indicate an outlier in the data. The selected points may be outliers.

28 VORONOI MAP The gray polygons may be outliers. Cluster Voronoi maps show spatial outliers in your data; simple Voronoi maps can pinpoint data values that are many class breaks removed from surrounding polygons.

29 HISTOGRAM Values in the last bars to the left or right, if far removed from the adjacent values, may indicate outliers.

30 NORMAL QQ PLOT Values at the tails of a normal QQ plot can also be outliers. This can happen when the tail values do not fall along the reference line.

31 BOXPLOT Points outside the hinges (represented by the black, horizontal lines) may be outliers.

32 EXPLORE SPATIAL RELATIONSHIPS IN YOUR DATA

33 SPATIAL AUTOCORRELATION Everything is related, but objects closer together are more related than objects farther apart. Spatial autocorrelation can be explored using a semivariogram graph or cloud, or using the Moran's I and Getis-Ord G statistics.
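To make the Moran's I statistic concrete, here is a minimal sketch of the global version. It assumes a user-supplied spatial weights matrix (here, simple 0/1 adjacency) and omits the significance test that ArcMap and GeoDa report; the function name `morans_i` is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values and an n-by-n spatial weights matrix."""
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    weight_total = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of locations, weighted by proximity.
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_total) * (cross_products / squared_deviations)


# Four locations in a row; each is a neighbor of the adjacent ones (assumed layout).
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
```

Values that rise steadily along the row, such as `[1, 2, 3, 4]`, give a positive result (similar values are neighbors), while alternating values such as `[1, 4, 1, 4]` give a negative one.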

34 Height (sill) = variation between data values. Range = distance between points at which the semivariogram flattens out. As the distance between points increases, the height should increase until the range is reached, since points farther away from each other are less related and so show more variation. If a semivariogram is a horizontal line, there is no spatial autocorrelation.
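An empirical semivariogram can be sketched by grouping point pairs into distance bins and averaging half the squared difference in their values. This is an illustration only: the binning rule and the function name `empirical_semivariogram` are assumptions, and no model is fitted to estimate the sill or range.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width):
    """Map each lag-bin midpoint to the semivariance of the pairs in that bin."""
    squared_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            distance = math.dist(points[i], points[j])
            lag_bin = int(distance // lag_width)
            squared_diff_sums[lag_bin] += (values[i] - values[j]) ** 2
            pair_counts[lag_bin] += 1
    return {
        (lag_bin + 0.5) * lag_width: squared_diff_sums[lag_bin] / (2 * pair_counts[lag_bin])
        for lag_bin in sorted(pair_counts)
    }
```

Semivariance that rises with distance and then levels off indicates spatial autocorrelation; roughly constant semivariance across all bins indicates none.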

35 VARIATION IN YOUR DATA Many spatial statistics analysis techniques assume your data are stationary, meaning the relationship between two points and their values depends on the distance between them, not their exact location. Explore variation using a Voronoi map. A Voronoi map is created by defining Thiessen polygons around each point in your dataset. Any location inside a polygon is closer to that polygon's data point than to any other data point. This allows you to explore the variation of each sample point based on its relationship to surrounding sample points.

36 A SIMPLE VORONOI MAP A simple Voronoi map shows the data value at each location. The map is symbolized using a geometrical interval classification. This will show the variation in data values across your entire dataset. Legend: green = little local variation; orange and red = greater local variation.

37 TYPES OF VORONOI MAPS

- Simple: The value assigned to a polygon is the value recorded at the sample point within that polygon.
- Mean: The value assigned to a polygon is the mean value that is calculated from the polygon and its neighbors.
- Mode: All polygons are categorized using five class intervals. The value assigned to a polygon is the mode (most frequently occurring class) of the polygon and its neighbors.
- Cluster: All polygons are categorized using five class intervals. If the class interval of a polygon is different from each of its neighbors, the polygon is colored gray and put into a sixth class to distinguish it from its neighbors.
- Entropy: All polygons are categorized using five classes based on a natural grouping of data values (smart quantiles). The value assigned to a polygon is the entropy that is calculated from the polygon and its neighbors:

$$\text{Entropy} = -\sum_i p_i \log p_i,$$

where $p_i$ is the proportion of polygons that are assigned to class $i$.
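The entropy value for one polygon can be sketched from the class labels of that polygon and its neighbors. The logarithm base is not stated in the slides; base 2 is assumed here, and the function name `class_entropy` is hypothetical.

```python
import math
from collections import Counter


def class_entropy(class_labels):
    """Entropy of the class labels of a polygon and its neighbors (log base 2 assumed)."""
    counts = Counter(class_labels)
    total = len(class_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A polygon whose neighborhood falls entirely in one class has an entropy of 0 (no local variation); the value grows as the neighborhood spreads across more classes.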

38 EXPLORE TRENDS IN YOUR DATA

39 TREND ANALYSIS You can use the trend analysis tool in ArcMap to visually compare the trend lines with any patterns in your data. When exploring trends, your data locations are mapped along the x- and y-axes. The values of each data location are mapped as height (z-axis). Trends are analyzed based on direction and on the order of the line that fits the trend. The trend line is a mathematical function, or polynomial, that describes the variation in the data.

40 You can determine whether the order of the polynomial fits your data based on the shape created by the line. A second-order polynomial will appear as an upward or a downward curve (known as a parabola). These polynomials show a clear curve, indicating a second-order trend in the data.
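Fitting a polynomial trend line can be sketched as an ordinary least-squares fit of value against position along a single axis. This is a simplified, assumed setup (ArcMap's tool projects the data onto planes and fits the lines for you), and the function name `fit_polynomial` is hypothetical.

```python
def fit_polynomial(xs, zs, order=2):
    """Least-squares fit z = c0 + c1*x + ... + c_order*x^order; returns [c0, c1, ...]."""
    size = order + 1
    # Normal equations A c = b, with A[r][c] = sum(x^(r+c)) and b[r] = sum(z * x^r).
    a = [[float(sum(x ** (r + c) for x in xs)) for c in range(size)] for r in range(size)]
    b = [float(sum(z * x ** r for x, z in zip(xs, zs))) for r in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs
```

A second-order coefficient (`coeffs[2]`) that is clearly different from zero corresponds to the upward or downward curve described above; a value near zero suggests a first-order (straight line) trend is enough.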

41 SELECTING AN ANALYSIS TECHNIQUE

42 Each of the following techniques is a type of interpolation. Interpolation creates surfaces based on spatially continuous data. Each surface uses the values and locations of your points to create (or interpolate) the values for the remaining points in the surface.

43 GEOSTATISTICAL INTERPOLATION Creates surfaces using the relationships between your data locations and their values. Predicts values based on your existing data.

Assumptions:

- Data is not clustered. (Simple kriging technique has a declustering option.)
- Data is normally distributed. (Transformation options are available.)
- Data is stationary (no local variation).
- Data is autocorrelated.
- Data has no local trends. (You can remove trends from data as part of the interpolation process.)

44 GLOBAL DETERMINISTIC INTERPOLATION Creates surfaces using the existing values at each location. Uses your entire dataset to create your surface.

Assumptions:

- Outliers have been removed from the data.
- Global trends exist in the data.

45 LOCAL DETERMINISTIC INTERPOLATION Uses several subsets, or neighborhoods, within an entire dataset to create the different components of the surface.

Assumption:

- Data is normally distributed.

46 INVERSE DISTANCE WEIGHTED INTERPOLATION (IDW) A type of local deterministic interpolation.

Assumptions:

- Data is not clustered.
- Data is autocorrelated.
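The idea behind IDW can be sketched in a few lines: the predicted value at a location is a weighted average of the known values, with weights that shrink as distance grows. This sketch simplifies by using every known point rather than a local neighborhood; the `power` parameter of 2 is a common default assumed here, and the function name `idw_predict` is hypothetical.

```python
import math


def idw_predict(known_points, known_values, target, power=2.0):
    """Predict the value at target as an inverse-distance-weighted average."""
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(known_points, known_values):
        distance = math.dist(point, target)
        if distance == 0.0:
            # The target coincides with a sample point; return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

Because nearby points dominate the average, clustered samples can over-represent one area, which is why the method assumes the data are not clustered.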

47 OTHER SPATIAL STATISTICAL TESTS

Tests for spatial autocorrelation:

- Getis-Ord General G and Global Moran's I (to determine overall clustering and dispersion of values)
- Hot Spot Analysis (Getis-Ord Gi*) and Anselin's Local Moran's I (to determine specific clusters of high and low values)

Regression:

- Used to evaluate relationships between two or more feature attributes. For example: are location, crime rates, racial makeup, and income related to housing values in a census tract?

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

### Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes

A frequency distribution is a table used to describe a data set. A frequency table lists intervals or ranges of data values called data classes together with the number of data values from the set that

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Descriptive Statistics. Understanding Data: Categorical Variables. Descriptive Statistics. Dataset: Shellfish Contamination

Descriptive Statistics Understanding Data: Dataset: Shellfish Contamination Location Year Species Species2 Method Metals Cadmium (mg kg - ) Chromium (mg kg - ) Copper (mg kg - ) Lead (mg kg - ) Mercury

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Introduction to Modeling Spatial Processes Using Geostatistical Analyst

Introduction to Modeling Spatial Processes Using Geostatistical Analyst Konstantin Krivoruchko, Ph.D. Software Development Lead, Geostatistics kkrivoruchko@esri.com Geostatistics is a set of models and

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### STATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI

STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Lecture 1: Review and Exploratory Data Analysis (EDA)

Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course

### GEOGRAPHIC INFORMATION SYSTEMS Lecture 05: Data Classification

GEOGRAPHIC INFORMATION SYSTEMS Lecture 05: Data Classification Types of Quantitative Thematic Maps (from last lecture) Demonstration: 48states > Layer Properties dialog box > Symbology tab - used to control

### ArcGIS 9. Geostatistical Analyst

ArcGIS 9 Using ArcGIS Geostatistical Analyst Copyright 2001, 2003 ESRI All Rights Reserved. Printed in the United States of America. The information contained in this document is the exclusive property

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### Mathematics. Probability and Statistics Curriculum Guide. Revised 2010

Mathematics Probability and Statistics Curriculum Guide Revised 2010 This page is intentionally left blank. Introduction The Mathematics Curriculum Guide serves as a guide for teachers when planning instruction

### Spatial Analysis with GeoDa Spatial Autocorrelation

Spatial Analysis with GeoDa Spatial Autocorrelation 1. Background GeoDa is a trademark of Luc Anselin. GeoDa is a collection of software tools designed for exploratory spatial data analysis (ESDA) based

### 3: Summary Statistics

3: Summary Statistics Notation Let s start by introducing some notation. Consider the following small data set: 4 5 30 50 8 7 4 5 The symbol n represents the sample size (n = 0). The capital letter X denotes

### Module 4: Data Exploration

Module 4: Data Exploration Now that you have your data downloaded from the Streams Project database, the detective work can begin! Before computing any advanced statistics, we will first use descriptive

### Applied Spatial Statistics in R, Section 5

Applied Spatial Statistics in R, Section 5 Geostatistics Yuri M. Zhukov IQSS, Harvard University January 16, 2010 Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 5 January

### Summarizing and Displaying Categorical Data

Summarizing and Displaying Categorical Data Categorical data can be summarized in a frequency distribution which counts the number of cases, or frequency, that fall into each category, or a relative frequency

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### Kriging Interpolation

Kriging Interpolation Kriging is a geostatistical interpolation technique that considers both the distance and the degree of variation between known data points when estimating values in unknown areas.

### Numerical Measures of Central Tendency

Numerical Measures of Central Tendency Often, it is useful to have special numbers which summarize characteristics of a data set These numbers are called descriptive statistics or summary statistics. A

### BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### We will use the following data sets to illustrate measures of center. DATA SET 1 The following are test scores from a class of 20 students:

MODE The mode of the sample is the value of the variable having the greatest frequency. Example: Obtain the mode for Data Set 1 77 For a grouped frequency distribution, the modal class is the class having

### F. Farrokhyar, MPhil, PhD, PDoc

Learning objectives Descriptive Statistics F. Farrokhyar, MPhil, PhD, PDoc To recognize different types of variables To learn how to appropriately explore your data How to display data using graphs How

### 4.1 Exploratory Analysis: Once the data is collected and entered, the first question is: "What do the data look like?"

Data Analysis Plan The appropriate methods of data analysis are determined by your data types and variables of interest, the actual distribution of the variables, and the number of cases. Different analyses

### Central Tendency. n Measures of Central Tendency: n Mean. n Median. n Mode

Central Tendency Central Tendency n A single summary score that best describes the central location of an entire distribution of scores. n Measures of Central Tendency: n Mean n The sum of all scores divided

### Lab 7. Exploratory Data Analysis

Lab 7. Exploratory Data Analysis SOC 261, Spring 2005 Spatial Thinking in Social Science 1. Background GeoDa is a trademark of Luc Anselin. GeoDa is a collection of software tools designed for exploratory

### Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

### Research Variables. Measurement. Scales of Measurement. Chapter 4: Data & the Nature of Measurement

Chapter 4: Data & the Nature of Graziano, Raulin. Research Methods, a Process of Inquiry Presented by Dustin Adams Research Variables Variable Any characteristic that can take more than one form or value.

### GIS Tutorial 1. Lecture 2 Map design

GIS Tutorial 1 Lecture 2 Map design Outline Choropleth maps Colors Vector GIS display GIS queries Map layers and scale thresholds Hyperlinks and map tips 2 Lecture 2 CHOROPLETH MAPS Choropleth maps Color-coded

### Report of for Chapter 2 pretest

Report of for Chapter 2 pretest Exam: Chapter 2 pretest Category: Organizing and Graphing Data 1. "For our study of driving habits, we recorded the speed of every fifth vehicle on Drury Lane. Nearly every

### ArcGIS Geostatistical Analyst: Statistical Tools for Data Exploration, Modeling, and Advanced Surface Generation

ArcGIS Geostatistical Analyst: Statistical Tools for Data Exploration, Modeling, and Advanced Surface Generation An ESRI White Paper August 2001 ESRI 380 New York St., Redlands, CA 92373-8100, USA TEL

### 13.2 Measures of Central Tendency

13.2 Measures of Central Tendency Measures of Central Tendency For a given set of numbers, it may be desirable to have a single number to serve as a kind of representative value around which all the numbers

### STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I)

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 5: Density Curves and Normal Distributions (I) 9/12/06 Lecture 5 1 A problem about Standard Deviation A variable

### Geography 4203 / 5203. GIS Modeling. Class (Block) 9: Variogram & Kriging

Geography 4203 / 5203 GIS Modeling Class (Block) 9: Variogram & Kriging Some Updates Today class + one proposal presentation Feb 22 Proposal Presentations Feb 25 Readings discussion (Interpolation) Last

### Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

Iris Sample Data Set Basic Visualization Techniques: Charts, Graphs and Maps CS598 Information Visualization Spring 2010 Many of the exploratory data techniques are illustrated with the Iris Plant data

### SPSS for Exploratory Data Analysis Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav)

Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav) Organize and Display One Quantitative Variable (Descriptive Statistics, Boxplot & Histogram) 1. Move the mouse pointer

### Lesson 4 Measures of Central Tendency

Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central

### 10-3 Measures of Central Tendency and Variation

10-3 Measures of Central Tendency and Variation So far, we have discussed some graphical methods of data description. Now, we will investigate how statements of central tendency and variation can be used.

### business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### 2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

### Spatial Data Analysis

14 Spatial Data Analysis OVERVIEW This chapter is the first in a set of three dealing with geographic analysis and modeling methods. The chapter begins with a review of the relevant terms, and an outlines

### Chapter 3: Central Tendency

Chapter 3: Central Tendency Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents

### Statistical Concepts and Market Return

Statistical Concepts and Market Return 2014 Level I Quantitative Methods IFT Notes for the CFA exam Contents 1. Introduction... 2 2. Some Fundamental Concepts... 2 3. Summarizing Data Using Frequency Distributions...

### Descriptive Statistics

Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

### Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

### A Correlation of. to the. South Carolina Data Analysis and Probability Standards

A Correlation of to the South Carolina Data Analysis and Probability Standards INTRODUCTION This document demonstrates how Stats in Your World 2012 meets the indicators of the South Carolina Academic Standards

### Lecture 2. Summarizing the Sample

Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

### GCSE HIGHER Statistics Key Facts

GCSE HIGHER Statistics Key Facts Collecting Data When writing questions for questionnaires, always ensure that: 1. the question is worded so that it will allow the recipient to give you the information

### AMS 7L LAB #2 Spring, 2009. Exploratory Data Analysis

AMS 7L LAB #2 Spring, 2009 Exploratory Data Analysis Name: Lab Section: Instructions: The TAs/lab assistants are available to help you if you have any questions about this lab exercise. If you have any

### Data Visualization Techniques and Practices Introduction to GIS Technology

Data Visualization Techniques and Practices Introduction to GIS Technology Michael Greene Advanced Analytics & Modeling, Deloitte Consulting LLP March 16 th, 2010 Antitrust Notice The Casualty Actuarial

### Foundation of Quantitative Data Analysis

Foundation of Quantitative Data Analysis Part 1: Data manipulation and descriptive statistics with SPSS/Excel HSRS #10 - October 17, 2013 Reference : A. Aczel, Complete Business Statistics. Chapters 1

### Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion

Statistics Basics Sampling, frequency distribution, graphs, measures of central tendency, measures of dispersion Part 1: Sampling, Frequency Distributions, and Graphs The method of collecting, organizing,

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Exploratory Spatial Data Analysis

Exploratory Spatial Data Analysis Part II Dynamically Linked Views 1 Contents Introduction: why to use non-cartographic data displays Display linking by object highlighting Dynamic Query Object classification

### Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

### CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES 2: Data Pre-Processing Instructor: Yizhou Sun yzsun@ccs.neu.edu September 7, 2014 2: Data Pre-Processing Getting to know your data Basic Statistical Descriptions of Data

### consider the number of math classes taken by math 150 students. how can we represent the results in one number?

ch 3: numerically summarizing data - center, spread, shape 3.1 measure of central tendency or, give me one number that represents all the data consider the number of math classes taken by math 150 students.

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### An Introduction to Point Pattern Analysis using CrimeStat

Introduction An Introduction to Point Pattern Analysis using CrimeStat Luc Anselin Spatial Analysis Laboratory Department of Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

### The right edge of the box is the third quartile, Q 3, which is the median of the data values above the median. Maximum Median

CONDENSED LESSON 2.1 Box Plots In this lesson you will create and interpret box plots for sets of data use the interquartile range (IQR) to identify potential outliers and graph them on a modified box

### Data Preparation and Statistical Displays

Reservoir Modeling with GSLIB Data Preparation and Statistical Displays Data Cleaning / Quality Control Statistics as Parameters for Random Function Models Univariate Statistics Histograms and Probability

### Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam. Software Profiling Seminar, Statistics 101

Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam Software Profiling Seminar, 2013 Statistics 101 Descriptive Statistics Population Object Object Object Sample numerical description Object

### Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

### Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?

### Common Tools for Displaying and Communicating Data for Process Improvement

Common Tools for Displaying and Communicating Data for Process Improvement Packet includes: Tool Use Page # Box and Whisker Plot Check Sheet Control Chart Histogram Pareto Diagram Run Chart Scatter Plot

### Treatment and analysis of data Applied statistics Lecture 3: Sampling and descriptive statistics

Treatment and analysis of data Applied statistics Lecture 3: Sampling and descriptive statistics Topics covered: Parameters and statistics Sample mean and sample standard deviation Order statistics and

### What is GIS? Geographic Information Systems. Introduction to ArcGIS. GIS Maps Contain Layers. What Can You Do With GIS? Layers Can Contain Features

What is GIS? Geographic Information Systems Introduction to ArcGIS A database system in which the organizing principle is explicitly SPATIAL For CPSC 178 Visualization: Data, Pixels, and Ideas. What Can

### Variables. Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis involves both graphical displays of data and numerical summaries of data. A common situation is for a data set to be represented as a matrix. There is

### Spatial Data Analysis Using GeoDa. Workshop Goals

Spatial Data Analysis Using GeoDa 9 Jan 2014 Frank Witmer Computing and Research Services Institute of Behavioral Science Workshop Goals Enable participants to find and retrieve geographic data pertinent

### Intro to Statistics 8 Curriculum

Intro to Statistics 8 Curriculum Unit 1 Bar, Line and Circle Graphs Estimated time frame for unit Big Ideas 8 Days... Essential Question Concepts Competencies Lesson Plans and Suggested Resources Bar graphs

### Frequency distributions, central tendency & variability. Displaying data

Frequency distributions, central tendency & variability Displaying data Software SPSS Excel/Numbers/Google sheets Social Science Statistics website (socscistatistics.com) Creating and SPSS file Open the

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Descriptive Statistics and Measurement Scales

Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

### Histogram. Graphs, and measures of central tendency and spread. Alternative: density (or relative frequency) plot 9/13/2004

Graphs, and measures of central tendency and spread 9.07 9/13/2004 Histogram If discrete or categorical, bars don't touch. If continuous, can touch, should if there are lots of bins. Sum of bin heights

### Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical) variables: take numerical values for which arithmetic operations make sense (addition/averaging)

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers) B Bar graph a diagram representing the frequency distribution for nominal or discrete data. It consists of a sequence

### Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize

### Intro to GIS Winter 2011. Data Visualization Part I

Intro to GIS Winter 2011 Data Visualization Part I Cartographer Code of Ethics Always have a straightforward agenda and have a defining purpose or goal for each map Always strive to know your audience

### AMARILLO BY MORNING: DATA VISUALIZATION IN GEOSTATISTICS

AMARILLO BY MORNING: DATA VISUALIZATION IN GEOSTATISTICS William V. Harper 1 and Isobel Clark 2 1 Otterbein College, United States of America 2 Alloa Business Centre, United Kingdom wharper@otterbein.edu

### Characteristics and statistics of digital remote sensing imagery

Characteristics and statistics of digital remote sensing imagery There are two fundamental ways to obtain digital imagery: Acquire remotely sensed imagery in an analog format (often referred to as hard-copy)

### Introduction. One of the most convincing and appealing ways in which statistical results may be presented is through diagrams and graphs.

Introduction One of the most convincing and appealing ways in which statistical results may be presented is through diagrams and graphs. Just one diagram is enough to represent a given data more effectively

### Workshop: Using Spatial Analysis and Maps to Understand Patterns of Health Services Utilization

Enhancing Information and Methods for Health System Planning and Research, Institute for Clinical Evaluative Sciences (ICES), January 19-20, 2004, Toronto, Canada Workshop: Using Spatial Analysis and Maps

### Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

### MAT 120 ELEMENTARY STATISTICS I

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 120 ELEMENTARY STATISTICS I 3 Lecture Hours, 1 Lab Hour, 3 Credits Pre-Requisite:

### Session 1.6 Measures of Central Tendency

Session 1.6 Measures of Central Tendency Measures of location (Indices of central tendency) These indices locate the center of the frequency distribution curve. The mode, median, and mean are three indices

### Describing Data. We find the position of the central observation using the formula: position number =

HOSP 1207 (Business Stats) Learning Centre Describing Data This worksheet focuses on describing data through measuring its central tendency and variability. These measurements will give us an idea of what