What is Data Analysis. Kerala School of Mathematics Course in Statistics for Scientists. Introduction to Data Analysis. Steps in a Statistical Study




Kerala School of Mathematics
Course in Statistics for Scientists
Introduction to Data Analysis
T. Krishnan, Strand Life Sciences, Bangalore

What is Data Analysis

- Statistics is a body of methods: how to use numbers to elucidate rather than to mislead
- Statisticians work in many areas: probability, exploratory data analysis, modelling, social policy, decision making, and others
- Two fundamental tasks: description and inference
- Description involves characterizing a batch of data in simple but informative ways, including graphically
- Inference involves generalizing from a sample of data to a larger population of possible data
- Descriptive statistics help us to observe more acutely
- Inferential statistics help us to formulate and test hypotheses

Steps in a Statistical Study

- plan the study
- understand background and collect questions and issues
- collect data
- check the data for errors
- explore the data
- review the initial questions
- generate hypotheses and build statistical models
- analyze residuals and review hypotheses and models
- interpret and make recommendations

Exploration of Data

- clean and sanitize the data
- check validity of the values
- check for missing values and deal with them
- check for outliers and deal with them
- understand the data
- make tables, charts, graphs
- check if there are groups in the data
- make transforms if needed
- check for standard assumptions
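
The validity and missing-value checks in the exploration list can be sketched in a few lines of Python. This is only an illustrative sketch: the `ages` batch, the `check_batch` helper, and the plausible range of 0 to 120 are assumptions made for the example, not part of the course material.

```python
# Hypothetical batch of recorded ages; None marks a missing value.
ages = [34, 27, None, 45, -3, 61, 230, 52, None, 38]


def check_batch(values, low=0, high=120):
    """Split a batch into valid values, missing positions, and out-of-range positions.

    The limits low/high are an assumed plausibility range for human ages.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    invalid = [i for i, v in enumerate(values)
               if v is not None and not (low <= v <= high)]
    valid = [v for v in values if v is not None and low <= v <= high]
    return valid, missing, invalid


valid, missing, invalid = check_batch(ages)
print("valid values:", valid)
print("missing at positions:", missing)
print("out-of-range at positions:", invalid)
```

Reporting positions rather than silently dropping values keeps the decision about how to "deal with them" in the analyst's hands.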

Descriptive Measures

- what measures to use depends on data and purpose: mean, median, mode, variance, standard deviation, range, inter-quartile range, etc.
- depends on nature of distribution: symmetric, skewed, outliers, tails (light or heavy), etc.
- shall discuss in the Descriptive Statistics presentation

Graphical Representation of Data

- what graphs to use depends on nature of data and purpose
- be careful not to mislead
- graphs before, during, and after data analysis
- Graphs before: visual representation of data and summaries, e.g. bar chart
- Graphs during: check assumptions and model fit, e.g. normal probability plot
- Graphs after: check assumptions, e.g. normal probability plot of residuals; present results, e.g. parameter estimates, say in log-linear models

[Figures: Reasonable Graph; Overemphasized Graph]
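
As a small illustration of the descriptive measures named above, the sketch below computes them with Python's standard `statistics` module. The batch `data` is hypothetical and was chosen to contain one large value, so that the mean and the median visibly disagree.

```python
import statistics

# Hypothetical batch with one unusually large value (40).
data = [2, 4, 4, 5, 7, 9, 11, 12, 40]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.variance(data)   # sample variance (n - 1 divisor)
std_dev = statistics.stdev(data)       # sample standard deviation
value_range = max(data) - min(data)

# Quartiles using the module's default ("exclusive") method;
# other software may use slightly different quartile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f}")
print(f"range={value_range} IQR={iqr}")
```

Here the mean is pulled upward by the single large value while the median is not, which is one reason the choice of measure depends on the shape of the distribution.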

Inferential Statistics

- want to do more than describe the sample: generalize, formulate a policy, test a hypothesis, make an inference, classify, predict
- inference implies that we think a model describes a more general population from which our data have been randomly sampled
- when you make inferences, you should have a population in mind
- finite and infinite populations

Population, Sample, Statistical Inference

- example: estimating the mean age of India's population on 1 April 2001
- could enter all N ages into a SYSTAT file and compute the mean age exactly
- if practical, this is the preferred method: the census method
- sometimes, a sampling estimate can be more accurate than an entire census
- biases are introduced into large censuses from refusals to comply, keying or coding errors, and other sources
- a carefully constructed random sample can yield less-biased information about the population
- it is the analyst's responsibility to ensure that the sample is representative of the larger group (population) on all attributes that might affect the results
- more on this in Survey Sampling

Computing Aids and Statistical Analysis

Data analysis with:

- hand computation
- mechanical calculator
- electronic calculator with functions
- electronic computer
- electronic computer with subroutine packages
- menu-driven software packages

Types of Statistical Software

- Level 1: Excel, MatLab, StatGraphics, Statview. Limited statistical features; menu-driven; easy to learn and use
- Level 2: JMP, SPSS, MINITAB, SYSTAT, STATISTICA. Statistical software packages; more comprehensive features; menu-driven with command-line windows; moderate cost
- Level 3: SAS, S-PLUS. Statistical software packages; for expert users; command-line driven; very comprehensive; sophisticated features; very expensive
- Level 4: R. Freeware; command-driven; a somewhat steep learning curve
- Level 5: BUGS, MRBAYES. Statistical software packages for specialized uses
- base module + optional add-on modules or toolboxes
- many software packages have simpler less expensive or free ...
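
To make the census-versus-sample discussion under Population, Sample, Statistical Inference concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the population is synthetic (uniform random ages, not real census data), and the sizes `N` and `n` are arbitrary. It shows only that a simple random sample gives an estimate close to the exact census mean, together with an estimated standard error; it does not model census biases.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic finite population of N ages (illustrative only).
N = 100_000
population = [random.randint(0, 90) for _ in range(N)]

# "Census method": use all N values and compute the mean exactly.
census_mean = statistics.mean(population)

# Sampling method: simple random sample without replacement.
n = 1_000
sample = random.sample(population, n)
sample_mean = statistics.mean(sample)

# Estimated standard error of the sample mean.
std_error = statistics.stdev(sample) / n ** 0.5

print(f"census mean   = {census_mean:.2f}")
print(f"sample mean   = {sample_mean:.2f}")
print(f"standard error = {std_error:.2f}")
```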

Using a Software

When using a software for data analysis:

- Don't be blind to the data set
- Formulate the issues to be resolved
- Examine assumptions
- Analyse by alternative methods
- Investigate methods suitable
- Examine the software
- Use computer-intensive methods

Exploratory Data Analysis

[Slide: Data Files]

Data Cleaning

- be aware that almost every data set is likely to be polluted: errors, incompleteness, and other inadequacies
- especially those data sets obtained or imported from different sources
- some of the common sources of errors: typing or data entry errors, coding errors, measurement errors, missing values
- detection of errors
- correction of errors
- missing value imputation
- detection of outliers (elimination?)
- finding groups (lack of homogeneity)
- need for transformations
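
Missing value imputation can take many forms. The sketch below shows one simple option, replacing each missing entry with the median of the observed values. The choice of the median and the `readings` batch are assumptions for illustration; the slides do not prescribe a particular imputation method.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
readings = [12.1, None, 11.8, 12.4, None, 12.0, 11.9]

# Median of the values that were actually observed.
observed = [x for x in readings if x is not None]
fill = statistics.median(observed)

# Replace each missing entry with that median; observed values are unchanged.
imputed = [fill if x is None else x for x in readings]

print("fill value:", fill)
print("imputed batch:", imputed)
```

Single-value imputation like this understates the variability of the data, so it is best treated as a quick exploratory device.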

Data Cleaning Tools

- cross tabulation
- bar charts
- descriptive statistics
- graphical displays
- box plots (outliers)
- density plots (mixture of groups)

Descriptive Statistics

[Figures: Crosstabulation; Outliers]
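
Two of the cleaning tools listed above can be sketched without any plotting. The example below builds a cross tabulation with `collections.Counter`, which exposes a miscoded category, and then applies the usual box-plot fence rule (values beyond 1.5 times the inter-quartile range from the quartiles) to flag outliers. The records, the values, and the deliberate typo `"yse"` are all hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical (group, response) records; the last response is miscoded.
records = [("A", "yes"), ("A", "no"), ("B", "yes"), ("A", "yes"),
           ("B", "no"), ("B", "no"), ("A", "yes"), ("B", "yse")]

# Cross tabulation: count of each (group, response) combination.
table = Counter(records)
for (group, response), count in sorted(table.items()):
    print(f"{group:>2} {response:<4} {count}")

# Box-plot rule for outliers on a hypothetical numeric batch.
values = [10, 12, 11, 13, 12, 14, 11, 95, 12, 13]
q1, _, q3 = statistics.quantiles(values, n=4)  # default quartile method
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```

The unexpected cell for `"yse"` in the table is the kind of coding error a cross tabulation makes visible; whether a flagged outlier should be eliminated remains a judgment call.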

[Figures: Mixtures; Transformations]
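
As a rough illustration of why a transformation may be needed, the sketch below compares the skewness of a right-skewed batch before and after a log transformation. The `incomes` values are hypothetical, the `skewness` helper is a plain moment-based formula written for this example, and the log is only one of several possible transformations.

```python
import math
import statistics


def skewness(xs):
    """Moment-based skewness (population form, no bias correction)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Hypothetical right-skewed batch (all values positive, so the log is defined).
incomes = [1, 2, 2, 3, 4, 5, 8, 13, 30, 100]
logged = [math.log10(x) for x in incomes]

print(f"skewness before: {skewness(incomes):.2f}")
print(f"skewness after log10: {skewness(logged):.2f}")
```

A skewness closer to zero after the transformation suggests the logged batch is more nearly symmetric, which can make standard assumptions easier to meet.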