Implications of Big Data for Statistics Instruction 17 Nov 2013





Implications of Big Data for Statistics Instruction
Mark L. Berenson, Montclair State University
MSMESB Mini-Conference, DSI, Baltimore
November 17, 2013

Teaching Introductory Business Statistics to Undergraduates in an Era of Big Data

The integration of business, Big Data and statistics is both necessary and long overdue.
Kaiser Fung (Significance, August 2013)

Computer Scientists and Statisticians Must Coordinate to Accomplish a Common Goal: Making Reliable Decisions from the Available Data.

Computer Scientist's Concern is Data Management; Statistician's Concern is Data Analysis
Computer Scientist's Interest is in Quantity of Data; Statistician's Interest is in Quality of Data
Computer Scientist's Decisions are Based on Frequency of Counts; Statistician's Decisions are Based on Magnitude of Effect

Kaiser Fung (Significance, August 2013)

Bigger n Doesn't Necessarily Mean Better Results

128,053,180 was the USA population in 1936
78,000,000 were Voting Age Eligible (61.0%)
27,752,648 voted for Roosevelt (60.8%)
16,681,862 voted for Landon (36.5%)
10,000,000 received mailed surveys from Literary Digest
2,300,000 responded to the mailed survey
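The figures on this slide can be turned into a few ratios that make its point concrete. The sketch below uses only the numbers quoted above; the variable names and the choice of which ratios to compute are illustrative assumptions, not part of the original slides.

```python
# Figures exactly as quoted on the slide (1936 U.S. presidential election).
voting_age_eligible = 78_000_000
votes_roosevelt = 27_752_648
votes_landon = 16_681_862
surveys_mailed = 10_000_000
surveys_returned = 2_300_000

# Share of mailed surveys that came back.
response_rate = surveys_returned / surveys_mailed

# Respondents as a share of the voting-age-eligible population.
share_of_eligible = surveys_returned / voting_age_eligible

# Roosevelt's share of the votes cast for the two listed candidates only
# (assumption: other candidates are ignored because the slide omits them).
two_candidate_share = votes_roosevelt / (votes_roosevelt + votes_landon)

print(f"Response rate:                 {response_rate:.1%}")
print(f"Respondents / eligible voters: {share_of_eligible:.1%}")
print(f"Roosevelt two-candidate share: {two_candidate_share:.1%}")
```

A sample of 2.3 million is enormous by survey standards, yet fewer than one in four recipients replied, so the size of n says nothing about whether the respondents resembled the electorate.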

A Proposal for an Introductory Undergraduate Business Statistics Course in an Era of Big Data

Course Constraints:
  Type: One 3-Credit Core-Required Course
  Prerequisite: Intermediate Algebra
  Articulation: 19 NJ community colleges
  Software: Excel
  Student Body: Preparation assessment

A New Business Statistics Course

Note: In the next several slides all statements in GREEN font reflect items or topics relevant to a proposed introductory business statistics course in an era of Big Data that are not typically taught at the present time.

Goals for the New Business Statistics Course

Numerical literacy is essential in business, and the discipline of statistics enables students to learn to:
  Visualize Data
  Draw Inferences
  Make Predictions
  Manage Processes
regardless of sample size.

Course Topics: A. Introduction to Business Statistics

An historical introduction to the subject and practice of statistics as it evolves into an era of Big Data
A list of key terms used in the practice of statistics
A review of the fundamentals of business numeracy: Proportions, Percentages, Percentage Change, Rates, Ratios, Odds, Index Numbers
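As a small illustration of the business numeracy items listed above, the following sketch shows three of them as one-line formulas. The function names and the sample values are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
def percentage_change(old, new):
    """Change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


def odds(p):
    """Odds in favor of an event that has probability p (0 <= p < 1)."""
    return p / (1 - p)


def index_number(value, base_value):
    """Simple index number: the value relative to a base period set to 100."""
    return value / base_value * 100


# Hypothetical values for illustration only.
print(percentage_change(80, 100))   # sales rose from 80 to 100
print(odds(0.75))                   # a 75% chance expressed as odds
print(index_number(125, 100))       # price of 125 against a base price of 100
```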

Course Topics: B. Obtaining Data

Data Types: Structured vs. Unstructured
Data Sources: Surveys, Experiments, Observational Studies, Primary/Secondary Source Acquisitions, Transactions, Log Data, Email, Social Media, Sensors, Free-form Text, Geospatial, Audio, Image
Data Outcomes: Categorical, Numerical, Other

Course Topics: C. Categorical Data Visualization and Description

Summary Table; Bar Chart and Pareto Chart
Mode
Cross-classification Table; Side-by-Side Bar Chart
Pivot Table
Dashboards
Report Cards
Interactive Graphs

Course Topics: D. Numerical Data Visualization and Description

Ordered Array
Frequency and Percentage Distributions using Sturges' Rule for Bin Groupings
Histogram and Percentage Polygon
Boxplot with Five-Number Summary
Deciles and Percentiles
Central Tendency: Mean, Median, Trimean, 10% Trimmed Mean
Variation: Standard Deviation, IQR, IDR, CV
Skewness: Stine & Foster K3
Kurtosis: Stine & Foster K4
Searching for Outliers: Z value, Tukey EDA methods
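Several of the descriptive measures named on this slide reduce to short formulas. The sketch below is one possible reading using only the Python standard library, not the course's Excel workflow. Assumptions: quartiles come from `statistics.quantiles` with its default method (textbooks and Excel may use a different convention), the 10% trimmed mean drops 10% of the values from each tail, Tukey's method is taken to mean the usual 1.5 x IQR fences, and the data values are made up. The Stine & Foster K3 and K4 measures and the Z-value outlier screen are not shown.

```python
import math
import statistics


def sturges_bins(n):
    """Number of bins suggested by Sturges' rule: ceil(1 + log2(n))."""
    return math.ceil(1 + math.log2(n))


def trimean(q1, median, q3):
    """Tukey's trimean: weighted average of the quartiles and the median."""
    return (q1 + 2 * median + q3) / 4


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.mean(ordered[k:len(ordered) - k])


def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up sample with one suspiciously large value.
data = [12, 15, 17, 18, 19, 21, 22, 24, 25, 27, 30, 95]

q1, med, q3 = statistics.quantiles(data, n=4)
low, high = tukey_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]
cv = statistics.stdev(data) / statistics.mean(data) * 100  # coefficient of variation, %

print("Sturges bins:", sturges_bins(len(data)))
print("Five-number summary:", min(data), q1, med, q3, max(data))
print("Trimean:", trimean(q1, med, q3))
print("10% trimmed mean:", trimmed_mean(data))
print("Flagged by Tukey fences:", outliers)
```

The resistant measures (median, trimean, trimmed mean) barely move when the extreme value is present, which is the usual argument for teaching them alongside the mean.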

Course Topics: E. Probability

Definitions: A Priori, Empirical, Subjective
Marginal, Joint and Conditional Probability
Bayes' Theorem
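Bayes' Theorem connects the marginal, joint and conditional probabilities listed above. A minimal sketch for the two-hypothesis case follows; the function name and the input probabilities are hypothetical and serve only as an illustration.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) for a hypothesis H and its complement.

    prior                  -- P(H), the marginal probability of H
    p_evidence_given_h     -- P(E | H)
    p_evidence_given_not_h -- P(E | not H)
    """
    joint_h = p_evidence_given_h * prior                  # P(E and H)
    joint_not_h = p_evidence_given_not_h * (1 - prior)    # P(E and not H)
    return joint_h / (joint_h + joint_not_h)              # divide by P(E)


# Hypothetical inputs: 2% of transactions are fraudulent, a screening rule
# flags 90% of fraudulent ones and 5% of legitimate ones.
posterior = bayes_posterior(prior=0.02,
                            p_evidence_given_h=0.90,
                            p_evidence_given_not_h=0.05)
print(f"P(fraud | flagged) = {posterior:.3f}")
```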

Course Topics: F. Probability Distributions

Discrete: Definition and Examples
Continuous: Definition and Examples
The Standardized Normal Distribution
Assessing Normality

Course Topics: G. Sampling Distributions

Sampling Distribution of the Mean
Sampling Distribution of the Proportion
The Central Limit Theorem
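The sampling distribution of the mean and the Central Limit Theorem lend themselves to a short simulation. The sketch below is an illustration under assumptions of my own choosing: a right-skewed exponential population with mean 1 and standard deviation 1, a fixed seed, and 5,000 replications per sample size.

```python
import math
import random
import statistics

random.seed(2013)  # fixed seed so the run is repeatable


def draw_skewed():
    """One draw from an exponential population (mean 1, standard deviation 1)."""
    return random.expovariate(1.0)


def sample_means(draw, n, replications):
    """Means of `replications` independent samples, each of size n."""
    return [statistics.fmean([draw() for _ in range(n)])
            for _ in range(replications)]


for n in (2, 30):
    means = sample_means(draw_skewed, n, 5000)
    print(f"n={n:>2}  mean of sample means={statistics.fmean(means):.3f}  "
          f"sd of sample means={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / math.sqrt(n):.3f}")
```

The mean of the sample means stays near the population mean, while their spread shrinks toward sigma divided by the square root of n; a histogram of `means` for n = 30 would look far more symmetric than the population itself.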

Course Topics: H. Inference

Probability Sampling in Surveys and Randomization in Experiments
Confidence Interval Estimate (C.I.E.) of the Population Mean
C.I.E. of the Population Proportion
Concept of Effect Size for Comparing Two Groups (A/B Testing)
C.I.E. of the Difference in Two Independent Group Means
C.I.E. of the Standardized Mean Difference Effect Size
C.I.E. of the Population Point-Biserial Correlation Effect Size
C.I.E. of the Difference in Two Independent Group Proportions
Phi Coefficient Measure of Association in 2x2 Tables
C.I.E. of the Population Odds Ratio Effect Size
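For the two-group comparisons on this slide, three of the listed effect sizes can be computed from a single 2x2 table. The sketch below is an illustration, not the course's prescribed method: it assumes the large-sample (Wald) interval for the difference in proportions and the log-based (Woolf) interval for the odds ratio, and the A/B test counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_by_two_effects(a, b, c, d, confidence=0.95):
    """Effect sizes for a 2x2 table.

    a, b -- successes and failures in group 1
    c, d -- successes and failures in group 2
    All four counts must be positive for the odds-ratio interval.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2

    # Difference in two independent proportions with a Wald interval.
    diff = p1 - p2
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    # Odds ratio with an interval built on the log scale.
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    # Phi coefficient of association.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

    return {
        "diff": diff,
        "diff_ci": (diff - z * se_diff, diff + z * se_diff),
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": (odds_ratio * math.exp(-z * se_log_or),
                          odds_ratio * math.exp(z * se_log_or)),
        "phi": phi,
    }


# Hypothetical A/B test: 120 of 1,000 visitors convert on page A, 90 of 1,000 on page B.
result = two_by_two_effects(120, 880, 90, 910)
for name, value in result.items():
    print(name, value)
```

Reporting the interval for the effect, rather than only whether a difference is "significant", keeps the focus on magnitude, which matters most when samples are large enough that almost any difference passes a significance test.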

Course Topics: I. Simple Linear Regression Modeling

Descriptive Analysis and Assessment of Model Appropriateness
Effect Size for the Slope and for the Coefficient of Correlation
Confidence Interval Estimate of the Mean Response
Prediction Interval Estimate of the Individual Response
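The last two items differ only in one term, which the standard textbook formulas make visible. The notation below (b_0, b_1, S_YX, h_0) is an assumption, since the slides do not give formulas; the expressions themselves are the usual ones for simple linear regression with n observations.

```latex
% Fitted value at x_0, standard error of the estimate, and leverage-type term
\hat{y}_0 = b_0 + b_1 x_0, \qquad
S_{YX} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}, \qquad
h_0 = \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

% Confidence interval estimate of the mean response at x_0
\hat{y}_0 \pm t_{\alpha/2,\, n-2} \, S_{YX} \sqrt{h_0}

% Prediction interval estimate of an individual response at x_0
\hat{y}_0 \pm t_{\alpha/2,\, n-2} \, S_{YX} \sqrt{1 + h_0}
```

As n grows, h_0 shrinks toward zero, so the interval for the mean response collapses while the prediction interval keeps a width of roughly 2 t S_YX: more data sharpens the estimate of the average but not the forecast for an individual case.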

Course Topics: J. Quality Management

Introduction to Process Management
The Use of Control Charts

Summary and Conclusions

A course in Business Statistics needs to be modified to maintain its relevance in an era of Big Data.
Business statistics textbooks must adapt their topic coverage to introduce methodology relevant to a Big Data environment; the subject of inference must be re-engineered.
The time has come for AACSB-accredited undergraduate programs to include a core-required course in Business Analytics as a sequel to a course in Business Statistics.