Business Statistics: Introduction




Business Statistics: Introduction
Donglei Du (ddu@unb.edu)
Faculty of Business Administration, University of New Brunswick, Fredericton, NB, Canada E3B 9Y2
September 23, 2015
Donglei Du (UNB) AlgoTrading September 23, 2015 1 / 45

Table of contents
1. Why Statistics?
2. What is Statistics?
3. Two methodologies in Science
4. Variables and types
5. Sources of Statistical Data
6. Software for Statistical Analysis
7. Materials to learn R
8. A brief tutorial of R with a case study

Section 1 Why Statistics?

Figure: Source: http://mechanicalforex.com/2015/05/building-algorithmic-trading-systems-for-the-forex-market-par html

Why Statistics?
- This is a required course for your degree.
- It is a prerequisite for many other topics.
- Data are everywhere, particularly in this big-data era.
- Sampling vs. censusing.
- Decision making: statistics will help you make important decisions.

Data everywhere: Example 1: productivity and standard of living of a nation
[Figure: wages plotted against labor productivity for Bulgaria, Egypt, India, Pakistan, Indonesia, Italy, France, Germany, Canada, Japan, the U.K., and the US. Caption: High productivity is a key to high standard of living.]
Two statistics that a nation is most concerned with are its productivity and its standard of living. Productivity is usually measured in terms of output per worker, and standard of living in terms of wages per worker. The two are usually strongly related: countries with high productivity generally have a high standard of living (see the figure).

Data everywhere: Example 2: Consumer Price Index (CPI) in Canada
The Consumer Price Index measures the rate of price change for goods and services bought by consumers in a country. It is a statistic constructed from the prices of a sample of representative items, which are collected periodically. For instance, the all-items CPI for Canada for July 2013 was 123.1 (2002=100), meaning that consumer prices were 23.1% higher in July 2013 than in 2002.
Source: http://www.statcan.gc.ca/tables-tableaux/sumsom/l01/cst01/cpis01a-eng.htm
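The arithmetic behind the index reading can be checked in a few lines of R. This is only a minimal sketch: the variable names are illustrative and do not come from the slides, and the only data used is the reading of 123.1 against a base of 100 quoted above.

```r
# CPI reading quoted above, with the base period set to 100
cpi_base <- 100
cpi_now  <- 123.1

# Percent change in consumer prices relative to the base period
pct_change <- (cpi_now - cpi_base) / cpi_base * 100
print(round(pct_change, 1))
```

The same formula applies to any two index values, which is how percent changes between arbitrary months are usually derived from an index series.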

Data everywhere: Example 2: Consumer Price Index (CPI) in Canada (continued)
- The CPI directly or indirectly affects nearly all Canadians.
- Old Age Security pensions, Canada Pension Plan payments, and other forms of social and welfare payments are adjusted periodically to take account of changes in the CPI.
- Rental agreements, spousal and child support payments, and other forms of contractual and price-setting arrangements are frequently tied in some manner to movements in the CPI.
- Cost-of-living adjustment (COLA) clauses link wage increases to movements in the CPI. Labour contracts governing the wages of many Canadian workers include COLA clauses.

Sampling vs censusing?
- The cost of surveying the entire population may be prohibitive, e.g., television networks monitoring the popularity of their programs.
- The investigation may destroy the elements examined, e.g., manufacturers estimating the average lifetime of light bulbs; doctors taking a blood sample to check for disease.
- The future is unknown, e.g., a stock index; tomorrow's temperature.

Decision Making
- How do large retailers (like Costco and Walmart) fill their store shelves so as to meet customer demand while minimizing their operating cost?
- How do doctors in a hospital make diagnoses?
- How do political leaders run their campaigns?
- How do investment banks on Wall Street (or Bay Street) decide which stock (or stocks) to invest in?
- How do insurance companies decide the premium for a particular client?
- How do car dealers decide how many car models of each brand to keep at their locations?

Section 2 What is Statistics?

What is Statistics?
It is the science and art of collecting, organizing, and representing data in such a way that the characteristics and patterns of the data can be easily captured (Descriptive Statistics); and also of estimating attributes and drawing inferences from a sample about the entire population (Inferential Statistics).

Examples: descriptive or inferential?
- In 1995, 45% of Canadian households owned a computer and 25% were connected to the internet. (descriptive)
- On average, Canadians spend 1.3 hours per day commuting and 1.5 hours per day with their children. (descriptive)
- The accounting department of a firm selects a sample of invoices to check the accuracy of all the invoices of the company. (inferential)

Terminologies
Population vs sample
- Population: the set of all elements of interest in an investigation
- Sample: a subset of the elements of a population
Parameter vs statistic
- Parameter: a measurable characteristic of a population (such as average, extreme, variation, proportion)
- Statistic: a measurable characteristic of a sample
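The distinction between a parameter and a statistic can be illustrated with simulated data. The sketch below is an assumption-laden example, not part of the slides: the "population" is 10,000 made-up invoice amounts drawn from a gamma distribution, and all variable names are hypothetical.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated invoice amounts (not real data)
population <- rgamma(10000, shape = 2, scale = 50)

# Parameter: computed from every element of the population
mu <- mean(population)

# Statistic: computed from a random sample of 100 elements
smp   <- sample(population, size = 100)
x_bar <- mean(smp)

cat("parameter (population mean):", round(mu, 2), "\n")
cat("statistic (sample mean):    ", round(x_bar, 2), "\n")
```

Re-running the sampling step with a different seed changes the statistic but not the parameter, which is the reason inference from a sample carries uncertainty.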

Section 3 Two methodologies in Science

Two methodologies in Science
- Deductive: general → particular. Mathematics: axioms + logic.
- Inductive: particular → general. Most natural sciences, like physics and chemistry, and Statistics; a.k.a. the method of experimentation.

The method of Experimentation
1. Define the experimental goal or a working hypothesis
2. Design an experiment
3. Collect and represent data
4. Estimate the values/relations
5. Draw inferences
6. Predict and prepare policy analysis

Section 4 Variables and types

Variable
A variable is a characteristic or attribute of a population or sample that is of interest in a particular investigation; its value can "vary" from one entity to another.
- A person's gender is a variable, which could have the value "Male" for one person and "Female" for another.
- The rank of faculty members in Business Administration is a variable, which could have the value "Full Professor" for one person, "Associate Professor" for another, and "Assistant Professor" for yet another.
- The temperature in this classroom is a variable, which could have the value "20" or "100".
- The annual salary of NBA players (or hockey players in Canada) is a variable, which could have the value "10M" or "5M".

Types of Data: Qualitative vs. Quantitative Variables
- Qualitative (a.k.a. categorical): qualitative variables take on values that are names or labels. Examples: gender, country names, color.
- Quantitative (a.k.a. numeric): quantitative variables are numeric. Examples: number of bedrooms in houses, number of minutes to the end of this class, distance between two cities.
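In R, the qualitative/quantitative split roughly corresponds to factor columns versus numeric columns. The following is a minimal sketch under that assumption; the data frame and its values are made up for illustration.

```r
# Hypothetical records; the values are made up for illustration
houses <- data.frame(
  colour   = factor(c("red", "white", "red", "blue")),  # qualitative
  bedrooms = c(3L, 2L, 4L, 3L),                          # quantitative
  distance = c(12.5, 3.2, 40.0, 7.8)                     # quantitative
)

# TRUE for numeric (quantitative) columns, FALSE for categorical ones
is_quantitative <- sapply(houses, is.numeric)
print(is_quantitative)

# A qualitative variable is summarised by counts per label
print(table(houses$colour))
```

Note that the storage type is only a convention: a nominal code stored as a number would still pass `is.numeric`, so the analyst has to declare such columns as factors explicitly.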

Types of Data: Discrete vs. Continuous Variables
Quantitative variables can be further classified as discrete or continuous.
- Continuous variable: a variable that can take on any value within a range. Examples: the number of minutes to the end of this class, distance between two cities, pressure in a tire, weight of a pork chop, height of students in a class.
- Discrete variable: a variable that can take only certain values (finitely or countably infinitely many) within a range. Example: number of bedrooms in houses.

Data/variable representation
- Language, such as English, French, or Chinese: this is the natural and direct way, e.g., "All the people in Canada".
- Numbers: this is the mathematical and indirect way, e.g., data representation in a computer using the binary digits 0 and 1.

Level of measurement
Suppose we now represent all data/variables using numbers. Levels of measurement (a.k.a. scales of measure) are the different ways numbers can be used. There are four levels of measurement:
- Nominal
- Ordinal
- Interval
- Ratio
However, representing variables as numbers does not give you license to always perform the regular logical/arithmetic operations (such as comparison, addition, subtraction, multiplication, and division), or to infer anything about the magnitude of, or quantitative difference between, the numbers.

Nominal level
A variable is at the nominal level if none of the five operations (namely comparison, addition, subtraction, multiplication, and division) is meaningful. At the nominal level of measurement, numbers are assigned to a set of mutually exclusive and exhaustive categories for the purpose of naming, labeling, or classifying the observations, but no arithmetic operation is meaningful, where
- Mutually exclusive: any individual object is included in ONLY ONE CATEGORY
- Exhaustive: any individual object MUST APPEAR in one of the categories

Examples
- Barcodes
- Social insurance numbers (SIN)
- Student IDs
- Phone numbers
The fact that the barcode for one product is higher than that for another, or that your SIN is higher than mine, tells us nothing. In surveys we often use arbitrary numbers to code variables such as religion, ethnicity, major in college, or gender.
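Survey coding of this kind can be sketched in R with a factor. The codes and the major names below are hypothetical; the point is only that counting categories is meaningful while arithmetic on the codes is not.

```r
# Hypothetical survey codes: 1, 2, 3 are arbitrary labels for majors
major_code <- c(1, 3, 2, 3, 3, 1)

# Storing the codes as a factor keeps them as labels, not quantities
major <- factor(major_code, levels = 1:3,
                labels = c("Accounting", "Finance", "Marketing"))

# Counting category membership is meaningful at the nominal level
counts <- table(major)
print(counts)

# Averaging is not: R returns NA (with a warning) for a factor
m <- suppressWarnings(mean(major))
print(m)
```

Had the raw numeric codes been averaged instead, R would have returned a number, but that number would depend entirely on the arbitrary coding scheme.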

Ordinal level
A variable is at the ordinal level if only comparison is meaningful.

Examples
- Rank of faculty members
- Maclean's ranking of Canadian colleges: http://oncampus.macleans.ca/education/2012/11/01/20 comprehensive/
- US News ranking of US/world colleges: http://www.usnews.com/rankings
The differences between data values cannot be determined or are meaningless. For instance, first class is better than economy, and business is in between. Just how much better first class is than business, and business than economy, varies from airline to airline, and even from flight to flight.
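The airline example can be sketched in R with an ordered factor, which allows comparison but refuses arithmetic. The seat data below are made up for illustration.

```r
# Travel classes from the example above, from lowest to highest
classes <- c("economy", "business", "first")

# Hypothetical seats held by four passengers, stored as an ordered factor
seat <- factor(c("business", "economy", "first", "economy"),
               levels = classes, ordered = TRUE)

# Comparison is meaningful at the ordinal level
better_than_economy <- seat > "economy"
print(better_than_economy)
print(max(seat))

# Differences are not: subtraction on an ordered factor gives NA
gap <- suppressWarnings(seat[3] - seat[1])
print(gap)
```

The order of the `levels` argument is what defines the ranking, so it has to be stated explicitly rather than left to alphabetical order.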

Interval
A variable is at the interval level if comparison, addition, and subtraction are meaningful, but multiplication and division are not. Reason: interval data have meaningful intervals between measurements, but there is no true starting point (zero).

Examples
- Temperatures in Celsius and Fahrenheit are interval data. Certainly order is important and intervals are meaningful. However, a 20 °C dashboard is not twice as hot as the 10 °C outside. The conversion formula between Celsius and Fahrenheit is F = (9/5) C + 32, so 10 °C = 50 °F and 20 °C = 68 °F. Obviously 20 °C / 10 °C ≠ 68 °F / 50 °F.
- Calendar year: arbitrary year 0 (the birth of Jesus).
- IQ score: there is no non-arbitrary zero score.
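The temperature example can be verified numerically. The sketch below applies the conversion formula F = (9/5) C + 32 to the two readings from the slide; the function and variable names are illustrative only.

```r
# Conversion from the formula above: F = (9/5) C + 32
c_to_f <- function(celsius) 9 / 5 * celsius + 32

temp_c <- c(10, 20)
temp_f <- c_to_f(temp_c)
print(temp_f)                       # 50 68

# Ratios depend on the unit, so they carry no meaning
ratio_c <- temp_c[2] / temp_c[1]
ratio_f <- temp_f[2] / temp_f[1]
print(ratio_c)                      # 2
print(ratio_f)                      # 1.36

# Differences agree up to the constant scale factor 9/5
scale_factor <- diff(temp_f) / diff(temp_c)
print(scale_factor)                 # 1.8
```

The ratio changes with the unit because of the additive constant 32, while differences only pick up the multiplicative factor 9/5; this is why subtraction is meaningful on an interval scale and division is not.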

Ratio
A variable is at the ratio level if all logical/arithmetic operations are meaningful. Reason: ratios between measurements as well as intervals are meaningful because there is a non-arbitrary zero point.

Examples
- Income
- Distance
- Height

Summary of level of measurement (yes = meaningful, no = not meaningful)

            comparison  addition  subtraction  multiplication  division
nominal     no          no        no           no              no
ordinal     yes         no        no           no              no
interval    yes         yes       yes          no              no
ratio       yes         yes       yes          yes             yes

A general method for identifying the level of measurement
Ask yourself the following three questions:
1. Is order meaningful? No: the data is nominal.
2. Is difference meaningful? No: the data is ordinal.
3. Is zero meaningful? No: the data is interval. Yes: the data is ratio.
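The three questions can be sketched as a small R helper that asks them in order and stops at the first "no". The function name and its arguments are hypothetical, and the answers passed in still have to come from the analyst's judgment about the variable.

```r
# Hypothetical helper that walks through the three questions in order
level_of_measurement <- function(order_meaningful,
                                 difference_meaningful,
                                 zero_meaningful) {
  if (!order_meaningful)      return("nominal")
  if (!difference_meaningful) return("ordinal")
  if (!zero_meaningful)       return("interval")
  "ratio"
}

# Examples taken from the slides
print(level_of_measurement(FALSE, FALSE, FALSE))  # student IDs
print(level_of_measurement(TRUE,  FALSE, FALSE))  # rank of faculty members
print(level_of_measurement(TRUE,  TRUE,  FALSE))  # temperature in Celsius
print(level_of_measurement(TRUE,  TRUE,  TRUE))   # income
```

Because each question is only asked when the previous answer was "yes", the levels form a hierarchy: every ratio variable also supports the operations of the interval, ordinal, and nominal levels.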

Section 5 Sources of Statistical Data

Sources of Statistical Data
- Statistics Canada: http://www.statcan.ca
- Twitter: www.twitter.com
- Facebook: www.facebook.com
- ...

Section 6 Software for Statistical Analysis

Software for Statistical Analysis
There are many, but the most popular ones are:
- Excel (proprietary)
- SAS (proprietary)
- R (open-source)
- ...

Section 7 Materials to learn R

Some online resources to learn R
- R in a Nutshell
- An Introduction to R
- R for Beginners
- Try R
- Quick-R
- The Art of R Programming
- www.coursera.org offers some nice courses on R

Section 8 A brief tutorial of R with a case study

R for Statistical Analysis
We will retrieve some stock data from Yahoo Finance via the package quantmod.

Load package

    rm(list = ls())      # clear the workspace
    library(quantmod)    # provides getSymbols() and chartSeries()

Parameters

    startdate <- '1957-03-04'  # start of data
    enddate   <- '2014-01-06'  # end of data
    Sys.setenv(TZ = "UTC")     # use UTC for all timestamps

Retrieve data

    symbols <- c("^GSPC")
    # symbols <- c("AAPL", "FB", "YHOO")
    if (file.exists("data/gspc.rdata")) {
      # Reuse the copy saved by an earlier run
      load("data/gspc.rdata")
      # GSPC_csv <- read.csv("data/gspc.csv")
    } else {
      getSymbols(symbols, src = "yahoo",  # OHLC format
                 from = startdate,
                 # to = enddate,
                 index.class = c("POSIXt", "POSIXct"),  # Recommended
                 warnings = FALSE,
                 adjust = TRUE)
      dir.create(file.path("data"), showWarnings = TRUE)
      # getSymbols() stores the series in an object named GSPC
      save(list = "GSPC", file = "data/gspc.rdata")
    }

Plot

    options(width = 60)
    chartSeries(GSPC["2004::2014"])  # price chart for 2004 through 2014
    addMACD()                        # add the MACD indicator
    addBBands()                      # add Bollinger Bands