2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)


What Have we Done so Far? Last time we introduced the concept of a dataset and saw how we can represent it in various ways. But how did this dataset come into being? Example: 100 balls from the brand HIT-ME are hit by a machine, and the distance reached by each ball is measured (in meters): 261.3 ; 258.1 ; 254.2 ; 257.7 ; 237.9 ; 255.8 ; 241.4 ; 256.8 ; 255.3 ; 255 ; 259.4 ; 270.5 ; 270.7 ; 272.6 ; 272.2 ; 251.1 ; 232.1 ; 273 ; 266.6 ; 273.2 ; 265.7 ; 264.5 ; 245.5 ; 280.3 ; 248.3 ; 267 ; 271.5 ; 240.8 ; 268.9 ; 263.5 ; 262.2 ; 244.8 ; 279.6 ; 272.7 ; 278.7 ; 255.8 ; 276.1 ; 274.2 ; 267.4 ; 244.5 ; 252 ; 264 ; 247.7 ; 273.6 ; 264.5 ; 285.3 ; 277.8 ; 261.4 ; 253.6 ; 278.5 ; 260 ; 271.2 ; 254.8 ; 256.1 ; 264.5 ; 255.4 ; 259.5 ; 274.9 ; 272.1 ; 273.3 ; 279.3 ; 279.8 ; 272.8 ; 268.5 ; 283.7 ; 263.2 ; 257.5 ; 233.7 ; 260.2 ; 263.7 ; 244.3 ; 241.2 ; 254.4 ; 274 ; 260.7 ; 260.6 ; 255.1 ; 233.7 ; 253.7 ; 250.2 ; 251.4 ; 270.6 ; 273.4 ; 242.9 ; 276.6 ; 237.8 ; 261 ; 236 ; 251.8 ; 280.3 ; 268.3 ; 266.8 ; 254.5 ; 234.3 ; 251.6 ; 226.8 ; 240.5 ; 252.1 ; 245.6 ; 270.5. Our goal is to make meaningful statements about all the HIT-ME balls, using only this sample.
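As a quick sanity check, here is a minimal sketch computing the sample mean of the first ten measurements from the dataset above (the full list of 100 values is omitted for brevity; the same code applies to all of them):

```python
# First ten HIT-ME distance measurements (meters) from the slide;
# the remaining 90 values work identically.
distances = [261.3, 258.1, 254.2, 257.7, 237.9,
             255.8, 241.4, 256.8, 255.3, 255.0]

# The sample mean: sum of the observations divided by their count.
sample_mean = sum(distances) / len(distances)
print(sample_mean)  # 253.35 for these ten values
```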

The Golf Ball Example. Population: all HIT-ME golf balls. Sample: a small number of golf balls randomly taken from the population. What can we assume about the sample? Balls were selected randomly from the pool of ALL HIT-ME balls. All balls are produced by the same process and have similar properties. The population of all balls is characterized by some unknown parameters; these are our main object of interest.

A Simple Probabilistic Model. Definition: Random Sample. This forms the basis of our statistical model: we will assume that our dataset is a realization of these random variables. Important Notational Remark: we will use CAPITAL letters to denote random variables, and the corresponding lower-case letters to denote observed values!!!
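The definition box itself did not survive the transcription; the standard statement (consistent with the notational remark above) is:

```latex
% Random sample: n random variables that are independent and
% identically distributed (i.i.d.) with common distribution F.
X_1, X_2, \ldots, X_n \;\overset{\text{i.i.d.}}{\sim}\; F
```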

Tony Soprano's Casino. There is a bingo game in this Mafia casino where you win if a red ball comes out of the cage. You are told there is an equal number of red and blue balls in the cage, but you'd like to be sure (it seems you are losing way too often). In particular, the outcome of the game in the last 30 plays was: L, L, L, W, L, L, W, L, W, L, W, L, W, W, W, W, L, W, L, L, L, L, L, L, W, W, L, L, L, W. Question: what is a good statistical model in this case?

Tony Soprano's Casino. The last point is extremely important: it means the outcome of each round has no effect on the other rounds. In this model we don't know the value of p, and we would like to say something about it.
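Under this Bernoulli model, a natural first estimate of p is the sample proportion of wins. A minimal sketch using the 30 recorded plays from the slide:

```python
# The 30 recorded outcomes from the casino slide (W = win, L = loss).
outcomes = "L,L,L,W,L,L,W,L,W,L,W,L,W,W,W,W,L,W,L,L,L,L,L,L,W,W,L,L,L,W".split(",")

# Estimate p by the fraction of wins among the observed rounds.
wins = outcomes.count("W")
p_hat = wins / len(outcomes)
print(wins, p_hat)  # 12 wins out of 30, so p_hat = 0.4
```

Note that 0.4 is noticeably below the advertised 0.5, which is exactly why one would like tools to say how much evidence this provides.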

The Golf Ball Example. This is surprisingly similar to the previous example, if you think about it in the right way: imagine there is a HUGE bag with ALL the HIT-ME golf balls in the world (a huge number of them!). Now reach into the bag and take n balls chosen at random. Again, the last point is extremely important: it means that the distance of a golf ball has no effect on the distance of all the other balls.

A Simple Probabilistic Model. There are many situations that fit, in one way or another, this exact model: survey results (you ask a random selection of individuals); speed of a network link (measurements are noisy); temperature in a museum room (noisy measurements); many others (see book). Definition: Random Sample.

Statistic and Estimators Now that we have a model, what can we do with it? Definition: Statistic and Sampling Distribution Some statistics can be quite useful: Definition: Estimator and Estimate

Example Sample Mean. Let's try to characterize the sampling distribution of this estimator:


Example Sample Mean

Example Sample Mean. It is nice to see that the value we expect to get with this estimator is always around the true mean µ. Furthermore, the variance of the estimator becomes smaller and smaller as we have more samples!!! This is still not a full characterization of the sampling distribution; for that we need to know the exact distribution of X1, ..., Xn. You learned this in 2DI26!!! (Chapter 5 of Mont.)

Sample Mean Definition: Sample Mean Proposition:
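The definition and proposition boxes were not transcribed; the standard statements, where µ and σ² denote the population mean and variance, are:

```latex
% Sample mean, its expectation, and its variance for a random sample.
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}\!\left[\bar{X}\right] = \mu, \qquad
\operatorname{Var}\!\left(\bar{X}\right) = \frac{\sigma^2}{n}.
```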

The Central Limit Theorem. It turns out that the sampling distribution of the sample mean is approximately normal even if the distribution of the data is not normal!!! This is sort of the fundamental theorem of statistics: Theorem: The Central Limit Theorem (CLT). This is very powerful, as we'll see later on in the course.
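The theorem box was not transcribed; the usual statement is that for a random sample with mean µ and finite variance σ², as n → ∞,

```latex
% Standardized sample mean converges in distribution to a standard normal.
Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
  \;\xrightarrow{\;d\;}\; \mathcal{N}(0,1).
```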

The Central Limit Theorem - Examples. [Figure: histograms and normal Q-Q plots of simulated sample means (r.mean) for two examples, illustrating the approximate normality of the sampling distribution.]

The Central Limit Theorem - Remarks. The central limit theorem applies almost any time we compute averages and aggregate data. We will make extensive use of this result, although sometimes we will state consequences of the CLT without proof. Keep in mind that the CLT is an asymptotic result, which means that, for finite n, it only holds approximately. So, if you have only a small amount of data, it might not be adequate. Next: how do we construct good estimators? First we must state what we mean by a good estimator.
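The slide figures (produced in R) can be mimicked with a small simulation: averages of exponential draws, a clearly non-normal distribution, concentrate around the true mean with spread shrinking like 1/√n. A minimal sketch, with the rate, sample size, and repetition count chosen only for illustration:

```python
import random
import statistics

random.seed(0)
rate = 1.0   # exponential rate; true mean = 1/rate = 1, variance = 1
n = 50       # sample size per average
reps = 2000  # number of simulated sample means

# Simulate many sample means of n exponential draws each.
r_mean = [statistics.fmean(random.expovariate(rate) for _ in range(n))
          for _ in range(reps)]

center = statistics.fmean(r_mean)   # should be near the true mean 1
spread = statistics.stdev(r_mean)   # should be near 1/sqrt(50) ~ 0.141
print(center, spread)
```

A histogram of `r_mean` would look approximately normal, exactly as in the slide examples, even though each individual draw is exponential.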

Criteria for Evaluating Estimators. We would like to ensure the estimator is somewhat close to the true unknown parameter, therefore we need to devise a way to measure the distance between the two. Since the estimator is a function of the data, it is a random quantity; we must take this into account. To formalize these notions we need to state a number of properties that might be desirable for estimators to have, such as unbiasedness, small mean squared error, and low variance.

Measures of Error! Definition: Mean Squared Error. Clearly, if this is zero then the estimator is perfect. There are of course other ways of measuring the error of an estimator. We will focus mainly on the MSE because it makes our life easy, but it is worth pointing out that sometimes it is not the most adequate error metric!!!

Bias and Variance! Definition: Bias and Variance and Standard Error Definition: Unbiased Estimator For unbiased estimators, the value of the estimate is centered around the true unknown parameter

The Importance/Irrelevance of the Bias! Are biased estimators all that bad??? Bottom line: unbiasedness is a desirable, but not essential, property for a good estimator to have. It turns out the mean squared error can be easily written in terms of bias and variance:

Bias/Variance Decomposition! Theorem: Bias-Variance Decomposition.
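The statement and proof sketch were not transcribed; the standard result, obtained by writing the error as (estimator minus its expectation) plus (bias), where the cross term has expectation zero because the bias is not random, is:

```latex
% MSE splits into variance plus squared bias.
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta}-\theta)^2\right]
  = \operatorname{Var}(\hat{\theta})
  + \left(\operatorname{Bias}(\hat{\theta})\right)^{2},
\qquad
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta.
```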

Examples Sample Mean! Therefore this estimator is unbiased!!!

Examples Sample Mean! So the variance gets smaller and smaller as we have more data

Examples Variance Estimator!


Examples Variance Estimator! This estimator is biased. However, the bias gets smaller as the sample size increases. Moreover, we can easily remove the bias simply by multiplying the estimator by n/(n-1), giving rise to our familiar Definition: Sample Variance.
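The n/(n-1) correction is easy to see numerically. Python's `statistics` module exposes both estimators: `pvariance` divides by n (the biased estimator above) and `variance` divides by n-1 (the sample variance). A minimal sketch using the first five HIT-ME distances:

```python
import statistics

data = [261.3, 258.1, 254.2, 257.7, 237.9]  # five HIT-ME distances
n = len(data)

biased = statistics.pvariance(data)    # divides by n
unbiased = statistics.variance(data)   # divides by n - 1

# Multiplying the biased estimator by n/(n-1) recovers the sample variance.
corrected = biased * n / (n - 1)
print(biased, unbiased)
```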

Methods of Point Estimation! So far we have described useful properties estimators should have, but we haven't given any methodology to construct them. There are several ways to do this; in this course we'll describe only two possibilities: Method of Moments and Maximum Likelihood.

Method of Moments. A very simple but effective idea: equate the sample moments (e.g. the sample mean and variance) to the population moments (e.g. the true mean and variance)! This yields a series of equations relating the unknown parameters to the data. Definition: k-th sample and population moments.
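The definition box was not transcribed; the standard definitions of the k-th population and sample moments are:

```latex
% k-th population moment and k-th sample moment.
\mu_k' = \mathbb{E}\!\left[X^{k}\right], \qquad
M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^{k}, \qquad k = 1, 2, \ldots
```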

Method of Moments. The right-hand side is a function of the unknown parameter(s), so we can equate it with the left-hand side and solve the resulting system of equations for the unknown parameter(s). If the parameter space is d-dimensional we will typically need at least d equations. Definition: Method of Moments Estimator.
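A minimal sketch of the recipe for an Exponential(λ) model, where the first population moment is E[X] = 1/λ; equating it to the first sample moment and solving gives λ̂ = 1/x̄. The data values below are made up purely for illustration:

```python
# Made-up illustration data, imagined as draws from an Exponential(lam).
data = [0.8, 1.3, 0.2, 2.1, 0.6, 1.0]

x_bar = sum(data) / len(data)  # first sample moment (sample mean)
lam_hat = 1.0 / x_bar          # solve E[X] = 1/lam = x_bar for lam
print(lam_hat)
```

With one unknown parameter, one moment equation suffices; a two-parameter model (e.g. normal with unknown mean and variance) would use the first two moments.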

Examples


Examples Home exercise: Derive the results in the previous slide and the above formulas

Maximum Likelihood Estimation. This is one of the most common ways of constructing estimators, and perhaps the best understood. Furthermore, these estimators are endowed with many desirable qualities. The main assumption is that the data density/p.m.f. has a known parametric form with unknown parameter(s). Definition: Maximum Likelihood Estimator.

Interpretation. For discrete random variables the interpretation of the maximum likelihood principle is quite easy: the likelihood function is the probability of observing the realized data if the random sample were drawn from the distribution with the given parameter value. So the maximum likelihood estimator chooses the parameter that maximizes the probability/likelihood of the observed data.
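This interpretation can be made concrete for the Bernoulli model of the casino data (12 wins in 30 rounds): the log-likelihood is ℓ(p) = k·log(p) + (n-k)·log(1-p), and a crude grid search shows it is maximized at the sample proportion k/n, the standard MLE for a Bernoulli proportion:

```python
import math

k, n = 12, 30  # wins and rounds from the casino example

def log_likelihood(p):
    # Bernoulli log-likelihood for k successes in n independent trials.
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Evaluate on a grid of p values strictly inside (0, 1) and take the best.
grid = [i / 1000 for i in range(1, 1000)]
p_best = max(grid, key=log_likelihood)
print(p_best)  # 0.4, i.e. k/n
```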

Example

Example Home exercise: check that the method of moments yields the same estimator in this case.

Example Soprano's Casino

Example

General Approach to Compute MLEs. Definition: likelihood, log-likelihood and score.
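The definition box was not transcribed; for i.i.d. data with density/p.m.f. f(x; θ), the standard definitions are:

```latex
% Likelihood, log-likelihood, and score for an i.i.d. sample.
L(\theta) = \prod_{i=1}^{n} f(x_i;\theta), \qquad
\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i;\theta), \qquad
S(\theta) = \frac{\partial}{\partial\theta}\,\ell(\theta).
```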

General Approach to Compute MLEs. Provided the likelihood is positive and differentiable, the general approach to compute MLEs is: (1) compute the likelihood function; (2) compute the log-likelihood and the score; (3) identify the local extremes of the likelihood by equating the score to zero; (4) identify the extreme corresponding to the global maximum. Attention: sometimes you cannot compute the log-likelihood or the score (e.g. see the example of the uniform distribution). Sometimes identifying the zeros of the score is intractable; in that case one must resort to numerical methods (e.g. gradient ascent).
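These steps can be sketched for an Exponential(λ) sample, where the score S(λ) = n/λ - Σxᵢ has the unique zero λ = 1/x̄. When no closed form exists, the same zero can be found numerically; here simple bisection (standing in for the numerical methods mentioned above) recovers the analytic answer. The data values are made up for illustration:

```python
data = [0.8, 1.3, 0.2, 2.1, 0.6, 1.0]  # made-up illustration values
n, s = len(data), sum(data)

def score(lam):
    # Score of the Exponential(lam) log-likelihood: d/dlam of
    # n*log(lam) - lam*sum(x)  =  n/lam - sum(x).
    return n / lam - s

# The score is decreasing in lam, so bisect on a bracketing interval.
lo, hi = 1e-6, 100.0
for _ in range(200):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid  # root lies above mid
    else:
        hi = mid  # root lies at or below mid

lam_mle = (lo + hi) / 2
print(lam_mle)  # matches the closed form n / sum(data), here ~ 1.0
```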

Examples!

Properties of the MLE! Proposition: Invariance Property Theorem: (informally stated):
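The invariance property says that if λ̂ is the MLE of λ, then g(λ̂) is the MLE of g(λ) for any function g. A minimal sketch for a Poisson model, where the MLE of the rate is the sample mean and, by invariance, the MLE of P(X = 0) = e^(-λ) is e^(-λ̂); the counts below are made up for illustration:

```python
import math

counts = [2, 0, 3, 1, 2, 4, 0, 2, 1, 3]  # hypothetical Poisson counts

lam_hat = sum(counts) / len(counts)  # MLE of the Poisson rate lam
p0_hat = math.exp(-lam_hat)          # MLE of P(X = 0), by invariance
print(lam_hat, p0_hat)
```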

Examples!


Examples! Unlike in the normal data case, correcting for the bias is in this case very advantageous!!! So it all depends on the balance between bias and variance.

Example Beyond the Random Sample!

Examples!
