Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1



Similar documents
An Introduction to Basic Statistics and Probability

BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

Bayesian Analysis for the Social Sciences

STA 256: Statistics and Probability I

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Lecture 6: Discrete & Continuous Probability and Random Variables

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

Basics of Statistical Machine Learning

Chapter 4 Lecture Notes

Nonparametric adaptive age replacement with a one-cycle criterion

VISUALIZATION OF DENSITY FUNCTIONS WITH GEOGEBRA

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Probability and statistics; Rehearsal for pattern recognition

Chapter 4 - Lecture 1 Probability Density Functions and Cumul. Distribution Functions

4. Continuous Random Variables, the Pareto and Normal Distributions

Probability and Statistics Vocabulary List (Definitions for Middle School Teachers)

5. Continuous Random Variables

Important Probability Distributions OPRE 6301

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

Definition: Suppose that two random variables, either continuous or discrete, X and Y have joint density

COMMON CORE STATE STANDARDS FOR

Question: What is the probability that a five-card poker hand contains a flush, that is, five cards of the same suit?

Joint Exam 1/P Sample Exam 1

Chapter 5. Random variables

A Tutorial on Probability Theory

What is the purpose of this document? What is in the document? How do I send Feedback?

Exploratory Data Analysis

Statistics in Geophysics: Introduction and Probability Theory

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Pr(X = x) = f(x) = λe λx

DECISION MAKING UNDER UNCERTAINTY:

Notes on Continuous Random Variables

Homework 8 Solutions

MAS108 Probability I

Measures of Central Tendency and Variability: Summarizing your Data for Others

Introduction to Probability

Statistics: Descriptive Statistics & Probability

Probability Distribution for Discrete Random Variables

CA200 Quantitative Analysis for Business Decisions. File name: CA200_Section_04A_StatisticsIntroduction

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

1 Prior Probability and Posterior Probability

Bayes Theorem. Bayes Theorem- Example. Evaluation of Medical Screening Procedure. Evaluation of Medical Screening Procedure

1. A survey of a group s viewing habits over the last year revealed the following

MEASURES OF VARIATION

SCHOOL OF ENGINEERING & BUILT ENVIRONMENT. Mathematics

Machine Learning.

Probability & Probability Distributions

Master s Theory Exam Spring 2006

How To Write A Data Analysis

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

STA 4273H: Statistical Machine Learning

3. Data Analysis, Statistics, and Probability

Foundation of Quantitative Data Analysis

Bayes and Naïve Bayes. cs534-machine Learning

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Math 425 (Fall 08) Solutions Midterm 2 November 6, 2008

V. RANDOM VARIABLES, PROBABILITY DISTRIBUTIONS, EXPECTED VALUE

MBA 611 STATISTICS AND QUANTITATIVE METHODS

South Carolina College- and Career-Ready (SCCCR) Probability and Statistics

Fairfield Public Schools

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Simple Decision Making

The Basics of Graphical Models

EMPIRICAL FREQUENCY DISTRIBUTION

MAS131: Introduction to Probability and Statistics Semester 1: Introduction to Probability Lecturer: Dr D J Wilkinson

Lecture 3: Continuous distributions, expected value & mean, variance, the normal distribution

Section 6.1 Joint Distribution Functions

1. Let A, B and C are three events such that P(A) = 0.45, P(B) = 0.30, P(C) = 0.35,

Exam 3 Review/WIR 9 These problems will be started in class on April 7 and continued on April 8 at the WIR.

ST 371 (IV): Discrete Random Variables

Chapter 6: Point Estimation. Fall Probability & Statistics

Recursive Estimation

Random Variables. Chapter 2. Random Variables 1

ECE302 Spring 2006 HW5 Solutions February 21,

Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur

Math 370, Spring 2008 Prof. A.J. Hildebrand. Practice Test 1 Solutions

Probability for Estimation (review)

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

The Normal Distribution. Alan T. Arnholt Department of Mathematical Sciences Appalachian State University

ATM 552 Notes: Review of Statistics Page 1

Final Mathematics 5010, Section 1, Fall 2004 Instructor: D.A. Levin

Definition and Calculus of Probability

Multivariate Normal Distribution

Basic Probability. Probability: The part of Mathematics devoted to quantify uncertainty

What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

STA-201-TE. 5. Measures of relationship: correlation (5%) Correlation coefficient; Pearson r; correlation and causation; proportion of common variance

**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.

Christfried Webers. Canberra February June 2015

Chapter 4. iclicker Question 4.4 Pre-lecture. Part 2. Binomial Distribution. J.C. Wang. iclicker Question 4.4 Pre-lecture

3: Summary Statistics

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

STATISTICS 8: CHAPTERS 7 TO 10, SAMPLE MULTIPLE CHOICE QUESTIONS

UNIT I: RANDOM VARIABLES PART- A -TWO MARKS

Descriptive Statistics. Purpose of descriptive statistics Frequency distributions Measures of central tendency Measures of dispersion

E3: PROBABILITY AND STATISTICS lecture notes

STA 371G: Statistics and Modeling

Transcription:

Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1

Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields Probability Mainly a field of theoretical mathematics Deals with predicting the likelihood of events given a set of assumptions (e.g. a distribution) Statistics Field of applied mathematics Deals with the collection, analysis, and interpretation of data Manfred Huber 2011 2

Statistics Statistics deals with real world data Data collection How do we have to collect the data to get valid results Analysis of data What properties does the data have What distribution does it come from Interpretation of the data Is it different from other data What could cause issues in the data Manfred Huber 2011 3

Probability Probability is a formal framework to model likelihoods mainly used to make predictions There are two main (interchangeable) views of probability Subjective / uncertainty view: Probabilities summarize the effects of uncertainty on the state of knowledge In Bayesian probability all types of uncertainty are combined in one number Frequency view: Probabilities represent relative frequencies of events P(e) = (# of times of event e) / (# of events) Manfred Huber 2011 4

Data Modeling & Analysis Techniques Probability Theory Manfred Huber 2011 5

Probability Random variables define the entities of probability theory Propositional random variables: E.g.: IsRed, Earthquake Multivalued random variables: E.g.: Event, Color, Weather Real-Valued random variables: E.g.: Height, Weight Manfred Huber 2011 6

Axioms of Probability Probability follows a fixed set of rules Propositional random variables: P(A)![ 0..1] P(T ) = 1, P(F) = 0 P(A! B) = P(A) + P(B) " P(A # B) P(A! B) = P(A)P(B A) " P(X = x) = 1 x!{t,f} Manfred Huber 2011 7

Axioms of Probability The same axioms apply to multi-valued and continuous random variables Multi-valued variables ( X!S = {x 1,..., x N };Y,Z " S ): P(Y )![ 0..1] P(Y = S) = 1, P(Y =!) = 0 P(Y! Z) = P(Y ) + P(Z) " P(Y # Z) P(Y! Z) = P(Y )P(Z Y ) " P(X = x) = 1 x!s Manfred Huber 2011 8

Continuous Random Variables The probability of continuous random variables requires additional tools The probability of a continuous random variable to take on a specific value is often 0 for all (or almost all) possible values Probability has to be defined over ranges of values P(a! X! b) Individual assignments to random variables have to be addressed using probability densities p(x = x)![ 0.." ] Manfred Huber 2011 9

Continuous Random Variables Probability density is a measure of the increase in likelihood when adding the corresponding value to the range of values b P(a! X! b) = p(x = x)dx " a The probability density is effectively the derivative of the cumulative probability distribution p(x = x) = df(x) ; F(x) = P(!" # X # x) dx Manfred Huber 2011 10

Probability Syntax Unconditional or prior probabilities represent the state of knowledge before new observations or evidence P(H) A probability distribution gives values for all possible assignments to a random variable A joint probability distribution gives values for all possible assignments to all random variables Manfred Huber 2011 11

Conditional Probability Conditional probabilities represent the probability after certain observations or facts have been considered P(H E) is the posterior probability of H after evidence E is taken into account Bayes rule allows to derive posterior probabilities from prior probabilities P(H E) = P(E H) P(H)/P(E) Manfred Huber 2011 12

Conditional Probability Probability calculations can be conditioned by conditioning all terms Often it is easier to find conditional probabilities Conditions can be removed by marginalization P(H ) =! E P(H E)P(E) Manfred Huber 2011 13

Joint Distributions A joint distribution defines the probability values for all possible assignments to all random variables Exponential in the number of random variables Conditional probabilities can be computed from a joint probability distribution P(A B) = P(A! B) / P(B) Manfred Huber 2011 14

Inference Inference in probabilistic representation involves the computation of (conditional) probabilities from the available information Most frequently the computation of a posterior probability P(H E) form a prior probability P(H) and new evidence E Manfred Huber 2011 15

Statistics Statistics attempt to represent the important characteristics of a set of data items (or of a probability distribution) and the uncertainty contained in the set (or the distribution). Statistics represent different attributes of the probability distribution represented by the data Statistics are aimed at making it possible to analyze the data based on its important characteristics Manfred Huber 2011 16

Experiment and Sample Space A (random) experiment is a procedure that has a number of possible outcomes and it is not certain which one will occur The sample space is the set of all possible outcomes of an experiment (often denoted by S). Examples: Coin : S={H, T} Two coins: S={HH, HT, TH, TT} Lifetime of a system: S={0.. } Manfred Huber 2011 17

Statistics A number of important statistics can be used to characterize a data set (or a population from which the data items are drawn) Mean Median Mode Variance Standard deviation Manfred Huber 2011 18

Mean The arithmetic mean represents the average value of data set {X i } µ N i= = 1 N X i The arithmetic mean is the expected value of a random variable, i.e. the expected value of a data item drawn at random from a population µ = E[X ] µ Manfred Huber 2011 19

Median and Mode The median m is the middle of a distribution { X X m} = { X X m} i i The mode of a distribution is the most frequently (i.e. most likely) value i i Manfred Huber 2011 20

Variance and Standard Deviation The variance represents the spread of a distribution N In a data set {X i } an unbiased estimate s 2 for the variance can be calculated as N N-1 is often called the number of degrees of freedom of the data set The standard deviation! is the square root of the variance! 2 σ s N 2 i= = 1 2 = i= 1 ( X µ) ( X i i In the case of a sample set, s is often referred to as standard error 2 X ) N 1 2 Manfred Huber 2011 21

Moments Moments are important to characterize distributions r th moment: E "( x! a) r $ # % Important moments: Mean: Variance: Skewness: "( ) 1 E # x! 0 $ % E # x! µ "( ) 2 E # x! µ $ % "( ) 3 $ % Manfred Huber 2011 22