Anomaly detection for Big Data, networks and cyber-security

Size: px
Start display at page:

Download "Anomaly detection for Big Data, networks and cyber-security"

Transcription

1 Anomaly detection for Big Data, networks and cyber-security Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with Nick Heard (Imperial College London), Dan Lawson (University of Bristol), Niall Adams (Imperial College London) and Melissa Turcotte (Los Alamos National Laboratory) Dynamic Networks and Cyber-security 24th June 2015

2

3 Netflow Server IP address:port 10:53 10:771 11:148 11:5353 1:80 2:80 3:80 4:443 5:443 6:443 7:2049 8:389 9:53 9:771 00:00 01:00 02:00 03:00 04:00 05:00 Time (minutes:seconds) Figure : Five minutes of NetFlow, generated by a computer on the Imperial College London network.

4 Sessionization Server IP address:port 10:53 10:771 11:148 11:5353 1:80 2:80 3:80 4:443 5:443 6:443 7:2049 8:389 9:53 9:771 00:00 01:00 02:00 03:00 04:00 05:00 Time (minutes:seconds) Figure : Sessionization of Netflow Bayesian temporal clustering

5 Filtering Probability Probability Time of day (hours) Time

6 Point process modelling λ(t) Time Figure : Non-parametric Bayesian intensity estimation Enron data

7 De Finetti s representation theorem Let X 1, X 2,... be an infinite exchangeable sequence of Bernoulli variables. Then X i are independent conditional on a common success probability, p, where p is a random variable on [0, 1].

8 Aldous-Hoover theorem Let G be an infinite undirected exchangeable graph. Then each edge i j is a conditionally independent Bernoulli variable with success probability f (u i, u j ) where u 1, u 2,... are independent uniform random variables on [0, 1]. f is a random symmetric function from [0, 1] 2 [0, 1] (called a graphon).

9 Block approximation a) Graphon b) Block approximation (coarse) c) Block approximation (fine) Figure : A graphon and its block approximations

10 Stochastic block model Partitioning of nodes into communities represented a vector of labels l Marginal likelihood of the graph where P(G l) = k l 1 0 p n kl (1 p) o kl n kl df (p), 1 n kl number of edges between communities k and l 2 o kl number of possible edges 3 F prior distribution function on edge probability (assuming an IID prior)

11 Anomaly detection with complex models Data: D Anomaly: f (D) P-value: p = P{f (D ) f (D) D}

12 Anomaly detection with complex models Data: D Model: D θ Unknowns: θ Anomaly: f (D, θ) P-value: p = P{f (D, θ) f (D, θ) D}

13 Anomaly detection with complex models Data: D Model: D θ Unknowns: θ Anomaly: f (D, θ) P-value: p = P{f (D, θ) f (D, θ) D} Conservative p-value p + = sup θ (p)

14 Anomaly detection with complex models Data: D Model: D θ Unknowns: θ Anomaly: f (D, θ) P-value: p = P{f (D, θ) f (D, θ) D} Expected p-value p = E θ (p)

15 Posterior predictive p-value The posterior predictive p-value is (Meng, 1994; Gelman et al., 1996, Eq. 2.8, Eq. 7) P = P{f (D, θ) f (D, θ) D}, (1) where θ represents the model parameters, D is the observed dataset, D is a hypothetical replicated dataset generated from the model with parameters θ, and P( D) is the joint posterior distribution of (θ, D ) given D. In words: if a new dataset were generated from the same model and parameters, what is the probability that the new discrepancy would be as large? Discussion: Guttman (1967), Box (1980), Rubin (1984), Bayarri and Berger (2000), Hjort et al. (2006) Posterior predictive p-values in practice: Huelsenbeck et al. (2001), Sinharay and Stern (2003), Thornton and Andolfatto (2006), Steinbakk and Storvik (2009)

16 Two estimates Posterior sample: θ 1,..., θ n from θ D Estimate I: 1 Simulate data D1,..., D n. 2 Estimate ˆP = 1 n I{f (Di, θ i ) f (D, θ i )}. n i=1 Estimate II: 1 Calculate p-values Q i = P{f (D, θ i ) f (D, θ i )}, for i = 1,..., n. 2 Estimate ˆP = 1 n n Q i. i=1

17 Two estimates Posterior sample: θ 1,..., θ n from θ D Estimate I: 1 Simulate data D1,..., D n. 2 Estimate ˆP = 1 n I{f (Di, θ i ) f (D, θ i )}. n i=1 Estimate II: 1 Calculate p-values Q i = P{f (D, θ i ) f (D, θ i )}, for i = 1,..., n. 2 Estimate ˆP = 1 n n Q i. i=1

18 Interpreting posterior predictive p-values Recall: P = P{f (D, θ) f (D, θ) D}. How should we interpret P? For example: Is P = 40% good? If P = 10 6, is there cause for alarm? What if 500 tests are performed, and min(p i ) = 10 6? Some comments: The interpretation and comparison of posterior predictive p-values [is] a difficult and risky matter (Hjort et al., 2006) Its main weakness is that there is an apparent double use of the data...this double use of the data can induce unnatural behavior (Bayarri and Berger, 2000) Donald Rubin alludes to some conservative operating characteristics (Rubin, 1996) The basic problem is: P is not uniformly distributed under the null hypothesis.

19 Interpreting posterior predictive p-values Recall: P = P{f (D, θ) f (D, θ) D}. How should we interpret P? For example: Is P = 40% good? If P = 10 6, is there cause for alarm? What if 500 tests are performed, and min(p i ) = 10 6? Some comments: The interpretation and comparison of posterior predictive p-values [is] a difficult and risky matter (Hjort et al., 2006) Its main weakness is that there is an apparent double use of the data...this double use of the data can induce unnatural behavior (Bayarri and Berger, 2000) Donald Rubin alludes to some conservative operating characteristics (Rubin, 1996) The basic problem is: P is not uniformly distributed under the null hypothesis.

20 Meng s result In the last pages of Meng (1994): In our notation: suppose θ and D are drawn from the prior and model respectively, and f (D, θ) is absolutely continuous. Then, (i) E(P) = E(U) = 1/2 and (ii) for any convex function h, E{h(P)} E{h(U)}, where U is a uniform random variable on [0, 1].

21 A consequence Meng (1994) next finds:

22 The convex order Let X and Y be two random variables with probability measures µ and ν respectively. We say that µ cx ν if, for any convex function h, E{h(X )} E{h(Y )}, whenever the expectations exist (Shaked and Shanthikumar, 2007). We say that a probability measure P is sub-uniform if P cx U, where U is a uniform distribution on [0, 1]. Meng s theorem says that posterior predictive p-values have a sub-uniform distribution.

23 Reflecting on the theorem 1 Proof is in fact quite straightforward (Jensen s inequality) 2 Is there something more we can say? 1 The 2α bound suggests that P could be liberal, whereas we had the impression they were conservative. Can the bound be improved? 2 The theorem seems to be describing a huge space of distributions. Can it somehow be reduced?

24 On taking the average Theorem (Strassen s theorem) For two probability measures µ and ν on the real line the following conditions are equivalent: 1 µ cx ν; 2 there are random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X. Theorem due to Strassen (1965) (see also references therein), this version due to Müller and Rüschendorf (2001). In other words: Taking an average is reducing in the convex order

25 On taking the average Theorem (Strassen s theorem) For two probability measures µ and ν on the real line the following conditions are equivalent: 1 µ cx ν; 2 there are random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X. Theorem due to Strassen (1965) (see also references therein), this version due to Müller and Rüschendorf (2001). In other words: Taking an average is reducing in the convex order

26 Theorem Let µ and ν be two probability measures on the real line where ν is absolutely continuous. The following conditions are equivalent: 1 µ cx ν; 2 there exist random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X and the random variable Y X is either singular, i.e. Y = X, or absolutely continuous with µ-probability one. Theorem (Posterior predictive p-values and the convex order) P is a sub-uniform probability measure if and only if there exist random variables P, D, θ and an absolutely continuous discrepancy f (D, θ) such that P = P{f (D, θ) f (D, θ) D}, where P has measure P, D is a replicate of D conditional on θ and P( D) is the joint posterior distribution of (θ, D ) given D.

27 Theorem Let µ and ν be two probability measures on the real line where ν is absolutely continuous. The following conditions are equivalent: 1 µ cx ν; 2 there exist random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X and the random variable Y X is either singular, i.e. Y = X, or absolutely continuous with µ-probability one. Theorem (Posterior predictive p-values and the convex order) P is a sub-uniform probability measure if and only if there exist random variables P, D, θ and an absolutely continuous discrepancy f (D, θ) such that P = P{f (D, θ) f (D, θ) D}, where P has measure P, D is a replicate of D conditional on θ and P( D) is the joint posterior distribution of (θ, D ) given D.

28 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.

29 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.

30 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.

31 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.

32 ψ(t) density t t Figure : Uniform distribution

33 ψ(t) density t t Figure : Some sub-uniform distributions. Blue: uniform, green: Beta(2,2)

34 ψ(t) density t t Figure : Some sub-uniform distributions. Blue: uniform, green: Beta(2,2), red: a distribution with maximal mass at.3

35 a) Model 1 b) Model 1 discrepancy X(0) θ = 1 θ = 0 1 2α X(1) 2α f(x,θ) f(x,1) f(x,0) x c) Model 2 d) Model 2 discrepancy p(x θ) p(x θ = 0) p(x θ = 1) f(x,θ) f(x,0) f(x,1) x x Figure : Some constructive examples

36 Combining p-values Independent p-values P 1,..., P n. Two canonical tasks: 1 sub-select a set for further analysis 2 combine the p-values into one overall score of significance

37 Discrete p-values Ordinary p-value (T = f (D)): P = P(T T T ), Mid-p-value (Lancaster, 1952): Q = 1 2 P(T T T ) P(T > T T ), Lemma (Fisher s method for mid-p-values) Under the null hypothesis, 1 Q cx U, where U is a uniform random variable on the unit interval 2 As a consequence, for t 2n, where F n = 2 n i=1 log(q i) P(F n t) exp{n t/2 n log(2n/t)},

38 Discrete p-values Ordinary p-value (T = f (D)): P = P(T T T ), Mid-p-value (Lancaster, 1952): Q = 1 2 P(T T T ) P(T > T T ), Lemma (Fisher s method for mid-p-values) Under the null hypothesis, 1 Q cx U, where U is a uniform random variable on the unit interval 2 As a consequence, for t 2n, where F n = 2 n i=1 log(q i) P(F n t) exp{n t/2 n log(2n/t)},

39 Discrete p-values Ordinary p-value (T = f (D)): P = P(T T T ), Mid-p-value (Lancaster, 1952): Q = 1 2 P(T T T ) P(T > T T ), Lemma (Fisher s method for mid-p-values) Under the null hypothesis, 1 Q cx U, where U is a uniform random variable on the unit interval 2 As a consequence, for t 2n, where F n = 2 n i=1 log(q i) P(F n t) exp{n t/2 n log(2n/t)},

40 50/50 Random binary Grid of ten n=10, β= 20 0 F^(x) 1 0 F^(x) 1 0 F^(x) 1 0 x 1 0 x 1 0 x 1 n=100, β= 5 0 F^(x) 1 0 F^(x) 1 0 F^(x) 1 ordinary mid p value y=x 0 x 1 0 x 1 0 x 1 Figure : Fisher s method with mid-p-values

41 Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4): Bayarri, M. and Berger, J. O. (2000). P values for composite null models. Journal of the American Statistical Association, 95(452): Box, G. E. (1980). Sampling and Bayes inference in scientific modelling and robustness. Journal of the Royal Statistical Society. Series A (General), pages Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4): Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society. Series B (Methodological), pages Hjort, N. L., Dahl, F. A., and Steinbakk, G. H. (2006). Post-processing posterior predictive p values. Journal of the American Statistical Association, 101(475): Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American statistical association, 58(301): Hoover, D. N. (1979). Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 2. Huelsenbeck, J. P., Ronquist, F., Nielsen, R., and Bollback, J. P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294(5550): Lancaster, H. (1952). Statistical control of counting experiments. Biometrika, pages Lehmann, E. L. and Romano, J. P. (2006). Testing statistical hypotheses. Springer Science & Business Media. Meng, X.-L. (1994). Posterior predictive p-values. The Annals of Statistics, 22(3): Müller, A. and Rüschendorf, L. (2001). On the optimal stopping values induced by general dependence structures. Journal of applied probability, 38(3): Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4): Rubin, D. B. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4): Discussion of Gelman, Meng and Stern. Rubin-Delanchy, P. and Heard, N. A. (2014). A test for dependence between two point processes on the real line. arxiv preprint arxiv: Rubin-Delanchy, P. and Lawson, D. J. (2014). Posterior predictive p-values and the convex order. arxiv preprint arxiv: Shaked, M. and Shanthikumar, J. G. (2007). Stochastic orders. Springer. Sinharay, S. and Stern, H. S. (2003). Posterior predictive model checking in hierarchical models. Journal of Statistical Planning and Inference, 111(1): Steinbakk, G. H. and Storvik, G. O. (2009). Posterior predictive p-values in Bayesian hierarchical models. Scandinavian Journal of Statistics, 36(2): Strassen, V. (1965). The existence of probability measures with given marginals. The Annals of Mathematical Statistics, 36(2): Thornton, K. and Andolfatto, P. (2006). Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a netherlands population of drosophila melanogaster. Genetics, 172(3):

Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Combining Weak Statistical Evidence in Cyber Security

Combining Weak Statistical Evidence in Cyber Security Combining Weak Statistical Evidence in Cyber Security Nick Heard Department of Mathematics, Imperial College London; Heilbronn Institute for Mathematical Research, University of Bristol Intelligent Data

More information

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012 Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic

More information

Finding statistical patterns in Big Data

Finding statistical patterns in Big Data Finding statistical patterns in Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research IAS Research Workshop: Data science for the real world (workshop 1)

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i. Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

More information

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: [email protected]

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

Gambling Systems and Multiplication-Invariant Measures

Gambling Systems and Multiplication-Invariant Measures Gambling Systems and Multiplication-Invariant Measures by Jeffrey S. Rosenthal* and Peter O. Schwartz** (May 28, 997.. Introduction. This short paper describes a surprising connection between two previously

More information

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

More information

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior

More information

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

More information

5 Directed acyclic graphs

5 Directed acyclic graphs 5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate

More information

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab

Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?

More information

Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception

More information

Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

More information

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.

Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015. Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x

More information

The Chinese Restaurant Process

The Chinese Restaurant Process COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.

More information

People have thought about, and defined, probability in different ways. important to note the consequences of the definition:

People have thought about, and defined, probability in different ways. important to note the consequences of the definition: PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models

More information

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

More information

Lecture 8. Confidence intervals and the central limit theorem

Lecture 8. Confidence intervals and the central limit theorem Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of

More information

1 Sufficient statistics

1 Sufficient statistics 1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

Dongfeng Li. Autumn 2010

Dongfeng Li. Autumn 2010 Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis

More information

Bayesian Analysis for the Social Sciences

Bayesian Analysis for the Social Sciences Bayesian Analysis for the Social Sciences Simon Jackman Stanford University http://jackman.stanford.edu/bass November 9, 2012 Simon Jackman (Stanford) Bayesian Analysis for the Social Sciences November

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Likelihood Approaches for Trial Designs in Early Phase Oncology

Likelihood Approaches for Trial Designs in Early Phase Oncology Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University

More information

Random graphs with a given degree sequence

Random graphs with a given degree sequence Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935) Section 7.1 Introduction to Hypothesis Testing Schrodinger s cat quantum mechanics thought experiment (1935) Statistical Hypotheses A statistical hypothesis is a claim about a population. Null hypothesis

More information

False Discovery Rates

False Discovery Rates False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

The Variability of P-Values. Summary

The Variability of P-Values. Summary The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Notes on Continuous Random Variables

Notes on Continuous Random Variables Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Lecture 4: BK inequality 27th August and 6th September, 2007

Lecture 4: BK inequality 27th August and 6th September, 2007 CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia

Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia Estimating the Degree of Activity of jumps in High Frequency Financial Data joint with Yacine Aït-Sahalia Aim and setting An underlying process X = (X t ) t 0, observed at equally spaced discrete times

More information

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers

More information

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails

A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails 12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint

More information

Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring. Jie-Men Mok Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

More information

THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM TORONTO THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

More information

Bayesian Statistics: Indian Buffet Process

Bayesian Statistics: Indian Buffet Process Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

More information

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Permutation Tests for Comparing Two Populations

Permutation Tests for Comparing Two Populations Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of

More information

How to Gamble If You Must

How to Gamble If You Must How to Gamble If You Must Kyle Siegrist Department of Mathematical Sciences University of Alabama in Huntsville Abstract In red and black, a player bets, at even stakes, on a sequence of independent games

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

Big Data, Statistics, and the Internet

Big Data, Statistics, and the Internet Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes

More information

Lecture 13: Martingales

Lecture 13: Martingales Lecture 13: Martingales 1. Definition of a Martingale 1.1 Filtrations 1.2 Definition of a martingale and its basic properties 1.3 Sums of independent random variables and related models 1.4 Products of

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

Model-based Synthesis. Tony O Hagan

Model-based Synthesis. Tony O Hagan Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

More information

Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

An Introduction to Basic Statistics and Probability

An Introduction to Basic Statistics and Probability An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

5.1 Identifying the Target Parameter

5.1 Identifying the Target Parameter University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.

Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié. Random access protocols for channel access Markov chains and their stability [email protected] Aloha: the first random access protocol for channel access [Abramson, Hawaii 70] Goal: allow machines

More information

Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in

More information

Lecture 7: Continuous Random Variables

Lecture 7: Continuous Random Variables Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Gambling and Data Compression

Gambling and Data Compression Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction

More information

Probability Generating Functions

Probability Generating Functions page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1 Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Week 1: Introduction to Online Learning

Week 1: Introduction to Online Learning Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider

More information

Towards running complex models on big data

Towards running complex models on big data Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

More information

BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBERT SPACE REVIEW BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

More information

Convex Hull Probability Depth: first results

Convex Hull Probability Depth: first results Conve Hull Probability Depth: first results Giovanni C. Porzio and Giancarlo Ragozini Abstract In this work, we present a new depth function, the conve hull probability depth, that is based on the conve

More information

Chapter 2. Hypothesis testing in one population

Chapter 2. Hypothesis testing in one population Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

1 Norms and Vector Spaces

1 Norms and Vector Spaces 008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)

More information

A Statistical Framework for Operational Infrasound Monitoring

A Statistical Framework for Operational Infrasound Monitoring A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LA-UR 11-03040 The views expressed here do not necessarily reflect the views of the United States Government,

More information

Joint Exam 1/P Sample Exam 1

Joint Exam 1/P Sample Exam 1 Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question

More information

Wald s Identity. by Jeffery Hein. Dartmouth College, Math 100

Wald s Identity. by Jeffery Hein. Dartmouth College, Math 100 Wald s Identity by Jeffery Hein Dartmouth College, Math 100 1. Introduction Given random variables X 1, X 2, X 3,... with common finite mean and a stopping rule τ which may depend upon the given sequence,

More information

Supplement to Call Centers with Delay Information: Models and Insights

Supplement to Call Centers with Delay Information: Models and Insights Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Separation Properties for Locally Convex Cones

Separation Properties for Locally Convex Cones Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam

More information