Anomaly detection for Big Data, networks and cyber-security
|
|
|
- Thomasine Cummings
- 9 years ago
- Views:
Transcription
1 Anomaly detection for Big Data, networks and cyber-security Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with Nick Heard (Imperial College London), Dan Lawson (University of Bristol), Niall Adams (Imperial College London) and Melissa Turcotte (Los Alamos National Laboratory) Dynamic Networks and Cyber-security 24th June 2015
2
3 Netflow Server IP address:port 10:53 10:771 11:148 11:5353 1:80 2:80 3:80 4:443 5:443 6:443 7:2049 8:389 9:53 9:771 00:00 01:00 02:00 03:00 04:00 05:00 Time (minutes:seconds) Figure : Five minutes of NetFlow, generated by a computer on the Imperial College London network.
4 Sessionization Server IP address:port 10:53 10:771 11:148 11:5353 1:80 2:80 3:80 4:443 5:443 6:443 7:2049 8:389 9:53 9:771 00:00 01:00 02:00 03:00 04:00 05:00 Time (minutes:seconds) Figure : Sessionization of Netflow Bayesian temporal clustering
5 Filtering Probability Probability Time of day (hours) Time
6 Point process modelling λ(t) Time Figure : Non-parametric Bayesian intensity estimation Enron data
7 De Finetti s representation theorem Let X 1, X 2,... be an infinite exchangeable sequence of Bernoulli variables. Then X i are independent conditional on a common success probability, p, where p is a random variable on [0, 1].
8 Aldous-Hoover theorem Let G be an infinite undirected exchangeable graph. Then each edge i j is a conditionally independent Bernoulli variable with success probability f (u i, u j ) where u 1, u 2,... are independent uniform random variables on [0, 1]. f is a random symmetric function from [0, 1] 2 [0, 1] (called a graphon).
9 Block approximation a) Graphon b) Block approximation (coarse) c) Block approximation (fine) Figure : A graphon and its block approximations
10 Stochastic block model Partitioning of nodes into communities represented a vector of labels l Marginal likelihood of the graph where P(G l) = k l 1 0 p n kl (1 p) o kl n kl df (p), 1 n kl number of edges between communities k and l 2 o kl number of possible edges 3 F prior distribution function on edge probability (assuming an IID prior)
11 Anomaly detection with complex models Data: D Anomaly: f (D) P-value: p = P{f (D ) f (D) D}
12 Anomaly detection with complex models Data: D Model: D θ Unknowns: θ Anomaly: f (D, θ) P-value: p = P{f (D, θ) f (D, θ) D}
13 Anomaly detection with complex models Data: D Model: D θ Unknowns: θ Anomaly: f (D, θ) P-value: p = P{f (D, θ) f (D, θ) D} Conservative p-value p + = sup θ (p)
14 Anomaly detection with complex models Data: D Model: D θ Unknowns: θ Anomaly: f (D, θ) P-value: p = P{f (D, θ) f (D, θ) D} Expected p-value p = E θ (p)
15 Posterior predictive p-value The posterior predictive p-value is (Meng, 1994; Gelman et al., 1996, Eq. 2.8, Eq. 7) P = P{f (D, θ) f (D, θ) D}, (1) where θ represents the model parameters, D is the observed dataset, D is a hypothetical replicated dataset generated from the model with parameters θ, and P( D) is the joint posterior distribution of (θ, D ) given D. In words: if a new dataset were generated from the same model and parameters, what is the probability that the new discrepancy would be as large? Discussion: Guttman (1967), Box (1980), Rubin (1984), Bayarri and Berger (2000), Hjort et al. (2006) Posterior predictive p-values in practice: Huelsenbeck et al. (2001), Sinharay and Stern (2003), Thornton and Andolfatto (2006), Steinbakk and Storvik (2009)
16 Two estimates Posterior sample: θ 1,..., θ n from θ D Estimate I: 1 Simulate data D1,..., D n. 2 Estimate ˆP = 1 n I{f (Di, θ i ) f (D, θ i )}. n i=1 Estimate II: 1 Calculate p-values Q i = P{f (D, θ i ) f (D, θ i )}, for i = 1,..., n. 2 Estimate ˆP = 1 n n Q i. i=1
17 Two estimates Posterior sample: θ 1,..., θ n from θ D Estimate I: 1 Simulate data D1,..., D n. 2 Estimate ˆP = 1 n I{f (Di, θ i ) f (D, θ i )}. n i=1 Estimate II: 1 Calculate p-values Q i = P{f (D, θ i ) f (D, θ i )}, for i = 1,..., n. 2 Estimate ˆP = 1 n n Q i. i=1
18 Interpreting posterior predictive p-values Recall: P = P{f (D, θ) f (D, θ) D}. How should we interpret P? For example: Is P = 40% good? If P = 10 6, is there cause for alarm? What if 500 tests are performed, and min(p i ) = 10 6? Some comments: The interpretation and comparison of posterior predictive p-values [is] a difficult and risky matter (Hjort et al., 2006) Its main weakness is that there is an apparent double use of the data...this double use of the data can induce unnatural behavior (Bayarri and Berger, 2000) Donald Rubin alludes to some conservative operating characteristics (Rubin, 1996) The basic problem is: P is not uniformly distributed under the null hypothesis.
19 Interpreting posterior predictive p-values Recall: P = P{f (D, θ) f (D, θ) D}. How should we interpret P? For example: Is P = 40% good? If P = 10 6, is there cause for alarm? What if 500 tests are performed, and min(p i ) = 10 6? Some comments: The interpretation and comparison of posterior predictive p-values [is] a difficult and risky matter (Hjort et al., 2006) Its main weakness is that there is an apparent double use of the data...this double use of the data can induce unnatural behavior (Bayarri and Berger, 2000) Donald Rubin alludes to some conservative operating characteristics (Rubin, 1996) The basic problem is: P is not uniformly distributed under the null hypothesis.
20 Meng s result In the last pages of Meng (1994): In our notation: suppose θ and D are drawn from the prior and model respectively, and f (D, θ) is absolutely continuous. Then, (i) E(P) = E(U) = 1/2 and (ii) for any convex function h, E{h(P)} E{h(U)}, where U is a uniform random variable on [0, 1].
21 A consequence Meng (1994) next finds:
22 The convex order Let X and Y be two random variables with probability measures µ and ν respectively. We say that µ cx ν if, for any convex function h, E{h(X )} E{h(Y )}, whenever the expectations exist (Shaked and Shanthikumar, 2007). We say that a probability measure P is sub-uniform if P cx U, where U is a uniform distribution on [0, 1]. Meng s theorem says that posterior predictive p-values have a sub-uniform distribution.
23 Reflecting on the theorem 1 Proof is in fact quite straightforward (Jensen s inequality) 2 Is there something more we can say? 1 The 2α bound suggests that P could be liberal, whereas we had the impression they were conservative. Can the bound be improved? 2 The theorem seems to be describing a huge space of distributions. Can it somehow be reduced?
24 On taking the average Theorem (Strassen s theorem) For two probability measures µ and ν on the real line the following conditions are equivalent: 1 µ cx ν; 2 there are random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X. Theorem due to Strassen (1965) (see also references therein), this version due to Müller and Rüschendorf (2001). In other words: Taking an average is reducing in the convex order
25 On taking the average Theorem (Strassen s theorem) For two probability measures µ and ν on the real line the following conditions are equivalent: 1 µ cx ν; 2 there are random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X. Theorem due to Strassen (1965) (see also references therein), this version due to Müller and Rüschendorf (2001). In other words: Taking an average is reducing in the convex order
26 Theorem Let µ and ν be two probability measures on the real line where ν is absolutely continuous. The following conditions are equivalent: 1 µ cx ν; 2 there exist random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X and the random variable Y X is either singular, i.e. Y = X, or absolutely continuous with µ-probability one. Theorem (Posterior predictive p-values and the convex order) P is a sub-uniform probability measure if and only if there exist random variables P, D, θ and an absolutely continuous discrepancy f (D, θ) such that P = P{f (D, θ) f (D, θ) D}, where P has measure P, D is a replicate of D conditional on θ and P( D) is the joint posterior distribution of (θ, D ) given D.
27 Theorem Let µ and ν be two probability measures on the real line where ν is absolutely continuous. The following conditions are equivalent: 1 µ cx ν; 2 there exist random variables X and Y with marginal distributions µ and ν respectively such that E(Y X ) = X and the random variable Y X is either singular, i.e. Y = X, or absolutely continuous with µ-probability one. Theorem (Posterior predictive p-values and the convex order) P is a sub-uniform probability measure if and only if there exist random variables P, D, θ and an absolutely continuous discrepancy f (D, θ) such that P = P{f (D, θ) f (D, θ) D}, where P has measure P, D is a replicate of D conditional on θ and P( D) is the joint posterior distribution of (θ, D ) given D.
28 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.
29 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.
30 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.
31 Exploring the set of sub-uniform distributions The integrated distribution function of a random variable X with distribution function F X is, ψ X (x) = We have (Müller and Rüschendorf, 2001) 1 ψ X is non-decreasing and convex; x F X (t) d t. 2 Its right derivative ψ + X (x) exists and 0 ψ+ X (x) 1; 3 lim x ψ X (x) = 0 and lim x {x ψ X (x)} = E(X ). Furthermore, for any function ψ satisfying these properties, there is a random variable X such that ψ is the integrated distribution function of X. The right derivative of ψ is the distribution function of X, F X (x) = ψ + (x). Let Y be another random variable with integrated distribution function ψ Y. Then X cx Y if and only if ψ X ψ Y and lim x {ψ Y (x) ψ X (x)} = 0.
32 ψ(t) density t t Figure : Uniform distribution
33 ψ(t) density t t Figure : Some sub-uniform distributions. Blue: uniform, green: Beta(2,2)
34 ψ(t) density t t Figure : Some sub-uniform distributions. Blue: uniform, green: Beta(2,2), red: a distribution with maximal mass at.3
35 a) Model 1 b) Model 1 discrepancy X(0) θ = 1 θ = 0 1 2α X(1) 2α f(x,θ) f(x,1) f(x,0) x c) Model 2 d) Model 2 discrepancy p(x θ) p(x θ = 0) p(x θ = 1) f(x,θ) f(x,0) f(x,1) x x Figure : Some constructive examples
36 Combining p-values Independent p-values P 1,..., P n. Two canonical tasks: 1 sub-select a set for further analysis 2 combine the p-values into one overall score of significance
37 Discrete p-values Ordinary p-value (T = f (D)): P = P(T T T ), Mid-p-value (Lancaster, 1952): Q = 1 2 P(T T T ) P(T > T T ), Lemma (Fisher s method for mid-p-values) Under the null hypothesis, 1 Q cx U, where U is a uniform random variable on the unit interval 2 As a consequence, for t 2n, where F n = 2 n i=1 log(q i) P(F n t) exp{n t/2 n log(2n/t)},
38 Discrete p-values Ordinary p-value (T = f (D)): P = P(T T T ), Mid-p-value (Lancaster, 1952): Q = 1 2 P(T T T ) P(T > T T ), Lemma (Fisher s method for mid-p-values) Under the null hypothesis, 1 Q cx U, where U is a uniform random variable on the unit interval 2 As a consequence, for t 2n, where F n = 2 n i=1 log(q i) P(F n t) exp{n t/2 n log(2n/t)},
39 Discrete p-values Ordinary p-value (T = f (D)): P = P(T T T ), Mid-p-value (Lancaster, 1952): Q = 1 2 P(T T T ) P(T > T T ), Lemma (Fisher s method for mid-p-values) Under the null hypothesis, 1 Q cx U, where U is a uniform random variable on the unit interval 2 As a consequence, for t 2n, where F n = 2 n i=1 log(q i) P(F n t) exp{n t/2 n log(2n/t)},
40 50/50 Random binary Grid of ten n=10, β= 20 0 F^(x) 1 0 F^(x) 1 0 F^(x) 1 0 x 1 0 x 1 0 x 1 n=100, β= 5 0 F^(x) 1 0 F^(x) 1 0 F^(x) 1 ordinary mid p value y=x 0 x 1 0 x 1 0 x 1 Figure : Fisher s method with mid-p-values
41 Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4): Bayarri, M. and Berger, J. O. (2000). P values for composite null models. Journal of the American Statistical Association, 95(452): Box, G. E. (1980). Sampling and Bayes inference in scientific modelling and robustness. Journal of the Royal Statistical Society. Series A (General), pages Gelman, A., Meng, X.-L., and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4): Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society. Series B (Methodological), pages Hjort, N. L., Dahl, F. A., and Steinbakk, G. H. (2006). Post-processing posterior predictive p values. Journal of the American Statistical Association, 101(475): Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American statistical association, 58(301): Hoover, D. N. (1979). Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 2. Huelsenbeck, J. P., Ronquist, F., Nielsen, R., and Bollback, J. P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294(5550): Lancaster, H. (1952). Statistical control of counting experiments. Biometrika, pages Lehmann, E. L. and Romano, J. P. (2006). Testing statistical hypotheses. Springer Science & Business Media. Meng, X.-L. (1994). Posterior predictive p-values. The Annals of Statistics, 22(3): Müller, A. and Rüschendorf, L. (2001). On the optimal stopping values induced by general dependence structures. Journal of applied probability, 38(3): Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4): Rubin, D. B. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4): Discussion of Gelman, Meng and Stern. Rubin-Delanchy, P. and Heard, N. A. (2014). A test for dependence between two point processes on the real line. arxiv preprint arxiv: Rubin-Delanchy, P. and Lawson, D. J. (2014). Posterior predictive p-values and the convex order. arxiv preprint arxiv: Shaked, M. and Shanthikumar, J. G. (2007). Stochastic orders. Springer. Sinharay, S. and Stern, H. S. (2003). Posterior predictive model checking in hierarchical models. Journal of Statistical Planning and Inference, 111(1): Steinbakk, G. H. and Storvik, G. O. (2009). Posterior predictive p-values in Bayesian hierarchical models. Scandinavian Journal of Statistics, 36(2): Strassen, V. (1965). The existence of probability measures with given marginals. The Annals of Mathematical Statistics, 36(2): Thornton, K. and Andolfatto, P. (2006). Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a netherlands population of drosophila melanogaster. Genetics, 172(3):
Monte Carlo testing with Big Data
Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:
1 Prior Probability and Posterior Probability
Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which
Combining Weak Statistical Evidence in Cyber Security
Combining Weak Statistical Evidence in Cyber Security Nick Heard Department of Mathematics, Imperial College London; Heilbronn Institute for Mathematical Research, University of Bristol Intelligent Data
Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012
Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic
Finding statistical patterns in Big Data
Finding statistical patterns in Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research IAS Research Workshop: Data science for the real world (workshop 1)
Maximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.
Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
Nonparametric adaptive age replacement with a one-cycle criterion
Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: [email protected]
Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
Principle of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
Gambling Systems and Multiplication-Invariant Measures
Gambling Systems and Multiplication-Invariant Measures by Jeffrey S. Rosenthal* and Peter O. Schwartz** (May 28, 997.. Introduction. This short paper describes a surprising connection between two previously
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS
INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem
Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
Comparison of frequentist and Bayesian inference. Class 20, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals 1. Be able to explain the difference between the p-value and a posterior
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS
A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of
5 Directed acyclic graphs
5 Directed acyclic graphs (5.1) Introduction In many statistical studies we have prior knowledge about a temporal or causal ordering of the variables. In this chapter we will use directed graphs to incorporate
Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab
Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?
Probability and statistics; Rehearsal for pattern recognition
Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception
Exact Nonparametric Tests for Comparing Means - A Personal Summary
Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
The Chinese Restaurant Process
COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.
People have thought about, and defined, probability in different ways. important to note the consequences of the definition:
PROBABILITY AND LIKELIHOOD, A BRIEF INTRODUCTION IN SUPPORT OF A COURSE ON MOLECULAR EVOLUTION (BIOL 3046) Probability The subject of PROBABILITY is a branch of mathematics dedicated to building models
Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT
Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software
Lecture 8. Confidence intervals and the central limit theorem
Lecture 8. Confidence intervals and the central limit theorem Mathematical Statistics and Discrete Mathematics November 25th, 2015 1 / 15 Central limit theorem Let X 1, X 2,... X n be a random sample of
1 Sufficient statistics
1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
Dongfeng Li. Autumn 2010
Autumn 2010 Chapter Contents Some statistics background; ; Comparing means and proportions; variance. Students should master the basic concepts, descriptive statistics measures and graphs, basic hypothesis
Bayesian Analysis for the Social Sciences
Bayesian Analysis for the Social Sciences Simon Jackman Stanford University http://jackman.stanford.edu/bass November 9, 2012 Simon Jackman (Stanford) Bayesian Analysis for the Social Sciences November
MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test
Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely
HT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
MATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
Likelihood Approaches for Trial Designs in Early Phase Oncology
Likelihood Approaches for Trial Designs in Early Phase Oncology Clinical Trials Elizabeth Garrett-Mayer, PhD Cody Chiuzan, PhD Hollings Cancer Center Department of Public Health Sciences Medical University
Random graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
The Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)
Section 7.1 Introduction to Hypothesis Testing Schrodinger s cat quantum mechanics thought experiment (1935) Statistical Hypotheses A statistical hypothesis is a claim about a population. Null hypothesis
False Discovery Rates
False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving
Section 13, Part 1 ANOVA. Analysis Of Variance
Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability
The Variability of P-Values. Summary
The Variability of P-Values Dennis D. Boos Department of Statistics North Carolina State University Raleigh, NC 27695-8203 [email protected] August 15, 2009 NC State Statistics Departement Tech Report
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
Notes on Continuous Random Variables
Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes
How To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
Inference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
E3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Lecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
Estimating the Degree of Activity of jumps in High Frequency Financial Data. joint with Yacine Aït-Sahalia
Estimating the Degree of Activity of jumps in High Frequency Financial Data joint with Yacine Aït-Sahalia Aim and setting An underlying process X = (X t ) t 0, observed at equally spaced discrete times
Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation
Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers
A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails
12th International Congress on Insurance: Mathematics and Economics July 16-18, 2008 A Uniform Asymptotic Estimate for Discounted Aggregate Claims with Subexponential Tails XUEMIAO HAO (Based on a joint
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
THE CENTRAL LIMIT THEOREM TORONTO
THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................
Bayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Permutation Tests for Comparing Two Populations
Permutation Tests for Comparing Two Populations Ferry Butar Butar, Ph.D. Jae-Wan Park Abstract Permutation tests for comparing two populations could be widely used in practice because of flexibility of
How to Gamble If You Must
How to Gamble If You Must Kyle Siegrist Department of Mathematical Sciences University of Alabama in Huntsville Abstract In red and black, a player bets, at even stakes, on a sequence of independent games
Chapter 4 Lecture Notes
Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,
Big Data, Statistics, and the Internet
Big Data, Statistics, and the Internet Steven L. Scott April, 4 Steve Scott (Google) Big Data, Statistics, and the Internet April, 4 / 39 Summary Big data live on more than one machine. Computing takes
Lecture 13: Martingales
Lecture 13: Martingales 1. Definition of a Martingale 1.1 Filtrations 1.2 Definition of a martingale and its basic properties 1.3 Sums of independent random variables and related models 1.4 Products of
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Model-based Synthesis. Tony O Hagan
Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
Hypothesis Testing for Beginners
Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete
Chapter 4: Statistical Hypothesis Testing
Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin
MULTIVARIATE PROBABILITY DISTRIBUTIONS
MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined
An Introduction to Basic Statistics and Probability
An Introduction to Basic Statistics and Probability Shenek Heyward NCSU An Introduction to Basic Statistics and Probability p. 1/4 Outline Basic probability concepts Conditional probability Discrete Random
CHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
5.1 Identifying the Target Parameter
University of California, Davis Department of Statistics Summer Session II Statistics 13 August 20, 2012 Date of latest update: August 20 Lecture 5: Estimation with Confidence intervals 5.1 Identifying
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Master s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
Random access protocols for channel access. Markov chains and their stability. Laurent Massoulié.
Random access protocols for channel access Markov chains and their stability [email protected] Aloha: the first random access protocol for channel access [Abramson, Hawaii 70] Goal: allow machines
Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify
Parametric fractional imputation for missing data analysis
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????,??,?, pp. 1 14 C???? Biometrika Trust Printed in
Lecture 7: Continuous Random Variables
Lecture 7: Continuous Random Variables 21 September 2005 1 Our First Continuous Random Variable The back of the lecture hall is roughly 10 meters across. Suppose it were exactly 10 meters, and consider
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011
Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this
Gambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
Probability Generating Functions
page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence
Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
Week 1: Introduction to Online Learning
Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / -21-8418-9 Cesa-Bianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider
Towards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
BANACH AND HILBERT SPACE REVIEW
BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but
Convex Hull Probability Depth: first results
Conve Hull Probability Depth: first results Giovanni C. Porzio and Giancarlo Ragozini Abstract In this work, we present a new depth function, the conve hull probability depth, that is based on the conve
Chapter 2. Hypothesis testing in one population
Chapter 2. Hypothesis testing in one population Contents Introduction, the null and alternative hypotheses Hypothesis testing process Type I and Type II errors, power Test statistic, level of significance
Econometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
Introduction to nonparametric regression: Least squares vs. Nearest neighbors
Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
1 Norms and Vector Spaces
008.10.07.01 1 Norms and Vector Spaces Suppose we have a complex vector space V. A norm is a function f : V R which satisfies (i) f(x) 0 for all x V (ii) f(x + y) f(x) + f(y) for all x,y V (iii) f(λx)
A Statistical Framework for Operational Infrasound Monitoring
A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LA-UR 11-03040 The views expressed here do not necessarily reflect the views of the United States Government,
Joint Exam 1/P Sample Exam 1
Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question
Wald s Identity. by Jeffery Hein. Dartmouth College, Math 100
Wald s Identity by Jeffery Hein Dartmouth College, Math 100 1. Introduction Given random variables X 1, X 2, X 3,... with common finite mean and a stopping rule τ which may depend upon the given sequence,
Supplement to Call Centers with Delay Information: Models and Insights
Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Separation Properties for Locally Convex Cones
Journal of Convex Analysis Volume 9 (2002), No. 1, 301 307 Separation Properties for Locally Convex Cones Walter Roth Department of Mathematics, Universiti Brunei Darussalam, Gadong BE1410, Brunei Darussalam
