Message-passing sequential detection of multiple change points in networks


Message-passing sequential detection of multiple change points in networks
Long Nguyen, Arash Amini (University of Michigan); Ram Rajagopal (Stanford University)
ISIT, Boston, July 2012
Nguyen/Amini/Rajagopal 1

Sequential change-point detection
Quickest detection of a change in the distribution of a sequence of data:
- data collected sequentially over time
- tradeoff between false alarm rate and detection delay
- extensions to decentralized networks with a fusion center
- the classical setting involves only a single change point variable

We study problems requiring detection of multiple change points in multiple sequences across network sites:
- the multiple change points are statistically dependent
- need to borrow information across network sites
- no fusion center, so a message-passing type algorithm is needed
- new elements of modeling and asymptotic theory

Example: simultaneous traffic monitoring
Problem: detecting potential hotspots in a traffic network in real time.
- data are sequences of measurements of traffic volume at multiple sites
- sequential change point detection is performed for each site

Sequential detection for a single change point
- network site j collects a sequence of data $X_j^n$, $n = 1, 2, \ldots$
- $\lambda_j \in \mathbb{N}$ is the change point variable for site j
- the data are i.i.d. according to density $g$ before the change point, and i.i.d. according to $f$ after
- a sequential change point detection procedure is a stopping time $\tau_j$, i.e., $\{\tau_j \le n\} \in \sigma(X_j^{[n]})$
- Neyman-Pearson criterion: subject to a constraint on the false alarm probability, $\mathrm{PFA}(\tau_j) = P(\tau_j < \lambda_j) \le \alpha$ for some small $\alpha$, minimize the detection delay $E[(\tau_j - \lambda_j) \mid \tau_j \ge \lambda_j]$.

Optimal rule for single change point detection
- taking a Bayesian approach, $\lambda_j$ is endowed with a prior
- under some conditions, the optimal sequential rule is obtained by thresholding the posterior of $\lambda_j$ (Shiryaev, 1978):
  $\tau_j = \inf\{n : \Lambda_n \ge 1 - \alpha\}$, where $\Lambda_n = P(\lambda_j \le n \mid X_j^{[n]})$
- well-established asymptotic properties (Tartakovsky & Veeravalli, 2006):
  false alarm: $\mathrm{PFA}(\tau_j) \le \alpha$
  detection delay: $D(\tau_j) = \dfrac{\log \alpha^{-1}}{I_j + d}\,(1 + o(1))$ as $\alpha \to 0$
- here $I_j = \mathrm{KL}(f_j \,\|\, g_j)$ is the Kullback-Leibler information, and the constant $d$ depends on the prior
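Under a geometric prior, the posterior $\Lambda_n$ admits a simple one-step recursion, which makes the Shiryaev rule easy to simulate. The following is a minimal Python sketch (my own illustration, not code from the talk), assuming hypothetical Gaussian pre-/post-change densities $g = N(0,1)$, $f = N(1,1)$ and a Geometric($\rho$) prior:

```python
import math
import random

def shiryaev_stop(xs, rho, alpha, f, g):
    """Threshold the posterior p_n = P(lambda <= n | x_1..x_n) at 1 - alpha.

    Recursion for a Geometric(rho) prior on the change point lambda:
      prior step:   p' = p + (1 - p) * rho                    # allows lambda = n
      Bayes update: p  = p' f(x) / (p' f(x) + (1 - p') g(x))
    Returns (stopping time, posterior at stopping), or (None, last posterior)
    if the threshold is never crossed.
    """
    p = 0.0
    for n, x in enumerate(xs, start=1):
        p_prior = p + (1.0 - p) * rho
        num = p_prior * f(x)
        p = num / (num + (1.0 - p_prior) * g(x))
        if p >= 1.0 - alpha:
            return n, p
    return None, p

def npdf(x, mu):
    # unit-variance Gaussian density (hypothetical pre/post-change models)
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

random.seed(0)
lam = 50  # true change point, known here only because we simulate the data
xs = ([random.gauss(0, 1) for _ in range(lam - 1)]
      + [random.gauss(1, 1) for _ in range(200)])
tau, post = shiryaev_stop(xs, rho=0.01, alpha=0.01,
                          f=lambda x: npdf(x, 1.0), g=lambda x: npdf(x, 0.0))
print(tau, round(post, 4))
```

The O(1)-per-step recursion is what makes the single-site rule practical; the network extension below has to recover a comparable cost.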

Extensions to network settings
- survey paper by Tsitsiklis (1993)
- decentralized sequential detection: Veeravalli, Basar and Poor (1993); Mei (2008); Nguyen, Wainwright and Jordan (2008)
- sequential change diagnosis: Dayanik, Goulding and Poor (2008)
- multiple-sequence change point detection: Xie and Siegmund (2010)
- sequential detection of a Markov process: Raghavan and Veeravalli (2010)
- ...

Talk outline
- statistical formulation for sequential detection of multiple change points in a network setting: probabilistic graphical models; extension of sequential analysis to multiple change point variables
- sequential and real-time message-passing detection algorithms: decision procedures with limited data and computation
- asymptotic theory characterizing detection delay and algorithm convergence: roles of graphical models in asymptotic analysis

Graphical models for multiple change points
- m network sites labeled by $U = \{1, \ldots, m\}$
- a graph $G = (U, E)$ specifies the connections among the sites $u \in U$
- each site j experiences a change at time $\lambda_j \in \mathbb{N}$
- $\lambda_j$ is endowed with an (independent) prior distribution $\pi_j$
- there may be a private data sequence $(X_j^n)_{n \ge 1}$ for site j; the private data sequence changes its distribution after $\lambda_j$
- there is a shared data sequence $(X_{ij}^n)_{n \ge 1}$ for each edge $e = (i, j)$ connecting a neighboring pair of sites i and j:
  $X_{ij}^n \overset{\mathrm{iid}}{\sim} g_{ij}(\cdot)$ for $n < \lambda_{ij} := \min(\lambda_i, \lambda_j)$, and $X_{ij}^n \overset{\mathrm{iid}}{\sim} f_{ij}(\cdot)$ for $n \ge \lambda_{ij}$
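To make the data model concrete, here is a small simulation sketch (an illustration of the model above, not code from the talk), using hypothetical Gaussian densities on a 3-site chain; each shared edge sequence switches at $\lambda_{ij} = \min(\lambda_i, \lambda_j)$:

```python
import random

random.seed(1)
edges = [(1, 2), (2, 3)]         # toy chain graph on sites {1, 2, 3}
lam = {1: 30, 2: 60, 3: 45}      # change points (drawn from the priors in the model)
N = 100                          # time horizon

# Private sequences: N(0,1) before lambda_j, N(1,1) after (hypothetical densities).
private = {j: [random.gauss(0 if n < lam[j] else 1, 1) for n in range(1, N + 1)]
           for j in lam}

# Shared sequences: each edge switches at lambda_ij = min(lambda_i, lambda_j).
lam_edge = {(i, j): min(lam[i], lam[j]) for (i, j) in edges}
shared = {e: [random.gauss(0 if n < lam_edge[e] else 1, 1) for n in range(1, N + 1)]
          for e in edges}

print(lam_edge)   # -> {(1, 2): 30, (2, 3): 45}
```

Note how an edge sequence carries information about whichever endpoint changes first; this is exactly what couples the $\lambda_j$ a posteriori.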

Graphical model of change points and data sequences
[Figure: (a) topology of the sensor network; (b) graphical model of the random variables]
Joint distribution of the change points and the observed network data at time n:
$P(\lambda, X^n) = \prod_{j \in V} \pi_j(\lambda_j) \prod_{j \in V} P(X_j^n \mid \lambda_j) \prod_{(ij) \in E} P(X_{ij}^n \mid \lambda_i \wedge \lambda_j)$
Notation: $\lambda := (\lambda_1, \ldots, \lambda_m)$, $X^n := (X_1^n, \ldots, X_m^n)$.
The change point variables are statistically dependent a posteriori!

The min-functional of change points
[Figure: topology of the sensor network and its graphical model]
Let S be a subset of the network sites. Define the earliest change point among the sites in S:
$\lambda_S := \min_{j \in S} \lambda_j$.
Question: what is the optimal stopping rule $\tau_S$ for estimating $\lambda_S$, where $\{\tau_S \le n\} \in \sigma(X^{[n]})$?

A natural rule is obtained by thresholding the posterior probability:
$\tau_S = \inf\{n : P(\min_{j \in S} \lambda_j \le n \mid X^{[n]}) \ge 1 - \alpha\}$
for small $\alpha > 0$.
This rule is sub-optimal (unlike in the single change point case, where the analogous rule is optimal under some conditions on the prior). But it will be shown to be asymptotically optimal and computationally tractable.
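For intuition, on a two-site network with one shared edge the posterior $P(\lambda_S \le n \mid X^{[n]})$ can be computed by brute force, summing the joint distribution over a truncated grid of $(\lambda_1, \lambda_2)$. This is my own illustration (not the paper's algorithm), with hypothetical Gaussian densities and Geometric($\rho$) priors:

```python
import math
import random

def npdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def loglik(xs, lam, mu0=0.0, mu1=1.0):
    # log-likelihood of one sequence whose mean shifts from mu0 to mu1 at time lam
    return sum(math.log(npdf(x, mu1 if t >= lam else mu0))
               for t, x in enumerate(xs, start=1))

def posterior_min_leq_n(x1, x2, x12, rho=0.05, T=120):
    """Brute-force P(min(l1, l2) <= n | data) on a truncated (l1, l2) grid."""
    n = len(x1)
    logpi = lambda l: math.log(rho) + (l - 1) * math.log(1.0 - rho)
    L1 = {l: loglik(x1, l) for l in range(1, T + 1)}
    L2 = {l: loglik(x2, l) for l in range(1, T + 1)}
    L12 = {l: loglik(x12, l) for l in range(1, T + 1)}
    terms = [(logpi(l1) + logpi(l2) + L1[l1] + L2[l2] + L12[min(l1, l2)],
              min(l1, l2) <= n)
             for l1 in range(1, T + 1) for l2 in range(1, T + 1)]
    m = max(lw for lw, _ in terms)  # log-sum-exp trick for numerical stability
    den = sum(math.exp(lw - m) for lw, _ in terms)
    num = sum(math.exp(lw - m) for lw, hit in terms if hit)
    return num / den

random.seed(2)
lam1, lam2, n = 20, 40, 60
gen = lambda lam: [random.gauss(0 if t < lam else 1, 1) for t in range(1, n + 1)]
x1, x2, x12 = gen(lam1), gen(lam2), gen(min(lam1, lam2))
p = posterior_min_leq_n(x1, x2, x12)
print(round(p, 4))
```

The grid enumeration costs $O(T^2)$ and does not scale; the point of the message-passing algorithm on the next slide is to exploit the graph factorization instead.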

Message-passing distributed computation via the sum-product algorithm
The issue: compute the posterior probability $P(\lambda_S \le n \mid X^{[n]})$, assuming that data and statistical messages can only be passed along the graphical structure.
[Figure: (a) topology of the sensor network; (b) message-passing in the network, with messages $m_{12}^n$, $m_{32}^n$, $m_{24}^n$, $m_{45}^n$]
- simple to implement via an adaptation of the sum-product algorithm
- computational complexity: when G is a tree, the computational complexity of the message-passing algorithm at each time step n is $O((|V| + |E|)\,n)$, but linearity in n is not desirable

Mean-field approximation
Define latent binary variables $Z_j^n = I(\lambda_j \le n)$. Compute $P(Z^n \mid X^{[n]})$ in terms of $P(Z^n \mid X^{[n-1]})$ by Bayes' rule.
Decoupling approximation: as n gets large, due to concentration, the variables $Z_j^n$ become decoupled across the graph. So, approximate:
$P(Z^n \mid X^{[n-1]}) \approx \prod_{j \in V} P(Z_j^n \mid X^{[n-1]})$
In effect, we have avoided marginalization over time at every time step, resulting in O(1) computational complexity in n.
Theorem 1. Both the exact message-passing algorithm and the mean-field approximation algorithm construct a Markov sequence of posterior probabilities that obeys a contraction map. This entails that both sequences converge to 1 almost surely.
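As a rough rendering of the decoupling idea (my own sketch under hypothetical Gaussian models, not the authors' exact algorithm), each site can carry a single scalar $q_j \approx P(\lambda_j \le n \mid \text{data})$, advance it by the geometric prior, and fold in its private-data likelihood plus edge likelihoods in which the neighbor's change indicator $Z_i^n$ is replaced by its current marginal:

```python
import math
import random

def npdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def meanfield_step(q, x_priv, x_edge, nbrs, rho, f, g, f_e, g_e):
    """One O(|V| + |E|) update of q_j, a scalar tracking P(lambda_j <= n | data).

    Sketch of the decoupling approximation only: in each edge likelihood, the
    neighbor's binary change indicator is replaced by its current marginal q_i.
    """
    q_new = {}
    for j in q:
        p = q[j] + (1.0 - q[j]) * rho        # prior step: lambda_j may equal n
        like1 = f(x_priv[j])                 # private data, if site j has changed
        like0 = g(x_priv[j])                 # private data, if it has not
        for i in nbrs[j]:
            x = x_edge[frozenset((i, j))]
            # the edge has changed iff min(lambda_i, lambda_j) <= n
            like1 *= f_e(x)                                  # j changed => edge changed
            like0 *= q[i] * f_e(x) + (1.0 - q[i]) * g_e(x)   # else the neighbor decides
        q_new[j] = p * like1 / (p * like1 + (1.0 - p) * like0)
    return q_new

random.seed(3)
nbrs = {1: [2], 2: [1]}
lam = {1: 20, 2: 20}          # true change points (simulation only)
f = lambda x: npdf(x, 1.0)    # hypothetical post-change density
g = lambda x: npdf(x, 0.0)    # hypothetical pre-change density
q = {1: 0.0, 2: 0.0}
for n in range(1, 101):
    x_priv = {j: random.gauss(0 if n < lam[j] else 1, 1) for j in nbrs}
    x_edge = {frozenset((1, 2)): random.gauss(0 if n < min(lam.values()) else 1, 1)}
    q = meanfield_step(q, x_priv, x_edge, nbrs, rho=0.02, f=f, g=g, f_e=f, g_e=g)
print({j: round(v, 3) for j, v in q.items()})
```

Each step touches every site and edge once, so the per-step cost is constant in n, matching the O(1)-in-n claim above.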

[Figure: approximation of the posterior paths $n \mapsto P(\lambda_j \le n \mid X^{[n]})$, comparing exact message passing (MP) with the mean-field approximation (APPROX) at six sites]

Main Theorem (optimal delay). Assume that:
(a) the change points $\lambda_j$ are endowed with independent geometric priors;
(b) the likelihood ratio functions are bounded from above.
Then the proposed stopping rule $\tau_S$ satisfies:
(i) false alarm rate: $P(\tau_S < \lambda_S) \le \alpha$;
(ii) the expected delay is asymptotically optimal and takes the form
$E\big[(\tau_S - \min_{j \in S} \lambda_j) \,\big|\, \tau_S \ge \min_{j \in S} \lambda_j\big] = \dfrac{-\log \alpha}{d + I_{\lambda_S}}\,(1 + o(1))$, where $I_{\lambda_S} = \sum_{j \in S} I_j + \sum_{(ij) \in E_S} I_{ij}$.
Here $I_j = \int f_j \log(f_j/g_j)$, $I_{ij} = \int f_{ij} \log(f_{ij}/g_{ij})$, and $E_S$ denotes the edges joining sites in S.

Graph-based Kullback-Leibler information
[Figure: sensor network with edges (1,2), (2,3), (2,4), (4,5) and its graphical model]
- If S = {1}, then $I_{\lambda_S} = I_1$.
- If S = {1, 2}, then $I_{\lambda_S} = I_1 + I_2 + I_{12}$.
- If S = {1, 2, 3}, then $I_{\lambda_S} = I_1 + I_2 + I_3 + I_{12} + I_{23}$.
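The quantity $I_{\lambda_S}$ is straightforward to compute from the graph: sum the node KL terms over S and the edge KL terms over edges with both endpoints in S. A small helper (my own illustration, with placeholder KL values) reproduces the examples above:

```python
def graph_kl(S, node_I, edge_I):
    """I_{lambda_S}: node KL terms over S plus edge KL terms for edges inside S."""
    S = set(S)
    return (sum(node_I[j] for j in S)
            + sum(I for (i, j), I in edge_I.items() if i in S and j in S))

# Network from the slide: edges (1,2), (2,3), (2,4), (4,5).
# The KL values below are placeholders for illustration.
node_I = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
edge_I = {(1, 2): 0.5, (2, 3): 0.5, (2, 4): 0.5, (4, 5): 0.5}

print(graph_kl({1}, node_I, edge_I))        # I_1 = 1.0
print(graph_kl({1, 2}, node_I, edge_I))     # I_1 + I_2 + I_12 = 2.5
print(graph_kl({1, 2, 3}, node_I, edge_I))  # I_1 + I_2 + I_3 + I_12 + I_23 = 4.0
```

Larger S means a larger information number and hence, by the main theorem, a shorter asymptotic delay slope.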

Concentration inequalities for marginal likelihood ratios
For $\phi = \min_{j \in S} \lambda_j$, define the marginal likelihood ratio
$D_\phi^{k,n} := D_\phi^k(X^n) := \dfrac{P_\phi^k(X^n)}{P_\phi^\infty(X^n)}$,
where $P_\phi^k$ denotes $P(\cdot \mid \phi = k)$. Define the conditional prior probability $\pi_\phi^k(m) := P(\lambda = m \mid \phi = k)$.
By a general result of Tartakovsky & Veeravalli (2006), if
$P_\phi^k\Big[\frac{1}{N} \max_{1 \le n \le N} \log D_\phi^k(X_k^{k+n}) \ge (1 + \varepsilon) I_\phi\Big] \to 0 \quad (1)$
for all (small) $\varepsilon > 0$ and all $k \in \mathbb{N}$, then the lower bound follows:
$\inf_{\tilde\tau_\phi(\alpha)} E[\tilde\tau_\phi - \phi \mid \tilde\tau_\phi \ge \phi] \ge \dfrac{-\log \alpha}{q_\phi + I_\phi}\,(1 + o(1))$.

Furthermore, let $T_\varepsilon^k := \sup\big\{ n \in \mathbb{N} : \frac{1}{n} \log D_\phi^k(X_k^{k+n-1}) < I_\phi - \varepsilon \big\}$.
By Tartakovsky & Veeravalli (2006), if one has
$E\,T_\varepsilon^\phi := \sum_{k=1}^{\infty} P(\phi = k)\, E_\phi^k(T_\varepsilon^k) < \infty \quad (2)$
for all (small) $\varepsilon > 0$, then the upper bound follows, that is,
$E[\tau_S - \phi \mid \tau_S \ge \phi] \le \dfrac{-\log \alpha}{q_\phi + I_\phi}\,(1 + o(1))$.
Both conditions (1) and (2) can be deduced from an elaborate form of concentration inequality for the marginal likelihood ratio.

Key concentration lemma. Denote by $P_\lambda^m$ the conditional probability $P(\cdot \mid \lambda = m)$. Assume that for all $m \in \mathbb{N}^d$ in the support of $\pi_\phi^k(\cdot)$,
$P_\lambda^m\Big[\Big|\frac{1}{n} \log D_\phi^k(X^n) - I_\phi\Big| > \varepsilon\Big] \le q(n) \exp(-c_1 n \varepsilon^2) \quad (3)$
for all $n \in \mathbb{N}$ and $\varepsilon \in (0, \varepsilon_0)$ such that $n \varepsilon^2 \ge p^2(m, k)$, where $p(\cdot)$ and $q(\cdot)$ are polynomials with nonnegative coefficients, and both $P(\phi = \cdot)$ and $P(\lambda_j \mid \phi = k)$ have finite polynomial moments. Then the optimal delay theorem holds.

Probabilistic calculus of $\varepsilon$-equivalence
Definition. Consider two sequences $\{a_n\}$ and $\{b_n\}$ of random variables, where $a_n = a_n(k)$ and $b_n = b_n(k)$ may depend on a common parameter $k \in \mathbb{N}$. The two sequences are called asymptotically $\varepsilon$-equivalent as $n \to \infty$, under $\{P_\lambda^m : m \in \mathrm{supp}(\pi_\phi^k)\}$, and denoted $a_n \overset{\varepsilon}{\approx} b_n$, if there exist polynomials $p(\cdot)$ and $q(\cdot)$ (with constant nonnegative coefficients) and $\varepsilon_0 > 0$ such that for all $m \in \mathrm{supp}(\pi_\phi^k)$ we have
$P_\lambda^m(|a_n - b_n| \le \varepsilon) \ge 1 - q(n) e^{-c_1 n \varepsilon^2}$
for all $n \in \mathbb{N}$ and $\varepsilon \in (0, \varepsilon_0)$ satisfying $n\varepsilon \ge p(m, k)$.

By the union bound and algebraic manipulation, we obtain the following rules:
1. $a_n \overset{\varepsilon}{\approx} b_n$ implies $a_n \overset{C\varepsilon}{\approx} b_n$ for $C > 0$, and $\alpha a_n \overset{\varepsilon}{\approx} \alpha b_n$ for $\alpha \in \mathbb{R}$.
2. $a_n \overset{\varepsilon}{\approx} b_n$ and $b_n \overset{\varepsilon}{\approx} c_n$ imply $a_n \overset{\varepsilon}{\approx} c_n$ (transitivity).
3. $a_n \overset{\varepsilon}{\approx} b_n$ and $c_n \overset{\varepsilon}{\approx} d_n$ imply $a_n \pm c_n \overset{\varepsilon}{\approx} b_n \pm d_n$.
4. $a_n \overset{\varepsilon}{\approx} b_n$ implies $\max\{a_n, c_n\} \overset{\varepsilon}{\approx} \max\{b_n, c_n\}$.
5. $a_n \overset{\varepsilon}{\approx} b_n$, $c_n \overset{\varepsilon}{\approx} 1$ and $\{b_n\}$ bounded imply $a_n c_n \overset{\varepsilon}{\approx} b_n$.
6. $a_n \overset{\varepsilon}{\approx} a > 0$ and $b_n \overset{\varepsilon}{\approx} b < 0$ imply $\max\{a_n, b_n\} \overset{\varepsilon}{\approx} a$.
7. Log sum-max inequality for positive sequences $\{a_n\}$ and $\{b_n\}$: $n^{-1} \log(a_n + b_n) \overset{\varepsilon}{\approx} \max\{n^{-1} \log a_n, n^{-1} \log b_n\}$.
Based on this calculus, we can deduce the $\varepsilon$-equivalence of the marginal likelihood ratio from the $\varepsilon$-equivalence of the likelihood ratios defined on individual sites and on edges between neighboring sites.
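Rule 7 rests on the elementary sandwich $\max\{\log a, \log b\} \le \log(a + b) \le \log 2 + \max\{\log a, \log b\}$, so after dividing by n the gap is at most $(\log 2)/n$. A quick numeric sanity check (illustrative values only):

```python
import math

# For positive a, b: max(log a, log b) <= log(a + b) <= log 2 + max(log a, log b),
# so the n-normalized gap is nonnegative and at most (log 2)/n.
n = 50
a, b = math.exp(0.3 * n), math.exp(0.1 * n)
lhs = math.log(a + b) / n
rhs = max(math.log(a) / n, math.log(b) / n)
gap = lhs - rhs
print(0.0 <= gap <= math.log(2) / n)   # prints True
```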

[Figure: plots of the slope $\frac{1}{|\log \alpha|} E[\tau_S - \phi_S \mid \tau_S \ge \phi_S]$ for the star network on sites (1, 2, 3, 4) centered at site 1; panels for $j = 1$, $j = 2$, $e = (1,2)$ and $e = (3,2)$, each comparing SINGLE, MP, APPROX and the theoretical LIMIT]

Summary
- decentralized sequential detection of multiple change points: model, algorithm and asymptotic theory needed to go beyond the single change point setting
- a new statistical formulation drawing from classical sequential analysis and probabilistic graphical models (Bayes nets)
- introduced a message-passing sequential detection algorithm, exploiting the benefit of network information
- asymptotic theory for analyzing false alarm rates and detection delays

For more detail, see: A. Amini and X. Nguyen. Sequential detection of multiple change points: A graphical models approach. Technical report, Department of Statistics, University of Michigan.
See also: R. Rajagopal, X. Nguyen, S. C. Ergen and P. Varaiya. Simultaneous sequential detection of multiple interacting faults.


False Discovery Rates

False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

Single machine parallel batch scheduling with unbounded capacity

Workshop on Combinatorics and Graph Theory 21th, April, 2006 Nankai University Single machine parallel batch scheduling with unbounded capacity Yuan Jinjiang Department of mathematics, Zhengzhou University

Set theory, and set operations

1 Set theory, and set operations Sayan Mukherjee Motivation It goes without saying that a Bayesian statistician should know probability theory in depth to model. Measure theory is not needed unless we

QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING

QUICKEST MULTIDECISION ABRUPT CHANGE DETECTION WITH SOME APPLICATIONS TO NETWORK MONITORING I. Nikiforov Université de Technologie de Troyes, UTT/ICD/LM2S, UMR 6281, CNRS 12, rue Marie Curie, CS 42060

A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA: A FIRST REPORT

New Mathematics and Natural Computation Vol. 1, No. 2 (2005) 295 303 c World Scientific Publishing Company A HYBRID GENETIC ALGORITHM FOR THE MAXIMUM LIKELIHOOD ESTIMATION OF MODELS WITH MULTIPLE EQUILIBRIA:

To discuss this topic fully, let us define some terms used in this and the following sets of supplemental notes.

INFINITE SERIES SERIES AND PARTIAL SUMS What if we wanted to sum up the terms of this sequence, how many terms would I have to use? 1, 2, 3,... 10,...? Well, we could start creating sums of a finite number

Discuss the size of the instance for the minimum spanning tree problem.

3.1 Algorithm complexity The algorithms A, B are given. The former has complexity O(n 2 ), the latter O(2 n ), where n is the size of the instance. Let n A 0 be the size of the largest instance that can

Near Optimal Solutions

Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving

Network Security A Decision and Game-Theoretic Approach

Network Security A Decision and Game-Theoretic Approach Tansu Alpcan Deutsche Telekom Laboratories, Technical University of Berlin, Germany and Tamer Ba ar University of Illinois at Urbana-Champaign, USA

Transfer Learning! and Prior Estimation! for VC Classes!

15-859(B) Machine Learning Theory! Transfer Learning! and Prior Estimation! for VC Classes! Liu Yang! 04/25/2012! Liu Yang 2012 1 Notation! Instance space X = R n! Concept space C of classifiers h: X ->

Introduction to Markov Chain Monte Carlo

Introduction to Markov Chain Monte Carlo Monte Carlo: sample from a distribution to estimate the distribution to compute max, mean Markov Chain Monte Carlo: sampling using local information Generic problem

VERTICES OF GIVEN DEGREE IN SERIES-PARALLEL GRAPHS

VERTICES OF GIVEN DEGREE IN SERIES-PARALLEL GRAPHS MICHAEL DRMOTA, OMER GIMENEZ, AND MARC NOY Abstract. We show that the number of vertices of a given degree k in several kinds of series-parallel labelled

Compression algorithm for Bayesian network modeling of binary systems

Compression algorithm for Bayesian network modeling of binary systems I. Tien & A. Der Kiureghian University of California, Berkeley ABSTRACT: A Bayesian network (BN) is a useful tool for analyzing the

WE consider a decentralized detection problem and study

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 4, APRIL 2011 1759 On Decentralized Detection With Partial Information Sharing Among Sensors O. Patrick Kreidl, John N. Tsitsiklis, Fellow, IEEE, and

Offline sorting buffers on Line

Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February

Bayesian logistic betting strategy against probability forecasting. Akimichi Takemura, Univ. Tokyo. November 12, 2012

Bayesian logistic betting strategy against probability forecasting Akimichi Takemura, Univ. Tokyo (joint with Masayuki Kumon, Jing Li and Kei Takeuchi) November 12, 2012 arxiv:1204.3496. To appear in Stochastic