Introduction to Detection Theory

Size: px
Start display at page:

Download "Introduction to Detection Theory"

Transcription

1 Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see Ch. 10 in Wasserman. EE 527, Detection and Estimation Theory, # 5 1

2 Introduction to Detection Theory (cont.) We wish to make a decision on a signal of interest using noisy measurements. Statistical tools enable systematic solutions and optimal design. Application areas include: Communications, Radar and sonar, Nondestructive evaluation (NDE) of materials, Biomedicine, etc. EE 527, Detection and Estimation Theory, # 5 2

3 Example: Radar Detection. We wish to decide on the presence or absence of a target. EE 527, Detection and Estimation Theory, # 5 3

4 Introduction to Detection Theory We assume a parametric measurement model p(x θ) [or p(x ; θ), which is the notation that we sometimes use in the classical setting]. In point estimation theory, we estimated the parameter θ Θ given the data x. Suppose now that we choose Θ 0 and Θ 1 that form a partition of the parameter space Θ: Θ 0 Θ 1 = Θ, Θ 0 Θ 1 =. In detection theory, we wish to identify which hypothesis is true (i.e. make the appropriate decision): H 0 : θ Θ 0, null hypothesis H 1 : θ Θ 1, alternative hypothesis. Terminology: If θ can only take two values, Θ = {θ 0, θ 1 }, Θ 0 = {θ 0 }, Θ 1 = {θ 1 } we say that the hypotheses are simple. Otherwise, we say that they are composite. Composite Hypothesis Example: H 0 : θ = 0 versus H 1 : θ (0, ). EE 527, Detection and Estimation Theory, # 5 4

5 The Decision Rule We wish to design a decision rule (function) φ(x) : X (0, 1): φ(x) = { 1, decide H1, 0, decide H 0. which partitions the data space X [i.e. the support of p(x θ)] into two regions: Rule φ(x): X 0 = {x : φ(x) = 0}, X 1 = {x : φ(x) = 1}. Let us define probabilities of false alarm and miss: P FA = E x θ [φ(x) θ] = p(x θ) dx for θ in Θ 0 X 1 P M = E x θ [1 φ(x) θ] = 1 p(x θ) dx X 1 = p(x θ) dx for θ in Θ 1. X 0 Then, the probability of detection (correctly deciding H 1 ) is P D = 1 P M = E x θ [φ(x) θ] = X 1 p(x θ) dx for θ in Θ 1. EE 527, Detection and Estimation Theory, # 5 5

6 Note: P FA and P D /P M are generally functions of the parameter θ (where θ Θ 0 when computing P FA and θ Θ 1 when computing P D /P M ). More Terminology. Statisticians use the following terminology: False alarm Type I error Miss Type II error Probability of detection Power Probability of false alarm Significance level. EE 527, Detection and Estimation Theory, # 5 6

7 Bayesian Decision-theoretic Detection Theory Recall (a slightly generalized version of) the posterior expected loss: ρ(action x) = L(θ, action) p(θ x) dθ Θ that we introduced in handout # 4 when we discussed Bayesian decision theory. Let us now apply this theory to our easy example discussed here: hypothesis testing, where our action space consists of only two choices. We first assign a loss table: decision rule φ true state Θ 1 Θ 0 x X 1 L(1 1) = 0 L(1 0) x X 0 L(0 1) L(0 0) = 0 with the loss function described by the quantities L(declared true): L(1 0) quantifies loss due to a false alarm, L(0 1) quantifies loss due to a miss, L(1 1) and L(0 0) (losses due to correct decisions) typically set to zero in real life. Here, we adopt zero losses for correct decisions. EE 527, Detection and Estimation Theory, # 5 7

8 Now, our posterior expected loss takes two values: ρ 0 (x) = L(0 1) p(θ x) dθ Θ 1 + L(0 0) p(θ x) dθ } {{ } Θ 0 0 = L(0 1) p(θ x) dθ Θ 1 L(0 1) is constant = L(0 1) p(θ x) dθ Θ } 1 {{ } P [θ Θ 1 x] and, similarly, ρ 1 (x) = L(1 0) p(θ x) dθ Θ 0 L(1 0) is constant = L(1 0) p(θ x) dθ. Θ } 0 {{ } P [θ Θ 0 x] We define the Bayes decision rule as the rule that minimizes the posterior expected loss; this rule corresponds to choosing the data-space partitioning as follows: X 1 = {x : ρ 1 (x) ρ 0 (x)} EE 527, Detection and Estimation Theory, # 5 8

9 or P [θ Θ 1 x] { }} { { p(θ x) dθ } Θ X 1 = x : 1 L(1 0) L(0 1) p(θ x) dθ Θ } 0 {{ } P [θ Θ 0 x] or, equivalently, upon applying the Bayes rule: X 1 = { x : Θ 1 p(x θ) π(θ) dθ Θ 0 p(x θ) π(θ) dθ 0-1 loss: For L(1 0) = L(0 1) = 1, we have (1) } L(1 0) L(0 1). (2) decision rule φ true state Θ 1 Θ 0 x X 1 L(1 1) = 0 L(1 0) = 1 x X 0 L(0 1) = 1 L(0 0) = 0 yielding the following Bayes decision rule, called the maximum a posteriori (MAP) rule: X 1 = { x : P [θ Θ 1 x] P [θ Θ 0 x] 1 } or, equivalently, upon applying the Bayes rule: { } Θ X 1 = x : 1 p(x θ) π(θ) dθ Θ 0 p(x θ) π(θ) dθ 1. (4) (3) EE 527, Detection and Estimation Theory, # 5 9

10 Simple hypotheses. Let us specialize (1) to the case of simple hypotheses (Θ 0 = {θ 0 }, Θ 1 = {θ 1 }): X 1 = { x : p(θ 1 x) p(θ 0 x) } {{ } posterior-odds ratio } L(1 0) L(0 1). (5) We can rewrite (5) using the Bayes rule: X 1 = { x : p(x θ 1 ) p(x θ 0 ) } {{ } likelihood ratio } π 0 L(1 0) π 1 L(0 1) (6) where π 0 = π(θ 0 ), π 1 = π(θ 1 ) = 1 π 0 describe the prior probability mass function (pmf) of the binary random variable θ (recall that θ {θ 0, θ 1 }). Hence, for binary simple hypotheses, the prior pmf of θ is the Bernoulli pmf. EE 527, Detection and Estimation Theory, # 5 10

11 Preposterior (Bayes) Risk The preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = X 1 + X 0 L(1 0) p(x θ)π(θ) dθ dx Θ 0 Θ 1 L(0 1) p(x θ)π(θ) dθ dx. How do we choose the rule φ(x) that minimizes the preposterior EE 527, Detection and Estimation Theory, # 5 11

12 risk? L(1 0) p(x θ)π(θ) dθ dx X 1 Θ 0 + L(0 1) p(x θ)π(θ) dθ dx X 0 Θ 1 = L(1 0) p(x θ)π(θ) dθ dx X 1 Θ 0 L(0 1) p(x θ)π(θ) dθ dx X 1 Θ 1 + L(0 1) p(x θ)π(θ) dθ dx X 0 Θ 1 + L(0 1) p(x θ)π(θ) dθ dx X 1 Θ 1 = const } {{ } not dependent on φ(x) { + L(1 0) p(x θ)π(θ) dθ X 1 Θ 0 } L(0 1) p(x θ)π(θ) dθ dx Θ 1 implying that X 1 should be chosen as { } X 1 : L(1 0) p(x θ)π(θ) dθ L(0 1) p(x θ)π(θ) < 0 Θ 0 Θ 1 EE 527, Detection and Estimation Theory, # 5 12

13 which, as expected, is the same as (2), since minimizing the posterior expected loss minimizing the preposterior risk for every x as showed earlier in handout # loss: For the 0-1 loss, i.e. L(1 0) = L(0 1) = 1, the preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = X 1 + X 0 p(x θ)π(θ) dθ dx Θ 0 Θ 1 p(x θ)π(θ) dθ dx (7) which is simply the average error probability, with averaging performed over the joint probability density or mass function (pdf/pmf) or the data x and parameters θ. EE 527, Detection and Estimation Theory, # 5 13

14 Bayesian Decision-theoretic Detection for Simple Hypotheses The Bayes decision rule for simple hypotheses is (6): Λ(x) } {{ } likelihood ratio = p(x θ 1) p(x θ 0 ) H 1 π 0 L(1 0) π 1 L(0 1) τ (8) see also Ch. 3.7 in Kay-II. (Recall that Λ(x) is the sufficient statistic for the detection problem, see p. 37 in handout # 1.) Equivalently, log Λ(x) = log[p(x θ 1 )] log[p(x θ 0 )] H 1 log τ τ. Minimum Average Error Probability Detection: In the familiar 0-1 loss case where L(1 0) = L(0 1) = 1, we know that the preposterior (Bayes) risk is equal to the average error probability, see (7). This average error probability greatly EE 527, Detection and Estimation Theory, # 5 14

15 simplifies in the simple hypothesis testing case: av. error probability = L(1 0) p(x θ } {{ } 0 )π 0 dx X L(0 1) p(x θ } {{ } 1 )π 1 dx X 0 1 = π 0 p(x θ 0 ) dx +π 1 p(x θ 1 ) dx X } 1 X {{ } } 0 {{ } P FA P M where, as before, the averaging is performed over the joint pdf/pmf of the data x and parameters θ, and π 0 = π(θ 0 ), π 1 = π(θ 1 ) = 1 π 0 (the Bernoulli pmf). In this case, our Bayes decision rule simplifies to the MAP rule (as expected, see (5) and Ch. 3.6 in Kay-II): p(θ 1 x) p(θ 0 x) } {{ } posterior-odds ratio or, equivalently, upon applying the Bayes rule: p(x θ 1 ) p(x θ 0 ) } {{ } likelihood ratio H 1 1 (9) H 1 π 0 π 1. (10) EE 527, Detection and Estimation Theory, # 5 15

16 which is the same as (4), upon substituting the Bernoulli pmf as the prior pmf for θ and (8), upon substituting L(1 0) = L(0 1) = 1. EE 527, Detection and Estimation Theory, # 5 16

17 Bayesian Decision-theoretic Detection Theory: Handling Nuisance Parameters We apply the same approach as before integrate the nuisance parameters (ϕ, say) out! Therefore, (1) still holds for testing H 0 : θ Θ 0 versus H 1 : θ Θ 1 but p θ x (θ x) is computed as follows: p θ x (θ x) = p θ,ϕ x (θ, ϕ x) dϕ and, therefore, Θ 1 Θ 0 p(θ x) { }} { p θ,ϕ x (θ, ϕ x) dϕ dθ p θ,ϕ x (θ, ϕ x) dϕ } {{ } p(θ x) dθ H 1 L(1 0) L(0 1) (11) EE 527, Detection and Estimation Theory, # 5 17

18 or, equivalently, upon applying the Bayes rule: Θ 1 px θ,ϕ (x θ, ϕ) π θ,ϕ (θ, ϕ) dϕ dθ Θ 0 px θ,ϕ (x θ, ϕ) π θ,ϕ (θ, ϕ) dϕ dθ H 1 L(1 0) L(0 1). (12) Now, if θ and ϕ are independent a priori, i.e. π θ,ϕ (θ, ϕ) = π θ (θ) π ϕ (ϕ) (13) then (12) can be rewritten as Θ 1 π θ (θ) Θ 0 π θ (θ) p(x θ) { }} { p x θ,ϕ (x θ, ϕ) π ϕ (ϕ) dϕ dθ p x θ,ϕ (x θ, ϕ) π ϕ (ϕ) } {{ } p(x θ) dϕ dθ H 1 L(1 0) L(0 1). (14) Simple hypotheses and independent priors for θ and ϕ: Let us specialize (11) to the simple hypotheses (Θ 0 = {θ 0 }, Θ 1 = {θ 1 }): pθ,ϕ x (θ 1, ϕ x) dϕ pθ,ϕ x (θ 0, ϕ x) dϕ H 1 L(1 0) L(0 1). (15) Now, if θ and ϕ are independent a priori, i.e. (13) holds, then EE 527, Detection and Estimation Theory, # 5 18

19 we can rewrite (14) [or (15) using the Bayes rule]: px θ,ϕ (x θ 1, ϕ) π ϕ (ϕ) dϕ px θ,ϕ (x θ 0, ϕ) π ϕ (ϕ) dϕ } {{ } integrated likelihood ratio = same as (6) { }} { p(x θ 1 ) H 1 π 0 L(1 0) p(x θ 0 ) π 1 L(0 1) (16) where π 0 = π θ (θ 0 ), π 1 = π θ (θ 1 ) = 1 π 0. EE 527, Detection and Estimation Theory, # 5 19

20 Chernoff Bound on Average Error Probability for Simple Hypotheses Recall that minimizing the average error probability av. error probability = X 1 + X 0 p(x θ)π(θ) dθ dx Θ 0 Θ 1 p(x θ)π(θ) dθ dx leads to the MAP decision rule: X 1 = { x : Θ 0 p(x θ) π(θ) dθ } p(x θ) π(θ) dθ < 0. Θ 1 In many applications, we may not be able to obtain a simple closed-form expression for the minimum average error EE 527, Detection and Estimation Theory, # 5 20

21 probability, but we can bound it as follows: min av. error probability = X1 p(x θ)π(θ) dθ dx Θ 0 + p(x θ)π(θ) dθ dx Θ 1 X 0 using the def. of X 1 = X }{{} data space { } min p(x θ)π(θ) dθ, p(x θ)π(θ) dθ) dx Θ 0 Θ 1 = X X [ = q 0 (x) = q { }} { 1 (x) ] λ [ { }} { ] 1 λ p(x θ)π(θ) dθ p(x θ)π(θ) dθ dx Θ 0 Θ 1 [q 0 (x)] λ [q 1 (x)] 1 λ dx which is the Chernoff bound on the minimum average error probability. Here, we have used the fact that min{a, b} a λ b 1 λ, for 0 λ 1, a, b 0. When x = x 1 x 2. x N EE 527, Detection and Estimation Theory, # 5 21

22 with x 1, x 2,... x N conditionally independent, identically distributed (i.i.d.) given θ and simple hypotheses (i.e. Θ 0 = {θ 0 }, Θ 1 = {θ 1 }) then N q 0 (x) = p(x θ 0 ) π 0 = π 0 p(x n θ 0 ) yielding n=1 N q 1 (x) = p(x θ 1 ) π 1 = π 1 p(x n θ 1 ) n=1 Chernoff bound for N conditionally i.i.d. measurements (given θ) and simple hyp. [ N ] λ N 1 λ = π 0 p(x n θ 0 ) [π 1 p(x n θ 1 )] dx n=1 = π λ 0 π 1 λ 1 = π λ 0 π 1 λ 1 N n=1 n=1 { [p(x n θ 0 )] λ [p(x n θ 1 )] 1 λ dx n } { [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1 } N EE 527, Detection and Estimation Theory, # 5 22

23 or, in other words, 1 N log(min av. error probability) log(πλ 0 π1 1 λ ) + log [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1, λ [0, 1]. If π 0 = π 1 = 1/2 (which is almost always the case of interest when evaluating average error probabilities), we can say that, as N, min av. error probability for N cond. i.i.d. measurements (given θ) and simple hypotheses ( { f(n) exp N min log [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ} ) λ [0,1] } {{ } Chernoff information for a single observation where f(n) is a slowly-varying function compared with the exponential term: lim N log f(n) N = 0. Note that the Chernoff information in the exponent term of the above expression quantifies the asymptotic behavior of the minimum average error probability. We now give a useful result, taken from EE 527, Detection and Estimation Theory, # 5 23

24 K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., San Diego, CA: Academic Press, 1990 for evaluating a class of Chernoff bounds. Lemma 1. Consider p 1 (x) = N (µ 1, Σ 1 ) and p 2 (x) = N (µ 2, Σ 2 ). Then [p 1 (x)] λ [p 2 (x)] 1 λ dx = exp[ g(λ)] where g(λ) = λ (1 λ) 2 (µ 2 µ 1 ) T [λ Σ 1 + (1 λ) Σ 2 ] 1 (µ 2 µ 1 ) [ λ log Σ1 + (1 λ) Σ 2 ] Σ 1 λ Σ 2 1 λ. EE 527, Detection and Estimation Theory, # 5 24

25 Probabilities of False Alarm (P FA ) and Detection (P D ) for Simple Hypotheses 0.2 PDF under H 0 PDF under H 1 Probability Density Function P FA P D τ Test Statistic Comments: Λ(x) { }} { P FA = P [ test } statistic {{ > τ} θ = θ 0 ] X X 1 P D = P [test statistic > τ } {{ } X X 0 θ = θ 1 ]. (i) As the region X 1 shrinks (i.e. τ ), both of the above probabilities shrink towards zero. EE 527, Detection and Estimation Theory, # 5 25

26 (ii) As the region X 1 grows (i.e. τ 0), both of these probabilities grow towards unity. (iii) Observations (i) and (ii) do not imply equality between P FA and P D ; in most cases, as R 1 grows, P D grows more rapidly than P FA (i.e. we better be right more often than we are wrong). (iv) However, the perfect case where our rule is always right and never wrong (P D = 1 and P FA = 0) cannot occur when the conditional pdfs/pmfs p(x θ 0 ) and p(x θ 1 ) overlap. (v) Thus, to increase the detection probability P D, we must also allow for the false-alarm probability P FA to increase. This behavior represents the fundamental tradeoff in hypothesis testing and detection theory and motivates us to introduce a (classical) approach to testing simple hypotheses, pioneered by Neyman and Pearson (to be discussed next). EE 527, Detection and Estimation Theory, # 5 26

27 Neyman-Pearson Test for Simple Hypotheses Setup: Parametric data models p(x ; θ 0 ), p(x ; θ 1 ), Simple hypothesis testing: H 0 : θ = θ 0 versus H 1 : θ = θ 1. No prior pdf/pmf on θ is available. Goal: Design a test that maximizes the probability of detection P D = P [X X 1 ; θ = θ 0 ] (equivalently, minimizes the miss probability P M ) under the constraint P FA = P [X X 1 ; θ = θ 0 ] = α α. Here, we consider simple hypotheses; classical version of testing composite hypotheses is much more complicated. The Bayesian version of testing composite hypotheses is trivial (as EE 527, Detection and Estimation Theory, # 5 27

28 is everything else Bayesian, at least conceptually) and we have already seen it. Solution. We apply the Lagrange-multiplier approach: maximize L = P D + λ (P FA α ) [ ] = p(x ; θ 1 ) dx + λ p(x; θ 0 ) dx α X 1 X 1 = [p(x ; θ 1 ) λ p(x ; θ 0 )] dx λ α. X 1 To maximize L, set X 1 = {x : p(x ; θ 1 ) λ p(x ; θ 0 ) > 0} = { x : p(x ; θ 1) p(x ; θ 0 ) > λ }. Again, we find the likelihood ratio: Recall our constraint: Λ(x) = p(x ; θ 1) p(x ; θ 0 ). X 1 p(x ; θ 0 ) dx = P FA = α α. EE 527, Detection and Estimation Theory, # 5 28

29 If we increase λ, P FA and P D go down. Similarly, if we decrease λ, P FA and P D go up. Hence, to maximize P D, choose λ so that P FA is as big as possible under the constraint. Two useful ways for determining the threshold that achieves a specified false-alarm rate: Find λ that satisfies or, x : Λ(x)>λ p(x ; θ 0 ) dx = P FA = α expressing in terms of the pdf/pmf of Λ(x) under H 0 : λ p Λ;θ0 (l ; θ 0 ) dl = α. or, perhaps, in terms of a monotonic function of Λ(x), say T (x) = monotonic function(λ(x)). Warning: We have been implicitly assuming that P FA is a continuous function of λ. Some (not insightful) technical adjustments are needed if this is not the case. A way of handling nuisance parameters: We can utilize the integrated (marginal) likelihood ratio (16) under the Neyman- Pearson setup as well. EE 527, Detection and Estimation Theory, # 5 29

30 Chernoff-Stein Lemma for Bounding the Miss Probability in Neyman-Pearson Tests of Simple Hypotheses Recall the definition of the Kullback-Leibler (K-L) distance D(p q) from one pmf (p) to another (p): D(p q) = k p k log p k q k. The complete proof of this lemma for the discrete (pmf) case is given in Additional Reading: T.M. Cover and J.A. Thomas, Elements of Information Theory. Second ed., New York: Wiley, EE 527, Detection and Estimation Theory, # 5 30

31 Setup for the Chernoff-Stein Lemma Assume that x 1, x 2,..., x N are conditionally i.i.d. given θ. We adopt the Neyman-Pearson framework, i.e. obtain a decision threshold to achieve a fixed P FA. Let us study the asymptotic P M = 1 P D as the number of observations N gets large. To keep P FA constant as N increases, we need to make our decision threshold (γ, say) vary with N, i.e. Now, the miss probability is γ = γ N (P FA ) ) P M = P M (γ) = P M (γ N (P FA ). EE 527, Detection and Estimation Theory, # 5 31

32 Chernoff-Stein Lemma The Chernoff-Stein lemma says: lim P FA 0 lim N 1 N log P M = ( ) D p(x n θ 0 ) p(x n θ 1 ) } {{ } K-L distance for a single observation where the K-L distance between p(x n θ 0 ) and p(x n θ 1 ) ( ) D p(x n θ 0 ) p(x n θ 1 ) discrete (pmf) case = x n p(x n θ 0 ) log [ = E p(xn θ 0 ) log p(x n θ 1 ) ] p(x n θ 0 ) [ p(xn θ 1 ) ] p(x n θ 0 ) does not depend on the observation index n, since x n conditionally i.i.d. given θ. are Equivalently, we can state that [ N + P M f(n) exp N D ( p(x n θ 0 ) p(x n θ 1 ) )] as P FA 0 and N, where f(n) is a slowly-varying function compared with the exponential term (when P FA 0 and N ). EE 527, Detection and Estimation Theory, # 5 32

33 Detection for Simple Hypotheses: Example Known positive DC level in additive white Gaussian noise (AWGN), Example 3.2 in Kay-II. Consider where H 0 : x[n] = w[n], n = 1, 2,..., N versus H 1 : x[n] = A + w[n], n = 1, 2,..., N A > 0 is a known constant, w[n] is zero-mean white Gaussian noise with known variance σ 2, i.e. w[n] N (0, σ 2 ). The above hypothesis-testing formulation is EE-like: noise versus signal plus noise. It is similar to the on-off keying scheme in communications, which gives us an idea to rephrase it so that it fits our formulation on p. 4 (for which we have developed all the theory so far). Here is such an EE 527, Detection and Estimation Theory, # 5 33

34 alternative formulation: consider a family of pdfs p(x ; a) = 1 [ (2πσ2 ) exp N 1 2 σ 2 N (x[n] a) 2] (17) n=1 and the following (equivalent) hypotheses: H 0 : a = 0 (off) versus H 1 : a = A (on). Then, the likelihood ratio is Λ(x) = p(x ; a = A) p(x ; a = 0) = 1/(2πσ2 ) N/2 exp[ 1 N 2σ 2 n=1 (x[n] A)2 ] 1/(2πσ 2 ) N/2 exp( 1 N. 2σ 2 n=1 x[n]2 ) Now, take the logarithm and, after simple manipulations, reduce our likelihood-ratio test to comparing T (x) = x = 1 N N x[n] n=1 with a threshold γ. [Here, T (x) is a monotonic function of Λ(x).] If T (x) > γ accept H 1 (i.e. reject H 0 ), otherwise accept EE 527, Detection and Estimation Theory, # 5 34

35 H 0 (well, not exactly, we will talk more about this decision on p. 59). The choice of γ depends on the approach that we take. For the Bayes decision rule, γ is a function of π 0 and π 1. For the Neyman-Pearson test, γ is chosen to achieve (control) a desired P FA. Bayesian decision-theoretic detection for 0-1 loss (corresponding to minimizing the average error EE 527, Detection and Estimation Theory, # 5 35

36 probability): log Λ(x) = 1 2 σ σ 2 2 A ( N n=1 N (x[n] A) σ 2 n=1 N (x[n] x[n] } {{ } 0 n=1 ( N n=1 ) x[n] N n=1 (x[n]) 2 H ( 1 π0 ) log π 1 +A) (x[n] + x[n] A) H ( 1 π0 ) log π 1 ) x[n] A 2 N H ( 1 2 σ 2 π0 ) log π 1 AN 2 H 1 σ2 A log ( π0 π 1 ) σ2 since A > 0 and, finally, x H 1 N A log(π 0/π 1 ) + A } {{ 2} γ which, for the practically most interesting case of equiprobable hypotheses π 0 = π 1 = 1 2 (18) simplifies to x H 1 A 2 }{{} γ known as the maximum-likelihood test (i.e. the Bayes decision rule for 0-1 loss and a priori equiprobable hypotheses is defined as the maximum-likelihood test). This maximum-likelihood detector does not require the knowledge of the noise variance EE 527, Detection and Estimation Theory, # 5 36

37 σ 2 to declare its decision. However, the knowledge of σ 2 is key to assessing the detection performance. Interestingly, these observations will carry over to a few maximum-likelihood tests that we will derive in the future. Assuming (18), we now derive the minimum average error probability. First, note that X a = 0 N (0, σ 2 /N) and X a = A N (A, σ 2 /N). Then min av. error prob. = 1 2 P [X > A } {{ 2 a = 0] } +1 P FA = 1 2 P [ P [ X σ2 /N } {{ } standard normal random var. X A σ2 /N } {{ } standard normal random var. = 1 2 ( N A Q ) } {{ 4 σ 2 } standard normal complementary cdf > A/2 σ2 /N ; a = 0 ] < A/2 A σ2 /N ; a = A ] ( N A Φ 2) } {{ 4 σ 2 } standard normal cdf 2 P [X < A } {{ 2 a = A] } P M ( = Q 1 2 N A 2 σ 2 ) EE 527, Detection and Estimation Theory, # 5 37

38 since Φ( x) = Q(x). The minimum probability of error decreases monotonically with N A 2 /σ 2, which is known as the deflection coefficient. In this case, the Chernoff information is equal to A2 (see Lemma 1): 8 σ 2 { min log λ [0,1] [ λ (1 λ) = min λ [0,1] 2 } [N (0, σ 2 /N)] } {{ } λ [N (A, σ 2 /N)] } {{ } 1 λ dx p 0 (x) p 1 (x) A2 σ 2 ] = A2 8 σ 2 and, therefore, we expect the following asymptotic behavior: min av. error probability N + ) f(n) exp ( N A2 8 σ 2 (19) where f(n) varies slowly with N when N is large, compared with the exponential term: lim N log f(n) N = 0. Indeed, in our example, ( N A Q 2 ) N + 4 σ 2 1 N A 2 2π 4 σ } 2 {{ } f(n) ( exp N ) A2 8 σ 2 EE 527, Detection and Estimation Theory, # 5 38

39 where we have used the asymptotic formula Q(x) x + 1 x 2π exp( x2 /2) (20) given e.g. in equation (2.4) of Kay-II. EE 527, Detection and Estimation Theory, # 5 39

40 Known DC Level in AWGN: Neyman-Pearson Approach We now derive the Neyman-Pearson test for detecting a known DC level in AWGN. Recall that our likelihood-ratio test compares with a threshold γ. T (x) = x = 1 N N 1 n=0 x[n] If T (x) > γ, decide H 1 (i.e. reject H 0 ), otherwise decide H 0 (see also the discussion on p. 59). Performance evaluation: Assuming (17), we have T (x) a N (a, σ 2 /N). Therefore, T (X) a = 0 N (0, σ 2 /N), implying ( γ ) P FA = P [T (X) > γ ; a = 0] = Q σ2 /N EE 527, Detection and Estimation Theory, # 5 40

41 and we obtain the decision threshold as follows: γ = σ 2 N Q 1 (P FA ). Now, T (X) a = A N (A, σ 2 /N), implying ( γ A ) P D = P (T (X) > γ a = A) = Q σ2 /N ( ) = Q Q 1 A (P FA ) 2 σ 2 /N ( ) = Q Q 1 (P FA ) NA2 /σ } {{ } 2. = SNR = d 2 Given the false-alarm probability P FA, the detection probability P D depends only on the deflection coefficient: d 2 = NA2 σ 2 = {E [T (X) a = A] E [T (X) a = 0]}2 var[t (X a = 0)] which is also (a reasonable definition for) the signal-to-noise ratio (SNR). EE 527, Detection and Estimation Theory, # 5 41

42 Receiver Operating Characteristics (ROC) Comments: P D = Q ( Q 1 (P FA ) d ). As we raise the threshold γ, P FA goes down but so does P D. ROC should be above the 45 line otherwise we can do better by flipping a coin. Performance improves with d 2. EE 527, Detection and Estimation Theory, # 5 42

43 Typical Ways of Depicting the Detection Performance Under the Neyman-Pearson Setup To analyze the performance of a Neyman-Pearson detector, we examine two relationships: Between P D and P FA, for a given SNR, called the operating characteristics (ROC). receiver Between P D and SNR, for a given P FA. Here are examples of the two: see Figs. 3.8 and 3.5 in Kay-II, respectively. EE 527, Detection and Estimation Theory, # 5 43

44 Asymptotic (as N and P FA 0) P D and P M for a Known DC Level in AWGN We apply the Chernoff-Stein lemma, for which we need to compute the following K-L distance: log where ( ) D p(x n a = 0) p(x n a = A) { = E p(xn a=0) log [ p(xn a = A) ]} p(x n a = 0) p(x n a = 0) = N (0, σ 2 ) [ p(xn a = A) ] = (x n A) 2 p(x n a = 0) 2 σ 2 + x2 n 2 σ 2 = 1 2σ 2 (2 A x n A 2 ). Therefore, ( ) D p(x n θ 0 ) p(x n θ 1 ) = A2 2 σ 2 and the Chernoff-Stein lemma predicts the following behavior of the detection probability as N and P FA 0: ( ) P D 1 f(n) exp N A2 } {{ 2 σ 2 } P M EE 527, Detection and Estimation Theory, # 5 44

45 where f(n) is a slowly-varying function of N compared with the exponential term. In this case, the exact expression for P M (P D ) is available and consistent with the Chernoff-Stein lemma: P M = 1 Q (Q 1 (P FA ) ) NA 2 /σ 2 ) = Q( NA2 /σ 2 Q 1 (P FA ) N + = = 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [ } NA 2 /σ 2 Q 1 (P FA )] 2 /2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp NA 2 /(2σ 2 ) [Q 1 (P FA )] 2 /2 +Q 1 (P FA ) } NA 2 /σ 2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [Q 1 (P FA )] 2 /2 + Q 1 (P FA ) } NA 2 /σ 2 exp[ NA 2 /(2σ 2 )] } {{ } as predicted by the Chernoff-Stein lemma (21) where we have used the asymptotic formula (20). When P FA EE 527, Detection and Estimation Theory, # 5 45

46 is small and N is large, the first two (green) terms in the above expression make a slowly-varying function f(n) of N. Note that the exponential term in (21) does not depend on the false-alarm probability P FA (or, equivalently, on the choice of the decision threshold). The Chernoff-Stein lemma asserts that this is not a coincidence. Comment. For detecting a known DC level in AWGN: The slope of the exponential-decay term of P M is A 2 /(2σ 2 ) which is different from (larger than, in this case) the slope of the exponential decay of the minimum average error probability: A 2 /(8σ 2 ). EE 527, Detection and Estimation Theory, # 5 46

47 Decentralized Neyman-Pearson Detection for Simple Hypotheses Consider a sensor-network scenario depicted by Assumption: Observations made at N spatially distributed sensors (nodes) follow the same marginal probabilistic model: H i : x n p(x n θ i ) where n = 1, 2,..., N and i {0, 1} for binary hypotheses. Each node n makes a hard local decision d n based on its local observation x n and sends it to the headquarters (fusion center), which collects all the local decisions and makes the final global EE 527, Detection and Estimation Theory, # 5 47

48 decision about H 0 or H 1. This structure is clearly suboptimal it is easy to construct a better decision strategy in which each node sends its (quantized, in practice) likelihood ratio to the fusion center, rather than the decision only. However, such a strategy would have a higher communication (energy) cost. We now go back to the decentralized detection problem. Suppose that each node n makes a local decision d n {0, 1}, n = 1, 2,..., N and transmits it to the fusion center. Then, the fusion center makes the global decision based on the likelihood ratio formed from the d n s. The simplest fusion scheme is based on the assumption that the d n s are conditionally independent given θ (which may not always be reasonable, but leads to an easy solution). We can now write p(d n θ 1 ) = P d n D,n (1 P D,n ) 1 d n } {{ } Bernoulli pmf where P D,n Similarly, is the nth sensor s local detection probability. p(d n θ 0 ) = P d n FA,n (1 P FA,n ) 1 d n } {{ } Bernoulli pmf where P FA,n is the nth sensor s local detection false-alarm EE 527, Detection and Estimation Theory, # 5 48

49 probability. Now, log Λ(d) = = N log n=1 N log n=1 [ p(dn θ 1 ) ] p(d n θ 0 [ P d n D,n (1 P D,n ) 1 d n P d n FA,n (1 P FA,n ) 1 d n ] H1 log τ. To further simplify the exposition, we assume that all sensors have identical performance: P D,n = P D, P FA,n = P FA. Define the number of sensors having d n = 1: u 1 = N n=1 d n Then, the log-likelihood ratio becomes log Λ(d) = u 1 log ( PD P FA ) + (N u 1 ) log ( 1 PD 1 P FA ) H1 log τ or u 1 log [ PD (1 P FA ) ] H1 ( 1 PFA ) log τ + N log. (22) P FA (1 P D ) 1 P D EE 527, Detection and Estimation Theory, # 5 49

50 Clearly, each node s local decision d n P D > P FA, which implies is meaningful only if P D (1 P FA ) P FA (1 P D ) > 1 the logarithm of which is therefore positive, and the decision rule (22) further simplifies to u 1 H 1 τ. The Neyman-Person performance analysis of this detector is easy: the random variable U 1 is binomial given θ (i.e. conditional on the hypothesis) and, therefore, P [U 1 = u 1 ] = ( ) N p u 1 (1 p) N u 1 u 1 where p = P FA under H 0 and p = P D under H 1. Hence, the global false-alarm probability is P FA,global = P [U 1 > τ θ 0 ] = N u 1 = τ ( N u 1 ) P u 1 FA (1 P FA ) N u 1. Clearly, P FA,global will be a discontinuous function of τ and therefore, we should choose our P FA,global specification from EE 527, Detection and Estimation Theory, # 5 50

51 the available discrete choices. But, if none of the candidate choices is acceptable, this means that our current system does not satisfy the requirements and a remedial action is needed, e.g. increasing the quantity (N) or improving the quality of sensors (through changing P D and P FA ), or both. EE 527, Detection and Estimation Theory, # 5 51

52 Testing Multiple Hypotheses Suppose now that we choose Θ 0, Θ 1,..., Θ M 1 that form a partition of the parameter space Θ: Θ 0 Θ 1 Θ M 1 = Θ, Θ i Θ j = i j. We wish to distinguish among M > 2 hypotheses, i.e. identify which hypothesis is true: H 0 : θ Θ 0 versus H 1 : θ Θ 1 versus. versus H M 1 : θ Θ M 1 and, consequently, our action space consists of M choices. We design a decision rule φ : X (0, 1,..., M 1): φ(x) = 0, decide H 0, 1,. decide H 1, M 1, decide H M 1 where φ partitions the data space X [i.e. the support of p(x θ)] into M regions: Rule φ: X 0 = {x : φ(x) = 0},..., X M 1 = {x : φ(x) = M 1}. EE 527, Detection and Estimation Theory, # 5 52

53 We specify the loss function using L(i m), where, typically, the losses due to correct decisions are set to zero: L(i i) = 0, i = 0, 1,..., M 1. Here, we adopt zero losses for correct decisions. posterior expected loss takes M values: Now, our ρ m (x) = = M 1 i=0 M 1 i=0 Θ i L(m i) p(θ x) dθ L(m i) Θ i p(θ x) dθ, m = 0, 1,..., M 1. Then, the Bayes decision rule φ is defined via the following data-space partitioning: X m = {x : ρ m (x) = min ρ l(x)}, m = 0, 1,..., M 1 0 l M 1 or, equivalently, upon applying the Bayes rule X m = = { M 1 } x : m = arg min L(l i) p(x θ) π(θ) dθ 0 l M 1 i=0 Θ i { M 1 x : m = arg min L(l i) 0 l M 1 i=0 Θ i p(x θ) π(θ) dθ }. EE 527, Detection and Estimation Theory, # 5 53

54 The preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = = M 1 m=0 M 1 m=0 M 1 i=0 X m X m Θ i L(m i) p(x θ)π(θ) dθ dx M 1 L(m i) p(x θ)π(θ) dθ i=0 Θ } {{ i } h m (θ) dx. Then, for an arbitrary h m (x), [ M 1 m=0 ] h m (x) dx X m [ M 1 m=0 X m ] h m (x) dx 0 which verifies that the Bayes decision rule φ minimizes the preposterior (Bayes) risk. EE 527, Detection and Estimation Theory, # 5 54

Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory

Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory Outline Signal Detection M. Sami Fadali Professor of lectrical ngineering University of Nevada, Reno Hypothesis testing. Neyman-Pearson (NP) detector for a known signal in white Gaussian noise (WGN). Matched

More information

The CUSUM algorithm a small review. Pierre Granjon

The CUSUM algorithm a small review. Pierre Granjon The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................

More information

Lecture 8: Signal Detection and Noise Assumption

Lecture 8: Signal Detection and Noise Assumption ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma

Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Interpreting Kullback-Leibler Divergence with the Neyman-Pearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minami-azabu

More information

Message-passing sequential detection of multiple change points in networks

Message-passing sequential detection of multiple change points in networks Message-passing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal

More information

Hypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 One-Sided and Two-Sided Tests

Hypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 One-Sided and Two-Sided Tests Hypothesis Testing 1 Introduction This document is a simple tutorial on hypothesis testing. It presents the basic concepts and definitions as well as some frequently asked questions associated with hypothesis

More information

False Discovery Rates

False Discovery Rates False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

More information

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation

Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015

More information

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341 Multinode Cooperative Communications in Wireless Networks Ahmed K. Sadek, Student Member, IEEE, Weifeng Su, Member, IEEE, and K.

More information

Lecture 9: Bayesian hypothesis testing

Lecture 9: Bayesian hypothesis testing Lecture 9: Bayesian hypothesis testing 5 November 27 In this lecture we ll learn about Bayesian hypothesis testing. 1 Introduction to Bayesian hypothesis testing Before we go into the details of Bayesian

More information

1 Maximum likelihood estimation

1 Maximum likelihood estimation COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

More information

33. STATISTICS. 33. Statistics 1

33. STATISTICS. 33. Statistics 1 33. STATISTICS 33. Statistics 1 Revised September 2011 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a

More information

Introduction to Probability

Introduction to Probability Introduction to Probability EE 179, Lecture 15, Handout #24 Probability theory gives a mathematical characterization for experiments with random outcomes. coin toss life of lightbulb binary data sequence

More information

[Chapter 10. Hypothesis Testing]

[Chapter 10. Hypothesis Testing] [Chapter 10. Hypothesis Testing] 10.1 Introduction 10.2 Elements of a Statistical Test 10.3 Common Large-Sample Tests 10.4 Calculating Type II Error Probabilities and Finding the Sample Size for Z Tests

More information

Principle of Data Reduction

Principle of Data Reduction Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

More information

9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test. Neyman-Pearson lemma 9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

More information

UNIVERSITY OF MINNESOTA. Alejandro Ribeiro

UNIVERSITY OF MINNESOTA. Alejandro Ribeiro UNIVERSITY OF MINNESOTA This is to certify that I have examined this copy of a Master s thesis by Alejandro Ribeiro and have found that it is complete and satisfactory in all respects, and that any and

More information

Optimal Stopping in Software Testing

Optimal Stopping in Software Testing Optimal Stopping in Software Testing Nilgun Morali, 1 Refik Soyer 2 1 Department of Statistics, Dokuz Eylal Universitesi, Turkey 2 Department of Management Science, The George Washington University, 2115

More information

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i. Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

More information

Bayesian probability theory

Bayesian probability theory Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations

More information

Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Linear Classification. Volker Tresp Summer 2015

Linear Classification. Volker Tresp Summer 2015 Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Multivariate Analysis of Variance (MANOVA): I. Theory

Multivariate Analysis of Variance (MANOVA): I. Theory Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

More information

A New Interpretation of Information Rate

A New Interpretation of Information Rate A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes

More information

CPC/CPA Hybrid Bidding in a Second Price Auction

CPC/CPA Hybrid Bidding in a Second Price Auction CPC/CPA Hybrid Bidding in a Second Price Auction Benjamin Edelman Hoan Soo Lee Working Paper 09-074 Copyright 2008 by Benjamin Edelman and Hoan Soo Lee Working papers are in draft form. This working paper

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Comparison of Network Coding and Non-Network Coding Schemes for Multi-hop Wireless Networks

Comparison of Network Coding and Non-Network Coding Schemes for Multi-hop Wireless Networks Comparison of Network Coding and Non-Network Coding Schemes for Multi-hop Wireless Networks Jia-Qi Jin, Tracey Ho California Institute of Technology Pasadena, CA Email: {jin,tho}@caltech.edu Harish Viswanathan

More information

Signal Detection C H A P T E R 14 14.1 SIGNAL DETECTION AS HYPOTHESIS TESTING

Signal Detection C H A P T E R 14 14.1 SIGNAL DETECTION AS HYPOTHESIS TESTING C H A P T E R 4 Signal Detection 4. SIGNAL DETECTION AS HYPOTHESIS TESTING In Chapter 3 we considered hypothesis testing in the context of random variables. The detector resulting in the minimum probability

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.

Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine Genon-Catalot To cite this version: Fabienne

More information

The Basics of Graphical Models

The Basics of Graphical Models The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

More information

Sampling and Hypothesis Testing

Sampling and Hypothesis Testing Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

More information

Basics of Floating-Point Quantization

Basics of Floating-Point Quantization Chapter 2 Basics of Floating-Point Quantization Representation of physical quantities in terms of floating-point numbers allows one to cover a very wide dynamic range with a relatively small number of

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information

On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information Finance 400 A. Penati - G. Pennacchi Notes on On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information by Sanford Grossman This model shows how the heterogeneous information

More information

Recursive Estimation

Recursive Estimation Recursive Estimation Raffaello D Andrea Spring 04 Problem Set : Bayes Theorem and Bayesian Tracking Last updated: March 8, 05 Notes: Notation: Unlessotherwisenoted,x, y,andz denoterandomvariables, f x

More information

What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference 0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures

More information

Distributed Detection Systems. Hamidreza Ahmadi

Distributed Detection Systems. Hamidreza Ahmadi Channel Estimation Error in Distributed Detection Systems Hamidreza Ahmadi Outline Detection Theory Neyman-Pearson Method Classical l Distributed ib t Detection ti Fusion and local sensor rules Channel

More information

LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

More information

A direct approach to false discovery rates

A direct approach to false discovery rates J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiple-hypothesis

More information

Stochastic Inventory Control

Stochastic Inventory Control Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

More information

Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

More information

ALOHA Performs Delay-Optimum Power Control

ALOHA Performs Delay-Optimum Power Control ALOHA Performs Delay-Optimum Power Control Xinchen Zhang and Martin Haenggi Department of Electrical Engineering University of Notre Dame Notre Dame, IN 46556, USA {xzhang7,mhaenggi}@nd.edu Abstract As

More information

MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

A Statistical Framework for Operational Infrasound Monitoring

A Statistical Framework for Operational Infrasound Monitoring A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LA-UR 11-03040 The views expressed here do not necessarily reflect the views of the United States Government,

More information

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Wolfram Burgard, Maren Bennewitz, Diego Tipaldi, Luciano Spinello 1 Motivation Recall: Discrete filter Discretize

More information

Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

More information

Radar Systems Engineering Lecture 6 Detection of Signals in Noise

Radar Systems Engineering Lecture 6 Detection of Signals in Noise Radar Systems Engineering Lecture 6 Detection of Signals in Noise Dr. Robert M. O Donnell Guest Lecturer Radar Systems Course 1 Detection 1/1/010 Block Diagram of Radar System Target Radar Cross Section

More information

Three-Stage Phase II Clinical Trials

Three-Stage Phase II Clinical Trials Chapter 130 Three-Stage Phase II Clinical Trials Introduction Phase II clinical trials determine whether a drug or regimen has sufficient activity against disease to warrant more extensive study and development.

More information

THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM TORONTO THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

More information

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel Lizhong Zheng, Student

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 1, Lecture 3

Stat260: Bayesian Modeling and Inference Lecture Date: February 1, Lecture 3 Stat26: Bayesian Modeling and Inference Lecture Date: February 1, 21 Lecture 3 Lecturer: Michael I. Jordan Scribe: Joshua G. Schraiber 1 Decision theory Recall that decision theory provides a quantification

More information

Optimal Performance for Detection Systems in Wireless Passive Sensor Networks

Optimal Performance for Detection Systems in Wireless Passive Sensor Networks Optimal Performance for Detection Systems in Wireless Passive Sensor Networks Ashraf Tantawy, Xenofon Koutsoukos, and Gautam Biswas Institute for Software Integrated Systems (ISIS) Department of Electrical

More information

Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

More information

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i ) Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

More information

Moral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania

Moral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania Moral Hazard Itay Goldstein Wharton School, University of Pennsylvania 1 Principal-Agent Problem Basic problem in corporate finance: separation of ownership and control: o The owners of the firm are typically

More information

Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics

Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of

More information

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR) 2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

More information

Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

More information

On Decentralized Detection with Partial Information Sharing among Sensors

On Decentralized Detection with Partial Information Sharing among Sensors On Decentralized Detection with Partial Information Sharing among Sensors The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

Notes on Factoring. MA 206 Kurt Bryan

Notes on Factoring. MA 206 Kurt Bryan The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

More information

On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Non Parametric Inference

Non Parametric Inference Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

More information

Uncertainty quantification for the family-wise error rate in multivariate copula models

Uncertainty quantification for the family-wise error rate in multivariate copula models Uncertainty quantification for the family-wise error rate in multivariate copula models Thorsten Dickhaus (joint work with Taras Bodnar, Jakob Gierl and Jens Stange) University of Bremen Institute for

More information

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network

Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Recent Advances in Electrical Engineering and Electronic Devices Log-Likelihood Ratio-based Relay Selection Algorithm in Wireless Network Ahmed El-Mahdy and Ahmed Walid Faculty of Information Engineering

More information

Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach

Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach J. R. Statist. Soc. B (2004) 66, Part 1, pp. 187 205 Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach John D. Storey,

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

The Bonferonni and Šidák Corrections for Multiple Comparisons

The Bonferonni and Šidák Corrections for Multiple Comparisons The Bonferonni and Šidák Corrections for Multiple Comparisons Hervé Abdi 1 1 Overview The more tests we perform on a set of data, the more likely we are to reject the null hypothesis when it is true (i.e.,

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

More information

Practice problems for Homework 11 - Point Estimation

Practice problems for Homework 11 - Point Estimation Practice problems for Homework 11 - Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:

More information

Chapter 4: Non-Parametric Classification

Chapter 4: Non-Parametric Classification Chapter 4: Non-Parametric Classification Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination

More information

Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab

Tutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational

More information

MIMO CHANNEL CAPACITY

MIMO CHANNEL CAPACITY MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use

More information

Probabilistic Methods for Time-Series Analysis

Probabilistic Methods for Time-Series Analysis Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:

More information

arxiv:1112.0829v1 [math.pr] 5 Dec 2011

arxiv:1112.0829v1 [math.pr] 5 Dec 2011 How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

More information

How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Offline sorting buffers on Line

Offline sorting buffers on Line Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

More information

IN most approaches to binary classification, classifiers are

IN most approaches to binary classification, classifiers are 3806 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 11, NOVEMBER 2005 A Neyman Pearson Approach to Statistical Learning Clayton Scott, Member, IEEE, Robert Nowak, Senior Member, IEEE Abstract The

More information

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE/ACM TRANSACTIONS ON NETWORKING 1 A Greedy Link Scheduler for Wireless Networks With Gaussian Multiple-Access and Broadcast Channels Arun Sridharan, Student Member, IEEE, C Emre Koksal, Member, IEEE,

More information

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

More information

Exercises with solutions (1)

Exercises with solutions (1) Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal

More information

13.3 Inference Using Full Joint Distribution

13.3 Inference Using Full Joint Distribution 191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent

More information

Statistics in Astronomy

Statistics in Astronomy Statistics in Astronomy Initial question: How do you maximize the information you get from your data? Statistics is an art as well as a science, so it s important to use it as you would any other tool:

More information

OPRE 6201 : 2. Simplex Method

OPRE 6201 : 2. Simplex Method OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2

More information

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11 CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According

More information

Gambling and Data Compression

Gambling and Data Compression Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction

More information

Sums of Independent Random Variables

Sums of Independent Random Variables Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables

More information

Random variables, probability distributions, binomial random variable

Random variables, probability distributions, binomial random variable Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

More information