Introduction to Detection Theory

Transcription

1 Introduction to Detection Theory Reading: Ch. 3 in Kay-II. Notes by Prof. Don Johnson on detection theory, see Ch. 10 in Wasserman. EE 527, Detection and Estimation Theory, # 5 1

2 Introduction to Detection Theory (cont.) We wish to make a decision on a signal of interest using noisy measurements. Statistical tools enable systematic solutions and optimal design. Application areas include: Communications, Radar and sonar, Nondestructive evaluation (NDE) of materials, Biomedicine, etc. EE 527, Detection and Estimation Theory, # 5 2

3 Example: Radar Detection. We wish to decide on the presence or absence of a target. EE 527, Detection and Estimation Theory, # 5 3

4 Introduction to Detection Theory We assume a parametric measurement model p(x θ) [or p(x ; θ), which is the notation that we sometimes use in the classical setting]. In point estimation theory, we estimated the parameter θ Θ given the data x. Suppose now that we choose Θ 0 and Θ 1 that form a partition of the parameter space Θ: Θ 0 Θ 1 = Θ, Θ 0 Θ 1 =. In detection theory, we wish to identify which hypothesis is true (i.e. make the appropriate decision): H 0 : θ Θ 0, null hypothesis H 1 : θ Θ 1, alternative hypothesis. Terminology: If θ can only take two values, Θ = {θ 0, θ 1 }, Θ 0 = {θ 0 }, Θ 1 = {θ 1 } we say that the hypotheses are simple. Otherwise, we say that they are composite. Composite Hypothesis Example: H 0 : θ = 0 versus H 1 : θ (0, ). EE 527, Detection and Estimation Theory, # 5 4

5 The Decision Rule We wish to design a decision rule (function) φ(x) : X (0, 1): φ(x) = { 1, decide H1, 0, decide H 0. which partitions the data space X [i.e. the support of p(x θ)] into two regions: Rule φ(x): X 0 = {x : φ(x) = 0}, X 1 = {x : φ(x) = 1}. Let us define probabilities of false alarm and miss: P FA = E x θ [φ(x) θ] = p(x θ) dx for θ in Θ 0 X 1 P M = E x θ [1 φ(x) θ] = 1 p(x θ) dx X 1 = p(x θ) dx for θ in Θ 1. X 0 Then, the probability of detection (correctly deciding H 1 ) is P D = 1 P M = E x θ [φ(x) θ] = X 1 p(x θ) dx for θ in Θ 1. EE 527, Detection and Estimation Theory, # 5 5

6 Note: P FA and P D /P M are generally functions of the parameter θ (where θ Θ 0 when computing P FA and θ Θ 1 when computing P D /P M ). More Terminology. Statisticians use the following terminology: False alarm Type I error Miss Type II error Probability of detection Power Probability of false alarm Significance level. EE 527, Detection and Estimation Theory, # 5 6

7 Bayesian Decision-theoretic Detection Theory Recall (a slightly generalized version of) the posterior expected loss: ρ(action x) = L(θ, action) p(θ x) dθ Θ that we introduced in handout # 4 when we discussed Bayesian decision theory. Let us now apply this theory to our easy example discussed here: hypothesis testing, where our action space consists of only two choices. We first assign a loss table: decision rule φ true state Θ 1 Θ 0 x X 1 L(1 1) = 0 L(1 0) x X 0 L(0 1) L(0 0) = 0 with the loss function described by the quantities L(declared true): L(1 0) quantifies loss due to a false alarm, L(0 1) quantifies loss due to a miss, L(1 1) and L(0 0) (losses due to correct decisions) typically set to zero in real life. Here, we adopt zero losses for correct decisions. EE 527, Detection and Estimation Theory, # 5 7

8 Now, our posterior expected loss takes two values: ρ 0 (x) = L(0 1) p(θ x) dθ Θ 1 + L(0 0) p(θ x) dθ } {{ } Θ 0 0 = L(0 1) p(θ x) dθ Θ 1 L(0 1) is constant = L(0 1) p(θ x) dθ Θ } 1 {{ } P [θ Θ 1 x] and, similarly, ρ 1 (x) = L(1 0) p(θ x) dθ Θ 0 L(1 0) is constant = L(1 0) p(θ x) dθ. Θ } 0 {{ } P [θ Θ 0 x] We define the Bayes decision rule as the rule that minimizes the posterior expected loss; this rule corresponds to choosing the data-space partitioning as follows: X 1 = {x : ρ 1 (x) ρ 0 (x)} EE 527, Detection and Estimation Theory, # 5 8

9 or P [θ Θ 1 x] { }} { { p(θ x) dθ } Θ X 1 = x : 1 L(1 0) L(0 1) p(θ x) dθ Θ } 0 {{ } P [θ Θ 0 x] or, equivalently, upon applying the Bayes rule: X 1 = { x : Θ 1 p(x θ) π(θ) dθ Θ 0 p(x θ) π(θ) dθ 0-1 loss: For L(1 0) = L(0 1) = 1, we have (1) } L(1 0) L(0 1). (2) decision rule φ true state Θ 1 Θ 0 x X 1 L(1 1) = 0 L(1 0) = 1 x X 0 L(0 1) = 1 L(0 0) = 0 yielding the following Bayes decision rule, called the maximum a posteriori (MAP) rule: X 1 = { x : P [θ Θ 1 x] P [θ Θ 0 x] 1 } or, equivalently, upon applying the Bayes rule: { } Θ X 1 = x : 1 p(x θ) π(θ) dθ Θ 0 p(x θ) π(θ) dθ 1. (4) (3) EE 527, Detection and Estimation Theory, # 5 9

10 Simple hypotheses. Let us specialize (1) to the case of simple hypotheses (Θ 0 = {θ 0 }, Θ 1 = {θ 1 }): X 1 = { x : p(θ 1 x) p(θ 0 x) } {{ } posterior-odds ratio } L(1 0) L(0 1). (5) We can rewrite (5) using the Bayes rule: X 1 = { x : p(x θ 1 ) p(x θ 0 ) } {{ } likelihood ratio } π 0 L(1 0) π 1 L(0 1) (6) where π 0 = π(θ 0 ), π 1 = π(θ 1 ) = 1 π 0 describe the prior probability mass function (pmf) of the binary random variable θ (recall that θ {θ 0, θ 1 }). Hence, for binary simple hypotheses, the prior pmf of θ is the Bernoulli pmf. EE 527, Detection and Estimation Theory, # 5 10

11 Preposterior (Bayes) Risk The preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = X 1 + X 0 L(1 0) p(x θ)π(θ) dθ dx Θ 0 Θ 1 L(0 1) p(x θ)π(θ) dθ dx. How do we choose the rule φ(x) that minimizes the preposterior EE 527, Detection and Estimation Theory, # 5 11

12 risk? L(1 0) p(x θ)π(θ) dθ dx X 1 Θ 0 + L(0 1) p(x θ)π(θ) dθ dx X 0 Θ 1 = L(1 0) p(x θ)π(θ) dθ dx X 1 Θ 0 L(0 1) p(x θ)π(θ) dθ dx X 1 Θ 1 + L(0 1) p(x θ)π(θ) dθ dx X 0 Θ 1 + L(0 1) p(x θ)π(θ) dθ dx X 1 Θ 1 = const } {{ } not dependent on φ(x) { + L(1 0) p(x θ)π(θ) dθ X 1 Θ 0 } L(0 1) p(x θ)π(θ) dθ dx Θ 1 implying that X 1 should be chosen as { } X 1 : L(1 0) p(x θ)π(θ) dθ L(0 1) p(x θ)π(θ) < 0 Θ 0 Θ 1 EE 527, Detection and Estimation Theory, # 5 12

13 which, as expected, is the same as (2), since minimizing the posterior expected loss minimizing the preposterior risk for every x as showed earlier in handout # loss: For the 0-1 loss, i.e. L(1 0) = L(0 1) = 1, the preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = X 1 + X 0 p(x θ)π(θ) dθ dx Θ 0 Θ 1 p(x θ)π(θ) dθ dx (7) which is simply the average error probability, with averaging performed over the joint probability density or mass function (pdf/pmf) or the data x and parameters θ. EE 527, Detection and Estimation Theory, # 5 13

14 Bayesian Decision-theoretic Detection for Simple Hypotheses The Bayes decision rule for simple hypotheses is (6): Λ(x) } {{ } likelihood ratio = p(x θ 1) p(x θ 0 ) H 1 π 0 L(1 0) π 1 L(0 1) τ (8) see also Ch. 3.7 in Kay-II. (Recall that Λ(x) is the sufficient statistic for the detection problem, see p. 37 in handout # 1.) Equivalently, log Λ(x) = log[p(x θ 1 )] log[p(x θ 0 )] H 1 log τ τ. Minimum Average Error Probability Detection: In the familiar 0-1 loss case where L(1 0) = L(0 1) = 1, we know that the preposterior (Bayes) risk is equal to the average error probability, see (7). This average error probability greatly EE 527, Detection and Estimation Theory, # 5 14

15 simplifies in the simple hypothesis testing case: av. error probability = L(1 0) p(x θ } {{ } 0 )π 0 dx X L(0 1) p(x θ } {{ } 1 )π 1 dx X 0 1 = π 0 p(x θ 0 ) dx +π 1 p(x θ 1 ) dx X } 1 X {{ } } 0 {{ } P FA P M where, as before, the averaging is performed over the joint pdf/pmf of the data x and parameters θ, and π 0 = π(θ 0 ), π 1 = π(θ 1 ) = 1 π 0 (the Bernoulli pmf). In this case, our Bayes decision rule simplifies to the MAP rule (as expected, see (5) and Ch. 3.6 in Kay-II): p(θ 1 x) p(θ 0 x) } {{ } posterior-odds ratio or, equivalently, upon applying the Bayes rule: p(x θ 1 ) p(x θ 0 ) } {{ } likelihood ratio H 1 1 (9) H 1 π 0 π 1. (10) EE 527, Detection and Estimation Theory, # 5 15

16 which is the same as (4), upon substituting the Bernoulli pmf as the prior pmf for θ and (8), upon substituting L(1 0) = L(0 1) = 1. EE 527, Detection and Estimation Theory, # 5 16

17 Bayesian Decision-theoretic Detection Theory: Handling Nuisance Parameters We apply the same approach as before integrate the nuisance parameters (ϕ, say) out! Therefore, (1) still holds for testing H 0 : θ Θ 0 versus H 1 : θ Θ 1 but p θ x (θ x) is computed as follows: p θ x (θ x) = p θ,ϕ x (θ, ϕ x) dϕ and, therefore, Θ 1 Θ 0 p(θ x) { }} { p θ,ϕ x (θ, ϕ x) dϕ dθ p θ,ϕ x (θ, ϕ x) dϕ } {{ } p(θ x) dθ H 1 L(1 0) L(0 1) (11) EE 527, Detection and Estimation Theory, # 5 17

18 or, equivalently, upon applying the Bayes rule: Θ 1 px θ,ϕ (x θ, ϕ) π θ,ϕ (θ, ϕ) dϕ dθ Θ 0 px θ,ϕ (x θ, ϕ) π θ,ϕ (θ, ϕ) dϕ dθ H 1 L(1 0) L(0 1). (12) Now, if θ and ϕ are independent a priori, i.e. π θ,ϕ (θ, ϕ) = π θ (θ) π ϕ (ϕ) (13) then (12) can be rewritten as Θ 1 π θ (θ) Θ 0 π θ (θ) p(x θ) { }} { p x θ,ϕ (x θ, ϕ) π ϕ (ϕ) dϕ dθ p x θ,ϕ (x θ, ϕ) π ϕ (ϕ) } {{ } p(x θ) dϕ dθ H 1 L(1 0) L(0 1). (14) Simple hypotheses and independent priors for θ and ϕ: Let us specialize (11) to the simple hypotheses (Θ 0 = {θ 0 }, Θ 1 = {θ 1 }): pθ,ϕ x (θ 1, ϕ x) dϕ pθ,ϕ x (θ 0, ϕ x) dϕ H 1 L(1 0) L(0 1). (15) Now, if θ and ϕ are independent a priori, i.e. (13) holds, then EE 527, Detection and Estimation Theory, # 5 18

19 we can rewrite (14) [or (15) using the Bayes rule]: px θ,ϕ (x θ 1, ϕ) π ϕ (ϕ) dϕ px θ,ϕ (x θ 0, ϕ) π ϕ (ϕ) dϕ } {{ } integrated likelihood ratio = same as (6) { }} { p(x θ 1 ) H 1 π 0 L(1 0) p(x θ 0 ) π 1 L(0 1) (16) where π 0 = π θ (θ 0 ), π 1 = π θ (θ 1 ) = 1 π 0. EE 527, Detection and Estimation Theory, # 5 19

20 Chernoff Bound on Average Error Probability for Simple Hypotheses Recall that minimizing the average error probability av. error probability = X 1 + X 0 p(x θ)π(θ) dθ dx Θ 0 Θ 1 p(x θ)π(θ) dθ dx leads to the MAP decision rule: X 1 = { x : Θ 0 p(x θ) π(θ) dθ } p(x θ) π(θ) dθ < 0. Θ 1 In many applications, we may not be able to obtain a simple closed-form expression for the minimum average error EE 527, Detection and Estimation Theory, # 5 20

21 probability, but we can bound it as follows: min av. error probability = X1 p(x θ)π(θ) dθ dx Θ 0 + p(x θ)π(θ) dθ dx Θ 1 X 0 using the def. of X 1 = X }{{} data space { } min p(x θ)π(θ) dθ, p(x θ)π(θ) dθ) dx Θ 0 Θ 1 = X X [ = q 0 (x) = q { }} { 1 (x) ] λ [ { }} { ] 1 λ p(x θ)π(θ) dθ p(x θ)π(θ) dθ dx Θ 0 Θ 1 [q 0 (x)] λ [q 1 (x)] 1 λ dx which is the Chernoff bound on the minimum average error probability. Here, we have used the fact that min{a, b} a λ b 1 λ, for 0 λ 1, a, b 0. When x = x 1 x 2. x N EE 527, Detection and Estimation Theory, # 5 21

22 with x 1, x 2,... x N conditionally independent, identically distributed (i.i.d.) given θ and simple hypotheses (i.e. Θ 0 = {θ 0 }, Θ 1 = {θ 1 }) then N q 0 (x) = p(x θ 0 ) π 0 = π 0 p(x n θ 0 ) yielding n=1 N q 1 (x) = p(x θ 1 ) π 1 = π 1 p(x n θ 1 ) n=1 Chernoff bound for N conditionally i.i.d. measurements (given θ) and simple hyp. [ N ] λ N 1 λ = π 0 p(x n θ 0 ) [π 1 p(x n θ 1 )] dx n=1 = π λ 0 π 1 λ 1 = π λ 0 π 1 λ 1 N n=1 n=1 { [p(x n θ 0 )] λ [p(x n θ 1 )] 1 λ dx n } { [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1 } N EE 527, Detection and Estimation Theory, # 5 22

23 or, in other words, 1 N log(min av. error probability) log(πλ 0 π1 1 λ ) + log [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1, λ [0, 1]. If π 0 = π 1 = 1/2 (which is almost always the case of interest when evaluating average error probabilities), we can say that, as N, min av. error probability for N cond. i.i.d. measurements (given θ) and simple hypotheses ( { f(n) exp N min log [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ} ) λ [0,1] } {{ } Chernoff information for a single observation where f(n) is a slowly-varying function compared with the exponential term: lim N log f(n) N = 0. Note that the Chernoff information in the exponent term of the above expression quantifies the asymptotic behavior of the minimum average error probability. We now give a useful result, taken from EE 527, Detection and Estimation Theory, # 5 23

24 K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., San Diego, CA: Academic Press, 1990 for evaluating a class of Chernoff bounds. Lemma 1. Consider p 1 (x) = N (µ 1, Σ 1 ) and p 2 (x) = N (µ 2, Σ 2 ). Then [p 1 (x)] λ [p 2 (x)] 1 λ dx = exp[ g(λ)] where g(λ) = λ (1 λ) 2 (µ 2 µ 1 ) T [λ Σ 1 + (1 λ) Σ 2 ] 1 (µ 2 µ 1 ) [ λ log Σ1 + (1 λ) Σ 2 ] Σ 1 λ Σ 2 1 λ. EE 527, Detection and Estimation Theory, # 5 24

25 Probabilities of False Alarm (P FA ) and Detection (P D ) for Simple Hypotheses 0.2 PDF under H 0 PDF under H 1 Probability Density Function P FA P D τ Test Statistic Comments: Λ(x) { }} { P FA = P [ test } statistic {{ > τ} θ = θ 0 ] X X 1 P D = P [test statistic > τ } {{ } X X 0 θ = θ 1 ]. (i) As the region X 1 shrinks (i.e. τ ), both of the above probabilities shrink towards zero. EE 527, Detection and Estimation Theory, # 5 25

26 (ii) As the region X 1 grows (i.e. τ 0), both of these probabilities grow towards unity. (iii) Observations (i) and (ii) do not imply equality between P FA and P D ; in most cases, as R 1 grows, P D grows more rapidly than P FA (i.e. we better be right more often than we are wrong). (iv) However, the perfect case where our rule is always right and never wrong (P D = 1 and P FA = 0) cannot occur when the conditional pdfs/pmfs p(x θ 0 ) and p(x θ 1 ) overlap. (v) Thus, to increase the detection probability P D, we must also allow for the false-alarm probability P FA to increase. This behavior represents the fundamental tradeoff in hypothesis testing and detection theory and motivates us to introduce a (classical) approach to testing simple hypotheses, pioneered by Neyman and Pearson (to be discussed next). EE 527, Detection and Estimation Theory, # 5 26

27 Neyman-Pearson Test for Simple Hypotheses Setup: Parametric data models p(x ; θ 0 ), p(x ; θ 1 ), Simple hypothesis testing: H 0 : θ = θ 0 versus H 1 : θ = θ 1. No prior pdf/pmf on θ is available. Goal: Design a test that maximizes the probability of detection P D = P [X X 1 ; θ = θ 0 ] (equivalently, minimizes the miss probability P M ) under the constraint P FA = P [X X 1 ; θ = θ 0 ] = α α. Here, we consider simple hypotheses; classical version of testing composite hypotheses is much more complicated. The Bayesian version of testing composite hypotheses is trivial (as EE 527, Detection and Estimation Theory, # 5 27

28 is everything else Bayesian, at least conceptually) and we have already seen it. Solution. We apply the Lagrange-multiplier approach: maximize L = P D + λ (P FA α ) [ ] = p(x ; θ 1 ) dx + λ p(x; θ 0 ) dx α X 1 X 1 = [p(x ; θ 1 ) λ p(x ; θ 0 )] dx λ α. X 1 To maximize L, set X 1 = {x : p(x ; θ 1 ) λ p(x ; θ 0 ) > 0} = { x : p(x ; θ 1) p(x ; θ 0 ) > λ }. Again, we find the likelihood ratio: Recall our constraint: Λ(x) = p(x ; θ 1) p(x ; θ 0 ). X 1 p(x ; θ 0 ) dx = P FA = α α. EE 527, Detection and Estimation Theory, # 5 28

29 If we increase λ, P FA and P D go down. Similarly, if we decrease λ, P FA and P D go up. Hence, to maximize P D, choose λ so that P FA is as big as possible under the constraint. Two useful ways for determining the threshold that achieves a specified false-alarm rate: Find λ that satisfies or, x : Λ(x)>λ p(x ; θ 0 ) dx = P FA = α expressing in terms of the pdf/pmf of Λ(x) under H 0 : λ p Λ;θ0 (l ; θ 0 ) dl = α. or, perhaps, in terms of a monotonic function of Λ(x), say T (x) = monotonic function(λ(x)). Warning: We have been implicitly assuming that P FA is a continuous function of λ. Some (not insightful) technical adjustments are needed if this is not the case. A way of handling nuisance parameters: We can utilize the integrated (marginal) likelihood ratio (16) under the Neyman- Pearson setup as well. EE 527, Detection and Estimation Theory, # 5 29

30 Chernoff-Stein Lemma for Bounding the Miss Probability in Neyman-Pearson Tests of Simple Hypotheses Recall the definition of the Kullback-Leibler (K-L) distance D(p q) from one pmf (p) to another (p): D(p q) = k p k log p k q k. The complete proof of this lemma for the discrete (pmf) case is given in Additional Reading: T.M. Cover and J.A. Thomas, Elements of Information Theory. Second ed., New York: Wiley, EE 527, Detection and Estimation Theory, # 5 30

31 Setup for the Chernoff-Stein Lemma Assume that x 1, x 2,..., x N are conditionally i.i.d. given θ. We adopt the Neyman-Pearson framework, i.e. obtain a decision threshold to achieve a fixed P FA. Let us study the asymptotic P M = 1 P D as the number of observations N gets large. To keep P FA constant as N increases, we need to make our decision threshold (γ, say) vary with N, i.e. Now, the miss probability is γ = γ N (P FA ) ) P M = P M (γ) = P M (γ N (P FA ). EE 527, Detection and Estimation Theory, # 5 31

32 Chernoff-Stein Lemma The Chernoff-Stein lemma says: lim P FA 0 lim N 1 N log P M = ( ) D p(x n θ 0 ) p(x n θ 1 ) } {{ } K-L distance for a single observation where the K-L distance between p(x n θ 0 ) and p(x n θ 1 ) ( ) D p(x n θ 0 ) p(x n θ 1 ) discrete (pmf) case = x n p(x n θ 0 ) log [ = E p(xn θ 0 ) log p(x n θ 1 ) ] p(x n θ 0 ) [ p(xn θ 1 ) ] p(x n θ 0 ) does not depend on the observation index n, since x n conditionally i.i.d. given θ. are Equivalently, we can state that [ N + P M f(n) exp N D ( p(x n θ 0 ) p(x n θ 1 ) )] as P FA 0 and N, where f(n) is a slowly-varying function compared with the exponential term (when P FA 0 and N ). EE 527, Detection and Estimation Theory, # 5 32

33 Detection for Simple Hypotheses: Example Known positive DC level in additive white Gaussian noise (AWGN), Example 3.2 in Kay-II. Consider where H 0 : x[n] = w[n], n = 1, 2,..., N versus H 1 : x[n] = A + w[n], n = 1, 2,..., N A > 0 is a known constant, w[n] is zero-mean white Gaussian noise with known variance σ 2, i.e. w[n] N (0, σ 2 ). The above hypothesis-testing formulation is EE-like: noise versus signal plus noise. It is similar to the on-off keying scheme in communications, which gives us an idea to rephrase it so that it fits our formulation on p. 4 (for which we have developed all the theory so far). Here is such an EE 527, Detection and Estimation Theory, # 5 33

34 alternative formulation: consider a family of pdfs p(x ; a) = 1 [ (2πσ2 ) exp N 1 2 σ 2 N (x[n] a) 2] (17) n=1 and the following (equivalent) hypotheses: H 0 : a = 0 (off) versus H 1 : a = A (on). Then, the likelihood ratio is Λ(x) = p(x ; a = A) p(x ; a = 0) = 1/(2πσ2 ) N/2 exp[ 1 N 2σ 2 n=1 (x[n] A)2 ] 1/(2πσ 2 ) N/2 exp( 1 N. 2σ 2 n=1 x[n]2 ) Now, take the logarithm and, after simple manipulations, reduce our likelihood-ratio test to comparing T (x) = x = 1 N N x[n] n=1 with a threshold γ. [Here, T (x) is a monotonic function of Λ(x).] If T (x) > γ accept H 1 (i.e. reject H 0 ), otherwise accept EE 527, Detection and Estimation Theory, # 5 34

35 H 0 (well, not exactly, we will talk more about this decision on p. 59). The choice of γ depends on the approach that we take. For the Bayes decision rule, γ is a function of π 0 and π 1. For the Neyman-Pearson test, γ is chosen to achieve (control) a desired P FA. Bayesian decision-theoretic detection for 0-1 loss (corresponding to minimizing the average error EE 527, Detection and Estimation Theory, # 5 35

36 probability): log Λ(x) = 1 2 σ σ 2 2 A ( N n=1 N (x[n] A) σ 2 n=1 N (x[n] x[n] } {{ } 0 n=1 ( N n=1 ) x[n] N n=1 (x[n]) 2 H ( 1 π0 ) log π 1 +A) (x[n] + x[n] A) H ( 1 π0 ) log π 1 ) x[n] A 2 N H ( 1 2 σ 2 π0 ) log π 1 AN 2 H 1 σ2 A log ( π0 π 1 ) σ2 since A > 0 and, finally, x H 1 N A log(π 0/π 1 ) + A } {{ 2} γ which, for the practically most interesting case of equiprobable hypotheses π 0 = π 1 = 1 2 (18) simplifies to x H 1 A 2 }{{} γ known as the maximum-likelihood test (i.e. the Bayes decision rule for 0-1 loss and a priori equiprobable hypotheses is defined as the maximum-likelihood test). This maximum-likelihood detector does not require the knowledge of the noise variance EE 527, Detection and Estimation Theory, # 5 36

37 σ 2 to declare its decision. However, the knowledge of σ 2 is key to assessing the detection performance. Interestingly, these observations will carry over to a few maximum-likelihood tests that we will derive in the future. Assuming (18), we now derive the minimum average error probability. First, note that X a = 0 N (0, σ 2 /N) and X a = A N (A, σ 2 /N). Then min av. error prob. = 1 2 P [X > A } {{ 2 a = 0] } +1 P FA = 1 2 P [ P [ X σ2 /N } {{ } standard normal random var. X A σ2 /N } {{ } standard normal random var. = 1 2 ( N A Q ) } {{ 4 σ 2 } standard normal complementary cdf > A/2 σ2 /N ; a = 0 ] < A/2 A σ2 /N ; a = A ] ( N A Φ 2) } {{ 4 σ 2 } standard normal cdf 2 P [X < A } {{ 2 a = A] } P M ( = Q 1 2 N A 2 σ 2 ) EE 527, Detection and Estimation Theory, # 5 37

38 since Φ( x) = Q(x). The minimum probability of error decreases monotonically with N A 2 /σ 2, which is known as the deflection coefficient. In this case, the Chernoff information is equal to A2 (see Lemma 1): 8 σ 2 { min log λ [0,1] [ λ (1 λ) = min λ [0,1] 2 } [N (0, σ 2 /N)] } {{ } λ [N (A, σ 2 /N)] } {{ } 1 λ dx p 0 (x) p 1 (x) A2 σ 2 ] = A2 8 σ 2 and, therefore, we expect the following asymptotic behavior: min av. error probability N + ) f(n) exp ( N A2 8 σ 2 (19) where f(n) varies slowly with N when N is large, compared with the exponential term: lim N log f(n) N = 0. Indeed, in our example, ( N A Q 2 ) N + 4 σ 2 1 N A 2 2π 4 σ } 2 {{ } f(n) ( exp N ) A2 8 σ 2 EE 527, Detection and Estimation Theory, # 5 38

39 where we have used the asymptotic formula Q(x) x + 1 x 2π exp( x2 /2) (20) given e.g. in equation (2.4) of Kay-II. EE 527, Detection and Estimation Theory, # 5 39

40 Known DC Level in AWGN: Neyman-Pearson Approach We now derive the Neyman-Pearson test for detecting a known DC level in AWGN. Recall that our likelihood-ratio test compares with a threshold γ. T (x) = x = 1 N N 1 n=0 x[n] If T (x) > γ, decide H 1 (i.e. reject H 0 ), otherwise decide H 0 (see also the discussion on p. 59). Performance evaluation: Assuming (17), we have T (x) a N (a, σ 2 /N). Therefore, T (X) a = 0 N (0, σ 2 /N), implying ( γ ) P FA = P [T (X) > γ ; a = 0] = Q σ2 /N EE 527, Detection and Estimation Theory, # 5 40

41 and we obtain the decision threshold as follows: γ = σ 2 N Q 1 (P FA ). Now, T (X) a = A N (A, σ 2 /N), implying ( γ A ) P D = P (T (X) > γ a = A) = Q σ2 /N ( ) = Q Q 1 A (P FA ) 2 σ 2 /N ( ) = Q Q 1 (P FA ) NA2 /σ } {{ } 2. = SNR = d 2 Given the false-alarm probability P FA, the detection probability P D depends only on the deflection coefficient: d 2 = NA2 σ 2 = {E [T (X) a = A] E [T (X) a = 0]}2 var[t (X a = 0)] which is also (a reasonable definition for) the signal-to-noise ratio (SNR). EE 527, Detection and Estimation Theory, # 5 41

42 Receiver Operating Characteristics (ROC) Comments: P D = Q ( Q 1 (P FA ) d ). As we raise the threshold γ, P FA goes down but so does P D. ROC should be above the 45 line otherwise we can do better by flipping a coin. Performance improves with d 2. EE 527, Detection and Estimation Theory, # 5 42

43 Typical Ways of Depicting the Detection Performance Under the Neyman-Pearson Setup To analyze the performance of a Neyman-Pearson detector, we examine two relationships: Between P D and P FA, for a given SNR, called the operating characteristics (ROC). receiver Between P D and SNR, for a given P FA. Here are examples of the two: see Figs. 3.8 and 3.5 in Kay-II, respectively. EE 527, Detection and Estimation Theory, # 5 43

44 Asymptotic (as N and P FA 0) P D and P M for a Known DC Level in AWGN We apply the Chernoff-Stein lemma, for which we need to compute the following K-L distance: log where ( ) D p(x n a = 0) p(x n a = A) { = E p(xn a=0) log [ p(xn a = A) ]} p(x n a = 0) p(x n a = 0) = N (0, σ 2 ) [ p(xn a = A) ] = (x n A) 2 p(x n a = 0) 2 σ 2 + x2 n 2 σ 2 = 1 2σ 2 (2 A x n A 2 ). Therefore, ( ) D p(x n θ 0 ) p(x n θ 1 ) = A2 2 σ 2 and the Chernoff-Stein lemma predicts the following behavior of the detection probability as N and P FA 0: ( ) P D 1 f(n) exp N A2 } {{ 2 σ 2 } P M EE 527, Detection and Estimation Theory, # 5 44

45 where f(n) is a slowly-varying function of N compared with the exponential term. In this case, the exact expression for P M (P D ) is available and consistent with the Chernoff-Stein lemma: P M = 1 Q (Q 1 (P FA ) ) NA 2 /σ 2 ) = Q( NA2 /σ 2 Q 1 (P FA ) N + = = 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [ } NA 2 /σ 2 Q 1 (P FA )] 2 /2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp NA 2 /(2σ 2 ) [Q 1 (P FA )] 2 /2 +Q 1 (P FA ) } NA 2 /σ 2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [Q 1 (P FA )] 2 /2 + Q 1 (P FA ) } NA 2 /σ 2 exp[ NA 2 /(2σ 2 )] } {{ } as predicted by the Chernoff-Stein lemma (21) where we have used the asymptotic formula (20). When P FA EE 527, Detection and Estimation Theory, # 5 45

46 is small and N is large, the first two (green) terms in the above expression make a slowly-varying function f(n) of N. Note that the exponential term in (21) does not depend on the false-alarm probability P FA (or, equivalently, on the choice of the decision threshold). The Chernoff-Stein lemma asserts that this is not a coincidence. Comment. For detecting a known DC level in AWGN: The slope of the exponential-decay term of P M is A 2 /(2σ 2 ) which is different from (larger than, in this case) the slope of the exponential decay of the minimum average error probability: A 2 /(8σ 2 ). EE 527, Detection and Estimation Theory, # 5 46

47 Decentralized Neyman-Pearson Detection for Simple Hypotheses Consider a sensor-network scenario depicted by Assumption: Observations made at N spatially distributed sensors (nodes) follow the same marginal probabilistic model: H i : x n p(x n θ i ) where n = 1, 2,..., N and i {0, 1} for binary hypotheses. Each node n makes a hard local decision d n based on its local observation x n and sends it to the headquarters (fusion center), which collects all the local decisions and makes the final global EE 527, Detection and Estimation Theory, # 5 47

48 decision about H 0 or H 1. This structure is clearly suboptimal it is easy to construct a better decision strategy in which each node sends its (quantized, in practice) likelihood ratio to the fusion center, rather than the decision only. However, such a strategy would have a higher communication (energy) cost. We now go back to the decentralized detection problem. Suppose that each node n makes a local decision d n {0, 1}, n = 1, 2,..., N and transmits it to the fusion center. Then, the fusion center makes the global decision based on the likelihood ratio formed from the d n s. The simplest fusion scheme is based on the assumption that the d n s are conditionally independent given θ (which may not always be reasonable, but leads to an easy solution). We can now write p(d n θ 1 ) = P d n D,n (1 P D,n ) 1 d n } {{ } Bernoulli pmf where P D,n Similarly, is the nth sensor s local detection probability. p(d n θ 0 ) = P d n FA,n (1 P FA,n ) 1 d n } {{ } Bernoulli pmf where P FA,n is the nth sensor s local detection false-alarm EE 527, Detection and Estimation Theory, # 5 48

49 probability. Now, log Λ(d) = = N log n=1 N log n=1 [ p(dn θ 1 ) ] p(d n θ 0 [ P d n D,n (1 P D,n ) 1 d n P d n FA,n (1 P FA,n ) 1 d n ] H1 log τ. To further simplify the exposition, we assume that all sensors have identical performance: P D,n = P D, P FA,n = P FA. Define the number of sensors having d n = 1: u 1 = N n=1 d n Then, the log-likelihood ratio becomes log Λ(d) = u 1 log ( PD P FA ) + (N u 1 ) log ( 1 PD 1 P FA ) H1 log τ or u 1 log [ PD (1 P FA ) ] H1 ( 1 PFA ) log τ + N log. (22) P FA (1 P D ) 1 P D EE 527, Detection and Estimation Theory, # 5 49

50 Clearly, each node s local decision d n P D > P FA, which implies is meaningful only if P D (1 P FA ) P FA (1 P D ) > 1 the logarithm of which is therefore positive, and the decision rule (22) further simplifies to u 1 H 1 τ. The Neyman-Person performance analysis of this detector is easy: the random variable U 1 is binomial given θ (i.e. conditional on the hypothesis) and, therefore, P [U 1 = u 1 ] = ( ) N p u 1 (1 p) N u 1 u 1 where p = P FA under H 0 and p = P D under H 1. Hence, the global false-alarm probability is P FA,global = P [U 1 > τ θ 0 ] = N u 1 = τ ( N u 1 ) P u 1 FA (1 P FA ) N u 1. Clearly, P FA,global will be a discontinuous function of τ and therefore, we should choose our P FA,global specification from EE 527, Detection and Estimation Theory, # 5 50

51 the available discrete choices. But, if none of the candidate choices is acceptable, this means that our current system does not satisfy the requirements and a remedial action is needed, e.g. increasing the quantity (N) or improving the quality of sensors (through changing P D and P FA ), or both. EE 527, Detection and Estimation Theory, # 5 51

52 Testing Multiple Hypotheses Suppose now that we choose Θ 0, Θ 1,..., Θ M 1 that form a partition of the parameter space Θ: Θ 0 Θ 1 Θ M 1 = Θ, Θ i Θ j = i j. We wish to distinguish among M > 2 hypotheses, i.e. identify which hypothesis is true: H 0 : θ Θ 0 versus H 1 : θ Θ 1 versus. versus H M 1 : θ Θ M 1 and, consequently, our action space consists of M choices. We design a decision rule φ : X (0, 1,..., M 1): φ(x) = 0, decide H 0, 1,. decide H 1, M 1, decide H M 1 where φ partitions the data space X [i.e. the support of p(x θ)] into M regions: Rule φ: X 0 = {x : φ(x) = 0},..., X M 1 = {x : φ(x) = M 1}. EE 527, Detection and Estimation Theory, # 5 52

53 We specify the loss function using L(i m), where, typically, the losses due to correct decisions are set to zero: L(i i) = 0, i = 0, 1,..., M 1. Here, we adopt zero losses for correct decisions. posterior expected loss takes M values: Now, our ρ m (x) = = M 1 i=0 M 1 i=0 Θ i L(m i) p(θ x) dθ L(m i) Θ i p(θ x) dθ, m = 0, 1,..., M 1. Then, the Bayes decision rule φ is defined via the following data-space partitioning: X m = {x : ρ m (x) = min ρ l(x)}, m = 0, 1,..., M 1 0 l M 1 or, equivalently, upon applying the Bayes rule X m = = { M 1 } x : m = arg min L(l i) p(x θ) π(θ) dθ 0 l M 1 i=0 Θ i { M 1 x : m = arg min L(l i) 0 l M 1 i=0 Θ i p(x θ) π(θ) dθ }. EE 527, Detection and Estimation Theory, # 5 53

54 The preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = = M 1 m=0 M 1 m=0 M 1 i=0 X m X m Θ i L(m i) p(x θ)π(θ) dθ dx M 1 L(m i) p(x θ)π(θ) dθ i=0 Θ } {{ i } h m (θ) dx. Then, for an arbitrary h m (x), [ M 1 m=0 ] h m (x) dx X m [ M 1 m=0 X m ] h m (x) dx 0 which verifies that the Bayes decision rule φ minimizes the preposterior (Bayes) risk. EE 527, Detection and Estimation Theory, # 5 54