# Introduction to Detection Theory


Reading: Ch. 3 in Kay-II; notes by Prof. Don Johnson on detection theory; Ch. 10 in Wasserman.

EE 527, Detection and Estimation Theory, # 5

## Introduction to Detection Theory (cont.)

We wish to make a decision on a signal of interest using noisy measurements. Statistical tools enable systematic solutions and optimal design. Application areas include: communications, radar and sonar, nondestructive evaluation (NDE) of materials, biomedicine, etc.

## Example: Radar Detection

We wish to decide on the presence or absence of a target.

## Introduction to Detection Theory

We assume a parametric measurement model $p(x \mid \theta)$ [or $p(x\,;\theta)$, the notation that we sometimes use in the classical setting]. In point estimation theory, we estimated the parameter $\theta \in \Theta$ given the data $x$. Suppose now that we choose $\Theta_0$ and $\Theta_1$ that form a partition of the parameter space $\Theta$:
$$\Theta_0 \cup \Theta_1 = \Theta, \qquad \Theta_0 \cap \Theta_1 = \emptyset.$$
In detection theory, we wish to identify which hypothesis is true (i.e. make the appropriate decision):
$$H_0: \theta \in \Theta_0 \ \text{(null hypothesis)}, \qquad H_1: \theta \in \Theta_1 \ \text{(alternative hypothesis)}.$$
Terminology: if $\theta$ can take only two values, $\Theta = \{\theta_0, \theta_1\}$, $\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$, we say that the hypotheses are simple; otherwise, we say that they are composite. Composite hypothesis example: $H_0: \theta = 0$ versus $H_1: \theta \in (0, \infty)$.

## The Decision Rule

We wish to design a decision rule (function) $\phi: \mathcal{X} \to \{0, 1\}$:
$$\phi(x) = \begin{cases} 1, & \text{decide } H_1,\\ 0, & \text{decide } H_0 \end{cases}$$
which partitions the data space $\mathcal{X}$ [i.e. the support of $p(x \mid \theta)$] into two regions:
$$\mathcal{X}_0 = \{x : \phi(x) = 0\}, \qquad \mathcal{X}_1 = \{x : \phi(x) = 1\}.$$
Let us define the probabilities of false alarm and miss:
$$P_{\mathrm{FA}} = E_{x \mid \theta}[\phi(x) \mid \theta] = \int_{\mathcal{X}_1} p(x \mid \theta)\,dx \quad \text{for } \theta \in \Theta_0,$$
$$P_M = E_{x \mid \theta}[1 - \phi(x) \mid \theta] = 1 - \int_{\mathcal{X}_1} p(x \mid \theta)\,dx = \int_{\mathcal{X}_0} p(x \mid \theta)\,dx \quad \text{for } \theta \in \Theta_1.$$
Then, the probability of detection (correctly deciding $H_1$) is
$$P_D = 1 - P_M = E_{x \mid \theta}[\phi(x) \mid \theta] = \int_{\mathcal{X}_1} p(x \mid \theta)\,dx \quad \text{for } \theta \in \Theta_1.$$
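As a concrete numerical illustration (not from the notes): for a hypothetical scalar test with $p(x \mid \theta_0) = \mathcal{N}(0,1)$, $p(x \mid \theta_1) = \mathcal{N}(2,1)$ and the rule $\phi(x) = 1\{x > \gamma\}$, the two integrals above can be estimated by Monte Carlo and checked against the closed forms $P_{\mathrm{FA}} = Q(\gamma)$ and $P_D = Q(\gamma - 2)$:

```python
import math
import random

def phi(x, gamma=1.0):
    """Decision rule: decide H1 exactly when x exceeds the threshold gamma."""
    return 1 if x > gamma else 0

def monte_carlo_rates(n_trials=200_000, gamma=1.0, seed=0):
    """Estimate P_FA = E[phi(x) | theta0] and P_D = E[phi(x) | theta1]."""
    rng = random.Random(seed)
    # Draws under H0: x ~ N(0,1); under H1: x ~ N(2,1) (illustrative choices).
    p_fa = sum(phi(rng.gauss(0.0, 1.0), gamma) for _ in range(n_trials)) / n_trials
    p_d = sum(phi(rng.gauss(2.0, 1.0), gamma) for _ in range(n_trials)) / n_trials
    return p_fa, p_d

# Closed forms for comparison, with Q(x) = 0.5*erfc(x/sqrt(2)).
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
p_fa_hat, p_d_hat = monte_carlo_rates()
```

With $\gamma = 1$ midway between the two means, the estimates should land near $Q(1) \approx 0.159$ and $Q(-1) \approx 0.841$.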

Note: $P_{\mathrm{FA}}$ and $P_D$/$P_M$ are generally functions of the parameter $\theta$ (where $\theta \in \Theta_0$ when computing $P_{\mathrm{FA}}$ and $\theta \in \Theta_1$ when computing $P_D$/$P_M$).

More terminology. Statisticians use the following terms:

- false alarm = Type I error,
- miss = Type II error,
- probability of detection = power,
- probability of false alarm = significance level.

## Bayesian Decision-theoretic Detection Theory

Recall (a slightly generalized version of) the posterior expected loss
$$\rho(\text{action} \mid x) = \int_\Theta L(\theta, \text{action})\,p(\theta \mid x)\,d\theta$$
that we introduced in handout # 4 when we discussed Bayesian decision theory. Let us now apply this theory to the easy example discussed here: hypothesis testing, where our action space consists of only two choices. We first assign a loss table:

| decision rule $\phi$ | true state $\Theta_1$ | true state $\Theta_0$ |
|---|---|---|
| $x \in \mathcal{X}_1$ | $L(1 \mid 1) = 0$ | $L(1 \mid 0)$ |
| $x \in \mathcal{X}_0$ | $L(0 \mid 1)$ | $L(0 \mid 0) = 0$ |

with the loss function described by the quantities $L(\text{declared} \mid \text{true})$:

- $L(1 \mid 0)$ quantifies the loss due to a false alarm,
- $L(0 \mid 1)$ quantifies the loss due to a miss,
- $L(1 \mid 1)$ and $L(0 \mid 0)$ (losses due to correct decisions) are typically set to zero in real life.

Here, we adopt zero losses for correct decisions.

Now, our posterior expected loss takes two values:
$$\rho_0(x) = L(0 \mid 1) \int_{\Theta_1} p(\theta \mid x)\,d\theta + \underbrace{L(0 \mid 0) \int_{\Theta_0} p(\theta \mid x)\,d\theta}_{0} = L(0 \mid 1) \underbrace{\int_{\Theta_1} p(\theta \mid x)\,d\theta}_{P[\theta \in \Theta_1 \mid x]}$$
(using the fact that $L(0 \mid 1)$ is a constant) and, similarly,
$$\rho_1(x) = L(1 \mid 0) \underbrace{\int_{\Theta_0} p(\theta \mid x)\,d\theta}_{P[\theta \in \Theta_0 \mid x]}.$$
We define the Bayes decision rule as the rule that minimizes the posterior expected loss; this rule corresponds to choosing the data-space partitioning as follows:
$$\mathcal{X}_1 = \{x : \rho_1(x) \le \rho_0(x)\}$$

or
$$\mathcal{X}_1 = \Big\{x : \frac{\overbrace{\int_{\Theta_1} p(\theta \mid x)\,d\theta}^{P[\theta \in \Theta_1 \mid x]}}{\underbrace{\int_{\Theta_0} p(\theta \mid x)\,d\theta}_{P[\theta \in \Theta_0 \mid x]}} \ge \frac{L(1 \mid 0)}{L(0 \mid 1)}\Big\} \tag{1}$$
or, equivalently, upon applying the Bayes rule:
$$\mathcal{X}_1 = \Big\{x : \frac{\int_{\Theta_1} p(x \mid \theta)\,\pi(\theta)\,d\theta}{\int_{\Theta_0} p(x \mid \theta)\,\pi(\theta)\,d\theta} \ge \frac{L(1 \mid 0)}{L(0 \mid 1)}\Big\}. \tag{2}$$
0-1 loss: for $L(1 \mid 0) = L(0 \mid 1) = 1$, we have the loss table

| decision rule $\phi$ | true state $\Theta_1$ | true state $\Theta_0$ |
|---|---|---|
| $x \in \mathcal{X}_1$ | $L(1 \mid 1) = 0$ | $L(1 \mid 0) = 1$ |
| $x \in \mathcal{X}_0$ | $L(0 \mid 1) = 1$ | $L(0 \mid 0) = 0$ |

yielding the following Bayes decision rule, called the maximum a posteriori (MAP) rule:
$$\mathcal{X}_1 = \Big\{x : \frac{P[\theta \in \Theta_1 \mid x]}{P[\theta \in \Theta_0 \mid x]} \ge 1\Big\} \tag{3}$$
or, equivalently, upon applying the Bayes rule:
$$\mathcal{X}_1 = \Big\{x : \frac{\int_{\Theta_1} p(x \mid \theta)\,\pi(\theta)\,d\theta}{\int_{\Theta_0} p(x \mid \theta)\,\pi(\theta)\,d\theta} \ge 1\Big\}. \tag{4}$$

Simple hypotheses. Let us specialize (1) to the case of simple hypotheses ($\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$):
$$\mathcal{X}_1 = \Big\{x : \underbrace{\frac{p(\theta_1 \mid x)}{p(\theta_0 \mid x)}}_{\text{posterior-odds ratio}} \ge \frac{L(1 \mid 0)}{L(0 \mid 1)}\Big\}. \tag{5}$$
We can rewrite (5) using the Bayes rule:
$$\mathcal{X}_1 = \Big\{x : \underbrace{\frac{p(x \mid \theta_1)}{p(x \mid \theta_0)}}_{\text{likelihood ratio}} \ge \frac{\pi_0\,L(1 \mid 0)}{\pi_1\,L(0 \mid 1)}\Big\} \tag{6}$$
where $\pi_0 = \pi(\theta_0)$ and $\pi_1 = \pi(\theta_1) = 1 - \pi_0$ describe the prior probability mass function (pmf) of the binary random variable $\theta$ (recall that $\theta \in \{\theta_0, \theta_1\}$). Hence, for binary simple hypotheses, the prior pmf of $\theta$ is a Bernoulli pmf.
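A minimal sketch of rule (6) for two hypothetical unit-variance Gaussian likelihoods (the choices $\theta_0 = 0$, $\theta_1 = 2$, $\sigma = 1$ are illustrative, not from the notes); note how raising $\pi_0$ raises the threshold that the likelihood ratio must clear:

```python
import math

def gauss_pdf(x, mu, sigma):
    """1-D Gaussian pdf N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_decide(x, theta0=0.0, theta1=2.0, sigma=1.0,
                 pi0=0.5, L10=1.0, L01=1.0):
    """Decide H1 iff the likelihood ratio >= pi0*L(1|0) / (pi1*L(0|1))."""
    lam = gauss_pdf(x, theta1, sigma) / gauss_pdf(x, theta0, sigma)
    tau = pi0 * L10 / ((1.0 - pi0) * L01)
    return 1 if lam >= tau else 0
```

With equal priors and 0-1 loss this reduces to deciding $H_1$ when $x$ is past the midpoint of the two means; with $\pi_0 = 0.9$ the same observation can flip back to $H_0$.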

## Preposterior (Bayes) Risk

The preposterior (Bayes) risk for rule $\phi(x)$ is
$$E_{x,\theta}[\text{loss}] = \int_{\mathcal{X}_1} L(1 \mid 0) \int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta\,dx + \int_{\mathcal{X}_0} L(0 \mid 1) \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta\,dx.$$
How do we choose the rule $\phi(x)$ that minimizes the preposterior risk? Adding and subtracting the $\mathcal{X}_1$ portion of the miss term gives
$$E_{x,\theta}[\text{loss}] = \underbrace{\int_{\mathcal{X}} L(0 \mid 1) \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta\,dx}_{\text{const, not dependent on } \phi(x)} + \int_{\mathcal{X}_1} \Big[ L(1 \mid 0) \int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta - L(0 \mid 1) \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta \Big]\,dx$$
implying that $\mathcal{X}_1$ should be chosen as
$$\mathcal{X}_1 = \Big\{x : L(1 \mid 0) \int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta - L(0 \mid 1) \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta < 0\Big\}$$
which, as expected, is the same as (2), since minimizing the posterior expected loss for every $x$ minimizes the preposterior risk, as shown earlier in handout # 4.

0-1 loss: for the 0-1 loss, i.e. $L(1 \mid 0) = L(0 \mid 1) = 1$, the preposterior (Bayes) risk for rule $\phi(x)$ is
$$E_{x,\theta}[\text{loss}] = \int_{\mathcal{X}_1} \int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta\,dx + \int_{\mathcal{X}_0} \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta\,dx \tag{7}$$
which is simply the average error probability, with averaging performed over the joint probability density or mass function (pdf/pmf) of the data $x$ and parameters $\theta$.

## Bayesian Decision-theoretic Detection for Simple Hypotheses

The Bayes decision rule for simple hypotheses is (6):
$$\underbrace{\Lambda(x)}_{\text{likelihood ratio}} = \frac{p(x \mid \theta_1)}{p(x \mid \theta_0)} \overset{H_1}{\gtrless} \frac{\pi_0\,L(1 \mid 0)}{\pi_1\,L(0 \mid 1)} = \tau \tag{8}$$
see also Ch. 3.7 in Kay-II. [Recall that $\Lambda(x)$ is the sufficient statistic for the detection problem; see p. 37 in handout # 1.] Equivalently,
$$\log \Lambda(x) = \log[p(x \mid \theta_1)] - \log[p(x \mid \theta_0)] \overset{H_1}{\gtrless} \log \tau.$$

Minimum average error probability detection: in the familiar 0-1 loss case where $L(1 \mid 0) = L(0 \mid 1) = 1$, we know that the preposterior (Bayes) risk is equal to the average error probability, see (7). This average error probability greatly

simplifies in the simple hypothesis-testing case:
$$\text{av. error probability} = \int_{\mathcal{X}_1} \underbrace{L(1 \mid 0)}_{1}\,p(x \mid \theta_0)\,\pi_0\,dx + \int_{\mathcal{X}_0} \underbrace{L(0 \mid 1)}_{1}\,p(x \mid \theta_1)\,\pi_1\,dx = \pi_0 \underbrace{\int_{\mathcal{X}_1} p(x \mid \theta_0)\,dx}_{P_{\mathrm{FA}}} + \pi_1 \underbrace{\int_{\mathcal{X}_0} p(x \mid \theta_1)\,dx}_{P_M}$$
where, as before, the averaging is performed over the joint pdf/pmf of the data $x$ and parameters $\theta$, and $\pi_0 = \pi(\theta_0)$, $\pi_1 = \pi(\theta_1) = 1 - \pi_0$ (the Bernoulli pmf). In this case, our Bayes decision rule simplifies to the MAP rule [as expected; see (5) and Ch. 3.6 in Kay-II]:
$$\underbrace{\frac{p(\theta_1 \mid x)}{p(\theta_0 \mid x)}}_{\text{posterior-odds ratio}} \overset{H_1}{\gtrless} 1 \tag{9}$$
or, equivalently, upon applying the Bayes rule:
$$\underbrace{\frac{p(x \mid \theta_1)}{p(x \mid \theta_0)}}_{\text{likelihood ratio}} \overset{H_1}{\gtrless} \frac{\pi_0}{\pi_1} \tag{10}$$

which is the same as (4) upon substituting the Bernoulli pmf as the prior pmf for $\theta$, and the same as (8) upon substituting $L(1 \mid 0) = L(0 \mid 1) = 1$.

## Bayesian Decision-theoretic Detection Theory: Handling Nuisance Parameters

We apply the same approach as before: integrate the nuisance parameters ($\varphi$, say) out! Therefore, (1) still holds for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_1$, but $p_{\theta \mid x}(\theta \mid x)$ is computed as follows:
$$p_{\theta \mid x}(\theta \mid x) = \int p_{\theta,\varphi \mid x}(\theta, \varphi \mid x)\,d\varphi$$
and, therefore,
$$\frac{\int_{\Theta_1} \overbrace{\int p_{\theta,\varphi \mid x}(\theta, \varphi \mid x)\,d\varphi}^{p(\theta \mid x)}\,d\theta}{\int_{\Theta_0} \underbrace{\int p_{\theta,\varphi \mid x}(\theta, \varphi \mid x)\,d\varphi}_{p(\theta \mid x)}\,d\theta} \overset{H_1}{\gtrless} \frac{L(1 \mid 0)}{L(0 \mid 1)}. \tag{11}$$

or, equivalently, upon applying the Bayes rule:
$$\frac{\int_{\Theta_1} \int p_{x \mid \theta,\varphi}(x \mid \theta, \varphi)\,\pi_{\theta,\varphi}(\theta, \varphi)\,d\varphi\,d\theta}{\int_{\Theta_0} \int p_{x \mid \theta,\varphi}(x \mid \theta, \varphi)\,\pi_{\theta,\varphi}(\theta, \varphi)\,d\varphi\,d\theta} \overset{H_1}{\gtrless} \frac{L(1 \mid 0)}{L(0 \mid 1)}. \tag{12}$$
Now, if $\theta$ and $\varphi$ are independent a priori, i.e.
$$\pi_{\theta,\varphi}(\theta, \varphi) = \pi_\theta(\theta)\,\pi_\varphi(\varphi) \tag{13}$$
then (12) can be rewritten as
$$\frac{\int_{\Theta_1} \pi_\theta(\theta) \overbrace{\int p_{x \mid \theta,\varphi}(x \mid \theta, \varphi)\,\pi_\varphi(\varphi)\,d\varphi}^{p(x \mid \theta)}\,d\theta}{\int_{\Theta_0} \pi_\theta(\theta) \underbrace{\int p_{x \mid \theta,\varphi}(x \mid \theta, \varphi)\,\pi_\varphi(\varphi)\,d\varphi}_{p(x \mid \theta)}\,d\theta} \overset{H_1}{\gtrless} \frac{L(1 \mid 0)}{L(0 \mid 1)}. \tag{14}$$
Simple hypotheses and independent priors for $\theta$ and $\varphi$: let us specialize (11) to simple hypotheses ($\Theta_0 = \{\theta_0\}$, $\Theta_1 = \{\theta_1\}$):
$$\frac{\int p_{\theta,\varphi \mid x}(\theta_1, \varphi \mid x)\,d\varphi}{\int p_{\theta,\varphi \mid x}(\theta_0, \varphi \mid x)\,d\varphi} \overset{H_1}{\gtrless} \frac{L(1 \mid 0)}{L(0 \mid 1)}. \tag{15}$$
Now, if $\theta$ and $\varphi$ are independent a priori, i.e. (13) holds, then

we can rewrite (14) [or (15), using the Bayes rule] as
$$\underbrace{\frac{\int p_{x \mid \theta,\varphi}(x \mid \theta_1, \varphi)\,\pi_\varphi(\varphi)\,d\varphi}{\int p_{x \mid \theta,\varphi}(x \mid \theta_0, \varphi)\,\pi_\varphi(\varphi)\,d\varphi}}_{\text{integrated likelihood ratio}} = \overbrace{\frac{p(x \mid \theta_1)}{p(x \mid \theta_0)}}^{\text{same as (6)}} \overset{H_1}{\gtrless} \frac{\pi_0\,L(1 \mid 0)}{\pi_1\,L(0 \mid 1)} \tag{16}$$
where $\pi_0 = \pi_\theta(\theta_0)$ and $\pi_1 = \pi_\theta(\theta_1) = 1 - \pi_0$.

## Chernoff Bound on Average Error Probability for Simple Hypotheses

Recall that minimizing the average error probability
$$\text{av. error probability} = \int_{\mathcal{X}_1} \int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta\,dx + \int_{\mathcal{X}_0} \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta\,dx$$
leads to the MAP decision rule:
$$\mathcal{X}_1 = \Big\{x : \int_{\Theta_0} p(x \mid \theta)\,\pi(\theta)\,d\theta - \int_{\Theta_1} p(x \mid \theta)\,\pi(\theta)\,d\theta < 0\Big\}.$$
In many applications, we may not be able to obtain a simple closed-form expression for the minimum average error

probability, but we can bound it as follows:
$$\text{min av. error probability} = \int_{\mathcal{X}_1} \int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta\,dx + \int_{\mathcal{X}_0} \int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta\,dx$$
$$\overset{\text{using the def. of } \mathcal{X}_1}{=} \int_{\mathcal{X}} \min\Big\{\underbrace{\int_{\Theta_0} p(x \mid \theta)\pi(\theta)\,d\theta}_{= q_0(x)},\ \underbrace{\int_{\Theta_1} p(x \mid \theta)\pi(\theta)\,d\theta}_{= q_1(x)}\Big\}\,dx \le \int_{\mathcal{X}} [q_0(x)]^\lambda\,[q_1(x)]^{1-\lambda}\,dx$$
which is the Chernoff bound on the minimum average error probability (here $\mathcal{X}$ denotes the entire data space). We have used the fact that $\min\{a, b\} \le a^\lambda\,b^{1-\lambda}$ for $0 \le \lambda \le 1$ and $a, b \ge 0$. When
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$

22 with x 1, x 2,... x N conditionally independent, identically distributed (i.i.d.) given θ and simple hypotheses (i.e. Θ 0 = {θ 0 }, Θ 1 = {θ 1 }) then N q 0 (x) = p(x θ 0 ) π 0 = π 0 p(x n θ 0 ) yielding n=1 N q 1 (x) = p(x θ 1 ) π 1 = π 1 p(x n θ 1 ) n=1 Chernoff bound for N conditionally i.i.d. measurements (given θ) and simple hyp. [ N ] λ N 1 λ = π 0 p(x n θ 0 ) [π 1 p(x n θ 1 )] dx n=1 = π λ 0 π 1 λ 1 = π λ 0 π 1 λ 1 N n=1 n=1 { [p(x n θ 0 )] λ [p(x n θ 1 )] 1 λ dx n } { [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1 } N EE 527, Detection and Estimation Theory, # 5 22

or, in other words,
$$\frac{1}{N} \log(\text{min av. error probability}) \le \frac{1}{N}\log(\pi_0^\lambda\,\pi_1^{1-\lambda}) + \log \int [p(x_1 \mid \theta_0)]^\lambda\,[p(x_1 \mid \theta_1)]^{1-\lambda}\,dx_1, \quad \lambda \in [0, 1].$$
If $\pi_0 = \pi_1 = 1/2$ (which is almost always the case of interest when evaluating average error probabilities), we can say that, as $N \to \infty$,
$$\text{min av. error probability} \approx f(N)\,\exp\Big(N \min_{\lambda \in [0,1]} \log \underbrace{\int [p(x_1 \mid \theta_0)]^\lambda\,[p(x_1 \mid \theta_1)]^{1-\lambda}\,dx_1}_{}\Big)$$
where $-\min_{\lambda \in [0,1]} \log \int [p(x_1 \mid \theta_0)]^\lambda\,[p(x_1 \mid \theta_1)]^{1-\lambda}\,dx_1$ is the Chernoff information for a single observation, and $f(N)$ is a slowly-varying function compared with the exponential term:
$$\lim_{N \to \infty} \frac{\log f(N)}{N} = 0.$$
Note that the Chernoff information in the exponent of the above expression quantifies the asymptotic behavior of the minimum average error probability. We now give a useful result, taken from

K. Fukunaga, *Introduction to Statistical Pattern Recognition*, 2nd ed., San Diego, CA: Academic Press, 1990, for evaluating a class of Chernoff bounds.

Lemma 1. Consider $p_1(x) = \mathcal{N}(\mu_1, \Sigma_1)$ and $p_2(x) = \mathcal{N}(\mu_2, \Sigma_2)$. Then
$$\int [p_1(x)]^\lambda\,[p_2(x)]^{1-\lambda}\,dx = \exp[-g(\lambda)]$$
where
$$g(\lambda) = \frac{\lambda(1-\lambda)}{2}\,(\mu_2 - \mu_1)^T [\lambda \Sigma_1 + (1-\lambda)\Sigma_2]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \log \frac{|\lambda \Sigma_1 + (1-\lambda)\Sigma_2|}{|\Sigma_1|^\lambda\,|\Sigma_2|^{1-\lambda}}.$$
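Lemma 1 is easy to sanity-check numerically in the scalar case (an illustrative sketch, not part of the notes): code $g(\lambda)$ for 1-D Gaussians, compare $\exp[-g(\lambda)]$ against a direct Riemann-sum evaluation of the integral, and maximize $g(\lambda)$ over a grid to obtain the Chernoff information:

```python
import math

def gauss_pdf(x, mu, var):
    """1-D Gaussian pdf N(mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def g_scalar(lam, mu1, mu2, var1, var2):
    """Scalar specialization of Lemma 1's g(lambda)."""
    vbar = lam * var1 + (1.0 - lam) * var2
    quad = 0.5 * lam * (1.0 - lam) * (mu2 - mu1) ** 2 / vbar
    logdet = 0.5 * math.log(vbar / (var1 ** lam * var2 ** (1.0 - lam)))
    return quad + logdet

def chernoff_integral(lam, mu1, mu2, var1, var2, dx=0.001):
    """Midpoint Riemann-sum evaluation of the integral in Lemma 1."""
    lo, hi = min(mu1, mu2) - 10.0, max(mu1, mu2) + 10.0
    n = int((hi - lo) / dx)
    return sum(
        gauss_pdf(lo + (i + 0.5) * dx, mu1, var1) ** lam
        * gauss_pdf(lo + (i + 0.5) * dx, mu2, var2) ** (1.0 - lam)
        for i in range(n)
    ) * dx

# Chernoff information = max over lambda of g(lambda); for equal variances the
# maximum sits at lambda = 1/2 and equals (mu2 - mu1)^2 / (8 * var).
chernoff_info = max(g_scalar(k / 20.0, 0.0, 2.0, 1.0, 1.0) for k in range(1, 20))
```

For $\mu_2 - \mu_1 = 2$ and unit variances this gives Chernoff information $4/8 = 0.5$, matching the $A^2/(8\sigma^2)$ value used later in the DC-level example.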

## Probabilities of False Alarm ($P_{\mathrm{FA}}$) and Detection ($P_D$) for Simple Hypotheses

[Figure: pdfs of the test statistic under $H_0$ and $H_1$, with the threshold $\tau$ marking the tail areas $P_{\mathrm{FA}}$ and $P_D$.]

Comments:
$$P_{\mathrm{FA}} = P[\underbrace{\text{test statistic} > \tau}_{x \in \mathcal{X}_1} \mid \theta = \theta_0], \qquad P_D = P[\underbrace{\text{test statistic} > \tau}_{x \in \mathcal{X}_1} \mid \theta = \theta_1].$$
(i) As the region $\mathcal{X}_1$ shrinks (i.e. $\tau \to \infty$), both of the above probabilities shrink towards zero.

(ii) As the region $\mathcal{X}_1$ grows (i.e. $\tau \to 0$), both of these probabilities grow towards unity. (iii) Observations (i) and (ii) do not imply equality between $P_{\mathrm{FA}}$ and $P_D$; in most cases, as $\mathcal{X}_1$ grows, $P_D$ grows more rapidly than $P_{\mathrm{FA}}$ (i.e. we had better be right more often than we are wrong). (iv) However, the perfect case where our rule is always right and never wrong ($P_D = 1$ and $P_{\mathrm{FA}} = 0$) cannot occur when the conditional pdfs/pmfs $p(x \mid \theta_0)$ and $p(x \mid \theta_1)$ overlap. (v) Thus, to increase the detection probability $P_D$, we must also allow the false-alarm probability $P_{\mathrm{FA}}$ to increase. This behavior represents the fundamental tradeoff in hypothesis testing and detection theory, and motivates us to introduce a (classical) approach to testing simple hypotheses, pioneered by Neyman and Pearson (to be discussed next).

## Neyman-Pearson Test for Simple Hypotheses

Setup:

- parametric data models $p(x\,;\theta_0)$, $p(x\,;\theta_1)$,
- simple hypothesis testing: $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$,
- no prior pdf/pmf on $\theta$ is available.

Goal: design a test that maximizes the probability of detection $P_D = P[X \in \mathcal{X}_1\,;\theta = \theta_1]$ (equivalently, minimizes the miss probability $P_M$) under the false-alarm constraint
$$P_{\mathrm{FA}} = P[X \in \mathcal{X}_1\,;\theta = \theta_0] \le \alpha.$$
Here, we consider simple hypotheses; the classical version of testing composite hypotheses is much more complicated. The Bayesian version of testing composite hypotheses is trivial (as

is everything else Bayesian, at least conceptually) and we have already seen it.

Solution. We apply the Lagrange-multiplier approach: maximize
$$\mathcal{L} = P_D - \lambda\,(P_{\mathrm{FA}} - \alpha) = \int_{\mathcal{X}_1} p(x\,;\theta_1)\,dx - \lambda \Big[\int_{\mathcal{X}_1} p(x\,;\theta_0)\,dx - \alpha\Big] = \int_{\mathcal{X}_1} [p(x\,;\theta_1) - \lambda\,p(x\,;\theta_0)]\,dx + \lambda\,\alpha.$$
To maximize $\mathcal{L}$, set
$$\mathcal{X}_1 = \{x : p(x\,;\theta_1) - \lambda\,p(x\,;\theta_0) > 0\} = \Big\{x : \frac{p(x\,;\theta_1)}{p(x\,;\theta_0)} > \lambda\Big\}.$$
Again, we find the likelihood ratio:
$$\Lambda(x) = \frac{p(x\,;\theta_1)}{p(x\,;\theta_0)}.$$
Recall our constraint:
$$\int_{\mathcal{X}_1} p(x\,;\theta_0)\,dx = P_{\mathrm{FA}} \le \alpha.$$

If we increase $\lambda$, $P_{\mathrm{FA}}$ and $P_D$ go down; similarly, if we decrease $\lambda$, $P_{\mathrm{FA}}$ and $P_D$ go up. Hence, to maximize $P_D$, choose $\lambda$ so that $P_{\mathrm{FA}}$ is as big as possible under the constraint. Two useful ways of determining the threshold that achieves a specified false-alarm rate: find $\lambda$ that satisfies
$$\int_{\{x : \Lambda(x) > \lambda\}} p(x\,;\theta_0)\,dx = P_{\mathrm{FA}} = \alpha$$
or, expressing this in terms of the pdf/pmf of $\Lambda(x)$ under $H_0$,
$$\int_\lambda^\infty p_{\Lambda;\theta_0}(l\,;\theta_0)\,dl = \alpha$$
or, perhaps, in terms of a monotonic function of $\Lambda(x)$, say $T(x) = \text{monotonic function}(\Lambda(x))$.

Warning: we have been implicitly assuming that $P_{\mathrm{FA}}$ is a continuous function of $\lambda$. Some (not insightful) technical adjustments are needed if this is not the case.

A way of handling nuisance parameters: we can utilize the integrated (marginal) likelihood ratio (16) under the Neyman-Pearson setup as well.

## Chernoff-Stein Lemma for Bounding the Miss Probability in Neyman-Pearson Tests of Simple Hypotheses

Recall the definition of the Kullback-Leibler (K-L) distance $D(p \,\|\, q)$ from one pmf ($p$) to another ($q$):
$$D(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k}.$$
The complete proof of this lemma for the discrete (pmf) case is given in the additional reading: T.M. Cover and J.A. Thomas, *Elements of Information Theory*, 2nd ed., New York: Wiley.

## Setup for the Chernoff-Stein Lemma

Assume that $x_1, x_2, \ldots, x_N$ are conditionally i.i.d. given $\theta$. We adopt the Neyman-Pearson framework, i.e. obtain a decision threshold to achieve a fixed $P_{\mathrm{FA}}$. Let us study the asymptotic $P_M = 1 - P_D$ as the number of observations $N$ gets large. To keep $P_{\mathrm{FA}}$ constant as $N$ increases, we need to make our decision threshold ($\gamma$, say) vary with $N$, i.e.
$$\gamma = \gamma_N(P_{\mathrm{FA}}).$$
Now, the miss probability is
$$P_M = P_M(\gamma) = P_M\big(\gamma_N(P_{\mathrm{FA}})\big).$$

## Chernoff-Stein Lemma

The Chernoff-Stein lemma says:
$$\lim_{P_{\mathrm{FA}} \to 0}\ \lim_{N \to \infty} \frac{1}{N} \log P_M = -\underbrace{D\big(p(x_n \mid \theta_0)\,\big\|\,p(x_n \mid \theta_1)\big)}_{\text{K-L distance for a single observation}}$$
where the K-L distance between $p(x_n \mid \theta_0)$ and $p(x_n \mid \theta_1)$,
$$D\big(p(x_n \mid \theta_0)\,\big\|\,p(x_n \mid \theta_1)\big) \overset{\text{discrete (pmf) case}}{=} \sum_{x_n} p(x_n \mid \theta_0) \log \frac{p(x_n \mid \theta_0)}{p(x_n \mid \theta_1)} = E_{p(x_n \mid \theta_0)}\Big[\log \frac{p(x_n \mid \theta_0)}{p(x_n \mid \theta_1)}\Big],$$
does not depend on the observation index $n$, since the $x_n$ are conditionally i.i.d. given $\theta$. Equivalently, we can state that
$$P_M \overset{N \to \infty}{\approx} f(N)\,\exp\Big[-N\,D\big(p(x_n \mid \theta_0)\,\big\|\,p(x_n \mid \theta_1)\big)\Big]$$
as $P_{\mathrm{FA}} \to 0$ and $N \to \infty$, where $f(N)$ is a slowly-varying function compared with the exponential term (when $P_{\mathrm{FA}} \to 0$ and $N \to \infty$).
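A quick sketch of the discrete K-L distance appearing in the lemma (the example pmfs are illustrative, not from the notes); note the asymmetry, which is why the direction $D(p_0 \,\|\, p_1)$ matters for the miss-probability exponent:

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_k p_k * log(p_k / q_k); terms with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

# Illustrative pmfs on a two-point alphabet.
p = [0.5, 0.5]
q = [0.9, 0.1]
```

Here $D(p \,\|\, q) = 0.5\log(25/9) \approx 0.511$, while $D(q \,\|\, p) \approx 0.368$: the two directions generally differ.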

## Detection for Simple Hypotheses: Example

Known positive DC level in additive white Gaussian noise (AWGN), Example 3.2 in Kay-II. Consider
$$H_0: x[n] = w[n], \quad n = 1, 2, \ldots, N$$
versus
$$H_1: x[n] = A + w[n], \quad n = 1, 2, \ldots, N$$
where

- $A > 0$ is a known constant,
- $w[n]$ is zero-mean white Gaussian noise with known variance $\sigma^2$, i.e. $w[n] \sim \mathcal{N}(0, \sigma^2)$.

The above hypothesis-testing formulation is EE-like: noise versus signal plus noise. It is similar to the on-off keying scheme in communications, which gives us an idea to rephrase it so that it fits our formulation on p. 4 (for which we have developed all the theory so far). Here is such an

alternative formulation: consider a family of pdfs
$$p(x\,;a) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\Big[-\frac{1}{2\sigma^2} \sum_{n=1}^N (x[n] - a)^2\Big] \tag{17}$$
and the following (equivalent) hypotheses:
$$H_0: a = 0 \ \text{(off)} \quad \text{versus} \quad H_1: a = A \ \text{(on)}.$$
Then, the likelihood ratio is
$$\Lambda(x) = \frac{p(x\,;a = A)}{p(x\,;a = 0)} = \frac{(2\pi\sigma^2)^{-N/2} \exp\big[-\frac{1}{2\sigma^2} \sum_{n=1}^N (x[n] - A)^2\big]}{(2\pi\sigma^2)^{-N/2} \exp\big(-\frac{1}{2\sigma^2} \sum_{n=1}^N x[n]^2\big)}.$$
Now, take the logarithm and, after simple manipulations, reduce our likelihood-ratio test to comparing
$$T(x) = \overline{x} = \frac{1}{N} \sum_{n=1}^N x[n]$$
with a threshold $\gamma$. [Here, $T(x)$ is a monotonic function of $\Lambda(x)$.] If $T(x) > \gamma$, accept $H_1$ (i.e. reject $H_0$); otherwise, accept

$H_0$ (well, not exactly; we will talk more about this decision on p. 59). The choice of $\gamma$ depends on the approach that we take:

- for the Bayes decision rule, $\gamma$ is a function of $\pi_0$ and $\pi_1$;
- for the Neyman-Pearson test, $\gamma$ is chosen to achieve (control) a desired $P_{\mathrm{FA}}$.

Bayesian decision-theoretic detection for 0-1 loss (corresponding to minimizing the average error

probability):
$$\log \Lambda(x) = -\frac{1}{2\sigma^2} \Big[\sum_{n=1}^N (x[n] - A)^2 - \sum_{n=1}^N x[n]^2\Big] = \frac{1}{2\sigma^2} \Big(2A \sum_{n=1}^N x[n] - N A^2\Big) \overset{H_1}{\gtrless} \log\Big(\frac{\pi_0}{\pi_1}\Big)$$
i.e.
$$\frac{A}{\sigma^2} \sum_{n=1}^N x[n] - \frac{N A^2}{2\sigma^2} \overset{H_1}{\gtrless} \log\Big(\frac{\pi_0}{\pi_1}\Big)$$
and, since $A > 0$, finally
$$\overline{x} \overset{H_1}{\gtrless} \underbrace{\frac{\sigma^2}{N A} \log(\pi_0/\pi_1) + \frac{A}{2}}_{\gamma}$$
which, for the practically most interesting case of equiprobable hypotheses
$$\pi_0 = \pi_1 = \tfrac{1}{2} \tag{18}$$
simplifies to
$$\overline{x} \overset{H_1}{\gtrless} \underbrace{\frac{A}{2}}_{\gamma}$$
known as the maximum-likelihood test (i.e. the Bayes decision rule for 0-1 loss and a priori equiprobable hypotheses is defined as the maximum-likelihood test). This maximum-likelihood detector does not require knowledge of the noise variance

$\sigma^2$ to declare its decision. However, knowledge of $\sigma^2$ is key to assessing the detection performance. Interestingly, these observations will carry over to a few maximum-likelihood tests that we will derive in the future. Assuming (18), we now derive the minimum average error probability. First, note that
$$\overline{X} \mid a = 0 \sim \mathcal{N}(0, \sigma^2/N), \qquad \overline{X} \mid a = A \sim \mathcal{N}(A, \sigma^2/N).$$
Then
$$\text{min av. error prob.} = \tfrac{1}{2} \underbrace{P[\overline{X} > \tfrac{A}{2} \mid a = 0]}_{P_{\mathrm{FA}}} + \tfrac{1}{2} \underbrace{P[\overline{X} < \tfrac{A}{2} \mid a = A]}_{P_M}$$
$$= \tfrac{1}{2}\,P\Big[\underbrace{\frac{\overline{X}}{\sqrt{\sigma^2/N}}}_{\text{standard normal}} > \frac{A/2}{\sqrt{\sigma^2/N}}\,;\,a = 0\Big] + \tfrac{1}{2}\,P\Big[\underbrace{\frac{\overline{X} - A}{\sqrt{\sigma^2/N}}}_{\text{standard normal}} < \frac{A/2 - A}{\sqrt{\sigma^2/N}}\,;\,a = A\Big]$$
$$= \tfrac{1}{2} \underbrace{Q\Big(\sqrt{\frac{N A^2}{4\sigma^2}}\Big)}_{\text{standard normal complementary cdf}} + \tfrac{1}{2} \underbrace{\Phi\Big(-\sqrt{\frac{N A^2}{4\sigma^2}}\Big)}_{\text{standard normal cdf}} = Q\Big(\tfrac{1}{2}\sqrt{\frac{N A^2}{\sigma^2}}\Big)$$

since $\Phi(-x) = Q(x)$. The minimum probability of error decreases monotonically with $N A^2/\sigma^2$, which is known as the deflection coefficient. In this case, the Chernoff information (per observation) is equal to $\frac{A^2}{8\sigma^2}$ (see Lemma 1, with $p_0(x) = \mathcal{N}(0, \sigma^2)$ and $p_1(x) = \mathcal{N}(A, \sigma^2)$):
$$-\min_{\lambda \in [0,1]} \log \int [p_0(x)]^\lambda\,[p_1(x)]^{1-\lambda}\,dx = \max_{\lambda \in [0,1]} \frac{\lambda(1-\lambda)}{2} \cdot \frac{A^2}{\sigma^2} = \frac{A^2}{8\sigma^2}$$
(the maximum is attained at $\lambda = 1/2$) and, therefore, we expect the following asymptotic behavior:
$$\text{min av. error probability} \overset{N \to \infty}{\approx} f(N)\,\exp\Big(-N\,\frac{A^2}{8\sigma^2}\Big) \tag{19}$$
where $f(N)$ varies slowly with $N$ when $N$ is large, compared with the exponential term: $\lim_{N \to \infty} \frac{\log f(N)}{N} = 0$. Indeed, in our example,
$$Q\Big(\sqrt{\frac{N A^2}{4\sigma^2}}\Big) \overset{N \to \infty}{\approx} \underbrace{\frac{1}{\sqrt{2\pi}\,\sqrt{N A^2/(4\sigma^2)}}}_{f(N)}\,\exp\Big(-N\,\frac{A^2}{8\sigma^2}\Big)$$

where we have used the asymptotic formula
$$Q(x) \overset{x \to \infty}{\approx} \frac{1}{x\sqrt{2\pi}}\,\exp(-x^2/2) \tag{20}$$
given e.g. in equation (2.4) of Kay-II.
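The closed-form minimum average error probability $Q\big(\tfrac{1}{2}\sqrt{N A^2/\sigma^2}\big)$ derived above is easy to tabulate with the standard library, since $Q(x) = \tfrac{1}{2}\mathrm{erfc}(x/\sqrt{2})$; a minimal sketch (the parameter values are illustrative):

```python
import math

def Q(x):
    """Standard normal right-tail probability Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def min_avg_error_prob(N, A, sigma2):
    """Minimum average error probability Q(0.5*sqrt(N*A^2/sigma^2)), equal priors."""
    return Q(0.5 * math.sqrt(N * A * A / sigma2))

# Error probability shrinks as the deflection coefficient N*A^2/sigma^2 grows.
errors = [min_avg_error_prob(N, 1.0, 1.0) for N in (1, 4, 16, 64)]
```

For $N = 4$, $A = \sigma = 1$ this is $Q(1) \approx 0.1587$, and the sequence decreases monotonically in $N$, consistent with the exponential decay in (19).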

## Known DC Level in AWGN: Neyman-Pearson Approach

We now derive the Neyman-Pearson test for detecting a known DC level in AWGN. Recall that our likelihood-ratio test compares
$$T(x) = \overline{x} = \frac{1}{N} \sum_{n=1}^N x[n]$$
with a threshold $\gamma$. If $T(x) > \gamma$, decide $H_1$ (i.e. reject $H_0$); otherwise, decide $H_0$ (see also the discussion on p. 59).

Performance evaluation: assuming (17), we have $T(X) \mid a \sim \mathcal{N}(a, \sigma^2/N)$. Therefore, $T(X) \mid a = 0 \sim \mathcal{N}(0, \sigma^2/N)$, implying
$$P_{\mathrm{FA}} = P[T(X) > \gamma\,;\,a = 0] = Q\Big(\frac{\gamma}{\sqrt{\sigma^2/N}}\Big)$$

and we obtain the decision threshold as follows:
$$\gamma = \sqrt{\frac{\sigma^2}{N}}\,Q^{-1}(P_{\mathrm{FA}}).$$
Now, $T(X) \mid a = A \sim \mathcal{N}(A, \sigma^2/N)$, implying
$$P_D = P[T(X) > \gamma \mid a = A] = Q\Big(\frac{\gamma - A}{\sqrt{\sigma^2/N}}\Big) = Q\Big(Q^{-1}(P_{\mathrm{FA}}) - \frac{A}{\sqrt{\sigma^2/N}}\Big) = Q\Big(Q^{-1}(P_{\mathrm{FA}}) - \sqrt{\underbrace{N A^2/\sigma^2}_{= \mathrm{SNR} = d^2}}\Big).$$
Given the false-alarm probability $P_{\mathrm{FA}}$, the detection probability $P_D$ depends only on the deflection coefficient
$$d^2 = \frac{N A^2}{\sigma^2} = \frac{\{E[T(X) \mid a = A] - E[T(X) \mid a = 0]\}^2}{\mathrm{var}[T(X) \mid a = 0]}$$
which is also (a reasonable definition for) the signal-to-noise ratio (SNR).
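A sketch of these Neyman-Pearson design equations, with $Q^{-1}$ obtained by bisection so that only the standard library is needed (the numerical values below are illustrative):

```python
import math

def Q(x):
    """Standard normal right-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p, lo=-10.0, hi=10.0):
    """Invert Q by bisection; valid since Q is strictly decreasing."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def np_dc_level(P_FA, N, A, sigma2):
    """Threshold gamma and detection probability P_D for the sample-mean detector."""
    gamma = math.sqrt(sigma2 / N) * Q_inv(P_FA)
    d2 = N * A * A / sigma2                # deflection coefficient (SNR)
    P_D = Q(Q_inv(P_FA) - math.sqrt(d2))
    return gamma, P_D
```

For example, $P_{\mathrm{FA}} = 0.1$, $N = 25$, $A = 1$, $\sigma^2 = 4$ gives $d^2 = 6.25$ and $P_D \approx 0.89$; increasing $N$ (hence $d^2$) drives $P_D$ toward 1 at fixed $P_{\mathrm{FA}}$.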

## Receiver Operating Characteristics (ROC)

Comments:

- $P_D = Q\big(Q^{-1}(P_{\mathrm{FA}}) - d\big)$.
- As we raise the threshold $\gamma$, $P_{\mathrm{FA}}$ goes down, but so does $P_D$.
- The ROC should be above the 45° line; otherwise we can do better by flipping a coin.
- Performance improves with $d^2$.

## Typical Ways of Depicting the Detection Performance Under the Neyman-Pearson Setup

To analyze the performance of a Neyman-Pearson detector, we examine two relationships:

- between $P_D$ and $P_{\mathrm{FA}}$, for a given SNR, called the receiver operating characteristics (ROC);
- between $P_D$ and SNR, for a given $P_{\mathrm{FA}}$.

For examples of the two, see Figs. 3.8 and 3.5 in Kay-II, respectively.

## Asymptotic (as $N \to \infty$ and $P_{\mathrm{FA}} \to 0$) $P_D$ and $P_M$ for a Known DC Level in AWGN

We apply the Chernoff-Stein lemma, for which we need to compute the following K-L distance:
$$D\big(p(x_n \mid a = 0)\,\big\|\,p(x_n \mid a = A)\big) = E_{p(x_n \mid a = 0)}\Big\{\log \Big[\frac{p(x_n \mid a = 0)}{p(x_n \mid a = A)}\Big]\Big\}$$
where $p(x_n \mid a = 0) = \mathcal{N}(0, \sigma^2)$ and
$$\log \Big[\frac{p(x_n \mid a = 0)}{p(x_n \mid a = A)}\Big] = \frac{(x_n - A)^2}{2\sigma^2} - \frac{x_n^2}{2\sigma^2} = \frac{1}{2\sigma^2}\,(A^2 - 2 A x_n).$$
Taking the expectation under $a = 0$ (where $E[x_n] = 0$) gives
$$D\big(p(x_n \mid a = 0)\,\big\|\,p(x_n \mid a = A)\big) = \frac{A^2}{2\sigma^2}$$
and the Chernoff-Stein lemma predicts the following behavior of the detection probability as $N \to \infty$ and $P_{\mathrm{FA}} \to 0$:
$$P_D \approx 1 - \underbrace{f(N)\,\exp\Big(-N\,\frac{A^2}{2\sigma^2}\Big)}_{P_M}$$

45 where f(n) is a slowly-varying function of N compared with the exponential term. In this case, the exact expression for P M (P D ) is available and consistent with the Chernoff-Stein lemma: P M = 1 Q (Q 1 (P FA ) ) NA 2 /σ 2 ) = Q( NA2 /σ 2 Q 1 (P FA ) N + = = 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [ } NA 2 /σ 2 Q 1 (P FA )] 2 /2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp NA 2 /(2σ 2 ) [Q 1 (P FA )] 2 /2 +Q 1 (P FA ) } NA 2 /σ 2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [Q 1 (P FA )] 2 /2 + Q 1 (P FA ) } NA 2 /σ 2 exp[ NA 2 /(2σ 2 )] } {{ } as predicted by the Chernoff-Stein lemma (21) where we have used the asymptotic formula (20). When P FA EE 527, Detection and Estimation Theory, # 5 45

is small and $N$ is large, the first two terms in the above expression make a slowly-varying function $f(N)$ of $N$. Note that the exponential term in (21) does not depend on the false-alarm probability $P_{\mathrm{FA}}$ (or, equivalently, on the choice of the decision threshold); the Chernoff-Stein lemma asserts that this is not a coincidence.

Comment. For detecting a known DC level in AWGN, the slope of the exponential decay of $P_M$ is $A^2/(2\sigma^2)$, which is different from (larger than, in this case) the slope of the exponential decay of the minimum average error probability, $A^2/(8\sigma^2)$.

## Decentralized Neyman-Pearson Detection for Simple Hypotheses

Consider a sensor-network scenario in which observations made at $N$ spatially distributed sensors (nodes) follow the same marginal probabilistic model:
$$H_i: x_n \sim p(x_n \mid \theta_i)$$
where $n = 1, 2, \ldots, N$ and $i \in \{0, 1\}$ for binary hypotheses. Each node $n$ makes a hard local decision $d_n$ based on its local observation $x_n$ and sends it to the headquarters (fusion center), which collects all the local decisions and makes the final global

decision about $H_0$ or $H_1$. This structure is clearly suboptimal: it is easy to construct a better decision strategy in which each node sends its (quantized, in practice) likelihood ratio to the fusion center, rather than the decision only. However, such a strategy would have a higher communication (energy) cost.

We now go back to the decentralized detection problem. Suppose that each node $n$ makes a local decision $d_n \in \{0, 1\}$, $n = 1, 2, \ldots, N$, and transmits it to the fusion center. Then, the fusion center makes the global decision based on the likelihood ratio formed from the $d_n$s. The simplest fusion scheme is based on the assumption that the $d_n$s are conditionally independent given $\theta$ (which may not always be reasonable, but leads to an easy solution). We can now write
$$p(d_n \mid \theta_1) = \underbrace{P_{D,n}^{d_n}\,(1 - P_{D,n})^{1 - d_n}}_{\text{Bernoulli pmf}}$$
where $P_{D,n}$ is the $n$th sensor's local detection probability. Similarly,
$$p(d_n \mid \theta_0) = \underbrace{P_{\mathrm{FA},n}^{d_n}\,(1 - P_{\mathrm{FA},n})^{1 - d_n}}_{\text{Bernoulli pmf}}$$
where $P_{\mathrm{FA},n}$ is the $n$th sensor's local false-alarm

probability. Now,
$$\log \Lambda(d) = \sum_{n=1}^N \log \Big[\frac{p(d_n \mid \theta_1)}{p(d_n \mid \theta_0)}\Big] = \sum_{n=1}^N \log \Big[\frac{P_{D,n}^{d_n}\,(1 - P_{D,n})^{1 - d_n}}{P_{\mathrm{FA},n}^{d_n}\,(1 - P_{\mathrm{FA},n})^{1 - d_n}}\Big] \overset{H_1}{\gtrless} \log \tau.$$
To further simplify the exposition, we assume that all sensors have identical performance: $P_{D,n} = P_D$, $P_{\mathrm{FA},n} = P_{\mathrm{FA}}$. Define the number of sensors having $d_n = 1$:
$$u_1 = \sum_{n=1}^N d_n.$$
Then, the log-likelihood ratio becomes
$$\log \Lambda(d) = u_1 \log \Big(\frac{P_D}{P_{\mathrm{FA}}}\Big) + (N - u_1) \log \Big(\frac{1 - P_D}{1 - P_{\mathrm{FA}}}\Big) \overset{H_1}{\gtrless} \log \tau$$
or
$$u_1 \log \Big[\frac{P_D\,(1 - P_{\mathrm{FA}})}{P_{\mathrm{FA}}\,(1 - P_D)}\Big] \overset{H_1}{\gtrless} \log \tau + N \log \Big(\frac{1 - P_{\mathrm{FA}}}{1 - P_D}\Big). \tag{22}$$

Clearly, each node's local decision $d_n$ is meaningful only if $P_D > P_{\mathrm{FA}}$, which implies
$$\frac{P_D\,(1 - P_{\mathrm{FA}})}{P_{\mathrm{FA}}\,(1 - P_D)} > 1$$
the logarithm of which is therefore positive, and the decision rule (22) further simplifies to
$$u_1 \overset{H_1}{\gtrless} \tau'.$$
The Neyman-Pearson performance analysis of this detector is easy: the random variable $U_1$ is binomial given $\theta$ (i.e. conditional on the hypothesis) and, therefore,
$$P[U_1 = u_1] = \binom{N}{u_1}\,p^{u_1}\,(1 - p)^{N - u_1}$$
where $p = P_{\mathrm{FA}}$ under $H_0$ and $p = P_D$ under $H_1$. Hence, the global false-alarm probability is
$$P_{\mathrm{FA,global}} = P[U_1 > \tau' \mid \theta_0] = \sum_{u_1 > \tau'} \binom{N}{u_1}\,P_{\mathrm{FA}}^{u_1}\,(1 - P_{\mathrm{FA}})^{N - u_1}.$$
Clearly, $P_{\mathrm{FA,global}}$ will be a discontinuous function of $\tau'$ and, therefore, we should choose our $P_{\mathrm{FA,global}}$ specification from

the available discrete choices. But, if none of the candidate choices is acceptable, this means that our current system does not satisfy the requirements and remedial action is needed, e.g. increasing the quantity ($N$) or improving the quality of the sensors (through changing $P_D$ and $P_{\mathrm{FA}}$), or both.
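The binomial expression for the global false-alarm probability is straightforward to evaluate; the sketch below (using the counting-rule convention $u_1 \ge \tau'$ and illustrative numbers) also exhibits the discrete set of achievable $P_{\mathrm{FA,global}}$ values:

```python
import math

def global_pfa(N, pfa_local, tau):
    """P[U1 >= tau | H0], with U1 ~ Binomial(N, pfa_local)."""
    return sum(
        math.comb(N, u) * pfa_local ** u * (1.0 - pfa_local) ** (N - u)
        for u in range(tau, N + 1)
    )

# Achievable global false-alarm levels for N = 10 sensors with local P_FA = 0.1:
levels = [global_pfa(10, 0.1, tau) for tau in range(0, 12)]
```

The `levels` list steps down from 1 (always declare $H_1$) to 0 (never declare $H_1$); a specified $P_{\mathrm{FA,global}}$ must be picked from these discrete values.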

## Testing Multiple Hypotheses

Suppose now that we choose $\Theta_0, \Theta_1, \ldots, \Theta_{M-1}$ that form a partition of the parameter space $\Theta$:
$$\Theta_0 \cup \Theta_1 \cup \cdots \cup \Theta_{M-1} = \Theta, \qquad \Theta_i \cap \Theta_j = \emptyset \ \text{ for } i \ne j.$$
We wish to distinguish among $M > 2$ hypotheses, i.e. identify which hypothesis is true:
$$H_0: \theta \in \Theta_0 \quad \text{versus} \quad H_1: \theta \in \Theta_1 \quad \text{versus} \quad \cdots \quad \text{versus} \quad H_{M-1}: \theta \in \Theta_{M-1}$$
and, consequently, our action space consists of $M$ choices. We design a decision rule $\phi: \mathcal{X} \to \{0, 1, \ldots, M-1\}$:
$$\phi(x) = \begin{cases} 0, & \text{decide } H_0,\\ 1, & \text{decide } H_1,\\ \ \vdots\\ M-1, & \text{decide } H_{M-1} \end{cases}$$
where $\phi$ partitions the data space $\mathcal{X}$ [i.e. the support of $p(x \mid \theta)$] into $M$ regions:
$$\mathcal{X}_0 = \{x : \phi(x) = 0\}, \ \ldots, \ \mathcal{X}_{M-1} = \{x : \phi(x) = M-1\}.$$

We specify the loss function using $L(i \mid m)$ where, typically, the losses due to correct decisions are set to zero: $L(i \mid i) = 0$, $i = 0, 1, \ldots, M-1$. Here, we adopt zero losses for correct decisions. Now, our posterior expected loss takes $M$ values:
$$\rho_m(x) = \sum_{i=0}^{M-1} \int_{\Theta_i} L(m \mid i)\,p(\theta \mid x)\,d\theta = \sum_{i=0}^{M-1} L(m \mid i) \int_{\Theta_i} p(\theta \mid x)\,d\theta, \qquad m = 0, 1, \ldots, M-1.$$
Then, the Bayes decision rule $\phi$ is defined via the following data-space partitioning:
$$\mathcal{X}_m = \Big\{x : \rho_m(x) = \min_{0 \le l \le M-1} \rho_l(x)\Big\}, \qquad m = 0, 1, \ldots, M-1$$
or, equivalently, upon applying the Bayes rule,
$$\mathcal{X}_m = \Big\{x : m = \arg\min_{0 \le l \le M-1} \sum_{i=0}^{M-1} L(l \mid i) \int_{\Theta_i} p(x \mid \theta)\,\pi(\theta)\,d\theta\Big\}.$$
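Under 0-1 loss, the rule above reduces to picking the hypothesis with the largest posterior weight $\pi_m\,p(x \mid \theta_m)$, i.e. an $M$-ary MAP rule. A minimal sketch for $M = 3$ hypothetical simple hypotheses with unit-variance Gaussian likelihoods at means 0, 2, 4 (illustrative choices, not from the notes):

```python
import math

def gauss_pdf(x, mu, sigma=1.0):
    """1-D Gaussian pdf N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def map_decide(x, means, priors):
    """M-ary MAP rule under 0-1 loss: maximize pi_m * p(x | theta_m)."""
    scores = [pi * gauss_pdf(x, mu) for mu, pi in zip(means, priors)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal priors, the decision boundaries fall at the midpoints (here 1 and 3); skewing the prior toward one hypothesis shifts the boundaries away from it, just as in the binary case.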

The preposterior (Bayes) risk for rule $\phi(x)$ is
$$E_{x,\theta}[\text{loss}] = \sum_{m=0}^{M-1} \int_{\mathcal{X}_m} \sum_{i=0}^{M-1} L(m \mid i) \int_{\Theta_i} p(x \mid \theta)\pi(\theta)\,d\theta\,dx = \sum_{m=0}^{M-1} \int_{\mathcal{X}_m} \underbrace{\sum_{i=0}^{M-1} L(m \mid i) \int_{\Theta_i} p(x \mid \theta)\pi(\theta)\,d\theta}_{h_m(x)}\,dx.$$
Then, for an arbitrary partition $\{\mathcal{X}'_m\}$ and the Bayes partition $\{\mathcal{X}_m\}$ (which assigns each $x$ to a region with the smallest $h_m(x)$),
$$\sum_{m=0}^{M-1} \int_{\mathcal{X}'_m} h_m(x)\,dx - \sum_{m=0}^{M-1} \int_{\mathcal{X}_m} h_m(x)\,dx \ge 0$$
which verifies that the Bayes decision rule $\phi$ minimizes the preposterior (Bayes) risk.


MSc. Finance & CLEFIN A.A. 3/ ep Course in Statistics of. Massimo Guidolin A Few Review Questions and oblems concerning, Hypothesis Testing and Confidence Intervals SUGGESTION: try to approach the questions

### A crash course in probability and Naïve Bayes classification

Probability theory A crash course in probability and Naïve Bayes classification Chapter 9 Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s

### Multiple Testing. Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Abstract

Multiple Testing Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf Abstract Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about

### IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341 Multinode Cooperative Communications in Wireless Networks Ahmed K. Sadek, Student Member, IEEE, Weifeng Su, Member, IEEE, and K.

### Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0

Probability & Statistics Primer Gregory J. Hakim University of Washington 2 January 2009 v2.0 This primer provides an overview of basic concepts and definitions in probability and statistics. We shall

### Lecture 9: Bayesian hypothesis testing

Lecture 9: Bayesian hypothesis testing 5 November 27 In this lecture we ll learn about Bayesian hypothesis testing. 1 Introduction to Bayesian hypothesis testing Before we go into the details of Bayesian

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

### [Chapter 10. Hypothesis Testing]

[Chapter 10. Hypothesis Testing] 10.1 Introduction 10.2 Elements of a Statistical Test 10.3 Common Large-Sample Tests 10.4 Calculating Type II Error Probabilities and Finding the Sample Size for Z Tests

### Principle of Data Reduction

Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then

### Introduction to Probability

Introduction to Probability EE 179, Lecture 15, Handout #24 Probability theory gives a mathematical characterization for experiments with random outcomes. coin toss life of lightbulb binary data sequence

### 33. STATISTICS. 33. Statistics 1

33. STATISTICS 33. Statistics 1 Revised September 2011 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in high-energy physics. In statistics, we are interested in using a

### MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Introduction to Machine Learning

Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 5: Decision Theory & ROC Curves Gaussian ML Estimation Many figures courtesy Kevin Murphy s textbook,

### 3.6: General Hypothesis Tests

3.6: General Hypothesis Tests The χ 2 goodness of fit tests which we introduced in the previous section were an example of a hypothesis test. In this section we now consider hypothesis tests more generally.

### Optimal Stopping in Software Testing

Optimal Stopping in Software Testing Nilgun Morali, 1 Refik Soyer 2 1 Department of Statistics, Dokuz Eylal Universitesi, Turkey 2 Department of Management Science, The George Washington University, 2115

### . (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.

Chapter 3 Kolmogorov-Smirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric

### 1. R In this and the next section we are going to study the properties of sequences of real numbers.

+a 1. R In this and the next section we are going to study the properties of sequences of real numbers. Definition 1.1. (Sequence) A sequence is a function with domain N. Example 1.2. A sequence of real

### Bayesian probability theory

Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations

### Sufficient Statistics and Exponential Family. 1 Statistics and Sufficient Statistics. Math 541: Statistical Theory II. Lecturer: Songfeng Zheng

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Sufficient Statistics and Exponential Family 1 Statistics and Sufficient Statistics Suppose we have a random sample X 1,, X n taken from a distribution

### Classical Hypothesis Testing and GWAS

Department of Computer Science Brown University, Providence sorin@cs.brown.edu September 15, 2014 Outline General Principles Classical statistical hypothesis testing involves the test of a null hypothesis

### TRANSFORMATIONS OF RANDOM VARIABLES

TRANSFORMATIONS OF RANDOM VARIABLES 1. INTRODUCTION 1.1. Definition. We are often interested in the probability distributions or densities of functions of one or more random variables. Suppose we have

### Multivariate Analysis of Variance (MANOVA): I. Theory

Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

### Recursive Estimation

Recursive Estimation Raffaello D Andrea Spring 04 Problem Set : Bayes Theorem and Bayesian Tracking Last updated: March 8, 05 Notes: Notation: Unlessotherwisenoted,x, y,andz denoterandomvariables, f x

### ECEN 5682 Theory and Practice of Error Control Codes

ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.

### Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling

Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu Topics 1. Hypothesis Testing 1.

### Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

### Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

### UNIVERSITY OF MINNESOTA. Alejandro Ribeiro

UNIVERSITY OF MINNESOTA This is to certify that I have examined this copy of a Master s thesis by Alejandro Ribeiro and have found that it is complete and satisfactory in all respects, and that any and

### A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University

A Coefficient of Variation for Skewed and Heavy-Tailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### 1 Prior Probability and Posterior Probability

Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

### Lecture 2: Statistical Estimation and Testing

Bioinformatics: In-depth PROBABILITY AND STATISTICS Spring Semester 2012 Lecture 2: Statistical Estimation and Testing Stefanie Muff (stefanie.muff@ifspm.uzh.ch) 1 Problems in statistics 2 The three main

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Signal Detection C H A P T E R 14 14.1 SIGNAL DETECTION AS HYPOTHESIS TESTING

C H A P T E R 4 Signal Detection 4. SIGNAL DETECTION AS HYPOTHESIS TESTING In Chapter 3 we considered hypothesis testing in the context of random variables. The detector resulting in the minimum probability

### LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

### Comparison of Network Coding and Non-Network Coding Schemes for Multi-hop Wireless Networks

Comparison of Network Coding and Non-Network Coding Schemes for Multi-hop Wireless Networks Jia-Qi Jin, Tracey Ho California Institute of Technology Pasadena, CA Email: {jin,tho}@caltech.edu Harish Viswanathan

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### A New Interpretation of Information Rate

A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes

### WE consider a decentralized detection problem and study

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 59, NO. 4, APRIL 2011 1759 On Decentralized Detection With Partial Information Sharing Among Sensors O. Patrick Kreidl, John N. Tsitsiklis, Fellow, IEEE, and

### Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.

Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine Genon-Catalot To cite this version: Fabienne

### Stochastic Inventory Control

Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

### MATH4427 Notebook 3 Spring MATH4427 Notebook Hypotheses: Definitions and Examples... 3

MATH4427 Notebook 3 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 3 MATH4427 Notebook 3 3 3.1 Hypotheses: Definitions and Examples............................

### What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures

### Senior Secondary Australian Curriculum

Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Basics of Floating-Point Quantization

Chapter 2 Basics of Floating-Point Quantization Representation of physical quantities in terms of floating-point numbers allows one to cover a very wide dynamic range with a relatively small number of

### A direct approach to false discovery rates

J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiple-hypothesis

### On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information

Finance 400 A. Penati - G. Pennacchi Notes on On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information by Sanford Grossman This model shows how the heterogeneous information

### Statistical foundations of machine learning

Machine learning p. 1/45 Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Machine learning p. 2/45

### CPC/CPA Hybrid Bidding in a Second Price Auction

CPC/CPA Hybrid Bidding in a Second Price Auction Benjamin Edelman Hoan Soo Lee Working Paper 09-074 Copyright 2008 by Benjamin Edelman and Hoan Soo Lee Working papers are in draft form. This working paper

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### CONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,

### Distributed Detection Systems. Hamidreza Ahmadi

Channel Estimation Error in Distributed Detection Systems Hamidreza Ahmadi Outline Detection Theory Neyman-Pearson Method Classical l Distributed ib t Detection ti Fusion and local sensor rules Channel

### A Statistical Framework for Operational Infrasound Monitoring

A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LA-UR 11-03040 The views expressed here do not necessarily reflect the views of the United States Government,

### Machine Learning. Topic: Evaluating Hypotheses

Machine Learning Topic: Evaluating Hypotheses Bryan Pardo, Machine Learning: EECS 349 Fall 2011 How do you tell something is better? Assume we have an error measure. How do we tell if it measures something

### THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

### Chapter 6: Point Estimation. Fall 2011. - Probability & Statistics

STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6 - Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### Random variables, probability distributions, binomial random variable

Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that

### Hypothesis Testing. Mark Huiskes, LIACS Probability and Statistics, Mark Huiskes, LIACS, Lecture 10

Hypothesis Testing Mark Huiskes, LIACS mark.huiskes@liacs.nl Estimating sample standard deviation Suppose we have a sample x_1,, x_n Average: xbar = (x_1 + + x_n) / n Standard deviation estimate s? Two

### Exact Nonparametric Tests for Comparing Means - A Personal Summary

Exact Nonparametric Tests for Comparing Means - A Personal Summary Karl H. Schlag European University Institute 1 December 14, 2006 1 Economics Department, European University Institute. Via della Piazzuola

### Important Probability Distributions OPRE 6301

Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in real-life applications that they have been given their own names.

### Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization

Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Wolfram Burgard, Maren Bennewitz, Diego Tipaldi, Luciano Spinello 1 Motivation Recall: Discrete filter Discretize

### ALOHA Performs Delay-Optimum Power Control

ALOHA Performs Delay-Optimum Power Control Xinchen Zhang and Martin Haenggi Department of Electrical Engineering University of Notre Dame Notre Dame, IN 46556, USA {xzhang7,mhaenggi}@nd.edu Abstract As

### Notes on Factoring. MA 206 Kurt Bryan

The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

### Radar Systems Engineering Lecture 6 Detection of Signals in Noise

Radar Systems Engineering Lecture 6 Detection of Signals in Noise Dr. Robert M. O Donnell Guest Lecturer Radar Systems Course 1 Detection 1/1/010 Block Diagram of Radar System Target Radar Cross Section

### Stat260: Bayesian Modeling and Inference Lecture Date: February 1, Lecture 3

Stat26: Bayesian Modeling and Inference Lecture Date: February 1, 21 Lecture 3 Lecturer: Michael I. Jordan Scribe: Joshua G. Schraiber 1 Decision theory Recall that decision theory provides a quantification

### On Adaboost and Optimal Betting Strategies

On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of

### Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01 Probability and Statistics for Engineers Winter 2010-2011 Contents 1 Joint Probability Distributions 1 1.1 Two Discrete

### Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify

### P (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )

Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =

### Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel Lizhong Zheng, Student

### Maximum Likelihood Estimation

Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

### Optimal Performance for Detection Systems in Wireless Passive Sensor Networks

Optimal Performance for Detection Systems in Wireless Passive Sensor Networks Ashraf Tantawy, Xenofon Koutsoukos, and Gautam Biswas Institute for Software Integrated Systems (ISIS) Department of Electrical

### Non Parametric Inference

Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable

### Three-Stage Phase II Clinical Trials

Chapter 130 Three-Stage Phase II Clinical Trials Introduction Phase II clinical trials determine whether a drug or regimen has sufficient activity against disease to warrant more extensive study and development.

### On Decentralized Detection with Partial Information Sharing among Sensors

On Decentralized Detection with Partial Information Sharing among Sensors The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation