Introduction to Detection Theory


 Augustus Walters
 1 years ago
 Views:
Transcription
1 Introduction to Detection Theory Reading: Ch. 3 in KayII. Notes by Prof. Don Johnson on detection theory, see Ch. 10 in Wasserman. EE 527, Detection and Estimation Theory, # 5 1
2 Introduction to Detection Theory (cont.) We wish to make a decision on a signal of interest using noisy measurements. Statistical tools enable systematic solutions and optimal design. Application areas include: Communications, Radar and sonar, Nondestructive evaluation (NDE) of materials, Biomedicine, etc. EE 527, Detection and Estimation Theory, # 5 2
3 Example: Radar Detection. We wish to decide on the presence or absence of a target. EE 527, Detection and Estimation Theory, # 5 3
4 Introduction to Detection Theory We assume a parametric measurement model p(x θ) [or p(x ; θ), which is the notation that we sometimes use in the classical setting]. In point estimation theory, we estimated the parameter θ Θ given the data x. Suppose now that we choose Θ 0 and Θ 1 that form a partition of the parameter space Θ: Θ 0 Θ 1 = Θ, Θ 0 Θ 1 =. In detection theory, we wish to identify which hypothesis is true (i.e. make the appropriate decision): H 0 : θ Θ 0, null hypothesis H 1 : θ Θ 1, alternative hypothesis. Terminology: If θ can only take two values, Θ = {θ 0, θ 1 }, Θ 0 = {θ 0 }, Θ 1 = {θ 1 } we say that the hypotheses are simple. Otherwise, we say that they are composite. Composite Hypothesis Example: H 0 : θ = 0 versus H 1 : θ (0, ). EE 527, Detection and Estimation Theory, # 5 4
5 The Decision Rule We wish to design a decision rule (function) φ(x) : X (0, 1): φ(x) = { 1, decide H1, 0, decide H 0. which partitions the data space X [i.e. the support of p(x θ)] into two regions: Rule φ(x): X 0 = {x : φ(x) = 0}, X 1 = {x : φ(x) = 1}. Let us define probabilities of false alarm and miss: P FA = E x θ [φ(x) θ] = p(x θ) dx for θ in Θ 0 X 1 P M = E x θ [1 φ(x) θ] = 1 p(x θ) dx X 1 = p(x θ) dx for θ in Θ 1. X 0 Then, the probability of detection (correctly deciding H 1 ) is P D = 1 P M = E x θ [φ(x) θ] = X 1 p(x θ) dx for θ in Θ 1. EE 527, Detection and Estimation Theory, # 5 5
6 Note: P FA and P D /P M are generally functions of the parameter θ (where θ Θ 0 when computing P FA and θ Θ 1 when computing P D /P M ). More Terminology. Statisticians use the following terminology: False alarm Type I error Miss Type II error Probability of detection Power Probability of false alarm Significance level. EE 527, Detection and Estimation Theory, # 5 6
7 Bayesian Decisiontheoretic Detection Theory Recall (a slightly generalized version of) the posterior expected loss: ρ(action x) = L(θ, action) p(θ x) dθ Θ that we introduced in handout # 4 when we discussed Bayesian decision theory. Let us now apply this theory to our easy example discussed here: hypothesis testing, where our action space consists of only two choices. We first assign a loss table: decision rule φ true state Θ 1 Θ 0 x X 1 L(1 1) = 0 L(1 0) x X 0 L(0 1) L(0 0) = 0 with the loss function described by the quantities L(declared true): L(1 0) quantifies loss due to a false alarm, L(0 1) quantifies loss due to a miss, L(1 1) and L(0 0) (losses due to correct decisions) typically set to zero in real life. Here, we adopt zero losses for correct decisions. EE 527, Detection and Estimation Theory, # 5 7
8 Now, our posterior expected loss takes two values: ρ 0 (x) = L(0 1) p(θ x) dθ Θ 1 + L(0 0) p(θ x) dθ } {{ } Θ 0 0 = L(0 1) p(θ x) dθ Θ 1 L(0 1) is constant = L(0 1) p(θ x) dθ Θ } 1 {{ } P [θ Θ 1 x] and, similarly, ρ 1 (x) = L(1 0) p(θ x) dθ Θ 0 L(1 0) is constant = L(1 0) p(θ x) dθ. Θ } 0 {{ } P [θ Θ 0 x] We define the Bayes decision rule as the rule that minimizes the posterior expected loss; this rule corresponds to choosing the dataspace partitioning as follows: X 1 = {x : ρ 1 (x) ρ 0 (x)} EE 527, Detection and Estimation Theory, # 5 8
9 or P [θ Θ 1 x] { }} { { p(θ x) dθ } Θ X 1 = x : 1 L(1 0) L(0 1) p(θ x) dθ Θ } 0 {{ } P [θ Θ 0 x] or, equivalently, upon applying the Bayes rule: X 1 = { x : Θ 1 p(x θ) π(θ) dθ Θ 0 p(x θ) π(θ) dθ 01 loss: For L(1 0) = L(0 1) = 1, we have (1) } L(1 0) L(0 1). (2) decision rule φ true state Θ 1 Θ 0 x X 1 L(1 1) = 0 L(1 0) = 1 x X 0 L(0 1) = 1 L(0 0) = 0 yielding the following Bayes decision rule, called the maximum a posteriori (MAP) rule: X 1 = { x : P [θ Θ 1 x] P [θ Θ 0 x] 1 } or, equivalently, upon applying the Bayes rule: { } Θ X 1 = x : 1 p(x θ) π(θ) dθ Θ 0 p(x θ) π(θ) dθ 1. (4) (3) EE 527, Detection and Estimation Theory, # 5 9
10 Simple hypotheses. Let us specialize (1) to the case of simple hypotheses (Θ 0 = {θ 0 }, Θ 1 = {θ 1 }): X 1 = { x : p(θ 1 x) p(θ 0 x) } {{ } posteriorodds ratio } L(1 0) L(0 1). (5) We can rewrite (5) using the Bayes rule: X 1 = { x : p(x θ 1 ) p(x θ 0 ) } {{ } likelihood ratio } π 0 L(1 0) π 1 L(0 1) (6) where π 0 = π(θ 0 ), π 1 = π(θ 1 ) = 1 π 0 describe the prior probability mass function (pmf) of the binary random variable θ (recall that θ {θ 0, θ 1 }). Hence, for binary simple hypotheses, the prior pmf of θ is the Bernoulli pmf. EE 527, Detection and Estimation Theory, # 5 10
11 Preposterior (Bayes) Risk The preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = X 1 + X 0 L(1 0) p(x θ)π(θ) dθ dx Θ 0 Θ 1 L(0 1) p(x θ)π(θ) dθ dx. How do we choose the rule φ(x) that minimizes the preposterior EE 527, Detection and Estimation Theory, # 5 11
12 risk? L(1 0) p(x θ)π(θ) dθ dx X 1 Θ 0 + L(0 1) p(x θ)π(θ) dθ dx X 0 Θ 1 = L(1 0) p(x θ)π(θ) dθ dx X 1 Θ 0 L(0 1) p(x θ)π(θ) dθ dx X 1 Θ 1 + L(0 1) p(x θ)π(θ) dθ dx X 0 Θ 1 + L(0 1) p(x θ)π(θ) dθ dx X 1 Θ 1 = const } {{ } not dependent on φ(x) { + L(1 0) p(x θ)π(θ) dθ X 1 Θ 0 } L(0 1) p(x θ)π(θ) dθ dx Θ 1 implying that X 1 should be chosen as { } X 1 : L(1 0) p(x θ)π(θ) dθ L(0 1) p(x θ)π(θ) < 0 Θ 0 Θ 1 EE 527, Detection and Estimation Theory, # 5 12
13 which, as expected, is the same as (2), since minimizing the posterior expected loss minimizing the preposterior risk for every x as showed earlier in handout # loss: For the 01 loss, i.e. L(1 0) = L(0 1) = 1, the preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = X 1 + X 0 p(x θ)π(θ) dθ dx Θ 0 Θ 1 p(x θ)π(θ) dθ dx (7) which is simply the average error probability, with averaging performed over the joint probability density or mass function (pdf/pmf) or the data x and parameters θ. EE 527, Detection and Estimation Theory, # 5 13
14 Bayesian Decisiontheoretic Detection for Simple Hypotheses The Bayes decision rule for simple hypotheses is (6): Λ(x) } {{ } likelihood ratio = p(x θ 1) p(x θ 0 ) H 1 π 0 L(1 0) π 1 L(0 1) τ (8) see also Ch. 3.7 in KayII. (Recall that Λ(x) is the sufficient statistic for the detection problem, see p. 37 in handout # 1.) Equivalently, log Λ(x) = log[p(x θ 1 )] log[p(x θ 0 )] H 1 log τ τ. Minimum Average Error Probability Detection: In the familiar 01 loss case where L(1 0) = L(0 1) = 1, we know that the preposterior (Bayes) risk is equal to the average error probability, see (7). This average error probability greatly EE 527, Detection and Estimation Theory, # 5 14
15 simplifies in the simple hypothesis testing case: av. error probability = L(1 0) p(x θ } {{ } 0 )π 0 dx X L(0 1) p(x θ } {{ } 1 )π 1 dx X 0 1 = π 0 p(x θ 0 ) dx +π 1 p(x θ 1 ) dx X } 1 X {{ } } 0 {{ } P FA P M where, as before, the averaging is performed over the joint pdf/pmf of the data x and parameters θ, and π 0 = π(θ 0 ), π 1 = π(θ 1 ) = 1 π 0 (the Bernoulli pmf). In this case, our Bayes decision rule simplifies to the MAP rule (as expected, see (5) and Ch. 3.6 in KayII): p(θ 1 x) p(θ 0 x) } {{ } posteriorodds ratio or, equivalently, upon applying the Bayes rule: p(x θ 1 ) p(x θ 0 ) } {{ } likelihood ratio H 1 1 (9) H 1 π 0 π 1. (10) EE 527, Detection and Estimation Theory, # 5 15
16 which is the same as (4), upon substituting the Bernoulli pmf as the prior pmf for θ and (8), upon substituting L(1 0) = L(0 1) = 1. EE 527, Detection and Estimation Theory, # 5 16
17 Bayesian Decisiontheoretic Detection Theory: Handling Nuisance Parameters We apply the same approach as before integrate the nuisance parameters (ϕ, say) out! Therefore, (1) still holds for testing H 0 : θ Θ 0 versus H 1 : θ Θ 1 but p θ x (θ x) is computed as follows: p θ x (θ x) = p θ,ϕ x (θ, ϕ x) dϕ and, therefore, Θ 1 Θ 0 p(θ x) { }} { p θ,ϕ x (θ, ϕ x) dϕ dθ p θ,ϕ x (θ, ϕ x) dϕ } {{ } p(θ x) dθ H 1 L(1 0) L(0 1) (11) EE 527, Detection and Estimation Theory, # 5 17
18 or, equivalently, upon applying the Bayes rule: Θ 1 px θ,ϕ (x θ, ϕ) π θ,ϕ (θ, ϕ) dϕ dθ Θ 0 px θ,ϕ (x θ, ϕ) π θ,ϕ (θ, ϕ) dϕ dθ H 1 L(1 0) L(0 1). (12) Now, if θ and ϕ are independent a priori, i.e. π θ,ϕ (θ, ϕ) = π θ (θ) π ϕ (ϕ) (13) then (12) can be rewritten as Θ 1 π θ (θ) Θ 0 π θ (θ) p(x θ) { }} { p x θ,ϕ (x θ, ϕ) π ϕ (ϕ) dϕ dθ p x θ,ϕ (x θ, ϕ) π ϕ (ϕ) } {{ } p(x θ) dϕ dθ H 1 L(1 0) L(0 1). (14) Simple hypotheses and independent priors for θ and ϕ: Let us specialize (11) to the simple hypotheses (Θ 0 = {θ 0 }, Θ 1 = {θ 1 }): pθ,ϕ x (θ 1, ϕ x) dϕ pθ,ϕ x (θ 0, ϕ x) dϕ H 1 L(1 0) L(0 1). (15) Now, if θ and ϕ are independent a priori, i.e. (13) holds, then EE 527, Detection and Estimation Theory, # 5 18
19 we can rewrite (14) [or (15) using the Bayes rule]: px θ,ϕ (x θ 1, ϕ) π ϕ (ϕ) dϕ px θ,ϕ (x θ 0, ϕ) π ϕ (ϕ) dϕ } {{ } integrated likelihood ratio = same as (6) { }} { p(x θ 1 ) H 1 π 0 L(1 0) p(x θ 0 ) π 1 L(0 1) (16) where π 0 = π θ (θ 0 ), π 1 = π θ (θ 1 ) = 1 π 0. EE 527, Detection and Estimation Theory, # 5 19
20 Chernoff Bound on Average Error Probability for Simple Hypotheses Recall that minimizing the average error probability av. error probability = X 1 + X 0 p(x θ)π(θ) dθ dx Θ 0 Θ 1 p(x θ)π(θ) dθ dx leads to the MAP decision rule: X 1 = { x : Θ 0 p(x θ) π(θ) dθ } p(x θ) π(θ) dθ < 0. Θ 1 In many applications, we may not be able to obtain a simple closedform expression for the minimum average error EE 527, Detection and Estimation Theory, # 5 20
21 probability, but we can bound it as follows: min av. error probability = X1 p(x θ)π(θ) dθ dx Θ 0 + p(x θ)π(θ) dθ dx Θ 1 X 0 using the def. of X 1 = X }{{} data space { } min p(x θ)π(θ) dθ, p(x θ)π(θ) dθ) dx Θ 0 Θ 1 = X X [ = q 0 (x) = q { }} { 1 (x) ] λ [ { }} { ] 1 λ p(x θ)π(θ) dθ p(x θ)π(θ) dθ dx Θ 0 Θ 1 [q 0 (x)] λ [q 1 (x)] 1 λ dx which is the Chernoff bound on the minimum average error probability. Here, we have used the fact that min{a, b} a λ b 1 λ, for 0 λ 1, a, b 0. When x = x 1 x 2. x N EE 527, Detection and Estimation Theory, # 5 21
22 with x 1, x 2,... x N conditionally independent, identically distributed (i.i.d.) given θ and simple hypotheses (i.e. Θ 0 = {θ 0 }, Θ 1 = {θ 1 }) then N q 0 (x) = p(x θ 0 ) π 0 = π 0 p(x n θ 0 ) yielding n=1 N q 1 (x) = p(x θ 1 ) π 1 = π 1 p(x n θ 1 ) n=1 Chernoff bound for N conditionally i.i.d. measurements (given θ) and simple hyp. [ N ] λ N 1 λ = π 0 p(x n θ 0 ) [π 1 p(x n θ 1 )] dx n=1 = π λ 0 π 1 λ 1 = π λ 0 π 1 λ 1 N n=1 n=1 { [p(x n θ 0 )] λ [p(x n θ 1 )] 1 λ dx n } { [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1 } N EE 527, Detection and Estimation Theory, # 5 22
23 or, in other words, 1 N log(min av. error probability) log(πλ 0 π1 1 λ ) + log [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ dx 1, λ [0, 1]. If π 0 = π 1 = 1/2 (which is almost always the case of interest when evaluating average error probabilities), we can say that, as N, min av. error probability for N cond. i.i.d. measurements (given θ) and simple hypotheses ( { f(n) exp N min log [p(x 1 θ 0 )] λ [p(x 1 θ 1 )] 1 λ} ) λ [0,1] } {{ } Chernoff information for a single observation where f(n) is a slowlyvarying function compared with the exponential term: lim N log f(n) N = 0. Note that the Chernoff information in the exponent term of the above expression quantifies the asymptotic behavior of the minimum average error probability. We now give a useful result, taken from EE 527, Detection and Estimation Theory, # 5 23
24 K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., San Diego, CA: Academic Press, 1990 for evaluating a class of Chernoff bounds. Lemma 1. Consider p 1 (x) = N (µ 1, Σ 1 ) and p 2 (x) = N (µ 2, Σ 2 ). Then [p 1 (x)] λ [p 2 (x)] 1 λ dx = exp[ g(λ)] where g(λ) = λ (1 λ) 2 (µ 2 µ 1 ) T [λ Σ 1 + (1 λ) Σ 2 ] 1 (µ 2 µ 1 ) [ λ log Σ1 + (1 λ) Σ 2 ] Σ 1 λ Σ 2 1 λ. EE 527, Detection and Estimation Theory, # 5 24
25 Probabilities of False Alarm (P FA ) and Detection (P D ) for Simple Hypotheses 0.2 PDF under H 0 PDF under H 1 Probability Density Function P FA P D τ Test Statistic Comments: Λ(x) { }} { P FA = P [ test } statistic {{ > τ} θ = θ 0 ] X X 1 P D = P [test statistic > τ } {{ } X X 0 θ = θ 1 ]. (i) As the region X 1 shrinks (i.e. τ ), both of the above probabilities shrink towards zero. EE 527, Detection and Estimation Theory, # 5 25
26 (ii) As the region X 1 grows (i.e. τ 0), both of these probabilities grow towards unity. (iii) Observations (i) and (ii) do not imply equality between P FA and P D ; in most cases, as R 1 grows, P D grows more rapidly than P FA (i.e. we better be right more often than we are wrong). (iv) However, the perfect case where our rule is always right and never wrong (P D = 1 and P FA = 0) cannot occur when the conditional pdfs/pmfs p(x θ 0 ) and p(x θ 1 ) overlap. (v) Thus, to increase the detection probability P D, we must also allow for the falsealarm probability P FA to increase. This behavior represents the fundamental tradeoff in hypothesis testing and detection theory and motivates us to introduce a (classical) approach to testing simple hypotheses, pioneered by Neyman and Pearson (to be discussed next). EE 527, Detection and Estimation Theory, # 5 26
27 NeymanPearson Test for Simple Hypotheses Setup: Parametric data models p(x ; θ 0 ), p(x ; θ 1 ), Simple hypothesis testing: H 0 : θ = θ 0 versus H 1 : θ = θ 1. No prior pdf/pmf on θ is available. Goal: Design a test that maximizes the probability of detection P D = P [X X 1 ; θ = θ 0 ] (equivalently, minimizes the miss probability P M ) under the constraint P FA = P [X X 1 ; θ = θ 0 ] = α α. Here, we consider simple hypotheses; classical version of testing composite hypotheses is much more complicated. The Bayesian version of testing composite hypotheses is trivial (as EE 527, Detection and Estimation Theory, # 5 27
28 is everything else Bayesian, at least conceptually) and we have already seen it. Solution. We apply the Lagrangemultiplier approach: maximize L = P D + λ (P FA α ) [ ] = p(x ; θ 1 ) dx + λ p(x; θ 0 ) dx α X 1 X 1 = [p(x ; θ 1 ) λ p(x ; θ 0 )] dx λ α. X 1 To maximize L, set X 1 = {x : p(x ; θ 1 ) λ p(x ; θ 0 ) > 0} = { x : p(x ; θ 1) p(x ; θ 0 ) > λ }. Again, we find the likelihood ratio: Recall our constraint: Λ(x) = p(x ; θ 1) p(x ; θ 0 ). X 1 p(x ; θ 0 ) dx = P FA = α α. EE 527, Detection and Estimation Theory, # 5 28
29 If we increase λ, P FA and P D go down. Similarly, if we decrease λ, P FA and P D go up. Hence, to maximize P D, choose λ so that P FA is as big as possible under the constraint. Two useful ways for determining the threshold that achieves a specified falsealarm rate: Find λ that satisfies or, x : Λ(x)>λ p(x ; θ 0 ) dx = P FA = α expressing in terms of the pdf/pmf of Λ(x) under H 0 : λ p Λ;θ0 (l ; θ 0 ) dl = α. or, perhaps, in terms of a monotonic function of Λ(x), say T (x) = monotonic function(λ(x)). Warning: We have been implicitly assuming that P FA is a continuous function of λ. Some (not insightful) technical adjustments are needed if this is not the case. A way of handling nuisance parameters: We can utilize the integrated (marginal) likelihood ratio (16) under the Neyman Pearson setup as well. EE 527, Detection and Estimation Theory, # 5 29
30 ChernoffStein Lemma for Bounding the Miss Probability in NeymanPearson Tests of Simple Hypotheses Recall the definition of the KullbackLeibler (KL) distance D(p q) from one pmf (p) to another (p): D(p q) = k p k log p k q k. The complete proof of this lemma for the discrete (pmf) case is given in Additional Reading: T.M. Cover and J.A. Thomas, Elements of Information Theory. Second ed., New York: Wiley, EE 527, Detection and Estimation Theory, # 5 30
31 Setup for the ChernoffStein Lemma Assume that x 1, x 2,..., x N are conditionally i.i.d. given θ. We adopt the NeymanPearson framework, i.e. obtain a decision threshold to achieve a fixed P FA. Let us study the asymptotic P M = 1 P D as the number of observations N gets large. To keep P FA constant as N increases, we need to make our decision threshold (γ, say) vary with N, i.e. Now, the miss probability is γ = γ N (P FA ) ) P M = P M (γ) = P M (γ N (P FA ). EE 527, Detection and Estimation Theory, # 5 31
32 ChernoffStein Lemma The ChernoffStein lemma says: lim P FA 0 lim N 1 N log P M = ( ) D p(x n θ 0 ) p(x n θ 1 ) } {{ } KL distance for a single observation where the KL distance between p(x n θ 0 ) and p(x n θ 1 ) ( ) D p(x n θ 0 ) p(x n θ 1 ) discrete (pmf) case = x n p(x n θ 0 ) log [ = E p(xn θ 0 ) log p(x n θ 1 ) ] p(x n θ 0 ) [ p(xn θ 1 ) ] p(x n θ 0 ) does not depend on the observation index n, since x n conditionally i.i.d. given θ. are Equivalently, we can state that [ N + P M f(n) exp N D ( p(x n θ 0 ) p(x n θ 1 ) )] as P FA 0 and N, where f(n) is a slowlyvarying function compared with the exponential term (when P FA 0 and N ). EE 527, Detection and Estimation Theory, # 5 32
33 Detection for Simple Hypotheses: Example Known positive DC level in additive white Gaussian noise (AWGN), Example 3.2 in KayII. Consider where H 0 : x[n] = w[n], n = 1, 2,..., N versus H 1 : x[n] = A + w[n], n = 1, 2,..., N A > 0 is a known constant, w[n] is zeromean white Gaussian noise with known variance σ 2, i.e. w[n] N (0, σ 2 ). The above hypothesistesting formulation is EElike: noise versus signal plus noise. It is similar to the onoff keying scheme in communications, which gives us an idea to rephrase it so that it fits our formulation on p. 4 (for which we have developed all the theory so far). Here is such an EE 527, Detection and Estimation Theory, # 5 33
34 alternative formulation: consider a family of pdfs p(x ; a) = 1 [ (2πσ2 ) exp N 1 2 σ 2 N (x[n] a) 2] (17) n=1 and the following (equivalent) hypotheses: H 0 : a = 0 (off) versus H 1 : a = A (on). Then, the likelihood ratio is Λ(x) = p(x ; a = A) p(x ; a = 0) = 1/(2πσ2 ) N/2 exp[ 1 N 2σ 2 n=1 (x[n] A)2 ] 1/(2πσ 2 ) N/2 exp( 1 N. 2σ 2 n=1 x[n]2 ) Now, take the logarithm and, after simple manipulations, reduce our likelihoodratio test to comparing T (x) = x = 1 N N x[n] n=1 with a threshold γ. [Here, T (x) is a monotonic function of Λ(x).] If T (x) > γ accept H 1 (i.e. reject H 0 ), otherwise accept EE 527, Detection and Estimation Theory, # 5 34
35 H 0 (well, not exactly, we will talk more about this decision on p. 59). The choice of γ depends on the approach that we take. For the Bayes decision rule, γ is a function of π 0 and π 1. For the NeymanPearson test, γ is chosen to achieve (control) a desired P FA. Bayesian decisiontheoretic detection for 01 loss (corresponding to minimizing the average error EE 527, Detection and Estimation Theory, # 5 35
36 probability): log Λ(x) = 1 2 σ σ 2 2 A ( N n=1 N (x[n] A) σ 2 n=1 N (x[n] x[n] } {{ } 0 n=1 ( N n=1 ) x[n] N n=1 (x[n]) 2 H ( 1 π0 ) log π 1 +A) (x[n] + x[n] A) H ( 1 π0 ) log π 1 ) x[n] A 2 N H ( 1 2 σ 2 π0 ) log π 1 AN 2 H 1 σ2 A log ( π0 π 1 ) σ2 since A > 0 and, finally, x H 1 N A log(π 0/π 1 ) + A } {{ 2} γ which, for the practically most interesting case of equiprobable hypotheses π 0 = π 1 = 1 2 (18) simplifies to x H 1 A 2 }{{} γ known as the maximumlikelihood test (i.e. the Bayes decision rule for 01 loss and a priori equiprobable hypotheses is defined as the maximumlikelihood test). This maximumlikelihood detector does not require the knowledge of the noise variance EE 527, Detection and Estimation Theory, # 5 36
37 σ 2 to declare its decision. However, the knowledge of σ 2 is key to assessing the detection performance. Interestingly, these observations will carry over to a few maximumlikelihood tests that we will derive in the future. Assuming (18), we now derive the minimum average error probability. First, note that X a = 0 N (0, σ 2 /N) and X a = A N (A, σ 2 /N). Then min av. error prob. = 1 2 P [X > A } {{ 2 a = 0] } +1 P FA = 1 2 P [ P [ X σ2 /N } {{ } standard normal random var. X A σ2 /N } {{ } standard normal random var. = 1 2 ( N A Q ) } {{ 4 σ 2 } standard normal complementary cdf > A/2 σ2 /N ; a = 0 ] < A/2 A σ2 /N ; a = A ] ( N A Φ 2) } {{ 4 σ 2 } standard normal cdf 2 P [X < A } {{ 2 a = A] } P M ( = Q 1 2 N A 2 σ 2 ) EE 527, Detection and Estimation Theory, # 5 37
38 since Φ( x) = Q(x). The minimum probability of error decreases monotonically with N A 2 /σ 2, which is known as the deflection coefficient. In this case, the Chernoff information is equal to A2 (see Lemma 1): 8 σ 2 { min log λ [0,1] [ λ (1 λ) = min λ [0,1] 2 } [N (0, σ 2 /N)] } {{ } λ [N (A, σ 2 /N)] } {{ } 1 λ dx p 0 (x) p 1 (x) A2 σ 2 ] = A2 8 σ 2 and, therefore, we expect the following asymptotic behavior: min av. error probability N + ) f(n) exp ( N A2 8 σ 2 (19) where f(n) varies slowly with N when N is large, compared with the exponential term: lim N log f(n) N = 0. Indeed, in our example, ( N A Q 2 ) N + 4 σ 2 1 N A 2 2π 4 σ } 2 {{ } f(n) ( exp N ) A2 8 σ 2 EE 527, Detection and Estimation Theory, # 5 38
39 where we have used the asymptotic formula Q(x) x + 1 x 2π exp( x2 /2) (20) given e.g. in equation (2.4) of KayII. EE 527, Detection and Estimation Theory, # 5 39
40 Known DC Level in AWGN: NeymanPearson Approach We now derive the NeymanPearson test for detecting a known DC level in AWGN. Recall that our likelihoodratio test compares with a threshold γ. T (x) = x = 1 N N 1 n=0 x[n] If T (x) > γ, decide H 1 (i.e. reject H 0 ), otherwise decide H 0 (see also the discussion on p. 59). Performance evaluation: Assuming (17), we have T (x) a N (a, σ 2 /N). Therefore, T (X) a = 0 N (0, σ 2 /N), implying ( γ ) P FA = P [T (X) > γ ; a = 0] = Q σ2 /N EE 527, Detection and Estimation Theory, # 5 40
41 and we obtain the decision threshold as follows: γ = σ 2 N Q 1 (P FA ). Now, T (X) a = A N (A, σ 2 /N), implying ( γ A ) P D = P (T (X) > γ a = A) = Q σ2 /N ( ) = Q Q 1 A (P FA ) 2 σ 2 /N ( ) = Q Q 1 (P FA ) NA2 /σ } {{ } 2. = SNR = d 2 Given the falsealarm probability P FA, the detection probability P D depends only on the deflection coefficient: d 2 = NA2 σ 2 = {E [T (X) a = A] E [T (X) a = 0]}2 var[t (X a = 0)] which is also (a reasonable definition for) the signaltonoise ratio (SNR). EE 527, Detection and Estimation Theory, # 5 41
42 Receiver Operating Characteristics (ROC) Comments: P D = Q ( Q 1 (P FA ) d ). As we raise the threshold γ, P FA goes down but so does P D. ROC should be above the 45 line otherwise we can do better by flipping a coin. Performance improves with d 2. EE 527, Detection and Estimation Theory, # 5 42
43 Typical Ways of Depicting the Detection Performance Under the NeymanPearson Setup To analyze the performance of a NeymanPearson detector, we examine two relationships: Between P D and P FA, for a given SNR, called the operating characteristics (ROC). receiver Between P D and SNR, for a given P FA. Here are examples of the two: see Figs. 3.8 and 3.5 in KayII, respectively. EE 527, Detection and Estimation Theory, # 5 43
44 Asymptotic (as N and P FA 0) P D and P M for a Known DC Level in AWGN We apply the ChernoffStein lemma, for which we need to compute the following KL distance: log where ( ) D p(x n a = 0) p(x n a = A) { = E p(xn a=0) log [ p(xn a = A) ]} p(x n a = 0) p(x n a = 0) = N (0, σ 2 ) [ p(xn a = A) ] = (x n A) 2 p(x n a = 0) 2 σ 2 + x2 n 2 σ 2 = 1 2σ 2 (2 A x n A 2 ). Therefore, ( ) D p(x n θ 0 ) p(x n θ 1 ) = A2 2 σ 2 and the ChernoffStein lemma predicts the following behavior of the detection probability as N and P FA 0: ( ) P D 1 f(n) exp N A2 } {{ 2 σ 2 } P M EE 527, Detection and Estimation Theory, # 5 44
45 where f(n) is a slowlyvarying function of N compared with the exponential term. In this case, the exact expression for P M (P D ) is available and consistent with the ChernoffStein lemma: P M = 1 Q (Q 1 (P FA ) ) NA 2 /σ 2 ) = Q( NA2 /σ 2 Q 1 (P FA ) N + = = 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [ } NA 2 /σ 2 Q 1 (P FA )] 2 /2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp NA 2 /(2σ 2 ) [Q 1 (P FA )] 2 /2 +Q 1 (P FA ) } NA 2 /σ 2 1 [ NA 2 /σ 2 Q 1 (P FA )] 2π { exp [Q 1 (P FA )] 2 /2 + Q 1 (P FA ) } NA 2 /σ 2 exp[ NA 2 /(2σ 2 )] } {{ } as predicted by the ChernoffStein lemma (21) where we have used the asymptotic formula (20). When P FA EE 527, Detection and Estimation Theory, # 5 45
46 is small and N is large, the first two (green) terms in the above expression make a slowlyvarying function f(n) of N. Note that the exponential term in (21) does not depend on the falsealarm probability P FA (or, equivalently, on the choice of the decision threshold). The ChernoffStein lemma asserts that this is not a coincidence. Comment. For detecting a known DC level in AWGN: The slope of the exponentialdecay term of P M is A 2 /(2σ 2 ) which is different from (larger than, in this case) the slope of the exponential decay of the minimum average error probability: A 2 /(8σ 2 ). EE 527, Detection and Estimation Theory, # 5 46
47 Decentralized NeymanPearson Detection for Simple Hypotheses Consider a sensornetwork scenario depicted by Assumption: Observations made at N spatially distributed sensors (nodes) follow the same marginal probabilistic model: H i : x n p(x n θ i ) where n = 1, 2,..., N and i {0, 1} for binary hypotheses. Each node n makes a hard local decision d n based on its local observation x n and sends it to the headquarters (fusion center), which collects all the local decisions and makes the final global EE 527, Detection and Estimation Theory, # 5 47
48 decision about H 0 or H 1. This structure is clearly suboptimal it is easy to construct a better decision strategy in which each node sends its (quantized, in practice) likelihood ratio to the fusion center, rather than the decision only. However, such a strategy would have a higher communication (energy) cost. We now go back to the decentralized detection problem. Suppose that each node n makes a local decision d n {0, 1}, n = 1, 2,..., N and transmits it to the fusion center. Then, the fusion center makes the global decision based on the likelihood ratio formed from the d n s. The simplest fusion scheme is based on the assumption that the d n s are conditionally independent given θ (which may not always be reasonable, but leads to an easy solution). We can now write p(d n θ 1 ) = P d n D,n (1 P D,n ) 1 d n } {{ } Bernoulli pmf where P D,n Similarly, is the nth sensor s local detection probability. p(d n θ 0 ) = P d n FA,n (1 P FA,n ) 1 d n } {{ } Bernoulli pmf where P FA,n is the nth sensor s local detection falsealarm EE 527, Detection and Estimation Theory, # 5 48
49 probability. Now, log Λ(d) = = N log n=1 N log n=1 [ p(dn θ 1 ) ] p(d n θ 0 [ P d n D,n (1 P D,n ) 1 d n P d n FA,n (1 P FA,n ) 1 d n ] H1 log τ. To further simplify the exposition, we assume that all sensors have identical performance: P D,n = P D, P FA,n = P FA. Define the number of sensors having d n = 1: u 1 = N n=1 d n Then, the loglikelihood ratio becomes log Λ(d) = u 1 log ( PD P FA ) + (N u 1 ) log ( 1 PD 1 P FA ) H1 log τ or u 1 log [ PD (1 P FA ) ] H1 ( 1 PFA ) log τ + N log. (22) P FA (1 P D ) 1 P D EE 527, Detection and Estimation Theory, # 5 49
50 Clearly, each node s local decision d n P D > P FA, which implies is meaningful only if P D (1 P FA ) P FA (1 P D ) > 1 the logarithm of which is therefore positive, and the decision rule (22) further simplifies to u 1 H 1 τ. The NeymanPerson performance analysis of this detector is easy: the random variable U 1 is binomial given θ (i.e. conditional on the hypothesis) and, therefore, P [U 1 = u 1 ] = ( ) N p u 1 (1 p) N u 1 u 1 where p = P FA under H 0 and p = P D under H 1. Hence, the global falsealarm probability is P FA,global = P [U 1 > τ θ 0 ] = N u 1 = τ ( N u 1 ) P u 1 FA (1 P FA ) N u 1. Clearly, P FA,global will be a discontinuous function of τ and therefore, we should choose our P FA,global specification from EE 527, Detection and Estimation Theory, # 5 50
51 the available discrete choices. But, if none of the candidate choices is acceptable, this means that our current system does not satisfy the requirements and a remedial action is needed, e.g. increasing the quantity (N) or improving the quality of sensors (through changing P D and P FA ), or both. EE 527, Detection and Estimation Theory, # 5 51
52 Testing Multiple Hypotheses Suppose now that we choose Θ 0, Θ 1,..., Θ M 1 that form a partition of the parameter space Θ: Θ 0 Θ 1 Θ M 1 = Θ, Θ i Θ j = i j. We wish to distinguish among M > 2 hypotheses, i.e. identify which hypothesis is true: H 0 : θ Θ 0 versus H 1 : θ Θ 1 versus. versus H M 1 : θ Θ M 1 and, consequently, our action space consists of M choices. We design a decision rule φ : X (0, 1,..., M 1): φ(x) = 0, decide H 0, 1,. decide H 1, M 1, decide H M 1 where φ partitions the data space X [i.e. the support of p(x θ)] into M regions: Rule φ: X 0 = {x : φ(x) = 0},..., X M 1 = {x : φ(x) = M 1}. EE 527, Detection and Estimation Theory, # 5 52
53 We specify the loss function using L(i m), where, typically, the losses due to correct decisions are set to zero: L(i i) = 0, i = 0, 1,..., M 1. Here, we adopt zero losses for correct decisions. posterior expected loss takes M values: Now, our ρ m (x) = = M 1 i=0 M 1 i=0 Θ i L(m i) p(θ x) dθ L(m i) Θ i p(θ x) dθ, m = 0, 1,..., M 1. Then, the Bayes decision rule φ is defined via the following dataspace partitioning: X m = {x : ρ m (x) = min ρ l(x)}, m = 0, 1,..., M 1 0 l M 1 or, equivalently, upon applying the Bayes rule X m = = { M 1 } x : m = arg min L(l i) p(x θ) π(θ) dθ 0 l M 1 i=0 Θ i { M 1 x : m = arg min L(l i) 0 l M 1 i=0 Θ i p(x θ) π(θ) dθ }. EE 527, Detection and Estimation Theory, # 5 53
54 The preposterior (Bayes) risk for rule φ(x) is E x,θ [loss] = = M 1 m=0 M 1 m=0 M 1 i=0 X m X m Θ i L(m i) p(x θ)π(θ) dθ dx M 1 L(m i) p(x θ)π(θ) dθ i=0 Θ } {{ i } h m (θ) dx. Then, for an arbitrary h m (x), [ M 1 m=0 ] h m (x) dx X m [ M 1 m=0 X m ] h m (x) dx 0 which verifies that the Bayes decision rule φ minimizes the preposterior (Bayes) risk. EE 527, Detection and Estimation Theory, # 5 54
Signal Detection. Outline. Detection Theory. Example Applications of Detection Theory
Outline Signal Detection M. Sami Fadali Professor of lectrical ngineering University of Nevada, Reno Hypothesis testing. NeymanPearson (NP) detector for a known signal in white Gaussian noise (WGN). Matched
More informationThe CUSUM algorithm a small review. Pierre Granjon
The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................
More informationLecture 8: Signal Detection and Noise Assumption
ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationInterpreting KullbackLeibler Divergence with the NeymanPearson Lemma
Interpreting KullbackLeibler Divergence with the NeymanPearson Lemma Shinto Eguchi a, and John Copas b a Institute of Statistical Mathematics and Graduate University of Advanced Studies, Minamiazabu
More informationMessagepassing sequential detection of multiple change points in networks
Messagepassing sequential detection of multiple change points in networks Long Nguyen, Arash Amini Ram Rajagopal University of Michigan Stanford University ISIT, Boston, July 2012 Nguyen/Amini/Rajagopal
More informationHypothesis Testing. 1 Introduction. 2 Hypotheses. 2.1 Null and Alternative Hypotheses. 2.2 Simple vs. Composite. 2.3 OneSided and TwoSided Tests
Hypothesis Testing 1 Introduction This document is a simple tutorial on hypothesis testing. It presents the basic concepts and definitions as well as some frequently asked questions associated with hypothesis
More informationFalse Discovery Rates
False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving
More informationParametric Models Part I: Maximum Likelihood and Bayesian Density Estimation
Parametric Models Part I: Maximum Likelihood and Bayesian Density Estimation Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2015 CS 551, Fall 2015
More informationIEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 1, JANUARY 2007 341 Multinode Cooperative Communications in Wireless Networks Ahmed K. Sadek, Student Member, IEEE, Weifeng Su, Member, IEEE, and K.
More informationLecture 9: Bayesian hypothesis testing
Lecture 9: Bayesian hypothesis testing 5 November 27 In this lecture we ll learn about Bayesian hypothesis testing. 1 Introduction to Bayesian hypothesis testing Before we go into the details of Bayesian
More information1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
More information33. STATISTICS. 33. Statistics 1
33. STATISTICS 33. Statistics 1 Revised September 2011 by G. Cowan (RHUL). This chapter gives an overview of statistical methods used in highenergy physics. In statistics, we are interested in using a
More informationIntroduction to Probability
Introduction to Probability EE 179, Lecture 15, Handout #24 Probability theory gives a mathematical characterization for experiments with random outcomes. coin toss life of lightbulb binary data sequence
More information[Chapter 10. Hypothesis Testing]
[Chapter 10. Hypothesis Testing] 10.1 Introduction 10.2 Elements of a Statistical Test 10.3 Common LargeSample Tests 10.4 Calculating Type II Error Probabilities and Finding the Sample Size for Z Tests
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More information93.4 Likelihood ratio test. NeymanPearson lemma
93.4 Likelihood ratio test NeymanPearson lemma 91 Hypothesis Testing 91.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental
More informationUNIVERSITY OF MINNESOTA. Alejandro Ribeiro
UNIVERSITY OF MINNESOTA This is to certify that I have examined this copy of a Master s thesis by Alejandro Ribeiro and have found that it is complete and satisfactory in all respects, and that any and
More informationOptimal Stopping in Software Testing
Optimal Stopping in Software Testing Nilgun Morali, 1 Refik Soyer 2 1 Department of Statistics, Dokuz Eylal Universitesi, Turkey 2 Department of Management Science, The George Washington University, 2115
More information. (3.3) n Note that supremum (3.2) must occur at one of the observed values x i or to the left of x i.
Chapter 3 KolmogorovSmirnov Tests There are many situations where experimenters need to know what is the distribution of the population of their interest. For example, if they want to use a parametric
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 5 9/17/2008 RANDOM VARIABLES Contents 1. Random variables and measurable functions 2. Cumulative distribution functions 3. Discrete
More informationBayesian probability theory
Bayesian probability theory Bruno A. Olshausen arch 1, 2004 Abstract Bayesian probability theory provides a mathematical framework for peforming inference, or reasoning, using probability. The foundations
More informationInference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
More informationMultivariate Analysis of Variance (MANOVA): I. Theory
Gregory Carey, 1998 MANOVA: I  1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the
More informationA New Interpretation of Information Rate
A New Interpretation of Information Rate reproduced with permission of AT&T By J. L. Kelly, jr. (Manuscript received March 2, 956) If the input symbols to a communication channel represent the outcomes
More informationCPC/CPA Hybrid Bidding in a Second Price Auction
CPC/CPA Hybrid Bidding in a Second Price Auction Benjamin Edelman Hoan Soo Lee Working Paper 09074 Copyright 2008 by Benjamin Edelman and Hoan Soo Lee Working papers are in draft form. This working paper
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationA Coefficient of Variation for Skewed and HeavyTailed Insurance Losses. Michael R. Powers[ 1 ] Temple University and Tsinghua University
A Coefficient of Variation for Skewed and HeavyTailed Insurance Losses Michael R. Powers[ ] Temple University and Tsinghua University Thomas Y. Powers Yale University [June 2009] Abstract We propose a
More information1 Prior Probability and Posterior Probability
Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which
More informationComparison of Network Coding and NonNetwork Coding Schemes for Multihop Wireless Networks
Comparison of Network Coding and NonNetwork Coding Schemes for Multihop Wireless Networks JiaQi Jin, Tracey Ho California Institute of Technology Pasadena, CA Email: {jin,tho}@caltech.edu Harish Viswanathan
More informationSignal Detection C H A P T E R 14 14.1 SIGNAL DETECTION AS HYPOTHESIS TESTING
C H A P T E R 4 Signal Detection 4. SIGNAL DETECTION AS HYPOTHESIS TESTING In Chapter 3 we considered hypothesis testing in the context of random variables. The detector resulting in the minimum probability
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture  17 ShannonFanoElias Coding and Introduction to Arithmetic Coding
More informationDiscussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski.
Discussion on the paper Hypotheses testing by convex optimization by A. Goldenschluger, A. Juditsky and A. Nemirovski. Fabienne Comte, Celine Duval, Valentine GenonCatalot To cite this version: Fabienne
More informationThe Basics of Graphical Models
The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures
More informationSampling and Hypothesis Testing
Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus
More informationBasics of FloatingPoint Quantization
Chapter 2 Basics of FloatingPoint Quantization Representation of physical quantities in terms of floatingpoint numbers allows one to cover a very wide dynamic range with a relatively small number of
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationOn the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information
Finance 400 A. Penati  G. Pennacchi Notes on On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information by Sanford Grossman This model shows how the heterogeneous information
More informationRecursive Estimation
Recursive Estimation Raffaello D Andrea Spring 04 Problem Set : Bayes Theorem and Bayesian Tracking Last updated: March 8, 05 Notes: Notation: Unlessotherwisenoted,x, y,andz denoterandomvariables, f x
More informationWhat is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference
0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures
More informationDistributed Detection Systems. Hamidreza Ahmadi
Channel Estimation Error in Distributed Detection Systems Hamidreza Ahmadi Outline Detection Theory NeymanPearson Method Classical l Distributed ib t Detection ti Fusion and local sensor rules Channel
More informationLOGNORMAL MODEL FOR STOCK PRICES
LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as
More informationA direct approach to false discovery rates
J. R. Statist. Soc. B (2002) 64, Part 3, pp. 479 498 A direct approach to false discovery rates John D. Storey Stanford University, USA [Received June 2001. Revised December 2001] Summary. Multiplehypothesis
More informationStochastic Inventory Control
Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the
More informationNonparametric adaptive age replacement with a onecycle criterion
Nonparametric adaptive age replacement with a onecycle criterion P. CoolenSchrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK email: Pauline.Schrijner@durham.ac.uk
More informationALOHA Performs DelayOptimum Power Control
ALOHA Performs DelayOptimum Power Control Xinchen Zhang and Martin Haenggi Department of Electrical Engineering University of Notre Dame Notre Dame, IN 46556, USA {xzhang7,mhaenggi}@nd.edu Abstract As
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationStatistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics: Behavioural
More informationA Statistical Framework for Operational Infrasound Monitoring
A Statistical Framework for Operational Infrasound Monitoring Stephen J. Arrowsmith Rod W. Whitaker LAUR 1103040 The views expressed here do not necessarily reflect the views of the United States Government,
More informationIntroduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization
Introduction to Mobile Robotics Bayes Filter Particle Filter and Monte Carlo Localization Wolfram Burgard, Maren Bennewitz, Diego Tipaldi, Luciano Spinello 1 Motivation Recall: Discrete filter Discretize
More informationMonte Carlo testing with Big Data
Monte Carlo testing with Big Data Patrick RubinDelanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:
More informationCONTENTS OF DAY 2. II. Why Random Sampling is Important 9 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE
1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 3 Problems with small populations 8 II. Why Random Sampling is Important 9 A myth,
More informationRadar Systems Engineering Lecture 6 Detection of Signals in Noise
Radar Systems Engineering Lecture 6 Detection of Signals in Noise Dr. Robert M. O Donnell Guest Lecturer Radar Systems Course 1 Detection 1/1/010 Block Diagram of Radar System Target Radar Cross Section
More informationThreeStage Phase II Clinical Trials
Chapter 130 ThreeStage Phase II Clinical Trials Introduction Phase II clinical trials determine whether a drug or regimen has sufficient activity against disease to warrant more extensive study and development.
More informationTHE CENTRAL LIMIT THEOREM TORONTO
THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................
More informationCommunication on the Grassmann Manifold: A Geometric Approach to the Noncoherent MultipleAntenna Channel
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent MultipleAntenna Channel Lizhong Zheng, Student
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 1, Lecture 3
Stat26: Bayesian Modeling and Inference Lecture Date: February 1, 21 Lecture 3 Lecturer: Michael I. Jordan Scribe: Joshua G. Schraiber 1 Decision theory Recall that decision theory provides a quantification
More informationOptimal Performance for Detection Systems in Wireless Passive Sensor Networks
Optimal Performance for Detection Systems in Wireless Passive Sensor Networks Ashraf Tantawy, Xenofon Koutsoukos, and Gautam Biswas Institute for Software Integrated Systems (ISIS) Department of Electrical
More informationImportant Probability Distributions OPRE 6301
Important Probability Distributions OPRE 6301 Important Distributions... Certain probability distributions occur with such regularity in reallife applications that they have been given their own names.
More informationP (x) 0. Discrete random variables Expected value. The expected value, mean or average of a random variable x is: xp (x) = v i P (v i )
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationMoral Hazard. Itay Goldstein. Wharton School, University of Pennsylvania
Moral Hazard Itay Goldstein Wharton School, University of Pennsylvania 1 PrincipalAgent Problem Basic problem in corporate finance: separation of ownership and control: o The owners of the firm are typically
More informationChapter 6: Point Estimation. Fall 2011.  Probability & Statistics
STAT355 Chapter 6: Point Estimation Fall 2011 Chapter Fall 2011 6: Point1 Estimat / 18 Chap 6  Point Estimation 1 6.1 Some general Concepts of Point Estimation Point Estimate Unbiasedness Principle of
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationBayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom
1 Learning Goals Bayesian Updating with Discrete Priors Class 11, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom 1. Be able to apply Bayes theorem to compute probabilities. 2. Be able to identify
More informationOn Decentralized Detection with Partial Information Sharing among Sensors
On Decentralized Detection with Partial Information Sharing among Sensors The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation
More informationNotes on Factoring. MA 206 Kurt Bryan
The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor
More informationOn Adaboost and Optimal Betting Strategies
On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of
More informationMaximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
More informationNon Parametric Inference
Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable
More informationUncertainty quantification for the familywise error rate in multivariate copula models
Uncertainty quantification for the familywise error rate in multivariate copula models Thorsten Dickhaus (joint work with Taras Bodnar, Jakob Gierl and Jens Stange) University of Bremen Institute for
More informationLogLikelihood Ratiobased Relay Selection Algorithm in Wireless Network
Recent Advances in Electrical Engineering and Electronic Devices LogLikelihood Ratiobased Relay Selection Algorithm in Wireless Network Ahmed ElMahdy and Ahmed Walid Faculty of Information Engineering
More informationStrong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach
J. R. Statist. Soc. B (2004) 66, Part 1, pp. 187 205 Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach John D. Storey,
More informationPenalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20thcentury statistics dealt with maximum likelihood
More informationThe Bonferonni and Šidák Corrections for Multiple Comparisons
The Bonferonni and Šidák Corrections for Multiple Comparisons Hervé Abdi 1 1 Overview The more tests we perform on a set of data, the more likely we are to reject the null hypothesis when it is true (i.e.,
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More informationPractice problems for Homework 11  Point Estimation
Practice problems for Homework 11  Point Estimation 1. (10 marks) Suppose we want to select a random sample of size 5 from the current CS 3341 students. Which of the following strategies is the best:
More informationChapter 4: NonParametric Classification
Chapter 4: NonParametric Classification Introduction Density Estimation Parzen Windows KnNearest Neighbor Density Estimation KNearest Neighbor (KNN) Decision Rule Gaussian Mixture Model A weighted combination
More informationTutorial on variational approximation methods. Tommi S. Jaakkola MIT AI Lab
Tutorial on variational approximation methods Tommi S. Jaakkola MIT AI Lab tommi@ai.mit.edu Tutorial topics A bit of history Examples of variational methods A brief intro to graphical models Variational
More informationMIMO CHANNEL CAPACITY
MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use
More informationProbabilistic Methods for TimeSeries Analysis
Probabilistic Methods for TimeSeries Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:
More informationarxiv:1112.0829v1 [math.pr] 5 Dec 2011
How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly
More informationHow to Conduct a Hypothesis Test
How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal  the stuff biology is not
More informationOffline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com
More informationIN most approaches to binary classification, classifiers are
3806 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 11, NOVEMBER 2005 A Neyman Pearson Approach to Statistical Learning Clayton Scott, Member, IEEE, Robert Nowak, Senior Member, IEEE Abstract The
More informationThis article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
IEEE/ACM TRANSACTIONS ON NETWORKING 1 A Greedy Link Scheduler for Wireless Networks With Gaussian MultipleAccess and Broadcast Channels Arun Sridharan, Student Member, IEEE, C Emre Koksal, Member, IEEE,
More informationSpatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
More informationE3: PROBABILITY AND STATISTICS lecture notes
E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................
More informationConstrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing
Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance
More informationExercises with solutions (1)
Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal
More information13.3 Inference Using Full Joint Distribution
191 The probability distribution on a single variable must sum to 1 It is also true that any joint probability distribution on any set of variables must sum to 1 Recall that any proposition a is equivalent
More informationStatistics in Astronomy
Statistics in Astronomy Initial question: How do you maximize the information you get from your data? Statistics is an art as well as a science, so it s important to use it as you would any other tool:
More informationOPRE 6201 : 2. Simplex Method
OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2
More informationDiscrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note 11
CS 70 Discrete Mathematics and Probability Theory Fall 2009 Satish Rao,David Tse Note Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According
More informationGambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
More informationSums of Independent Random Variables
Chapter 7 Sums of Independent Random Variables 7.1 Sums of Discrete Random Variables In this chapter we turn to the important question of determining the distribution of a sum of independent random variables
More informationRandom variables, probability distributions, binomial random variable
Week 4 lecture notes. WEEK 4 page 1 Random variables, probability distributions, binomial random variable Eample 1 : Consider the eperiment of flipping a fair coin three times. The number of tails that
More information