How NOT to do a data analysis

1 How NOT to do a data analysis
A. Fontana and P. Pedroni, INFN Pavia
- The Physics case: neutrinoless double beta decay
- The experimental apparatus
- The reported results
- A critical review of the analysis
- The ROOT code

2 It all starts from here

3 and from here

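4 Neutrino mass
Absolute mass?
- Tritium decay: $m_{\nu_e} < 2.3$ eV
- Cosmology: $\sum m_i \lesssim 1$ eV
- $\beta\beta(0\nu)$: $\langle m_\nu \rangle <$ ( ) eV
Mass scale? [Figure: level scheme of $m_1^2$, $m_2^2$, $m_3^2$ on an $m^2$ axis with unknown offset]
- Normal hierarchy: $m_3 > m_2 \sim m_1$
- Inverted hierarchy: $m_2 \sim m_1 > m_3$
- Degenerate: $m_1 \approx m_2 \approx m_3 \gg |m_i - m_j|$
F. Piquemal (CENBG) Nuppec Bordeaux, November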

5 Nature of the neutrino
Dirac neutrino: particle $\neq$ antiparticle
- 4 states: $\nu_L$, $\nu_R$, $\bar\nu_L$, $\bar\nu_R$
- Conservation of global leptonic number
Majorana neutrino: $\nu = \bar\nu$
- 2 states: $\nu_L$ and $\nu_R$
- No conservation of global leptonic number
Majorana neutrinos exist in most GUT and supersymmetric models.
If the neutrino is Majorana and CP is violated in the leptonic sector -> leptogenesis.
Majorana neutrinos imply global leptonic number violation.
F. Piquemal (CENBG) CS IN2P3 2005/03/05

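6 Neutrinoless Double Beta decay
[Diagram: two neutrons n turn into two protons p and two electrons e-, with a neutrino exchanged between the two vertices]
- $\nu = \bar\nu$, $\Delta L = 2$: Majorana neutrino
- $\nu_L \leftrightarrow \nu_R$: massive neutrino
- Other possible processes: V+A, Majoron, SUSY
Nuclear levels: (A,Z), (A,Z+1), (A,Z+2)
$(T_{1/2})^{-1} = F(Q,Z)\,|M|^2\,\langle m_\nu \rangle^2$
- $F(Q,Z)$: phase space factor (scales as $Q^5$)
- $M$: nuclear matrix element
- Effective neutrino mass: $\langle m_\nu \rangle = m_1 |U_{e1}|^2 + m_2 |U_{e2}|^2 e^{i\alpha_1} + m_3 |U_{e3}|^2 e^{i\alpha_2}$
- $U_{ei}$: mixing matrix elements; $\alpha_1$ and $\alpha_2$: Majorana phases
F. Piquemal (CENBG) Nuppec Bordeaux, November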

7 Double Beta Decay Signature
- $2\nu$DBD: $(A,Z) \to (A,Z+2) + 2e^- + 2\bar\nu_e$, allowed by the SM
- $0\nu$DBD: $(A,Z) \to (A,Z+2) + 2e^-$, new physics beyond the SM
Signature: two electrons, each with a continuous spectrum, and a monochromatic sum energy.
[Plot, x axis: sum electron energy / Q. Two-neutrino Double Beta Decay: continuous spectrum. Neutrinoless Double Beta Decay: peak enlarged by the detector energy resolution.]

8 Double Beta Decay sources

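9 $0\nu$DBD and neutrino physics
How is $0\nu$DBD connected to the neutrino mixing matrix and masses? In the case of a dominant mass mechanism:
$1/\tau = G(Q,Z)\,|M_{nucl}|^2\,\langle M_{\beta\beta} \rangle^2$
- $1/\tau$: neutrinoless Double Beta Decay rate (what the experimentalists try to measure)
- $G(Q,Z)$: phase space
- $M_{nucl}$: nuclear matrix elements (what the nuclear theorists try to calculate)
- $\langle M_{\beta\beta} \rangle$: effective Majorana mass (parameter containing the physics)
$\langle M_{\beta\beta} \rangle = \left|\, |U_{e1}|^2 M_1 + e^{i\alpha_1} |U_{e2}|^2 M_2 + e^{i\alpha_2} |U_{e3}|^2 M_3 \,\right|$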
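The effective Majorana mass on this slide is a coherent sum, so the Majorana phases can partly cancel the contributions of the three mass states. A minimal Python sketch of that sum follows. The function name, the mixing weights and the masses are hypothetical illustration values, not numbers taken from the slides.

```python
import cmath
import math


def effective_majorana_mass(masses_ev, ue_sq, alpha1=0.0, alpha2=0.0):
    """Return | |U_e1|^2 M_1 + e^{i a1} |U_e2|^2 M_2 + e^{i a2} |U_e3|^2 M_3 | in eV."""
    phases = (1.0, cmath.exp(1j * alpha1), cmath.exp(1j * alpha2))
    total = sum(ph * u2 * m for ph, u2, m in zip(phases, ue_sq, masses_ev))
    return abs(total)


# Hypothetical inputs (assumptions for illustration only):
UE_SQ = (0.68, 0.30, 0.02)    # |U_ei|^2, chosen to sum to 1
MASSES = (0.10, 0.10, 0.10)   # degenerate masses in eV

aligned = effective_majorana_mass(MASSES, UE_SQ)
cancelling = effective_majorana_mass(MASSES, UE_SQ, alpha1=math.pi)

print(f"all phases zero : {aligned:.3f} eV")
print(f"alpha1 = pi     : {cancelling:.3f} eV")
```

With all phases at zero and degenerate masses the sum reduces to the common mass; a phase of pi on the second term subtracts its contribution instead.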

10 Experimental parameters
How are the experimental parameters connected to the Majorana mass sensitivity of an experiment?
Sensitivity F: lifetime corresponding to the minimum detectable number of events over background at a given confidence level.
- Background level $b \neq 0$: $F \propto (MT / b\,\Delta E)^{1/2}$
- Background level $b = 0$: $F \propto MT$
with b: specific background coefficient [counts/(keV kg y)], M: source mass, T: live time, $\Delta E$: energy resolution.
Importance of the nuclide choice (but large uncertainty due to nuclear physics):
sensitivity to $\langle M_{\beta\beta} \rangle \propto (F\,Q\,|M_{nucl}|^2)^{-1/2} \propto \frac{1}{Q^{1/2}\,|M_{nucl}|} \left( \frac{b\,\Delta E}{MT} \right)^{1/4}$
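The fourth-root dependence of the mass sensitivity on exposure is easy to underestimate. The sketch below evaluates the proportionalities of this slide in arbitrary units, assuming a fixed nuclide (Q and the nuclear matrix element held constant). The function names and the detector numbers are hypothetical.

```python
import math


def lifetime_sensitivity(mass_kg, live_time_y, bkg, delta_e_kev):
    """Relative lifetime sensitivity F in arbitrary units.

    bkg > 0 : F ~ sqrt(M T / (b dE))
    bkg == 0: F ~ M T
    The two regimes carry different constants, so only ratios taken
    within the same regime are meaningful.
    """
    exposure = mass_kg * live_time_y
    if bkg == 0:
        return exposure
    return math.sqrt(exposure / (bkg * delta_e_kev))


def mass_sensitivity(f_lifetime):
    """Relative sensitivity to the effective Majorana mass, ~ F^(-1/2).

    Smaller is better; Q and M_nucl are assumed fixed.
    """
    return 1.0 / math.sqrt(f_lifetime)


# Hypothetical detector: 10 kg, b = 0.1 counts/(keV kg y), dE = 4 keV.
base = lifetime_sensitivity(10.0, 1.0, 0.1, 4.0)    # 1 year of live time
long = lifetime_sensitivity(10.0, 16.0, 0.1, 4.0)   # 16 years of live time

gain_lifetime = long / base
gain_mass = mass_sensitivity(base) / mass_sensitivity(long)

print(f"16x exposure: lifetime limit improves by {gain_lifetime:.2f}x")
print(f"16x exposure: mass limit improves by     {gain_mass:.2f}x")
```

In the background-limited regime a sixteen-fold exposure buys a factor 4 in lifetime but only a factor 2 in mass, which is why background reduction and energy resolution matter as much as detector mass.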

11 Experimental techniques
Source = Detector
- Easy to approach the ton scale
- High energy resolution (<2%)
- No tracking capability
- Easy to reject the $2\nu$DBD background
Source $\neq$ Detector (source placed between detectors)
- Easy to get tracking capability
- Low energy resolution (>2%)
- Tracking / topology capability
- Easy to approach zero background (with the exception of the $2\nu$DBD component)

12 Heidelberg Moscow Exp and the $0\nu$DBD claim
December 2001: 4 authors (KDHK) of the HM collaboration claim the $0\nu$DBD of 76Ge.
- Source = Detector
- Well known Ge diode technology
- 5 Ge diodes with a total of 10.9 kg, enriched (86%) in 76Ge
- Location: underground Gran Sasso Laboratory (Italy)
- Detectors shielded with lead and flushed with N2
- Background reduction (factor 5) with Pulse Shape Analysis (PSA): identification of multi-site events (gamma background)
- Spectrum with 71.7 kg y

13 Heidelberg Moscow Exp and the $0\nu$DBD claim
Most probable value: 28.7 events in 71.7 kg y exposure
KKDC claim: $m_{ee}$ = ( ) eV (0.44 eV b.v.); $T^{0\nu}_{1/2}$ = ( ) y (( ) y b.v.) (99.9973% c.l., 4.2 σ)
H.V. Klapdor-Kleingrothaus et al., NIM A 522 (2004) 371
Skepticism of the scientific community: result not totally accepted (unrecognized peaks, dimension of the analyzed energy window).
References, labelled on the slide as comments and analysis of HD-M data, independent answers of the authors, and other articles:
- Aalseth C.E. et al., Mod. Phys. Lett. A 17 (2002) 1475
- Feruglio F. et al., Nucl. Phys. B 637 (2002) 345
- Zdesenko Yu.G. et al., Phys. Lett. B 546 (2002) 206
- Klapdor-Kleingrothaus H.V., hep-ph/
- Harney H.L., hep-ph/
- Klapdor-Kleingrothaus H.V. et al., NIM A 510 (2003) 281
- Klapdor-Kleingrothaus H.V. et al., NIM A 522 (2004) 371

14 Experiments

15 GERDA 2013

16 GERDA 2013

17 GERDA 2013

18 Our Minuit fit

19 and the ROOT code

20 A critical review of the analysis
[Figure annotations: «known contributions» from decays; number of contents at the maximum; SL; linear background]

21 Neutrinoless double beta decay
The authors were smart: they did not put error bars. How many Gaussians are there?
The authors were smart: they did not perform a hypothesis test. They just assumed that there was «something».

22 Hypothesis test in Physics
[Figure: densities $P(H_0)$ and $P(H_1)$ against the experimental value, with the areas $1-\alpha$ and $1-\beta$ (power) marked]

23 Hypothesis test
- Level of the test $\alpha$: to be established before the measurement.
- Observed significance level $\alpha_0$ (or SL): the p-value.
- Error of the first kind: the incorrect rejection of a true null hypothesis. The probability of this error is $\alpha$. In general $\alpha$ = 1% - 5%.
«The burden of proof for a major new discovery should be as high as the importance of the claim.»
Higgs boson discovery: $\alpha \sim 10^{-6}$
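Significances quoted in numbers of σ and confidence levels quoted in percent are two views of the same Gaussian tail probability. A small sketch of the conversion, using only the standard library, is given here. The function names are illustrative, and whether a one-sided or a two-sided convention applies has to be checked case by case.

```python
import math


def p_one_sided(n_sigma):
    """Upper-tail probability of a standard Gaussian beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def p_two_sided(n_sigma):
    """Probability of a standard Gaussian deviation larger than n_sigma in modulus."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n_sigma in (2.0, 3.0, 4.2, 5.0):
    cl = 100.0 * (1.0 - p_two_sided(n_sigma))
    print(f"{n_sigma:.1f} sigma: one-sided p = {p_one_sided(n_sigma):.3g}, "
          f"two-sided c.l. = {cl:.4f} %")
```

With the two-sided convention, 4.2 σ reproduces the 99.9973% c.l. quoted for the KKDC claim. A one-sided 5 σ tail is below $10^{-6}$, the order of magnitude quoted for the Higgs boson discovery.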

24 Neutrinoless double beta decay
Null hypothesis: histogram generated only from a pure background (according to the authors, the background has a linearly decreasing shape).
$\alpha$ = ?? (at least $10^{-3}$)
«Standard» best fit (LS): $\chi^2$ = 91 / 59 degrees of freedom: p-value
Correct remark of the authors: the LS method should not be used in this case.
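The p-value attached to $\chi^2$ = 91 with 59 degrees of freedom did not survive in this transcript, but it can be recomputed from the $\chi^2$ tail probability. The sketch below uses the textbook finite-series form of the $\chi^2$ survival function for integer degrees of freedom, so that no external library is needed. The function name is an assumption of this example.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(chi2 > x) for an integer number of dof >= 1."""
    if x <= 0:
        return 1.0
    half = 0.5 * x
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, dof // 2):
            term *= half / j
            total += term
        return total
    # Odd dof: Gaussian tail plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    total = math.erfc(chi / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * chi
    for r in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


p_ls = chi2_sf(91.0, 59)
print(f"chi2 = 91, 59 dof: p-value = {p_ls:.2g}")
```

The result is at the level of a few per mille, to be compared with the level $\alpha$ fixed before the measurement.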

25 Likelihood function
X: a single random variable; $x = (x_1, x_2, \ldots, x_n)$ are the values observed in n independent measurements.
$p(x;\theta)$: probability density of X, dependent on a set of parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_p)$ to be experimentally determined.
$L(\theta; x) = p(x_1;\theta)\,p(x_2;\theta) \cdots p(x_n;\theta) = \prod_{i=1}^{n} p(x_i;\theta)$
Maximum likelihood method (ML) (R.A. Fisher, 1912): the maximum likelihood estimate $\hat\theta$ of $\theta$ is the global maximum point (if existing) of the likelihood function:
$\max_{\theta} L(\theta; x) = \max_{\theta} \prod_{i=1}^{n} p(x_i;\theta) = L(\hat\theta; x)$

26 Log-likelihood function
The mathematical properties of the likelihood function that are of statistical interest (maximum position) are the same as those of its logarithm.
$\mathcal{L} = -\ln L(\theta; x) = -\sum_{i=1}^{n} \ln p(x_i;\theta)$
The sign is inverted: a maximum of $L$ corresponds to a minimum of $\mathcal{L}$.
Likelihood equations:
$\frac{\partial \mathcal{L}}{\partial \theta_k} = -\sum_{i=1}^{n} \frac{1}{p(x_i;\theta)} \frac{\partial p(x_i;\theta)}{\partial \theta_k} = 0 \quad (k = 1, 2, \ldots, p)$
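As an illustration of the likelihood equations, consider an exponential density $p(x;\tau) = e^{-x/\tau}/\tau$, for which the equation $\partial\mathcal{L}/\partial\tau = 0$ has the closed-form solution $\hat\tau = \bar{x}$. The sketch below minimizes the negative log-likelihood numerically and compares it with the sample mean. The exponential model, the simulated data and the golden-section minimizer are choices made for this example only; they are not the authors' Minuit fit.

```python
import math
import random


def neg_log_likelihood(tau, data):
    """-ln L for p(x; tau) = exp(-x / tau) / tau."""
    return sum(math.log(tau) + x / tau for x in data)


def golden_section_min(func, lo, hi, tol=1e-9):
    """Minimum of a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if func(c) < func(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


random.seed(1)
TRUE_TAU = 2.0  # hypothetical true parameter used to simulate the data
sample = [random.expovariate(1.0 / TRUE_TAU) for _ in range(500)]

tau_hat = golden_section_min(lambda t: neg_log_likelihood(t, sample), 0.1, 10.0)
tau_closed_form = sum(sample) / len(sample)

print(f"numerical minimum of -ln L : {tau_hat:.5f}")
print(f"likelihood-equation result : {tau_closed_form:.5f}")
```

The two numbers agree to the precision of the minimizer, which is the content of the statement that the maximum of $L$ and the minimum of $\mathcal{L}$ coincide.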

27 Likelihood and histograms
Number of expected events per channel:
$\mu_i(\theta) = N \int_{\Delta_i} p(x;\theta)\,dx \approx N\,p(x_{0i};\theta)\,\Delta_i \equiv N p_i(\theta)$
«Event by event» likelihood: the random variable is the probability for a single event to fall inside the i-th histogram bin. This event has a probability $p_i(\theta)$ and happens $n_i$ times:
$\mathcal{L}_e = -\ln L(\theta; n) = -\ln \prod_{i=1}^{k} [p_i(\theta)]^{n_i} = -\sum_{i=1}^{k} n_i \ln p_i(\theta)$

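28 «Global likelihood» (big samples)
The random variable is the number of events $n_i$ inside the i-th histogram bin (multinomial probability).
$\mathcal{L}_g = -\ln \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi N p_i(\theta)}} \exp\left[ -\frac{1}{2} \frac{(n_i - N p_i(\theta))^2}{N p_i(\theta)} \right] = \sum_{i=1}^{k} \ln \sqrt{2\pi N p_i(\theta)} + \frac{1}{2} \sum_{i=1}^{k} \frac{(n_i - N p_i(\theta))^2}{N p_i(\theta)}$
$2\mathcal{L}_g \approx \sum_{i=1}^{k} \frac{(n_i - \mu_i(\theta))^2}{n_i}$
Only for «big» samples (large $n_i$): $n_i \approx N p_i$ is a Gaussian variable. The $\chi^2$ test can be performed after minimization.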
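The relation between the «event by event» likelihood and the two $\chi^2$-like sums (variance taken from the model, or from the observed bin content) can be made concrete on a toy histogram. The sketch below evaluates all three for a linearly decreasing background shape. The bin contents and probabilities are invented for illustration, and the function names are assumptions.

```python
import math


def nll_event_by_event(counts, probs):
    """L_e = -sum_i n_i ln p_i (empty bins contribute zero)."""
    return -sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


def chi2_model_variance(counts, probs):
    """sum_i (n_i - N p_i)^2 / (N p_i): variance taken from the model."""
    total = sum(counts)
    return sum((n - total * p) ** 2 / (total * p) for n, p in zip(counts, probs))


def chi2_data_variance(counts, probs):
    """sum_i (n_i - N p_i)^2 / n_i: variance taken from the data.

    Empty bins have to be skipped, one reason this form is only
    acceptable for well-populated bins.
    """
    total = sum(counts)
    return sum((n - total * p) ** 2 / n for n, p in zip(counts, probs) if n > 0)


# Hypothetical linearly decreasing background shape over 5 bins.
PROBS = [0.25, 0.225, 0.20, 0.175, 0.15]
big_sample = [104, 88, 83, 66, 59]   # N = 400, invented counts
small_sample = [4, 1, 3, 0, 2]       # N = 10, invented counts

for name, counts in (("big", big_sample), ("small", small_sample)):
    print(f"{name:5s} sample: L_e = {nll_event_by_event(counts, PROBS):8.3f}, "
          f"chi2(model var) = {chi2_model_variance(counts, PROBS):6.3f}, "
          f"chi2(data var) = {chi2_data_variance(counts, PROBS):6.3f}")
```

For the well-populated histogram the two $\chi^2$ sums are close; for the sparse one they diverge, which is the regime where the Gaussian approximation behind $\mathcal{L}_g$ is not justified.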

29 $\chi^2$ test cannot be performed after minimization.
$\mathcal{L}_e$ vs. $\mathcal{L}_g$: the LS approximation is not too bad.

30 Fisher method for multiple comparisons
Sometimes a possible effect can be present only within a part of the measured interval; this interval is split into different (disjoint) zones and a test is performed in each zone.
N = 20 hypotheses to test (disjoint intervals) and a significance level $\alpha$ = 5%. What is the probability of observing at least one «significant» result just due to pure chance?
$P\{\text{effect}\} = 1 - P\{\text{no effect}\} = 1 - (1-\alpha)^N = 0.64$
Bonferroni rule: $1 - (1-\alpha)^N \approx 1 - (1 - N\alpha) = N\alpha$
«Look Elsewhere effect»: new p-value $1 - (1 - \alpha/N)^N \approx \alpha$. $H_0$ is rejected if at least a single test gives $\alpha_0 < \alpha/N$.
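The numbers on this slide can be checked in a few lines. The sketch below computes the probability of at least one false «significant» result among N independent tests, before and after the Bonferroni rescaling of the per-test level. The function names are illustrative.

```python
def family_wise_error(alpha, n_tests):
    """Probability of at least one false rejection among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha, n_tests):
    """Per-test level that keeps the overall error probability close to alpha."""
    return alpha / n_tests


ALPHA = 0.05
N_TESTS = 20

uncorrected = family_wise_error(ALPHA, N_TESTS)
corrected = family_wise_error(bonferroni_level(ALPHA, N_TESTS), N_TESTS)

print(f"N = {N_TESTS}, alpha = {ALPHA}: P(at least one false effect) = {uncorrected:.2f}")
print(f"per-test level alpha/N = {bonferroni_level(ALPHA, N_TESTS):.4f}: "
      f"overall probability = {corrected:.4f}")
```

Testing each zone at $\alpha/N$ brings the overall probability back to just under $\alpha$, at the price of a much stricter requirement on each individual zone.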

31 Fisher method for multiple comparisons
$\alpha_0$ (as all variables obtained from cumulative distribution functions), when $H_0$ is true, is a uniform random variable U in [0,1].
The random variable $-2\ln(u)$ is distributed as $\chi^2(\nu = 2)$:
$P\{y = -2\ln u < x\} = P\{u > e^{-x/2}\} = 1 - e^{-x/2} = F(x)$
$p(x) = \frac{dF}{dx} = \frac{1}{2} e^{-x/2} \equiv \chi^2(2)$
In a multiple test over k disjoint intervals, with p-values $p_1, \ldots, p_k$, under $H_0$:
$F = -2 \ln \prod_{i=1}^{k} p_i = -2 \sum_{i=1}^{k} \ln p_i \sim \chi^2(\nu = 2k)$

32 $H_0$ = linear background, k = 12 intervals (5 bins/interval): F = 46 ($\nu$ = 24)
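The combined p-value that corresponds to F = 46 with 24 degrees of freedom follows from the $\chi^2$ tail probability, which has a closed finite-sum form when the number of degrees of freedom is even, as it always is in Fisher's method. A minimal sketch, with illustrative function names:

```python
import math


def chi2_sf_even(x, dof):
    """P(chi2 > x) for an even number of dof: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!"""
    if dof % 2 != 0 or dof <= 0:
        raise ValueError("dof must be a positive even integer")
    half = 0.5 * x
    term = math.exp(-half)
    total = term
    for j in range(1, dof // 2):
        term *= half / j
        total += term
    return total


def fisher_statistic(p_values):
    """F = -2 * sum_i ln p_i for independent p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)


def fisher_combined_p(p_values):
    """Combined p-value: tail of chi2 with 2k dof evaluated at F."""
    return chi2_sf_even(fisher_statistic(p_values), 2 * len(p_values))


# Values quoted on the slide: k = 12 intervals, F = 46.
p_combined = chi2_sf_even(46.0, 24)
print(f"F = 46, 24 dof: combined p-value = {p_combined:.2g}")
```

The combined p-value comes out at a few per mille. `fisher_combined_p` applies the same calculation starting from the individual interval p-values, which are not listed in this transcript.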

33 Bin content + 40%

34 [Figure annotations: «known contributions» from decays; number of contents at the maximum; SL; points]
20 parameters to be fitted (!): straight line + 6 Gaussians
Without giving strong constraints to ALL parameters it is impossible to reproduce the published results.

35 n i b Assuming background b to be well known n ( i - n i b %)
