1.2 DISTRIBUTIONS FOR CATEGORICAL DATA


present models for a categorical response with matched pairs; these apply, for instance, with a categorical response measured for the same subjects at two times. Chapter 11 covers models for more general types of repeated categorical data, such as longitudinal data from several times with explanatory variables. In Chapter 12 we present a broad class of models, generalized linear mixed models, that use random effects to account for dependence with such data. In Chapter 13 further extensions and applications of the models from Chapters 10 through 12 are described.

The fourth and final unit is more theoretical. In Chapter 14 we develop asymptotic theory for categorical data models. This theory is the basis for large-sample behavior of model parameter estimators and goodness-of-fit statistics. Maximum likelihood estimation receives primary attention here and throughout the book, but Chapter 15 covers alternative methods of estimation, such as the Bayesian paradigm. Chapter 16 stands alone from the others, being a historical overview of the development of categorical data methods.

Most categorical data methods require extensive computations, and statistical software is necessary for their effective use. In Appendix A we discuss software that can perform the analyses in this book and show the use of SAS for text examples. See the Web site aa/cda/cda.html to download sample programs and data sets and find information about other software.

Chapter 1 provides background material. In Section 1.2 we review the key distributions for categorical data: the binomial, multinomial, and Poisson. In Section 1.3 we review the primary mechanisms for statistical inference, using maximum likelihood. In Sections 1.4 and 1.5 we illustrate these by presenting significance tests and confidence intervals for binomial and multinomial parameters.

1.2 DISTRIBUTIONS FOR CATEGORICAL DATA

Inferential data analyses require assumptions about the random mechanism that generated the data.
For regression models with continuous responses, the normal distribution plays the central role. In this section we review the three key distributions for categorical responses: binomial, multinomial, and Poisson.

1.2.1 Binomial Distribution

Many applications refer to a fixed number $n$ of binary observations. Let $y_1, y_2, \ldots, y_n$ denote responses for $n$ independent and identical trials such that $P(Y_i = 1) = \pi$ and $P(Y_i = 0) = 1 - \pi$. We use the generic labels "success" and "failure" for outcomes 1 and 0. Identical trials means that the probability of success $\pi$ is the same for each trial. Independent trials means that the $\{Y_i\}$ are independent random variables. These are often called Bernoulli trials. The total number of successes, $Y = \sum_{i=1}^{n} Y_i$, has the binomial distribution with index $n$ and parameter $\pi$, denoted by $\text{bin}(n, \pi)$.

The probability mass function for the possible outcomes $y$ for $Y$ is

$$p(y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n - y}, \qquad y = 0, 1, 2, \ldots, n, \tag{1.1}$$

where the binomial coefficient $\binom{n}{y} = n!/[y!\,(n - y)!]$. Since $E(Y_i) = E(Y_i^2) = 1 \times \pi + 0 \times (1 - \pi) = \pi$,

$$E(Y_i) = \pi \quad \text{and} \quad \operatorname{var}(Y_i) = \pi(1 - \pi).$$

The binomial distribution for $Y = \sum_i Y_i$ has mean and variance

$$\mu = E(Y) = n\pi \quad \text{and} \quad \sigma^2 = \operatorname{var}(Y) = n\pi(1 - \pi).$$

The skewness is described by $E(Y - \mu)^3/\sigma^3 = (1 - 2\pi)/\sqrt{n\pi(1 - \pi)}$. The distribution converges to normality as $n$ increases, for fixed $\pi$.

There is no guarantee that successive binary observations are independent or identical. Thus, occasionally, we will utilize other distributions. One such case is sampling binary outcomes without replacement from a finite population, such as observations on gender for 10 students sampled from a class of size 20. The hypergeometric distribution, studied in Section 3.5.1, is then relevant. In Section 1.2.4 we mention another case that violates these binomial assumptions.

1.2.2 Multinomial Distribution

Some trials have more than two possible outcomes. Suppose that each of $n$ independent, identical trials can have outcome in any of $c$ categories. Let $y_{ij} = 1$ if trial $i$ has outcome in category $j$ and $y_{ij} = 0$ otherwise. Then $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$; for instance, $(0, 0, 1, 0)$ denotes outcome in category 3 of four possible categories. Note that $y_{ic}$ is redundant, being linearly dependent on the others. Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category $j$. The counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.

Let $\pi_j = P(Y_{ij} = 1)$ denote the probability of outcome in category $j$ for each trial. The multinomial probability mass function is

$$p(n_1, n_2, \ldots, n_{c-1}) = \frac{n!}{n_1!\, n_2! \cdots n_c!}\, \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}. \tag{1.2}$$
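As a numerical check of (1.1) and the moment formulas above, the following minimal Python sketch evaluates the binomial pmf with the standard-library `math.comb` and recovers the mean, variance, and skewness by direct summation over $y = 0, \ldots, n$. It also confirms that the multinomial pmf (1.2) reduces to the binomial when $c = 2$. The values $n = 10$, $\pi = 0.3$ are arbitrary illustrations, not taken from the text.

```python
from math import comb, factorial, prod

def binom_pmf(y: int, n: int, pi: float) -> float:
    """Binomial probability (1.1): C(n, y) * pi^y * (1 - pi)^(n - y)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

def multinom_pmf(counts: list[int], probs: list[float]) -> float:
    """Multinomial probability (1.2) for category counts summing to n."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(k) for k in counts)
    return coef * prod(p**k for p, k in zip(probs, counts))

def binom_moments(n: int, pi: float) -> tuple[float, float, float]:
    """Mean, variance, and skewness of bin(n, pi), by summing over y = 0..n."""
    pmf = [binom_pmf(y, n, pi) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    skew = sum((y - mean) ** 3 * p for y, p in enumerate(pmf)) / var**1.5
    return mean, var, skew

n, pi = 10, 0.3  # illustrative values only
mean, var, skew = binom_moments(n, pi)
# Each pair should agree up to floating-point rounding:
print(mean, n * pi)
print(var, n * pi * (1 - pi))
print(skew, (1 - 2 * pi) / (n * pi * (1 - pi)) ** 0.5)
# With c = 2 categories the multinomial reduces to the binomial:
print(multinom_pmf([3, 7], [0.3, 0.7]), binom_pmf(3, 10, 0.3))
```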

Since $\sum_j n_j = n$, this is $(c - 1)$-dimensional, with $n_c = n - (n_1 + \cdots + n_{c-1})$. The binomial distribution is the special case with $c = 2$.

For the multinomial distribution,

$$E(n_j) = n\pi_j, \qquad \operatorname{var}(n_j) = n\pi_j(1 - \pi_j), \qquad \operatorname{cov}(n_j, n_k) = -n\pi_j\pi_k. \tag{1.3}$$

We derive the covariance later in the book. The marginal distribution of each $n_j$ is binomial.

1.2.3 Poisson Distribution

Sometimes, count data do not result from a fixed number of trials. For instance, if $y$ = number of deaths due to automobile accidents on motorways in Italy during this coming week, there is no fixed upper limit $n$ for $y$ (as you are aware if you have driven in Italy). Since $y$ must be a nonnegative integer, its distribution should place its mass on that range. The simplest such distribution is the Poisson. Its probabilities depend on a single parameter, the mean $\mu$. The Poisson probability mass function (Poisson 1837, p. 206) is

$$p(y) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots. \tag{1.4}$$

It satisfies $E(Y) = \operatorname{var}(Y) = \mu$. It is unimodal with mode equal to the integer part of $\mu$. Its skewness is described by $E(Y - \mu)^3/\sigma^3 = 1/\sqrt{\mu}$. The distribution approaches normality as $\mu$ increases.

The Poisson distribution is used for counts of events that occur randomly over time or space, when outcomes in disjoint periods or regions are independent. It also applies as an approximation for the binomial when $n$ is large and $\pi$ is small, with $\mu = n\pi$. So if each of the 50 million people driving in Italy next week is an independent trial with probability 0.000002 of dying in a fatal accident that week, the number of deaths $Y$ is a $\text{bin}(50{,}000{,}000,\ 0.000002)$ variate, or approximately Poisson with $\mu = n\pi = 50{,}000{,}000(0.000002) = 100$.

A key feature of the Poisson distribution is that its variance equals its mean. Sample counts vary more when their mean is higher. When the mean number of weekly fatal accidents equals 100, greater variability occurs in the weekly counts than when the mean is smaller.

1.2.4 Overdispersion

In practice, count observations often exhibit variability exceeding that predicted by the binomial or Poisson. This phenomenon is called overdispersion. We assumed above that each person has the same probability of dying in a fatal accident in the next week. More realistically, these probabilities vary, due to factors such as amount of time spent driving, whether the person wears a seat belt, and geographical location. Such variation causes fatality counts to display more variation than predicted by the Poisson model.

Suppose that $Y$ is a random variable with variance $\operatorname{var}(Y \mid \mu)$ for given $\mu$, but $\mu$ itself varies because of unmeasured factors such as those just described. Let $\theta = E(\mu)$. Then unconditionally,

$$E(Y) = E[E(Y \mid \mu)], \qquad \operatorname{var}(Y) = E[\operatorname{var}(Y \mid \mu)] + \operatorname{var}[E(Y \mid \mu)].$$

When $Y$ is conditionally Poisson (given $\mu$), for instance, then $E(Y) = E(\mu) = \theta$ and $\operatorname{var}(Y) = E(\mu) + \operatorname{var}(\mu) = \theta + \operatorname{var}(\mu)$, which exceeds the mean $\theta$ whenever $\mu$ is not constant.

Assuming a Poisson distribution for a count variable is often too simplistic, because of factors that cause overdispersion. The negative binomial is a related distribution for count data that permits the variance to exceed the mean. We introduce it later in the book.

Analyses assuming binomial (or multinomial) distributions are also sometimes invalid because of overdispersion. This might happen because the true distribution is a mixture of different binomial distributions, with the parameter varying because of unmeasured variables. To illustrate, suppose that an experiment exposes pregnant mice to a toxin and then after a week observes the number of fetuses in each mouse's litter that show signs of malformation. Let $n_i$ denote the number of fetuses in the litter for mouse $i$. The mice also vary according to other factors that may not be measured, such as their weight, overall health, and genetic makeup. Extra variation then occurs because of the variability from litter to litter in the probability $\pi$ of malformation. The distribution of the number of fetuses per litter showing malformations might cluster near 0 and near $n_i$, showing more dispersion than expected for binomial sampling with a single value of $\pi$.
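The law of total variance above can be checked for a Poisson mixture. The sketch below assumes, purely for illustration, that the unobserved mean $\mu$ is 5 or 15 with probability 1/2 each, so $\theta = 10$ and $\operatorname{var}(\mu) = 25$. Summing the mixture pmf should then give $E(Y) = 10$ and $\operatorname{var}(Y) = 35$, well above the value 10 that a single Poisson with the same mean would have.

```python
from math import exp, lgamma, log

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability (1.4), computed on the log scale for stability."""
    return exp(-mu + y * log(mu) - lgamma(y + 1))

# Hypothetical two-point distribution for the unobserved mean mu:
# mu = 5 or mu = 15, each with probability 1/2.
mixture = [(5.0, 0.5), (15.0, 0.5)]
support = range(200)  # P(Y >= 200) is negligible for these means

# Unconditional pmf of Y: average the conditional Poisson pmfs over mu.
pmf = [sum(w * poisson_pmf(y, mu) for mu, w in mixture) for y in support]
mean = sum(y * p for y, p in zip(support, pmf))
var = sum((y - mean) ** 2 * p for y, p in zip(support, pmf))

theta = sum(w * mu for mu, w in mixture)                  # E(mu)
var_mu = sum(w * (mu - theta) ** 2 for mu, w in mixture)  # var(mu)
print(mean, theta)          # E(Y) should equal theta
print(var, theta + var_mu)  # var(Y) should equal theta + var(mu), not theta
```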
Overdispersion could also occur when $\pi$ varies among fetuses in a litter according to some distribution (see the chapter Problems). In Chapters 4, 12, and 13 we introduce methods for data that are overdispersed relative to binomial and Poisson assumptions.

1.2.5 Connection between Poisson and Multinomial Distributions

In Italy this next week, let $y_1$ = number of people who die in automobile accidents, $y_2$ = number who die in airplane accidents, and $y_3$ = number who die in railway accidents. A Poisson model for $(Y_1, Y_2, Y_3)$ treats these as independent Poisson random variables, with parameters $(\mu_1, \mu_2, \mu_3)$. The joint probability mass function for $\{Y_i\}$ is the product of the three mass functions of form (1.4). The total $n = \sum_i Y_i$ also has a Poisson distribution, with parameter $\sum_i \mu_i$.

With Poisson sampling the total count $n$ is random rather than fixed. If we assume a Poisson model but condition on $n$, $\{Y_i\}$ no longer have Poisson distributions, since each $Y_i$ cannot exceed $n$. Given $n$, $\{Y_i\}$ are also no longer independent, since the value of one affects the possible range for the others.
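The claim that a sum of independent Poisson variables is again Poisson, with parameter $\sum_i \mu_i$, can be checked numerically by convolving the three pmfs. The means below are hypothetical, chosen only to keep the computation small. The truncation point `K` is also an assumption of the sketch; because every pair of terms contributing to a total below `K` is retained, those convolved probabilities are exact apart from rounding.

```python
from math import exp, lgamma, log

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability (1.4), evaluated on the log scale."""
    return exp(-mu + y * log(mu) - lgamma(y + 1))

def convolve(p: list[float], q: list[float]) -> list[float]:
    """Pmf of X1 + X2 for independent nonnegative integer X1 ~ p, X2 ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

mus = [2.0, 0.5, 1.5]  # hypothetical means for the three accident types
K = 60                 # truncation point for each pmf
pmfs = [[poisson_pmf(y, mu) for y in range(K)] for mu in mus]

total = pmfs[0]
for q in pmfs[1:]:
    total = convolve(total, q)

# Compare with a single Poisson pmf whose mean is sum(mus).
max_err = max(abs(total[y] - poisson_pmf(y, sum(mus))) for y in range(K))
print(max_err)  # only floating-point rounding should remain
```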

For $c$ independent Poisson variates, with $E(Y_i) = \mu_i$, let's derive their conditional distribution given that $\sum_i Y_i = n$. The conditional probability of a set of counts $\{n_i\}$ satisfying this condition is

$$
\begin{aligned}
P\Big(Y_1 = n_1, \ldots, Y_c = n_c \,\Big|\, \sum_j Y_j = n\Big)
&= \frac{P(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c)}{P\big(\sum_j Y_j = n\big)}
= \frac{\prod_i \big[\exp(-\mu_i)\,\mu_i^{n_i}/n_i!\big]}{\exp\big(-\sum_j \mu_j\big)\big(\sum_j \mu_j\big)^{n}/n!} \\
&= \frac{n!}{\prod_i n_i!} \prod_i \pi_i^{n_i},
\end{aligned}
\tag{1.5}
$$

where $\{\pi_i = \mu_i/(\sum_j \mu_j)\}$. This is the multinomial $(n, \{\pi_i\})$ distribution, characterized by the sample size $n$ and the probabilities $\{\pi_i\}$.

Many categorical data analyses assume a multinomial distribution. Such analyses usually have the same parameter estimates as those of analyses assuming a Poisson distribution, because of the similarity in the likelihood functions.

1.3 STATISTICAL INFERENCE FOR CATEGORICAL DATA

The choice of distribution for the response variable is but one step of data analysis. In practice, that distribution has unknown parameter values. In this section we review methods of using sample data to make inferences about the parameters. Sections 1.4 and 1.5 cover binomial and multinomial parameters.

1.3.1 Likelihood Functions and Maximum Likelihood Estimation

In this book we use maximum likelihood for parameter estimation. Under weak regularity conditions, such as the parameter space having fixed dimension with true value falling in its interior, maximum likelihood estimators have desirable properties: They have large-sample normal distributions; they are asymptotically consistent, converging to the parameter as $n$ increases; and they are asymptotically efficient, producing large-sample standard errors no greater than those from other estimation methods.

Given the data, for a chosen probability distribution the likelihood function is the probability of those data, treated as a function of the unknown parameter. The maximum likelihood (ML) estimate is the parameter value that maximizes this function. This is the parameter value under which the data observed have the highest probability of occurrence. The parameter value that maximizes the likelihood function also maximizes the log of that function. It is simpler to maximize the log likelihood since it is a sum rather than a product of terms.
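Result (1.5) can be verified by brute force: for every count vector with the required total, divide the product of Poisson probabilities by the Poisson probability of the total, and compare with the multinomial pmf using $\pi_i = \mu_i/\sum_j \mu_j$. The means and the conditioning total $n = 6$ in this sketch are hypothetical.

```python
from itertools import product
from math import exp, factorial, lgamma, log, prod

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability (1.4)."""
    return exp(-mu + y * log(mu) - lgamma(y + 1))

def multinom_pmf(counts, probs) -> float:
    """Multinomial probability (1.2)."""
    coef = factorial(sum(counts)) // prod(factorial(k) for k in counts)
    return coef * prod(p**k for p, k in zip(probs, counts))

mus = [3.0, 1.0, 2.0]  # hypothetical Poisson means
n = 6                  # the conditioning total
pis = [mu / sum(mus) for mu in mus]

max_err = 0.0
for counts in product(range(n + 1), repeat=len(mus)):
    if sum(counts) != n:
        continue
    joint = prod(poisson_pmf(k, mu) for k, mu in zip(counts, mus))
    conditional = joint / poisson_pmf(n, sum(mus))  # P(counts | total = n)
    max_err = max(max_err, abs(conditional - multinom_pmf(counts, pis)))
print(max_err)  # only floating-point rounding should remain
```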

We denote a parameter for a generic problem by $\beta$ and its ML estimate by $\hat\beta$. The likelihood function is $\ell(\beta)$ and the log-likelihood function is $L(\beta) = \log[\ell(\beta)]$. For many models, $L(\beta)$ has concave shape and $\hat\beta$ is the point at which the derivative equals 0. The ML estimate is then the solution of the likelihood equation, $\partial L(\beta)/\partial\beta = 0$. Often, $\beta$ is multidimensional, denoted by $\boldsymbol\beta$, and $\hat{\boldsymbol\beta}$ is the solution of a set of likelihood equations.

Let SE denote the standard error of $\hat\beta$, and let $\operatorname{cov}(\hat{\boldsymbol\beta})$ denote the asymptotic covariance matrix of $\hat{\boldsymbol\beta}$. Under regularity conditions (Rao 1973, p. 364), $\operatorname{cov}(\hat{\boldsymbol\beta})$ is the inverse of the information matrix. The $(j, k)$ element of the information matrix is

$$-E\left(\frac{\partial^2 L(\boldsymbol\beta)}{\partial\beta_j\,\partial\beta_k}\right). \tag{1.6}$$

The standard errors are the square roots of the diagonal elements for the inverse information matrix. The greater the curvature of the log likelihood, the smaller the standard errors. This is reasonable, since large curvature implies that the log likelihood drops quickly as $\beta$ moves away from $\hat\beta$; hence, the data would have been much more likely to occur if $\beta$ took a value near $\hat\beta$ rather than a value far from $\hat\beta$.

1.3.2 Likelihood Function and ML Estimate for Binomial Parameter

The part of a likelihood function involving the parameters is called the kernel. Since the maximization of the likelihood is with respect to the parameters, the rest is irrelevant.

To illustrate, consider the binomial distribution (1.1). The binomial coefficient $\binom{n}{y}$ has no influence on where the maximum occurs with respect to $\pi$. Thus, we ignore it and treat the kernel as the likelihood function. The binomial log likelihood is then

$$L(\pi) = \log\big[\pi^{y}(1 - \pi)^{n - y}\big] = y\log(\pi) + (n - y)\log(1 - \pi). \tag{1.7}$$

Differentiating with respect to $\pi$ yields

$$\frac{\partial L(\pi)}{\partial\pi} = \frac{y}{\pi} - \frac{n - y}{1 - \pi} = \frac{y - n\pi}{\pi(1 - \pi)}. \tag{1.8}$$

Equating this to 0 gives the likelihood equation, which has solution $\hat\pi = y/n$, the sample proportion of successes for the $n$ trials.

Calculating $\partial^2 L(\pi)/\partial\pi^2$, taking the expectation (using $E(y) = n\pi$), and combining terms, we get

$$-E\left[\frac{\partial^2 L(\pi)}{\partial\pi^2}\right] = E\left[\frac{y}{\pi^2} + \frac{n - y}{(1 - \pi)^2}\right] = \frac{n}{\pi(1 - \pi)}. \tag{1.9}$$
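To illustrate Section 1.3.2 numerically, this sketch solves the binomial likelihood equation (1.8) by bisection instead of algebra, then takes the standard error from the information (1.9). The data $y = 7$, $n = 20$ are hypothetical. The point is only that the numerical root matches $\hat\pi = y/n$ and the SE matches $\sqrt{\hat\pi(1 - \hat\pi)/n}$.

```python
from math import log

def binom_loglik(pi: float, y: int, n: int) -> float:
    """Kernel log likelihood (1.7); the binomial coefficient is omitted."""
    return y * log(pi) + (n - y) * log(1 - pi)

def binom_score(pi: float, y: int, n: int) -> float:
    """Score function (1.8): dL/dpi = (y - n*pi) / [pi (1 - pi)]."""
    return (y - n * pi) / (pi * (1 - pi))

def binom_info(pi: float, n: int) -> float:
    """Expected information (1.9): n / [pi (1 - pi)]."""
    return n / (pi * (1 - pi))

def solve_likelihood_equation(y: int, n: int, tol: float = 1e-12) -> float:
    """Root of the score by bisection; assumes 0 < y < n so the root is interior."""
    lo, hi = tol, 1 - tol
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_score(mid, y, n) > 0:  # L still increasing: root lies to the right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y, n = 7, 20  # hypothetical data
pi_hat = solve_likelihood_equation(y, n)
se = binom_info(pi_hat, n) ** -0.5  # square root of the inverse information
print(pi_hat, y / n)
print(se, (pi_hat * (1 - pi_hat) / n) ** 0.5)
```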

Thus, the asymptotic variance of $\hat\pi$ is $\pi(1 - \pi)/n$. This is no surprise. Since $E(Y) = n\pi$ and $\operatorname{var}(Y) = n\pi(1 - \pi)$, the distribution of $\hat\pi = Y/n$ has mean and standard error

$$E(\hat\pi) = \pi, \qquad \sigma(\hat\pi) = \sqrt{\frac{\pi(1 - \pi)}{n}}.$$

1.3.3 Wald–Likelihood Ratio–Score Test Triad

Three standard ways exist to use the likelihood function to perform large-sample inference. We introduce these for a significance test of a null hypothesis $H_0\colon \beta = \beta_0$ and then discuss their relation to interval estimation. They all exploit the large-sample normality of ML estimators.

With nonnull standard error SE of $\hat\beta$, the test statistic

$$z = (\hat\beta - \beta_0)/\mathrm{SE}$$

has an approximate standard normal distribution when $\beta = \beta_0$. One refers $z$ to the standard normal table to obtain one- or two-sided P-values. Equivalently, for the two-sided alternative, $z^2$ has a chi-squared null distribution with 1 degree of freedom (df); the P-value is then the right-tailed chi-squared probability above the observed value. This type of statistic, using the nonnull standard error, is called a Wald statistic (Wald).

The multivariate extension for the Wald test of $H_0\colon \boldsymbol\beta = \boldsymbol\beta_0$ has test statistic

$$W = (\hat{\boldsymbol\beta} - \boldsymbol\beta_0)'\,[\operatorname{cov}(\hat{\boldsymbol\beta})]^{-1}\,(\hat{\boldsymbol\beta} - \boldsymbol\beta_0).$$

(The prime on a vector or matrix denotes the transpose.) The nonnull covariance is based on the curvature (1.6) of the log likelihood at $\hat{\boldsymbol\beta}$. The asymptotic multivariate normal distribution for $\hat{\boldsymbol\beta}$ implies an asymptotic chi-squared distribution for $W$. The df equal the rank of $\operatorname{cov}(\hat{\boldsymbol\beta})$, which is the number of nonredundant parameters in $\boldsymbol\beta$.

A second general-purpose method uses the likelihood function through the ratio of two maximizations: (1) the maximum over the possible parameter values under $H_0$, and (2) the maximum over the larger set of parameter values permitting $H_0$ or an alternative $H_a$ to be true. Let $\ell_0$ denote the maximized value of the likelihood function under $H_0$, and let $\ell_1$ denote the maximized value generally (i.e., under $H_0 \cup H_a$). For instance, for parameter vector $\boldsymbol\beta = (\beta_0, \beta_1)'$ and $H_0\colon \beta_0 = 0$, $\ell_1$ is the likelihood function calculated at the $\boldsymbol\beta$ value for which the data would have been most likely; $\ell_0$ is the likelihood function calculated at the $\beta_1$ value for which the data would have been most likely, when $\beta_0 = 0$. Then $\ell_1$ is always at least as large as $\ell_0$, since $\ell_0$ results from maximizing over a restricted set of the parameter values.
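A minimal sketch of the multivariate Wald statistic $W$ for a two-parameter vector, written without external libraries. The estimates and covariance matrix are hypothetical numbers chosen for illustration. With df = 2 (a full-rank $2 \times 2$ covariance), the chi-squared right-tail probability has the closed form $e^{-W/2}$, which the sketch uses for the P-value.

```python
from math import exp

def wald_statistic_2d(beta_hat, beta0, cov):
    """W = (b - b0)' [cov]^{-1} (b - b0) for a two-parameter vector."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # inverse of a 2x2 matrix
    diff = (beta_hat[0] - beta0[0], beta_hat[1] - beta0[1])
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))

beta_hat = (0.8, -0.3)              # hypothetical ML estimates
cov = ((0.04, 0.01), (0.01, 0.09))  # hypothetical estimated covariance matrix
W = wald_statistic_2d(beta_hat, (0.0, 0.0), cov)
p_value = exp(-W / 2)  # chi-squared right tail with df = 2
print(round(W, 3), p_value)
```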

The ratio $\Lambda = \ell_0/\ell_1$ of the maximized likelihoods cannot exceed 1. Wilks (1935) showed that $-2\log\Lambda$ has a limiting null chi-squared distribution, as $n \to \infty$. The df equal the difference in the dimensions of the parameter spaces under $H_0 \cup H_a$ and under $H_0$. The likelihood-ratio test statistic equals

$$-2\log\Lambda = -2\log(\ell_0/\ell_1) = -2(L_0 - L_1),$$

where $L_0$ and $L_1$ denote the maximized log-likelihood functions.

The third method uses the score statistic, due to R. A. Fisher and C. R. Rao. The score test is based on the slope and expected curvature of the log-likelihood function $L(\beta)$ at the null value $\beta_0$. It utilizes the size of the score function

$$u(\beta) = \partial L(\beta)/\partial\beta,$$

evaluated at $\beta_0$. The value $u(\beta_0)$ tends to be larger in absolute value when $\hat\beta$ is farther from $\beta_0$. Denote $-E[\partial^2 L(\beta)/\partial\beta^2]$ (i.e., the information) evaluated at $\beta_0$ by $\iota(\beta_0)$. The score statistic is the ratio of $u(\beta_0)$ to its null SE, which is $[\iota(\beta_0)]^{1/2}$. This has an approximate standard normal null distribution. The chi-squared form of the score statistic is

$$\frac{[u(\beta_0)]^2}{\iota(\beta_0)} = \frac{[\partial L(\beta)/\partial\beta_0]^2}{-E[\partial^2 L(\beta)/\partial\beta_0^2]},$$

where the partial derivative notation reflects derivatives with respect to $\beta$ that are evaluated at $\beta_0$. In the multiparameter case, the score statistic is a quadratic form based on the vector of partial derivatives of the log likelihood with respect to $\boldsymbol\beta$ and the inverse information matrix, both evaluated at the $H_0$ estimates (i.e., assuming that $\boldsymbol\beta = \boldsymbol\beta_0$).

Figure 1.1 is a generic plot of a log-likelihood $L(\beta)$ for the univariate case. It illustrates the three tests of $H_0\colon \beta = 0$. The Wald test uses the behavior of $L(\beta)$ at the ML estimate $\hat\beta$, having chi-squared form $(\hat\beta/\mathrm{SE})^2$. The SE of $\hat\beta$ depends on the curvature of $L(\beta)$ at $\hat\beta$. The score test is based on the slope and curvature of $L(\beta)$ at $\beta = 0$. The likelihood-ratio test combines information about $L(\beta)$ at both $\hat\beta$ and $\beta_0 = 0$. It compares the log-likelihood values $L_1$ at $\hat\beta$ and $L_0$ at $\beta_0 = 0$ using the chi-squared statistic $-2(L_0 - L_1)$. In Figure 1.1, this statistic is twice the vertical distance between values of $L(\beta)$ at $\hat\beta$ and at 0. In a sense, this statistic uses the most information of the three types of test statistic and is the most versatile.

As $n \to \infty$, the Wald, likelihood-ratio, and score tests have certain asymptotic equivalences (Cox and Hinkley 1974). For small to moderate sample sizes, the likelihood-ratio test is usually more reliable than the Wald test.
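The three statistics can be compared on a one-parameter example. This sketch uses the Poisson distribution of Section 1.2.3 rather than the binomial (which Section 1.4 treats): for $n$ independent Poisson counts with total $s$, the kernel log likelihood is $L(\mu) = s\log\mu - n\mu$, so $\hat\mu = s/n$, $u(\mu) = s/\mu - n$, and $\iota(\mu) = n/\mu$. The data ($s = 25$ from $n = 10$ counts) and $H_0\colon \mu = 1.5$ are hypothetical. Chi-squared (df = 1) right-tail probabilities use the identity $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
from math import erfc, log, sqrt

def chi2_sf_df1(x: float) -> float:
    """P(chi2_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2))

def poisson_triad(s: int, n: int, mu0: float) -> dict:
    """Chi-squared forms of the Wald, score, and LR statistics for H0: mu = mu0,
    from n independent Poisson counts with total s > 0."""
    def loglik(mu: float) -> float:
        return s * log(mu) - n * mu  # kernel of the Poisson log likelihood

    def info(mu: float) -> float:
        return n / mu                # -E[d^2 L / d mu^2]

    mu_hat = s / n                   # ML estimate
    u0 = s / mu0 - n                 # score u(mu0) = dL/dmu at mu0
    return {
        "wald": (mu_hat - mu0) ** 2 * info(mu_hat),  # ((mu_hat - mu0) / SE)^2
        "score": u0**2 / info(mu0),                 # u(mu0)^2 / iota(mu0)
        "lr": 2 * (loglik(mu_hat) - loglik(mu0)),   # -2 (L0 - L1)
    }

stats = poisson_triad(s=25, n=10, mu0=1.5)  # hypothetical data
for name, x in stats.items():
    print(f"{name:5s}  chi2 = {x:.3f}  P = {chi2_sf_df1(x):.4f}")
```

The three statistics differ noticeably here because the sample is small; their asymptotic equivalence only appears as $n$ grows.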

FIGURE 1.1 Log-likelihood function and information used in three tests of $H_0\colon \beta = 0$.

1.3.4 Constructing Confidence Intervals

In practice, it is more informative to construct confidence intervals for parameters than to test hypotheses about their values. For any of the three test methods, a confidence interval results from inverting the test. For instance, a 95% confidence interval for $\beta$ is the set of $\beta_0$ for which the test of $H_0\colon \beta = \beta_0$ has a P-value exceeding 0.05.

Let $z_a$ denote the z-score from the standard normal distribution having right-tailed probability $a$; this is the $100(1 - a)$ percentile of that distribution. Let $\chi^2_{df}(a)$ denote the $100(1 - a)$ percentile of the chi-squared distribution with degrees of freedom df. $100(1 - \alpha)\%$ confidence intervals based on asymptotic normality use $z_{\alpha/2}$, for instance $z_{0.025} = 1.96$ for 95% confidence. The Wald confidence interval is the set of $\beta_0$ for which $|\hat\beta - \beta_0|/\mathrm{SE} < z_{\alpha/2}$. This gives the interval $\hat\beta \pm z_{\alpha/2}(\mathrm{SE})$. The likelihood-ratio-based confidence interval is the set of $\beta_0$ for which $-2[L(\beta_0) - L(\hat\beta)] < \chi^2_1(\alpha)$. [Recall that $\chi^2_1(\alpha) = z^2_{\alpha/2}$.]

When $\hat\beta$ has a normal distribution, the log-likelihood function has a parabolic shape (i.e., a second-degree polynomial). For small samples with categorical data, $\hat\beta$ may be far from normality and the log-likelihood function can be far from a symmetric, parabolic-shaped curve. This can also happen with moderate to large samples when a model contains many parameters. In such cases, inference based on asymptotic normality of $\hat\beta$ may have inadequate performance. A marked divergence in results of Wald and likelihood-ratio inference indicates that the distribution of $\hat\beta$ may not be close to normality. The vegetarian example in Section 1.4 illustrates this with quite different confidence intervals for different methods. In many such cases, inference can instead utilize an exact small-sample distribution or higher-order asymptotic methods that improve on simple normality (e.g., Pierce and Peters).

The Wald confidence interval is most common in practice because it is simple to construct using ML estimates and standard errors reported by statistical software. The likelihood-ratio-based interval is becoming more widely available in software and is preferable for categorical data with small to moderate $n$. For the best known statistical model, regression for a normal response, the three types of inference necessarily provide identical results.

1.4 STATISTICAL INFERENCE FOR BINOMIAL PARAMETERS

In this section we illustrate inference methods for categorical data by presenting tests and confidence intervals for the binomial parameter $\pi$, based on $y$ successes in $n$ independent trials. In Section 1.3.2 we obtained the likelihood function and ML estimator $\hat\pi = y/n$ of $\pi$.

1.4.1 Tests about a Binomial Parameter

Consider $H_0\colon \pi = \pi_0$. Since $H_0$ has a single parameter, we use the normal rather than chi-squared forms of Wald and score test statistics. They permit tests against one-sided as well as two-sided alternatives. The Wald statistic is

$$z_W = \frac{\hat\pi - \pi_0}{\mathrm{SE}} = \frac{\hat\pi - \pi_0}{\sqrt{\hat\pi(1 - \hat\pi)/n}}.$$

Evaluating the binomial score (1.8) and information (1.9) at $\pi_0$ yields

$$u(\pi_0) = \frac{y}{\pi_0} - \frac{n - y}{1 - \pi_0}, \qquad \iota(\pi_0) = \frac{n}{\pi_0(1 - \pi_0)}.$$

The normal form of the score statistic simplifies to

$$z_S = \frac{u(\pi_0)}{[\iota(\pi_0)]^{1/2}} = \frac{y - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} = \frac{\hat\pi - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}.$$

Whereas the Wald statistic $z_W$ uses the standard error evaluated at $\hat\pi$, the score statistic $z_S$ uses it evaluated at $\pi_0$. The score statistic is preferable, as it uses the actual null SE rather than an estimate. Its null sampling distribution is closer to standard normal than that of the Wald statistic.

The binomial log-likelihood function (1.7) equals $L_0 = y\log\pi_0 + (n - y)\log(1 - \pi_0)$ under $H_0$ and $L_1 = y\log\hat\pi + (n - y)\log(1 - \hat\pi)$ more generally. The likelihood-ratio test statistic simplifies to

$$-2(L_0 - L_1) = 2\left[y\log\frac{\hat\pi}{\pi_0} + (n - y)\log\frac{1 - \hat\pi}{1 - \pi_0}\right].$$

Expressed as

$$-2(L_0 - L_1) = 2\left[y\log\frac{y}{n\pi_0} + (n - y)\log\frac{n - y}{n - n\pi_0}\right],$$

it compares observed success and failure counts to fitted (i.e., null) counts by

$$2\sum \text{observed}\,\log\left(\frac{\text{observed}}{\text{fitted}}\right). \tag{1.12}$$

We'll see that this formula also holds for tests about Poisson and multinomial parameters. Since no unknown parameters occur under $H_0$ and one occurs under $H_a$, (1.12) has an asymptotic chi-squared distribution with df = 1.

1.4.2 Confidence Intervals for a Binomial Parameter

A significance test merely indicates whether a particular value (such as $\pi = 0.5$) is plausible. We learn more by using a confidence interval to determine the range of plausible values.

Inverting the Wald test statistic gives the interval of $\pi_0$ values for which $|z_W| < z_{\alpha/2}$, or

$$\hat\pi \pm z_{\alpha/2}\sqrt{\frac{\hat\pi(1 - \hat\pi)}{n}}.$$

Historically, this was one of the first confidence intervals used for any parameter (Laplace 1812). Unfortunately, it performs poorly unless $n$ is very large (e.g., Brown et al.). The actual coverage probability usually falls below the nominal confidence coefficient, much below when $\pi$ is near 0 or 1. A simple adjustment that adds $z^2_{\alpha/2}/2$ observations of each type to the sample before using this formula performs much better (see the chapter Problems).

The score confidence interval contains $\pi_0$ values for which $|z_S| < z_{\alpha/2}$. Its endpoints are the $\pi_0$ solutions to the equations

$$\frac{\hat\pi - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}} = \pm z_{\alpha/2}.$$
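The binomial statistics above translate directly into code. This sketch computes $z_W$, $z_S$, and the likelihood-ratio statistic (1.12) for hypothetical data ($y = 7$, $n = 20$, $H_0\colon \pi = 0.5$), along with the Wald interval and the version that first adds $z^2_{\alpha/2}/2$ observations of each type. Truncating that adjusted interval to $[0, 1]$ is an assumption of this sketch, not something stated in the text.

```python
from math import log, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{0.025} for 95% confidence

def xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)

def binom_tests(y: int, n: int, pi0: float) -> tuple[float, float, float]:
    """Wald z, score z, and likelihood-ratio chi-squared (1.12) for H0: pi = pi0.
    The Wald statistic is undefined (division by zero) when y = 0 or y = n."""
    p = y / n
    z_wald = (p - pi0) / sqrt(p * (1 - p) / n)       # SE evaluated at pi_hat
    z_score = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)  # SE evaluated at pi0
    observed = (y, n - y)
    fitted = (n * pi0, n * (1 - pi0))                # null expected counts
    lr = 2 * sum(xlogy(o, o / f) for o, f in zip(observed, fitted))
    return z_wald, z_score, lr

def wald_ci(y: int, n: int) -> tuple[float, float]:
    """pi_hat +/- z * sqrt(pi_hat (1 - pi_hat) / n)."""
    p = y / n
    half = Z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def adjusted_wald_ci(y: int, n: int) -> tuple[float, float]:
    """Wald formula after adding z^2/2 successes and z^2/2 failures,
    truncated to [0, 1] (the truncation is this sketch's assumption)."""
    n_adj = n + Z**2
    p = (y + Z**2 / 2) / n_adj
    half = Z * sqrt(p * (1 - p) / n_adj)
    return max(0.0, p - half), min(1.0, p + half)

print(binom_tests(7, 20, 0.5))  # hypothetical data
print(wald_ci(0, 25))           # y = 0: the Wald interval collapses to a point
print(adjusted_wald_ci(0, 25))
```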

These are quadratic in $\pi_0$. First discussed by E. B. Wilson, this interval is

$$\hat\pi\left(\frac{n}{n + z^2_{\alpha/2}}\right) + \frac{1}{2}\left(\frac{z^2_{\alpha/2}}{n + z^2_{\alpha/2}}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z^2_{\alpha/2}}\left[\hat\pi(1 - \hat\pi)\left(\frac{n}{n + z^2_{\alpha/2}}\right) + \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{z^2_{\alpha/2}}{n + z^2_{\alpha/2}}\right)\right]}.$$

The midpoint of the interval is a weighted average of $\hat\pi$ and $1/2$, where the weight $n/(n + z^2_{\alpha/2})$ given $\hat\pi$ increases as $n$ increases. Combining terms, this midpoint equals $(y + z^2_{\alpha/2}/2)/(n + z^2_{\alpha/2})$. This is the sample proportion for an adjusted sample that adds $z^2_{\alpha/2}$ observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $\pi = \hat\pi$ and the variance of a sample proportion when $\pi = 1/2$, using the adjusted sample size $n + z^2_{\alpha/2}$ in place of $n$. This interval has much better performance than the Wald interval.

The likelihood-ratio-based confidence interval is more complex computationally, but simple in principle. It is the set of $\pi_0$ for which the likelihood-ratio test has a P-value exceeding $\alpha$. Equivalently, it is the set of $\pi_0$ for which double the log likelihood drops by less than $\chi^2_1(\alpha)$ from its value at the ML estimate $\hat\pi = y/n$.

1.4.3 Proportion of Vegetarians Example

To collect data in an introductory statistics course, recently I gave the students a questionnaire. One question asked each student whether he or she was a vegetarian. Of $n = 25$ students, $y = 0$ answered "yes." They were not a random sample of a particular population, but we use these data to illustrate 95% confidence intervals for a binomial parameter $\pi$.

Since $y = 0$, $\hat\pi = 0/25 = 0$. Using the Wald approach, the 95% confidence interval for $\pi$ is

$$0 \pm 1.96\sqrt{(0.0 \times 1.0)/25}, \quad \text{or } (0, 0).$$

When the observation falls at the boundary of the sample space, often Wald methods do not provide sensible answers.

By contrast, the 95% score interval equals $(0.0, 0.133)$. This is a more believable inference. For $H_0\colon \pi = 0.5$, for instance, the score test statistic is $z_S = (0 - 0.5)/\sqrt{(0.5 \times 0.5)/25} = -5.0$, so 0.5 does not fall in the interval. By contrast, for $H_0\colon \pi = 0.10$, $z_S = (0 - 0.10)/\sqrt{(0.10 \times 0.90)/25} = -1.67$, so 0.10 falls in the interval.
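A minimal sketch of the Wilson score interval, written as the midpoint plus or minus the half-width derived above, together with the score statistic $z_S$. Applied to the vegetarian data ($y = 0$, $n = 25$), it should reproduce the score interval $(0.0, 0.133)$ and the statistics $-5.0$ and $-1.67$ quoted in the text.

```python
from math import sqrt
from statistics import NormalDist

def wilson_ci(y: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Score (Wilson) interval: the pi0 values with |z_S| < z_{alpha/2}."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    denom = n + z**2
    mid = (y + z**2 / 2) / denom  # weighted average of pi_hat and 1/2
    half = z * sqrt(y * (1 - y / n) + z**2 / 4) / denom
    # The interval lies in [0, 1] mathematically; clipping only absorbs rounding.
    return max(0.0, mid - half), min(1.0, mid + half)

def score_z(y: int, n: int, pi0: float) -> float:
    """z_S = (pi_hat - pi0) / sqrt(pi0 (1 - pi0) / n)."""
    return (y / n - pi0) / sqrt(pi0 * (1 - pi0) / n)

lo, hi = wilson_ci(0, 25)  # vegetarian data
print(round(lo, 3), round(hi, 3))
print(round(score_z(0, 25, 0.5), 2), round(score_z(0, 25, 0.10), 2))
```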

When $y = 0$ and $n = 25$, the kernel of the likelihood function is $\ell(\pi) = \pi^{0}(1 - \pi)^{25} = (1 - \pi)^{25}$. The log likelihood (1.7) is $L(\pi) = 25\log(1 - \pi)$. Note that $L(\hat\pi) = L(0) = 0$. The 95% likelihood-ratio confidence interval is the set of $\pi_0$ for which the likelihood-ratio statistic

$$-2(L_0 - L_1) = -2[L(\pi_0) - L(\hat\pi)] = -50\log(1 - \pi_0) \le \chi^2_1(0.05) = 3.84.$$

The upper bound is $1 - \exp(-3.84/50) = 0.074$, and the confidence interval equals $(0.0, 0.074)$. [In this book, we use the natural logarithm throughout, so its inverse is the exponential function $\exp(x) = e^{x}$.] Figure 1.2 shows the likelihood and log-likelihood functions and the corresponding confidence region for $\pi$.

The three large-sample methods yield quite different results. When $\pi$ is near 0, the sampling distribution of $\hat\pi$ is highly skewed to the right for small $n$. It is worth considering alternative methods not requiring asymptotic approximations.

FIGURE 1.2 Binomial likelihood and log likelihood when $y = 0$ in $n = 25$ trials, and confidence interval for $\pi$.
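The likelihood-ratio interval has no closed form in general, but the set of $\pi_0$ with $-2[L(\pi_0) - L(\hat\pi)] < \chi^2_1(\alpha)$ can be found by bisection on each side of $\hat\pi$. This sketch uses the convention $0\log 0 = 0$ so that boundary cases such as $y = 0$ work, and it should reproduce the upper bound $1 - \exp(-3.84/50) \approx 0.074$ for the vegetarian data. Bisection is an implementation choice for illustration.

```python
from math import exp, log
from statistics import NormalDist

def xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)

def binom_loglik(pi: float, y: int, n: int) -> float:
    """Kernel log likelihood (1.7)."""
    return xlogy(y, pi) + xlogy(n - y, 1 - pi)

def lr_ci(y: int, n: int, conf: float = 0.95, iters: int = 100):
    """Set of pi0 with -2[L(pi0) - L(pi_hat)] < chi2_1(alpha), by bisection."""
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2) ** 2  # chi2_1(alpha) = z^2
    p = y / n
    l_max = binom_loglik(p, y, n)

    def inside(pi0: float) -> bool:
        return 2 * (l_max - binom_loglik(pi0, y, n)) < crit

    def boundary(a: float, b: float) -> float:
        """a lies inside the interval and b outside; bisect toward the crossing."""
        for _ in range(iters):
            mid = (a + b) / 2
            if inside(mid):
                a = mid
            else:
                b = mid
        return (a + b) / 2

    lower = 0.0 if y == 0 else boundary(p, 0.0)
    upper = 1.0 if y == n else boundary(p, 1.0)
    return lower, upper

lo, hi = lr_ci(0, 25)  # vegetarian data
crit = NormalDist().inv_cdf(0.975) ** 2
print(lo, hi, 1 - exp(-crit / 50))  # bisection vs closed form for y = 0
```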

1.4.4 Exact Small-Sample Inference*

With modern computational power, it is not necessary to rely on large-sample approximations for the distribution of statistics such as $\hat\pi$. Tests and confidence intervals can use the binomial distribution directly rather than its normal approximation. Such inferences occur naturally for small samples, but apply for any $n$.

We illustrate by testing $H_0\colon \pi = 0.5$ against $H_a\colon \pi \ne 0.5$ for the survey results on vegetarianism, $y = 0$ with $n = 25$. We noted that the score statistic equals $z = -5.0$. The exact P-value for this statistic, based on the null $\text{bin}(25, 0.5)$ distribution, is

$$P(|z| \ge 5.0) = P(Y = 0 \text{ or } Y = 25) = 0.5^{25} + 0.5^{25} \approx 6 \times 10^{-8}.$$

$100(1 - \alpha)\%$ confidence intervals consist of all $\pi_0$ for which P-values exceed $\alpha$ in exact binomial tests. The best known interval (Clopper and Pearson) uses the tail method for forming confidence intervals. It requires each one-sided P-value to exceed $\alpha/2$. The lower and upper endpoints are the solutions in $\pi_0$ to the equations

$$\sum_{k=y}^{n}\binom{n}{k}\pi_0^{k}(1 - \pi_0)^{n-k} = \alpha/2 \quad\text{and}\quad \sum_{k=0}^{y}\binom{n}{k}\pi_0^{k}(1 - \pi_0)^{n-k} = \alpha/2,$$

except that the lower bound is 0 when $y = 0$ and the upper bound is 1 when $y = n$. When $y = 1, 2, \ldots, n - 1$, from connections between binomial sums and the incomplete beta function and related cumulative distribution functions (cdf's) of beta and F distributions, the confidence interval equals

$$\left[1 + \frac{n - y + 1}{y\,F_{2y,\,2(n-y+1)}(1 - \alpha/2)}\right]^{-1} < \pi < \left[1 + \frac{n - y}{(y + 1)\,F_{2(y+1),\,2(n-y)}(\alpha/2)}\right]^{-1},$$

where $F_{a,b}(c)$ denotes the $1 - c$ quantile from the F distribution with degrees of freedom $a$ and $b$. When $y = 0$ with $n = 25$, the Clopper–Pearson 95% confidence interval for $\pi$ is $(0.0, 0.137)$.

In principle this approach seems ideal. However, there is a serious complication. Because of discreteness, the actual coverage probability for any $\pi$ is at least as large as the nominal confidence level (Casella and Berger 2001, p. 434; Neyman), and it can be much greater. Similarly, for a test of $H_0\colon \pi = \pi_0$ at a fixed desired size $\alpha$ such as 0.05, it is not usually possible to achieve that size. There is a finite number of possible samples, and hence a finite number of possible P-values, of which 0.05 may not be one. In testing $H_0$ with fixed $\pi_0$, one can pick a particular $\alpha$ that can occur as a P-value.

*Sections marked with an asterisk are less important for an overview.
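The exact P-value and the Clopper–Pearson interval need only binomial tail sums, so they can be computed without the F-distribution formula. This sketch solves each tail equation by bisection (an implementation choice, not the method described in the text) and defines the two-sided P-value as the null probability of samples whose $|z_S|$ is at least the observed value, matching the $P(|z| \ge 5.0)$ calculation above.

```python
from math import comb, sqrt

def binom_pmf(k: int, n: int, pi: float) -> float:
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def exact_p_score(y_obs: int, n: int, pi0: float) -> float:
    """Two-sided exact P-value: null bin(n, pi0) probability of all samples
    whose |z_S| is at least the observed |z_S|."""
    def z(y: int) -> float:
        return (y / n - pi0) / sqrt(pi0 * (1 - pi0) / n)

    z_obs = abs(z(y_obs))
    return sum(binom_pmf(k, n, pi0) for k in range(n + 1)
               if abs(z(k)) >= z_obs - 1e-9)  # small tolerance for ties

def clopper_pearson(y: int, n: int, alpha: float = 0.05, iters: int = 100):
    """Tail-method exact interval: each one-sided P-value must exceed alpha/2."""
    def upper_tail(pi: float) -> float:  # P(Y >= y), increasing in pi
        return sum(binom_pmf(k, n, pi) for k in range(y, n + 1))

    def lower_tail(pi: float) -> float:  # P(Y <= y), decreasing in pi
        return sum(binom_pmf(k, n, pi) for k in range(y + 1))

    def solve(f, increasing: bool) -> float:
        """Bisection for f(pi) = alpha/2 on (0, 1), with f monotone."""
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            # The root lies right of mid if an increasing f is still below
            # the target, or a decreasing f is still at or above it.
            if (f(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if y == 0 else solve(upper_tail, increasing=True)
    upper = 1.0 if y == n else solve(lower_tail, increasing=False)
    return lower, upper

print(exact_p_score(0, 25, 0.5))  # equals 2 * 0.5**25
print(clopper_pearson(0, 25))     # vegetarian data
```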

For interval estimation, however, this is not an option. This is because constructing the interval corresponds to inverting an entire range of $\pi_0$ values in $H_0\colon \pi = \pi_0$, and each distinct $\pi_0$ value can have its own set of possible P-values; that is, there is not a single null parameter value $\pi_0$ as in one test. For any fixed parameter value, the actual coverage probability can be much larger than the nominal confidence level.

When $n = 25$, Figure 1.3 plots the coverage probabilities as a function of $\pi$ for the Clopper–Pearson method, the score method, and the Wald method. At a fixed $\pi$ value with a given method, the coverage probability is the sum of the binomial probabilities of all those samples for which the resulting interval contains that $\pi$. There are 26 possible samples and 26 corresponding confidence intervals, so the coverage probability is a sum of somewhere between 0 and 26 binomial probabilities. As $\pi$ moves from 0 to 1, this coverage probability jumps up or down whenever $\pi$ moves into or out of one of these intervals. Figure 1.3 shows that coverage probabilities are too low for the Wald method, whereas the Clopper–Pearson method errs in the opposite direction. The score method behaves well, except for some $\pi$ values close to 0 or 1. Its coverage probabilities tend to be near the nominal level, not being consistently conservative or liberal. This is a good method unless $\pi$ is very close to 0 or 1 (see the chapter Problems).

FIGURE 1.3 Plot of coverage probabilities for nominal 95% confidence intervals for binomial parameter $\pi$ when $n = 25$.

In discrete problems using small-sample distributions, shorter confidence intervals usually result from inverting a single two-sided test rather than two one-sided tests. The interval is then the set of parameter values for which the P-value of a two-sided test exceeds $\alpha$. For the binomial parameter, see Blaker (2000), Blyth and Still (1983), and Sterne for methods. For observed outcome $y_o$, with Blaker's approach the P-value is the minimum of the two one-tailed binomial probabilities $P(Y \ge y_o)$ and $P(Y \le y_o)$ plus an attainable probability in the other tail that is as close as possible to, but not greater than, that one-tailed probability. The interval is computationally more complex, although available in software (Blaker gave S-Plus functions). The result is still conservative, but less so than the Clopper–Pearson interval. For the vegetarianism example, the 95% confidence interval using the Blaker exact method is $(0.0, 0.128)$ compared to the Clopper–Pearson interval of $(0.0, 0.137)$.

1.4.5 Inference Based on the Mid-P-Value*

To adjust for discreteness in small-sample distributions, one can base inference on the mid-P-value (Lancaster). For a test statistic $T$ with observed value $t_o$ and one-sided $H_a$ such that large $T$ contradicts $H_0$,

$$\text{mid-}P\text{-value} = \tfrac{1}{2}P(T = t_o) + P(T > t_o),$$

with probabilities calculated from the null distribution. Thus, the mid-P-value is less than the ordinary P-value by half the probability of the observed result. Compared to the ordinary P-value, the mid-P-value behaves more like the P-value for a test statistic having a continuous distribution. The sum of its two one-sided P-values equals 1.0. Although discrete, under $H_0$ its null distribution is more like the uniform distribution that occurs in the continuous case. For instance, it has a null expected value of 0.5, whereas this expected value exceeds 0.5 for the ordinary P-value for a discrete test statistic.

Unlike an exact test with ordinary P-value, a test using the mid-P-value does not guarantee that the probability of type I error is no greater than a nominal value (see the chapter Problems). However, it usually performs well, typically being a bit conservative. It is less conservative than the ordinary exact test.
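Two ideas from this part can be checked directly. First, following the description of Figure 1.3, coverage at a fixed $\pi$ is the sum of $\text{bin}(n, \pi)$ probabilities of the samples whose interval contains $\pi$; the sketch evaluates this for the Wald and score intervals at $n = 25$ and $\pi = 0.05$ (a value chosen only for illustration). Second, it computes one-sided binomial mid-P-values. The two-sided version here doubles the smaller one-sided mid-P-value; that is an assumed convention, though for $\pi_0 = 0.5$ it agrees with the value the text gives for the vegetarian data.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{0.025}

def binom_pmf(k: int, n: int, pi: float) -> float:
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def wald_ci(y: int, n: int) -> tuple[float, float]:
    p = y / n
    half = Z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def score_ci(y: int, n: int) -> tuple[float, float]:
    """Wilson score interval."""
    mid = (y + Z**2 / 2) / (n + Z**2)
    half = Z * sqrt(y * (1 - y / n) + Z**2 / 4) / (n + Z**2)
    return mid - half, mid + half

def coverage(ci, n: int, pi: float) -> float:
    """Sum of bin(n, pi) probabilities of the samples whose interval contains pi."""
    total = 0.0
    for y in range(n + 1):
        lo, hi = ci(y, n)
        if lo <= pi <= hi:
            total += binom_pmf(y, n, pi)
    return total

def mid_p_greater(y_obs: int, n: int, pi0: float) -> float:
    """One-sided mid-P-value for Ha: pi > pi0: P(Y > y_obs) + P(Y = y_obs)/2."""
    tail = sum(binom_pmf(k, n, pi0) for k in range(y_obs + 1, n + 1))
    return tail + binom_pmf(y_obs, n, pi0) / 2

def mid_p_less(y_obs: int, n: int, pi0: float) -> float:
    """One-sided mid-P-value for Ha: pi < pi0: P(Y < y_obs) + P(Y = y_obs)/2."""
    tail = sum(binom_pmf(k, n, pi0) for k in range(y_obs))
    return tail + binom_pmf(y_obs, n, pi0) / 2

n, pi = 25, 0.05  # pi = 0.05 chosen for illustration
print("Wald coverage: ", round(coverage(wald_ci, n, pi), 4))
print("score coverage:", round(coverage(score_ci, n, pi), 4))

# Vegetarian data (y = 0, n = 25), H0: pi = 0.5.
two_sided_mid_p = 2 * min(mid_p_greater(0, 25, 0.5), mid_p_less(0, 25, 0.5))
print(two_sided_mid_p)
```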
Similarly, one can form less conservative confidence intervals by inverting tests using the exact distribution with the mid-P-value (e.g., the 95% confidence interval is the set of parameter values for which the mid-P-value exceeds 0.05). For testing $H_0: \pi = 0.5$ against $H_a: \pi \ne 0.5$ in the example about the proportion of vegetarians, with $y = 0$ for $n = 25$, the result observed is the most extreme possible. Thus the mid-P-value is half the ordinary P-value, or $(0.5)^{25} \approx 3 \times 10^{-8}$. Using the Clopper–Pearson inversion of the exact binomial test but with the mid-P-value yields a 95% confidence interval of (0.000, 0.113) for $\pi$, compared to (0.000, 0.137) for the ordinary Clopper–Pearson interval.

The mid-P-value seems a sensible compromise between having overly conservative inference and using irrelevant randomization to eliminate problems from discreteness. We recommend it both for tests and confidence intervals with highly discrete distributions.

1.5 STATISTICAL INFERENCE FOR MULTINOMIAL PARAMETERS

We now present inference for multinomial parameters $\{\pi_j\}$. Of $n$ observations, $n_j$ occur in category $j$, $j = 1, \ldots, c$.

Estimation of Multinomial Parameters

First, we obtain ML estimates of $\{\pi_j\}$. As a function of $\{\pi_j\}$, the multinomial probability mass function (1.2) is proportional to the kernel
$$\prod_j \pi_j^{n_j}, \quad \text{where all } \pi_j \ge 0 \text{ and } \sum_j \pi_j = 1. \tag{1.14}$$
The ML estimates are the $\{\pi_j\}$ that maximize (1.14). The multinomial log-likelihood function is
$$L(\pi) = \sum_j n_j \log \pi_j.$$
To eliminate redundancies, we treat $L$ as a function of $(\pi_1, \ldots, \pi_{c-1})$, since $\pi_c = 1 - (\pi_1 + \cdots + \pi_{c-1})$. Thus, $\partial \pi_c / \partial \pi_j = -1$, $j = 1, \ldots, c - 1$. Since
$$\frac{\partial \log \pi_c}{\partial \pi_j} = \frac{1}{\pi_c} \frac{\partial \pi_c}{\partial \pi_j} = -\frac{1}{\pi_c},$$
differentiating $L$ with respect to $\pi_j$ gives the likelihood equation
$$\frac{\partial L(\pi)}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0.$$
The ML solution satisfies $\hat\pi_j / \hat\pi_c = n_j / n_c$. Now
$$\sum_j \hat\pi_j = 1 = \frac{\hat\pi_c \left(\sum_j n_j\right)}{n_c} = \frac{\hat\pi_c\, n}{n_c},$$
so $\hat\pi_c = n_c / n$ and then $\hat\pi_j = n_j / n$. From general results presented later in the book (Section 8.6), this solution does maximize the likelihood. Thus, the ML estimates of $\{\pi_j\}$ are the sample proportions.
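A quick numerical check of this result, as a minimal Python sketch: it computes the sample proportions and confirms that moving probability mass between a category and the last one (so that $\sum_j \pi_j = 1$ still holds, as in the derivation) only lowers the log-likelihood. The counts are hypothetical, chosen only for illustration, and the helper names are not from the text.

```python
from math import log


def loglik(counts, probs):
    """Multinomial log-likelihood kernel L(pi) = sum_j n_j log(pi_j)."""
    return sum(n_j * log(p_j) for n_j, p_j in zip(counts, probs) if n_j > 0)


def ml_estimates(counts):
    """ML estimates of multinomial parameters: the sample proportions n_j / n."""
    n = sum(counts)
    return [n_j / n for n_j in counts]


def shift_mass(probs, j, eps):
    """Move eps of probability from the last category to category j,
    mirroring the treatment of pi_c as 1 - (pi_1 + ... + pi_{c-1})."""
    new = list(probs)
    new[j] += eps
    new[-1] -= eps
    return new


counts = [12, 30, 8]              # hypothetical counts, n = 50
pi_hat = ml_estimates(counts)     # [0.24, 0.6, 0.16]
best = loglik(counts, pi_hat)
for j in range(len(counts) - 1):
    for eps in (-0.05, -0.01, 0.01, 0.05):
        # every perturbation that keeps the probabilities summing to 1 is worse
        assert loglik(counts, shift_mass(pi_hat, j, eps)) < best
```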

1.5.2 Pearson Statistic for Testing a Specified Multinomial

In 1900 the eminent British statistician Karl Pearson introduced a hypothesis test that was one of the first inferential methods. It had a revolutionary impact on categorical data analysis, which had focused on describing associations. Pearson's test evaluates whether multinomial parameters equal certain specified values. His original motivation in developing this test was to analyze whether possible outcomes on a particular Monte Carlo roulette wheel were equally likely (Stigler 1986).

Consider $H_0: \pi_j = \pi_{j0}$, $j = 1, \ldots, c$, where $\sum_j \pi_{j0} = 1$. When $H_0$ is true, the expected values of $\{n_j\}$, called expected frequencies, are $\mu_j = n\pi_{j0}$, $j = 1, \ldots, c$. Pearson proposed the test statistic
$$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}. \tag{1.15}$$
Greater differences $\{n_j - \mu_j\}$ produce greater $X^2$ values, for fixed $n$. Let $X_o^2$ denote the observed value of $X^2$. The P-value is the null value of $P(X^2 \ge X_o^2)$. This equals the sum of the null multinomial probabilities of all count arrays having a sum of $n$ with $X^2 \ge X_o^2$.

For large samples, $X^2$ has approximately a chi-squared distribution with df $= c - 1$. The P-value is approximated by $P(\chi^2_{c-1} \ge X_o^2)$, where $\chi^2_{c-1}$ denotes a chi-squared random variable with df $= c - 1$. Statistic (1.15) is called the Pearson chi-squared statistic.

Example: Testing Mendel's Theories

Among its many applications, Pearson's test was used in genetics to test Mendel's theories of natural inheritance. Mendel crossed pea plants of pure yellow strain with plants of pure green strain. He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant strain. One experiment produced $n = 8023$ seeds, of which $n_1 = 6022$ were yellow and $n_2 = 2001$ were green. The expected frequencies for $H_0: \pi_{10} = 0.75$, $\pi_{20} = 0.25$ are $\mu_1 = 8023(0.75) = 6017.25$ and $\mu_2 = 2005.75$. The Pearson statistic $X^2 = 0.015$ (df $= 1$) has a P-value of $P = 0.90$. This does not contradict Mendel's hypothesis.

Mendel performed several experiments of this type. In 1936, R. A. Fisher summarized Mendel's results.
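A minimal Python sketch of the Mendel calculation, using only the standard library. For df = 1 it relies on the identity $P(\chi^2_1 \ge x) = \operatorname{erfc}(\sqrt{x/2})$, which holds because a $\chi^2_1$ variable is the square of a standard normal variable; the function names are illustrative.

```python
from math import erfc, sqrt


def pearson_x2(counts, null_probs):
    """Pearson statistic X^2 = sum (n_j - mu_j)^2 / mu_j with mu_j = n * pi_j0."""
    n = sum(counts)
    expected = [n * p for p in null_probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return x2, expected


def chi2_sf_df1(x):
    """Right-tail probability P(chi-squared_1 >= x) = P(|Z| >= sqrt(x))."""
    return erfc(sqrt(x / 2))


x2, expected = pearson_x2([6022, 2001], [0.75, 0.25])
p_value = chi2_sf_df1(x2)
print(expected)                           # [6017.25, 2005.75]
print(round(x2, 3), round(p_value, 2))    # 0.015 0.9
```

For $c > 2$ the right tail needs the general chi-squared distribution function (for example from a statistics library) rather than `erfc`.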
He used the reproductive property of chi-squared: If $X_1^2, \ldots, X_k^2$ are independent chi-squared statistics with degrees of freedom $\nu_1, \ldots, \nu_k$, then $\sum_i X_i^2$ has a chi-squared distribution with df $= \sum_i \nu_i$. Fisher obtained a summary chi-squared statistic equal to 42, with df $= 84$. A chi-squared distribution with df $= 84$ has mean 84 and standard deviation $(2 \times 84)^{1/2} = 13.0$, and the right-tailed probability above 42 is $P = 0.99996$. In other words, the chi-squared statistic was so small that the fit seemed too good.
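Fisher's tail probability can be reproduced without special functions. This Python sketch relies on the standard identity that, for even df $= 2k$, $P(\chi^2_{2k} > x) = P(\text{Poisson}(x/2) \le k - 1)$; it handles even df only, and the function name is hypothetical.

```python
from math import exp, sqrt


def chi2_sf_even_df(x, df):
    """P(chi-squared_df > x) for even df = 2k, via P(Poisson(x/2) <= k - 1)."""
    if df % 2 != 0:
        raise ValueError("this sketch handles even df only")
    k, lam = df // 2, x / 2
    term = exp(-lam)        # Poisson probability of 0
    total = term
    for i in range(1, k):
        term *= lam / i     # Poisson probability of i
        total += term
    return total


# Reproductive property: independent chi-squared statistics add, and so do their df.
df = 84
print(df, round(sqrt(2 * df), 1))     # 84 13.0  (mean, standard deviation)
p_right = chi2_sf_even_df(42, df)     # right-tail probability above 42
print(p_right)
```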

Fisher commented: "The general level of agreement between Mendel's expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions.... I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made." In a letter written at the time (see Box 1978, p. 297), he stated: "Now, when data have been faked, I know very well how generally people underestimate the frequency of wide chance deviations, so that the tendency is always to make them agree too well with expectations." In summary, goodness-of-fit tests can reveal not only when a fit is inadequate, but also when it is better than random fluctuations would have us expect.

[R. A. Fisher's daughter, Joan Fisher Box (1978), and Freedman et al. (1978) discussed Fisher's analysis of Mendel's data and the accompanying controversy. Despite possible difficulties with Mendel's data, subsequent work led to general acceptance of his theories.]

Chi-Squared Theoretical Justification*

We now outline why Pearson's statistic has a limiting chi-squared distribution. For a multinomial sample $(n_1, \ldots, n_c)$ of size $n$, the marginal distribution of $n_j$ is the bin$(n, \pi_j)$ distribution. For large $n$, by the normal approximation to the binomial, $n_j$ (and $\hat\pi_j = n_j/n$) have approximate normal distributions. More generally, by the central limit theorem, the sample proportions $\hat\pi = (n_1/n, \ldots, n_{c-1}/n)'$ have an approximate multivariate normal distribution (Chapter 14). Let $\Sigma_0$ denote the null covariance matrix of $\sqrt{n}\,\hat\pi$, and let $\pi_0 = (\pi_{10}, \ldots, \pi_{c-1,0})'$. Under $H_0$, since $\sqrt{n}(\hat\pi - \pi_0)$ converges to a $N(0, \Sigma_0)$ distribution, the quadratic form
$$n(\hat\pi - \pi_0)' \Sigma_0^{-1} (\hat\pi - \pi_0) \tag{1.16}$$
has distribution converging to chi-squared with df $= c - 1$.

In Chapter 14 we show that the covariance matrix of $\sqrt{n}\,\hat\pi$ has elements
$$\sigma_{jk} = \begin{cases} -\pi_j \pi_k & \text{if } j \ne k, \\ \pi_j (1 - \pi_j) & \text{if } j = k. \end{cases}$$
The matrix $\Sigma_0^{-1}$ has $(j, k)$th element $1/\pi_{c0}$ when $j \ne k$ and $(1/\pi_{j0} + 1/\pi_{c0})$ when $j = k$.
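These closed forms, and the claim that the quadratic form reduces to Pearson's statistic, can be checked numerically. This Python sketch uses plain nested lists rather than a linear-algebra library; the null probabilities and counts are hypothetical, and the helper names are illustrative.

```python
def null_cov(pi0):
    """Sigma_0: null covariance of sqrt(n) * pihat over the first c - 1 categories."""
    m = len(pi0) - 1
    return [[pi0[j] * (1 - pi0[j]) if j == k else -pi0[j] * pi0[k]
             for k in range(m)] for j in range(m)]


def null_cov_inverse(pi0):
    """Closed-form inverse: 1/pi_c0 off the diagonal, 1/pi_j0 + 1/pi_c0 on it."""
    m, pc = len(pi0) - 1, pi0[-1]
    return [[1 / pi0[j] + 1 / pc if j == k else 1 / pc
             for k in range(m)] for j in range(m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def quadratic_form(counts, pi0):
    """n (pihat - pi0)' Sigma_0^{-1} (pihat - pi0) over the first c - 1 categories."""
    n = sum(counts)
    d = [counts[j] / n - pi0[j] for j in range(len(pi0) - 1)]
    inv = null_cov_inverse(pi0)
    return n * sum(d[j] * inv[j][k] * d[k]
                   for j in range(len(d)) for k in range(len(d)))


def pearson_x2(counts, pi0):
    """Pearson X^2 with expected frequencies n * pi_j0."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, pi0))


pi0 = [0.2, 0.3, 0.3, 0.2]      # hypothetical null probabilities, c = 4
counts = [18, 25, 32, 25]       # hypothetical counts, n = 100
m = len(pi0) - 1
product = matmul(null_cov(pi0), null_cov_inverse(pi0))
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(m) for j in range(m))
q = quadratic_form(counts, pi0)
x2 = pearson_x2(counts, pi0)
print(max_error)    # rounding-level only: the product is the identity
print(q, x2)        # equal up to rounding
```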
You can verify this by showing that $\Sigma_0 \Sigma_0^{-1}$ equals the identity matrix. With this substitution, direct calculation (with appropriate combining of terms) shows that (1.16) simplifies to $X^2$. In Section 14.3 we provide a formal proof in a more general setting.

This argument is similar to Pearson's in 1900. R. A. Fisher (1922) gave a simpler justification, the gist of which follows: Suppose that $(n_1, \ldots, n_c)$ are independent Poisson random variables with means $(\mu_1, \ldots, \mu_c)$. For large $\{\mu_j\}$, the standardized values $\{z_j = (n_j - \mu_j)/\sqrt{\mu_j}\}$ have approximate standard normal distributions. Thus, $\sum_j z_j^2 = X^2$ has an approximate chi-squared distribution with $c$ degrees of freedom. Adding the single linear constraint $\sum_j (n_j - \mu_j) = 0$, thus converting the Poisson distributions to a multinomial, we lose a degree of freedom.

When $c = 2$, Pearson's $X^2$ simplifies to the square of the normal score statistic for a binomial proportion (Section 1.4). For Mendel's data, $\hat\pi_1 = 6022/8023$, $\pi_{10} = 0.75$, $n = 8023$, and $z_S = 0.122$, for which $X^2 = z_S^2 = 0.015$. In fact, for general $c$ the Pearson test is the score test about multinomial parameters.

Likelihood-Ratio Chi-Squared

An alternative test for multinomial parameters uses the likelihood-ratio test. The kernel of the multinomial likelihood is $\prod_j \pi_j^{n_j}$. Under $H_0$ the likelihood is maximized when $\hat\pi_j = \pi_{j0}$. In the general case, it is maximized when $\hat\pi_j = n_j/n$. The ratio of the likelihoods equals
$$\Lambda = \frac{\prod_j (\pi_{j0})^{n_j}}{\prod_j (n_j/n)^{n_j}}.$$
Thus, the likelihood-ratio statistic, denoted by $G^2$, is
$$G^2 = -2 \log \Lambda = 2 \sum_j n_j \log\left(\frac{n_j}{n\pi_{j0}}\right). \tag{1.17}$$
This statistic, which has the general likelihood-ratio form of Section 1.3, is called the likelihood-ratio chi-squared statistic. The larger the value of $G^2$, the greater the evidence against $H_0$.

In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is $c - 1$. Under $H_0$, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals $c - 1$. For large $n$, $G^2$ has a chi-squared null distribution with df $= c - 1$.

When $H_0$ holds, the Pearson $X^2$ and the likelihood-ratio $G^2$ both have asymptotic chi-squared distributions with df $= c - 1$. In fact, they are asymptotically equivalent in that case; specifically, $X^2 - G^2$ converges in probability to zero (Chapter 14). When $H_0$ is false, they tend to grow proportionally to $n$; they need not take similar values, however, even for very large $n$.

For fixed $c$, as $n$ increases the distribution of $X^2$ usually converges to chi-squared more quickly than that of $G^2$. The chi-squared approximation is usually poor for $G^2$ when $n/c < 5$.
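For Mendel's data, the three statistics discussed above can be compared directly. In this Python sketch (variable names are illustrative), the squared score statistic reproduces $X^2$ exactly, as it must when $c = 2$, and $G^2 = 2\sum_j n_j \log(n_j/\mu_j)$ turns out to be very close to $X^2$ for these data.

```python
from math import log, sqrt

counts, pi0 = [6022, 2001], [0.75, 0.25]
n = sum(counts)
mu = [n * p for p in pi0]

# Pearson chi-squared
x2 = sum((o - e) ** 2 / e for o, e in zip(counts, mu))
# Likelihood-ratio chi-squared; a category with n_j = 0 contributes 0
g2 = 2 * sum(o * log(o / e) for o, e in zip(counts, mu) if o > 0)
# Score statistic for the binomial proportion of yellow seeds
pi_hat = counts[0] / n
z_s = (pi_hat - pi0[0]) / sqrt(pi0[0] * (1 - pi0[0]) / n)

print(round(z_s, 3), round(z_s**2, 4), round(x2, 4), round(g2, 4))
# 0.122 0.015 0.015 0.015
```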
When $c$ is large, it can be decent for $X^2$ for $n/c$ as small as 1 if the table does not contain both very small and moderately large expected frequencies. We provide further guidelines later in the book. Alternatively, one can use the multinomial probabilities to generate exact distributions of these test statistics (Good et al. 1970).
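For small $n$ and $c$, the exact null distribution can be obtained by brute-force enumeration of every count array. This Python sketch does so for $X^2$ with exact rational arithmetic (`fractions.Fraction`), which avoids floating-point ties when comparing $X^2$ values. It is a simple illustration, not the algorithm of Good et al., and it becomes infeasible as the number of arrays, $\binom{n + c - 1}{c - 1}$, grows; the equiprobable null and the observed counts are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def count_arrays(n, c):
    """Yield every (n_1, ..., n_c) of nonnegative integers summing to n."""
    if c == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_arrays(n - first, c - 1):
            yield (first,) + rest


def multinomial_prob(counts, probs):
    """Multinomial probability of a count array under cell probabilities probs."""
    coef = factorial(sum(counts)) // prod(factorial(k) for k in counts)
    return coef * prod(p**k for p, k in zip(probs, counts))


def pearson_x2(counts, probs):
    """Pearson X^2 with expected frequencies n * pi_j0 (exact if probs are Fractions)."""
    n = sum(counts)
    return sum((k - n * p) ** 2 / (n * p) for k, p in zip(counts, probs))


def exact_pvalue(observed, probs):
    """Exact null P(X^2 >= observed X^2), summed over all arrays with total n."""
    n, c = sum(observed), len(observed)
    x2_obs = pearson_x2(observed, probs)
    return sum(multinomial_prob(arr, probs)
               for arr in count_arrays(n, c)
               if pearson_x2(arr, probs) >= x2_obs)


probs = [Fraction(1, 3)] * 3            # hypothetical equiprobable null, c = 3
p_exact = exact_pvalue((6, 2, 1), probs)
print(p_exact, float(p_exact))          # 1087/6561, about 0.166
```

The same loop works for $G^2$ by swapping in that statistic, although $G^2$ involves logarithms and so cannot be kept in exact rational arithmetic.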

Testing with Estimated Expected Frequencies

Pearson's statistic $X^2$ compares a sample distribution to a hypothetical one $\{\pi_{j0}\}$. In some applications, $\{\pi_{j0} = \pi_{j0}(\theta)\}$ are functions of a smaller set of unknown parameters $\theta$. ML estimates $\hat\theta$ of $\theta$ determine ML estimates $\{\pi_{j0}(\hat\theta)\}$ of $\{\pi_{j0}\}$ and hence ML estimates $\{\hat\mu_j = n\pi_{j0}(\hat\theta)\}$ of expected frequencies in $X^2$. Replacing $\{\mu_j\}$ by estimates $\{\hat\mu_j\}$ affects the distribution of $X^2$. When $\dim(\theta) = p$, the true df $= (c - 1) - p$ (Chapter 14). Pearson failed to realize this (Chapter 16).

We now show a goodness-of-fit test with estimated expected frequencies. A sample of 156 dairy calves born in Okeechobee County, Florida, were classified according to whether they caught pneumonia within 60 days of birth. Calves that got a pneumonia infection were also classified according to whether they got a secondary infection within 2 weeks after the first infection cleared up. Table 1.1 shows the data. Calves that did not get a primary infection could not get a secondary infection, so no observations can fall in the category for "no" primary infection and "yes" secondary infection. That combination is called a structural zero.

A goal of this study was to test whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection. In other words, if $\pi_{ab}$ denotes the probability that a calf is classified in row $a$ and column $b$ of this table, the null hypothesis is
$$H_0: \pi_{11} + \pi_{12} = \frac{\pi_{11}}{\pi_{11} + \pi_{12}}$$
or $\pi_{11} = (\pi_{11} + \pi_{12})^2$. Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. The null hypothesis states that the probabilities satisfy the structure that Table 1.2 shows; that is, probabilities in a trinomial for the categories (yes–yes, yes–no, no–no) for primary–secondary infection equal $(\pi^2, \pi(1 - \pi), 1 - \pi)$.

Let $n_{ab}$ denote the number of observations in category $(a, b)$. The ML estimate of $\pi$ is the value maximizing the kernel of the multinomial likelihood
$$(\pi^2)^{n_{11}} (\pi - \pi^2)^{n_{12}} (1 - \pi)^{n_{22}}.$$

TABLE 1.1 Primary and Secondary Pneumonia Infections in Calves

Primary Infection | Secondary Infection^a: Yes | Secondary Infection^a: No
Yes | 30 (38.1) | 63 (39.0)
No | 0 (-) | 63 (78.9)

Source: Data courtesy of Thang Tran and G. A. Donovan, College of Veterinary Medicine, University of Florida.
^a Values in parentheses are estimated expected frequencies.
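A Python sketch of this fit, using only the standard library and the counts of Table 1.1. Instead of solving the likelihood equation analytically, it maximizes the log kernel numerically by ternary search (an illustrative choice that works because this log-likelihood is concave in $\pi$), then forms the estimated expected frequencies, $X^2$, and the df = 1 chi-squared P-value. Function names are hypothetical.

```python
from math import erfc, log, sqrt

n11, n12, n22 = 30, 63, 63    # yes-yes, yes-no, no-no counts from Table 1.1
n = n11 + n12 + n22           # 156 calves


def log_kernel(p):
    """Log of (p^2)^n11 * (p - p^2)^n12 * (1 - p)^n22."""
    return n11 * log(p * p) + n12 * log(p - p * p) + n22 * log(1 - p)


def argmax_concave(f, lo=1e-9, hi=1 - 1e-9, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


pi_hat = argmax_concave(log_kernel)
expected = [n * pi_hat**2, n * pi_hat * (1 - pi_hat), n * (1 - pi_hat)]
observed = [n11, n12, n22]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = (3 - 1) - 1                  # c - 1 minus p = 1 estimated parameter
p_value = erfc(sqrt(x2 / 2))      # P(chi-squared_1 >= x2)
print(round(pi_hat, 3), [round(e, 1) for e in expected], round(x2, 1), df)
# 0.494 [38.1, 39.0, 78.9] 19.7 1
```

The numerical maximizer agrees with the closed-form solution $(2n_{11} + n_{12})/(2n_{11} + 2n_{12} + n_{22}) = 123/249$ to about six decimal places, which is ample for the expected frequencies.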

TABLE 1.2 Probability Structure for Hypothesis

Primary Infection | Secondary Infection: Yes | Secondary Infection: No | Total
Yes | $\pi^2$ | $\pi(1 - \pi)$ | $\pi$
No | - | $1 - \pi$ | $1 - \pi$

The log likelihood is
$$L(\pi) = n_{11} \log \pi^2 + n_{12} \log(\pi - \pi^2) + n_{22} \log(1 - \pi).$$
Differentiation with respect to $\pi$ gives the likelihood equation
$$\frac{2n_{11}}{\pi} + \frac{n_{12}}{\pi} - \frac{n_{12}}{1 - \pi} - \frac{n_{22}}{1 - \pi} = 0.$$
The solution is
$$\hat\pi = \frac{2n_{11} + n_{12}}{2n_{11} + 2n_{12} + n_{22}}.$$
For Table 1.1, $\hat\pi = 0.494$. Since $n = 156$, the estimated expected frequencies are $\hat\mu_{11} = n\hat\pi^2 = 38.1$, $\hat\mu_{12} = n(\hat\pi - \hat\pi^2) = 39.0$, and $\hat\mu_{22} = n(1 - \hat\pi) = 78.9$. Table 1.1 shows them. Pearson's statistic is $X^2 = 19.7$. Since the $c = 3$ possible responses have $p = 1$ parameter ($\pi$) determining the expected frequencies, df $= (3 - 1) - 1 = 1$. There is strong evidence against $H_0$ ($P \approx 0.00001$). Inspection of Table 1.1 reveals that many more calves got a primary infection but not a secondary infection than $H_0$ predicts. The researchers concluded that the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.

NOTES

Section 1.1: Categorical Response Data

1.1. Stevens (1946) defined nominal, ordinal, and interval scales of measurement. Other scales result from mixtures of these types. For instance, partially ordered scales occur when subjects respond to questions having categories ordered except for "don't know" or "undecided" categories.

Section 1.3: Statistical Inference for Categorical Data

1.2. The score method does not use $\hat\beta$. Thus, when $\beta$ is a model parameter, one can usually compute the score statistic for testing $H_0: \beta = \beta_0$ without fitting the model. This is advantageous when fitting several models in an exploratory analysis and model fitting is computationally intensive. An advantage of the score and likelihood-ratio methods is that they apply even when $|\hat\beta| = \infty$. In that case, one cannot compute the Wald statistic. Another disadvantage of the Wald method is that its results depend on the parameterization; inference based on $\hat\beta$ and its SE is not equivalent to inference based on a nonlinear function of it, such as $\log \hat\beta$ and its SE.

Section 1.4: Statistical Inference for Binomial Parameters

1.3. Among others, Agresti and Coull (1998), Blyth and Still (1983), Brown et al. (2001), Ghosh (1979), and Newcombe (1998a) showed the superiority of the score interval to the Wald interval for $\pi$. Of the exact methods, Blaker's (2000) has particularly good properties. It is contained in the Clopper–Pearson interval and has a nestedness property whereby an interval of higher nominal confidence level necessarily contains one of lower level.

1.4. Using continuity corrections with large-sample methods provides approximations to exact small-sample methods. Thus, they tend to behave conservatively. We do not present them, since if one prefers an exact method, with modern computational power it can be used directly rather than approximated.

1.5. In theory, one can eliminate problems with discreteness in tests by performing a supplementary randomization on the boundary of a critical region (see the Problems). In rejecting the null at the boundary with a certain probability, one can obtain a fixed overall type I error probability $\alpha$ even when it is not an achievable P-value. For such randomization, the one-sided randomized P-value is
$$\text{randomized } P\text{-value} = U \times P(T = t_o) + P(T > t_o),$$
where $U$ denotes a uniform $(0, 1)$ random variable (Stevens 1950). In practice, this is not used, as it is absurd to let this random number influence a decision. The mid-P-value replaces the arbitrary uniform multiple $U \times P(T = t_o)$ by its expected value.

Section 1.5: Statistical Inference for Multinomial Parameters

1.6. The chi-squared distribution has mean df, variance 2df, and skewness $(8/\text{df})^{1/2}$. It is approximately normal when df is large. Greenwood and Nikulin (1996), Kendall and Stuart (1979), and Lancaster (1969) presented other properties. Cochran (1952)
presented a historical survey of chi-squared tests of fit. See also Cressie and Read (1989), Koch and Bhapkar (1982), Koehler (1998), and Moore (1986b).

PROBLEMS

Applications

1.1 Identify each variable as nominal, ordinal, or interval.
a. UK political party preference (Labour, Conservative, Social Democrat)
b. Anxiety rating (none, mild, moderate, severe, very severe)
c. Patient survival (in number of months)
d. Clinic location (London, Boston, Madison, Rochester, Montreal)
e. Response of tumor to chemotherapy (complete elimination, partial reduction, stable, growth progression)
f. Favorite beverage (water, juice, milk, soft drink, beer, wine)
g. Appraisal of company's inventory level (too low, about right, too high)

1.2 Each of 100 multiple-choice questions on an exam has four possible answers, one of which is correct. For each question, a student guesses by selecting an answer randomly.
a. Specify the distribution of the student's number of correct answers.
b. Find the mean and standard deviation of that distribution. Would it be surprising if the student made at least 50 correct responses? Why?
c. Specify the distribution of $(n_1, n_2, n_3, n_4)$, where $n_j$ is the number of times the student picked choice $j$.
d. Find $E(n_j)$, $\text{var}(n_j)$, $\text{cov}(n_j, n_k)$, and $\text{corr}(n_j, n_k)$.

1.3 An experiment studies the number of insects that survive a certain dose of an insecticide, using several batches of insects of size $n$ each. The insects are sensitive to factors that vary among batches during the experiment but were not measured, such as temperature level. Explain why the distribution of the number of insects per batch surviving the experiment might show overdispersion relative to a bin$(n, \pi)$ distribution.

1.4 In his autobiography A Sort of Life, British author Graham Greene described a period of severe mental depression during which he played Russian Roulette. This game consists of putting a bullet in one of the six chambers of a pistol, spinning the chambers to select one at random, and then firing the pistol once at one's head.
a. Greene played this game six times and was lucky that none of them resulted in a bullet firing. Find the probability of this outcome.
b. Suppose that he had kept playing this game until the bullet fired. Let $Y$ denote the number of the game on which it fires. Show the probability mass function for $Y$, and justify.
1.5 Consider the statement, "Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if she is married and does not want any more children." For the 1996 General Social Survey, conducted by the National Opinion Research Center (NORC), 842 replied "yes" and 982 replied "no." Let $\pi$ denote the population proportion who would reply "yes." Find the P-value for testing $H_0: \pi = 0.5$ using the score test, and construct a 95% confidence interval for $\pi$. Interpret the results.

1.6 Refer to the vegetarianism example in Section 1.4. For testing $H_0: \pi = 0.5$ against $H_a: \pi \ne 0.5$, show that:
a. The likelihood-ratio statistic equals $2[25 \log(25/12.5)] = 34.7$.
b. The chi-squared form of the score statistic equals 25.0.
c. The Wald $z$ or chi-squared statistic is infinite.

1.7 In a crossover trial comparing a new drug to a standard, $\pi$ denotes the probability that the new one is judged better. It is desired to estimate $\pi$ and test $H_0: \pi = 0.5$ against $H_a: \pi \ne 0.5$. In 20 independent observations, the new drug is better each time.
a. Find and sketch the likelihood function. Give the ML estimate of $\pi$.
b. Conduct a Wald test and construct a 95% Wald confidence interval for $\pi$. Are these sensible?
c. Conduct a score test, reporting the P-value. Construct a 95% score confidence interval. Interpret.
d. Conduct a likelihood-ratio test and construct a likelihood-based 95% confidence interval. Interpret.
e. Construct an exact binomial test and 95% confidence interval. Interpret.
f. Suppose that researchers wanted a sufficiently large sample to estimate the probability of preferring the new drug to within 0.05, with confidence 0.95. If the true probability is 0.90, about how large a sample is needed?

1.8 In an experiment on chlorophyll inheritance in maize, for 1103 seedlings of self-fertilized heterozygous green plants, 854 seedlings were green and 249 were yellow. Theory predicts the ratio of green to yellow is 3:1. Test the hypothesis that 3:1 is the true ratio. Report the P-value, and interpret.

1.9 Table 1.3 contains Ladislaus von Bortkiewicz's data on deaths of soldiers in the Prussian army from kicks by army mules (Fisher 1934; Quine and Seneta 1987). The data refer to 10 army corps, each observed for 20 years. In 109 corps-years of exposure, there were no deaths, in 65 corps-years there was one death, and so on. Estimate the mean and test whether probabilities of occurrences in these five categories follow a Poisson distribution (truncated for 4 and above).


More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Portfolio Loss Distribution

Portfolio Loss Distribution Portfolo Loss Dstrbuton Rsky assets n loan ortfolo hghly llqud assets hold-to-maturty n the bank s balance sheet Outstandngs The orton of the bank asset that has already been extended to borrowers. Commtment

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

Traffic-light a stress test for life insurance provisions

Traffic-light a stress test for life insurance provisions MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:

SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background: SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and

More information

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall

Staff Paper. Farm Savings Accounts: Examining Income Variability, Eligibility, and Benefits. Brent Gloy, Eddy LaDue, and Charles Cuykendall SP 2005-02 August 2005 Staff Paper Department of Appled Economcs and Management Cornell Unversty, Ithaca, New York 14853-7801 USA Farm Savngs Accounts: Examnng Income Varablty, Elgblty, and Benefts Brent

More information

Section 2 Introduction to Statistical Mechanics

Section 2 Introduction to Statistical Mechanics Secton 2 Introducton to Statstcal Mechancs 2.1 Introducng entropy 2.1.1 Boltzmann s formula A very mportant thermodynamc concept s that of entropy S. Entropy s a functon of state, lke the nternal energy.

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

OLA HÖSSJER, BENGT ERIKSSON, KAJSA JÄRNMALM AND ESBJÖRN OHLSSON ABSTRACT

OLA HÖSSJER, BENGT ERIKSSON, KAJSA JÄRNMALM AND ESBJÖRN OHLSSON ABSTRACT ASSESSING INDIVIDUAL UNEXPLAINED VARIATION IN NON-LIFE INSURANCE BY OLA HÖSSJER, BENGT ERIKSSON, KAJSA JÄRNMALM AND ESBJÖRN OHLSSON ABSTRACT We consder varaton of observed clam frequences n non-lfe nsurance,

More information

Lecture 5,6 Linear Methods for Classification. Summary

Lecture 5,6 Linear Methods for Classification. Summary Lecture 5,6 Lnear Methods for Classfcaton Rce ELEC 697 Farnaz Koushanfar Fall 2006 Summary Bayes Classfers Lnear Classfers Lnear regresson of an ndcator matrx Lnear dscrmnant analyss (LDA) Logstc regresson

More information

Stress test for measuring insurance risks in non-life insurance

Stress test for measuring insurance risks in non-life insurance PROMEMORIA Datum June 01 Fnansnspektonen Författare Bengt von Bahr, Younes Elonq and Erk Elvers Stress test for measurng nsurance rsks n non-lfe nsurance Summary Ths memo descrbes stress testng of nsurance

More information

Chapter 2 The Basics of Pricing with GLMs

Chapter 2 The Basics of Pricing with GLMs Chapter 2 The Bascs of Prcng wth GLMs As descrbed n the prevous secton, the goal of a tarff analyss s to determne how one or more key ratos Y vary wth a number of ratng factors Ths s remnscent of analyzng

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS Chrs Deeley* Last revsed: September 22, 200 * Chrs Deeley s a Senor Lecturer n the School of Accountng, Charles Sturt Unversty,

More information

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION

PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIGIOUS AFFILIATION AND PARTICIPATION PRIVATE SCHOOL CHOICE: THE EFFECTS OF RELIIOUS AFFILIATION AND PARTICIPATION Danny Cohen-Zada Department of Economcs, Ben-uron Unversty, Beer-Sheva 84105, Israel Wllam Sander Department of Economcs, DePaul

More information

Extending Probabilistic Dynamic Epistemic Logic

Extending Probabilistic Dynamic Epistemic Logic Extendng Probablstc Dynamc Epstemc Logc Joshua Sack May 29, 2008 Probablty Space Defnton A probablty space s a tuple (S, A, µ), where 1 S s a set called the sample space. 2 A P(S) s a σ-algebra: a set

More information

Variance estimation for the instrumental variables approach to measurement error in generalized linear models

Variance estimation for the instrumental variables approach to measurement error in generalized linear models he Stata Journal (2003) 3, Number 4, pp. 342 350 Varance estmaton for the nstrumental varables approach to measurement error n generalzed lnear models James W. Hardn Arnold School of Publc Health Unversty

More information

Joe Pimbley, unpublished, 2005. Yield Curve Calculations

Joe Pimbley, unpublished, 2005. Yield Curve Calculations Joe Pmbley, unpublshed, 005. Yeld Curve Calculatons Background: Everythng s dscount factors Yeld curve calculatons nclude valuaton of forward rate agreements (FRAs), swaps, nterest rate optons, and forward

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

Estimation of Dispersion Parameters in GLMs with and without Random Effects

Estimation of Dispersion Parameters in GLMs with and without Random Effects Mathematcal Statstcs Stockholm Unversty Estmaton of Dsperson Parameters n GLMs wth and wthout Random Effects Meng Ruoyan Examensarbete 2004:5 Postal address: Mathematcal Statstcs Dept. of Mathematcs Stockholm

More information

7 ANALYSIS OF VARIANCE (ANOVA)

7 ANALYSIS OF VARIANCE (ANOVA) 7 ANALYSIS OF VARIANCE (ANOVA) Chapter 7 Analyss of Varance (Anova) Objectves After studyng ths chapter you should apprecate the need for analysng data from more than two samples; understand the underlyng

More information

An Empirical Study of Search Engine Advertising Effectiveness

An Empirical Study of Search Engine Advertising Effectiveness An Emprcal Study of Search Engne Advertsng Effectveness Sanjog Msra, Smon School of Busness Unversty of Rochester Edeal Pnker, Smon School of Busness Unversty of Rochester Alan Rmm-Kaufman, Rmm-Kaufman

More information

RELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT

RELIABILITY, RISK AND AVAILABILITY ANLYSIS OF A CONTAINER GANTRY CRANE ABSTRACT Kolowrock Krzysztof Joanna oszynska MODELLING ENVIRONMENT AND INFRATRUCTURE INFLUENCE ON RELIABILITY AND OPERATION RT&A # () (Vol.) March RELIABILITY RIK AND AVAILABILITY ANLYI OF A CONTAINER GANTRY CRANE

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB.

PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB. PRACTICE 1: MUTUAL FUNDS EVALUATION USING MATLAB. INDEX 1. Load data usng the Edtor wndow and m-fle 2. Learnng to save results from the Edtor wndow. 3. Computng the Sharpe Rato 4. Obtanng the Treynor Rato

More information

How To Understand The Results Of The German Meris Cloud And Water Vapour Product

How To Understand The Results Of The German Meris Cloud And Water Vapour Product Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Question 2: What is the variance and standard deviation of a dataset?

Question 2: What is the variance and standard deviation of a dataset? Queston 2: What s the varance and standard devaton of a dataset? The varance of the data uses all of the data to compute a measure of the spread n the data. The varance may be computed for a sample of

More information

Scaling Models for the Severity and Frequency of External Operational Loss Data

Scaling Models for the Severity and Frequency of External Operational Loss Data Scalng Models for the Severty and Frequency of External Operatonal Loss Data Hela Dahen * Department of Fnance and Canada Research Char n Rsk Management, HEC Montreal, Canada Georges Donne * Department

More information

The Current Employment Statistics (CES) survey,

The Current Employment Statistics (CES) survey, Busness Brths and Deaths Impact of busness brths and deaths n the payroll survey The CES probablty-based sample redesgn accounts for most busness brth employment through the mputaton of busness deaths,

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

Control Charts with Supplementary Runs Rules for Monitoring Bivariate Processes

Control Charts with Supplementary Runs Rules for Monitoring Bivariate Processes Control Charts wth Supplementary Runs Rules for Montorng varate Processes Marcela. G. Machado *, ntono F.. Costa * * Producton Department, Sao Paulo State Unversty, Campus of Guaratnguetá, 56-4 Guaratnguetá,

More information

Evaluating the generalizability of an RCT using electronic health records data

Evaluating the generalizability of an RCT using electronic health records data Evaluatng the generalzablty of an RCT usng electronc health records data 3 nterestng questons Is our RCT representatve? How can we generalze RCT results? Can we use EHR* data as a control group? *) Electronc

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

Diagnostic Tests of Cross Section Independence for Nonlinear Panel Data Models

Diagnostic Tests of Cross Section Independence for Nonlinear Panel Data Models DISCUSSION PAPER SERIES IZA DP No. 2756 Dagnostc ests of Cross Secton Independence for Nonlnear Panel Data Models Cheng Hsao M. Hashem Pesaran Andreas Pck Aprl 2007 Forschungsnsttut zur Zukunft der Arbet

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

Vasicek s Model of Distribution of Losses in a Large, Homogeneous Portfolio

Vasicek s Model of Distribution of Losses in a Large, Homogeneous Portfolio Vascek s Model of Dstrbuton of Losses n a Large, Homogeneous Portfolo Stephen M Schaefer London Busness School Credt Rsk Electve Summer 2012 Vascek s Model Important method for calculatng dstrbuton of

More information

Rate-Based Daily Arrival Process Models with Application to Call Centers

Rate-Based Daily Arrival Process Models with Application to Call Centers Submtted to Operatons Research manuscrpt (Please, provde the manuscrpt number!) Authors are encouraged to submt new papers to INFORMS journals by means of a style fle template, whch ncludes the journal

More information

Transition Matrix Models of Consumer Credit Ratings

Transition Matrix Models of Consumer Credit Ratings Transton Matrx Models of Consumer Credt Ratngs Abstract Although the corporate credt rsk lterature has many studes modellng the change n the credt rsk of corporate bonds over tme, there s far less analyss

More information

Prediction of Disability Frequencies in Life Insurance

Prediction of Disability Frequencies in Life Insurance Predcton of Dsablty Frequences n Lfe Insurance Bernhard Köng Fran Weber Maro V. Wüthrch October 28, 2011 Abstract For the predcton of dsablty frequences, not only the observed, but also the ncurred but

More information

Part 1: quick summary 5. Part 2: understanding the basics of ANOVA 8

Part 1: quick summary 5. Part 2: understanding the basics of ANOVA 8 Statstcs Rudolf N. Cardnal Graduate-level statstcs for psychology and neuroscence NOV n practce, and complex NOV desgns Verson of May 4 Part : quck summary 5. Overvew of ths document 5. Background knowledge

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information