1.2 DISTRIBUTIONS FOR CATEGORICAL DATA




... present models for a categorical response with matched pairs; these apply, for instance, with a categorical response measured for the same subjects at two times. Chapter 11 covers models for more general types of repeated categorical data, such as longitudinal data from several times with explanatory variables. In Chapter 12 we present a broad class of models, generalized linear mixed models, that use random effects to account for dependence with such data. In Chapter 13 further extensions and applications of the models from Chapters 10 through 12 are described.

The fourth and final unit is more theoretical. In Chapter 14 we develop asymptotic theory for categorical data models. This theory is the basis for large-sample behavior of model parameter estimators and goodness-of-fit statistics. Maximum likelihood estimation receives primary attention here and throughout the book, but Chapter 15 covers alternative methods of estimation, such as the Bayesian paradigm. Chapter 16 stands alone from the others, being a historical overview of the development of categorical data methods.

Most categorical data methods require extensive computations, and statistical software is necessary for their effective use. In Appendix A we discuss software that can perform the analyses in this book and show the use of SAS for text examples. See the Web site www.stat.ufl.edu/~aa/cda/cda.html to download sample programs and data sets and find information about other software.

Chapter 1 provides background material. In Section 1.2 we review the key distributions for categorical data: the binomial, multinomial, and Poisson. In Section 1.3 we review the primary mechanisms for statistical inference, using maximum likelihood. In Sections 1.4 and 1.5 we illustrate these by presenting significance tests and confidence intervals for binomial and multinomial parameters.

1.2 DISTRIBUTIONS FOR CATEGORICAL DATA

Inferential data analyses require assumptions about the random mechanism that generated the data.
For regression models with continuous responses, the normal distribution plays the central role. In this section we review the three key distributions for categorical responses: binomial, multinomial, and Poisson.

1.2.1 Binomial Distribution

Many applications refer to a fixed number $n$ of binary observations. Let $y_1, y_2, \ldots, y_n$ denote responses for $n$ independent and identical trials such that $P(Y_i = 1) = \pi$ and $P(Y_i = 0) = 1 - \pi$. We use the generic labels "success" and "failure" for outcomes 1 and 0. Identical trials means that the probability of success $\pi$ is the same for each trial. Independent trials means that the $\{Y_i\}$ are independent random variables. These are often called Bernoulli trials. The total number of successes, $Y = \sum_{i=1}^{n} Y_i$, has the binomial distribution with index $n$ and parameter $\pi$, denoted by bin$(n, \pi)$.

The probability mass function for the possible outcomes $y$ for $Y$ is

$$p(y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n - y}, \qquad y = 0, 1, 2, \ldots, n, \tag{1.1}$$

where the binomial coefficient $\binom{n}{y} = n!/[y!\,(n - y)!]$. Since $E(Y_i) = E(Y_i^2) = 1 \times \pi + 0 \times (1 - \pi) = \pi$, $E(Y_i) = \pi$ and $\mathrm{var}(Y_i) = \pi(1 - \pi)$. The binomial distribution for $Y = \sum_i Y_i$ has mean and variance

$$\mu = E(Y) = n\pi \quad \text{and} \quad \sigma^2 = \mathrm{var}(Y) = n\pi(1 - \pi).$$

The skewness is described by $E(Y - \mu)^3/\sigma^3 = (1 - 2\pi)/\sqrt{n\pi(1 - \pi)}$. The distribution converges to normality as $n$ increases, for fixed $\pi$.

There is no guarantee that successive binary observations are independent or identical. Thus, occasionally, we will utilize other distributions. One such case is sampling binary outcomes without replacement from a finite population, such as observations on gender for 10 students sampled from a class of size 20. The hypergeometric distribution, studied in Section 3.5.1, is then relevant. In Section 1.2.4 we mention another case that violates these binomial assumptions.

1.2.2 Multinomial Distribution

Some trials have more than two possible outcomes. Suppose that each of $n$ independent, identical trials can have outcome in any of $c$ categories. Let $y_{ij} = 1$ if trial $i$ has outcome in category $j$ and $y_{ij} = 0$ otherwise. Then $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{ic})$ represents a multinomial trial, with $\sum_j y_{ij} = 1$; for instance, $(0, 0, 1, 0)$ denotes outcome in category 3 of four possible categories. Note that $y_{ic}$ is redundant, being linearly dependent on the others. Let $n_j = \sum_i y_{ij}$ denote the number of trials having outcome in category $j$. The counts $(n_1, n_2, \ldots, n_c)$ have the multinomial distribution.

Let $\pi_j = P(Y_{ij} = 1)$ denote the probability of outcome in category $j$ for each trial. The multinomial probability mass function is

$$p(n_1, n_2, \ldots, n_{c-1}) = \left( \frac{n!}{n_1!\, n_2! \cdots n_c!} \right) \pi_1^{n_1} \pi_2^{n_2} \cdots \pi_c^{n_c}. \tag{1.2}$$

Since $\sum_j n_j = n$, this is $(c-1)$-dimensional, with $n_c = n - (n_1 + \cdots + n_{c-1})$. The binomial distribution is the special case with $c = 2$. For the multinomial distribution,

$$E(n_j) = n\pi_j, \qquad \mathrm{var}(n_j) = n\pi_j(1 - \pi_j), \qquad \mathrm{cov}(n_j, n_k) = -n\pi_j\pi_k. \tag{1.3}$$

We derive the covariance in Section 14.1.4. The marginal distribution of each $n_j$ is binomial.

1.2.3 Poisson Distribution

Sometimes, count data do not result from a fixed number of trials. For instance, if $y$ = number of deaths due to automobile accidents on motorways in Italy during this coming week, there is no fixed upper limit $n$ for $y$ (as you are aware if you have driven in Italy). Since $y$ must be a nonnegative integer, its distribution should place its mass on that range. The simplest such distribution is the Poisson. Its probabilities depend on a single parameter, the mean $\mu$. The Poisson probability mass function (Poisson 1837, p. 206) is

$$p(y) = \frac{e^{-\mu} \mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots. \tag{1.4}$$

It satisfies $E(Y) = \mathrm{var}(Y) = \mu$. It is unimodal with mode equal to the integer part of $\mu$. Its skewness is described by $E(Y - \mu)^3/\sigma^3 = 1/\sqrt{\mu}$. The distribution approaches normality as $\mu$ increases.

The Poisson distribution is used for counts of events that occur randomly over time or space, when outcomes in disjoint periods or regions are independent. It also applies as an approximation for the binomial when $n$ is large and $\pi$ is small, with $\mu = n\pi$. So if each of the 50 million people driving in Italy next week is an independent trial with probability 0.000002 of dying in a fatal accident that week, the number of deaths $Y$ is a bin$(50000000, 0.000002)$ variate, or approximately Poisson with $\mu = n\pi = 50{,}000{,}000(0.000002) = 100$.

A key feature of the Poisson distribution is that its variance equals its mean. Sample counts vary more when their mean is higher. When the mean number of weekly fatal accidents equals 100, greater variability occurs in the weekly counts than when the mean equals 10.

1.2.4 Overdispersion

In practice, count observations often exhibit variability exceeding that predicted by the binomial or Poisson. This phenomenon is called overdispersion.
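As a baseline for judging overdispersion, the variability that the binomial and Poisson models themselves predict for the Italian driving example of Section 1.2.3 can be computed directly. The following is a minimal sketch using only the Python standard library; the helper names `log_binomial_pmf` and `log_poisson_pmf` and the cutoff of 300 are illustrative choices, not part of the text.

```python
import math

def log_binomial_pmf(y, n, pi):
    """Log of the binomial pmf (1.1); the coefficient is built from a short sum of logs."""
    log_coef = sum(math.log((n - y + k) / k) for k in range(1, y + 1))
    return log_coef + y * math.log(pi) + (n - y) * math.log1p(-pi)

def log_poisson_pmf(y, mu):
    """Log of the Poisson pmf (1.4)."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)

n, pi = 50_000_000, 0.000002       # values from the Italian driving example
mu = n * pi                        # Poisson mean, about 100
binomial_var = n * pi * (1 - pi)   # slightly below the mean
poisson_var = mu                   # Poisson variance equals its mean

# Largest pointwise gap between the two pmfs over y = 0, ..., 300
max_gap = max(
    abs(math.exp(log_binomial_pmf(y, n, pi)) - math.exp(log_poisson_pmf(y, mu)))
    for y in range(301)
)
print(mu, binomial_var, max_gap)
```

Under either model the variance is essentially equal to the mean of 100, so weekly counts whose variance is far larger than 100 would point to overdispersion.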
We assumed above that each person has the same probability of dying in a fatal accident in the next week. More realistically, these probabilities vary, due to factors such as amount of time spent driving, whether the person wears a seat belt, and geographical location. Such variation causes fatality counts to display more variation than predicted by the Poisson model.

Suppose that $Y$ is a random variable with variance $\mathrm{var}(Y \mid \mu)$ for given $\mu$, but $\mu$ itself varies because of unmeasured factors such as those just described. Let $\theta = E(\mu)$. Then unconditionally,

$$E(Y) = E[E(Y \mid \mu)], \qquad \mathrm{var}(Y) = E[\mathrm{var}(Y \mid \mu)] + \mathrm{var}[E(Y \mid \mu)].$$

When $Y$ is conditionally Poisson (given $\mu$), for instance, then $E(Y) = E(\mu) = \theta$ and $\mathrm{var}(Y) = E(\mu) + \mathrm{var}(\mu) = \theta + \mathrm{var}(\mu) > \theta$.

Assuming a Poisson distribution for a count variable is often too simplistic, because of factors that cause overdispersion. The negative binomial is a related distribution for count data that permits the variance to exceed the mean. We introduce it in Section 4.3.4.

Analyses assuming binomial (or multinomial) distributions are also sometimes invalid because of overdispersion. This might happen because the true distribution is a mixture of different binomial distributions, with the parameter varying because of unmeasured variables. To illustrate, suppose that an experiment exposes pregnant mice to a toxin and then after a week observes the number of fetuses in each mouse's litter that show signs of malformation. Let $n_i$ denote the number of fetuses in the litter for mouse $i$. The mice also vary according to other factors that may not be measured, such as their weight, overall health, and genetic makeup. Extra variation then occurs because of the variability from litter to litter in the probability $\pi$ of malformation. The distribution of the number of fetuses per litter showing malformations might cluster near 0 and near $n_i$, showing more dispersion than expected for binomial sampling with a single value of $\pi$. Overdispersion could also occur when $\pi$ varies among fetuses in a litter according to some distribution (Problem 1.12). In Chapters 4, 12, and 13 we introduce methods for data that are overdispersed relative to binomial and Poisson assumptions.
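The variance decomposition for a conditionally Poisson count can be checked on a small example. The sketch below assumes a hypothetical two-point distribution for the mean (50 or 150, each with probability 0.5), which is not from the text, and computes the mean and variance of the resulting mixture directly from its probability mass function. The function name `mixture_mean_var` and the truncation point are illustrative assumptions.

```python
import math

def poisson_pmf(y, mu):
    """Poisson pmf (1.4), evaluated on the log scale for numerical stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def mixture_mean_var(mus, weights, y_max=400):
    """Mean and variance of Y when mu equals mus[k] with probability weights[k]
    and Y given mu is Poisson(mu). Sums over y = 0..y_max (tail mass is negligible here)."""
    pmf = [sum(w * poisson_pmf(y, m) for m, w in zip(mus, weights))
           for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

# Hypothetical mixture: the mean is 50 or 150, each with probability 0.5
mus, weights = [50.0, 150.0], [0.5, 0.5]
theta = sum(w * m for m, w in zip(mus, weights))                  # E(mu)
var_mu = sum(w * (m - theta) ** 2 for m, w in zip(mus, weights))  # var(mu)

mean_y, var_y = mixture_mean_var(mus, weights)
print(mean_y, var_y, theta + var_mu)   # var_y should agree with theta + var(mu)
```

In this hypothetical case the unconditional mean stays at 100 while the variance is 26 times the mean, which a single Poisson distribution cannot reproduce.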
1.2.5 Connection between Poisson and Multinomial Distributions

In Italy this next week, let $y_1$ = number of people who die in automobile accidents, $y_2$ = number who die in airplane accidents, and $y_3$ = number who die in railway accidents. A Poisson model for $(Y_1, Y_2, Y_3)$ treats these as independent Poisson random variables, with parameters $(\mu_1, \mu_2, \mu_3)$. The joint probability mass function for $\{Y_i\}$ is the product of the three mass functions of form (1.4). The total $n = \sum Y_i$ also has a Poisson distribution, with parameter $\sum \mu_i$.

With Poisson sampling the total count $n$ is random rather than fixed. If we assume a Poisson model but condition on $n$, $\{Y_i\}$ no longer have Poisson distributions, since each $Y_i$ cannot exceed $n$. Given $n$, $\{Y_i\}$ are also no longer independent, since the value of one affects the possible range for the others.
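The effect of conditioning on the total can be made concrete numerically. The sketch below assumes hypothetical means of 100, 2, and 1 for the three kinds of accident and a hypothetical set of counts; none of these values come from the text. It compares the conditional probability of the counts given their total with a multinomial probability whose category probabilities are $\mu_i / \sum_j \mu_j$, which is the relationship derived next.

```python
import math

def poisson_pmf(y, mu):
    """Poisson pmf (1.4), evaluated on the log scale for numerical stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def conditional_prob(counts, mus):
    """P(Y_1 = n_1, ..., Y_c = n_c | sum of Y_j = n) for independent Poisson Y_i."""
    n = sum(counts)
    joint = math.prod(poisson_pmf(y, m) for y, m in zip(counts, mus))
    return joint / poisson_pmf(n, sum(mus))   # the total is Poisson with mean sum(mus)

def multinomial_pmf(counts, probs):
    """Multinomial pmf (1.2) with n = sum(counts)."""
    coef = math.factorial(sum(counts))
    for k in counts:
        coef //= math.factorial(k)            # exact integer division at every step
    return coef * math.prod(p ** k for p, k in zip(probs, counts))

mus = [100.0, 2.0, 1.0]    # hypothetical means: automobile, airplane, railway
counts = [95, 3, 1]        # hypothetical observed counts
probs = [m / sum(mus) for m in mus]

lhs = conditional_prob(counts, mus)
rhs = multinomial_pmf(counts, probs)
print(lhs, rhs)            # the two values should agree up to rounding
```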

For $c$ independent Poisson variates, with $E(Y_i) = \mu_i$, let's derive their conditional distribution given that $\sum Y_i = n$. The conditional probability of a set of counts $\{n_i\}$ satisfying this condition is

$$P\Big(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c \,\Big|\, \sum_j Y_j = n\Big) = \frac{P(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c)}{P(\sum_j Y_j = n)}$$

$$= \frac{\prod_i [\exp(-\mu_i)\, \mu_i^{n_i} / n_i!]}{\exp(-\sum_j \mu_j)\, (\sum_j \mu_j)^{n} / n!} = \frac{n!}{\prod_i n_i!} \prod_i \pi_i^{n_i}, \tag{1.5}$$

where $\{\pi_i = \mu_i / (\sum_j \mu_j)\}$. This is the multinomial $(n, \{\pi_i\})$ distribution, characterized by the sample size $n$ and the probabilities $\{\pi_i\}$.

Many categorical data analyses assume a multinomial distribution. Such analyses usually have the same parameter estimates as those of analyses assuming a Poisson distribution, because of the similarity in the likelihood functions.

1.3 STATISTICAL INFERENCE FOR CATEGORICAL DATA

The choice of distribution for the response variable is but one step of data analysis. In practice, that distribution has unknown parameter values. In this section we review methods of using sample data to make inferences about the parameters. Sections 1.4 and 1.5 cover binomial and multinomial parameters.

1.3.1 Likelihood Functions and Maximum Likelihood Estimation

In this book we use maximum likelihood for parameter estimation. Under weak regularity conditions, such as the parameter space having fixed dimension with true value falling in its interior, maximum likelihood estimators have desirable properties: They have large-sample normal distributions; they are asymptotically consistent, converging to the parameter as $n$ increases; and they are asymptotically efficient, producing large-sample standard errors no greater than those from other estimation methods.

Given the data, for a chosen probability distribution the likelihood function is the probability of those data, treated as a function of the unknown parameter. The maximum likelihood (ML) estimate is the parameter value that maximizes this function. This is the parameter value under which the data observed have the highest probability of occurrence. The parameter value that maximizes the likelihood function also maximizes the log of that function. It is simpler to maximize the log likelihood since it is a sum rather than a product of terms.
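To make the idea concrete, here is a minimal sketch that maximizes a log likelihood numerically. It assumes a hypothetical sample of seven independent Poisson counts (the data and the grid resolution are made up for illustration) and compares a brute-force grid search with the sample mean, which is the known closed-form ML estimate of a Poisson mean.

```python
import math

def poisson_log_likelihood(mu, ys):
    """Log likelihood of independent Poisson counts: a sum of log pmf terms (1.4)."""
    return sum(-mu + y * math.log(mu) - math.lgamma(y + 1) for y in ys)

ys = [3, 7, 4, 6, 5, 2, 8]                  # hypothetical counts
grid = [k / 100 for k in range(1, 2001)]    # candidate values 0.01, 0.02, ..., 20.00

# ML estimate by grid search over the log likelihood
mu_hat_grid = max(grid, key=lambda mu: poisson_log_likelihood(mu, ys))

# Closed-form ML estimate for the Poisson mean: the sample mean
mu_hat_closed = sum(ys) / len(ys)

print(mu_hat_grid, mu_hat_closed)
```

A grid search is used only because it is transparent; in practice one would solve the likelihood equation or use a numerical optimizer.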

We denote a parameter for a generic problem by $\beta$ and its ML estimate by $\hat{\beta}$. The likelihood function is $\ell(\beta)$ and the log-likelihood function is $L(\beta) = \log[\ell(\beta)]$. For many models, $L(\beta)$ has concave shape and $\hat{\beta}$ is the point at which the derivative equals 0. The ML estimate is then the solution of the likelihood equation, $\partial L(\beta)/\partial \beta = 0$. Often, $\beta$ is multidimensional, denoted by $\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}}$ is the solution of a set of likelihood equations.

Let SE denote the standard error of $\hat{\beta}$, and let $\mathrm{cov}(\hat{\boldsymbol{\beta}})$ denote the asymptotic covariance matrix of $\hat{\boldsymbol{\beta}}$. Under regularity conditions (Rao 1973, p. 364), $\mathrm{cov}(\hat{\boldsymbol{\beta}})$ is the inverse of the information matrix. The $(j, k)$ element of the information matrix is

$$-E\left( \frac{\partial^2 L(\boldsymbol{\beta})}{\partial \beta_j \, \partial \beta_k} \right). \tag{1.6}$$

The standard errors are the square roots of the diagonal elements for the inverse information matrix. The greater the curvature of the log likelihood, the smaller the standard errors. This is reasonable, since large curvature implies that the log likelihood drops quickly as $\beta$ moves away from $\hat{\beta}$; hence, the data would have been much more likely to occur if $\beta$ took a value near $\hat{\beta}$ rather than a value far from $\hat{\beta}$.

1.3.2 Likelihood Function and ML Estimate for Binomial Parameter

The part of a likelihood function involving the parameters is called the kernel. Since the maximization of the likelihood is with respect to the parameters, the rest is irrelevant.

To illustrate, consider the binomial distribution (1.1). The binomial coefficient $\binom{n}{y}$ has no influence on where the maximum occurs with respect to $\pi$. Thus, we ignore it and treat the kernel as the likelihood function. The binomial log likelihood is then

$$L(\pi) = \log[\pi^{y}(1 - \pi)^{n - y}] = y \log(\pi) + (n - y)\log(1 - \pi). \tag{1.7}$$

Differentiating with respect to $\pi$ yields

$$\partial L(\pi)/\partial \pi = y/\pi - (n - y)/(1 - \pi) = (y - n\pi)/\pi(1 - \pi). \tag{1.8}$$

Equating this to 0 gives the likelihood equation, which has solution $\hat{\pi} = y/n$, the sample proportion of successes for the $n$ trials.

Calculating $\partial^2 L(\pi)/\partial \pi^2$, taking the expectation, and combining terms, we get

$$-E[\partial^2 L(\pi)/\partial \pi^2] = E[y/\pi^2 + (n - y)/(1 - \pi)^2] = n/[\pi(1 - \pi)]. \tag{1.9}$$
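Equations (1.7) through (1.9) can be sanity-checked numerically. The sketch below assumes a hypothetical sample with 7 successes in 20 trials, which is not from the text. It confirms that the score vanishes at the sample proportion, that a finite-difference slope of the log likelihood matches (1.8), and that the expectation in (1.9), taken over the binomial distribution, reduces to the closed form.

```python
import math

def log_lik(pi, y, n):
    """Binomial log likelihood kernel, equation (1.7)."""
    return y * math.log(pi) + (n - y) * math.log(1 - pi)

def score(pi, y, n):
    """Derivative of the log likelihood, equation (1.8)."""
    return (y - n * pi) / (pi * (1 - pi))

def information(pi, n):
    """Closed-form information, right-hand side of equation (1.9)."""
    return n / (pi * (1 - pi))

def expected_curvature(pi, n):
    """E[Y/pi^2 + (n - Y)/(1 - pi)^2] with Y distributed bin(n, pi)."""
    return sum(
        math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)
        * (k / pi ** 2 + (n - k) / (1 - pi) ** 2)
        for k in range(n + 1)
    )

y, n = 7, 20                 # hypothetical data
pi_hat = y / n               # solution of the likelihood equation

h = 1e-6                     # step for a central finite difference at pi = 0.3
numeric_slope = (log_lik(0.3 + h, y, n) - log_lik(0.3 - h, y, n)) / (2 * h)

print(score(pi_hat, y, n), numeric_slope, score(0.3, y, n))
print(expected_curvature(0.3, n), information(0.3, n))
```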

Thus, the asymptotic variance of $\hat{\pi}$ is $\pi(1 - \pi)/n$. This is no surprise. Since $E(Y) = n\pi$ and $\mathrm{var}(Y) = n\pi(1 - \pi)$, the distribution of $\hat{\pi} = Y/n$ has mean and standard error

$$E(\hat{\pi}) = \pi, \qquad \sigma(\hat{\pi}) = \sqrt{\frac{\pi(1 - \pi)}{n}}.$$

1.3.3 Wald-Likelihood Ratio-Score Test Triad

Three standard ways exist to use the likelihood function to perform large-sample inference. We introduce these for a significance test of a null hypothesis $H_0\colon \beta = \beta_0$ and then discuss their relation to interval estimation. They all exploit the large-sample normality of ML estimators.

With nonnull standard error SE of $\hat{\beta}$, the test statistic

$$z = (\hat{\beta} - \beta_0)/\mathrm{SE}$$

has an approximate standard normal distribution when $\beta = \beta_0$. One refers $z$ to the standard normal table to obtain one- or two-sided P-values. Equivalently, for the two-sided alternative, $z^2$ has a chi-squared null distribution with 1 degree of freedom (df); the P-value is then the right-tailed chi-squared probability above the observed value. This type of statistic, using the nonnull standard error, is called a Wald statistic (Wald 1943).

The multivariate extension for the Wald test of $H_0\colon \boldsymbol{\beta} = \boldsymbol{\beta}_0$ has test statistic

$$W = (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0)' [\mathrm{cov}(\hat{\boldsymbol{\beta}})]^{-1} (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}_0).$$

(The prime on a vector or matrix denotes the transpose.) The nonnull covariance is based on the curvature (1.6) of the log likelihood at $\hat{\boldsymbol{\beta}}$. The asymptotic multivariate normal distribution for $\hat{\boldsymbol{\beta}}$ implies an asymptotic chi-squared distribution for $W$. The df equal the rank of $\mathrm{cov}(\hat{\boldsymbol{\beta}})$, which is the number of nonredundant parameters in $\boldsymbol{\beta}$.

A second general-purpose method uses the likelihood function through the ratio of two maximizations: (1) the maximum over the possible parameter values under $H_0$, and (2) the maximum over the larger set of parameter values permitting $H_0$ or an alternative $H_a$ to be true. Let $\ell_0$ denote the maximized value of the likelihood function under $H_0$, and let $\ell_1$ denote the maximized value generally (i.e., under $H_0 \cup H_a$). For instance, for parameter vector $\boldsymbol{\beta} = (\beta_0, \beta_1)'$ and $H_0\colon \beta_0 = 0$, $\ell_1$ is the likelihood function calculated at the $\boldsymbol{\beta}$ value for which the data would have been most likely; $\ell_0$ is the likelihood function calculated at the $\beta_1$ value for which the data would have been most likely, when $\beta_0 = 0$. Then $\ell_1$ is always at least as large as $\ell_0$, since $\ell_0$ results from maximizing over a restricted set of the parameter values.

The ratio $\Lambda = \ell_0/\ell_1$ of the maximized likelihoods cannot exceed 1. Wilks (1935, 1938) showed that $-2 \log \Lambda$ has a limiting null chi-squared distribution, as $n \to \infty$. The df equal the difference in the dimensions of the parameter spaces under $H_0 \cup H_a$ and under $H_0$. The likelihood-ratio test statistic equals

$$-2 \log \Lambda = -2 \log(\ell_0/\ell_1) = -2(L_0 - L_1),$$

where $L_0$ and $L_1$ denote the maximized log-likelihood functions.

The third method uses the score statistic, due to R. A. Fisher and C. R. Rao. The score test is based on the slope and expected curvature of the log-likelihood function $L(\beta)$ at the null value $\beta_0$. It utilizes the size of the score function

$$u(\beta) = \partial L(\beta)/\partial \beta,$$

evaluated at $\beta_0$. The value $u(\beta_0)$ tends to be larger in absolute value when $\hat{\beta}$ is farther from $\beta_0$. Denote $-E[\partial^2 L(\beta)/\partial \beta^2]$ (i.e., the information) evaluated at $\beta_0$ by $\iota(\beta_0)$. The score statistic is the ratio of $u(\beta_0)$ to its null SE, which is $[\iota(\beta_0)]^{1/2}$. This has an approximate standard normal null distribution. The chi-squared form of the score statistic is

$$\frac{[u(\beta_0)]^2}{\iota(\beta_0)} = \frac{[\partial L(\beta)/\partial \beta_0]^2}{-E[\partial^2 L(\beta)/\partial \beta_0^2]},$$

where the partial derivative notation reflects derivatives with respect to $\beta$ that are evaluated at $\beta_0$. In the multiparameter case, the score statistic is a quadratic form based on the vector of partial derivatives of the log likelihood with respect to $\boldsymbol{\beta}$ and the inverse information matrix, both evaluated at the $H_0$ estimates (i.e., assuming that $\boldsymbol{\beta} = \boldsymbol{\beta}_0$).

Figure 1.1 is a generic plot of a log-likelihood $L(\beta)$ for the univariate case. It illustrates the three tests of $H_0\colon \beta = 0$. The Wald test uses the behavior of $L(\beta)$ at the ML estimate $\hat{\beta}$, having chi-squared form $(\hat{\beta}/\mathrm{SE})^2$. The SE of $\hat{\beta}$ depends on the curvature of $L(\beta)$ at $\hat{\beta}$. The score test is based on the slope and curvature of $L(\beta)$ at $\beta = 0$. The likelihood-ratio test combines information about $L(\beta)$ at both $\hat{\beta}$ and $\beta_0 = 0$. It compares the log-likelihood values $L_1$ at $\hat{\beta}$ and $L_0$ at $\beta_0 = 0$ using the chi-squared statistic $-2(L_0 - L_1)$. In Figure 1.1, this statistic is twice the vertical distance between values of $L(\beta)$ at $\hat{\beta}$ and at 0. In a sense, this statistic uses the most information of the three types of test statistic and is the most versatile.

As $n \to \infty$, the Wald, likelihood-ratio, and score tests have certain asymptotic equivalences (Cox and Hinkley 1974, Sec. 9.3). For small to moderate sample sizes, the likelihood-ratio test is usually more reliable than the Wald test.
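The three chi-squared statistics can be compared side by side for a single binomial parameter, using the log likelihood (1.7), score (1.8), and information (1.9) from Section 1.3.2. The sketch below assumes hypothetical data (7 successes in 20 trials) and a hypothetical null value of 0.5; the function names are illustrative. P-values use the identity that a chi-squared variate with 1 df is the square of a standard normal variate.

```python
import math

def chi2_1_pvalue(x):
    """Right-tail probability of a chi-squared variate with 1 df: P(Z^2 > x)."""
    return math.erfc(math.sqrt(x / 2))

def binomial_test_triad(y, n, pi0):
    """Chi-squared forms of the Wald, likelihood-ratio, and score statistics
    for H0: pi = pi0. Assumes 0 < y < n so that all logs and SEs are defined."""
    pi_hat = y / n

    # Wald: squared distance from pi0 in units of the nonnull SE (evaluated at pi_hat)
    wald = (pi_hat - pi0) ** 2 / (pi_hat * (1 - pi_hat) / n)

    # Likelihood ratio: -2(L0 - L1), with L from (1.7)
    def log_lik(p):
        return y * math.log(p) + (n - y) * math.log(1 - p)
    lr = -2 * (log_lik(pi0) - log_lik(pi_hat))

    # Score: squared score (1.8) divided by the information (1.9), both at pi0
    u = (y - n * pi0) / (pi0 * (1 - pi0))
    info = n / (pi0 * (1 - pi0))
    score = u ** 2 / info

    return wald, lr, score

wald, lr, score = binomial_test_triad(y=7, n=20, pi0=0.5)   # hypothetical data
for name, stat in (("Wald", wald), ("LR", lr), ("score", score)):
    print(name, round(stat, 3), round(chi2_1_pvalue(stat), 3))
```

With these made-up values the three statistics are close but not identical, which is typical for a moderate sample away from the boundary of the parameter space.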

FIGURE 1.1 Log-likelihood function and information used in three tests of $H_0\colon \beta = 0$.

1.3.4 Constructing Confidence Intervals

In practice, it is more informative to construct confidence intervals for parameters than to test hypotheses about their values. For any of the three test methods, a confidence interval results from inverting the test. For instance, a 95% confidence interval for $\beta$ is the set of $\beta_0$ for which the test of $H_0\colon \beta = \beta_0$ has a P-value exceeding 0.05.

Let $z_a$ denote the z-score from the standard normal distribution having right-tailed probability $a$; this is the $100(1 - a)$ percentile of that distribution. Let $\chi^2_{df}(a)$ denote the $100(1 - a)$ percentile of the chi-squared distribution with degrees of freedom df. $100(1 - \alpha)\%$ confidence intervals based on asymptotic normality use $z_{\alpha/2}$, for instance $z_{0.025} = 1.96$ for 95% confidence. The Wald confidence interval is the set of $\beta_0$ for which $|\hat{\beta} - \beta_0|/\mathrm{SE} < z_{\alpha/2}$. This gives the interval $\hat{\beta} \pm z_{\alpha/2}(\mathrm{SE})$. The likelihood-ratio-based confidence interval is the set of $\beta_0$ for which $-2[L(\beta_0) - L(\hat{\beta})] < \chi^2_1(\alpha)$. [Recall that $\chi^2_1(\alpha) = z^2_{\alpha/2}$.]

When $\hat{\beta}$ has a normal distribution, the log-likelihood function has a parabolic shape (i.e., a second-degree polynomial). For small samples with categorical data, $\hat{\beta}$ may be far from normality and the log-likelihood function can be far from a symmetric, parabolic-shaped curve. This can also happen with moderate to large samples when a model contains many parameters. In such cases, inference based on asymptotic normality of $\hat{\beta}$ may have inadequate performance. A marked divergence in results of Wald and likelihood-ratio inference indicates that the distribution of $\hat{\beta}$ may not be close to normality. The example in Section 1.4.3 illustrates this with quite different confidence intervals for different methods. In many such cases, inference can instead utilize an exact small-sample distribution or higher-order asymptotic methods that improve on simple normality (e.g., Pierce and Peters 1992).

The Wald confidence interval is most common in practice because it is simple to construct using ML estimates and standard errors reported by statistical software. The likelihood-ratio-based interval is becoming more widely available in software and is preferable for categorical data with small to moderate $n$. For the best known statistical model, regression for a normal response, the three types of inference necessarily provide identical results.

1.4 STATISTICAL INFERENCE FOR BINOMIAL PARAMETERS

In this section we illustrate inference methods for categorical data by presenting tests and confidence intervals for the binomial parameter $\pi$, based on $y$ successes in $n$ independent trials. In Section 1.3.2 we obtained the likelihood function and ML estimator $\hat{\pi} = y/n$ of $\pi$.

1.4.1 Tests about a Binomial Parameter

Consider $H_0\colon \pi = \pi_0$. Since $H_0$ has a single parameter, we use the normal rather than chi-squared forms of Wald and score test statistics. They permit tests against one-sided as well as two-sided alternatives. The Wald statistic is

$$z_W = \frac{\hat{\pi} - \pi_0}{\mathrm{SE}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\hat{\pi}(1 - \hat{\pi})/n}}. \tag{1.10}$$

Evaluating the binomial score (1.8) and information (1.9) at $\pi_0$ yields

$$u(\pi_0) = \frac{y}{\pi_0} - \frac{n - y}{1 - \pi_0}, \qquad \iota(\pi_0) = \frac{n}{\pi_0(1 - \pi_0)}.$$

The normal form of the score statistic simplifies to

$$z_S = \frac{u(\pi_0)}{[\iota(\pi_0)]^{1/2}} = \frac{y - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}. \tag{1.11}$$

Whereas the Wald statistic $z_W$ uses the standard error evaluated at $\hat{\pi}$, the score statistic $z_S$ uses it evaluated at $\pi_0$. The score statistic is preferable, as it uses the actual null SE rather than an estimate. Its null sampling distribution is closer to standard normal than that of the Wald statistic.

The binomial log-likelihood function (1.7) equals $L_0 = y \log \pi_0 + (n - y)\log(1 - \pi_0)$ under $H_0$ and $L_1 = y \log \hat{\pi} + (n - y)\log(1 - \hat{\pi})$ more generally. The likelihood-ratio test statistic simplifies to

$$-2(L_0 - L_1) = 2\left( y \log\frac{\hat{\pi}}{\pi_0} + (n - y)\log\frac{1 - \hat{\pi}}{1 - \pi_0} \right).$$

Expressed as

$$-2(L_0 - L_1) = 2\left( y \log\frac{y}{n\pi_0} + (n - y)\log\frac{n - y}{n - n\pi_0} \right),$$

it compares observed success and failure counts to fitted (i.e., null) counts by

$$2 \sum \text{observed} \, \log\frac{\text{observed}}{\text{fitted}}. \tag{1.12}$$

We'll see that this formula also holds for tests about Poisson and multinomial parameters. Since no unknown parameters occur under $H_0$ and one occurs under $H_a$, (1.12) has an asymptotic chi-squared distribution with df = 1.

1.4.2 Confidence Intervals for a Binomial Parameter

A significance test merely indicates whether a particular $\pi$ value (such as $\pi = 0.5$) is plausible. We learn more by using a confidence interval to determine the range of plausible values.

Inverting the Wald test statistic gives the interval of $\pi_0$ values for which $|z_W| < z_{\alpha/2}$, or

$$\hat{\pi} \pm z_{\alpha/2} \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}}. \tag{1.13}$$

Historically, this was one of the first confidence intervals used for any parameter (Laplace 1812, p. 283). Unfortunately, it performs poorly unless $n$ is very large (e.g., Brown et al. 2001). The actual coverage probability usually falls below the nominal confidence coefficient, much below when $\pi$ is near 0 or 1. A simple adjustment that adds $\tfrac{1}{2} z^2_{\alpha/2}$ observations of each type to the sample before using this formula performs much better (Problem 1.24).

The score confidence interval contains $\pi_0$ values for which $|z_S| < z_{\alpha/2}$. Its endpoints are the $\pi_0$ solutions to the equations

$$(\hat{\pi} - \pi_0)\big/\sqrt{\pi_0(1 - \pi_0)/n} = \pm z_{\alpha/2}.$$

These are quadratic in $\pi_0$. First discussed by E. B. Wilson (1927), this interval is

$$\hat{\pi}\left( \frac{n}{n + z^2_{\alpha/2}} \right) + \frac{1}{2}\left( \frac{z^2_{\alpha/2}}{n + z^2_{\alpha/2}} \right) \pm z_{\alpha/2} \sqrt{ \frac{1}{n + z^2_{\alpha/2}} \left[ \hat{\pi}(1 - \hat{\pi}) \left( \frac{n}{n + z^2_{\alpha/2}} \right) + \left( \frac{1}{2} \right)\left( \frac{1}{2} \right) \left( \frac{z^2_{\alpha/2}}{n + z^2_{\alpha/2}} \right) \right] }.$$

The midpoint $\tilde{\pi}$ of the interval is a weighted average of $\hat{\pi}$ and $\tfrac{1}{2}$, where the weight $n/(n + z^2_{\alpha/2})$ given $\hat{\pi}$ increases as $n$ increases. Combining terms, this midpoint equals $\tilde{\pi} = (y + z^2_{\alpha/2}/2)/(n + z^2_{\alpha/2})$. This is the sample proportion for an adjusted sample that adds $z^2_{\alpha/2}$ observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $\pi = \hat{\pi}$ and the variance of a sample proportion when $\pi = \tfrac{1}{2}$, using the adjusted sample size $n + z^2_{\alpha/2}$ in place of $n$. This interval has much better performance than the Wald interval.

The likelihood-ratio-based confidence interval is more complex computationally, but simple in principle. It is the set of $\pi_0$ for which the likelihood-ratio test has a P-value exceeding $\alpha$. Equivalently, it is the set of $\pi_0$ for which double the log likelihood drops by less than $\chi^2_1(\alpha)$ from its value at the ML estimate $\hat{\pi} = y/n$.

1.4.3 Proportion of Vegetarians Example

To collect data in an introductory statistics course, recently I gave the students a questionnaire. One question asked each student whether he or she was a vegetarian. Of $n = 25$ students, $y = 0$ answered "yes." They were not a random sample of a particular population, but we use these data to illustrate 95% confidence intervals for a binomial parameter $\pi$.

Since $y = 0$, $\hat{\pi} = 0/25 = 0$. Using the Wald approach, the 95% confidence interval for $\pi$ is

$$0 \pm 1.96\sqrt{(0.0 \times 1.0)/25}, \quad \text{or } (0, 0).$$

When the observation falls at the boundary of the sample space, often Wald methods do not provide sensible answers.

By contrast, the 95% score interval equals (0.0, 0.133). This is a more believable inference. For $H_0\colon \pi = 0.5$, for instance, the score test statistic is $z_S = (0 - 0.5)/\sqrt{(0.5 \times 0.5)/25} = -5.0$, so 0.5 does not fall in the interval. By contrast, for $H_0\colon \pi = 0.10$, $z_S = (0 - 0.10)/\sqrt{(0.10 \times 0.90)/25} = -1.67$, so 0.10 falls in the interval.
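The Wald and score results for the vegetarian data can be reproduced with a few lines of code. The following is a minimal sketch using the standard closed-form (Wilson) solution for the score interval endpoints; the function names are illustrative, and 1.96 is used for the normal quantile as in the text.

```python
import math

def wald_interval(y, n, z=1.96):
    """Wald interval (1.13): pi_hat plus or minus z times the estimated SE."""
    pi_hat = y / n
    half = z * math.sqrt(pi_hat * (1 - pi_hat) / n)
    return pi_hat - half, pi_hat + half

def score_interval(y, n, z=1.96):
    """Closed-form (Wilson) solution of |pi_hat - pi0| / sqrt(pi0 (1 - pi0) / n) = z."""
    pi_hat = y / n
    mid = (y + z ** 2 / 2) / (n + z ** 2)
    half = z * math.sqrt(n * pi_hat * (1 - pi_hat) + z ** 2 / 4) / (n + z ** 2)
    # Clamping only guards against rounding; the exact endpoints lie in [0, 1]
    return max(0.0, mid - half), min(1.0, mid + half)

def z_score_stat(y, n, pi0):
    """Normal form of the score statistic (1.11)."""
    return (y / n - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

y, n = 0, 25                                   # vegetarian survey data from the text
wald_lo, wald_hi = wald_interval(y, n)
score_lo, score_hi = score_interval(y, n)

print((wald_lo, wald_hi))                      # degenerate interval at 0
print((round(score_lo, 3), round(score_hi, 3)))
print(z_score_stat(y, n, 0.5), z_score_stat(y, n, 0.10))
```

Values of the null proportion whose score statistic is smaller than 1.96 in absolute value fall inside the score interval, which is how 0.10 is retained and 0.5 is excluded.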

When $y = 0$ and $n = 25$, the kernel of the likelihood function is $\ell(\pi) = \pi^{0}(1 - \pi)^{25} = (1 - \pi)^{25}$. The log likelihood (1.7) is $L(\pi) = 25 \log(1 - \pi)$. Note that $L(\hat{\pi}) = L(0) = 0$. The 95% likelihood-ratio confidence interval is the set of $\pi_0$ for which the likelihood-ratio statistic

$$-2(L_0 - L_1) = -2[L(\pi_0) - L(\hat{\pi})] = -50 \log(1 - \pi_0) \le \chi^2_1(0.05) = 3.84.$$

The upper bound is $1 - \exp(-3.84/50) = 0.074$, and the confidence interval equals (0.0, 0.074). [In this book, we use the natural logarithm throughout, so its inverse is the exponential function $\exp(x) = e^{x}$.] Figure 1.2 shows the likelihood and log-likelihood functions and the corresponding confidence region for $\pi$.

The three large-sample methods yield quite different results. When $\pi$ is near 0, the sampling distribution of $\hat{\pi}$ is highly skewed to the right for small $n$. It is worth considering alternative methods not requiring asymptotic approximations.

FIGURE 1.2 Binomial likelihood and log likelihood when $y = 0$ in $n = 25$ trials, and confidence interval for $\pi$.
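The likelihood-ratio bound above has a closed form only because $y = 0$; in general the endpoints must be found numerically. The sketch below is one possible approach, assuming simple bisection and the critical value 3.84 quoted in the text. It writes the statistic in the observed-versus-fitted form (1.12), and the function names are illustrative.

```python
import math

def lr_statistic(y, n, pi0):
    """-2(L0 - L1) written as 2 * sum(observed * log(observed / fitted)), as in (1.12).
    A zero observed count contributes 0 to the sum."""
    total = 0.0
    for observed, fitted in ((y, n * pi0), (n - y, n * (1 - pi0))):
        if observed > 0:
            total += observed * math.log(observed / fitted)
    return 2 * total

def lr_upper_bound(y, n, crit=3.84, iters=100):
    """Bisection for the pi0 above y/n at which the LR statistic reaches crit.
    Assumes y < n, so the statistic increases from 0 as pi0 moves from y/n toward 1."""
    lo, hi = y / n, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if lr_statistic(y, n, mid) < crit:
            lo = mid      # still inside the interval: move right
        else:
            hi = mid      # outside the interval: move left
    return (lo + hi) / 2

y, n = 0, 25                                   # vegetarian survey data from the text
upper_numeric = lr_upper_bound(y, n)
upper_closed = 1 - math.exp(-3.84 / 50)        # closed form available when y = 0

print(round(upper_numeric, 3), round(upper_closed, 3))
```

A lower endpoint for data with $y > 0$ could be found the same way by bisecting on the other side of the sample proportion.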

1.4.4 Exact Small-Sample Inference*¹

With modern computational power, it is not necessary to rely on large-sample approximations for the distribution of statistics such as $\hat{\pi}$. Tests and confidence intervals can use the binomial distribution directly rather than its normal approximation. Such inferences occur naturally for small samples, but apply for any $n$.

We illustrate by testing $H_0\colon \pi = 0.5$ against $H_a\colon \pi \ne 0.5$ for the survey results on vegetarianism, $y = 0$ with $n = 25$. We noted that the score statistic equals $z = -5.0$. The exact P-value for this statistic, based on the null bin(25, 0.5) distribution, is

$$P(|z| \ge 5.0) = P(Y = 0 \text{ or } Y = 25) = 0.5^{25} + 0.5^{25} = 0.00000006.$$

$100(1 - \alpha)\%$ confidence intervals consist of all $\pi_0$ for which P-values exceed $\alpha$ in exact binomial tests. The best known interval (Clopper and Pearson 1934) uses the tail method for forming confidence intervals. It requires each one-sided P-value to exceed $\alpha/2$. The lower and upper endpoints are the solutions in $\pi_0$ to the equations

$$\sum_{k=y}^{n}\binom{n}{k}\pi_0^k(1 - \pi_0)^{n-k} = \alpha/2 \quad\text{and}\quad \sum_{k=0}^{y}\binom{n}{k}\pi_0^k(1 - \pi_0)^{n-k} = \alpha/2,$$

except that the lower bound is 0 when $y = 0$ and the upper bound is 1 when $y = n$. When $y = 1, 2, \ldots, n - 1$, from connections between binomial sums and the incomplete beta function and related cumulative distribution functions (cdf's) of beta and F distributions, the confidence interval equals

$$\left[1 + \frac{n - y + 1}{y\,F_{2y,\,2(n-y+1)}(1 - \alpha/2)}\right]^{-1} < \pi < \left[1 + \frac{n - y}{(y + 1)\,F_{2(y+1),\,2(n-y)}(\alpha/2)}\right]^{-1},$$

where $F_{a,b}(c)$ denotes the $1 - c$ quantile from the F distribution with degrees of freedom $a$ and $b$. When $y = 0$ with $n = 25$, the Clopper-Pearson 95% confidence interval for $\pi$ is $(0.0, 0.137)$.

In principle this approach seems ideal. However, there is a serious complication. Because of discreteness, the actual coverage probability for any $\pi$ is at least as large as the nominal confidence level (Casella and Berger 2001, p. 434; Neyman 1935) and it can be much greater. Similarly, for a test of $H_0\colon \pi = \pi_0$ at a fixed desired size $\alpha$ such as 0.05, it is not usually possible to achieve that size.
There is a finite number of possible samples, and hence a finite number of possible P-values, of which 0.05 may not be one. In testing $H_0$ with fixed $\pi_0$, one can pick a particular $\alpha$ that can occur as a P-value.

¹ Sections marked with an asterisk are less important for an overview.
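The exact binomial P-value and the Clopper-Pearson endpoints for the vegetarianism data can be reproduced by solving the two tail equations directly. The sketch below is illustrative only: it uses bisection on the binomial tail sums instead of F quantiles, and the function names (`binom_pmf`, `upper_tail`, `lower_tail`, `bisect_root`, `clopper_pearson`) are assumptions made for this example.

```python
import math

def binom_pmf(k, n, pi):
    """P(Y = k) for Y ~ bin(n, pi)."""
    return math.comb(n, k) * pi**k * (1 - pi)**(n - k)

def upper_tail(y, n, pi):
    """P(Y >= y)."""
    return sum(binom_pmf(k, n, pi) for k in range(y, n + 1))

def lower_tail(y, n, pi):
    """P(Y <= y)."""
    return sum(binom_pmf(k, n, pi) for k in range(0, y + 1))

def bisect_root(f, lo, hi, tol=1e-12):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(y, n, alpha=0.05):
    """Tail-method interval: each one-sided P-value must exceed alpha / 2."""
    lower = 0.0 if y == 0 else bisect_root(
        lambda p: upper_tail(y, n, p) - alpha / 2, 0.0, 1.0)
    upper = 1.0 if y == n else bisect_root(
        lambda p: lower_tail(y, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

# Two-sided exact P-value for H0: pi = 0.5 with y = 0, n = 25.
exact_p = binom_pmf(0, 25, 0.5) + binom_pmf(25, 25, 0.5)
print(exact_p)                  # about 6e-8
print(clopper_pearson(0, 25))   # upper limit near 0.137
```

For $y = 0$ the upper equation reduces to $(1 - \pi_0)^{25} = 0.025$, so the numerical answer can be checked against $1 - 0.025^{1/25}$.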

For interval estimation, however, choosing an attainable $\alpha$ in this way is not an option. This is because constructing the interval corresponds to inverting an entire range of $\pi_0$ values in $H_0\colon \pi = \pi_0$, and each distinct $\pi_0$ value can have its own set of possible P-values; that is, there is not a single null parameter value $\pi_0$ as in one test. For any fixed parameter value, the actual coverage probability can be much larger than the nominal confidence level.

When $n = 25$, Figure 1.3 plots the coverage probabilities as a function of $\pi$ for the Clopper-Pearson method, the score method, and the Wald method. At a fixed $\pi$ value with a given method, the coverage probability is the sum of the binomial probabilities of all those samples for which the resulting interval contains that $\pi$. There are 26 possible samples and 26 corresponding confidence intervals, so the coverage probability is a sum of somewhere between 0 and 26 binomial probabilities. As $\pi$ moves from 0 to 1, this coverage probability jumps up or down whenever $\pi$ moves into or out of one of these intervals. Figure 1.3 shows that coverage probabilities are too low for the Wald method, whereas the Clopper-Pearson method errs in the opposite direction. The score method behaves well, except for some $\pi$ values close to 0 or 1. Its coverage probabilities tend to be near the nominal level, not being consistently conservative or liberal. This is a good method unless $\pi$ is very close to 0 or 1 (Problem 1.23).

FIGURE 1.3 Plot of coverage probabilities for nominal 95% confidence intervals for binomial parameter $\pi$ when $n = 25$.

In discrete problems using small-sample distributions, shorter confidence intervals usually result from inverting a single two-sided test rather than two one-sided tests. The interval is then the set of parameter values for which the P-value of a two-sided test exceeds $\alpha$. For the binomial parameter, see Blaker (2000), Blyth and Still (1983), and Sterne (1954) for methods. For observed outcome $y_o$, with Blaker's approach the P-value is the minimum of the two one-tailed binomial probabilities $P(Y \ge y_o)$ and $P(Y \le y_o)$ plus an attainable probability in the other tail that is as close as possible to, but not greater than, that one-tailed probability. The interval is computationally more complex, although available in software (Blaker gave S-Plus functions). The result is still conservative, but less so than the Clopper-Pearson interval. For the vegetarianism example, the 95% confidence interval using the Blaker exact method is $(0.0, 0.128)$ compared to the Clopper-Pearson interval of $(0.0, 0.137)$.

1.4.5 Inference Based on the Mid-P-Value*

To adjust for discreteness in small-sample distributions, one can base inference on the mid-P-value (Lancaster 1961). For a test statistic $T$ with observed value $t_o$ and one-sided $H_a$ such that large $T$ contradicts $H_0$,

$$\text{mid-P-value} = \tfrac{1}{2}P(T = t_o) + P(T > t_o),$$

with probabilities calculated from the null distribution. Thus, the mid-P-value is less than the ordinary P-value by half the probability of the observed result. Compared to the ordinary P-value, the mid-P-value behaves more like the P-value for a test statistic having a continuous distribution. The sum of its two one-sided P-values equals 1.0. Although discrete, under $H_0$ its null distribution is more like the uniform distribution that occurs in the continuous case. For instance, it has a null expected value of 0.5, whereas this expected value exceeds 0.5 for the ordinary P-value for a discrete test statistic.

Unlike an exact test with ordinary P-value, a test using the mid-P-value does not guarantee that the probability of type I error is no greater than a nominal value (Problem 1.19). However, it usually performs well, typically being a bit conservative.
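The mid-P-value definition above can be illustrated with a binomial test statistic. This is a sketch under stated assumptions: the function names are hypothetical, and the numerical case ($t_o = 8$ successes in $n = 10$ trials with $\pi_0 = 0.5$) is an invented illustration, not data from the text.

```python
import math

def binom_pmf(k, n, pi):
    """P(T = k) for T ~ bin(n, pi)."""
    return math.comb(n, k) * pi**k * (1 - pi)**(n - k)

def right_tail_p_values(t_obs, n, pi0):
    """Ordinary and mid-P-values when large T contradicts H0."""
    p_equal = binom_pmf(t_obs, n, pi0)
    p_greater = sum(binom_pmf(k, n, pi0) for k in range(t_obs + 1, n + 1))
    return p_equal + p_greater, 0.5 * p_equal + p_greater

def left_tail_mid_p(t_obs, n, pi0):
    """Mid-P-value for the opposite one-sided alternative (small T contradicts H0)."""
    p_equal = binom_pmf(t_obs, n, pi0)
    p_less = sum(binom_pmf(k, n, pi0) for k in range(0, t_obs))
    return 0.5 * p_equal + p_less

# Hypothetical illustration: t_o = 8 successes in n = 10 trials, H0: pi = 0.5.
ordinary, mid_p = right_tail_p_values(8, 10, 0.5)
print(ordinary, mid_p)
# The two one-sided mid-P-values sum to 1, unlike the ordinary P-values.
print(mid_p + left_tail_mid_p(8, 10, 0.5))
```

The difference `ordinary - mid_p` equals half the null probability of the observed result, which is the adjustment the definition describes.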
A test using the mid-P-value is less conservative than the ordinary exact test. Similarly, one can form less conservative confidence intervals by inverting tests using the exact distribution with the mid-P-value (e.g., the 95% confidence interval is the set of parameter values for which the mid-P-value exceeds 0.05).

For testing $H_0\colon \pi = 0.5$ against $H_a\colon \pi \ne 0.5$ in the example about the proportion of vegetarians, with $y = 0$ for $n = 25$, the result observed is the most extreme possible. Thus the mid-P-value is half the ordinary P-value, or 0.00000003. Using the Clopper-Pearson inversion of the exact binomial test but with the mid-P-value yields a 95% confidence interval of $(0.000, 0.113)$ for $\pi$, compared to $(0.000, 0.137)$ for the ordinary Clopper-Pearson interval.

The mid-P-value seems a sensible compromise between having overly conservative inference and using irrelevant randomization to eliminate problems from discreteness. We recommend it both for tests and confidence intervals with highly discrete distributions.

1.5 STATISTICAL INFERENCE FOR MULTINOMIAL PARAMETERS

We now present inference for multinomial parameters $\{\pi_j\}$. Of $n$ observations, $n_j$ occur in category $j$, $j = 1, \ldots, c$.

1.5.1 Estimation of Multinomial Parameters

First, we obtain ML estimates of $\{\pi_j\}$. As a function of $\{\pi_j\}$, the multinomial probability mass function (1.2) is proportional to the kernel

$$\prod_j \pi_j^{n_j} \quad\text{where all } \pi_j \ge 0 \text{ and } \sum_j \pi_j = 1. \qquad (1.14)$$

The ML estimates are the $\{\pi_j\}$ that maximize (1.14).

The multinomial log-likelihood function is

$$L(\boldsymbol{\pi}) = \sum_j n_j \log \pi_j.$$

To eliminate redundancies, we treat $L$ as a function of $(\pi_1, \ldots, \pi_{c-1})$, since $\pi_c = 1 - (\pi_1 + \cdots + \pi_{c-1})$. Thus, $\partial\pi_c/\partial\pi_j = -1$, $j = 1, \ldots, c - 1$. Since

$$\frac{\partial \log \pi_c}{\partial \pi_j} = \frac{1}{\pi_c}\frac{\partial \pi_c}{\partial \pi_j} = -\frac{1}{\pi_c},$$

differentiating $L(\boldsymbol{\pi})$ with respect to $\pi_j$ gives the likelihood equation

$$\frac{\partial L(\boldsymbol{\pi})}{\partial \pi_j} = \frac{n_j}{\pi_j} - \frac{n_c}{\pi_c} = 0.$$

The ML solution satisfies $\hat{\pi}_j/\hat{\pi}_c = n_j/n_c$. Now

$$\sum_j \hat{\pi}_j = 1 = \frac{\hat{\pi}_c\left(\sum_j n_j\right)}{n_c} = \frac{\hat{\pi}_c\, n}{n_c},$$

so $\hat{\pi}_c = n_c/n$ and then $\hat{\pi}_j = n_j/n$. From general results presented later in the book (Section 8.6), this solution does maximize the likelihood. Thus, the ML estimates of $\{\pi_j\}$ are the sample proportions.
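As an informal numerical check on the result that the sample proportions maximize the multinomial likelihood, one can compare the log likelihood at $\hat{\pi}_j = n_j/n$ with its value at many other probability vectors. The counts below are hypothetical, the function names are illustrative, and a random search of this kind is only a sanity check, not a proof.

```python
import math
import random

def multinomial_log_lik(probs, counts):
    """Log of the multinomial kernel: sum of n_j * log(pi_j)."""
    return sum(n_j * math.log(p_j) for n_j, p_j in zip(counts, probs) if n_j > 0)

def ml_estimates(counts):
    """Sample proportions n_j / n."""
    n = sum(counts)
    return [n_j / n for n_j in counts]

def random_probs(c, rng):
    """A random point in the interior of the probability simplex."""
    draws = [rng.random() + 1e-9 for _ in range(c)]
    total = sum(draws)
    return [d / total for d in draws]

counts = [12, 7, 31]                     # hypothetical counts, n = 50
pi_hat = ml_estimates(counts)
best = multinomial_log_lik(pi_hat, counts)

rng = random.Random(0)                   # fixed seed for reproducibility
never_beaten = all(
    multinomial_log_lik(random_probs(len(counts), rng), counts) <= best
    for _ in range(1000)
)
print(pi_hat, never_beaten)
```

At the sample proportions each ratio $n_j/\hat{\pi}_j$ equals $n$, which is the likelihood equation $n_j/\pi_j - n_c/\pi_c = 0$ in another form.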

1.5.2 Pearson Statistic for Testing a Specified Multinomial

In 1900 the eminent British statistician Karl Pearson introduced a hypothesis test that was one of the first inferential methods. It had a revolutionary impact on categorical data analysis, which had focused on describing associations. Pearson's test evaluates whether multinomial parameters equal certain specified values. His original motivation in developing this test was to analyze whether possible outcomes on a particular Monte Carlo roulette wheel were equally likely (Stigler 1986).

Consider $H_0\colon \pi_j = \pi_{j0}$, $j = 1, \ldots, c$, where $\sum_j \pi_{j0} = 1$. When $H_0$ is true, the expected values of $\{n_j\}$, called expected frequencies, are $\mu_j = n\pi_{j0}$, $j = 1, \ldots, c$. Pearson proposed the test statistic

$$X^2 = \sum_j \frac{(n_j - \mu_j)^2}{\mu_j}. \qquad (1.15)$$

Greater differences $\{n_j - \mu_j\}$ produce greater $X^2$ values, for fixed $n$. Let $X_o^2$ denote the observed value of $X^2$. The P-value is the null value of $P(X^2 \ge X_o^2)$. This equals the sum of the null multinomial probabilities of all count arrays (having a sum of $n$) with $X^2 \ge X_o^2$.

For large samples, $X^2$ has approximately a chi-squared distribution with df $= c - 1$. The P-value is approximated by $P(\chi^2_{c-1} \ge X_o^2)$, where $\chi^2_{c-1}$ denotes a chi-squared random variable with df $= c - 1$. Statistic (1.15) is called the Pearson chi-squared statistic.

1.5.3 Example: Testing Mendel's Theories

Among its many applications, Pearson's test was used in genetics to test Mendel's theories of natural inheritance. Mendel crossed pea plants of pure yellow strain with plants of pure green strain. He predicted that second-generation hybrid seeds would be 75% yellow and 25% green, yellow being the dominant strain. One experiment produced $n = 8023$ seeds, of which $n_1 = 6022$ were yellow and $n_2 = 2001$ were green. The expected frequencies for $H_0\colon \pi_{10} = 0.75$, $\pi_{20} = 0.25$ are $\mu_1 = 8023(0.75) = 6017.25$ and $\mu_2 = 2005.75$. The Pearson statistic $X^2 = 0.015$ (df $= 1$) has a P-value of $P = 0.90$. This does not contradict Mendel's hypothesis.

Mendel performed several experiments of this type. In 1936, R. A. Fisher summarized Mendel's results. He used the reproductive property of chi-squared: If $X_1^2, \ldots, X_k^2$ are independent chi-squared statistics with degrees of freedom $\nu_1, \ldots, \nu_k$, then $\sum_i X_i^2$ has a chi-squared distribution with df $= \sum_i \nu_i$. Fisher obtained a summary chi-squared statistic equal to 42, with df $= 84$. A chi-squared distribution with df $= 84$ has mean 84 and standard deviation $(2 \times 84)^{1/2} = 13.0$, and the right-tailed probability above 42 is $P = 0.99996$. In other words, the chi-squared statistic was so small that the fit seemed too good.
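The Pearson statistic for the single Mendel experiment described above can be reproduced in a few lines. This sketch uses only the standard library; the function names are illustrative, and the tail probability uses the identity that a chi-squared variable with df $= 1$ is the square of a standard normal, so it is valid for df $= 1$ only.

```python
import math

def pearson_chi2(counts, null_probs):
    """Pearson X^2 and the expected frequencies mu_j = n * pi_j0."""
    n = sum(counts)
    expected = [n * p for p in null_probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return x2, expected

def chi2_sf_df1(x):
    """P(chi-squared with df = 1 >= x), via P(|Z| >= sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))

x2, expected = pearson_chi2([6022, 2001], [0.75, 0.25])
p_value = chi2_sf_df1(x2)
print(expected)              # [6017.25, 2005.75]
print(round(x2, 3))          # 0.015
print(round(p_value, 2))     # 0.9

# Standard deviation of a chi-squared distribution with df = 84.
print(round(math.sqrt(2 * 84), 1))   # 13.0
```

For df larger than 1 a chi-squared tail probability needs the incomplete gamma function, which the standard library does not provide directly, so Fisher's summary P-value is not recomputed here.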

Fisher commented: "The general level of agreement between Mendel's expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions.... I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made." In a letter written at the time (see Box 1978, p. 297), he stated: "Now, when data have been faked, I know very well how generally people underestimate the frequency of wide chance deviations, so that the tendency is always to make them agree too well with expectations." In summary, goodness-of-fit tests can reveal not only when a fit is inadequate, but also when it is better than random fluctuations would have us expect.

[R. A. Fisher's daughter, Joan Fisher Box (1978, pp. 295-300), and Freedman et al. (1978, pp. 420-428, 478) discussed Fisher's analysis of Mendel's data and the accompanying controversy. Despite possible difficulties with Mendel's data, subsequent work led to general acceptance of his theories.]

1.5.4 Chi-Squared Theoretical Justification*

We now outline why Pearson's statistic has a limiting chi-squared distribution. For a multinomial sample $(n_1, \ldots, n_c)$ of size $n$, the marginal distribution of $n_j$ is the bin$(n, \pi_j)$ distribution. For large $n$, by the normal approximation to the binomial, $n_j$ (and $\hat{\pi}_j = n_j/n$) have approximate normal distributions. More generally, by the central limit theorem, the sample proportions $\hat{\boldsymbol{\pi}} = (n_1/n, \ldots, n_{c-1}/n)'$ have an approximate multivariate normal distribution (Section 14.1.4). Let $\boldsymbol{\Sigma}_0$ denote the null covariance matrix of $\sqrt{n}\,\hat{\boldsymbol{\pi}}$, and let $\boldsymbol{\pi}_0 = (\pi_{10}, \ldots, \pi_{c-1,0})'$. Under $H_0$, since $\sqrt{n}(\hat{\boldsymbol{\pi}} - \boldsymbol{\pi}_0)$ converges to a $N(\mathbf{0}, \boldsymbol{\Sigma}_0)$ distribution, the quadratic form

$$n(\hat{\boldsymbol{\pi}} - \boldsymbol{\pi}_0)'\boldsymbol{\Sigma}_0^{-1}(\hat{\boldsymbol{\pi}} - \boldsymbol{\pi}_0) \qquad (1.16)$$

has distribution converging to chi-squared with df $= c - 1$.

In Section 14.1.4 we show that the covariance matrix of $\sqrt{n}\,\hat{\boldsymbol{\pi}}$ has elements

$$\sigma_{jk} = \begin{cases} -\pi_j\pi_k & \text{if } j \ne k \\ \pi_j(1 - \pi_j) & \text{if } j = k. \end{cases}$$

The matrix $\boldsymbol{\Sigma}_0^{-1}$ has $(j, k)$th element $1/\pi_{c0}$ when $j \ne k$ and $(1/\pi_{j0} + 1/\pi_{c0})$ when $j = k$.
(You can verify this by showing that $\boldsymbol{\Sigma}_0\boldsymbol{\Sigma}_0^{-1}$ equals the identity matrix.) With this substitution, direct calculation (with appropriate combining of terms) shows that (1.16) simplifies to $X^2$. In Section 14.3 we provide a formal proof in a more general setting.

This argument is similar to Pearson's in 1900. R. A. Fisher (1922) gave a simpler justification, the gist of which follows: Suppose that $(n_1, \ldots, n_c)$ are independent Poisson random variables with means $(\mu_1, \ldots, \mu_c)$. For large $\{\mu_j\}$, the standardized values $\{z_j = (n_j - \mu_j)/\sqrt{\mu_j}\}$ have approximate standard normal distributions. Thus, $\sum_j z_j^2 = X^2$ has an approximate chi-squared distribution with $c$ degrees of freedom. Adding the single linear constraint $\sum_j (n_j - \mu_j) = 0$, thus converting the Poisson distributions to a multinomial, we lose a degree of freedom.

When $c = 2$, Pearson's $X^2$ simplifies to the square of the normal score statistic (1.11). For Mendel's data, $\hat{\pi}_1 = 6022/8023$, $\pi_{10} = 0.75$, $n = 8023$, and $z_S = 0.123$, for which $X^2 = (0.123)^2 = 0.015$. In fact, for general $c$ the Pearson test is the score test about multinomial parameters.

1.5.5 Likelihood-Ratio Chi-Squared

An alternative test for multinomial parameters uses the likelihood-ratio test. The kernel of the multinomial likelihood is (1.14). Under $H_0$ the likelihood is maximized when $\hat{\pi}_j = \pi_{j0}$. In the general case, it is maximized when $\hat{\pi}_j = n_j/n$. The ratio of the likelihoods equals

$$\Lambda = \frac{\prod_j (\pi_{j0})^{n_j}}{\prod_j (n_j/n)^{n_j}}.$$

Thus, the likelihood-ratio statistic, denoted by $G^2$, is

$$G^2 = -2\log\Lambda = 2\sum_j n_j \log(n_j/n\pi_{j0}). \qquad (1.17)$$

This statistic, which has form (1.12), is called the likelihood-ratio chi-squared statistic. The larger the value of $G^2$, the greater the evidence against $H_0$.

In the general case, the parameter space consists of $\{\pi_j\}$ subject to $\sum_j \pi_j = 1$, so the dimensionality is $c - 1$. Under $H_0$, the $\{\pi_j\}$ are specified completely, so the dimension is 0. The difference in these dimensions equals $c - 1$. For large $n$, $G^2$ has a chi-squared null distribution with df $= c - 1$.

When $H_0$ holds, the Pearson $X^2$ and the likelihood ratio $G^2$ both have asymptotic chi-squared distributions with df $= c - 1$. In fact, they are asymptotically equivalent in that case; specifically, $X^2 - G^2$ converges in probability to zero (Section 14.3.4). When $H_0$ is false, they tend to grow proportionally to $n$; they need not take similar values, however, even for very large $n$.

For fixed $c$, as $n$ increases the distribution of $X^2$ usually converges to chi-squared more quickly than that of $G^2$. The chi-squared approximation is usually poor for $G^2$ when $n/c < 5$.
When $c$ is large, it can be decent for $X^2$ for $n/c$ as small as 1 if the table does not contain both very small and moderately large expected frequencies. We provide further guidelines in Section 9.8.4. Alternatively, one can use the multinomial probabilities to generate exact distributions of these test statistics (Good et al. 1970).
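To see how close $G^2$ and $X^2$ can be when the null hypothesis fits well, the sketch below evaluates both statistics for the Mendel counts used earlier in this section (6022 yellow and 2001 green seeds, with null probabilities 0.75 and 0.25). The function names are illustrative assumptions. It also checks that, for $c = 2$, the squared score statistic equals $X^2$.

```python
import math

def x2_statistic(counts, null_probs):
    """Pearson X^2 = sum (n_j - mu_j)^2 / mu_j with mu_j = n * pi_j0."""
    n = sum(counts)
    return sum((n_j - n * p) ** 2 / (n * p) for n_j, p in zip(counts, null_probs))

def g2_statistic(counts, null_probs):
    """Likelihood-ratio G^2 = 2 * sum n_j * log(n_j / (n * pi_j0))."""
    n = sum(counts)
    return 2 * sum(
        n_j * math.log(n_j / (n * p))
        for n_j, p in zip(counts, null_probs)
        if n_j > 0  # a zero count contributes 0 to the sum
    )

counts, null_probs = [6022, 2001], [0.75, 0.25]
x2 = x2_statistic(counts, null_probs)
g2 = g2_statistic(counts, null_probs)

# Score statistic for the first category (c = 2 case).
n = sum(counts)
z_s = (counts[0] / n - 0.75) / math.sqrt(0.75 * 0.25 / n)

print(round(x2, 3), round(g2, 3))   # both about 0.015
print(abs(z_s ** 2 - x2) < 1e-9)    # True: z_S squared equals X^2
```

With data that contradict the null hypothesis more strongly, the two statistics can differ noticeably, as the text points out.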

1.5.6 Testing with Estimated Expected Frequencies

Pearson's $X^2$ (1.15) compares a sample distribution to a hypothetical one $\{\pi_{j0}\}$. In some applications, $\{\pi_{j0} = \pi_{j0}(\boldsymbol{\theta})\}$ are functions of a smaller set of unknown parameters $\boldsymbol{\theta}$. ML estimates $\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ determine ML estimates $\{\pi_{j0}(\hat{\boldsymbol{\theta}})\}$ of $\{\pi_{j0}\}$ and hence ML estimates $\{\hat{\mu}_j = n\pi_{j0}(\hat{\boldsymbol{\theta}})\}$ of expected frequencies in $X^2$. Replacing $\{\mu_j\}$ by estimates $\{\hat{\mu}_j\}$ affects the distribution of $X^2$. When dim$(\boldsymbol{\theta}) = p$, the true df $= (c - 1) - p$ (Section 14.3.3). Pearson failed to realize this (Section 16.2).

We now show a goodness-of-fit test with estimated expected frequencies. A sample of 156 dairy calves born in Okeechobee County, Florida, were classified according to whether they caught pneumonia within 60 days of birth. Calves that got a pneumonia infection were also classified according to whether they got a secondary infection within 2 weeks after the first infection cleared up. Table 1.1 shows the data. Calves that did not get a primary infection could not get a secondary infection, so no observations can fall in the category for "no" primary infection and "yes" secondary infection. That combination is called a structural zero.

A goal of this study was to test whether the probability of primary infection was the same as the conditional probability of secondary infection, given that the calf got the primary infection. In other words, if $\pi_{ab}$ denotes the probability that a calf is classified in row $a$ and column $b$ of this table, the null hypothesis is

$$H_0\colon \pi_{11} + \pi_{12} = \pi_{11}/(\pi_{11} + \pi_{12})$$

or $\pi_{11} = (\pi_{11} + \pi_{12})^2$. Let $\pi = \pi_{11} + \pi_{12}$ denote the probability of primary infection. The null hypothesis states that the probabilities satisfy the structure that Table 1.2 shows; that is, probabilities in a trinomial for the categories (yes-yes, yes-no, no-no) for primary-secondary infection equal $(\pi^2, \pi(1 - \pi), 1 - \pi)$.

Let $n_{ab}$ denote the number of observations in category $(a, b)$. The ML estimate of $\pi$ is the value maximizing the kernel of the multinomial likelihood

$$(\pi^2)^{n_{11}}(\pi - \pi^2)^{n_{12}}(1 - \pi)^{n_{22}}.$$
TABLE 1.1 Primary and Secondary Pneumonia Infections in Calves

                        Secondary Infection^a
Primary Infection       Yes            No
Yes                     30 (38.1)      63 (39.0)
No                      0 (-)          63 (78.9)

Source: Data courtesy of Thang Tran and G. A. Donovan, College of Veterinary Medicine, University of Florida.
^a Values in parentheses are estimated expected frequencies.
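The estimated expected frequencies in Table 1.1 can be reproduced by maximizing the log of the likelihood kernel numerically. The sketch below is an illustration under stated assumptions: the ternary search and the names `log_kernel` and `maximize` are choices made here, not the method of the text, and the P-value formula is valid only because df $= (3 - 1) - 1 = 1$.

```python
import math

n11, n12, n22 = 30, 63, 63    # observed counts from Table 1.1

def log_kernel(pi):
    """Log of (pi^2)^n11 * (pi - pi^2)^n12 * (1 - pi)^n22."""
    return (n11 * math.log(pi ** 2)
            + n12 * math.log(pi - pi ** 2)
            + n22 * math.log(1 - pi))

def maximize(f, lo, hi, tol=1e-8):
    """Ternary search for the maximizer of a concave function on (lo, hi)."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

pi_hat = maximize(log_kernel, 1e-6, 1 - 1e-6)
n = n11 + n12 + n22
observed = [n11, n12, n22]
expected = [n * pi_hat ** 2, n * (pi_hat - pi_hat ** 2), n * (1 - pi_hat)]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = (3 - 1) - 1                              # c = 3 categories, p = 1 parameter
p_value = math.erfc(math.sqrt(x2 / 2))        # chi-squared tail for df = 1 only

print(round(pi_hat, 3))
print([round(e, 1) for e in expected])        # compare with Table 1.1
print(round(x2, 1), df, p_value)
```

The log kernel is a sum of concave functions of $\pi$, so it has a single maximum on $(0, 1)$ and a simple bracketing search is adequate here.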

TABLE 1.2 Probability Structure for Hypothesis

                        Secondary Infection
Primary Infection       Yes        No              Total
Yes                     π²         π(1 - π)        π
No                      -          1 - π           1 - π

The log likelihood is

$$L(\pi) = n_{11}\log\pi^2 + n_{12}\log(\pi - \pi^2) + n_{22}\log(1 - \pi).$$

Differentiation with respect to $\pi$ gives the likelihood equation

$$\frac{2n_{11}}{\pi} + \frac{n_{12}}{\pi} - \frac{n_{12}}{1 - \pi} - \frac{n_{22}}{1 - \pi} = 0.$$

The solution is

$$\hat{\pi} = (2n_{11} + n_{12})/(2n_{11} + 2n_{12} + n_{22}).$$

For Table 1.1, $\hat{\pi} = 0.494$. Since $n = 156$, the estimated expected frequencies are $\hat{\mu}_{11} = n\hat{\pi}^2 = 38.1$, $\hat{\mu}_{12} = n(\hat{\pi} - \hat{\pi}^2) = 39.0$, and $\hat{\mu}_{22} = n(1 - \hat{\pi}) = 78.9$. Table 1.1 shows them. Pearson's statistic is $X^2 = 19.7$. Since the $c = 3$ possible responses have $p = 1$ parameter ($\pi$) determining the expected frequencies, df $= (3 - 1) - 1 = 1$. There is strong evidence against $H_0$ ($P = 0.00001$). Inspection of Table 1.1 reveals that many more calves got a primary infection but not a secondary infection than $H_0$ predicts. The researchers concluded that the primary infection had an immunizing effect that reduced the likelihood of a secondary infection.

NOTES

Section 1.1: Categorical Response Data

1.1. Stevens (1951) defined (nominal, ordinal, interval) scales of measurement. Other scales result from mixtures of these types. For instance, partially ordered scales occur when subjects respond to questions having categories ordered except for "don't know" or "undecided" categories.

Section 1.3: Statistical Inference for Categorical Data

1.2. The score method does not use $\hat{\beta}$. Thus, when $\beta$ is a model parameter, one can usually compute the score statistic for testing $H_0\colon \beta = \beta_0$ without fitting the model. This is advantageous when fitting several models in an exploratory analysis and model fitting is computationally intensive. An advantage of the score and likelihood-ratio methods is that they apply even when $|\hat{\beta}| = \infty$. In that case, one cannot compute the Wald statistic. Another disadvantage of the Wald method is that its results depend on the parameterization; inference based on $\hat{\beta}$ and its SE is not equivalent to inference based on a nonlinear function of it, such as $\log\hat{\beta}$ and its SE.

Section 1.4: Statistical Inference for Binomial Parameters

1.3. Among others, Agresti and Coull (1998), Blyth and Still (1983), Brown et al. (2001), Ghosh (1979), and Newcombe (1998a) showed the superiority of the score interval to the Wald interval for $\pi$. Of the "exact" methods, Blaker's (2000) has particularly good properties. It is contained in the Clopper-Pearson interval and has a nestedness property whereby an interval of higher nominal confidence level necessarily contains one of lower level.

1.4. Using continuity corrections with large-sample methods provides approximations to exact small-sample methods. Thus, they tend to behave conservatively. We do not present them, since if one prefers an exact method, with modern computational power it can be used directly rather than approximated.

1.5. In theory, one can eliminate problems with discreteness in tests by performing a supplementary randomization on the boundary of a critical region (see Problem 1.19). In rejecting the null at the boundary with a certain probability, one can obtain a fixed overall type I error probability $\alpha$ even when it is not an achievable P-value. For such randomization, the one-sided P-value is

$$\text{randomized P-value} = U \times P(T = t_o) + P(T > t_o),$$

where $U$ denotes a uniform(0, 1) random variable (Stevens 1950). In practice, this is not used, as it is absurd to let this random number influence a decision. The mid-P-value replaces the arbitrary uniform multiple $U \times P(T = t_o)$ by its expected value.

Section 1.5: Statistical Inference for Multinomial Parameters

1.6. The chi-squared distribution has mean df, variance 2df, and skewness $(8/\text{df})^{1/2}$. It is approximately normal when df is large. Greenwood and Nikulin (1996), Kendall and Stuart (1979), and Lancaster (1969) presented other properties.
Cochran (1952) presented a historical survey of chi-squared tests of fit. See also Cressie and Read (1989), Koch and Bhapkar (1982), Koehler (1998), and Moore (1986b).

PROBLEMS

Applications

1.1 Identify each variable as nominal, ordinal, or interval.
a. UK political party preference (Labour, Conservative, Social Democrat)
b. Anxiety rating (none, mild, moderate, severe, very severe)
c. Patient survival (in number of months)
d. Clinic location (London, Boston, Madison, Rochester, Montreal)

e. Response of tumor to chemotherapy (complete elimination, partial reduction, stable, growth progression)
f. Favorite beverage (water, juice, milk, soft drink, beer, wine)
g. Appraisal of company's inventory level (too low, about right, too high)

1.2 Each of 100 multiple-choice questions on an exam has four possible answers, one of which is correct. For each question, a student guesses by selecting an answer randomly.
a. Specify the distribution of the student's number of correct answers.
b. Find the mean and standard deviation of that distribution. Would it be surprising if the student made at least 50 correct responses? Why?
c. Specify the distribution of $(n_1, n_2, n_3, n_4)$, where $n_j$ is the number of times the student picked choice $j$.
d. Find $E(n_j)$, var$(n_j)$, cov$(n_j, n_k)$, and corr$(n_j, n_k)$.

1.3 An experiment studies the number of insects that survive a certain dose of an insecticide, using several batches of insects of size $n$ each. The insects are sensitive to factors that vary among batches during the experiment but were not measured, such as temperature level. Explain why the distribution of the number of insects per batch surviving the experiment might show overdispersion relative to a bin$(n, \pi)$ distribution.

1.4 In his autobiography A Sort of Life, British author Graham Greene described a period of severe mental depression during which he played Russian Roulette. This game consists of putting a bullet in one of the six chambers of a pistol, spinning the chambers to select one at random, and then firing the pistol once at one's head.
a. Greene played this game six times and was lucky that none of them resulted in a bullet firing. Find the probability of this outcome.
b. Suppose that he had kept playing this game until the bullet fired. Let $Y$ denote the number of the game on which it fires. Show the probability mass function for $Y$, and justify.
1.5 Consider the statement, "Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if she is married and does not want any more children." For the 1996 General Social Survey, conducted by the National Opinion Research Center (NORC), 842 replied "yes" and 982 replied "no." Let $\pi$ denote the population proportion who would reply "yes." Find the P-value for testing $H_0\colon \pi = 0.5$ using the score test, and construct a 95% confidence interval for $\pi$. Interpret the results.

1.6 Refer to the vegetarianism example in Section 1.4.3. For testing $H_0\colon \pi = 0.5$ against $H_a\colon \pi \ne 0.5$, show that:
a. The likelihood-ratio statistic equals $2[25\log(25/12.5)] = 34.7$.
b. The chi-squared form of the score statistic equals 25.0.
c. The Wald $z$ or chi-squared statistic is infinite.

1.7 In a crossover trial comparing a new drug to a standard, $\pi$ denotes the probability that the new one is judged better. It is desired to estimate $\pi$ and test $H_0\colon \pi = 0.5$ against $H_a\colon \pi \ne 0.5$. In 20 independent observations, the new drug is better each time.
a. Find and sketch the likelihood function. Give the ML estimate of $\pi$.
b. Conduct a Wald test and construct a 95% Wald confidence interval for $\pi$. Are these sensible?
c. Conduct a score test, reporting the P-value. Construct a 95% score confidence interval. Interpret.
d. Conduct a likelihood-ratio test and construct a likelihood-based 95% confidence interval. Interpret.
e. Construct an exact binomial test and 95% confidence interval. Interpret.
f. Suppose that researchers wanted a sufficiently large sample to estimate the probability of preferring the new drug to within 0.05, with confidence 0.95. If the true probability is 0.90, about how large a sample is needed?

1.8 In an experiment on chlorophyll inheritance in maize, for 1103 seedlings of self-fertilized heterozygous green plants, 854 seedlings were green and 249 were yellow. Theory predicts the ratio of green to yellow is 3:1. Test the hypothesis that 3:1 is the true ratio. Report the P-value, and interpret.

1.9 Table 1.3 contains Ladislaus von Bortkiewicz's data on deaths of soldiers in the Prussian army from kicks by army mules (Fisher 1934; Quine and Seneta 1987). The data refer to 10 army corps, each observed for 20 years. In 109 corps-years of exposure, there were no deaths, in 65 corps-years there was one death, and so on.
Estimate the mean and test whether probabilities of occurrences in these five categories follow a Poisson distribution (truncated for 4 and above).