International Statistical Review (2004), 72, 2, 257-284, Printed in The Netherlands
© International Statistical Institute

Simple Correspondence Analysis: A Bibliographic Review

Eric J. Beh
School of Quantitative Methods and Mathematical Sciences, University of Western Sydney, Australia

Summary

Over the past few decades correspondence analysis has gained an international reputation as a powerful statistical tool for the graphical analysis of contingency tables. This popularity stems from its development and application in many European countries, especially France, and its use has spread to English speaking nations such as the United States and the United Kingdom. Its growing popularity amongst statistical practitioners, and more recently those disciplines where the role of statistics is less dominant, demonstrates the importance of the continuing research and development of the methodology. The aim of this paper is to highlight the theoretical, practical and computational issues of simple correspondence analysis and discuss its relationship with recent advances that can be used to graphically display the association in two-way categorical data.

Key words: Computing issues; Correspondence analysis family; Graphical displays; Inertia; Ordered categories; Orthogonal polynomials; Pearson ratio; Profile; Reconstitution model; Singular value decomposition; Transition formula; Two-way contingency table.

1 Introduction

The analysis of the contingency table is a very important component of multivariate statistics, with many different types of analysis dedicated solely to this type of data set. Fienberg (1982) points out that the term contingency seems to have originated with Karl Pearson (1904), who used it to describe the measure of the deviation from complete independence between the rows and columns of such a data structure. More recently, the term has come to refer to the counts and the marginal frequencies in the contingency table. As a result, a contingency table contains information which is of a discrete or categorical nature. The development of techniques to handle problems involving contingency tables is due most importantly to Karl Pearson, G.
Udny Yule and R.A. Fisher (see Goodman, 1996). One of the most influential techniques developed to measure the association between two categorical variables is the Pearson chi-squared statistic. Pearson (1900) developed the ground work for the chi-squared statistic, which is used to compare the observed counts with what is expected under the hypothesis of independence between the two variables. One of the first examples used to investigate the application of measuring association in contingency tables was that of Fisher (1940) and is reproduced here as Table 1. It is the cross-classification of 5387 children from Caithness, Scotland according to their hair and eye colour. Fisher was interested in determining how the two variables were associated. Goodman (1981) also considered this example in his investigation of association for contingency tables where the variables consist of ordered responses.
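As a minimal sketch of the comparison between observed and expected counts described above, the following Python code computes the Pearson chi-squared statistic for the Caithness counts of Table 1. NumPy and all variable names are assumptions made for illustration; they are not part of the original paper.

```python
import numpy as np

# Counts of Table 1: rows are eye colour (blue, light, medium, dark),
# columns are hair colour (fair, red, medium, dark, black).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

n = N.sum()                                   # grand total
row_totals = N.sum(axis=1)
col_totals = N.sum(axis=0)

# Expected counts under independence of hair and eye colour.
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-squared statistic and its degrees of freedom.
chi2 = ((N - expected) ** 2 / expected).sum()
dof = (N.shape[0] - 1) * (N.shape[1] - 1)

print(f"n = {n:.0f}, X^2 = {chi2:.3f}, degrees of freedom = {dof}")
```

The statistic only summarises the overall departure from independence; it does not show which categories are responsible, which is the gap that correspondence analysis addresses.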
Table 1. Two-way contingency table classifying 5387 children in Caithness, Scotland, according to hair colour and eye colour.

Eye colour    Hair colour
              Fair    Red    Medium    Dark    Black
BLUE           326     38       241     110        3
LIGHT          688    116       584     188        4
MEDIUM         343     84       909     412       26
DARK            98     48       403     681       85

A test of the departure from independence between hair colour and eye colour produces a Pearson chi-squared statistic of 1240.039, which is highly significant. This simple example demonstrates that, while the Pearson chi-squared statistic has long been used to determine the level of association between two variables, it does not divulge how the association is constructed, nor does the statistic allow for an investigation of similar, or different, categories. Many methodologies have been discussed in the literature which do permit an investigation of these issues; however, it is not the purpose of this paper to review these. Instead we will focus our discussion on the development and application of correspondence analysis to two-way contingency tables; such an analysis is commonly referred to as simple or, more recently, as classical, correspondence analysis. The term simple is not meant to reflect the ease of execution, or interpretation, of the analysis. Instead it refers to its application to the most basic, or simple, data set, a two-way contingency table, as opposed to multiple correspondence analysis, which applies to more than two categorical variables. The term classical has also been used to describe the original graphical methodology developed, since there are adjustments to the classical approach that can be implemented.

Correspondence analysis is a technique that represents graphically the row and column categories and allows for a comparison of their correspondences, or associations, at a category level. For example, Figure 1 shows a two-dimensional space where the association between the row and column categories of Table 1 can be visualised. It shows that, in general, the fair haired children in Caithness, Scotland, tend to have blue or light coloured eyes, while dark haired children tend to be those with dark coloured eyes.
This figure is referred to as a correspondence plot, and is an important component of the output generated from the classical correspondence analysis of Table 1. Also from the output, we find that the first dimension visualises 86.56% of the association between the row and column categories, while the second axis visualises 13.07%. Thus Figure 1 visually shows 99.63% of the association that exists between the hair colour and eye colour of the children classified in Caithness. In parts of Section 3 we will discuss particular points relevant to the construction of Figure 1 and the analysis of Table 1.

Part of this review involves discussing adaptations of correspondence analysis made over the years for its application to complex cross-classifications such as ordinal data, ranked data, cohort data etc. It should be evident from this paper that correspondence analysis is applicable in a diverse range of situations. For example, correspondence analysis has been beneficial in the areas of social science, engineering, health science, medicine, archeology, ecology, software development and market research.

The paper consists of eight further Sections. Section 2 discusses some of the important developments of correspondence analysis made between the early 20th century and the present. An excellent review of the history of the methodology, from the French perspective, can be found in van Meter, Schiltz, Cibois & Mounier (1994). Section 3 briefly describes some technical aspects of correspondence analysis, including methods of decomposing the Pearson ratios of two-way contingency tables, the definition of profile co-ordinates and its relationship with the Pearson chi-squared statistic. While multivariate statistics
is generally described using matrix notation, we steer away from it in this paper to help simplify the discussion. Section 4 describes some ways to model two-way contingency tables using correspondence analysis, and descriptions of the members of the correspondence analysis family are made in Section 5. Section 6 outlines many of the text books that discuss in detail the advancement of research into correspondence analysis. Accompanying these texts is a very general review of applications made using the analysis. There are many applications of correspondence analysis in nearly all disciplines and those discussed in this paper are not intended to fully summarise such applications. Instead, an aim of this paper is to demonstrate the method's diversity in applied situations. In Section 7, we discuss the development of various computing issues that have evolved for the application of correspondence analysis. We outline some of the early programs used to make such an analysis, and include a discussion of some of the commercially available software that performs correspondence analysis. Some other issues to do with correspondence analysis are discussed in Section 8 and some final remarks are made in Section 9.

[Figure 1. Correspondence plot of Table 1. Horizontal axis: Principal Axis 1; vertical axis: Principal Axis 2; both axes run from -0.3 to 0.3.]

2 Development of Correspondence Analysis

The theoretical issues associated with correspondence analysis date back to the early 20th century and its foundation is algebraic rather than geometric. The foundation of the technique was nearly laid with the 1904 and 1906 papers of Karl Pearson, as argued by de Leeuw (1983), when he developed the correlation coefficient of a two-way contingency table using linear regression. As Pearson (1906) states:

"The conception of linear regression line as giving this arrangement with the maximum degree of correlation appears of considerable philosophical interest. It amounts primarily to much the same thing as saying that if we have a fine classification, we shall get the maximum correlation by arranging the arrays so that the means of the arrays fall as closely as possible on a line."

de Leeuw (1983) then notes:

"this is exactly what correspondence analysis does. Pearson just was not familiar with singular value decomposition, although this had been discovered much earlier by Beltrami, Sylvester and Jordan."

See also Gifi (1990). However, the original algebraic derivation of correspondence analysis is often accredited to Hirschfeld (1935), who developed a formulation of the correlation between the rows and columns of a two-way contingency table. Others to contribute to such developments include Richardson & Kuder (1933) and Horst (1935). In fact, Horst, who discussed his findings in early 1934 before the Psychology Section of the Ohio Academy of Science, was the first to coin the term method of reciprocal averaging, an alternative derivation of correspondence analysis. The simplest derivation of correspondence analysis was made by the biometrician R.A. Fisher in 1940 when he considered data relating to hair and eye colour in a sample of children from Caithness, Scotland; see Table 1.

While the original development of the problem aimed at dealing with two-way contingency tables, a more complex approach dealing with multi-way contingency tables was not discussed until 1941, when psychometrician Louis Guttman discussed his method, called dual (or optimal) scaling, which is now referred to as the foundation of multiple correspondence analysis. Later applications of multiple correspondence analysis were considered using the Burt matrix of Burt (1950). In fact Guttman (1953) writes of Burt:

"It is gratifying to see how Professor Burt has independently arrived at much the same formulation. This convergence of thinking lends credence to the similarity of the approach."

Fisher and Guttman presented essentially the same theory in the biometric and psychometric literature.
Thus biometricians regard Fisher as the inventor of correspondence analysis, while psychometricians regard it as being Guttman. In the 1940's and 1950's further advances were made to the mathematical development of correspondence analysis, particularly in the field of psychometrics, by Guttman and his researchers. In Japan, a group of data analysts led by Chikio Hayashi also further developed Guttman's ideas, which they referred to as the quantification of qualitative data.

The 1960's saw the biggest leap in the development of correspondence analysis when it was given a geometric form by linguist Jean-Paul Benzécri and his team of researchers at the Mathematical Statistics Laboratory, Faculty of Science in Paris, France. This work culminated in two volumes on data analysis; Benzécri (1973b, 1973a). As a result the method of l'analyse des correspondances, as coined by Benzécri, is very popular in France not just among statisticians, but among researchers from most disciplines in the country. The popularity of correspondence analysis in France resulted in a journal dedicated to the development and application of the technique as well as methods of classification, Cahiers de l'Analyse des Données, founded by Benzécri.

In 1974, this new method was widely exposed to English speaking researchers with the popular paper by M.O. Hill (Hill, 1974). He was the first to coin the method's name correspondence analysis, which is the English translation of Benzécri's l'analyse des correspondances. Hill showed that the method is mathematically similar to already popular methods of data analysis such as principal
components analysis, canonical correlation analysis and reciprocal averaging (which he discussed the previous year). Since Hill's (1974) contribution, the theory of correspondence analysis, especially its application to multivariate data, has been reinvented many times and given different names, such as homogeneity analysis (Gifi, 1990) and dual scaling (Nishisato, 1980, 1994).

3 Theoretical Development

3.1 Notation

Consider an $I \times J$ two-way contingency table, $N$, where the $(i, j)$-th cell entry is given by $n_{ij}$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$. Let the grand total of $N$ be $n$ and the correspondence matrix, or matrix of relative frequencies, be $P$ so that the $(i, j)$-th cell entry is $p_{ij} = n_{ij}/n$ and $\sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij} = 1$. Define the $i$-th row marginal proportion by $p_{i\bullet} = \sum_{j=1}^{J} p_{ij}$ and define the $j$-th column marginal proportion as $p_{\bullet j} = \sum_{i=1}^{I} p_{ij}$. The row marginal proportions are called row masses and the column marginal proportions are called column masses.

3.2 Pearson Ratios

The aim of correspondence analysis, like many multivariate data analytic techniques, is to determine scores which describe how similar or different responses from two or more variables are. For a two-way contingency table, upon which our discussion is focused, the strength of association between the row scores and column scores should also be measured.

For our discussion of simple correspondence analysis, first consider the model of complete independence between the rows and columns:

$$ p_{ij} = p_{i\bullet}\, p_{\bullet j} \tag{1} $$

Of course, complete independence will hardly ever be satisfied, and so a multiplicative measure of the departure from the model of complete independence can be considered such that

$$ p_{ij} = \alpha_{ij}\, p_{i\bullet}\, p_{\bullet j} \tag{2} $$

For the model of complete independence given by (1), $\alpha_{ij} = 1$ for all $i$ and $j$. As complete independence will seldom be observed, one can determine which elements have $\alpha_{ij} \neq 1$. These elements can easily be observed by calculating

$$ \alpha_{ij} = \frac{p_{ij}}{p_{i\bullet}\, p_{\bullet j}} \tag{3} $$

which Goodman (1996) refers to as Pearson ratios. By considering the Pearson ratios, the Pearson chi-squared statistic can be expressed by

$$ X^2 = n \sum_{i=1}^{I} \sum_{j=1}^{J} p_{i\bullet}\, p_{\bullet j}\, (\alpha_{ij} - 1)^2 \tag{4} $$

and has a chi-squared distribution with $(I-1)(J-1)$ degrees of freedom.
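The Pearson ratios (3) and the form (4) of the chi-squared statistic can be illustrated numerically. The sketch below uses the counts of Table 1; NumPy and the variable names (`r`, `c`, `indep`, `alpha`) are illustrative assumptions rather than notation from the paper.

```python
import numpy as np

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

n = N.sum()
P = N / n                      # correspondence matrix, p_ij
r = P.sum(axis=1)              # row masses, p_i.
c = P.sum(axis=0)              # column masses, p_.j

indep = np.outer(r, c)         # independence model (1): p_i. * p_.j
alpha = P / indep              # Pearson ratios (3)

# Chi-squared statistic written in terms of the Pearson ratios, as in (4).
chi2 = n * (indep * (alpha - 1.0) ** 2).sum()

print(np.round(alpha, 2))
print(f"X^2 = {chi2:.3f}")
```

Ratios above 1 mark cells that are more frequent than independence predicts (for example dark eyes with black hair), and ratios below 1 mark cells that are less frequent.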
Therefore, a small Pearson chi-squared statistic, which is consistent with the hypothesis of independence (depending on the degrees of freedom), will be achieved when each $\alpha_{ij} \approx 1$. A property of the Pearson chi-squared statistic is that as $n$ increases, so too does the statistic. This
can hinder tests of association in contingency tables. To overcome this problem simple correspondence analysis considers $X^2/n$, referred to as the total inertia of the contingency table, to describe the level of association, or dependence, between two categorical variables. By decomposing the total inertia the researcher can identify important sources of information that help describe this association. Using different decompositions will yield different interpretations of the association, and lead to different graphical outputs.

The most common type of decomposition used, with a few exceptions, in correspondence analysis is singular value decomposition (SVD). The next subsection describes the use of SVD to perform simple correspondence analysis. Other types of decompositions can be used, and two others are described at the end of this section.

3.3 Singular Value Decomposition

Classically, simple correspondence analysis is conducted by performing a singular value decomposition (SVD) on the Pearson ratios. The method of SVD, also referred to as the Eckart-Young decomposition, is the most common tool used to decompose the Pearson ratios. For the application to the analysis of contingency tables, Eckart & Young (1936) conjectured that the Pearson ratio $\alpha_{ij}$ may be decomposed into components by

$$ \alpha_{ij} = \sum_{m=0}^{M^*} a_{im}\, \lambda_m\, b_{jm} \tag{5} $$

where $M^* = \min(I, J) - 1$ is the maximum number of dimensions required to graphically depict the association between the row and column responses. For example, for Table 1, only $M^* = 3$ dimensions are required to graphically depict all of the association between the hair and eye colour of the children classified in Caithness. However, for a simple interpretation of this association, generally only the first two dimensions are used to construct such a graphical summary. The result, (5), was formally proven for any rectangular matrix by Johnson (1963).

Consider the RHS of equation (5). The vector $a_m = (a_{1m}, \ldots, a_{Im})$ is the $m$-th row singular vector and is associated with the $I$ row categories. Similarly, the vector $b_m = (b_{1m}, \ldots, b_{Jm})$ is the $m$-th column singular vector and is associated with the $J$ column categories.
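The decomposition (5) can be sketched numerically for Table 1. The code below is one possible implementation, not the paper's: it assumes NumPy and applies `numpy.linalg.svd` to the standardised residuals $\sqrt{p_{i\bullet} p_{\bullet j}}\,(\alpha_{ij} - 1)$, which recovers the non-trivial terms ($m \geq 1$) of (5). All variable names are illustrative.

```python
import numpy as np

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
alpha = P / np.outer(r, c)                 # Pearson ratios (3)

# Assumed implementation choice: SVD of the standardised residuals.
S = np.sqrt(np.outer(r, c)) * (alpha - 1.0)
U, lam, Vt = np.linalg.svd(S, full_matrices=False)

M_star = min(N.shape) - 1                  # number of non-trivial dimensions
lam = lam[:M_star]                         # singular values, descending
a = U[:, :M_star] / np.sqrt(r)[:, None]    # row singular vectors a_im
b = Vt[:M_star].T / np.sqrt(c)[:, None]    # column singular vectors b_jm

# Rebuild the Pearson ratios: trivial term (equal to 1) plus M* terms.
alpha_rebuilt = 1.0 + (a * lam) @ b.T
print("singular values:", np.round(lam, 4))
print("max reconstruction error:", np.abs(alpha - alpha_rebuilt).max())
```

Dividing the left and right singular vectors by the square roots of the masses is what makes `a` and `b` orthonormal with respect to the row and column masses, rather than in the ordinary Euclidean sense.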
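The elements of the vector $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_{M^*})$ are real and positive, are the first $M^*$ singular values, and are arranged in descending order so that

$$ 1 \geq \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{M^*} \geq 0 \tag{6} $$

These singular values can also be calculated by

$$ \lambda_m = \sum_{i=1}^{I} \sum_{j=1}^{J} a_{im}\, b_{jm}\, p_{ij} \tag{7} $$

while the singular vectors have the property

$$ \sum_{i=1}^{I} p_{i\bullet}\, a_{im}\, a_{im'} = \begin{cases} 1, & m = m' \\ 0, & m \neq m' \end{cases} \qquad \sum_{j=1}^{J} p_{\bullet j}\, b_{jm}\, b_{jm'} = \begin{cases} 1, & m = m' \\ 0, & m \neq m' \end{cases} \tag{8} $$

To remove the trivial values of $\lambda_0 = 1$, $a_{i0} = 1$ and $b_{j0} = 1$, consider again equation (5). It becomes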
$$ \alpha_{ij} = 1 + \sum_{m=1}^{M^*} a_{im}\, \lambda_m\, b_{jm} \tag{9} $$

By considering this expression, the Pearson chi-squared statistic can be expressed as follows

$$ X^2 = n \sum_{i}\sum_{j} p_{i\bullet}\, p_{\bullet j}\, (\alpha_{ij} - 1)^2 = n \sum_{i}\sum_{j} p_{i\bullet}\, p_{\bullet j} \left( \sum_{m=1}^{M^*} a_{im}\, \lambda_m\, b_{jm} \right)^2 = n \sum_{m}\sum_{m'} \lambda_m \lambda_{m'} \left( \sum_{i} p_{i\bullet}\, a_{im}\, a_{im'} \right) \left( \sum_{j} p_{\bullet j}\, b_{jm}\, b_{jm'} \right) $$

By using the orthogonality properties of $a_{im}$ and $b_{jm}$, the total inertia can be written in terms of the singular values such that

$$ \frac{X^2}{n} = \sum_{m=1}^{M^*} \lambda_m^2 \tag{10} $$

For example, for Table 1 the three non-trivial singular values satisfy $\lambda_1^2 + \lambda_2^2 + \lambda_3^2 = X^2/n = 1240.039/5387 \approx 0.2302$. So the first axis explains 86.56% of the total variation that exists in the table, while the second axis explains 13.07% of this variation. Thus Figure 1 accounts for 99.63% of the total variation in Table 1.

Hence, the total variation in the contingency table (or the Pearson chi-squared statistic) can be partitioned into $M^*$ components, which are referred to as the principal inertia values. Each principal inertia can be partitioned further into sub-components to identify how a particular row or column category contributes to the principal axis.

The first principal axis, with an inertia value of $\lambda_1^2$, is the axis that describes most of the variation. Generally, the $m$-th principal axis is the $m$-th most important axis, and a correspondence plot containing the first two axes will be more descriptive than if any other axes were included.

3.4 Co-ordinate Systems

3.4.1 Standard profile co-ordinates

In order to visualise the associations between row categories or column categories, the set of singular vectors, $a_{im}$ and $b_{jm}$ for $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$, may be plotted as co-ordinates onto the $m$-th dimension of a correspondence plot. For such a plot each axis is referred to as a principal axis. For example, the first axis is called the first principal axis, while the second axis is called the second principal axis. However, these vectors in such a plotting system do not take into consideration the strength of the relationship between the rows and columns along each axis. In fact the axes are equally weighted, as can be seen from (8). Therefore, these axes have associated with them unit inertias, and Greenacre (1984, p.93) refers to the singular vectors, as a system of co-ordinates, as standard co-ordinates.
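The partition (10) of the total inertia, and the unit-inertia property (8) of the standard co-ordinates, can be checked for Table 1 with a short sketch. It assumes NumPy and the same SVD of standardised residuals used as an implementation device; the names below are illustrative only.

```python
import numpy as np

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

n = N.sum()
P = N / n
r, c = P.sum(axis=1), P.sum(axis=0)
indep = np.outer(r, c)

# Non-trivial singular values and standard co-ordinates (assumed approach).
U, lam, Vt = np.linalg.svd((P - indep) / np.sqrt(indep), full_matrices=False)
k = min(N.shape) - 1
lam = lam[:k]
a = U[:, :k] / np.sqrt(r)[:, None]
b = Vt[:k].T / np.sqrt(c)[:, None]

principal_inertias = lam ** 2
total_inertia = principal_inertias.sum()             # right side of (10)
percent = 100.0 * principal_inertias / total_inertia

chi2 = n * ((P - indep) ** 2 / indep).sum()          # Pearson statistic
print("total inertia:", round(total_inertia, 4))
print("percentage of inertia per axis:", np.round(percent, 2))
```

The weighted cross-products `a.T @ (r[:, None] * a)` and `b.T @ (c[:, None] * b)` are identity matrices, which is the equal weighting of axes that motivates the principal co-ordinates of the next subsection.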
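3.4.2 Principal profile co-ordinates

Instead of defining the row and column co-ordinates using $a_{im}$ and $b_{jm}$, let the row and column profile co-ordinates be defined by

$$ f_{im} = a_{im}\, \lambda_m \tag{11} $$

$$ g_{jm} = b_{jm}\, \lambda_m \tag{12} $$

respectively. Then (8) becomes

$$ \sum_{i=1}^{I} p_{i\bullet}\, f_{im}\, f_{im'} = \begin{cases} \lambda_m^2, & m = m' \\ 0, & m \neq m' \end{cases} \qquad \sum_{j=1}^{J} p_{\bullet j}\, g_{jm}\, g_{jm'} = \begin{cases} \lambda_m^2, & m = m' \\ 0, & m \neq m' \end{cases} \tag{13} $$

The system of co-ordinates defined using (11) and (12) involves the measure of association, and these co-ordinates are plotted in Figure 1 for $m = 1, 2$. Instead of each axis having unit inertia, the $m$-th principal axis has an inertia value equal to $\lambda_m^2$. Note, from (10) and (13), that

$$ \frac{X^2}{n} = \sum_{i=1}^{I} \sum_{m=1}^{M^*} p_{i\bullet}\, f_{im}^2 \tag{14} $$

Similarly

$$ \frac{X^2}{n} = \sum_{j=1}^{J} \sum_{m=1}^{M^*} p_{\bullet j}\, g_{jm}^2 \tag{15} $$

Equations (14) and (15) show that profile co-ordinates close to the origin do not contribute much to the variation of the data since they only make a small contribution to the total inertia. Profile co-ordinates far from the origin do make such a contribution.

An alternative expression for the row profile co-ordinates, obtained by multiplying (3) by $p_{\bullet j}\, b_{jm}$ and using the column orthogonality property of (8), is

$$ f_{im} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\bullet}}\, b_{jm} \tag{16} $$

Similarly, the column profile co-ordinates can be alternatively expressed by

$$ g_{jm} = \sum_{i=1}^{I} \frac{p_{ij}}{p_{\bullet j}}\, a_{im} \tag{17} $$

Equation (16) is the weighted sum of the $i$-th row profile, while (17) is the weighted sum of the $j$-th column profile. These equations also show the link between the profile co-ordinates and standard co-ordinates.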
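The relationships (11) to (17) can be verified numerically for Table 1. The sketch below assumes NumPy; the helper `standard_coordinates` is a hypothetical name introduced here for illustration and is not taken from the paper or from any library.

```python
import numpy as np

def standard_coordinates(N):
    """Hypothetical helper: masses, singular values and standard co-ordinates."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    indep = np.outer(r, c)
    U, lam, Vt = np.linalg.svd((P - indep) / np.sqrt(indep), full_matrices=False)
    k = min(N.shape) - 1                       # M* non-trivial dimensions
    a = U[:, :k] / np.sqrt(r)[:, None]
    b = Vt[:k].T / np.sqrt(c)[:, None]
    return P, r, c, lam[:k], a, b

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

P, r, c, lam, a, b = standard_coordinates(N)

f = a * lam            # row profile co-ordinates, equation (11)
g = b * lam            # column profile co-ordinates, equation (12)

# Equations (16) and (17): profile co-ordinates as weighted sums of profiles.
f_from_profiles = (P / r[:, None]) @ b
g_from_profiles = (P / c).T @ a

# Equations (14) and (15): total inertia from either set of co-ordinates.
inertia_rows = (r[:, None] * f ** 2).sum()
inertia_cols = (c[:, None] * g ** 2).sum()
print(np.round(f[:, :2], 3))   # first two axes, as plotted in Figure 1
print(np.round(g[:, :2], 3))
```

The signs of individual axes are arbitrary in an SVD, so a plot built from these values may be a mirror image of Figure 1 without changing any of the distances.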
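3.4.3 Goodman's profile co-ordinates

Consider again the Pearson ratios. Then

$$ \alpha_{ij} = 1 + \sum_{m=1}^{M^*} a_{im}\, \lambda_m\, b_{jm} = 1 + \sum_{m=1}^{M^*} f_{im}\, b_{jm} = 1 + \sum_{m=1}^{M^*} a_{im}\, g_{jm} \tag{18} $$

This shows, as was noted before, that profile co-ordinates near the origin contribute to the hypothesis of independence, while those far from the origin do not. Equation (18) provides an adequate reason if one wants to include on the same display both row and column profile co-ordinates. This suggests that for the comparison of a row profile co-ordinate with a column profile co-ordinate, instead of using the row profile co-ordinates as previously defined, there is some advantage in using a re-scaled version.

Rather than using the co-ordinates defined above, Goodman (1986) suggested using co-ordinates of the form

$$ \tilde{f}_{im} = a_{im}\, \lambda_m^{\alpha} \tag{19} $$

$$ \tilde{g}_{jm} = b_{jm}\, \lambda_m^{\beta} \tag{20} $$

with $\alpha + \beta = 1$. These co-ordinates were also described in Aitchison & Greenacre (2002). Different values of $\alpha$ (and therefore $\beta$) will produce different co-ordinates. For example, Gabriel (1971, p.458) describes three sets of co-ordinates used to construct a biplot that coincide with $\alpha = 0.5$, $\alpha = 1$ and $\alpha = 0$, so that $\beta = 0.5$, $\beta = 0$ and $\beta = 1$ respectively. The assignment of these values has been described as the symmetric, row isometric and column isometric factorisations respectively (Lombardo, Carlier & D'Ambra, 1996). Gabriel & Odoroff (1990) also describe the column metric preserving (CMP) biplot co-ordinates (similar to those for correspondence analysis).

The relationship between Goodman's co-ordinates and those of simple correspondence analysis is evident when $\alpha = 1$ and $\beta = 0$. In this case, the row profile co-ordinates are those defined by (11), while the columns are plotted using their standard co-ordinates. Similarly, when $\alpha = 0$ and $\beta = 1$ the column profile co-ordinates are just those defined by (12), while the rows are plotted using their standard co-ordinates.

Since the classical correspondence analysis approach focuses on partitioning the total inertia into singular values, the profile co-ordinates of (11) and (12) obtain similar results. For example,

$$ \sum_{i=1}^{I} p_{i\bullet}\, \tilde{f}_{im}^2 = \lambda_m^{2(\alpha - 1)} \sum_{i=1}^{I} p_{i\bullet}\, f_{im}^2 = \lambda_m^{2\alpha}, \qquad \sum_{j=1}^{J} p_{\bullet j}\, \tilde{g}_{jm}^2 = \lambda_m^{2(\beta - 1)} \sum_{j=1}^{J} p_{\bullet j}\, g_{jm}^2 = \lambda_m^{2\beta} \tag{21} $$

Therefore, if $\alpha$ and $\beta$ are chosen so that $\alpha + \beta = 1$, then

$$ \left( \sum_{i=1}^{I} p_{i\bullet}\, \tilde{f}_{im}^2 \right) \left( \sum_{j=1}^{J} p_{\bullet j}\, \tilde{g}_{jm}^2 \right) = \lambda_m^2 \tag{22} $$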
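A brief numerical sketch of the re-scaled co-ordinates (19) and (20) follows. It assumes that the two exponents sum to one, as stated above, and uses NumPy with illustrative names (`alpha_exp` and `beta_exp` for the exponents, to avoid confusion with the Pearson ratios). For each choice it checks that the products of row and column co-ordinates still rebuild the departure of the Pearson ratios from 1, as in (18).

```python
import numpy as np

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
indep = np.outer(r, c)
ratios = P / indep                               # Pearson ratios (3)

# Standard co-ordinates via SVD of standardised residuals (assumed approach).
U, lam, Vt = np.linalg.svd((P - indep) / np.sqrt(indep), full_matrices=False)
k = min(N.shape) - 1
lam = lam[:k]
a = U[:, :k] / np.sqrt(r)[:, None]
b = Vt[:k].T / np.sqrt(c)[:, None]

results = {}
for alpha_exp in (0.5, 1.0, 0.0):   # symmetric, row isometric, column isometric
    beta_exp = 1.0 - alpha_exp
    f_tilde = a * lam ** alpha_exp               # equation (19)
    g_tilde = b * lam ** beta_exp                # equation (20)
    # Inner products of the two sets of co-ordinates recover alpha_ij - 1.
    error = np.abs(f_tilde @ g_tilde.T - (ratios - 1.0)).max()
    # Product of the weighted sums of squares per axis, as in (22).
    product = (r[:, None] * f_tilde ** 2).sum(axis=0) * \
              (c[:, None] * g_tilde ** 2).sum(axis=0)
    results[alpha_exp] = (error, product)
    print(alpha_exp, beta_exp, error)
```

Whatever the split of the singular values between rows and columns, the biplot inner products are unchanged; only the within-set distances differ between the three scalings.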
while the relationship between the total inertia and the profile co-ordinates of (19) and (20) is

$$ \frac{X^2}{n} = \sum_{m=1}^{M^*} \left( \sum_{i=1}^{I} p_{i\bullet}\, \tilde{f}_{im}^2 \right) \left( \sum_{j=1}^{J} p_{\bullet j}\, \tilde{g}_{jm}^2 \right) $$

A simple generalisation of some graphical procedures used for the analysis of categorical data can be found in Beh (2003b) and includes the Goodman co-ordinates as a special case.

3.5 Transition Formulae

The transition formulae (Hill, 1973), or barycentric formulae (Benzécri, 1992, p.111), are equations for obtaining profile co-ordinates for one variable from the co-ordinates of the other variable. Suppose we consider the application of classical correspondence analysis. Using (16) and (12), equation (11) can be alternatively expressed by

$$ f_{im} = \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\bullet}}\, b_{jm} = \sum_{j=1}^{J} \frac{p_{ij}\, g_{jm}}{p_{i\bullet}\, \lambda_m} $$

Therefore, we can obtain the row profile co-ordinates when the column profile co-ordinates are known by

$$ f_{im} = \frac{1}{\lambda_m} \sum_{j=1}^{J} \frac{p_{ij}}{p_{i\bullet}}\, g_{jm} \tag{23} $$

Similarly, we can obtain the column profile co-ordinates when the row profile co-ordinates are known by

$$ g_{jm} = \frac{1}{\lambda_m} \sum_{i=1}^{I} \frac{p_{ij}}{p_{\bullet j}}\, f_{im} \tag{24} $$

Therefore the profile co-ordinate, $f_{im}$, is a scaled combination of the column profile co-ordinate, $g_{jm}$, as $j$ varies. Alternatively (23) can be viewed as the weighted sum of the $i$-th row profile across the $J$ columns, where the weight is $g_{jm}/\lambda_m$. Thus, if $p_{ij}$ is relatively large, $g_{jm}$ will be heavily weighted and so will influence $f_{im}$. However, a direct comparison between a row and column profile is not possible using the scaling approach of (11) and (12). Refer also to discussions made in Section 3.6.4.

3.6 Distances

3.6.1 Centring of profile co-ordinates

The row and column profile co-ordinates are centred about the origin of the correspondence plot, called a centroid. As it will be shown, the origin is where the expected cell values $p_{i\bullet}\, p_{\bullet j}$ lie.

It can be shown that the row profile co-ordinates are centred about the centroid of the correspondence plot. That is

$$ \sum_{i=1}^{I} p_{i\bullet}\, f_{im} = 0 $$

for $m = 1, 2, \ldots, M^*$.
To show that this is true, recall the definition of the row profile co-ordinate of (11). Then, since $a_{i0} = 1$ and the singular vectors satisfy the orthogonality property (8),

$$ \sum_{i=1}^{I} p_{i\bullet}\, f_{im} = \sum_{i=1}^{I} p_{i\bullet}\, a_{im}\, \lambda_m = \lambda_m \sum_{i=1}^{I} p_{i\bullet}\, a_{im}\, a_{i0} = 0 $$

Similarly, it can be shown that the column profile co-ordinates are centred about the centroid. That is

$$ \sum_{j=1}^{J} p_{\bullet j}\, g_{jm} = 0 $$

for $m = 1, 2, \ldots, M^*$.

3.6.2 Distance from the origin

The squared distance of the $i$-th row profile from the origin is

$$ d_I^2(i, 0) = \sum_{j=1}^{J} \frac{1}{p_{\bullet j}} \left( \frac{p_{ij}}{p_{i\bullet}} - p_{\bullet j} \right)^2 = \sum_{j=1}^{J} p_{\bullet j} \left( \sum_{m=1}^{M^*} a_{im}\, \lambda_m\, b_{jm} \right)^2 $$

which simplifies to

$$ d_I^2(i, 0) = \sum_{m=1}^{M^*} f_{im}^2 \tag{25} $$

and is the squared Euclidean distance of the $i$-th row profile co-ordinate from the origin. Therefore, equation (14) becomes

$$ \frac{X^2}{n} = \sum_{i=1}^{I} p_{i\bullet}\, d_I^2(i, 0) \tag{26} $$

Hence the larger the distance of the $i$-th row profile in the $M^*$-dimensional correspondence plot from the origin, the larger the weighted discrepancy between the profile of category $i$ and the average profile of the column categories. It follows that points far from the origin indicate a clear deviation from what we would expect under complete independence, while a point near the origin indicates that the frequencies in row $i$ of the contingency table fit the independence hypothesis well.

In fact, Lebart et al. (1984) showed by using confidence circles that the researcher is able to test graphically whether the position of a particular row or column category contributes to the hypothesis of independence for the contingency table. Beh (2001a) demonstrated that these circles can be usefully applied for the correspondence analysis of ordinal two-way contingency tables. Generally, if the origin lies outside of the confidence circle, then that category can be said to contribute to the dependency between the row and column categories of the contingency table. If the origin lies within the circle, then that category does not make such a contribution. The same conclusion can be made for the Euclidean distance of the column profile co-ordinates to the origin.
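The centring property of Section 3.6.1 and the distance results (25) and (26) can be confirmed for Table 1. The following is a sketch under the same assumptions as before (NumPy, SVD of the standardised residuals, illustrative variable names).

```python
import numpy as np

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
indep = np.outer(r, c)

U, lam, Vt = np.linalg.svd((P - indep) / np.sqrt(indep), full_matrices=False)
k = min(N.shape) - 1
lam = lam[:k]
f = (U[:, :k] / np.sqrt(r)[:, None]) * lam     # row profile co-ordinates
g = (Vt[:k].T / np.sqrt(c)[:, None]) * lam     # column profile co-ordinates

# Centring: the mass-weighted mean of the co-ordinates is zero on every axis.
row_centroid = r @ f
col_centroid = c @ g

# Squared distance of each row profile from the origin, computed two ways.
row_profiles = P / r[:, None]
d2_profiles = ((row_profiles - c) ** 2 / c).sum(axis=1)   # weighted by 1/p_.j
d2_coords = (f ** 2).sum(axis=1)                          # equation (25)

total_inertia = r @ d2_coords                             # equation (26)
for label, d2 in zip(["BLUE", "LIGHT", "MEDIUM", "DARK"], d2_coords):
    print(f"{label:7s} squared distance from origin: {d2:.4f}")
```

Categories with a small squared distance sit close to the average profile and so contribute little to the departure from independence.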
3.6.3 Chi-squared distances

One of the advantages of using a correspondence plot is that the researcher is able to graphically establish similar and/or different profiles from the same variable. The squared distance between two row profiles $i$ and $i'$ in an optimal correspondence plot is given by

$$ d_I^2(i, i') = \sum_{j=1}^{J} \frac{1}{p_{\bullet j}} \left( \frac{p_{ij}}{p_{i\bullet}} - \frac{p_{i'j}}{p_{i'\bullet}} \right)^2 \tag{27} $$

and is the weighted Euclidean distance between these profiles. Equation (27) can be written in terms of $f_{im}$ and $f_{i'm}$, the profile co-ordinates of rows $i$ and $i'$ along the $m$-th principal axis. However, when $m = 0$, the first term of the sum is zero since $a_{i0} = a_{i'0} = 1$. Using the row profile co-ordinate definition of (11), this distance is measured in the Euclidean space, and for an $M^*$-dimensional correspondence plot is defined by

$$ d_I^2(i, i') = \sum_{m=1}^{M^*} (f_{im} - f_{i'm})^2 \tag{28} $$

or equivalently

$$ d_I^2(i, i') = d_I^2(i, 0) + d_I^2(i', 0) - 2 \sum_{m=1}^{M^*} f_{im}\, f_{i'm} $$

so that $d_I^2(i, i') = d_I^2(i', i)$. Similarly, the Euclidean distance between columns $j$ and $j'$ can be measured by

$$ d_J^2(j, j') = \sum_{i=1}^{I} \frac{1}{p_{i\bullet}} \left( \frac{p_{ij}}{p_{\bullet j}} - \frac{p_{ij'}}{p_{\bullet j'}} \right)^2 = \sum_{m=1}^{M^*} (g_{jm} - g_{j'm})^2 \tag{29} $$

These results lead to the conclusion that when two row profiles, or two column profiles, are similar, then they will be positioned closely to one another in the correspondence plot. If two profiles are different, then they will be positioned at a distance from one another. Therefore correspondence analysis can determine how profiles within a variable correspond to one another; thus the etymology of the technique.

The relationship between (27) and (28) also verifies the property of distributional equivalence as stated by Lebart, Morineau & Warwick (1984).
1. If two row profiles having identical profiles are aggregated, then the distance between the columns remains unchanged;
2. If two column profiles having identical distribution profiles are aggregated, then the distance between the rows remains unchanged.

More recently, Yamakawa, Ichihashi & Miyoshi (1998, 1999) and Yamakawa, Kanaumi, Ichihashi & Miyoshi (1999) considered the development and application of a technique that involves, for the analysis of two-way tables, minimising the distance between two row profile co-ordinates
$$ d_I^S(i, i') = \sum_{m=1}^{M^*} \left| f_{im} - f_{i'm} \right|^S $$

for any positive value of $S$, where $S$ is determined using the interior point method for the neural solution algorithm. This is used as an alternative to (28) and shifts correspondence analysis from the $L_2$-norm planar space to the more general $L_S$-norm space.

3.6.4 Inter-point distances

As the row and column co-ordinates can be simultaneously represented on the same correspondence plot, it seems reasonable to assume that one is able to measure the distance between a row and column profile. Such distances have been referred to as inter-point distances.

Carroll, Green & Schaffer (1986, 1987, 1989) proposed a way of measuring these distances by recoding the two-way contingency table to be of the form of an indicator matrix. However, Greenacre (1989) demonstrated that the claims made by these authors are flawed. According to Hoffman, de Leeuw & Arjunji (1995) the difference in opinion between Carroll, Green and Schaffer (CGS) and Greenacre can be described as follows:

"Greenacre was trained in the French school, which appears to correspond nicely with the fact that he takes simple correspondence analysis to be a more fundamental and satisfactory technique than MCA. It also means that he tends to emphasise the so called chi-squared distance interpretation of within-set distances. CGS have their starting point in multidimensional scaling and unfolding theory, which naturally leads them to emphasise between-set distance relations."

For a description of unfolding theory for categorical data refer to Heiser (1981). This is where CGS made their mistake. They tried to enforce characteristics of multidimensional scaling (MDS) to be applicable to correspondence analysis, when MDS can be used to measure such distances.

Distance issues in simple correspondence analysis should be confined to categories within a chosen variable (chi-squared distances). However, conclusions about inter-point distances should only be used as an informal guide to the correspondence between categories from different variables.
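The within-variable chi-squared distances of Section 3.6.3 can be illustrated with a short sketch showing that the weighted distance (27) between row profiles equals the ordinary Euclidean distance (28) between their profile co-ordinates when all $M^*$ axes are retained. NumPy and the variable names are assumptions made for illustration.

```python
import numpy as np

# Counts of Table 1 (rows: eye colour, columns: hair colour).
N = np.array([
    [326,  38, 241, 110,  3],
    [688, 116, 584, 188,  4],
    [343,  84, 909, 412, 26],
    [ 98,  48, 403, 681, 85],
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
indep = np.outer(r, c)

U, lam, Vt = np.linalg.svd((P - indep) / np.sqrt(indep), full_matrices=False)
k = min(N.shape) - 1
f = (U[:, :k] / np.sqrt(r)[:, None]) * lam[:k]   # row profile co-ordinates

# Equation (27): chi-squared distance between every pair of row profiles.
profiles = P / r[:, None]
diff = profiles[:, None, :] - profiles[None, :, :]
d2_chi = (diff ** 2 / c).sum(axis=2)

# Equation (28): squared Euclidean distance between profile co-ordinates.
d2_full = ((f[:, None, :] - f[None, :, :]) ** 2).sum(axis=2)

# Distances as displayed in a two-dimensional plot such as Figure 1.
d2_plot = ((f[:, None, :2] - f[None, :, :2]) ** 2).sum(axis=2)

print(np.round(d2_chi, 4))
print("largest loss from using two axes:", (d2_full - d2_plot).max())
```

These are distances between categories of the same variable only; in line with the discussion above, the sketch makes no attempt to compute a row-to-column distance.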
For a more formal description of inter-point correspondences, some description of the association between the two variables needs to be taken into account. This is a goal of the simple correspondence analysis procedure of Beh (1997); to obtain a graphical description of the linear (and non-linear) association of two-way contingency tables by isolating generalised correlations (Davy, Rayner & Beh, 2003). Also refer to Greenacre (1989, p.363) for more comments on this issue.

3.7 Other Pearson Ratio Decompositions

3.7.1 The general decomposition

The decomposition of Pearson ratios can be generalised by considering any approach that is of the form

$$ \alpha_{ij} \overset{D}{=} \sum_{u=0}^{M_a} \sum_{v=0}^{M_b} a_{iu}\, \lambda_{uv}\, b_{jv} \tag{30} $$

where $D$ denotes the type of decomposition used to identify the scores and the measures of association between the two variables. For example, SVD is a special case where $M_a = M_b = M^*$ and
$$ \lambda_{uv} = \begin{cases} \lambda_u, & u = v \\ 0, & u \neq v \end{cases} $$

The term $\lambda_{uv}$ quantifies a level of association between the two variables, for $u = 0, 1, \ldots, M_a$ and $v = 0, 1, \ldots, M_b$, such that

$$ \lambda_{uv} = \sum_{i=1}^{I} \sum_{j=1}^{J} a_{iu}\, b_{jv}\, p_{ij} $$

where $a_i = (a_{i1}, a_{i2}, \ldots, a_{iM_a})$ is the vector of scores for the $i$-th row category and $b_j = (b_{j1}, b_{j2}, \ldots, b_{jM_b})$ is the vector of scores for the $j$-th column category. Here, $M_a$ is the optimal number of dimensions required to represent the row scores, while $M_b$ is the optimal number of dimensions required to represent the column scores. The elements of these vectors are constrained by the orthogonality property (8). Therefore, the decomposition of Pearson ratios may also be expressed as

$$ \alpha_{ij} \overset{D}{=} 1 + \sum_{u=1}^{M_a} \sum_{v=1}^{M_b} a_{iu}\, \lambda_{uv}\, b_{jv} $$

Other than SVD, there are many ways in which the Pearson ratio can be decomposed. We will briefly consider another two: bivariate moment decomposition and hybrid decomposition. As long as the decomposition of the Pearson ratio is of the form (30), correspondence analysis will maintain the same mathematical structure as the classical approach. However, the interpretation of the output depends on the decomposition used.

3.7.2 Moment decomposition

The method of correspondence analysis developed in Beh (1997) decomposes the Pearson ratios such that

$$ \alpha_{ij} \overset{D}{=} 1 + \sum_{u=1}^{M_a} \sum_{v=1}^{M_b} a_u(i)\, \lambda_{uv}\, b_v(j) $$

where the $\sqrt{n}\, \lambda_{uv}$, for $u = 1, \ldots, M_a$ and $v = 1, \ldots, M_b$, are asymptotically standard normally distributed random variables. The values $a_u(i)$ and $b_v(j)$ are row and column orthogonal polynomials, respectively. The polynomials considered are those presented by Best (1994) and Beh (1998). For each variable, the polynomials require a set of scores to reflect the structure of the categories. For example, ordinal scores should be used for ordinal categories. Therefore performing this type of correspondence analysis is especially informative for graphically displaying, and taking into consideration, the structure of ordered categories. Refer to Best & Rayner (1996) and Rayner & Best (1996) for a full interpretation of this decomposition.

An important theoretical implication of using the bivariate moment decomposition is that the $\lambda_{uv}$
terms have a clear and simple interpretation; they define the $(u, v)$-th bivariate moment between the row and column categories of $N$. As a result, Davy et al. (2003) refer to the $\lambda_{uv}$, and their multivariate extensions, as generalised correlations. Consider as an example

$$ \lambda_{11} = \sum_{i=1}^{I} \sum_{j=1}^{J} p_{ij} \left( \frac{s_I(i) - \mu_I}{\sigma_I} \right) \left( \frac{s_J(j) - \mu_J}{\sigma_J} \right) $$

where $s_I(i)$ and $s_J(j)$ are the set of row and column scores used to construct the orthogonal polynomials, and

$$ \mu_I = \sum_{i=1}^{I} s_I(i)\, p_{i\bullet}, \qquad \sigma_I^2 = \sum_{i=1}^{I} s_I(i)^2\, p_{i\bullet} - \mu_I^2. $$

The quantities $\mu_J$ and $\sigma_J^2$ are similarly defined. When integer valued scores are used as the scores to specify the underlying structure of both sets of categories, then $\lambda_{11}$ is Pearson's product moment correlation (Rayner & Best, 1996), and when midrank scores are used, $\lambda_{11}$ is Spearman's rank correlation (Best & Rayner, 1996).

3.7.3 Hybrid decomposition

Another method of decomposing $\alpha_{ij}$ is to consider a hybrid decomposition that consists of a mixture of elements from singular value decomposition and bivariate moment decomposition. It is of the form

$$ \alpha_{ij} \overset{D}{=} 1 + \sum_{m=1}^{M^*} \sum_{v=1}^{J-1} a_{im}\, \lambda_{(m)v}\, b_v(j) $$

where

$$ \lambda_{(m)v} = \sum_{i=1}^{I} \sum_{j=1}^{J} a_{im}\, b_v(j)\, p_{ij} $$

and uses singular vectors for the nominal (row) variable and orthogonal polynomials to take into account the complex structure of the (column) variable. It is based on the decomposition of the Pearson chi-squared statistic presented in Beh (2001b) and has been used to perform ordinal correspondence analysis on singly ordered two-way contingency tables; see Beh (2003a).

An important implication of using this decomposition is that the square of the singular values obtained from singular value decomposition can be expressed as the sum of squares of the $\lambda_{(m)v}$. That is, for the $m$-th largest singular value

$$ \lambda_m^2 = \sum_{v=1}^{J-1} \lambda_{(m)v}^2 $$

Therefore, using such a decomposition means that singular values can be partitioned into linear, quadratic and higher order terms.

4 Modelling Simple Correspondence Analysis

4.1 RC Correlation Model

As described in Section 3.2, the departure from the independence hypothesis can be assessed by testing (2) against (1). Such a measure can be quantified by comparing the definition of the Pearson ratios (3) and the
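decomposition (30):

$$p_{ij} = p_{i\bullet}\, p_{\bullet j}\left(1 + \sum_{u=1}^{M_a}\sum_{v=1}^{M_b} a_{iu}\, \lambda_{uv}\, b_{jv}\right)$$

For example, when singular value decomposition is applied to the Pearson ratios,

$$p_{ij} = p_{i\bullet}\, p_{\bullet j}\left(1 + \sum_{m=1}^{M} a_{im}\, \lambda_m\, b_{jm}\right) \tag{31}$$

Model (31) is termed Fisher's identity by Lebart et al. (1984) and Lancaster (1969), and is so named to reflect the work of Fisher (1940). It is, however, more commonly referred to as the saturated RC canonical correlation model. Model (31) is saturated when $M = M^*$. The work of de Leeuw & van der Heijden (1991) showed that model (31) is equivalent to the canonical correlation model when $M$ of the canonical correlations between the row and column variables are non-zero. The unsaturated model is

$$p_{ij} \approx p_{i\bullet}\, p_{\bullet j}\left(1 + \sum_{m=1}^{M} a_{im}\, \lambda_m\, b_{jm}\right) \tag{32}$$

where $M < M^*$.

Grassi & Visentin (1994) note that Escoufier (1983, 1984) proposed a generalisation of correspondence analysis, where the generalised RC correlation model is

$$p_{ij} = q_{ij} + q_{i\bullet}\, q_{\bullet j}\sum_{m=1}^{M} a_{im}\, \lambda_m\, b_{jm} \tag{33}$$

where the $q_{ij}$ are joint proportions obtained from some specified hypothesis, and $q_{i\bullet}$ and $q_{\bullet j}$ are any weights with the restrictions $q_{i\bullet} > 0$, $q_{\bullet j} > 0$, $\sum_i q_{i\bullet} = \sum_j q_{\bullet j} = 1$. For example, under the hypothesis of independence, $q_{ij} = p_{i\bullet}\, p_{\bullet j}$, $q_{i\bullet} = p_{i\bullet}$ and $q_{\bullet j} = p_{\bullet j}$. Also refer to de Leeuw & van der Heijden (1991) for a discussion of another generalised version of (31).

4.2 Basic Correspondence Model

When N is a contingency table of size $2 \times J$, or $I \times 2$, then $M^* = 1$. For such a table, the RC canonical correlation model (31) becomes

$$p_{ij} = p_{i\bullet}\, p_{\bullet j}\left(1 + \rho\, a_i\, b_j\right) \tag{34}$$

Model (34) has been studied by Goodman (1985b), Gilula, Krieger & Ritov (1988), Rom & Sarkar (1992), Ritov & Gilula (1993), Williams (1952) and Gilula & Haberman (1986) and is called the rank-2 canonical correlation model, or basic correspondence model. It can be seen from this model that the measure of correlation between the row and column scores, $a_i$ and $b_j$, respectively, can be calculated by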
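$$\rho = \sum_{i=1}^{I}\sum_{j=1}^{J} a_i\, b_j\, p_{ij} \tag{35}$$

When $\rho = 0$, model (34) becomes the model of independence (1). By rearranging (34) we see that

$$\frac{p_{ij}}{p_{i\bullet}\, p_{\bullet j}} - 1 = \rho\, a_i\, b_j \tag{36}$$

Squaring both sides of (36) and multiplying by $p_{i\bullet}\, p_{\bullet j}$ yields

$$p_{i\bullet}\, p_{\bullet j}\left(\frac{p_{ij}}{p_{i\bullet}\, p_{\bullet j}} - 1\right)^2 = \rho^2\, a_i^2\, b_j^2\, p_{i\bullet}\, p_{\bullet j} \tag{37}$$

So summing over the rows and columns of (37) and using (8) gives

$$\frac{X^2}{n} = \rho^2 \tag{38}$$

which is just equation (10) when $M^* = 1$. Therefore, the total inertia of a contingency table when $M^* = 1$ is just the square of the correlation between the rows and columns. Examples of where a $2 \times J$ or an $I \times 2$ contingency table will arise include those where there are two responses for one variable; Yes and No, Pass and Fail, Male and Female, and so on.

Rom & Sarkar (1992) proposed the rank-2 model

[equation (39): not recoverable from the extracted text]

whose terms depend on an additional parameter. For a particular value of this parameter, (39) reduces to

$$p_{ij} = p_{i\bullet}\, p_{\bullet j}\left(1 + \phi\, a_i\, b_j\right) \tag{40}$$

Multiplying this by $a_i b_j$ and summing over the rows and columns yields

$$\sum_{i}\sum_{j} a_i\, b_j\, p_{ij} = \sum_{i} a_i\, p_{i\bullet}\sum_{j} b_j\, p_{\bullet j} + \phi\sum_{i} a_i^2\, p_{i\bullet}\sum_{j} b_j^2\, p_{\bullet j}$$

Since $\rho$ is defined by (35), it can be seen that $\rho = \phi$, using (8). Therefore, for this value of the parameter, (39) reduces to the rank-2 canonical correlation model of (34). Also, if the correlation between the row and column scores is zero, the model of Rom & Sarkar (1992) reduces to the independence model (1).

4.3 Reconstitution Model

In the correspondence analysis context, the scores used to identify similarities and differences between categories within a particular variable are not the standard co-ordinates, instead they are the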
profile co-ordinates defined by (11) and (12). Using this transformation, the RC canonical correlation model (31) becomes

$$p_{ij} = p_{i\bullet}\, p_{\bullet j}\left(1 + \sum_{m=1}^{M} \frac{f_{im}\, g_{jm}}{\lambda_m}\right) \tag{41}$$

Model (41) is referred to as the correspondence model by Goodman (1986), or the reconstitution model by Greenacre (1984) and many others.

The reconstitution model of (41), just like (31), can be used to determine the effect of reducing the problem from $M^*$ dimensions to $M$ dimensions. The better the approximation given by the model, the better the $M$ dimensions are in representing the row and column profiles of the contingency table. The reconstitution formula verifies that if the row and column profile co-ordinates are close to zero then the hypothesis of complete independence is supported.

4.4 Other Models

There are many models that can be used as an alternative to (31) and (41). Here two more are briefly described.

One of the most popular alternatives to the RC correlation model of (31) is the Goodman RC model, as seen in Goodman (1979). This model is

$$p_{ij} = \alpha_i\, \beta_j \exp\left(\sum_{m=1}^{M} \phi_m\, a_{im}\, b_{jm}\right) \tag{42}$$

where $a_{im}$ and $b_{jm}$ are as defined earlier. The parameter $\phi_m$ is termed the coefficient of intrinsic association, while $\alpha_i$ and $\beta_j$ are positive parameters, or nuisance parameters as Gilula et al. (1988) call them. Escoufier & Juncar (1986) showed that these parameters can be calculated by

$$\alpha_i = \exp\left(\sum_{j=1}^{J} p_{\bullet j} \ln p_{ij}\right) \tag{43}$$

$$\beta_j = \exp\left(\sum_{i=1}^{I} p_{i\bullet} \ln p_{ij} - \sum_{i=1}^{I}\sum_{j=1}^{J} p_{i\bullet}\, p_{\bullet j} \ln p_{ij}\right) \tag{44}$$

Model (42) has been extensively reviewed by many authors, including Goodman (1979, 1981, 1985a, 1985b, 1986, 1996), Escoufier (1988), Haberman (1981) and Becker & Clogg (1989). In fact, Goodman (1981) applied model (42) to two-way contingency tables with ordered variables.

Consider the rank-2 Goodman correlation model when $M = 1$

$$p_{ij} = \alpha_i\, \beta_j \exp\left(\phi\, a_i\, b_j\right) \tag{45}$$

This has been examined by Gilula, Krieger & Ritov (1988), Ritov & Gilula (1991, 1993), Gilula (1984) and Rom & Sarkar (1992). Gilula, Krieger & Ritov (1988) show that the parameter $\rho$ from equation (35) and $\phi$ from (45) are different measurements. They point out that while $\rho$ is a correlation value, as can be seen from (35), and thus lies within the interval $[-1, 1]$, $\phi$ lies within $(-\infty, \infty)$.

Goodman (1991) showed that the intrinsic association can be calculated by
$$\phi_m = \sum_{i=1}^{I}\sum_{j=1}^{J} a_{im}\, b_{jm} \ln p_{ij} \tag{46}$$

for $m = 1, \ldots, M$.

For classical correspondence analysis, reparameterise the row and column scores so that

$$f_{im} = a_{im}\sqrt{\phi_m}, \qquad g_{jm} = b_{jm}\sqrt{\phi_m} \tag{47}$$

so that the reconstitution formula is

$$p_{ij} = \alpha_i\, \beta_j \exp\left(\sum_{m=1}^{M} f_{im}\, g_{jm}\right) \tag{48}$$

Therefore, the constraints of (14) and (15) become

$$\sum_{i=1}^{I} p_{i\bullet}\, f_{im}\, f_{im'} = \begin{cases} \phi_m, & m = m' \\ 0, & m \neq m' \end{cases} \tag{49}$$

$$\sum_{j=1}^{J} p_{\bullet j}\, g_{jm}\, g_{jm'} = \begin{cases} \phi_m, & m = m' \\ 0, & m \neq m' \end{cases} \tag{50}$$

In some situations, reconstituting the cell values using the unsaturated version of (41) can lead to negative estimates of $p_{ij}$. This problem was also pointed out for the unsaturated version of the reconstitution formula

$$p_{ij} \approx p_{i\bullet}\, p_{\bullet j}\left(1 + \sum_{u=1}^{M_a}\sum_{v=1}^{M_b} a_u(i)\, \frac{Z_{uv}}{\sqrt{n}}\, b_v(j)\right) \tag{51}$$

and for similar models by Rayner & Best (1996) and Beh (2001b). Therefore, the exponential approximation should be considered as an alternative.

5 The Correspondence Analysis Family

Over the past few decades there have been many attempts to adjust correspondence analysis so that it can cater for the interdisciplinary problems that have arisen. The first adjustment was made in the field of ecology and proposed by Hill & Gauch Jr. (1980). Their method, which is called detrended correspondence analysis (DCA), removes the technique's characteristic arch effect by cutting the first axis into segments during the calculation of the row and column scores and then resetting the average of each segment to zero. Palmer (1993) pointed out that such a refinement produces inelegances that have been criticised by Minchin (1987), Oksanen (1987, 1988) and Wartenberg, Ferson & Rohlf (1987). The criticism of Wartenberg et al. (1987) prompted Peet, Knox, Case & Allen (1988), although agreeing with some of their points, to provide prospective users with a more balanced perspective on the advantages and disadvantages of DCA. Oksanen & Minchin (1997) investigated the instability of detrended correspondence analysis
using computer programs designed for the execution of correspondence analysis and its detrended form.

Another method of correspondence analysis, called canonical correspondence analysis, was developed by ter Braak (1986, 1987). It conducts correspondence analysis by including the additional step of selecting the linear combination of row variables that maximises the variation of the column scores. In fact Birks, Peglar & Austin (1994) provide a list of 378 references relating to the application and development of canonical correspondence analysis between 1986 and 1993. This 1994 list has since been updated and includes 402 references up to 1996. This bibliography can be found at the following web address;

http://www.fas.umontreal.ca/BIOL/Casgrain/cca_bib/index.html

Many more can now be found, for example, by utilising search engines available on the Internet.

Another form of correspondence analysis that can be included in this family is joint correspondence analysis, developed by Greenacre (1988, 1990, 1991) and improved upon by Boik (1996). This is a multiple correspondence analysis adjustment, but can naturally be used for the analysis of two-way contingency tables as well. For bivariate categorical data, the method involves transforming the table into a Burt matrix (Burt, 1950) where the diagonal sub-matrices consist of row and column marginal frequencies and the off-diagonal matrices are N and its transpose. Applying classical correspondence analysis to the Burt matrix will lead to an inflated total inertia since the diagonal sub-matrices are included in the analysis. Their inclusion is a problem because the cell frequencies, and hence the marginal frequencies, are assumed to be fixed. Joint correspondence analysis is performed by omitting the diagonal sub-matrices and leads to an identical correspondence analysis when compared with the classical approach.

Thus, correspondence analysis can no longer be viewed as a single multivariate analytic technique.
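As a minimal sketch of the inflation described above for the Burt matrix, the following Python fragment builds the Burt matrix of a two-way table and compares its total inertia with that of the table itself. The counts in `N` and the function names `total_inertia` and `burt_matrix` are hypothetical and chosen only for illustration, and total inertia is taken to be the Pearson chi-squared statistic divided by the grand total. The closed form used in the final check, $(2X^2/n + (I-1) + (J-1))/4$ for an $I \times J$ table, is not stated in the text and is included only as a consistency check.

```python
from typing import List

Table = List[List[float]]


def total_inertia(table: Table) -> float:
    """Pearson chi-squared statistic of the table divided by its grand total."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (n_ij - expected) ** 2 / expected
    return chi2 / n


def burt_matrix(table: Table) -> Table:
    """Assemble [[D_r, N], [N^T, D_c]] for a two-way table N.

    D_r and D_c are diagonal matrices of row and column marginal totals.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    top = [
        [row_tot[i] if k == i else 0 for k in range(n_rows)] + list(table[i])
        for i in range(n_rows)
    ]
    bottom = [
        [table[i][j] for i in range(n_rows)]
        + [col_tot[j] if k == j else 0 for k in range(n_cols)]
        for j in range(n_cols)
    ]
    return top + bottom


# Hypothetical 3 x 4 table of counts, used only for illustration.
N = [[20, 10, 5, 5], [10, 15, 10, 5], [5, 5, 10, 20]]
n_rows, n_cols = len(N), len(N[0])

inertia_N = total_inertia(N)
inertia_burt = total_inertia(burt_matrix(N))

# Assumed closed form for two variables (a check, not taken from the text).
expected_burt = (2 * inertia_N + (n_rows - 1) + (n_cols - 1)) / 4

print(f"total inertia of N:           {inertia_N:.4f}")
print(f"total inertia of Burt matrix: {inertia_burt:.4f}")
print(f"closed-form value:            {expected_burt:.4f}")
```

In this sketch the diagonal sub-matrices contribute $(I-1) + (J-1)$ to the chi-squared total whatever the association in N, which is the sense in which the Burt-matrix inertia is inflated.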
Instead there is, as Palmer (1993) calls it, a correspondence analysis family that incorporates all these correspondence analysis methodologies. Also included in this family are detrended canonical correspondence analysis, which is a combination of the two methods, non-symmetrical correspondence analysis (D'Ambra & Lauro, 1989, 1992; Kroonenberg & Lombardo, 1998, 1999; Lauro & Balbi, 1999; Lombardo et al., 1996), and partial canonical correspondence analysis (ter Braak, 1988).

6 Texts and Applications

Despite the English introduction to correspondence analysis by Hill (1974), its initial development had been fairly slow to reach English speaking countries. Much of the earlier work, in the 1960's and early 1970's, was written in French. There are three possible reasons for this slow development:

1. The initial lag in the development outside France may be due to the problem non-English speaking researchers have with the language.
2. Correspondence analysis is often introduced without any reference to other methods of the statistical treatment of categorical data. Thus, initially, the application and development of correspondence analysis was rare.
3. Due to the difference in philosophies of data analysis between European and English speaking statisticians, correspondence analysis, in its early years, failed to mature outside of France.

There are only a few English written articles concerned with correspondence analysis, from the graphical point of view, between the late 1960's and early 1970's. These include Benzécri (1969), Hill (1974), Maignan (1974) and Teil (1975).

There are also only a few early English written articles discussing the application of correspondence analysis. Teil & Cheminee (1975) applied correspondence analysis to rock samples collected from an old volcanic region from the Erta Chain in Eastern
Ethiopia. The aim of this study was to determine important geological elements in the region. David, Campiglio & Darling (1974) used correspondence analysis to study the major factors influencing geological processes in the volcanic belt along the Superior Province of the Canadian Shield. Another fairly early application of correspondence analysis can be seen in van Heel & Frank (1980). They applied the method to help identify images of biological macro-molecules using an electron microscope.

A decade of further development brought about the English publication of two independent texts, Greenacre (1984) and Lebart et al. (1984). They are both now used as standard books and can be used as an introduction to correspondence analysis. Michael Greenacre was a student of Jean-Paul Benzécri for two years from 1973, and as a result, much of today's literature on correspondence analysis uses Benzécri's style. The text of Ludovic Lebart, Alain Morineau and Kenneth M. Warwick is based on an earlier French work by Lebart, Morineau & Tabard (1977) and discusses other multivariate techniques such as principal component analysis, canonical correlation analysis and cluster analysis.

Since these books have become standard texts on correspondence analysis, the technique has experienced an explosion of its application and development in most fields of research. Of particular note is the method's growth in popularity from the early 1990's. Correspondence analysis has been applied in areas such as sensory evaluation, ecology, psychometry, health care, economics, medicine, literature and engineering. Refer to Beh (2002), which can be found at the web address;

http://www.uws.edu.au/about/acadorg/clb/sqmms/research/reports

for a description of articles relevant to these disciplines.

Archaeology is a discipline that has experienced a strong interest in the application of correspondence analysis. Recently, Clouse (1999) investigated the application of correspondence analysis to aiding archaeological studies.
Baxter (1994) provides some excellent discussions of correspondence analysis in this area. Of special interest are the reference lists that he has put together. They provide an extensive overview of the many ways correspondence analysis can be used in archaeological studies. These lists can be viewed on the Internet at the following addresses;

http://science.ntu.ac.uk/msor/mjb/oldcabib.htm (references prior to 1993)
http://science.ntu.ac.uk/msor/mjb/corranbib.html (references from 1993)

Over the years, many more articles have appeared in the literature that span a wide range of disciplines. The disciplines discussed in this paper, and the articles noted by Beh (2002), are included to give a flavour of the diverse applications and popularity that correspondence analysis has maintained.

7 Computing

There are several computer packages and review journal articles (both relatively old and new reviews) relating to many of the computational issues associated with correspondence analysis. For example, published review articles include those by Lebart & Morineau (1982), Greenacre (1986), Carr (1990), Hoffman (1991), Gorman & Primavera (1993), Tian, Sorooshian & Meyers (1993), Thompson (1995) and Bond & Michailidis (1997).

There are also many commercially available programs, and computer programs published in journal form, that perform classical correspondence analysis. For example, Carr (1990) details FORTRAN code, called CORSPOND, that executes simple correspondence analysis. It is limited to the
analysis of a two-way contingency table with 80 rows and 20 columns, with information associated with a maximum of 10 dimensions calculated. However, Carr (1990) points out that these limitations are easily extended by modifying program dimensions. While these three values are used as input values, the program's output consists of all the input data, a summary of eigenvalues and the percentage contribution they make to the total inertia, the profile co-ordinates and two-dimensional plots. For example, if $M$ dimensions are specified, there are $M(M-1)/2$ two-dimensional plots that can be displayed. The Carr (1990) program produces each of these plots.

Tian et al. (1993) outline a program called MATCORS written using the language Matlab. Their program allows for supplementary row and column categories to be included. The program of Tian and his co-authors allows for the analysis of a contingency table with a large number of rows and columns. The output is thorough, with graphs for relative and absolute contributions of profiles to each principal axis, error profiles and supplementary projections, including an optimal correspondence plot.

Everitt (1994) offers a very simple program for classical correspondence analysis using S-PLUS. The correspondence plot using S-PLUS can easily incorporate colour and contain all the relevant information, such as profile names and partial inertia values.

Shi & Carr (2001) provide details for R code, CORSPONDA, which also performs simple correspondence analysis.

There are also many commercially available packages that perform correspondence analysis. A brief search of the Internet will also reveal many packages available from commercial vendors. However, we will only briefly discuss those that have appeared in the research literature. Hoffman (1991) and Thompson (1995) give a good review of, between them, eight packages. Hoffman (1991) discussed Dual3 (version 3.2), MAPWISE (version 2.01), PC-MDS (version 5.0) and SimCA (version 1.5). SimCA (version 1.0) was discussed by Greenacre (1986).
Dual3 deals with the dual scaling approach to correspondence analysis of Nishisato (1980) and was written by Shizuhiko and Ira Nishisato. It is written in BASIC and so is command orientated rather than menu driven. Hoffman (1991) suggests that Dual3 is easy to use only if the researcher has prior knowledge of dual scaling (or simple correspondence analysis). However, the program will only calculate solutions up to three dimensions.

Hoffman also suggests that MAPWISE is not the best package to use. The documentation is wrought with incorrect assertions and misleading statements (Hoffman, 1991, p. 308). However, comparing MAPWISE with other correspondence analysis programs, it is menu orientated, easy to use and is colourful. MAPWISE can handle contingency tables containing up to 100 rows and 100 columns, and is largely written for industrial applications.

SimCA is a computer package largely written for the correspondence analysis of two-way contingency tables, but can perform a multiple correspondence analysis only if the data is presented in the form of an indicator matrix. It is written in Turbo BASIC and, unlike Nishisato's program (Dual3), can calculate solutions up to the tenth dimension. As it is written to be able to analyse indicator matrices, SimCA can analyse a contingency table with up to 175 columns and virtually an unlimited number of rows (however, the larger the data set, the slower the computation), and thus is a very good program to use.

Thompson (1995) reviewed four other commercially available packages; BMDP (version 7.0), NCSS (version 5.3), SAS (version 6.07) and SPSS (version 6.0). As far as worked examples are concerned, Thompson felt at the time that BMDP is a good start for those learning correspondence analysis, while SAS is also very good in this respect (but with fewer examples). However, the SAS documentation is more technical than that of the other three packages reviewed, while the SPSS documentation is good. All of the programs will conduct a simple correspondence analysis. However, all except NCSS will calculate a multiple correspondence analysis.
The three programs that perform this analysis do so via the Burt matrix. SAS, NCSS and BMDP will permit an analysis of supplementary categories, yet SPSS will not.

Lebart (1982) reviewed their program which performs a correspondence analysis of two-way
and multi-way contingency tables. Their program, written in FORTRAN, also performs a principal component analysis and cluster analysis on the data, and can analyse contingency tables consisting of hundreds of rows and thousands of columns.

Bond & Michailidis (1997) have also written a program capable of performing correspondence analysis on two-way and multi-way contingency tables. Their program, called ANACOR, is written in Lisp-Stat and its performance is claimed (by the authors) to be as good as the commercially available products and, in some respects, better. Their package has one advantage that many of the others do not have, and that is mouse-driven zooming capabilities for the correspondence plot. When analysing large contingency tables, the correspondence plot can often look very cluttered, especially toward the centroid, where many profile categories may be positioned. ANACOR allows the user to zoom in or zoom out of the plot by drawing a square around the region that is to be investigated. ANACOR is also menu driven and consists of colour graphics.

Gorman & Primavera (1993) described their program MCA.EXE, which performs a multiple correspondence analysis via the indicator or Burt matrices (see Greenacre, 1984, for more details on this type of analysis). The program is written in QUICKBASIC 4.5 and can be executed on an MS/DOS or DC/DOS machine. The program can handle any number of observations/people classified into a contingency table; however, the total number of categories must not exceed 70.

Correspondence analysis modules, or add-ins, also appear in other popular packages. STATA includes the module CORANAL, as described by van Kerm (1998). XLSTAT (Version 5) is available as an add-in to Microsoft Excel to perform correspondence analysis as well as multiple correspondence analysis, multidimensional scaling, principal component analysis and discriminant analysis. MVSP (Multivariate Statistical Package) is a Windows package that performs correspondence analysis and its detrended and canonical forms.
The statistical package MINITAB (Version 13) also contains a module CORRES.MTB that performs simple, and multiple, correspondence analysis. STATISTICA also contains a module to perform canonical and detrended correspondence analysis.

For the simple correspondence analysis associated with ordinal two-way contingency tables, Beh (2004) discusses in some detail programs written using S-PLUS.

Refer to Beh (2002) for a more comprehensive list of contributors to the computation of correspondence analysis.

8 Other Issues

The aim of this paper has been to provide a discussion of the development and the application of correspondence analysis. However, there are many other issues associated with correspondence analysis that have not been discussed in any detail due to space restrictions. We will therefore provide references relating to several of these issues.

Recent theoretical developments in Japan have been made into the correspondence analysis of artificial shaped data, especially cylindrical and binary cylindrical shapes, disk, torus, symmetric polyhedron, multiple circular, spherical and other typical geometric figures. See Okamoto (1994a, 1994b, 1994c, 1995a, 1995b, 2000), Endo (1995, 1996) and Okamoto & Endo (1995) for details.

The issue of influence of cells, and responses, in correspondence analysis has been investigated. Kim (1992, 1994), and Pack & Jolliffe (1992) looked at the impact on the analysis by including and excluding, or deleting, categories, while Pack & Jolliffe (1992) also looked at the topic of influence. An issue similar to that of influence was discussed by Krzanowski (1993). For an incidence matrix, the author examines methods for identifying columns (attributes) which highlight important row (incidence) differences.

Some of the most popular discussions concern the theoretical similarities between correspondence analysis and log-linear models, despite their differences in philosophy of data analysis. Goodman
(1985a, 1986), van der Heijden & de Leeuw (1985), Choulakian (1988), van der Heijden & Worsley (1988) and van der Heijden, de Falguerolles & de Leeuw (1989) showed the link between correspondence analysis and non-ordinal log-linear models. Gower (1989) comments on the link between correspondence analysis and log-linear models:

Correspondence analysis (CA) has been enthusiastically developed in France and widely adopted in other continental countries but has had a more cautious reception in Britain. In part this has been a consequence of claims that CA is a descriptive method and not model based. Links between CA and log-linear analysis (LLA) have helped to gain more acceptance in Britain, and perhaps for LLA to gain more acceptance abroad.

We refer to the above mentioned articles for more details.

For the link between correspondence analysis and log-linear models of ordinal categorical data refer to Beh (2001b). Of particular interest is the non-iterative estimation procedure of parameters from an ordinal log-linear model.

9 Discussion

The development of correspondence analysis is a long and interesting one, and one that has not been exclusively confined to statisticians. Its diversity of development and application ranges from the fields of biometry, psychometry and linguistics to health care and vegetation science. Therefore, correspondence analysis makes a very versatile method of data analysis in all situations where an exploratory or more in-depth analysis of categorical data is required. In a sense this is a reflection of all statistical techniques, and is nicely summed up by Kendall (1972, p.194):

It is hard to think of any subject which has not made some kind of contribution to statistical theory – agriculture, astronomy, biology, chemistry and so on through the alphabet. The remarkable thing, perhaps, is that these lines of development remained relatively independent for so long and only in the present century have been seen to have a common conceptual content.

References

Aitchison, J. & Greenacre, M. (2002). Biplots in compositional data. Applied Statistics, 51, 375–392.
Baxter, M.J. (1994). Exploratory Multivariate Analysis in Archaeology. Edinburgh: Edinburgh University Press.
Becker, M.P. & Clogg, C.C. (1989). Analysis of sets of two-way contingency tables using association models. Journal of the American Statistical Association, 84, 142–151.
Beh, E.J. (1997). Simple correspondence analysis of ordinal cross-classifications using orthogonal polynomials. Biometrical Journal, 39, 589–613.
Beh, E.J. (1998). A comparative study of scores for correspondence analysis with ordered categories. Biometrical Journal, 40, 413–429.
Beh, E.J. (2001a). Confidence circles for correspondence analysis using orthogonal polynomials. Journal of Applied Mathematics and Decision Sciences, 5, 1–11.
Beh, E.J. (2001b). Partitioning Pearson's chi-squared statistic for singly ordered two-way contingency tables. The Australian and New Zealand Journal of Statistics, 43, 327–333.
Beh, E.J. (2002). Simple correspondence analysis: A bibliographic review. Research Report No. QMMS2002.9. School of Quantitative Methods and Mathematical Sciences, University of Western Sydney, Australia.
Beh, E.J. (2003a). Singly ordered simple correspondence analysis. Research Report No. QMMS2003.2. School of Quantitative Methods and Mathematical Sciences, University of Western Sydney, Australia.
Beh, E.J. (2003b). A simple generalisation of approaches for the graphical analysis of cross-classified data. Research Report QMMS2003.1, School of Quantitative Methods and Mathematical Sciences, University of Western Sydney, Australia.
Beh, E.J. (2004). S-PLUS code for ordinal correspondence analysis. Computational Statistics, (to appear).
Benzécri, J.P. (1969). Statistical analysis as a tool to make patterns emerge from data. In Methodologies of Pattern Recognition, Ed. S. Watanabe, pp. 35–74.
Benzécri, J.P. (1973a). L'Analyse des donnees: I. La Taxinomie. Paris: Dunod.
Benzécri, J.P. (1973b). L'Analyse des donnees: II. L'Analyse des Correspondances. Paris: Dunod.
Benzécri, J.P. (1992). Correspondence Analysis Handbook. New York: Marcel Dekker.
Best, D.J. (1994). Nonparametric comparison of two histograms. Biometrics, 50, 538–541.
Best, D.J. & Rayner, J.C.W. (1996). Nonparametric analysis for doubly ordered two-way contingency tables. Biometrics, 52, 1153–1156.
Birks, H., Peglar, S. & Austin, H. (1994). An Annotated Bibliography of Canonical Correspondence Analysis and Related Constrained Ordination Methods 1986–1993. Tech. Rep. Botanical Institute, University of Bergen, Allegaten, Bergen, Norway.
Boik, R.J. (1996). An efficient algorithm for joint correspondence analysis. Psychometrika, 61, 255–269.
Bond, J. & Michailidis, G. (1997). Interactive correspondence analysis in a dynamic object oriented environment. Journal of Statistical Software, 2.
Burt, C. (1950). The factorial analysis of qualitative data. J. Statist. Psychology, 3, 166–185.
Carr, J.R. (1990). CORSPOND: A portable FORTRAN-77 program for correspondence analysis. Computers and Geosciences, 16, 289–307.
Carroll, J.D., Green, P.E. & Schaffer, C.M. (1986). Interpoint distance comparisons in correspondence analysis. Journal of Marketing Research, 23, 271–280.
Carroll, J.D., Green, P.E. & Schaffer, C.M. (1987). Comparing interpoint distances in correspondence analysis. J. Marketing Research, 24, 445–450.
Carroll, J.D., Green, P. & Schaffer, C.M. (1989). Reply to Greenacre's commentary on the Carroll–Green–Schaffer scaling of two-way correspondence analysis solutions. Journal of Marketing Research, 26, 366–368.
Choulakian, V. (1988). Exploratory analysis of contingency tables by log-linear formulation and generalisations of correspondence analysis. Psychometrika, 53, 235–250.
Clouse, R.A. (1999). Interpreting archaeological data through correspondence analysis. Historical Archaeology, 33, 99–107.
D'Ambra, L. & Lauro, N. (1989). Non symmetrical analysis of three-way contingency tables. In Multiway Data Analysis, Eds. R. Coppi and S. Bolasco, pp. 301–315. Amsterdam: North-Holland.
D'Ambra, L. & Lauro, N.C. (1992). Non symmetrical exploratory data analysis. Statistica Applicata, 4, 511–529.
David, M., Campiglio, C. & Darling, R. (1974). Progresses in R- and Q-mode analysis: correspondence analysis and applications to the study of geological processes. Canadian Journal of Earth Sciences, 11, 131–146.
Davy, P.J., Rayner, J.C.W. & Beh, E.J. (2003). Generalised correlations. Preprint, No.4/03, School of Mathematics and Applied Statistics, University of Wollongong, Australia.
de Leeuw, J. (1983). On the prehistory of correspondence analysis. Statistica Neerlandica, 37, 161–164.
de Leeuw, J. & van der Heijden, P. (1991). Reduced rank models for contingency tables. Biometrika, 78, 229–232.
Eckart, C. & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
Endo, H. (1995). Correspondence analysis of an artificial binary cylinder data. Statistics & Probability Letters, 25, 231–240.
Endo, H. (1996). Correspondence analysis of artificial binary data with circular structure. Mathematica Japonica, 43, 339–355.
Escoufier, B. (1983). Analyse de la difference entre deux mesures sur le produit de deux memes ensembles. Les Cahiers de l'Analyse des Donnees, 8, 325–329.
Escoufier, B. (1984). Analyse factorielle en reference a un modele: application a l'analyse de tableaux d'échange. Revue de Statistique Appliquee, 32, 25–36.
Escoufier, Y. (1988). Beyond correspondence analysis. In Classification and Related Methods of Data Analysis, Ed. H.H. Bock, pp. 505–514. Amsterdam: North-Holland.
Escoufier, Y. & Juncar, S. (1986). Least-squares approximation of frequencies or their logarithms. International Statistical Review, 54, 279–283.
Everitt, B.S. (1994). A Handbook of Statistical Analysis using S-plus. Chapman and Hall.
Fienberg, S.E. (1982). Contingency tables. Encyclopedia of Statistical Sciences, 2, 161–171.
Fisher, R.A. (1940). The precision of discriminant functions. Annals of Eugenics, 10, 422–429.
Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58, 453–467.
Gabriel, K.R. & Odoroff, C.L. (1990). Biplots in biomedical research. Statistics in Medicine, 9, 469–485.
Gifi, A. (1990). Non-linear Multivariate Analysis. Chichester: Wiley.
Gilula, Z. (1984). On some similarities between canonical correlation models and latent class models for two-way contingency tables. Biometrika, 71, 523–529.
Gilula, Z. & Haberman, S.J. (1986). Canonical analysis of contingency tables by maximum likelihood. Journal of the American Statistical Association, 81, 780–788.
Gilula, Z., Krieger, A.M. & Ritov, Y. (1988). Ordinal association in contingency tables: Some interpretative aspects. Journal of the American Statistical Association, 83, 540–545.
Goodman, L.A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association, 74, 537–552.
Goodman, L.A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. Journal of the American Statistical Association, 76, 320–334.
Goodman, L.A. (1985a). Correspondence analysis models, log-linear models and log-bilinear models for the analysis of contingency tables. Bulletin of the International Statistical Institute, 51, 28.1-1–28.1-14.
Goodman, L.A. (1985b). The analysis of cross-classified data having ordered and/or unordered categories: association models, correlation models and asymmetry models for contingency tables with or without missing entries. The Annals of Statistics, 13, 10–69.
Goodman, L.A. (1986). Some useful extensions of the usual correspondence analysis approach and the usual log-linear models approach in the analysis of contingency tables. International Statistical Review, 54, 243–309.
Goodman, L.A. (1991). Measures, models and graphical displays in the analysis of cross-classified data. Journal of the American Statistical Association, 86, 1085–1111.
Goodman, L.A. (1996). A single general method for the analysis of cross-classified data: Reconciliation and synthesis of some methods of Pearson, Yule, and Fisher, and also some methods of correspondence analysis and association analysis. Journal of the American Statistical Association, 91, 408–428.
Gorman, B.S. & Primavera, L.H. (1993). MCA: a simple program for multiple correspondence analysis. Educational and Psychological Measurement, 53, 685–687.
Gower, J.C. (1989). Discussion of 'A combined approach to contingency table analysis using correspondence analysis and log-linear analysis'. Applied Statistics, 38, 249–292.
Grassi, M. & Visentin, S. (1994). Correspondence analysis applied to grouped cohort data. Statistics in Medicine, 13, 2407–2425.
Greenacre, M.J. (1984). Theory and Application of Correspondence Analysis. London: Academic Press.
Greenacre, M.J. (1986). SIMCA: A program to perform simple correspondence analysis. The American Statistician, 40, 230–231.
Greenacre, M.J. (1988). Correspondence analysis of multivariate categorical data by weighted least-squares. Biometrika, 75, 457–467.
Greenacre, M.J. (1989). The Carroll–Green–Schaffer scaling in correspondence analysis: a theoretical and empirical appraisal. Journal of Marketing Research, 26, 358–365.
Greenacre, M.J. (1990). Some limitations of multiple correspondence analysis. Computational Statistics Quarterly, 3, 249–256.
Greenacre, M.J. (1991). Interpreting multiple correspondence analysis. Applied Stochastic Models and Data Analysis, 7, 195–210.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In The Prediction of Personal Adjustment, Ed. P. Horst, pp. 319–348. Social Science Research Council, New York.
Guttman, L. (1953). A note on Sir Cyril Burt's 'Factorial analysis of qualitative data'. The British Journal of Statistical Psychology, 6, 1–4.
Haberman, S.J. (1981). Tests for independence in two-way contingency tables based on canonical correlation and on linear-by-linear interaction. The Annals of Statistics, 9, 1178–1186.
Heiser, W.J. (1981). Unfolding Analysis of Proximity Data, Doctor of Social Sciences Thesis. Department of Data Theory, University of Leiden, The Netherlands.
Hill, M.O. (1973). Reciprocal averaging: an eigenvector method of ordination. Journal of Ecology, 61, 237–251.
Hill, M.O. (1974). Correspondence analysis: a neglected multivariate method. Applied Statistics, 23, 340–354.
Hill, M.O. & Gauch Jr., H.G. (1980). Detrended correspondence analysis: an improved ordination technique. Vegetatio, 42, 47–58.
Hirschfeld, H.O. (1935). A connection between correlation and contingency. Proceedings of the Cambridge Philosophical Society, 31, 520–524.
Hoffman, D., de Leeuw, J. & Arjun, R. (1995). Multiple correspondence analysis. In Advanced Methods of Marketing Research, Ed. R.P. Bagozzi, pp. 260–294.
Hoffman, D.L. (1991). Review of four correspondence analysis programs for the IBM PC. The American Statistician, 45, 305–311.
Horst, P. (1935). Measuring complex attitudes. The Journal of Social Psychology, 6, 369–375.
Johnson, R.M. (1963). On a theorem stated by Eckart and Young. Psychometrika, 28, 259–263.
Kendall, M.G. (1972). The history and future of statistics. In Statistical Papers in Honor of George W. Snedecor, Ed. T.A. Bancroft, pp. 193–210. The Iowa State University Press, Ames, Iowa.
Kim, H. (1992). Measures of influence in correspondence analysis. Journal of Statistical Computation and Simulation, 40, 201–217.
Kim, H. (1994). Influence functions in multiple correspondence analysis. Korean Journal of Applied Statistics, 7, 69–74.
Kroonenberg, P.M. & Lombardo, R. (1998). Nonsymmetric correspondence analysis: A tutorial. Kwantitatieve Methoden, 58, 57–83.
Kroonenberg, P.M. & Lombardo, R. (1999). Nonsymmetric correspondence analysis: A tool for analysing contingency tables with a dependence structure. Multivariate Behavioral Research, 34, 367–396.
Krzanowski, W.J. (1993). Attribute selection in correspondence analysis of incidence matrices. Applied Statistics, 42, 529–541.
Lancaster, H.O. (1969). The Chi-squared Distribution. New York: Wiley.
Lauro, C. & Balbi, S. (1999). The analysis of structured qualitative data. Applied Stochastic Models and Data Analysis, 15, 1–27.
Lebart, L. (1982). Exploratory analysis of large matrices with applications to textual data. In COMPSTAT 1982, Eds. P.E.H. Caussinus and R. Tomassone, pp. 67–75. Wien: Physica-Verlag.
Lebart, L. & Morineau, A. (1982). SPAD: A system of FORTRAN programs for correspondence analysis. Journal of Marketing Research, 19, 608–609.
Lebart, L., Morineau, A. & Tabard, N. (1977). Techniques de la Description Statistique: Méthodes et logiciels pour l'analyse des grands tableaux. Paris: Dunod.
Lebart, L., Morineau, A. & Warwick, K.M. (1984). Multivariate Descriptive Statistical Analysis. New York: Wiley.
Lombardo, R., Carlier, A. & D'Ambra, L. (1996). Nonsymmetric correspondence analysis for three-way contingency tables. Methodologica, 4, 59–80.
Maignan, M.F. (1974). Correspondence factorial analysis. In COMPSTAT 1974, Eds. G. Bruckmann, F. Ferschl and L. Schmetterer, pp. 234–243. Wien: Physica-Verlag.
Minchin, P.R. (1987). An evaluation of the relative robustness of techniques for ecological ordination. Vegetatio, 69, 89–107.
Nishisato, S. (1980). Analysis of Categorical Data: Dual Scaling and its Applications. Toronto: University of Toronto Press.
Nishisato, S. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis. Hillsdale, N.J.: L. Erlbaum Associates.
Okamoto, M. (1994a). Correspondence analysis of an artificial cylinder data. Statistics & Probability Letters, 20, 101–112.
Okamoto, M. (1994b). Correspondence analysis of an artificial disk data. Journal of the Japanese Statistical Society, 24, 157–168.
Okamoto, M. (1994c). Correspondence analysis of an artificial torus data. Behaviormetrika, 21, 149–161.
Okamoto, M. (1995a). Correspondence analysis of artificial data based on non-regular symmetric polyhedron. Mathematica Japonica, 44, 61–66.
Okamoto, M. (1995b). Correspondence analysis of some artificial data with multiple circular structure. Mathematica Japonica, 42, 201–212.
Okamoto, M. (2000). Correspondence analysis of typical geometric figures. Mathematica Japonica, 51, 145–152.
Okamoto, M. & Endo, H. (1995). Spherical trait in correspondence analysis of artificial data. Journal of the Japan Statistical Society, 25, 181–191.
Oksanen, J. (1987). Problems of joint display of species and site scores in correspondence analysis. Vegetatio, 72, 51–57.
Oksanen, J. (1988). A note on the occasional instability of detrending in correspondence analysis. Vegetatio, 74, 29–32.
Oksanen, J. & Minchin, P.R. (1997). Instability of ordination results under changes in input data order: explanations and remedies. Journal of Vegetation Science, 8, 447–454.
Pack, P. & Jolliffe, I.T. (1992). Influence in correspondence analysis. Applied Statistics, 41, 365–380.
Palmer, M.W. (1993). Putting things in even better order: the advantage of canonical correspondence analysis. Ecology, 74, 2215–2230.
Pearson, K. (1900). On a criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine (Series 5), 50, 157–175.
Pearson, K. (1904). On the theory of contingency and its relation to association and normal correlations. Drapers' Company Research Memoirs (Biometric Series), 1.
Pearson, K. (1906). On certain points connected with scale order in the case of a correlation of two characters for some arrangement give a linear regression line. Biometrika, 5, 176–178.
Peet, R.K., Knox, R.G., Case, J.S. & Allen, R.B. (1988). Putting things in order: the advantages of detrended correspondence analysis. American Naturalist, 131, 924–934.
Rayner, J.C.W. & Best, D.J. (1996). Smooth extensions of Pearson's product moment correlation and Spearman's Rho. Statistics and Probability Letters, 30, 171–177.
Richardson, M. & Kuder, G.F. (1933). Making a rating scale that measures. Personnel Journal, 12, 36–40.
Ritov, Y. & Gilula, Z. (1991). The order restricted RC model for ordered contingency tables: estimation and testing for fit. Annals of Statistics, 19, 2090–2101.
Ritov, Y. & Gilula, Z. (1993). Analysis of contingency tables by correspondence models subject to ordered constraints. Journal of the American Statistical Association, 88, 1380–1387.
Rom, D. & Sarkar, S.K. (1992). A generalised model for the analysis of association in ordinal contingency tables. Journal of Statistical Planning and Inference, 33, 205–212.
Shi, J. & Carr, J.R. (2001). A modified code for R-mode correspondence analysis of large-scale problems. Computers & Geosciences, 27, 139–146.
Teil, H. (1975). Correspondence factor analysis: An outline of its method. Mathematical Geology, 7, 3–12.
Teil, H. & Cheminee, J.L. (1975). Application of correspondence factor analysis to the study of major and trace elements in the Erta Ale Chain (Afar, Ethiopia). Mathematical Geology, 7, 13–30.
ter Braak, C.J.F. (1986). Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology, 67, 1167–1179.
ter Braak, C.J.F. (1987). Ordination. In Data Analysis in Community and Landscape Ecology, Eds. R.H. Jongman, C.J.F. ter Braak and O.F.R. van Tongeren, pp. 91–173. Wageningen: Pudoc.
ter Braak, C.J.F. (1988). Partial canonical correspondence analysis. In Classification and Related Methods of Data Analysis, Ed. H.H. Bock, pp. 551–558. Amsterdam: North-Holland.
Thompson, P.A. (1995). Correspondence analysis in statistical package programs. The American Statistician, 49, 310–316.
Tian, D.Q., Sorooshian, S. & Myers, D.E. (1993). Correspondence analysis with Matlab. Computers and Geosciences, 19, 1007–1022.
van der Heijden, P. & de Leeuw, J. (1985). Correspondence analysis used complementary to log-linear analysis. Psychometrika, 50, 429–447.
van der Heijden, P. & Worsley, K.J. (1988). Comment on 'Correspondence analysis used complementary to log-linear analysis'. Psychometrika, 53, 287–291.
van der Heijden, P.G.M., de Falguerolles, A. & de Leeuw, J. (1989). A combined approach to contingency table analysis using correspondence analysis and log-linear analysis. Applied Statistics, 38, 249–292.
van Heel, M. & Frank, J. (1980). Classification of particles in noisy electron micrographs using correspondence analysis. In Pattern Recognition in Practice, Eds. E.S. Gelsema and L.N. Kanal, pp. 235–243. Amsterdam: North-Holland.
van Kerm, P. (1998). Simple and multiple correspondence analysis in STATA. STATA Technical Bulletin Reprints, 10, 210–217.
van Meter, K.M., Schiltz, M.-A., Cibois, P. & Mounier, L. (1994). Correspondence analysis: A history and French sociological perspective. In Correspondence Analysis in the Social Sciences: Recent Developments and Applications, Eds. M. Greenacre and J. Blasius, pp. 128–137. San Diego: Academic Press.
Wartenberg, D., Ferson, S. & Rohlf, F.J. (1987). Putting things in order: a critique of detrended correspondence analysis. American Naturalist, 129, 434–448.
Williams, E.J. (1952). Use of scores for the analysis of association in contingency tables. Biometrika, 39, 274–289.
Yamakawa, A., Ichihashi, H. & Miyoshi, T. (1998). Multiple correspondence analysis based on $L_s$-Norm and its application to an analysis of senior simulation. In Proceedings of the 2nd Japan–Australia Joint Workshop on Intelligent and Evolutionary Systems, pp. 99–106.
Yamakawa, A., Ichihashi, H. & Miyoshi, T. (1999). Multiple correspondence analysis of mimetic experiences of advanced aged persons. In Proceedings of International Conference on Production Research, Vol. 2, pp. 1065–1068.
Yamakawa, A., Kanaumi, Y., Ichihashi, H. & Miyoshi, T. (1999). Simultaneous application of clustering and correspondence analysis. In Proceedings of International Conference on Neural Networks, No. 625.

Résumé

Over the past decades, correspondence analysis has gained an international reputation as a powerful statistical tool for the graphical analysis of contingency tables. This popularity stems from its development and application in many European countries, particularly France, and its use has spread to English-speaking countries such as the United States and the United Kingdom. Its growing popularity among statistical practitioners, and more recently in disciplines where the role of statistics is less dominant, demonstrates the importance of continuing research and development of the methodology. The aim of this article is to highlight the theoretical, practical and computational aspects of simple correspondence analysis and to discuss its relationship with recent advances that can be used to graphically represent the association in two-way categorical data.

[Received September 2002, accepted December 2003]