Inferring Individual Level Relationships from Aggregate Data *


Kihong Eom **, Youngjae Jin ***

< ABSTRACT >

This paper introduces a technique for inferring individual-level relationships from aggregate data. Social scientists encounter the ecological fallacy problem when individual-level data are not available, yet the individual-level relationship is sought. This is particularly the case when social scientists attempt to examine a subject for which no survey data are available or reliable. A series of cross-level inference techniques have been introduced since Goodman's seminal work (1959). We introduce a technique that significantly improves cross-level inference, Gary King's solution. For the purpose of verification, we examined regional voting in the 16th National Assembly elections of South Korea. The estimates of regional voting are compared with those from survey results. We found that the estimates from King's solution closely match those from survey results if cell frequency in the survey results is large enough. King's solution produces more reliable estimates when cell frequency in the survey results is small.

Key words: Cross-level inference, ecological fallacy, aggregate data, individual-level relationship

* An earlier version was presented at the 53rd meeting of the International Statistical Institute, Seoul, Korea, August 22-29.
** Frostburg State University, Lecturer
*** Yonsei University, Associate Professor
I. Introduction

Who voted for Chunghee Park in the 1963 South Korean presidential election? What was the level of regional voting in that election? How many times has a woman had an abortion in her lifetime? These are all interesting questions, yet they are hard to examine, partly because they are historical matters for which survey data are not available, and partly because survey respondents may give politically correct answers even where data are available, so that results might be biased.

The purpose of this paper is to introduce a technique for solving these problems, Gary King's ecological regression. To verify this technique, we compare estimates from survey results with those from King's method. The case selected for the verification is "regional voting" in the 16th Korean National Assembly elections. Regional voting refers to the concentration of votes along regional party lines in a number of Korean regions (Kim 1994; Lee 1998; Lee and Brunn 1996). 1) Stated another way, voters whose hometown is Jeolla mostly vote for the candidate of the party whose leaders were born in that region. Since this appears to happen regardless of the quality of candidates and the ideology of the party, voting patterns result in parties often being representatives of regions instead of districts or the nation (Shin, Jin, Gross and Eom 2005). However, the level of regional voting has been a difficult topic to examine, because voting is secret and a survey respondent may give a politically correct answer.

1) By definition, regionalism refers to the voters' affective identifications with, and support for, candidates with roots in their respective regions (Kim and Koh 1980, 81).

Since Goodman's seminal work (1959), several techniques for cross-level inference have been developed to solve the ecological fallacy problem (Palmquist 1993). King's solution (1997) is well known for producing efficient and robust estimates. In addition, it contains information on the uncertainty of estimates at the level of analysis. After explaining King's solution, we analyze regional voting in the 16th National Assembly elections. The estimates from King's solution are compared with those from survey results. We found that estimates from King's solution closely match those from survey results if cell frequency in the survey results is large enough. King's solution produces more reliable estimates when cell frequency in the survey results is small.

II. Ecological Fallacy and Ecological Regressions

Cross-level inference is "the process of using aggregate (i.e., "ecological") data to infer discrete individual level relationships of interest" (King 1997, xv). It provides a solution for the problem of ecological fallacy. In this section, we introduce the problem of
ecological fallacy. We then describe a series of efforts to solve this problem.

1. Ecological Fallacy

It is well known that using aggregate-level data to infer individual-level relationships generates the ecological fallacy problem, which produces biased and inefficient estimates (Palmquist 1993). For example, suppose that our research question is to compare the level of literacy between the foreign born and the native (Robinson 1950). Further, assume that we have three groups (the sophisticated, the regular and the foreign born), and that both the sophisticated and the foreign born prefer to live in a city while the regular like to live in a rural area. If a researcher regresses the literacy rate on the percentage of the foreign born at the county level, he or she may find that the greater the percentage of foreign born, the higher the literacy rate. It would be a shocking result, because the foreign born are not likely to be literate. However, if one analyzes the relationship at the individual level, he or she may find a different and more convincing result: the native tend to have higher literacy than the foreign born.

This discrepancy occurs because the sophisticated and the foreign born reside in the same type of area, i.e., the city. Without consideration of the aggregation unit, findings from aggregate data misrepresent the individual-level relationship. The example shows the inappropriateness of using aggregate data to examine the individual-level relationship.
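The discrepancy described above can be reproduced with a small deterministic sketch. Everything in it is an illustrative assumption rather than data from Robinson (1950): ten hypothetical counties, the group shares, and the literacy rates of 1.0, 0.7 and 0.5 for the sophisticated, the regular and the foreign born.

```python
# Toy illustration of the ecological fallacy (all numbers are assumptions).
# Counties range from fully rural (u = 0) to fully urban (u = 1).
RATES = {"sophisticated": 1.0, "regular": 0.7, "foreign": 0.5}  # assumed literacy rates


def county(u, population=1000):
    """Head counts per group for a county with urbanization u in [0, 1]."""
    return {
        "sophisticated": 0.8 * u * population,  # urban natives
        "regular": (1.0 - u) * population,      # rural natives
        "foreign": 0.2 * u * population,        # foreign born, urban
    }


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


counties = [county(i / 9) for i in range(10)]

# Aggregate level: county literacy rate regressed on county foreign-born share.
foreign_share = [c["foreign"] / sum(c.values()) for c in counties]
literacy_rate = [
    sum(RATES[g] * n for g, n in c.items()) / sum(c.values()) for c in counties
]
aggregate_slope = ols_slope(foreign_share, literacy_rate)


def pooled_rate(groups):
    """Individual level: literacy rate pooled over everyone in the given groups."""
    people = sum(c[g] for c in counties for g in groups)
    literate = sum(RATES[g] * c[g] for c in counties for g in groups)
    return literate / people


native_rate = pooled_rate(["sophisticated", "regular"])
foreign_rate = pooled_rate(["foreign"])

print(f"aggregate slope: {aggregate_slope:.2f}")
print(f"native literacy: {native_rate:.2f}, foreign-born literacy: {foreign_rate:.2f}")
```

In this construction the county-level slope is positive even though, pooled over individuals, the native are more literate (about 0.83) than the foreign born (0.50).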
2. Ecological Regressions

To solve the aggregation bias, several methods have been introduced. A common setup of these models can be described in the table below.

<Table 1> Robinson's problem

                       Literate (L)   Illiterate (IL)   Marginal
The Native (N)         ?              ?                 20,000
The Foreign born (F)   ?              ?                 1,000
Marginal               15,000         6,000             21,000

Let us suppose that in a total population of 21,000 we observe only the marginal population values for the Native (N) and the Foreign born (F): 20,000 and 1,000. We also know the marginal values for the Literate (L) and the Illiterate (IL). Our research problem is to find the cell frequencies noted as question marks: how many of the native are literate, and how many of the foreign born are literate? We then calculate literacy ratios for the native and the foreign born and examine whether birthplace is related to literacy.

One way to solve this problem is as follows. Let us suppose that we know the value for the upper left corner by pure luck: the number of the literate among the native is 15,000. Once we have this information, we can calculate the rest of the cell values accordingly. The results are shown in table 2.
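The cell arithmetic just described can be sketched in a few lines. The function name `fill_table` is hypothetical; the column totals of 15,000 literate and 6,000 illiterate are taken from the worked solution in the text and from the stated total of 21,000.

```python
def fill_table(row_totals, col_totals, native_literate):
    """Complete a 2x2 table from its marginals and one known cell.

    row_totals = (native, foreign born); col_totals = (literate, illiterate).
    """
    native, foreign = row_totals
    literate, illiterate = col_totals
    if native + foreign != literate + illiterate:
        raise ValueError("row and column totals must sum to the same population")

    native_illiterate = native - native_literate
    foreign_literate = literate - native_literate
    foreign_illiterate = foreign - foreign_literate

    cells = {
        ("N", "L"): native_literate,
        ("N", "IL"): native_illiterate,
        ("F", "L"): foreign_literate,
        ("F", "IL"): foreign_illiterate,
    }
    if min(cells.values()) < 0:
        raise ValueError("the known cell is inconsistent with the marginals")
    return cells


table = fill_table((20000, 1000), (15000, 6000), native_literate=15000)

# Literacy ratios by birthplace, computed from the completed cells.
native_ratio = table[("N", "L")] / 20000
foreign_ratio = table[("F", "L")] / 1000

print(table)
print(native_ratio, foreign_ratio)
```

The sketch only makes explicit that a single known cell pins down the other three. In practice no cell is known, which is why the methods discussed next impose constraints instead.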
<Table 2> A solution for Robinson's problem

                       Literate (L)   Illiterate (IL)   Marginal
The Native (N)         15,000         (5,000)           20,000
The Foreign born (F)   (0)            (1,000)           1,000
Marginal               15,000         6,000             21,000

Since the native population who can read and write is 15,000, the number of the illiterate among the native is 5,000 in a native population of 20,000. In addition, the total number of the literate is 15,000 and the number of the literate native is 15,000, and thus the number of the literate foreign born is zero. For the purpose of comparison, table 2 can be rewritten as table 3.

<Table 3> Literacy Ratios

                       Literate (L)   Illiterate (IL)   Marginal
The Native (N)         0.75           0.25              0.95
The Foreign born (F)   0.00           1.00              0.05
Marginal               0.71           0.29

Of the native, the literacy ratio is 0.75, while it is 0.00 for the foreign born. Therefore, it leads to the conclusion that the native are more likely to be literate than the foreign born. This example shows that we are able to disaggregate aggregate data if we "correctly" impose some constraints on the parameter of interest. In this case, we assumed that some information on the number of the
literate among the native is available. We can generalize table 3 in the following table.

<Table 4> General Form of Ecological Regression

                       Literate (L)    Illiterate (IL)     Marginal
The Native (N)         $\beta_i^N$     $1 - \beta_i^N$     $X_i$
The Foreign born (F)   $\beta_i^F$     $1 - \beta_i^F$     $1 - X_i$
Marginal               $T_i$           $1 - T_i$

$X_i$ is the proportion of the native, $\beta_i^N$ is the literacy ratio for the native, and $\beta_i^F$ is the literacy ratio for the foreign born. $T_i$ is the proportion of the literate and "i" is an aggregation unit. With some constraints, the parameters of interest ($\beta_i^N$ and $\beta_i^F$) can be calculated using the aggregate values of $X_i$ and $T_i$.

With this general form, two approaches for ecological regression have been developed: the method of bounds and the statistical approach. The method of bounds uses deterministic information in data (Achen and Shively 1995). Let us suppose once again that we attempt to estimate the literacy ratio among the native and the foreign born with aggregate information. The relationship in table 4 can be written as follows:

$T_i = \beta_i^N X_i + \beta_i^F (1 - X_i)$   1)

Then,

$\beta_i^N = \frac{T_i}{X_i} - \frac{1 - X_i}{X_i} \beta_i^F$
Since the $\beta$s are proportions, they must lie between 0 and 1: $0 \le \beta_i \le 1$. In addition, if $\beta_i^F = 0$, $\frac{T_i}{X_i}$ becomes the maximum value for $\beta_i^N$, while if $\beta_i^F = 1$, $\frac{T_i - 1 + X_i}{X_i}$ becomes the minimum value for $\beta_i^N$. Hence, the lower and upper limits for $\beta_i^N$ are:

$\left[ \max\left( \frac{T_i - 1 + X_i}{X_i}, 0 \right), \min\left( \frac{T_i}{X_i}, 1 \right) \right]$

With the same procedure, we can obtain the lower and upper limits for $\beta_i^F$ as follows:

$\left[ \max\left( \frac{T_i - X_i}{1 - X_i}, 0 \right), \min\left( \frac{T_i}{1 - X_i}, 1 \right) \right]$

With our example, the range of plausible values for $\beta_i^N$ is [Max (0.7, 0), Min (0.75, 1)] = [0.7, 0.75]. Therefore, we can significantly narrow down the plausible values of the parameter $\beta_i^N$. However, the method of bounds often yields bounds that are too wide to be informative, especially when the distribution of the proportions ($X_i$ and/or $T_i$) is considerably skewed. For example, the range of $\beta_i^F$ values in our example is [Max (-5, 0), Min (15, 1)] = [0, 1]. In this case, the method of bounds does not reduce the range of plausible values for $\beta_i^F$.
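The bounds above are simple enough to check numerically. The following is a minimal sketch; the function name `method_of_bounds` is hypothetical, and the inputs are the marginal proportions of the running example (X = 20,000/21,000 and T = 15,000/21,000).

```python
def method_of_bounds(x, t):
    """Deterministic bounds on the two group rates in a 2x2 ecological table.

    x: proportion of the first group (here, the native) in the unit.
    t: proportion of the outcome (here, the literate) in the unit.
    Returns ((lower_N, upper_N), (lower_F, upper_F)).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0 and 1")

    bounds_n = (max((t - 1.0 + x) / x, 0.0), min(t / x, 1.0))
    bounds_f = (max((t - x) / (1.0 - x), 0.0), min(t / (1.0 - x), 1.0))
    return bounds_n, bounds_f


x = 20000 / 21000  # proportion of the native
t = 15000 / 21000  # proportion of the literate
bounds_n, bounds_f = method_of_bounds(x, t)

print("native bounds:", [round(b, 4) for b in bounds_n])
print("foreign-born bounds:", [round(b, 4) for b in bounds_f])
```

The native bounds come out as roughly [0.7, 0.75], while the foreign-born bounds remain the uninformative [0, 1], matching the ranges given in the text.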
The second approach for ecological regressions uses the logic of statistical association. If there is an association between variables, it will occur across units with some fluctuation. The first model was developed by Leo A. Goodman (1959). He argues that if we can reasonably assume three things, we can infer individual behavior from aggregate data. His assumptions are a constant effect of parameters, a linear function, and a normal distribution of residuals. Following his suggestion, equation 1) can be rewritten as follows:

$T_i = \beta^N X_i + \beta^F (1 - X_i) + e_i$   2)

where $e_i$ is the residual. If the three conditions are met, he argues, the parameters ($\beta$s) and their standard errors are correctly inferred.

Goodman's model, however, has several problems (Voss 2000). First, his constant effect assumption is not substantively reasonable. For example, assuming a constant literacy ratio for the native across units is too restrictive. If the parameters ($\beta$s) covary with a unit, the estimates may overestimate or underestimate the true $\beta$s due to aggregation bias. Second, since Goodman's model produces only a single estimate, it is hard to know individual behavior within a unit.

A series of models have been developed to solve or relax these assumptions (Achen and Shively 1995). For example, the
homogeneous model utilizes, rather than estimates, information from observed data. That is, with our example, the homogeneous model observes the literacy ratio in units that are entirely native, and then uses this ratio as a benchmark for inferring individual relationships. The same procedure is applied for the literacy ratio in units that are entirely foreign born. It is only useful when units are highly segregated, however. It becomes unreliable when both the native and the foreign born are mixed in the same unit.

The informed assumption model uses informed knowledge instead of an observed literacy ratio. For example, we may have prior information that all of the foreign born are illiterate. In this case, $\beta^F$ becomes zero, and thus we can use this information to calculate $\beta^N$ and the rest of the cells, as shown in table 4. However, in most cases, prior information is unattainable. Moreover, researchers may not receive a warning when this informed knowledge is incorrect, which results in biased estimates of parameters (Voss 2000).

The final example for ecological regressions has a different premise. The neighborhood model assumes that the parameters of interest are the same within a unit ($\beta_i^N = \beta_i^F$), yet vary across units ($\beta_i^N \ne \beta_j^N$, where $i \ne j$). Therefore, equation 2) becomes

$T_i = \beta_i X_i + \beta_i (1 - X_i) + e_i = \beta_i + e_i$,

where $\beta_i$ is a function of $X_i$. For example, this model assumes that the literacy ratio of the native and the foreign born is the same within the same unit, while the literacy ratio varies across units. As one may notice, the assumption
the neighborhood model makes is too strong. Even if it is a plausible assumption, we do not have to estimate a model, because we already have an answer to our research question of whether birthplace is related to the level of literacy.

With the exception of the neighborhood model, a survey of ecological regressions shows some common problems. First, all of the models assume a constant effect of parameters. This seems too restrictive, because the parameters of interest are hardly constant across units. Second, the models produce only a single estimate. Since we attempt to infer the individual-level relationship, a single estimate is unlikely to be satisfactory. Finally, if an equation has more than two parameters to be estimated, it is hard to imagine how these methods can be extended.

Gary King (1997) provides an interesting method to solve these problems. First, he does not assume a constant effect; rather, he assumes that a parameter varies with a common underlying dimension. Second, because of the varying parameter, we may have an estimate per unit. In addition, since his method uses additional information from the method of bounds, the estimates become more efficient. His method can be written as follows (King 1997, 93-94):

$T_i = \beta_i^N X_i + \beta_i^F (1 - X_i) + e_i$, where $P(\beta_i^N, \beta_i^F) = TN(\beta_i^N, \beta_i^F \mid \mathrm{B}, \Sigma)$   3)
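For contrast with the varying-parameter model in equation 3), the constant-parameter benchmark that it relaxes, Goodman's equation 2), can be sketched as an ordinary least-squares fit. Rewriting equation 2) as $T_i = \beta^F + (\beta^N - \beta^F) X_i + e_i$ shows that the intercept estimates $\beta^F$ and the intercept plus the slope estimates $\beta^N$. The function name and the five units below are hypothetical, generated without noise from assumed values $\beta^N = 0.75$ and $\beta^F = 0.30$.

```python
def goodman_regression(xs, ts):
    """Least-squares fit of T_i = beta_F + (beta_N - beta_F) * X_i.

    Returns (beta_N, beta_F) under Goodman's constant-effect assumption.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary across units")
    slope = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts)) / sxx
    intercept = mean_t - slope * mean_x

    beta_f = intercept          # predicted T when X = 0 (unit entirely foreign born)
    beta_n = intercept + slope  # predicted T when X = 1 (unit entirely native)
    return beta_n, beta_f


# Hypothetical units generated from constant parameters, with no residual.
xs = [0.2, 0.4, 0.6, 0.8, 0.95]
ts = [0.75 * x + 0.30 * (1 - x) for x in xs]

beta_n, beta_f = goodman_regression(xs, ts)
print(round(beta_n, 4), round(beta_f, 4))
```

When the parameters really are constant the fit recovers them. Nothing in the procedure keeps the estimates inside [0, 1] or yields a separate estimate per unit, which are the limitations that King's method addresses.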
The probability density of the parameters ($\beta_i^N$, $\beta_i^F$) follows a truncated normal distribution with limits $\beta_i^N \in [0, 1]$ and $\beta_i^F \in [0, 1]$. With the help of the method of bounds, these limits can be narrowed down as follows:

$\beta_i^N \in \left[ \max\left( \frac{T_i - 1 + X_i}{X_i}, 0 \right), \min\left( \frac{T_i}{X_i}, 1 \right) \right]$

$\beta_i^F \in \left[ \max\left( \frac{T_i - X_i}{1 - X_i}, 0 \right), \min\left( \frac{T_i}{1 - X_i}, 1 \right) \right]$

The mean and variance matrix of ($\beta_i^N$, $\beta_i^F$) are

$\mathrm{B} = \begin{pmatrix} \mathrm{B}^N \\ \mathrm{B}^F \end{pmatrix}$ and $\Sigma = \begin{pmatrix} \sigma_N^2 & \sigma_{NF} \\ \sigma_{NF} & \sigma_F^2 \end{pmatrix}$

If his three assumptions are met, estimation produces an efficient and robust estimate. 2)

2) The three assumptions are a single model of parameter, the absence of spatial correlation, and no correlation between marginal and parameter. King, Rosen, and Tanner (1999, 67-68) show, however, that violation of the third assumption does not produce biased estimates if the bounds of the parameters are narrow enough.

The estimation procedure of King's solution can be summarized in two steps. The first step calculates the bounds of the parameters. The second step estimates the parameters from truncated bivariate normal distributions within the bounds.

If one extends a model to more than two parameters, the estimates from the first estimation are used as marginal values. For example, if one is interested in the proportion of regional voting, he or she first estimates the turnout rate among those who were born in a certain region in a given district. The estimated turnout rate is then used to form the marginal values for the proportion of regional voting. This can be diagrammed as below:

<Figure 1> King's Solution: the first step

                 Vote             Not vote             Marginal
Jeolla           $\beta_i^J$      $1 - \beta_i^J$      $X_i$
Other Regions    $\beta_i^{J'}$   $1 - \beta_i^{J'}$   $1 - X_i$
Marginal         $T_i$            $1 - T_i$

where $X_i$ is the proportion of the voting age population who were born in Jeolla, $T_i$ is the proportion of voters who turn out to vote, $\beta_i^J$ is the turnout rate among those who were born in Jeolla, $\beta_i^{J'}$ is the turnout rate among those who were born in a region other than Jeolla, and "i" is a district indicator.

<Figure 2> King's Solution: the second step

                 NCNP               Other parties          Vote             Not vote             Marginal
Jeolla           $\lambda_i^J$      $1 - \lambda_i^J$      $\beta_i^J$      $1 - \beta_i^J$      $x_i$
Other Regions    $\lambda_i^{J'}$   $1 - \lambda_i^{J'}$   $\beta_i^{J'}$   $1 - \beta_i^{J'}$   $1 - x_i$
Marginal         $P_i$              $1 - P_i$              $T_i$

where "$x_i$" is the estimated proportion of voters whose hometown is in Jeolla and who turn out to vote, $P_i$ is the vote share for a candidate whose party label is the National Congress for New Politics (NCNP), and $\lambda_i^J$ is the proportion of regional voting.
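How the first step feeds the second can be sketched as follows. Two things here are assumptions rather than statements from the paper. First, the second-step marginal is computed as $x_i = \beta_i^J X_i / T_i$, which follows from the definitions in Figure 1 (Jeolla-born voters as a share of all voters). Second, every number is a hypothetical district value. The truncated bivariate normal estimation itself is not shown; only the deterministic bounds on $\lambda_i^J$ are computed.

```python
def second_step_marginal(beta_j, x_share, turnout):
    """Share of actual voters who were born in Jeolla: beta_J * X / T (assumed form)."""
    if not 0.0 < turnout <= 1.0:
        raise ValueError("turnout must lie in (0, 1]")
    jeolla_voters = beta_j * x_share
    if jeolla_voters > turnout:
        raise ValueError("Jeolla-born voters cannot exceed all voters")
    return jeolla_voters / turnout


def lambda_bounds(x_voters, vote_share):
    """Method-of-bounds limits on the regional voting rate lambda_J.

    Uses the accounting identity P = lambda_J * x + lambda_J' * (1 - x).
    """
    if not 0.0 < x_voters < 1.0:
        raise ValueError("x_voters must lie strictly between 0 and 1")
    lower = max((vote_share - (1.0 - x_voters)) / x_voters, 0.0)
    upper = min(vote_share / x_voters, 1.0)
    return lower, upper


# Hypothetical district: 30% of the voting age population born in Jeolla,
# 60% overall turnout, and a first-step turnout estimate of 0.7 for the Jeolla born.
x_voters = second_step_marginal(beta_j=0.7, x_share=0.3, turnout=0.6)

# Hypothetical NCNP vote share of 0.8 in the same district.
lower, upper = lambda_bounds(x_voters, vote_share=0.8)
print(round(x_voters, 4), round(lower, 4), round(upper, 4))
```

Because `beta_j` is itself an estimate, `x_voters` carries the first-step uncertainty into the second step, which is why extended tables yield less certain estimates.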
The first step is to examine the turnout rates ($\beta_i^J$ and $\beta_i^{J'}$) for those who were born in Jeolla ($X_i$) and for those who were born in areas other than Jeolla ($1 - X_i$). Once we obtain estimates for the $\beta$s, these estimates are used to calculate the marginal values for the regional voting estimates ($x_i$ and $1 - x_i$). The second step starts with the calculation of the bounds of the parameters ($\lambda$s) and then estimates the parameters across units. Note, however, that since $x_i$ has a component to be estimated, it is not a fixed variable. Therefore, extending tables produces more uncertain estimates due to the added uncertainty originating from the first estimation. 3) In the next section, we apply King's solution to find the level of regional voting in the Korean National Assembly elections of 2000.

3) Note that the parameters ($\lambda$ and $(1 - \lambda)$) are weighted by the voting age population in a given district.

III. Application: Disaggregating Regional Voting

The 2000 election outcomes in Korea suggest that there are three regions which tend to exhibit partisan regionalism: Jeolla, Gyeongsang, and Chungcheong. The Jeolla region covers the Jeollabukdo and Jeollanamdo areas, the Gyeongsang region refers to the Gyeongsangbukdo and Gyeongsangnamdo areas, and the Chungcheong region means the Chungcheongbukdo and Chungcheongnamdo areas. Regional dominance by a particular party was specified in terms of the birthplace of particular party leaders. A leader of the Grand National Party, Kim Yong Sam, was born in the Gyeongsang region; a leader of the National Congress for New Politics, Kim Dae Jung, in the Jeolla region; and a leader of the United Liberal Democrats, Kim Chong Phil, in the Chungcheong region. This link between the birthplace of a party leader and the dominance of a particular party is well documented in contemporary Korean politics (Kim 1994; Lee 1998; Lee and Brunn 1996).

In this section, using Gary King's ecological regression, we attempt to disaggregate aggregate votes along regional party lines in a district. King's method infers regional voting at the candidate level. The percentage of regional voting at the candidate level is then averaged across regional blocs and compared to estimates from survey results. The following equations are to be estimated:

[Equations 4), 5), and 6) are not preserved in this transcription.]

where $P_i$ is the vote share of a candidate, $\lambda$ is the proportion of regional voting, and $\lambda'$ is the proportion of nonregional voting. J indicates Jeolla, G Gyeongsang, and C Chungcheong. "$x_i$" is a function of $\beta_i$ and $X_i$ (the expression is not preserved in this transcription), where $\beta_i$ is the turnout rate for those who were born in a certain
region, and $X_i$ is the proportion of voters who were born in a certain region. "i" indicates a district. The level of analysis is the candidate level. Estimation is done by the program called "EzI." 4)

In the 16th National Assembly elections of Korea (April 13, 2000), 194 incumbent and 449 nonincumbent candidates ran for office (National Election Commission 2000). We focus our analysis on the vote share for candidates of the three major parties. 5) The percentage of those who registered their birthplace in a given district was collected with the help of one of the major parties. 6) The results are shown in table 5.

4) "EzI" was developed by Kenneth Benoit and Gary King (released in 2001). It is available from [URL not preserved], visited May 1.
5) We focus on only these three parties because they comprised over 96% of the single member district seats in the 16th National Assembly Election.
6) Because of a contributor's request, the source of data has not been released. Data on districts in Chungcheongdo are not available, so that the number of districts in this study is [number not preserved].

<Table 5> Regional Voting Estimates from Ecological Inference

Regional Voting Level (Average λs)

Regional Blocs      GNP       NCNP      ULD
Seoul               61.00%    75.27%    2.57%
Busan               67.29%    64.66%    1.62%
Daegu               62.02%    70.49%    2.26%
Incheon             61.05%    73.75%    2.51%
Gwangju             56.14%    79.46%    0.95%
Ulsan               57.61%    66.25%    1.23%
Gyeonggido          60.43%    73.92%    2.48%
Gangwondo           60.14%    71.80%    2.61%
Jeollabukdo         57.52%    61.04%    1.76%
Jeollanamdo         43.85%    63.86%    0.00%
Gyeongsangbukdo     54.88%    71.17%    1.83%
Gyeongsangnamdo     55.35%    68.99%    2.18%
Average             52.21%    61.74%    1.04%

Source: compiled by the authors.
Note: Average λ is calculated by averaging district-level regional voting ($\lambda_i$) within regional blocs. GNP stands for the Grand National Party, NCNP for the National Congress for New Politics and ULD for the United Liberal Democrats.

Table 5 shows that on average the percentage of regional voting (61.74%) is the highest among those who were born in Jeolla, which may benefit candidates of the National Congress for New Politics. The percentage of regional voting for those who were born in Gyeongsang ranked second. Not surprisingly, the level of regional voting is the lowest for those who were born in Chungcheong. This resulted in less concentration of votes on candidates running under the United Liberal Democrats (ULD). In the 15th National Assembly Elections of 1996, the ULD won 25 of the 28 seats. But by the elections of 2000, the ULD was only able to win 11 of the 24 seats in Chungcheong.

The same holds when one examines the percentage of regional voting within a regional bloc. For example, 79.46 percent of those who were born in Jeolla cast a regional vote if they reside in districts within Gwangju. More than 70 percent of voters also voted for candidates of the NCNP in districts within Seoul, Daegu, Incheon, Gyeonggido, Gangwondo and Gyeongsangbukdo if they were born in Jeolla.
Those who were born in Gyeongsang tend to cast slightly fewer regional votes, yet still at quite a significant level. On average, more than half of voters who were born in Gyeongsang cast a regional vote in the 16th National Assembly elections. This is especially the case in districts within Seoul, Busan, Daegu, Incheon, Gyeonggido, and Gangwondo, where more than 60 percent of voters voted for candidates of the GNP if they were born in Gyeongsang. It is also the case, though to a lesser extent, for those who reside in Gwangju, Ulsan, Jeollabukdo, Gyeongsangbukdo, and Gyeongsangnamdo.

Not surprisingly, those who were born in Chungcheong cast the fewest regional votes. Only a handful of voters who were born in Chungcheong cast a regional vote on average. However, it should be noted that data for districts within the Chungcheong area were not available, and thus the level may be underestimated.

In sum, regional voting estimates from King's method provide supportive evidence for the argument that regional voting is a nationwide problem (Kim 1994; Lee 1998; Lee and Brunn 1996). Not only is the level of regional voting significant in districts within the known regional voting blocs (Jeolla and Gyeongsang), but it also appears to be substantial in districts outside these regional blocs. However, there is significant fluctuation in the level of regional voting across regions. For example, the percentage of regional voting for those who were born in Jeolla varies from 61.04% in Jeollabukdo
to 79.46% in Gwangju, while it varies from 43.85% in Jeollanamdo to 67.29% in Busan if voters were born in Gyeongsang. We can conclude that the level of regional voting is not constant, but varies across regional blocs.

The results from ecological inference can be verified by survey results. The procedure is the same as above except that the figures are obtained from individual-level data. The first step is to identify voters who were born in a certain region, and then to calculate how many of these voters turned out to vote for the pertinent party. The Korean Social Science Data Center conducted a survey of the 16th National Assembly Elections on April 13, 2000 (Korean Social Science Data Center 2000). A multistage quota sampling technique was used to collect a random sample by regional blocs. 1,100 interviews were completed with a rejection rate of 5 percent. Fortunately, the survey includes questions on the hometown and the vote choice of a respondent. These two questions were used to construct a regional voting measure; for example, if he or she was born in the Jeolla area and voted for the NCNP, it is coded as a regional vote for the NCNP. In Seoul, fifty five respondents were born in Jeolla. Thirty four out of the fifty five voted for the NCNP. Therefore, the percentage of regional voting for the NCNP is 61.82 percent for Seoul. Table 6 shows the percentage of regional voting in regional blocs.

<Table 6> Regional Voting Estimates from Survey Results
Regional Voting (Percentage/Frequency)

Regional Blocs                  GNP             NCNP            ULD
Seoul                           46.88% (32)     61.82% (55)     3.03% (33)
Incheon/Gyeonggido              65.00% (20)     57.14% (35)     7.50% (40)
Gangwondo                       0.00% (1)       100.00% (1)     0.00% (3)
Daejeon/Chungcheongnamdo        0.00% (4)       20.00% (5)      20.34% (59)
Chungcheongbukdo                33.33% (3)      0.00% (1)       27.59% (29)
Gwangju/Jeollanamdo             25.00% (4)      50.68% (73)     0.00% (4)
Jeollabukdo                     0.00% (1)       60.00% (40)     0.00% (5)
Busan/Ulsan/Gyeongsangnamdo     64.24% (151)    31.25% (16)     0.00% (8)
Daegu/Gyeongsangbukdo           53.15% (111)    33.33% (3)      25.00% (4)
Average                         30.09%          46.02%          9.27%

Source: The Korean Social Science Data Center (2000). Figures in parentheses are the number of respondents.

Table 6 shows that regional voting is the most evident for those who were born in Jeolla, followed by those who were born in Gyeongsang and in Chungcheong. The level of regional voting is slightly lower than that from King's solution. On average, 46.02 percent voted for the NCNP if they were born in Jeolla, while the figure is 30.09 percent for the GNP if voters were born in Gyeongsang. Significant fluctuation appears across regional blocs. In particular, if cell frequency is too small, the variation of regional voting is beyond the acceptable range. For example, in Gangwondo where cell frequency is one, the percentage of regional voting is
100 percent among those who were born in Jeolla, while it is zero percent in Chungcheongbukdo where cell frequency is also one.

If one focuses on the level of regional voting where the number of respondents is sufficient, we find similarity in the level of regional voting between estimates from King's method and estimates from survey results. For example, according to survey results, the percentage of regional voting in Seoul is 61.82 percent for the NCNP, while the comparable figure is 75.27 percent by the ecological regression. It is 60 percent in Jeollabukdo by survey results, while it is 61.04 percent by the ecological regression. We can safely conclude that estimates from King's solution closely match those from survey results.

IV. Conclusion

Applying aggregate-level findings to individual-level relationships generates biased estimates, known as the ecological fallacy problem. Social scientists often encounter difficulty in conducting research at the individual level with aggregate data. In particular, if a research question is related to a past event for which survey data are not available, it is almost impossible to pursue the research. Further, if there is a politically correct answer to survey questions, it is hard to obtain unbiased estimates.
In this paper, we introduced a way to infer individual-level relationships from aggregate data. We began with the aggregation bias which leads to the ecological fallacy problem. A series of methods have been suggested to solve the aggregation bias. The method by Gary King, which combines the method of bounds and statistical association, is emphasized. King's solution is well known as a method that produces a robust and efficient estimate, even when there is severe aggregation bias. The solution was applied to infer regional voting at the candidate level. The percentage of regional voting was averaged across regional blocs. The average percentages were then compared to estimates from survey results. We found that estimates from King's method closely match estimates from survey results if cell frequency in the survey results is large enough. We also found that the former is more reliable than the latter if cell frequency in the survey results is small.

Ecological regressions offer a new avenue to examine previously impossible questions. For example, we can examine who voted for Chunghee Park in the 1963 Korean presidential election. We can further ask why they voted for him; for example, was the generation effect related to the outcome of the 1963 Korean presidential election? Furthermore, we can use ecological regressions to examine whether or not a voter casts a vote for one party's candidate in a congressional election, while the same voter votes for a different party's candidate in a presidential election (Burden and Kimball 1998). We
should note, however, that ecological regressions also have some limitations. If tables are extended beyond 2 by 2, the uncertainty of the estimates grows. Scholars of ecological regression are attempting to reduce this uncertainty (King, Rosen, Tanner 1999; Rosen, Jiang, King forthcoming).
< REFERENCE >

Achen, Christopher H. and W. Phillips Shively (1995). Cross-Level Inference. Chicago: University of Chicago Press.

Benoit, Kenneth and Gary King (1996). "A Preview of EI and EzI: Program for Ecological Inference." Social Science Computer Review 14.

Burden, Barry C. and David C. Kimball (1998). "A New Approach to the Study of Ticket Splitting." American Political Science Review 92.

Goodman, Leo (1959). "Some Alternatives to Ecological Correlation." American Journal of Sociology 64.

Kim, Jae-On and B.C. Koh (1980). "The Dynamics of Electoral Politics: Social Development, Political Participation, and Manipulation of Electoral Laws." In Political Participation in Korea: Democracy, Mobilization, and Stability, edited by Chong Lim Kim. Santa Barbara: CLIO Books.

King, Gary, Ori Rosen, and Martin A. Tanner (1999). "Binomial-Beta Hierarchical Models for Ecological Inference." Sociological Methods & Research 28.

King, Gary (1997). A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.
Korean Social Science Data Center (2000). A Survey on Voters' Attitudes toward the 16th General Election. Seoul: Korean Social Science Data Center.

Lee, Dong Ok and Stanley D. Brunn (1996). "Politics and regions in Korea: an analysis of the recent presidential election." Political Geography 15.

Lee, Nam Young (1998). "Regionalism and Voting Behavior in South Korea." Korea Observer 29.

National Election Commission (2000).

Palmquist, Bradley Lowell (1993). Ecological Inference, Aggregate Data Analysis of U.S. Elections, and the Socialist Party of America. Ph.D. dissertation, University of California, Berkeley.

Robinson, W. S. (1950). "Ecological Correlations and the Behavior of Individuals." American Sociological Review 15.

Rosen, Ori, Wenxin Jiang, Gary King, and Martin A. Tanner (Forthcoming). "Bayesian and Frequentist Inference for Ecological Inference: the R X C Case." Statistica Neerlandica.

Shin, Myungsoon, Youngjae Jin, Donald A. Gross, and Kihong Eom (2005). "Money Matters in Party-Centered Politics: Campaign Spending in Korean Congressional Elections." Electoral Studies 24.

Voss, D. Stephen (2000). Familiarity Doesn't Breed Contempt: The Political Geography of Racial Polarization. Ph.D. dissertation, Harvard University.
More informationTemi di discussione. University dropout: The case of Italy. del Servizio Studi. by Federico Cingano and Piero Cipollone
Tem d dscussone del Servzo Stud Unversty dropout: The case of Italy by Federco Cngano and Pero Cpollone Number 626  Aprl 2007 The purpose of the Tem d dscussone seres s to promote the crculaton of workng
More informationThe Effects of Increasing Openness and Integration to the MERCOSUR on the Uruguayan Labour Market: A CGE Modeling Analysis 1.
The Effects of Increasng Openness and Integraton to the MERCOSUR on the Uruguayan Labour Market: A CGE Modelng Analyss 1. María Inés Terra 2, Marsa Buchel 2, Slva Laens 3, Carmen Estrades 2 November 2005
More informationDISCUSSION PAPER. Should Urban Transit Subsidies Be Reduced? Ian W.H. Parry and Kenneth A. Small
DISCUSSION PAPER JULY 2007 RFF DP 0738 Should Urban Transt Subsdes Be Reduced? Ian W.H. Parry and Kenneth A. Small 1616 P St. NW Washngton, DC 20036 2023285000 www.rff.org Should Urban Transt Subsdes
More informationDo Firms Maximize? Evidence from Professional Football
Do Frms Maxmze? Evdence from Professonal Football Davd Romer Unversty of Calforna, Berkeley and Natonal Bureau of Economc Research Ths paper examnes a sngle, narrow decson the choce on fourth down n the
More informationThe Relationship between Exchange Rates and Stock Prices: Studied in a Multivariate Model Desislava Dimitrova, The College of Wooster
Issues n Poltcal Economy, Vol. 4, August 005 The Relatonshp between Exchange Rates and Stock Prces: Studed n a Multvarate Model Desslava Dmtrova, The College of Wooster In the perod November 00 to February
More informationAdverse selection in the annuity market when payoffs vary over the time of retirement
Adverse selecton n the annuty market when payoffs vary over the tme of retrement by JOANN K. BRUNNER AND SUSANNE PEC * July 004 Revsed Verson of Workng Paper 0030, Department of Economcs, Unversty of nz.
More informationDocumentation for the TIMES Model PART I
Energy Technology Systems Analyss Programme http://www.etsap.org/tools.htm Documentaton for the TIMES Model PART I Aprl 2005 Authors: Rchard Loulou Uwe Remne Amt Kanuda Antt Lehtla Gary Goldsten 1 General
More informationWhy Don t We See Poverty Convergence?
Why Don t We See Poverty Convergence? Martn Ravallon 1 Development Research Group, World Bank 1818 H Street NW, Washngton DC, 20433, USA Abstract: We see sgns of convergence n average lvng standards amongst
More informationMultiProduct Price Optimization and Competition under the Nested Logit Model with ProductDifferentiated Price Sensitivities
MultProduct Prce Optmzaton and Competton under the Nested Logt Model wth ProductDfferentated Prce Senstvtes Gullermo Gallego Department of Industral Engneerng and Operatons Research, Columba Unversty,
More informationPhysical Security and Vulnerability Modeling for Infrastructure Facilities
Proceedngs of the 39th Hawa Internatonal Conference on System Scences  2006 Physcal Securty and Vulnerablty Modelng for Infrastructure Facltes Dean A. Jones Chad E. Davs Sanda Natonal Laboratores Albuquerque,
More informationOptimal Call Routing in VoIP
Optmal Call Routng n VoIP Costas Courcoubets Department of Computer Scence Athens Unversty of Economcs and Busness 47A Evelpdon Str Athens 11363, GR Emal: courcou@aueb.gr Costas Kalogros Department of
More informationA Structure for General and Specc Market Rsk Eckhard Platen 1 and Gerhard Stahl Summary. The paper presents a consstent approach to the modelng of general and specc market rsk as dened n regulatory documents.
More informationBoosting as a Regularized Path to a Maximum Margin Classifier
Journal of Machne Learnng Research 5 (2004) 941 973 Submtted 5/03; Revsed 10/03; Publshed 8/04 Boostng as a Regularzed Path to a Maxmum Margn Classfer Saharon Rosset Data Analytcs Research Group IBM T.J.
More informationDistributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
Foundatons and Trends R n Machne Learnng Vol. 3, No. 1 (2010) 1 122 c 2011 S. Boyd, N. Parkh, E. Chu, B. Peleato and J. Ecksten DOI: 10.1561/2200000016 Dstrbuted Optmzaton and Statstcal Learnng va the
More informationThe Global Macroeconomic Costs of Raising Bank Capital Adequacy Requirements
W/1/44 The Global Macroeconomc Costs of Rasng Bank Captal Adequacy Requrements Scott Roger and Francs Vtek 01 Internatonal Monetary Fund W/1/44 IMF Workng aper IMF Offces n Europe Monetary and Captal Markets
More information