On the Similarity Evaluation of Candidates in Ranked Voting Model


Asia Pacific Management Review (2005) 10(2)

Tsuneshi Obata a,* and Hiroaki Ishii b

a Department of Computer Science and Intelligent Systems, Oita University, 700 Dannoharu, Oita, Japan
b Graduate School of Information Science and Technology, Osaka University, 2-1 Yamada-oka, Suita, Japan

Accepted in September 2004. Available online

Abstract

In preference voting, it has been shown that single voting has some problems, so models in which each voter has two or more votes have been recommended. Ranked voting data arise when voters vote for candidates by ranking them in order of preference. Such data are often processed after summing up the votes for each candidate in each rank. Many methods to order all candidates, or to identify the most preferable candidate, from these data have been proposed recently. However, the summed data carry no information about which candidate tends to be ranked second by the voters who ranked a certain candidate at the top. Candidates who are ranked highly by the same voter seem to be evaluated similarly by her/him. Therefore, if many voters support a pair of candidates, we can judge that the pair has high similarity. On this hypothesis, we propose a method to estimate the similarity and the configuration of the candidates with multidimensional scaling under the ranked voting model. We also propose a model underlying voting behavior and investigate the validity of our estimation method with the model. Further, we mention the possibility of a mathematical method that is efficient for the voting model of multiple selections among candidates.

Keywords: Multidimensional scaling; Ranked voting model; Similarity of candidates

1. Introduction

Preference voting is held in order to select one (or more) candidates/proposals or to rank them. In the literature on social choice, it has been shown that single voting has some irrationality (Saeki 1980).
So models in which each voter has two or more votes have been recommended. However, such models raise the questions of how the multiple votes should be aggregated and how the winner(s) should be determined. For ranked voting data, which are obtained when voters vote for candidates by ranking them in order of preference, methods to determine the winner(s) or to order all candidates have been proposed on the basis of data envelopment analysis (DEA) (Green et al. 1996; Hashimoto 1997; Obata and Ishii 2003). DEA is a nonparametric method to evaluate the efficiency of decision-making units (Bellamine et al. 2004; Charnes et al. 1978; Sueyoshi 2003; Wang et al. 2001). In DEA, the existence of similar data, that is, data placed nearby in the data space, has a great influence on the evaluation of each datum. In terms of the ranked voting model, an observed datum is the set of the numbers of votes in each rank that a candidate gained. So similarity of the data does not always mean similarity of the characteristics or the political policies of the candidates.

Particularly when more than one candidate is chosen, it seems very important whether the selected winners are similar in policy or not. In order to reflect public opinion widely, it does not seem desirable for similar candidates to hold all the seats. On the contrary, in selecting the management of an enterprise, similar persons may be preferred for the sake of smooth management.

The above-mentioned methods treat ranked voting data after summing up the votes in each rank. That is, these data have no information about which candidate is ranked second by the voters who rank a specific candidate at the top. We therefore conjecture that the similarity of the candidates could be estimated with this information. If many voters support a pair of candidates, we can judge that the pair has high similarity.
On this hypothesis, in Section 2 we propose a method to estimate the configuration and the similarity of the candidates from ranked voting data (before summing up) by using multidimensional scaling (MDS) (Kruskal 1964a, 1964b; Kruskal and Wish 1978; Saito 1980). Research that evaluates the similarity between candidates with MDS is introduced in Kruskal and Wish (1978); however, it applies MDS not to voting data but to data in which similarity itself is judged.

* Department of Computer Science and Intelligent Systems, Oita University, 700 Dannoharu, Oita, Japan. E-mail: obata@csis.oita-u.ac.jp

In Section 3, we conduct an experiment to investigate the validity of our method. We also propose a model underlying voting behavior, which is analogous to the spatial model of voting (Gill and Gainous 2002). Finally, in Section 4, we consider the possibility of a method that is efficient for the voting model of multiple selections among candidates by using the similarity.

2. Estimation of the Similarity between the Candidates

We consider ranked voting data obtained when voters select and rank more than one candidate. It is assumed that there are $n$ voters $V_1, \ldots, V_n$ and $m$ candidates $C_1, \ldots, C_m$, and that each voter selects $k$ ($\le m$) candidates and ranks them. We denote by $i_{lj}$ the index of the candidate who is ranked in the $j$-th place by voter $V_l$, i.e., $V_l$ ranks $C_{i_{lj}}$ in the $j$-th place.

Here, let $k = 2$ in particular. That is, each voter selects a pair of candidates $C_{i_{l1}}$ and $C_{i_{l2}}$. We denote by $s_{ij}$ the number of voters who placed candidates $C_i$ and $C_j$ in the top and the second rank, respectively, i.e.,

$$ s_{ij} = \#\{ V_l \mid C_i = C_{i_{l1}} \text{ and } C_j = C_{i_{l2}} \}, \quad i, j = 1, \ldots, m, $$

where $\#$ denotes the number of elements. If $s_{ij}$ is large, many voters support candidates $C_i$ and $C_j$ together, and therefore we may judge that they are similar. Accordingly, we expect that the matrix $S = (s_{ij})$ can be treated as the similarity matrix of nonmetric MDS (Kruskal 1964a, 1964b; Kruskal and Wish 1978; Saito 1980). MDS is a method to determine the optimal configuration of stimuli in $r$-dimensional space from similarity/dissimilarity data between the stimuli. However, before applying nonmetric MDS, some preliminary modification is needed, i.e., symmetrization and normalization.

Symmetrization: Even though nonmetric MDS can treat nonsymmetric data, we symmetrize the data in order to simplify our analysis. Each element of the matrix is replaced by the sum of the two symmetric elements, i.e.,

$$ \tilde{s}_{ij} = s_{ij} + s_{ji}, \quad i, j = 1, \ldots, m. $$

Matrix $S$ is thus modified to a symmetric matrix $\tilde{S} = (\tilde{s}_{ij})$.
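The counting and symmetrization steps for $k = 2$ can be illustrated with a minimal sketch. The use of NumPy, the function names `top_two_counts` and `symmetrize`, the 0-based candidate indices, and the five toy ballots are all assumptions made for this illustration; they are not taken from the paper.

```python
import numpy as np

def top_two_counts(ballots, m):
    """Return S with S[i, j] = number of voters ranking i first and j second."""
    S = np.zeros((m, m), dtype=int)
    for first, second in ballots:
        S[first, second] += 1
    return S

def symmetrize(S):
    """Replace each entry by the sum of the two symmetric entries."""
    return S + S.T

# Hypothetical toy data: 3 candidates, 5 ballots given as (top, second).
ballots = [(0, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
S = top_two_counts(ballots, m=3)
S_sym = symmetrize(S)
print(S)
print(S_sym)
```

In this toy data, candidates 0 and 1 appear together on three ballots, so the corresponding symmetrized entry is 3. After symmetrization, the matrix no longer records which candidate of a pair was ranked first.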
This is equivalent to each voter voting for a pair of candidates without ranking them.

Normalization: If not many voters support $C_i$ and $C_j$, then $s_{ij}$ (and $\tilde{s}_{ij}$) may be small even though the two candidates are in fact very similar. So some normalization by the number of supporters is required. Set

$$ \hat{s}_{ij} = \frac{\tilde{s}_{ij}}{\tilde{s}_{i+} + \tilde{s}_{j+} - \tilde{s}_{ij}}, $$

where $\tilde{s}_{i+} = \sum_k \tilde{s}_{ik}$ is the number of voters who ranked candidate $C_i$ as the top or the second. The denominator is the number of voters who rank candidate $C_i$ or $C_j$ (or both) within the second place. Hereafter we denote $\hat{S} = (\hat{s}_{ij})$ by $S$ again.

Figure 1. Three Candidates and a Voter

Now we can apply nonmetric MDS to the (modified) similarity matrix $S$. As a result, MDS yields the coordinates of each candidate in the multidimensional space. We can use the distances between the points with the obtained coordinates as indicators of the similarity between candidates. Of course, it is also possible to interpret the political positions of the candidates by using the coordinates. In addition, if we use cluster analysis, the candidates may be separated into clusters.

When $k > 2$, we may proceed in the same way as above, using only the data about the top and the second rank. However, this means that we throw away the information about candidates who are similar but less preferred (such candidates are supposed to be placed close together in the lower ranks). We propose another way that uses this information. Set

$$ s_{ij} = \sum_{q=1}^{k-1} s_{ij}^{(q)}, \quad i, j = 1, \ldots, m, $$

where

$$ s_{ij}^{(q)} = \#\{ V_l \mid C_i = C_{i_{lq}} \text{ and } C_j = C_{i_{l,q+1}} \}, \quad i, j = 1, \ldots, m; \; q = 1, \ldots, k-1. $$

That is, $s_{ij}^{(q)}$ is the number of voters who ranked candidates $C_i$ and $C_j$ in the $q$-th and the $(q+1)$-th place, respectively, and $s_{ij}$ is the number of voters who ranked $C_i$ and $C_j$ adjacently. The remaining processes, normalization and MDS, can be carried out in the same way as in the case of $k = 2$.

3. Model of Voting Behavior and an Experiment

In order to investigate the validity of our method, we conduct an experiment.
Before that, we propose a model of voting behavior that is analogous to (but slightly different from) the spatial model of voting (Gill and Gainous 2002).

They (and we also) assume that all voters are placed in a certain space. While they suppose that each voter has her/his own metric function, i.e., people have various senses of distance, we suppose that all voters have the same Euclidean metric. We consider the model that satisfies the following:

1. each candidate is placed in the $r$-dimensional Euclidean space of their characteristics and political policies;
2. each voter has an ideal (virtual) candidate placed in the same space;
3. each voter prefers the candidate who is closer to his ideal candidate;
4. each voter casts a ranked vote in the order of preference.

Hereafter we also call the voter's ideal candidate simply the voter, because the two can be identified. For example, if three candidates $C_1$, $C_2$, $C_3$ and a voter $V_1$ lie in 2-dimensional space as Figure 1 shows, the voter $V_1$ votes for $C_2$, $C_1$, and $C_3$ in that order.

According to this model, we conduct the following experiment to simulate a plausible situation. In the experiment, we imagine that

5. candidates and voters are placed in 2-dimensional space (i.e., $r = 2$);
6. every candidate and voter belongs to one of four groups (parties);
7. one of these four groups spreads over the whole space (it represents noncommitted people).

Figure 2. Generated Random Voters

Figure 3. Resulted Configuration (k = 2)

[Experiment]

Step 1: Generate $m$ candidates in an appropriate way.
Step 2: Generate $n_p$ voters as random vectors from the 2-dimensional normal distributions $N(\mu_p, \Sigma_p)$, $p = 1, \ldots, 4$, where $n = n_1 + n_2 + n_3 + n_4$.
Step 3: Calculate the distances from each voter to each candidate and determine the order of the candidates for whom each voter votes.
Step 4: Analyze the configuration and the distances of the candidates from the ranked voting data with the method proposed in the previous section.

Here we use $m = 10$, $n = 1000$, $n_1 = 300$, $n_2 = 200$, $n_3 = 100$, $n_4 = 400$, $\mu_1 = (-1, 1)^T$, $\Sigma_1 = \mathrm{diag}(1, 1)$, $\mu_2 = (2, 0)^T$, $\Sigma_2 = \mathrm{diag}(1, 1)$, $\mu_3 = (0, -3)^T$, $\Sigma_3 = \mathrm{diag}(0.5, 0.5)$, $\mu_4 = (0, 0)^T$, $\Sigma_4 = \mathrm{diag}(3, 3)$, so groups 1, 2, and 3 are three parties and group 4 is noncommitted people.
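Steps 2 and 3 of the experiment can be sketched as follows. This is only an illustration under stated assumptions: the use of NumPy, the random seed, the signs of the mean vectors, and the four candidate positions are chosen here for the example and are not the paper's actual setup. The final helper reads the normalization of Section 2 in set terms for $k = 2$: the number of voters who place both candidates within the top two, divided by the number who place either of them there.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Group sizes, means, and diagonal covariances (signs of the means are assumed).
sizes = [300, 200, 100, 400]
means = [(-1.0, 1.0), (2.0, 0.0), (0.0, -3.0), (0.0, 0.0)]
variances = [(1.0, 1.0), (1.0, 1.0), (0.5, 0.5), (3.0, 3.0)]

# Step 2: draw each group's voters from its 2-dimensional normal distribution.
voters = np.vstack([
    rng.multivariate_normal(mu, np.diag(var), size=n)
    for n, mu, var in zip(sizes, means, variances)
])

# Hypothetical candidate positions, used only for this illustration.
candidates = np.array([[-1.0, 1.0], [2.0, 0.0], [0.0, -3.0], [0.0, 0.0]])

# Step 3: Euclidean distance from every voter to every candidate, then rank.
dists = np.linalg.norm(voters[:, None, :] - candidates[None, :, :], axis=2)
k = 2
ballots = np.argsort(dists, axis=1)[:, :k]  # row l: voter l's k nearest candidates

# For each candidate, the set of voters who placed it within the top two.
supporters = [
    set(np.flatnonzero((ballots == i).any(axis=1)))
    for i in range(len(candidates))
]

def similarity(i, j):
    """Voters supporting both i and j, divided by voters supporting either."""
    union = supporters[i] | supporters[j]
    return len(supporters[i] & supporters[j]) / len(union) if union else 0.0

print(ballots[:5])
print(similarity(0, 3))
```

The resulting similarities lie between 0 and 1 and are symmetric in the two candidates, so they could be passed to a nonmetric MDS routine in Step 4.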
We placed the candidates near the centers of each party: $c_1 = (-1, 1)^T$, $c_2 = ( )^T$, and $c_3 = (-1, 0.6)^T$ are supposed to be affiliated with group 1; $c_4 = (2, 0)^T$ and $c_5 = (2, -0.5)^T$ with group 2; $c_6 = (0, -3)^T$ with group 3; and $c_7 = (0, 0)^T$, $c_8 = (2, 2)^T$, $c_9 = ( )^T$, and $c_{10} = (-1, -1)^T$ are supposed to be independent (see Figure 2), where $c_i$ denotes the coordinates of the point of candidate $C_i$.

Table 1. Best, Median, and Worst Values of $r^2$ (columns: Best, Median, Worst; rows: k = 2 and k = 5)

In order to compare the original configuration with the obtained configuration, we rotate, scale, translate, and invert the obtained configuration to fit the original. Concretely, we compute

$$ r^2 = \min \frac{1}{10} \sum_{i=1}^{10} \| c_i - \hat{c}_i \|^2, $$

where $\hat{c}_i$ is the transformed point of candidate $C_i$ in the obtained configuration. If the value of $r^2$ is small, we can judge that our method reconstructs the original configuration well.

Under these conditions, we ran 30 trials. Table 1 shows the best, median (the 15th smallest), and worst values of $r^2$ over the 30 trials. This shows that our method reconstructs the original configuration better when k = 5. Figures 3 and 4 show the configurations actually obtained in the cases of k = 2 and 5. They show plots of the original configuration (upper left), the best case (upper right), the median case (lower left), and the worst case (lower right). According to these plots, our method seems to be able to restore the original configuration very well when k = 5, and roughly even when k = 2.

Figure 4. Resulted Configuration (k = 5)

4. Conclusions

In this paper, we have proposed a method to evaluate the similarity and the configuration of the candidates under the ranked voting model. According to our experiment, the method seems to be able to restore the original configuration roughly. However, this experiment is based on an artificial model, so a more practical experiment is needed.

Our goal is not to estimate the similarity between the candidates but to use it to select suitable candidates. When more than one candidate wins the election, it is very important whether the selected candidates are similar or not. In order to reflect the will of the people widely, a variety of candidates should be selected. So the similarity estimated by our method is useful for avoiding the selection of candidates who are too similar. In further research, we would like to investigate concretely how to use the similarity. For example, we can use the distances between candidates obtained from the resulting configuration as output or input parameters in the course of evaluating scores with DEA.

References

Bellamine, I., Morita, H. and Ishii, H. (2004). Performance analysis of linear regression systems subject to inefficiency. Asia Pacific Management Review 9(3).
Charnes, A., Cooper, W.W. and Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research.
Gill, J. and Gainous, J. (2002). Why does voting get so complicated? A review of theories for analyzing democratic participation. Statistical Science.
Green, R.H., Doyle, J.R. and Cook, W.D. (1996). Preference voting and project ranking using DEA and cross-evaluation. European Journal of Operational Research.
Hashimoto, A. (1997). A ranked voting system using a DEA/AR exclusion model: A note. European Journal of Operational Research.
Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika.
Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills: SAGE Publications.
Obata, T. and Ishii, H. (2003). A method for discriminating efficient candidates with ranked voting data. European Journal of Operational Research.
Saeki, Y. (1980). Kimekata no ronri. Tokyo: Tokyo Daigaku Shuppankai (in Japanese).
Saito, T. (1980). Tajigen shakudo kouseihou. Tokyo: Asakura Shoten (in Japanese).
Sueyoshi, T. (2003). DEA Implications of Congestion. Asia Pacific Management Review 8(1).
Wang, K.L., Weng, C.C. and Chang, M.L. (2001). A study of technical efficiency of travel agencies in Taiwan. Asia Pacific Management Review 6(1).