On the Similarity Evaluation of Candidates in Ranked Voting Model


Asia Pacific Management Review (2005) 10(2)

Tsuneshi Obata a,* and Hiroaki Ishii b

a Department of Computer Science and Intelligent Systems, Oita University, 700 Dannoharu, Oita, Japan
b Graduate School of Information Science and Technology, Osaka University, 2-1 Yamada-oka, Suita, Japan

Accepted in September 2004; available online

Abstract

In preference voting, it has been shown that single voting has some problems, so models in which each voter has two or more votes have been recommended. Ranked voting data arise when voters rank the candidates they vote for in order of preference. Such data are often processed after the votes are summed for each candidate and each rank. Many methods for ordering all candidates, or for identifying the most preferable candidate, from these data have been proposed recently. However, the summed data carry no information about which candidate tends to be ranked second by the voters who ranked a certain candidate top. Candidates who are ranked highly by the same voter seem to be evaluated similarly by that voter. Therefore, if many voters support a pair of candidates, we can judge that the pair has high similarity. On this hypothesis, we propose a method to estimate the similarity and the configuration of the candidates with multidimensional scaling under the ranked voting model. We also propose a model underlying voting behavior and investigate the validity of our estimation method with this model. Further, we mention the possibility of a mathematical method that is efficient for the voting model of multiple selections among candidates.

Keywords: Multidimensional scaling; Ranked voting model; Similarity of candidates

1. Introduction

Preference voting is held in order to select one (or more) candidates or proposals, or to order them by ranking. In the literature on social choice, it has been shown that single voting has some irrationality (Saeki 1980).
So models in which each voter has two or more votes have been recommended. However, such models raise the question of how the multiple votes should be aggregated and how the winner(s) should be determined. For ranked voting data, which are obtained when voters rank the candidates they vote for in order of preference, methods to determine the winner(s) or to order all candidates have been proposed on the basis of data envelopment analysis (DEA) (Green et al. 1996; Hashimoto 1997; Obata and Ishii 2003). DEA is a nonparametric method to evaluate the efficiency of decision-making units (Bellamine et al. 2004; Charnes et al. 1978; Sueyoshi 2003; Wang et al. 2001). In DEA, the existence of similar data, meaning data placed nearby in the data space, has a great influence on the evaluation of each datum. In the ranked voting model, an observed datum is the set of the numbers of votes that a candidate gained at each rank. So similarity of the data does not always mean similarity of the characteristics or the political policies of the candidates. Particularly when more than one candidate is chosen, it seems very important whether the selected winners are similar in policy or not. In order to reflect public opinion widely, it does not seem desirable for similar candidates to hold all seats. On the contrary, in selecting the management of an enterprise, similar persons may be preferred for smooth management.

The above-mentioned methods treat ranked voting data after summing up the votes in each rank. That is, these data have no information about which candidate is ranked second by the voters who rank a specific candidate top. We therefore expect that the similarity of the candidates could be estimated from this information. If many voters support a pair of candidates, we can judge that the pair has high similarity.
On this hypothesis, in Section 2 we propose a method to estimate the configuration and the similarity of the candidates from ranked voting data (before summing up) by using multidimensional scaling (MDS) (Kruskal 1964a, 1964b; Kruskal and Wish 1978; Saito 1980). Research that evaluates similarity between candidates with MDS is introduced in Kruskal and Wish (1978); however, it applies MDS not to voting data but to data that judge similarity directly.

* Department of Computer Science and Intelligent Systems, Oita University, 700 Dannoharu, Oita, Japan. obata@csis.oita-u.ac.jp

In Section 3, we conduct an experiment to investigate the validity of our method. We also propose a model underlying voting behavior, which is analogous to the spatial model of voting (Gill and Gainous 2002). Finally, in Section 4, we consider the possibility of a method that is efficient for the voting model of multiple selections among candidates by using the similarity.

2. Estimation of the Similarity between the Candidates

We consider ranked voting data obtained when voters select and rank more than one candidate. It is assumed that there are $n$ voters $V_1, \ldots, V_n$ and $m$ candidates $C_1, \ldots, C_m$, and that each voter selects $k$ ($\le m$) candidates and ranks them. We denote by $i_{lj}$ the index of the candidate who is ranked in $j$-th place by voter $V_l$, i.e., $V_l$ ranks $C_{i_{lj}}$ in $j$-th place.

Here let $k = 2$ in particular. That is, each voter selects a pair of candidates $C_{i_{l1}}$ and $C_{i_{l2}}$. We denote by $s_{ij}$ the number of voters who placed candidates $C_i$ and $C_j$ at the top and the second rank, respectively, i.e.,

$$s_{ij} = \#\{ V_l \mid C_{i_{l1}} = C_i \text{ and } C_{i_{l2}} = C_j \}, \quad i, j = 1, \ldots, m,$$

where $\#$ denotes the number of elements. If $s_{ij}$ is large, many voters support candidates $C_i$ and $C_j$ together, and therefore we may judge that they are similar. Accordingly, we expect that the matrix $S = (s_{ij})$ can be treated as the similarity matrix of nonmetric MDS (Kruskal 1964a, 1964b; Kruskal and Wish 1978; Saito 1980). MDS is a method to determine the optimal configuration of stimuli in $r$-dimensional space from similarity/dissimilarity data between the stimuli. However, before applying nonmetric MDS, some preliminary modification is needed, namely symmetrization and normalization.

Symmetrization: Even though nonmetric MDS can treat nonsymmetric data, we symmetrize the data in order to simplify our analysis. Each element of the matrix is replaced by the sum of the symmetric elements, i.e.,

$$s'_{ij} = s_{ij} + s_{ji}, \quad i, j = 1, \ldots, m.$$

Matrix $S$ is thereby modified to a symmetric matrix $S' = (s'_{ij})$.
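As an illustration, the counting and symmetrization steps described above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names `pair_counts` and `symmetrize`, the 0-based candidate indices, and the five sample ballots are assumptions introduced only for this example.

```python
def pair_counts(ballots, m):
    """Return s, where s[i][j] counts voters who ranked candidate i first and j second."""
    s = [[0] * m for _ in range(m)]
    for first, second in ballots:
        s[first][second] += 1
    return s


def symmetrize(s):
    """Return s', where s'[i][j] = s[i][j] + s[j][i]."""
    m = len(s)
    return [[s[i][j] + s[j][i] for j in range(m)] for i in range(m)]


# Hypothetical ballots for m = 3 candidates: (top choice, second choice).
ballots = [(0, 1), (1, 0), (0, 2), (2, 1), (0, 1)]
s = pair_counts(ballots, 3)
s_sym = symmetrize(s)
```

In this toy data set, candidates 0 and 1 are chosen together by three of the five voters, so their symmetrized count is the largest off-diagonal entry. The diagonal stays at zero because no voter can rank the same candidate twice.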
Symmetrization is equivalent to each voter voting for a pair of candidates without ranking.

Normalization: If not many voters support $C_i$ and $C_j$, then $s_{ij}$ (and $s'_{ij}$) may be small even though the two candidates are in fact very similar. So some normalization by the number of supporters is required. Set

$$s''_{ij} = \frac{s'_{ij}}{s'_{i+} + s'_{j+} - s'_{ij}},$$

where $s'_{i+} = \sum_k s'_{ik}$ is the number of voters who ranked candidate $C_i$ top or second. The denominator is the number of voters who rank $C_i$ or $C_j$ (or both) within the second place. Hereafter we denote $S''$ by $S$ again.

Figure 1. Three Candidates and a Voter

Now we can apply nonmetric MDS to the (modified) similarity matrix $S$. As a result, MDS yields the coordinates of each candidate in the multidimensional space. We can use the distances between the points with the obtained coordinates as indicators of the similarity between candidates. Of course, it is also possible to interpret the political positions of the candidates by using the coordinates. In addition, if we use cluster analysis, the candidates may be separated into clusters.

When $k > 2$, we may proceed as above using only the data for the top and the second rank. However, this discards the information about candidates who are similar but less preferred (such candidates are supposed to be placed near each other in the lower ranks). We propose another way that uses this information. Set

$$s_{ij} = \sum_{q=1}^{k-1} s_{ij}^{(q)}, \quad i, j = 1, \ldots, m,$$

where

$$s_{ij}^{(q)} = \#\{ V_l \mid C_{i_{lq}} = C_i \text{ and } C_{i_{l,q+1}} = C_j \}, \quad i, j = 1, \ldots, m;\ q = 1, \ldots, k-1.$$

That is, $s_{ij}^{(q)}$ is the number of voters who ranked candidates $C_i$ and $C_j$ in the $q$-th and the $(q+1)$-th place, respectively, and $s_{ij}$ is the number of voters who ranked $C_i$ and $C_j$ adjacently. The remaining processes, normalization and MDS, can be carried out as in the case of $k = 2$.

3. Model of Voting Behavior and an Experiment

In order to investigate the validity of our method, we conduct an experiment.
Before that, we propose a model of voting behavior that is analogous to (but slightly different from) the spatial model of voting (Gill and Gainous 2002).

They (and we also) assume that all voters are placed in a certain space. While they suppose that each voter has her/his own metric function, i.e., people have various senses of distance, we suppose that all voters share the same Euclidean metric. We consider the model that satisfies the following:

1. each candidate is placed in an $r$-dimensional Euclidean space of characteristics and political policies;
2. each voter has an ideal (virtual) candidate placed in the same space;
3. each voter prefers the candidate who is closer to her/his ideal candidate;
4. each voter casts a ranked vote in the order of preference.

Hereafter we also call the voter's ideal candidate simply the voter, because the two can be identified. For example, if three candidates $C_1$, $C_2$, $C_3$ and a voter $V_1$ lie in 2-dimensional space as Figure 1 shows, the voter $V_1$ votes for $C_2$, $C_1$, and $C_3$ in that order.

According to this model, we conduct the following experiment to simulate a probable situation. In the experiment we suppose that

5. candidates and voters are placed in 2-dimensional space (i.e., $r = 2$);
6. every candidate and voter belongs to one of four groups (parties);
7. one of these four groups spreads over the whole space (it represents noncommitted people).

[Experiment]

Step 1: Generate $m$ candidates in an appropriate way.
Step 2: Generate $n_p$ voters as random vectors from the 2-dimensional normal distributions $N(\mu_p, \Sigma_p)$, $p = 1, \ldots, 4$, where $n = n_1 + n_2 + n_3 + n_4$.
Step 3: Calculate the distances from each voter to each candidate and determine the order of the candidates for whom each voter votes.
Step 4: Analyze the configuration and the distances of the candidates from the ranked voting data with the method proposed in the previous section.

Figure 2. Generated Random Voters
Figure 3. Resulted Configuration (k = 2)

Here we use $m = 10$, $n = 1000$, $n_1 = 300$, $n_2 = 200$, $n_3 = 100$, $n_4 = 400$, $\mu_1 = (-1, 1)^T$, $\Sigma_1 = \mathrm{diag}(1, 1)$, $\mu_2 = (2, 0)^T$, $\Sigma_2 = \mathrm{diag}(1, 1)$, $\mu_3 = (0, -3)^T$, $\Sigma_3 = \mathrm{diag}(0.5, 0.5)$, $\mu_4 = (0, 0)^T$, $\Sigma_4 = \mathrm{diag}(3, 3)$, so groups 1 to 3 are three parties and group 4 is noncommitted people.
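Steps 2 and 3 of the experiment, followed by the symmetrized and normalized similarity matrix of Section 2, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the five candidate positions are hypothetical (they are not the ten candidates of the experiment), the signs of the group means are assumed, the helper names are invented for this example, and the final nonmetric MDS step is left out.

```python
import math
import random

# Group settings (voters, mean, per-axis variance); the signs of the means are assumed.
GROUPS = [
    (300, (-1.0, 1.0), 1.0),
    (200, (2.0, 0.0), 1.0),
    (100, (0.0, -3.0), 0.5),
    (400, (0.0, 0.0), 3.0),
]

# Hypothetical candidate positions, for illustration only.
CANDIDATES = [(-1.0, 1.0), (2.0, 0.0), (0.0, -3.0), (0.0, 0.0), (2.0, 2.0)]


def generate_voters(groups, rng):
    """Step 2: draw voters from 2-D normal distributions with diagonal covariance."""
    voters = []
    for n_p, (mx, my), var in groups:
        sd = math.sqrt(var)
        voters.extend((rng.gauss(mx, sd), rng.gauss(my, sd)) for _ in range(n_p))
    return voters


def ballot(voter, candidates, k):
    """Step 3: indices of the k candidates closest to the voter, nearest first."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(voter, candidates[i]))
    return order[:k]


def adjacent_counts(ballots, m):
    """Symmetrized counts of voters who ranked i and j in adjacent places.

    For k = 2 this equals s'_ij = s_ij + s_ji.
    """
    s = [[0] * m for _ in range(m)]
    for b in ballots:
        for a, c in zip(b, b[1:]):
            s[a][c] += 1
            s[c][a] += 1
    return s


def normalize(s):
    """Compute s_ij / (s_i+ + s_j+ - s_ij) for i != j, with s_i+ the i-th row sum."""
    m = len(s)
    row = [sum(r) for r in s]
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            denom = row[i] + row[j] - s[i][j]
            if i != j and denom > 0:
                out[i][j] = s[i][j] / denom
    return out


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
voters = generate_voters(GROUPS, rng)
ballots = [ballot(v, CANDIDATES, k=2) for v in voters]
similarity = normalize(adjacent_counts(ballots, len(CANDIDATES)))
```

The resulting `similarity` matrix is symmetric with entries between 0 and 1, and it is the kind of input that a nonmetric MDS routine would take in Step 4. Candidate pairs that no voter chose together keep a similarity of zero.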
We placed candidates near the centers of each party: $c_1 = (-1, 1)^T$, $c_2 = (\ )^T$, and $c_3 = (-1, 0.6)^T$ are supposed to be affiliated with group 1; $c_4 = (2, 0)^T$ and $c_5 = (2, -0.5)^T$ with group 2; $c_6 = (0, -3)^T$ with group 3; and $c_7 = (0, 0)^T$, $c_8 = (2, 2)^T$, $c_9 = (\ )^T$, and $c_{10} = (-1, -1)^T$ are supposed to be independent (see Figure 2), where $c_i$ denotes the coordinates of the point of candidate $C_i$.

Table 1. Best, Median, and Worst Values of $r^2$

In order to compare the original configuration and the obtained configuration, we rotate, scale, translate, and reflect the obtained configuration to fit the original. Concretely, we minimize

$$r^2 = \min \frac{1}{10} \sum_{i=1}^{10} \| c_i - \hat{c}_i \|^2,$$

where $\hat{c}_i$ is the transformed point of candidate $C_i$ in the obtained configuration. If the value of $r^2$ is small, we can judge that our method reconstructs the original configuration well.

Under these conditions we run 30 trials. Table 1 shows the best, median (15th smallest), and worst values of $r^2$ over the 30 trials. This shows that our method reconstructs the original configuration best when $k = 5$. Figures 3 and 4 show the actual configurations obtained for $k = 2$ and $k = 5$. They show plots of the original configuration (upper left), the best case (upper right), the median case (lower left), and the worst case (lower right). According to these plots, our method seems to be able to restore the original configuration very well when $k = 5$, and roughly even when $k = 2$.

Figure 4. Resulted Configuration (k = 5)

4. Conclusions

In this paper, we have proposed a method to evaluate the similarity and the configuration of the candidates under the ranked voting model. According to our experiment, the method seems to be able to restore the original configuration roughly. However, this experiment is based on an artificial model, so a more practical experiment is needed.

Our goal is not to estimate the similarity between the candidates in itself but to use it to select suitable candidates. When more than one candidate wins the election, it is very important whether the selected candidates are similar or not. In order to reflect the will of the people widely, a variety of candidates should be selected. So the similarity estimated by our method is useful for avoiding the selection of candidates who are too similar. In further research, we would like to investigate concretely how to use the similarity. For example, we can use the distances between candidates obtained from the resulting configuration as output or input parameters in the evaluation of scores with DEA.

References

Bellamine, I., Morita, H. and Ishii, H. (2004). Performance analysis of linear regression systems subject to inefficiency. Asia Pacific Management Review 9(3).
Charnes, A., Cooper, W.W. and Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research.
Gill, J. and Gainous, J. (2002). Why does voting get so complicated? A review of theories for analyzing democratic participation. Statistical Science.
Green, R.H., Doyle, J.R. and Cook, W.D. (1996). Preference voting and project ranking using DEA and cross-evaluation. European Journal of Operational Research.
Hashimoto, A. (1997). A ranked voting system using a DEA/AR exclusion model: A note. European Journal of Operational Research.
Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika.
Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills: SAGE Publications.
Obata, T. and Ishii, H. (2003). A method for discriminating efficient candidates with ranked voting data. European Journal of Operational Research.
Saeki, Y. (1980). Kimekata no ronri. Tokyo: Tokyo Daigaku Shuppankai (in Japanese).
Saito, T. (1980). Tajigen shakudo kouseihou. Tokyo: Asakura Shoten (in Japanese).
Sueyoshi, T. (2003). DEA Implications of Congestion. Asia Pacific Management Review 8(1).
Wang, K.L., Weng, C.C. and Chang, M.L. (2001). A study of technical efficiency of travel agencies in Taiwan. Asia Pacific Management Review 6(1).


More information

A Review of Statistical Outlier Methods

A Review of Statistical Outlier Methods Page 1 of 5 A Review of Statistical Outlier Methods Nov 2, 2006 By: Steven Walfish Pharmaceutical Technology Statistical outlier detection has become a popular topic as a result of the US Food and Drug

More information

ANALYSIS OF TREND CHAPTER 5

ANALYSIS OF TREND CHAPTER 5 ANALYSIS OF TREND CHAPTER 5 ERSH 8310 Lecture 7 September 13, 2007 Today s Class Analysis of trends Using contrasts to do something a bit more practical. Linear trends. Quadratic trends. Trends in SPSS.

More information

Least-Squares Intersection of Lines

Least-Squares Intersection of Lines Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC Teacher s Guide Getting Started Nathan N. Alexander Charlotte, NC Purpose In this two-day lesson, students determine their best-matched college. They use decision-making strategies based on their preferences

More information

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables.

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables. FACTOR ANALYSIS Introduction Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables Both methods differ from regression in that they don t have

More information

SGL: Stata graph library for network analysis

SGL: Stata graph library for network analysis SGL: Stata graph library for network analysis Hirotaka Miura Federal Reserve Bank of San Francisco Stata Conference Chicago 2011 The views presented here are my own and do not necessarily represent the

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

Linear Programming. March 14, 2014

Linear Programming. March 14, 2014 Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Towards running complex models on big data

Towards running complex models on big data Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

Symmetry of Nonparametric Statistical Tests on Three Samples

Symmetry of Nonparametric Statistical Tests on Three Samples Symmetry of Nonparametric Statistical Tests on Three Samples Anna E. Bargagliotti Donald G. Saari Department of Mathematical Sciences Institute for Math. Behavioral Sciences University of Memphis University

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

FACTOR ANALYSIS NASC

FACTOR ANALYSIS NASC FACTOR ANALYSIS NASC Factor Analysis A data reduction technique designed to represent a wide range of attributes on a smaller number of dimensions. Aim is to identify groups of variables which are relatively

More information

Standard Deviation Estimator

Standard Deviation Estimator CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of

More information

How To Identify Noisy Variables In A Cluster

How To Identify Noisy Variables In A Cluster Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,

More information

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES

NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES NEW VERSION OF DECISION SUPPORT SYSTEM FOR EVALUATING TAKEOVER BIDS IN PRIVATIZATION OF THE PUBLIC ENTERPRISES AND SERVICES Silvija Vlah Kristina Soric Visnja Vojvodic Rosenzweig Department of Mathematics

More information

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

More information

A New Method of Estimating Locality of Industry Cluster Regions Using Large-scale Business Transaction Data

A New Method of Estimating Locality of Industry Cluster Regions Using Large-scale Business Transaction Data 347-Paper A New Method of Estimating Locality of Industry Cluster Regions Using Large-scale Business Transaction Data Yuki Akeyama and Yuki Akiyama and Ryosuke Shibasaki Abstract In an industry cluster

More information

Classifying Manipulation Primitives from Visual Data

Classifying Manipulation Primitives from Visual Data Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if

More information

Research on information propagation analyzing odds in horse racing

Research on information propagation analyzing odds in horse racing Challenges for Analysis of the Economy, the Businesses, and Social Progress Péter Kovács, Katalin Szép, Tamás Katona (editors) - Reviewed Articles Research on information propagation analyzing odds in

More information

A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis

A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis Yusuf Yaslan and Zehra Cataltepe Istanbul Technical University, Computer Engineering Department, Maslak 34469 Istanbul, Turkey

More information

Clustering Time Series Based on Forecast Distributions Using Kullback-Leibler Divergence

Clustering Time Series Based on Forecast Distributions Using Kullback-Leibler Divergence Clustering Time Series Based on Forecast Distributions Using Kullback-Leibler Divergence Taiyeong Lee, Yongqiao Xiao, Xiangxiang Meng, David Duling SAS Institute, Inc 100 SAS Campus Dr. Cary, NC 27513,

More information

NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

More information

Linear Algebra Notes for Marsden and Tromba Vector Calculus

Linear Algebra Notes for Marsden and Tromba Vector Calculus Linear Algebra Notes for Marsden and Tromba Vector Calculus n-dimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but Test Bias As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic

More information

Introduction to Principal Component Analysis: Stock Market Values

Introduction to Principal Component Analysis: Stock Market Values Chapter 10 Introduction to Principal Component Analysis: Stock Market Values The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems

Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Decision Support System Methodology Using a Visual Approach for Cluster Analysis Problems Ran M. Bittmann School of Business Administration Ph.D. Thesis Submitted to the Senate of Bar-Ilan University Ramat-Gan,

More information

Math 202-0 Quizzes Winter 2009

Math 202-0 Quizzes Winter 2009 Quiz : Basic Probability Ten Scrabble tiles are placed in a bag Four of the tiles have the letter printed on them, and there are two tiles each with the letters B, C and D on them (a) Suppose one tile

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

MEASURES OF VARIATION

MEASURES OF VARIATION NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information