Homophily in Online Social Networks


Bassel Tarbush and Alexander Teytelboym
Department of Economics, University of Oxford

Abstract. We develop a parsimonious and tractable dynamic social network formation model in which agents interact in overlapping social groups. The model allows us to analyse network properties and homophily patterns simultaneously. We derive analytical expressions for the distributions of degree and, importantly, of homophily indices, using mean-field approximations. We test our model using a large dataset from Facebook covering student friendship networks in 10 American colleges in 00. We find that our analytical expressions and simulations fit the homophily patterns, degree distribution, and individual clustering coefficients in the data well.

Introduction

Friendships are an essential part of economic life and social networks affect many areas of public policy. In many social network formation models in the economics literature, agents are anonymous and the network structure depends entirely on the formation process. Yet numerous examples, such as information transmission, peer-to-peer lending, or sexual contacts, suggest that the network topology is explained not only by the network formation process but also by node characteristics. We develop a dynamic network formation model that uses information on node characteristics to explain friendship patterns in online social networks, and we test it against data on Facebook networks in American colleges. In our model, agents spend time interacting with others across various social categories, such as attending lectures and spending time in their dorm. Naturally, the time allocation could be established institutionally by timetables or geographical proximity. The time allocation determines whom agents are likely to meet and with whom they document their resulting friendship on Facebook.
Our parsimonious model has only three parameters and is simple enough to allow us to derive analytic solutions for structural properties of the network. Conceptually, the model is related to affiliation networks introduced by [1]. However, these models typically contain a large number of parameters and most, such as [2,3,4], rely entirely on simulations.

A particular focus of this paper is homophily, the tendency of individuals to associate with those similar to themselves, which has been well documented in sociology [5]. The authors of [6] make it clear that the observed racial homophily patterns in American high schools do not necessarily arise from an exogenous bias in preferences towards people of the same race. In our model, we do not assume that agents have any preference bias.

Rather, the entire process is governed by the allocation of time and by the relative size of the social groups in which agents interact. Homophily therefore emerges purely from the correlations in agents' likelihood of interaction in similar social groups.

The empirical part of this paper provides striking support for our model. Using the analytical expressions, we find the best-fitting parameter values, which determine the allocation of time across social categories, for ten separate Facebook networks. Students' friendships reveal that they spend more time socialising in class than in their dorms. Despite its parsimony, the model closely matches the empirical degree and homophily distributions in gender and year at the best-fitting parameter values. Remarkably, the simulations run at these values show that the individual clustering distributions also match the empirical clustering patterns.

Model

Characteristics of agents

Let $\mathcal{K} = [K^0, \dots, K^R]$ be a finite ordered list of social categories. An element $K^r$ is the $r$-th category and $k^r \in K^r$ is a characteristic within that category. Let $\mathcal{R} = \{0, 1, \dots, R\}$. Every agent $i \in N$ is represented by a vector $k_i = (k_i^0, \dots, k_i^R)$ of characteristics, where for each $r \in \mathcal{R}$, $k_i^r \in K^r$. For any pair $i, j \in N$, let $k_i^0 = k_j^0$. For each $r \in \mathcal{R}$, define a social group $\gamma_i^r = \{ j \in N \mid k_i^r = k_j^r \} \setminus \{i\}$, which is the set of all agents (other than $i$) that share the characteristic $k_i^r$ within the social category $r$ with $i$. Note that $\gamma_i^0 = N \setminus \{i\}$. Finally, for each non-empty subset of social category indices $S \subseteq \mathcal{R}$, define
\[ \pi_i(S) = \bigcap_{r \in S} \gamma_i^r \setminus \bigcup_{r \in \mathcal{R} \setminus (S \cup \{0\})} \gamma_i^r, \tag{1} \]
which induces a partition $\Pi_i = \{ \pi_i(S) \mid S \subseteq \mathcal{R}, S \neq \emptyset \}$ on $N \setminus \{i\}$. Therefore, $\pi_i(S)$ is the set of agents (other than $i$) that share with $i$ only the characteristics within the set of categories indexed by $S$.

Example. In a university context, we could have $\mathcal{K} = [K^0, K^1, K^2, K^3, K^4]$ = [student, class, dorm, gender, year of graduation]. All agents are students ($k_i^0 = k_j^0$ for all $i, j \in N$).
$K^1 \in \mathcal{K}$, which represents class, can include $k^1 \in$ {maths, literature, biology}. Suppose that agent $i$ is represented by the vector $k_i$ = (student, maths, campus, female, 00). Let us consider $S = \{1, 3\}$. $\gamma_i^1$ is the set of all maths students other than $i$ and $\gamma_i^3$ is the set of all female students other than $i$. Therefore, $\pi_i(S)$ is the set of female maths students who do not live on campus and are of a different graduating year than $i$. $\pi_i(\{0\})$ would be the set of all male non-mathematicians who do not live on campus and are of a different graduating year than $i$. $\Pi_i$ partitions the other students into disjoint sets according to exactly which social categories they share with $i$.

Including the zeroth category does not restrict the characteristics space in any way: it is simply a category in which all agents share the same characteristic, and it greatly simplifies notation. Note that $\pi_i(S) = \pi_i(S \cup \{0\})$ for all non-empty $S \subseteq \mathcal{R}$. Furthermore, since $\gamma_i^r = \bigcup_{\pi \in \{\pi_i(S) \mid r \in S\}} \pi$, a social group is a union of disjoint partition elements.
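The social groups and the partition defined above can be computed directly from the characteristic vectors. The following is a minimal Python sketch, not the authors' code: the function names `social_group` and `partition`, the student names, and the placeholder year labels are all hypothetical and chosen only to mirror the university example.

```python
def social_group(agents, i, r):
    """gamma_i^r: everyone except i who shares i's characteristic in category r."""
    return {j for j, k_j in agents.items() if j != i and k_j[r] == agents[i][r]}


def partition(agents, i):
    """Pi_i as a dict: exact set of categories shared with i -> pi_i(S).

    Keys always contain 0 because every agent shares the zeroth category,
    which mirrors pi_i(S) = pi_i(S union {0}).
    """
    cells = {}
    for j, k_j in agents.items():
        if j == i:
            continue
        shared = frozenset(r for r, (a, b) in enumerate(zip(agents[i], k_j)) if a == b)
        cells.setdefault(shared, set()).add(j)
    return cells


# Hypothetical students: (student, class, dorm, gender, year of graduation).
agents = {
    "ann": ("student", "maths", "campus", "female", "year A"),
    "bea": ("student", "maths", "off-campus", "female", "year B"),
    "cal": ("student", "biology", "off-campus", "male", "year B"),
    "dan": ("student", "maths", "campus", "male", "year A"),
    "eve": ("student", "literature", "campus", "female", "year B"),
}

cells = partition(agents, "ann")
print(cells[frozenset({0, 1, 3})])  # {'bea'}: female maths student, off campus, other year
print(cells[frozenset({0})])        # {'cal'}: shares nothing with ann beyond being a student
```

Keying each partition element by the exact set of shared categories makes it easy to check that a social group is the union of the elements whose key contains that category.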

Network formation process

We model our network as a simple undirected graph with a finite set of nodes $N$ (which represent agents), a finite set of edges (which represent friendships), and no self-loops. The degree of an agent is the number of the agent's friends. At time period $t = 0$ all agents are active and have no friends. Let $q = (q_0, \dots, q_R)$ with $\sum_{r \in \mathcal{R}} q_r = 1$. In each period $t \in \{1, 2, \dots\}$, an active agent $i$ interacts with agents in the social group $\gamma_i^r$ with probability $q_r \geq 0$. We can thus interpret $q_r$ as the proportion of time in period $t$ that agent $i$ spends with agents in the social group $\gamma_i^r$ (one can think of $\gamma_i^0 = N \setminus \{i\}$ as the social group that $i$ interacts with during $i$'s "free time"). During the interaction in a social group, the agent is linked uniformly at random to another active agent in that group with whom the agent is not yet a friend. If the agent is already linked to every other active agent in that social group, the agent makes no friends in that period. Friendships are always reciprocal, so all links are undirected. Finally, in every period, an agent remains active until the following period with a given probability $p \in (0, 1)$ and becomes inactive with probability $1 - p$. If agent $i$ becomes inactive, $i$ retains all friendships but can no longer form any links with other agents in any subsequent period.

There must be reasons, other than having linked with every user in the network, why people stop adding new friends online: losing interest, finding an alternative online social network, reaching a cognitive capacity for social interaction, and so on. Including all these explanations would require a much richer model, so we simply capture them as a random process with the inactivity probability $1 - p$. We are interested in how the agents' degrees change over time. Let us call $d_i(t)$ the expected degree of agent $i$ in period $t$. We analyse a mean-field approximation to this dynamic system.
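The formation process described above is straightforward to simulate. The following Python sketch is one possible reading of it, not the authors' implementation. The function name `simulate`, the toy population, and the parameter values are hypothetical. Two details are assumptions: active agents move in a random order within a period, and activity is updated at the end of each period.

```python
import random


def simulate(chars, q, p, periods, seed=0):
    """Simulate the link formation process; returns one set of friends per agent.

    chars[i][r] is agent i's characteristic in category r, q[r] is the share of
    time spent in category r, and p is the probability of remaining active.
    """
    rng = random.Random(seed)
    n = len(chars)
    friends = [set() for _ in range(n)]
    active = [True] * n
    for _ in range(periods):
        movers = [i for i in range(n) if active[i]]
        rng.shuffle(movers)  # assumption: random order of moves within a period
        for i in movers:
            r = rng.choices(range(len(q)), weights=q)[0]  # pick a social group
            candidates = [j for j in range(n)
                          if j != i and active[j] and j not in friends[i]
                          and chars[j][r] == chars[i][r]]
            if candidates:  # otherwise i makes no friend this period
                j = rng.choice(candidates)
                friends[i].add(j)
                friends[j].add(i)  # friendships are reciprocal
        for i in range(n):
            if active[i] and rng.random() >= p:
                active[i] = False  # inactive agents keep their friends
        if not any(active):
            break
    return friends


# Hypothetical population: a common zeroth category, three classes, two dorms.
chars = [(0, i % 3, i % 2) for i in range(30)]
friends = simulate(chars, q=[0.1, 0.6, 0.3], p=0.9, periods=100, seed=1)
print(sum(len(f) for f in friends) / len(friends))  # average degree in this run
```

Averaging degree or homophily statistics over many seeds gives a simulated benchmark. The paper itself goes further and studies the process analytically, through a mean-field approximation.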
Mean-field approximation is a technique commonly used in statistical mechanics in order to simplify many-body systems. Essentially, it assumes that the realisation of any random variable in any time period is its expected value. Hence, we approximate our model by a discrete-time system that changes deterministically at a rate proportional to the expected change (see [7,8]). The probability with which agent $i$ interacts with an agent from $\pi_i(S)$ is given by
\[ q_{\pi_i(S)} = |\pi_i(S)| \left[ \sum_{r \in S \cup \{0\}} \frac{q_r}{|\gamma_i^r|} \right]. \tag{2} \]
Indeed, with probability $q_r$, an agent is assigned to social group $\gamma_i^r$, and the probability that the agent meets an agent in $\pi_i(S) \subseteq \gamma_i^r$ is given by $|\pi_i(S)| / |\gamma_i^r|$. Note that $\sum_{\pi \in \Pi_i} q_\pi = 1$.

For every $\pi \in \Pi_i$, let $R_\pi(t)$ be the number of remaining active agents in $\pi$ at $t$ (other than $i$) with whom $i$ is not yet linked. Furthermore, recall that an agent makes a link in every period and on average receives a link with probability $1 / R_\pi(t)$ from each of the $R_\pi(t)$ agents (in each $\pi$, weighted by $q_\pi$). Since $i$ interacts with agents in $\pi$ with probability $q_\pi$, $i$ makes $2 q_\pi$ links with agents in $\pi$ in every period until $T_\pi$, the expected number of periods for $i$ to form links with every agent in $\pi$. We find $T_\pi$ by solving
\[ R_\pi(t+1) = p \left[ R_\pi(t) - 2 q_\pi \right]. \tag{3} \]
This difference equation states that $R_\pi(t+1)$ is the number of agents who remain active in $\pi$ out of $R_\pi(t)$, less the number of agents that $i$ links with in $\pi$ at $t$. Solving for $R_\pi(t)$ with initial condition $R_\pi(0) = |\pi|$ and setting $R_\pi(T_\pi) = 0$ gives us
\[ T_\pi = \frac{\ln\left( \dfrac{2 q_\pi p}{2 q_\pi p + (1 - p) |\pi|} \right)}{\ln(p)} \quad \text{(except if } q_\pi = 0 \text{, then } T_\pi = 0\text{)}. \tag{4} \]
This allows us to obtain the expected degree of agent $i$ at time $t$:
\[ d_i(t) = \sum_{\pi \in \Pi_i} d_i^\pi(t) = \sum_{\pi \in \Pi_i} 2 q_\pi \left[ t \, \mathbf{1}(t \leq T_\pi) + T_\pi \, \mathbf{1}(t > T_\pi) \right], \tag{5} \]
where $d_i^\pi(t)$ is the expected number of links $i$ has with agents in $\pi \in \Pi_i$ in period $t$. Note that $d_i(t)$ is concave, piecewise linear, and strictly increasing in the range $[0, \max_{\pi \in \Pi_i} \{T_\pi\}]$. Hence, active agents make friends at a decreasing rate over time.

Since an agent remains active exactly $x$ periods with probability $p^x (1 - p)$, we have that $\Pr(t \leq x) = \sum_{t=0}^{x} p^t (1 - p) = 1 - p^{x+1}$. Therefore, the probability that node $i$ has degree at most $d$ is given by $G_i(d) \equiv \Pr(d_i(t) \leq d) = \Pr(t \leq t_i(d)) = 1 - p^{t_i(d) + 1}$, where
\[ t_i(d) \equiv d_i^{-1}(d) = \frac{d - \sum_{\pi \in \Pi_i} 2 q_\pi T_\pi \, \mathbf{1}(d > d_i(T_\pi))}{\sum_{\pi \in \Pi_i} 2 q_\pi \, \mathbf{1}(d \leq d_i(T_\pi))}. \tag{6} \]
Finally, the overall average degree distribution is $G(d) = \frac{1}{|N|} \sum_{i \in N} G_i(d)$.

Homophily

Homophily captures the tendency of agents to form links with those similar to themselves. Let $\Pi_i^r = \{ \pi_i(S) \in \Pi_i \mid r \in S \}$ be the set of partition elements containing agents that share the characteristic $k_i^r$ in category $r$ with $i$. The individual homophily index in social category $r$ of agent $i$ in period $t$ is defined as
\[ H_i^r(t) = \frac{\text{number of friends of } i \text{ at } t \text{ that share } k_i^r}{\text{number of friends of } i \text{ at } t} = \frac{\sum_{\pi \in \Pi_i^r} d_i^\pi(t)}{d_i(t)}. \tag{7} \]
This is a standard definition from which we can easily recover various other definitions of homophily given in []. Finally, it will be useful to define a composition function $h_i^r(d) \equiv (H_i^r \circ t_i)(d)$, which expresses individual homophily as a function of degree rather than as a function of time.

Test of the mean-field approximation

Since we used a mean-field method to derive the analytical expressions, we must test the accuracy of its approximations against simulations []. We did this for the degree distribution and the individual homophily distribution against an average of 00 runs of the simulation for multiple parameter values. In general, the fits were good. An example is illustrated in the accompanying figure. There is some loss of accuracy at extreme values of the cumulative distribution of the individual homophily index: the analytical expression for the individual homophily index makes it clear that the index is unlikely to be near 0 or 1. Yet the mean-field approximation of the average is good.
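The mean-field expressions for a single agent can be evaluated numerically. The Python sketch below is one reading of them, not the authors' code. It assumes that an agent forms $2 q_\pi$ links per period with partition element $\pi$ (one initiated and, on average, one received) until the element is exhausted at time $T_\pi$. All function names, the cell sizes, and the values of `q` and `p` are hypothetical.

```python
import math


def cell_probabilities(sizes, q):
    """q_pi = |pi| * sum over r in S of q_r / |gamma^r|, with 0 always in S.

    sizes maps each set S of shared categories (a frozenset containing 0)
    to the number of agents |pi_i(S)| in that partition element.
    """
    group = [sum(n for S, n in sizes.items() if r in S) for r in range(len(q))]
    return {S: n * sum(q[r] / group[r] for r in S) for S, n in sizes.items() if n > 0}


def exhaustion_time(q_pi, n, p):
    """T_pi: periods until i has linked with everyone still active in the cell."""
    if q_pi == 0:
        return 0.0
    return math.log(2 * q_pi * p / (2 * q_pi * p + (1 - p) * n)) / math.log(p)


def cell_degrees(t, sizes, q, p):
    """Expected links into each cell by time t: 2 * q_pi * min(t, T_pi)."""
    return {S: 2 * qp * min(t, exhaustion_time(qp, sizes[S], p))
            for S, qp in cell_probabilities(sizes, q).items()}


def expected_degree(t, sizes, q, p):
    return sum(cell_degrees(t, sizes, q, p).values())


def homophily(t, sizes, q, p, r):
    """Share of i's expected friends at time t > 0 who share category r."""
    deg = cell_degrees(t, sizes, q, p)
    return sum(v for S, v in deg.items() if r in S) / sum(deg.values())


def degree_cdf(d, sizes, q, p):
    """G_i(d) = 1 - p**(t_i(d) + 1), inverting the piecewise-linear degree path."""
    saturated = rate = 0.0
    for S, qp in cell_probabilities(sizes, q).items():
        T = exhaustion_time(qp, sizes[S], p)
        if d > expected_degree(T, sizes, q, p):
            saturated += 2 * qp * T  # cell already exhausted at degree d
        else:
            rate += 2 * qp           # cell still supplying links
    t = (d - saturated) / rate if rate > 0 else math.inf
    return 1 - p ** (t + 1)


# Hypothetical agent: cell sizes keyed by the categories shared with i
# (0 = everyone, 1 = class, 2 = dorm), with illustrative q and p.
sizes = {frozenset({0}): 60, frozenset({0, 1}): 20,
         frozenset({0, 2}): 15, frozenset({0, 1, 2}): 5}
q, p = [0.2, 0.5, 0.3], 0.95
print(expected_degree(3, sizes, q, p))  # before any cell is exhausted
print(homophily(3, sizes, q, p, r=1))   # share of friends from i's class
print(degree_cdf(6, sizes, q, p))       # probability of ending with degree at most 6
```

Comparing these values with averages from repeated simulation runs at the same parameters is the kind of check this subsection describes.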

[Figure: Results for all colleges. Panels: best-fit time allocation ($q_0$, $q_1$, $q_2$); average of the cumulative degree distribution; average individual clustering coefficient; average individual homophily coefficient (gender); average individual homophily coefficient (year). Colleges: Harvard, Columbia, Stanford, Yale, Cornell, Dartmouth, UPenn, MIT, NYU, BU. Legend: empirical average with % and % Chebyshev confidence intervals; analytic result at best fit; simulation result at best fit.]

[Figure: Detailed results for Harvard University. Panels: degree distribution (log-log plot of the frequency distribution, ln(f(x)) against ln(x)); cumulative distribution F(x) of individual clustering coefficients; cumulative distribution F(x) of the individual homophily index (gender); cumulative distribution F(x) of the individual homophily index (year). Black: empirical; red: analytical; blue: simulation.]

Data

We use the September 00 cross-section of the complete structure of social connections within (but not across) the first ten American colleges that joined Facebook (see [10]). We observe six social categories for each user: gender, year of graduation, major, minor, dorm, and high school. Since all personal data were provided voluntarily, some users did not submit all their information. We dropped any user (and their links) who had not provided all the personal characteristics other than high school. We therefore look only at students graduating between 00 and 00 who have supplied all the relevant personal characteristics (except high school).

Empirical strategy

We test our model against the data using the social categories identified in the Example. Using the available information in our dataset, we define agents $i$ and $j$ to be in the same class if they are in the same year and major or in the same year and minor. We assume that every agent $i$ interacts in $i$'s class and dorm with respective probabilities $q_1$ and $q_2$. The probabilities of interacting with the gender and year social categories are set to zero ($q_3 = q_4 = 0$) since it is unreasonable to suppose that agents allocate time specifically to interacting with agents in these categories. Meeting agents of the same gender or year happens only through the interactions in the other social groups. Finally, $q_0 = 1 - q_1 - q_2$ is the proportion of time spent interacting with all other agents (their free time). Hence, the model is parameterised by $q_0$, $q_1$, $q_2$, and $p$, of which only three are free because $q_0$ is determined by $q_1$ and $q_2$. We focus on explaining empirical homophily patterns in gender and year of graduation. Measuring homophily in these social categories is appropriate because gender and year of graduation are entirely immutable agent categories: unlike class and dorm, there is no feedback loop between social category membership and homophily.
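The restrictions above pin down the admissible time allocations: $q_3 = q_4 = 0$ and $q_0 = 1 - q_1 - q_2 \geq 0$, so only $q_1$ and $q_2$ vary. The Python sketch below enumerates such allocations on a regular grid. The function name `allocation_grid` and the coarse step of 0.25 are hypothetical choices for illustration, not the resolution used by the authors, and $p$ would be a further axis.

```python
def allocation_grid(steps):
    """Yield (q0, q1, q2, q3, q4) on a grid with `steps` intervals per axis.

    Imposes q3 = q4 = 0 and q0 = 1 - q1 - q2 >= 0, so only q1 and q2 vary.
    Integer counters avoid floating-point drift in the constraint.
    """
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            yield ((steps - a - b) / steps, a / steps, b / steps, 0.0, 0.0)


grid = list(allocation_grid(4))  # hypothetical coarse grid with steps of 0.25
print(len(grid))                 # 15 admissible allocations
print(grid[0], grid[-1])         # all free time first, all class time last
```

Each grid point can then be paired with a candidate value of $p$ and scored against the empirical distributions.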
Fitting the model to data

In order to fit the model to the data (degree distribution and homophily), we used a grid search on the parameters $q_0$, $q_1$, $q_2$, and $p$. (For $q_0$, $q_1$ and $q_2$ we took values from 0 to 1 in steps of 0.0. For $p$, we took values from 0. to 0. in steps of .) For the degree distribution, we computed the analytical degree distribution, and, for homophily, we found the analytical homophily index in gender and year as a function of $i$'s empirical degree at each point in the grid. We then found the values of $q_0$, $q_1$, $q_2$, and $p$ that minimise an intuitive loss function, which measures the overall error of the fit by taking the product of the normalised sums of squared distances between the analytical and the empirical distributions for degree and for homophily in gender and year at each point in the grid.

Results

For each college, we ran 00 simulations at its best-fitting values of $q_0$, $q_1$, $q_2$, and $p$. (The results shown are averages over the 00 runs.) The figure "Results for all colleges" shows that our model closely matches average degree, average homophily, and the average individual clustering coefficient (see [9, p. ] for a standard definition). In order to avoid making any assumptions about the distributions, we estimated standard errors around the empirical averages non-parametrically; that figure therefore shows Chebyshev confidence intervals at the % and % levels. Note that clustering appears to fit relatively well even though it did not appear in our loss function. Unsurprisingly, students spend most of their time interacting with others in their class. Interestingly, $q_0$ is small, which suggests that friendship patterns are far from random. The figure "Detailed results for Harvard University" shows the empirical, analytical, and simulated degree, homophily (in gender and year), and individual clustering distributions for Harvard University. These fits are representative of the other colleges.

Conclusions

We presented a network formation model that provides rich microfoundations for the macroscopic properties of online social networks. The friendship and homophily patterns generated by the model find good support in the data. We were also able to estimate how much time agents spend in particular social groups. There is still scope for further theoretical work, including finding accurate analytical approximations to the clustering measures and diameter.

Acknowledgments. We would like to thank Edo Gallo, Manuel Mueller-Frank, and John Quah for valuable discussions and three anonymous referees for their excellent suggestions. Bernie Hogan introduced us to digital social science research.

References

1. Breiger, R.L.: The duality of persons and groups. Social Forces.
2. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: Densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD).
3. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. In: Proceedings of the International Conference on World Wide Web (WWW).
4. Foudalis, I., Jain, K., Papadimitriou, C.H., Sideri, M.: Modeling social networks through user background and behavior. In: International Workshop on Algorithms and Models for the Web Graph (WAW).
5. McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: Homophily in social networks. Annual Review of Sociology.
6. Currarini, S., Jackson, M.O., Pin, P.: An economic model of friendship: Homophily, minorities, and segregation. Econometrica.
7. Barabási, A.L., Albert, R., Jeong, H.: Mean-field theory for scale-free random networks. Physica A.
8. Jackson, M.O., Rogers, B.W.: Meeting strangers and friends of friends: How random are social networks? American Economic Review.
9. Jackson, M.O.: Social and Economic Networks. Princeton University Press.
10. Traud, A.L., Mucha, P.J., Porter, M.A.: Social structure of Facebook networks. Physica A.


More information

Introduction to Linear Regression

Introduction to Linear Regression 14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

More information

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS

SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS SOCIAL NETWORK ANALYSIS EVALUATING THE CUSTOMER S INFLUENCE FACTOR OVER BUSINESS EVENTS Carlos Andre Reis Pinheiro 1 and Markus Helfert 2 1 School of Computing, Dublin City University, Dublin, Ireland

More information

DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS

DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS International Scientific Conference & International Workshop Present Day Trends of Innovations 2012 28 th 29 th May 2012 Łomża, Poland DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS Lubos Takac 1 Michal Zabovsky

More information

A discussion of Statistical Mechanics of Complex Networks P. Part I

A discussion of Statistical Mechanics of Complex Networks P. Part I A discussion of Statistical Mechanics of Complex Networks Part I Review of Modern Physics, Vol. 74, 2002 Small Word Networks Clustering Coefficient Scale-Free Networks Erdös-Rényi model cover only parts

More information

Expansion Properties of Large Social Graphs

Expansion Properties of Large Social Graphs Expansion Properties of Large Social Graphs Fragkiskos D. Malliaros 1 and Vasileios Megalooikonomou 1,2 1 Computer Engineering and Informatics Department University of Patras, 26500 Rio, Greece 2 Data

More information

Offline sorting buffers on Line

Offline sorting buffers on Line Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

More information

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop

A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY. Workshop A LONGITUDINAL AND SURVIVAL MODEL WITH HEALTH CARE USAGE FOR INSURED ELDERLY Ramon Alemany Montserrat Guillén Xavier Piulachs Lozada Riskcenter - IREA Universitat de Barcelona http://www.ub.edu/riskcenter

More information

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing! MATH BOOK OF PROBLEMS SERIES New from Pearson Custom Publishing! The Math Book of Problems Series is a database of math problems for the following courses: Pre-algebra Algebra Pre-calculus Calculus Statistics

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

DEGREES OF ORDERS ON TORSION-FREE ABELIAN GROUPS

DEGREES OF ORDERS ON TORSION-FREE ABELIAN GROUPS DEGREES OF ORDERS ON TORSION-FREE ABELIAN GROUPS ASHER M. KACH, KAREN LANGE, AND REED SOLOMON Abstract. We construct two computable presentations of computable torsion-free abelian groups, one of isomorphism

More information

Inequality, Mobility and Income Distribution Comparisons

Inequality, Mobility and Income Distribution Comparisons Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

More information

Hacking-proofness and Stability in a Model of Information Security Networks

Hacking-proofness and Stability in a Model of Information Security Networks Hacking-proofness and Stability in a Model of Information Security Networks Sunghoon Hong Preliminary draft, not for citation. March 1, 2008 Abstract We introduce a model of information security networks.

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler Machine Learning and Data Mining Regression Problem (adapted from) Prof. Alexander Ihler Overview Regression Problem Definition and define parameters ϴ. Prediction using ϴ as parameters Measure the error

More information

CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

More information

Efficient target control of complex networks based on preferential matching

Efficient target control of complex networks based on preferential matching Efficient target control of complex networks based on preferential matching Xizhe Zhang 1, Huaizhen Wang 1, Tianyang Lv 2,3 1 (School of Computer Science and Engineering, Northeastern University, Shenyang110819,

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

Northumberland Knowledge

Northumberland Knowledge Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

More information

De Jaeger, Emmanuel ; Du Bois, Arnaud ; Martin, Benoît. Document type : Communication à un colloque (Conference Paper)

De Jaeger, Emmanuel ; Du Bois, Arnaud ; Martin, Benoît. Document type : Communication à un colloque (Conference Paper) "Hosting capacity of LV distribution grids for small distributed generation units, referring to voltage level and unbalance" De Jaeger, Emmanuel ; Du Bois, Arnaud ; Martin, Benoît Abstract This paper revisits

More information

Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test

Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test Tingting Chen Sheng Zhong Computer Science and Engineering Department State University of New york at Buffalo Amherst, NY

More information

High Throughput Network Analysis

High Throughput Network Analysis High Throughput Network Analysis Sumeet Agarwal 1,2, Gabriel Villar 1,2,3, and Nick S Jones 2,4,5 1 Systems Biology Doctoral Training Centre, University of Oxford, Oxford OX1 3QD, United Kingdom 2 Department

More information

OPTIMAL DESIGN OF A MULTITIER REWARD SCHEME. Amir Gandomi *, Saeed Zolfaghari **

OPTIMAL DESIGN OF A MULTITIER REWARD SCHEME. Amir Gandomi *, Saeed Zolfaghari ** OPTIMAL DESIGN OF A MULTITIER REWARD SCHEME Amir Gandomi *, Saeed Zolfaghari ** Department of Mechanical and Industrial Engineering, Ryerson University, Toronto, Ontario * Tel.: + 46 979 5000x7702, Email:

More information

Temporal Dynamics of Scale-Free Networks

Temporal Dynamics of Scale-Free Networks Temporal Dynamics of Scale-Free Networks Erez Shmueli, Yaniv Altshuler, and Alex Sandy Pentland MIT Media Lab {shmueli,yanival,sandy}@media.mit.edu Abstract. Many social, biological, and technological

More information

arxiv:cs.dm/0204001 v1 30 Mar 2002

arxiv:cs.dm/0204001 v1 30 Mar 2002 A Steady State Model for Graph Power Laws David Eppstein Joseph Wang arxiv:cs.dm/0000 v 0 Mar 00 Abstract Power law distribution seems to be an important characteristic of web graphs. Several existing

More information

The Effectiveness of Collaborative Learning in Group Projects

The Effectiveness of Collaborative Learning in Group Projects Collaborative Learning in Geographically Distributed and In-person Groups René F. Kizilcec Department of Communication, Stanford University, Stanford CA 94305 kizilcec@stanford.edu Abstract. Open online

More information

Understanding Graph Sampling Algorithms for Social Network Analysis

Understanding Graph Sampling Algorithms for Social Network Analysis Understanding Graph Sampling Algorithms for Social Network Analysis Tianyi Wang, Yang Chen 2, Zengbin Zhang 3, Tianyin Xu 2 Long Jin, Pan Hui 4, Beixing Deng, Xing Li Department of Electronic Engineering,

More information

Introduction to Networks and Business Intelligence

Introduction to Networks and Business Intelligence Introduction to Networks and Business Intelligence Prof. Dr. Daning Hu Department of Informatics University of Zurich Sep 17th, 2015 Outline Network Science A Random History Network Analysis Network Topological

More information

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

1. Write the number of the left-hand item next to the item on the right that corresponds to it. 1. Write the number of the left-hand item next to the item on the right that corresponds to it. 1. Stanford prison experiment 2. Friendster 3. neuron 4. router 5. tipping 6. small worlds 7. job-hunting

More information

2.3 Convex Constrained Optimization Problems

2.3 Convex Constrained Optimization Problems 42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

Max-Min Representation of Piecewise Linear Functions

Max-Min Representation of Piecewise Linear Functions Beiträge zur Algebra und Geometrie Contributions to Algebra and Geometry Volume 43 (2002), No. 1, 297-302. Max-Min Representation of Piecewise Linear Functions Sergei Ovchinnikov Mathematics Department,

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

Exploring contact patterns between two subpopulations

Exploring contact patterns between two subpopulations Exploring contact patterns between two subpopulations Winfried Just Hannah Callender M. Drew LaMar December 23, 2015 In this module 1 we introduce a construction of generic random graphs for a given degree

More information

A Sublinear Bipartiteness Tester for Bounded Degree Graphs

A Sublinear Bipartiteness Tester for Bounded Degree Graphs A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite

More information

http://www.elsevier.com/copyright

http://www.elsevier.com/copyright This article was published in an Elsevier journal. The attached copy is furnished to the author for non-commercial research and education use, including for instruction at the author s institution, sharing

More information

Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer

Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer 1 Content What is Community Detection? Motivation Defining a community Methods to find communities Overlapping communities

More information

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH ZACHARY ABEL 1. Introduction In this survey we discuss properties of the Higman-Sims graph, which has 100 vertices, 1100 edges, and is 22 regular. In fact

More information

Analysis of Internet Topologies

Analysis of Internet Topologies Analysis of Internet Topologies Ljiljana Trajković ljilja@cs.sfu.ca Communication Networks Laboratory http://www.ensc.sfu.ca/cnl School of Engineering Science Simon Fraser University, Vancouver, British

More information

CSV886: Social, Economics and Business Networks. Lecture 2: Affiliation and Balance. R Ravi ravi+iitd@andrew.cmu.edu

CSV886: Social, Economics and Business Networks. Lecture 2: Affiliation and Balance. R Ravi ravi+iitd@andrew.cmu.edu CSV886: Social, Economics and Business Networks Lecture 2: Affiliation and Balance R Ravi ravi+iitd@andrew.cmu.edu Granovetter s Puzzle Resolved Strong Triadic Closure holds in most nodes in social networks

More information

GRADES 7, 8, AND 9 BIG IDEAS

GRADES 7, 8, AND 9 BIG IDEAS Table 1: Strand A: BIG IDEAS: MATH: NUMBER Introduce perfect squares, square roots, and all applications Introduce rational numbers (positive and negative) Introduce the meaning of negative exponents for

More information

You Are What You Bet: Eliciting Risk Attitudes from Horse Races

You Are What You Bet: Eliciting Risk Attitudes from Horse Races You Are What You Bet: Eliciting Risk Attitudes from Horse Races Pierre-André Chiappori, Amit Gandhi, Bernard Salanié and Francois Salanié March 14, 2008 What Do We Know About Risk Preferences? Not that

More information

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test

Math Review. for the Quantitative Reasoning Measure of the GRE revised General Test Math Review for the Quantitative Reasoning Measure of the GRE revised General Test www.ets.org Overview This Math Review will familiarize you with the mathematical skills and concepts that are important

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information