# Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity

Save this PDF as:

Size: px
Start display at page:

Download "Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity"

## Transcription

5 Average Number of People in Annulus of Width 0.1 Miles People vs. Distance for Varying Densities Low Density Medium Density High Density Probability of Friendship e-05 1e-06 Probability of Friendship versus Distance Combined Best fit ( x) Miles 1e Miles Figure 6: Number of individuals as a function of distance. Here we show how many people there are on average who live x miles away. We divide the US into low, medium and high density areas, and compute the curves independently for each. the average population density is quite unrelated to the density at the center of the annulus, and becomes more closely related to the average population density in the US. This causes the three curves to meet and overlap from 50 miles onward. 3.2 Friendship and Distance We now turn to an investigation of the probability of friendship as a function of distance. Naturally, we expect the probability to go down with distance and this is what we observe in Figure 7. To generate this curve, we aggregate over all individuals, computing the distance between all pairs of individuals with known addresses. We then bucket by intervals of 0.1 miles to compute the total number of pairs and the number of pairs for which an edge is present, plotting the ratio. It turns out that we can get a good fit to a curve of the form a(b + x) c. The exponent very close to c = 1 suggests that, at medium to long-range distances, the probability of friendship is roughly inversely proportional to distance. At shorter scales the curve is flatter, suggesting that there is less sensitivity to short distances than a power-law with exponent 1 would produce. The 1 exponent has been observed in other datasets as well [15], suggesting that there is a more general principle at work here. However, this does not tell the full story, as it aggregates people together from very different settings. When we break it down by population density in Figure 8, a somewhat different account emerges; for short distances the probability is higher in lower density areas as you are more likely to be friends with a person a few miles away if you live in a less dense area. Interestingly, as the distance increases, the three curves converge. At about 50 miles, we see that the probability of knowing someone is no longer dependent on density. In fact, as we go further away, the order inverts, with people in high density areas being more likely to be friends with people at greater distances. This supports the intuition that Figure 7: Probability of friendship as a function of distance. By computing the number of pairs of individuals at varying distances, along with the number of friends at those distances, we are able to compute the probability of two people at distance d knowing each other. We see here that it is a reasonably good fit to a power-law with exponent near 1. people living in metropolitan areas are more cosmopolitan; their ties to distant places are more likely, probably arising from increased movement between cities and greater capacity to travel. An alternative to observing friendship probability as a function of distance is to look instead as a function of rank. As described in Liben-Nowell et al., we define rank as the number of people who live closer than a user. For user u, we rank users by distance from u. For user v, the number of people living in the area between u and v is defined by rank u(v) := {w : d(u, w) < d(u, v)}. The hope here is that despite the differences in population density, the probability of being friends with someone at a given rank should be independent of where you live. Figure 9 shows friendship probability as a function of rank. Here we do see a nice smooth curve, again with an exponent of close to 1 (as previously observed). Even though using rank should mitigate the effect of density on our probability calculation, it does not control for the behaviors of users in different areas. Figure 10 shows the probability of friendship as a function of rank, this time broken down by our three density groups. Though the curves do overlap somewhat more when we calculate things this way (all with exponent about 1), we still see similar effects. The probability is higher at low ranks for people in less dense areas, and higher at high ranks for people in more dense areas (cosmopolitan effect). This reinforces the notion that people who live in urban areas tend to have more dispersed friends. 4. PREDICTING LOCATION A practical application of the observations made thus far is that they allow us to predict the locations of people who have not provided this information. If we can accurately predict an individual s location, we can improve services for them

6 0.1 Probability of Friendship for Varying Densities Low Density Medium Density High Density 0.1 Number of Friends at Different Ranks Total connections at ranks Best Fit (rank+104) Probability of Friendship e-05 Count e-06 1e-05 1e Miles 1e e+06 Rank Figure 8: Looking at the people living in low, medium and high density regions separately, we see that if you live in a high density region (a city), you are less likely to know a nearby individual, since there are so many of them. However, you are more likely to have contact with someone far away. in a number of ways. The most obvious application is that we can provide them with better local content. Providing a more local, personalized experience is likely to make a site more useful for users. We can also use a person s location to help prevent security breaches if an individual accesses the site from a location far from home (where the individual s current location is approximated via IP geolocation), and they have never been there before, we might ask them an additional security question to ensure that their account has not been compromised. Thus, our goal here is, given the locations of a user s contacts, to compute that user s home location. In the simplest case, all of one s friends would live in a small region, and then the prediction task would be very simple, with any reasonable algorithm returning a good approximation. Things get more complicated and difficult as one s friends become more spread out. The distributions from the previous sections tell us that one will typically not have too many friends at great distances, but that there will be too many for naive algorithms to work well. For instance, a first attempt would be to take the mean location of one s friends. However, this will give laughably bad results for people living on either coast. An individual with 10 friends in San Francisco and one friend in New York will be placed an eleventh of the way from San Francisco to New York, somewhere in Nevada. Other simple statistics, like median (whatever that would mean in two dimensions) do better, but still fail, especially for people living on the coasts. To achieve better performance, we must develop a more sophisticated model using the observations from the proceeding sections. Figure 7 shows the probability of an edge being present as a function of distance, which suggests a maximum likelihood approach. We consider an individual u with k friends. Using the distribution from Figure 7, we can computed the likelihood of a given location l u = (lat, long). Figure 9: The rank of a person v relative to u is the number of individuals w such that d(u, w) < d(u, v). Here we show the probability of friendship as a function of rank. For each friend v of u whose location l v is known, we can compute the probability of the edge being present given the distance between u and v, p( l u l v ) = ( l u l v ) 1.05, as empirically determined. Multiplying these probabilities together for all such v, we obtain a likelihood for all the edges. To complete the calculation, we must also multiply the probabilities of all the other edges not being present: 1 p( l u l v ) for all v such that v / E. Because all of the probabilities are very small for any particular edge, this term serves mostly as a tiebreaker and typically plays a small role. Thus, we can write down the likelihood of a particular location l u as Y (u,v) E p( l u l v ) Y (u,v) / E 1 p( l u l v ) This model gives us a way to evaluate any point l u. From a practical point of view, however, the algorithm as stated is very expensive. In a naive implementation, to find the best location for one individual, we would have to compute the probability terms for every other user, at an expense of O(N) per location evaluated. Finding the best location would require an additional search, making this impractical in a large graph. With two optimizations, however, we can develop an efficient algorithm which computes the (near) optimal locations for all individuals in O(M log N) assuming that the maximum degree in the graph is O(log N) (where M is the number of edges and N is the number of users). The first important observation is that, for any location, the second part of the product, containing 1 p( ), is very nearly independent of u. Thus, we can precompute a constant γ l = Q v V 1 p( lu lv ) for each location l. We can then rewrite the above formula as: γ lu = Y p( l u l v ) 1 p( l u l v ) (u,v) E The other important optimization comes from the form

10 ally applicable and will allow us to improve our location estimates in countries where IP-address often tells no more than the name of the country. 6. ACKNOWLEDGEMENTS We would like to thank Stephen Heise for his work on building a geocoder service that made our experimentation possible. 7. REFERENCES [1] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman. Search in power-law networks. Physical review E, 64(4):46135, [2] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD 06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 44 54, New York, NY, USA, ACM Press. [3] L. Backstrom, J. Kleinberg, R. Kumar, and J. Novak. Spatial variation in search engine queries. In WWW 08: Proceeding of the 17th international conference on World Wide Web, pages , New York, NY, USA, ACM. [4] C. Butts. Predictability of large-scale spatially embedded networks. In Dynamic Social Network Modeling and Analysis: workshop summary and papers, pages , [5] D. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world s photos. In WWW, [6] R. Dhamija, J. D. Tygar, and M. Hearst. Why phishing works. In CHI 06: Proceedings of the SIGCHI conference on Human Factors in computing systems, pages , New York, NY, USA, ACM. [7] L. Festinger, S. Schachter, and K. Back. Social pressures in informal groups: A study of human factors in housing. Stanford Univ Pr, [8] E. Gilbert, K. Karahalios, and C. Sandvig. The network in the garden: an empirical analysis of social media in rural life. In CHI 08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pages , New York, NY, USA, ACM. [9] S. Graham. The end of geography or the explosion of place? Conceptualizing space, place and information technology. Progress in human geography, 22(2):165, [10] A. Khalil and K. Connelly. Context-aware telephony: privacy preferences and sharing patterns. In CSCW 06: Proceedings of the th anniversary conference on Computer supported cooperative work, pages , New York, NY, USA, ACM. [11] J. Kleinberg. Navigation in a small world. Nature, 406(6798): , [12] J. Kleinberg. The small-world phenomenon: an algorithm perspective. In STOC 00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages , New York, NY, USA, ACM Press. [13] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, page 617. ACM, [14] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages ACM, [15] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic routing in social networks. Proceedings of the National Academy of Sciences, 102(33):11623, [16] MaxMind, Inc. GeoIP City Accuracy for Selected Countries, November [17] B. Mayhew and R. Levinger. Size and the density of interaction in human aggregates. The American Journal of Sociology, 82(1):86 110, [18] D. Mok and B. Wellman. Did distance matter before the Internet? Interpersonal contact and support in the 1970s. Social networks, 29(3): , [19] L. Nahemow and M. Lawton. Similarity and propinquity in friendship formation. Journal of Personality and Social Psychology, 32(2): , [20] J. Q. Stewart. An inverse distance variation for certain social influences. Science, 93(2404):89 90, [21] U.S. Census Bureau. Census 2000 Summary File 1, DCGeoSelectServlet?ds_name=DEC_2000_SF1_U. [22] U.S. Census Bureau. Redistricting Census 2000 TIGER/Line Files, geo/www/tiger/tiger2k/tgr2000.html.

### 1 Six Degrees of Separation

Networks: Spring 2007 The Small-World Phenomenon David Easley and Jon Kleinberg April 23, 2007 1 Six Degrees of Separation The small-world phenomenon the principle that we are all linked by short chains

### Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

### SCALABILITY OF CONTEXTUAL GENERALIZATION PROCESSING USING PARTITIONING AND PARALLELIZATION. Marc-Olivier Briat, Jean-Luc Monnot, Edith M.

SCALABILITY OF CONTEXTUAL GENERALIZATION PROCESSING USING PARTITIONING AND PARALLELIZATION Abstract Marc-Olivier Briat, Jean-Luc Monnot, Edith M. Punt Esri, Redlands, California, USA mbriat@esri.com, jmonnot@esri.com,

### Ground-truthing spatial activities through online social networking data

Ground-truthing spatial activities through online social networking data G. D. McKenzie* and M. Raubal** *Department of Geography, University of California Santa Barbara 1832 Ellison Hall, Santa Barbara,

### IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

### Link Prediction in Social Networks

CS378 Data Mining Final Project Report Dustin Ho : dsh544 Eric Shrewsberry : eas2389 Link Prediction in Social Networks 1. Introduction Social networks are becoming increasingly more prevalent in the daily

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

### Minesweeper as a Constraint Satisfaction Problem

Minesweeper as a Constraint Satisfaction Problem by Chris Studholme Introduction To Minesweeper Minesweeper is a simple one player computer game commonly found on machines with popular operating systems

### IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

### CS224W Project Report: Finding Top UI/UX Design Talent on Adobe Behance

CS224W Project Report: Finding Top UI/UX Design Talent on Adobe Behance Susanne Halstead, Daniel Serrano, Scott Proctor 6 December 2014 1 Abstract The Behance social network allows professionals of diverse

### Cost effective Outbreak Detection in Networks

Cost effective Outbreak Detection in Networks Jure Leskovec Joint work with Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance Diffusion in Social Networks One of

### Beyond Friendship Graphs: A Study of User Interactions in Flickr

Beyond Friendship Graphs: A Study of User Interactions in Flickr ABSTRACT Masoud Valafar, Reza Rejaie Department of Computer & Information Science University of Oregon Eugene, OR 9743 {masoud,reza}@cs.uoregon.edu

### Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

### Gender-based Models of Location from Flickr

Gender-based Models of Location from Flickr Neil O Hare Yahoo! Research, Barcelona, Spain nohare@yahoo-inc.com Vanessa Murdock Microsoft vanessa.murdock@yahoo.com ABSTRACT Geo-tagged content from social

### Tap Unexplored Markets Using Segmentation The Advantages of Real-Time Dynamic Segmentation

Tap Unexplored Markets Using Segmentation The Advantages of Real-Time Dynamic Segmentation What is Segmentation and How Does it Apply to Website Traffic? Market segmentation, the practice of breaking down

### Applying Machine Learning to Stock Market Trading Bryce Taylor

Applying Machine Learning to Stock Market Trading Bryce Taylor Abstract: In an effort to emulate human investors who read publicly available materials in order to make decisions about their investments,

### VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR

VOLATILITY AND DEVIATION OF DISTRIBUTED SOLAR Andrew Goldstein Yale University 68 High Street New Haven, CT 06511 andrew.goldstein@yale.edu Alexander Thornton Shawn Kerrigan Locus Energy 657 Mission St.

### Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

### Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results

Evaluation of a New Method for Measuring the Internet Distribution: Simulation Results Christophe Crespelle and Fabien Tarissan LIP6 CNRS and Université Pierre et Marie Curie Paris 6 4 avenue du président

### Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598. Keynote, Outlier Detection and Description Workshop, 2013

Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### An On-Line Algorithm for Checkpoint Placement

An On-Line Algorithm for Checkpoint Placement Avi Ziv IBM Israel, Science and Technology Center MATAM - Advanced Technology Center Haifa 3905, Israel avi@haifa.vnat.ibm.com Jehoshua Bruck California Institute

### (Refer Slide Time: 00:00:56 min)

Numerical Methods and Computation Prof. S.R.K. Iyengar Department of Mathematics Indian Institute of Technology, Delhi Lecture No # 3 Solution of Nonlinear Algebraic Equations (Continued) (Refer Slide

### Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

Graph models for the Web and the Internet Elias Koutsoupias University of Athens and UCLA Crete, July 2003 Outline of the lecture Small world phenomenon The shape of the Web graph Searching and navigation

Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

### Chapter 20. The Small-World Phenomenon. 20.1 Six Degrees of Separation

From the book Networks, Crowds, and Markets: Reasoning about a Highly Connected World. By David Easley and Jon Kleinberg. Cambridge University Press, 2010. Complete preprint on-line at http://www.cs.cornell.edu/home/kleinber/networks-book/

### Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

### BGP Prefix Hijack: An Empirical Investigation of a Theoretical Effect Masters Project

BGP Prefix Hijack: An Empirical Investigation of a Theoretical Effect Masters Project Advisor: Sharon Goldberg Adam Udi 1 Introduction Interdomain routing, the primary method of communication on the internet,

### Credit Card Market Study Interim Report: Annex 4 Switching Analysis

MS14/6.2: Annex 4 Market Study Interim Report: Annex 4 November 2015 This annex describes data analysis we carried out to improve our understanding of switching and shopping around behaviour in the UK

### On Correlating Performance Metrics

On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are

### Demographics of Atlanta, Georgia:

Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From

### BENCHMARKING PERFORMANCE AND EFFICIENCY OF YOUR BILLING PROCESS WHERE TO BEGIN

BENCHMARKING PERFORMANCE AND EFFICIENCY OF YOUR BILLING PROCESS WHERE TO BEGIN There have been few if any meaningful benchmark analyses available for revenue cycle management performance. Today that has

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

### Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca

Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?

### Measurement of Human Mobility Using Cell Phone Data: Developing Big Data for Demographic Science*

Measurement of Human Mobility Using Cell Phone Data: Developing Big Data for Demographic Science* Nathalie E. Williams 1, Timothy Thomas 2, Matt Dunbar 3, Nathan Eagle 4 and Adrian Dobra 5 Working Paper

### Estimating Twitter User Location Using Social Interactions A Content Based Approach

2011 IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing Estimating Twitter User Location Using Social Interactions A Content Based

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### 6.207/14.15: Networks Lecture 6: Growing Random Networks and Power Laws

6.207/14.15: Networks Lecture 6: Growing Random Networks and Power Laws Daron Acemoglu and Asu Ozdaglar MIT September 28, 2009 1 Outline Growing random networks Power-law degree distributions: Rich-Get-Richer

### In the analysis and appraisal of real estate,

RESIDENTIAL APPRAISING Adjusting Market Value over Time by Mike Wolff In the analysis and appraisal of real estate, evaluating prices over time is often required. This article will attempt to show how

### Managing Incompleteness, Complexity and Scale in Big Data

Managing Incompleteness, Complexity and Scale in Big Data Nick Duffield Electrical and Computer Engineering Texas A&M University http://nickduffield.net/work Three Challenges for Big Data Complexity Problem:

### An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

### Gas Dynamics Prof. T. M. Muruganandam Department of Aerospace Engineering Indian Institute of Technology, Madras. Module No - 12 Lecture No - 25

(Refer Slide Time: 00:22) Gas Dynamics Prof. T. M. Muruganandam Department of Aerospace Engineering Indian Institute of Technology, Madras Module No - 12 Lecture No - 25 Prandtl-Meyer Function, Numerical

### Can you design an algorithm that searches for the maximum value in the list?

The Science of Computing I Lesson 2: Searching and Sorting Living with Cyber Pillar: Algorithms Searching One of the most common tasks that is undertaken in Computer Science is the task of searching. We

### Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

### Automated Process for Generating Digitised Maps through GPS Data Compression

Automated Process for Generating Digitised Maps through GPS Data Compression Stewart Worrall and Eduardo Nebot University of Sydney, Australia {s.worrall, e.nebot}@acfr.usyd.edu.au Abstract This paper

### Strategic Online Advertising: Modeling Internet User Behavior with

2 Strategic Online Advertising: Modeling Internet User Behavior with Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew

### A Game Theoretical Framework for Adversarial Learning

A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,

### Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at

Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How

### Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

### Graph Database Proof of Concept Report

Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

### Monotonicity Hints. Abstract

Monotonicity Hints Joseph Sill Computation and Neural Systems program California Institute of Technology email: joe@cs.caltech.edu Yaser S. Abu-Mostafa EE and CS Deptartments California Institute of Technology

Server Load Prediction Suthee Chaidaroon (unsuthee@stanford.edu) Joon Yeong Kim (kim64@stanford.edu) Jonghan Seo (jonghan@stanford.edu) Abstract Estimating server load average is one of the methods that

### large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC) 1 three frequent ideas in machine learning. independent and identically distributed data This experimental paradigm has driven

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a

### PREDICTIVE ANALYTICS vs HOT SPOTTING

PREDICTIVE ANALYTICS vs HOT SPOTTING A STUDY OF CRIME PREVENTION ACCURACY AND EFFICIENCY 2014 EXECUTIVE SUMMARY For the last 20 years, Hot Spots have become law enforcement s predominant tool for crime

### Automatic Labeling of Lane Markings for Autonomous Vehicles

Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 jkiske@stanford.edu 1. Introduction As autonomous vehicles become more popular,

### Technology White Paper Capacity Constrained Smart Grid Design

Capacity Constrained Smart Grid Design Smart Devices Smart Networks Smart Planning EDX Wireless Tel: +1-541-345-0019 I Fax: +1-541-345-8145 I info@edx.com I www.edx.com Mark Chapman and Greg Leon EDX Wireless

### Inflation. Chapter 8. 8.1 Money Supply and Demand

Chapter 8 Inflation This chapter examines the causes and consequences of inflation. Sections 8.1 and 8.2 relate inflation to money supply and demand. Although the presentation differs somewhat from that

### COST PROXY MODELS IN RURAL TELEPHONE COMPANIES

COST PROXY MODELS IN RURAL TELEPHONE COMPANIES Prepared for the United States Telephone Association by Austin Communications Education Services, Inc. 2937 Landmark Way Palm Harbor, FL 34684-5019 USA Tel:

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Identifying Market Price Levels using Differential Evolution

Identifying Market Price Levels using Differential Evolution Michael Mayo University of Waikato, Hamilton, New Zealand mmayo@waikato.ac.nz WWW home page: http://www.cs.waikato.ac.nz/~mmayo/ Abstract. Evolutionary

### Sanjeev Kumar. contribute

RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Exploring spatial decay effect in mass media and social media: a case study of China

Exploring spatial decay effect in mass media and social media: a case study of China Yihong Yuan Department of Geography, Texas State University, San Marcos, TX, USA, 78666. Tel: +1(512)-245-3208 Email:yuan@txtate.edu

### Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms

Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail

### Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

### MASCOT Search Results Interpretation

The Mascot protein identification program (Matrix Science, Ltd.) uses statistical methods to assess the validity of a match. MS/MS data is not ideal. That is, there are unassignable peaks (noise) and usually

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### 3 More on Accumulation and Discount Functions

3 More on Accumulation and Discount Functions 3.1 Introduction In previous section, we used 1.03) # of years as the accumulation factor. This section looks at other accumulation factors, including various

### Understanding crime hotspot maps

NSW Bureau of Crime Statistics and Research Bureau Brief Issue paper no. 60 April 2011 Understanding crime hotspot maps Melissa Burgess The distribution of crime across a region is not random. A number

### I. CENSUS SURVIVAL METHOD

I. CENSUS SURVIVAL METHOD Census survival methods are the oldest and most widely applicable methods of estimating adult mortality. These methods assume that mortality levels can be estimated from the survival

### OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

### Evolution of a Location-based Online Social Network: Analysis and Models

Evolution of a Location-based Online Social Network: Analysis and Models Miltiadis Allamanis Computer Laboratory University of Cambridge ma536@cam.ac.uk Salvatore Scellato Computer Laboratory University

### 4. Introduction to Statistics

Statistics for Engineers 4-1 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one unit of investigation

### LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph

### The Rational Method. David B. Thompson Civil Engineering Deptartment Texas Tech University. Draft: 20 September 2006

The David B. Thompson Civil Engineering Deptartment Texas Tech University Draft: 20 September 2006 1. Introduction For hydraulic designs on very small watersheds, a complete hydrograph of runoff is not

### Making Sense of the Mayhem: Machine Learning and March Madness

Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University atran3@stanford.edu ginzberg@stanford.edu I. Introduction III. Model The goal of our research

### Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

### Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios

Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are

### Global Seasonal Phase Lag between Solar Heating and Surface Temperature

Global Seasonal Phase Lag between Solar Heating and Surface Temperature Summer REU Program Professor Tom Witten By Abstract There is a seasonal phase lag between solar heating from the sun and the surface

### Efficient visual search of local features. Cordelia Schmid

Efficient visual search of local features Cordelia Schmid Visual search change in viewing angle Matches 22 correct matches Image search system for large datasets Large image dataset (one million images

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### REDUCING UNCERTAINTY IN SOLAR ENERGY ESTIMATES

REDUCING UNCERTAINTY IN SOLAR ENERGY ESTIMATES Mitigating Energy Risk through On-Site Monitoring Marie Schnitzer, Vice President of Consulting Services Christopher Thuman, Senior Meteorologist Peter Johnson,

### arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

### Organizing Your Approach to a Data Analysis

Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

### KNOWLEDGE BASE ARTICLE Zero Voltage Transmission (ZVT) Technology. Basics of the GPZ 7000 Technology: Zero Voltage Transmission (ZVT)

KNOWLEDGE BASE ARTICLE Zero Voltage Transmission (ZVT) Technology Basics of the GPZ 7000 Technology: Zero Voltage Transmission (ZVT) By Bruce Candy Basic Metal Detection Principles 1 2 3 4 Simplified representation

### Collaborative Forecasting

Collaborative Forecasting By Harpal Singh What is Collaborative Forecasting? Collaborative forecasting is the process for collecting and reconciling the information from diverse sources inside and outside

### POWER-LAW FITTING AND LOG-LOG GRAPHS

5 POWER-LAW FITTING AND LOG-LOG GRAPHS She had taken up the idea, she supposed, and made everything bend to it. --- Emma 5.1 DEALING WITH POWER LAWS Although many relationships in nature are linear, some

### Report on the Scaling of the 2012 NSW Higher School Certificate

Report on the Scaling of the 2012 NSW Higher School Certificate NSW Vice-Chancellors Committee Technical Committee on Scaling Universities Admissions Centre (NSW & ACT) Pty Ltd 2013 ACN 070 055 935 ABN

### Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

1 Recommendations in Mobile Environments Professor Hui Xiong Rutgers Business School Rutgers University ADMA-2014 Rutgers, the State University of New Jersey Big Data 3 Big Data Application Requirements

### An Introduction to Point Pattern Analysis using CrimeStat

Introduction An Introduction to Point Pattern Analysis using CrimeStat Luc Anselin Spatial Analysis Laboratory Department of Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

### Measuring Line Edge Roughness: Fluctuations in Uncertainty

Tutor6.doc: Version 5/6/08 T h e L i t h o g r a p h y E x p e r t (August 008) Measuring Line Edge Roughness: Fluctuations in Uncertainty Line edge roughness () is the deviation of a feature edge (as