Examining the Six Degrees in Film: George Clooney as the New Kevin Bacon

Size: px
Start display at page:

Download "Examining the Six Degrees in Film: George Clooney as the New Kevin Bacon"

Transcription

1 ExaminingtheSixDegreesinFilm: GeorgeClooneyastheNewKevinBacon Introduction SixDegreesofKevinBacon isaparlorgamestatingthatanyactorcanbelinkedtothe eponymouscelebritywithinsixdegreesofseparationorfewerthroughassociationinfilmor television1.theconceptisalsoseenintheerdősnumber,whichmeasuresdegreesofseparation throughmathematicalpaperauthorshiptomathematicianpaulerdős,andcanbegeneralizedto thetheoryofsixdegreesofseparation,whichdescribesallhumanbeings.thisprojectaimedto algorithmicallyfindactorsoractresseswhoexhibithightraitsof Bacon ability,orcentrality withinthenetworkofactors,usingtheinternetmoviedatabase(imdb).2.theendgoalwasto findasuitablereplacementorrunner uptokevinbaconinthecontextofhisownparlorgame

2 Methodology To address this problem, we treated the data as a graph of actors and films. This was implemented through the use of nested lists: a list of actors each with his own list of associated works, and a list of films each with its own list of participating actors. One could imagine our data structure as nodes of actors connected by edges of common works, or as a graph of two colors of nodes connected by simple association. Either way, our program works the same way algorithmically. To assess the centrality of each actor, we considered using criteria such as number of actors of direct association (the degree centrality of the actor), number of unique works of the actor, and degrees of separation (or distance) from Bacon. We decided to calculate the percentage of all actors connected within a certain depth of separation to the actor. We applied this assessment to 188 popular actors, which were found by running our algorithm on one thousand randomly generated subsets. The IMDb provides two data files which list millions of pairings of actors and their films. Every actor film association has its own line in the data file (meaning multiple lines for each actor and multiple lines for each work). We cleaned this data of extraneous information like character roles and television episodes, and further simplified it by assigning each actor and each film a unique integer index that could later be referenced. This allowed functions within our project to not have to handle the text strings directly. The program used to format data is included, but our data is not, due to size constraints. The core coverage calculating algorithm we implemented finds a set of actors connected to a particular actor within a depth of association of three using a breadth first graph search. A depth of three was found to produce the most meaningful results without obscuring data by 2

3 completely covering the graph. The algorithm iteratively adds the actor and each of his associates to a result array, whose size is then taken. In order to avoid duplicates through circulation, we tracked the actors already added to the array and ignored them if they had been counted. Because naïvely evaluating every actor individually would have taken an impractical amount of time it is O(n) in actors to evaluate a single actor, so O(n 2 ) for all actors we first took many small, randomized subsets of the data and ran our algorithms on each actor in them. Since the most central actors will be redundantly connected (a reasonable assumption considering the nature of the television and film industry), with a large enough number of samples, the top actors were fairly consistent. The 188 most cumulatively popular actors in all samples were selected and evaluated individually against the entire data set. Note that our task is very parallelizable; a modified version of the program which does in fact evaluate every single actor is included. It can be run on a fast computer with speedup roughly proportional to the number of processors available. With minor modification the program can be run on multiple computers with similar speed improvement. 3

4 Results Top10Best ConnectedActors andkevinbacon Rank 1 Name Associations GeorgeClooney 2,950,145 SamuelL.Jackson 2,948,974 WillSmith 2,947,396 TomHanks 2,947,162 ArnoldSchwarzenegger 2,947,153 HughJackman 2,946,940 MorganFreeman 2,945,567 BillClinton 2,945,504 HarrisonFord 2,945,390 MattDamon 2,945,

5 55 Kevin Bacon 2,935,433 Of the 188 actors we evaluated against the data set, the top ten most popular actors are presented in the table above. A list of complete results for all 188 is included with this document. George Clooney tops our list, with 2,950,145 associated actors. Surprisingly, former President Bill Clinton appears at eighth in our list, possibly due to his presence on television as himself. The lack of diversity amongst the most popular actors should be noted. Of the top ten actors, no women are present, only two races are represented (seven of the ten are white), and only one was born outside of the United States. The first woman to appear in our rankings is Whoopi Goldberg, who places fourteenth with a score of 2,944,303. Jennifer Lopez, the highest ranked Latina in our list, comes in at number fifteen. The most compelling result of our project was that Kevin Bacon himself actually ranks only 55th on our list. This suggests that Kevin Bacon is not the best subject for the six degrees of film experiment, and that George Clooney is in fact, for this purpose, a more suitable Kevin Bacon than Bacon himself. 5

Symbol Tables. Introduction

Symbol Tables. Introduction Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

Database Design Patterns. Winter 2006-2007 Lecture 24

Database Design Patterns. Winter 2006-2007 Lecture 24 Database Design Patterns Winter 2006-2007 Lecture 24 Trees and Hierarchies Many schemas need to represent trees or hierarchies of some sort Common way of representing trees: An adjacency list model Each

More information

Curriculum Map. Discipline: Computer Science Course: C++

Curriculum Map. Discipline: Computer Science Course: C++ Curriculum Map Discipline: Computer Science Course: C++ August/September: How can computer programs make problem solving easier and more efficient? In what order does a computer execute the lines of code

More information

The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Distance Degree Sequences for Network Analysis

Distance Degree Sequences for Network Analysis Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

More information

Machine Learning for Naive Bayesian Spam Filter Tokenization

Machine Learning for Naive Bayesian Spam Filter Tokenization Machine Learning for Naive Bayesian Spam Filter Tokenization Michael Bevilacqua-Linn December 20, 2003 Abstract Background Traditional client level spam filters rely on rule based heuristics. While these

More information

Network Algorithms for Homeland Security

Network Algorithms for Homeland Security Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially

More information

Predicting the Stock Market with News Articles

Predicting the Stock Market with News Articles Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is

More information

CS231M Project Report - Automated Real-Time Face Tracking and Blending

CS231M Project Report - Automated Real-Time Face Tracking and Blending CS231M Project Report - Automated Real-Time Face Tracking and Blending Steven Lee, slee2010@stanford.edu June 6, 2015 1 Introduction Summary statement: The goal of this project is to create an Android

More information

Link Prediction in Social Networks

Link Prediction in Social Networks CS378 Data Mining Final Project Report Dustin Ho : dsh544 Eric Shrewsberry : eas2389 Link Prediction in Social Networks 1. Introduction Social networks are becoming increasingly more prevalent in the daily

More information

Social Relationship Analysis with Data Mining

Social Relationship Analysis with Data Mining Social Relationship Analysis with Data Mining John C. Hancock Microsoft Corporation www.johnchancock.net November 2005 Abstract: The data mining algorithms in Microsoft SQL Server 2005 can be used as a

More information

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs

SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs SIGMOD RWE Review Towards Proximity Pattern Mining in Large Graphs Fabian Hueske, TU Berlin June 26, 21 1 Review This document is a review report on the paper Towards Proximity Pattern Mining in Large

More information

Parallelization: Binary Tree Traversal

Parallelization: Binary Tree Traversal By Aaron Weeden and Patrick Royal Shodor Education Foundation, Inc. August 2012 Introduction: According to Moore s law, the number of transistors on a computer chip doubles roughly every two years. First

More information

Compact Representations and Approximations for Compuation in Games

Compact Representations and Approximations for Compuation in Games Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions

More information

Chapter 2 Data Storage

Chapter 2 Data Storage Chapter 2 22 CHAPTER 2. DATA STORAGE 2.1. THE MEMORY HIERARCHY 23 26 CHAPTER 2. DATA STORAGE main memory, yet is essentially random-access, with relatively small differences Figure 2.4: A typical

More information

Factoring & Primality

Factoring & Primality Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount

More information

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed

More information

EFFICIENT KNOWLEDGE BASE MANAGEMENT IN DCSP

EFFICIENT KNOWLEDGE BASE MANAGEMENT IN DCSP EFFICIENT KNOWLEDGE BASE MANAGEMENT IN DCSP Hong Jiang Mathematics & Computer Science Department, Benedict College, USA jiangh@benedict.edu ABSTRACT DCSP (Distributed Constraint Satisfaction Problem) has

More information

Raima Database Manager Version 14.0 In-memory Database Engine

Raima Database Manager Version 14.0 In-memory Database Engine + Raima Database Manager Version 14.0 In-memory Database Engine By Jeffrey R. Parsons, Senior Engineer January 2016 Abstract Raima Database Manager (RDM) v14.0 contains an all new data storage engine optimized

More information

Load balancing in a heterogeneous computer system by self-organizing Kohonen network

Load balancing in a heterogeneous computer system by self-organizing Kohonen network Bull. Nov. Comp. Center, Comp. Science, 25 (2006), 69 74 c 2006 NCC Publisher Load balancing in a heterogeneous computer system by self-organizing Kohonen network Mikhail S. Tarkov, Yakov S. Bezrukov Abstract.

More information

Distributed Dynamic Load Balancing for Iterative-Stencil Applications

Distributed Dynamic Load Balancing for Iterative-Stencil Applications Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,

More information

Protocols for Efficient Inference Communication

Protocols for Efficient Inference Communication Protocols for Efficient Inference Communication Carl Andersen and Prithwish Basu Raytheon BBN Technologies Cambridge, MA canderse@bbncom pbasu@bbncom Basak Guler and Aylin Yener and Ebrahim Molavianjazi

More information

The English translation Of MBA Standard 0301

The English translation Of MBA Standard 0301 MBA 文 書 0603 号 MBA Document 0603 The English translation Of MBA Standard 0301 MISAUTH Protocol Specification The authoritive specification is Japansese one, MBA Standard 0203 (June 2004). The Protocol

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

Sociology and CS. Small World. Sociology Problems. Degree of Separation. Milgram s Experiment. How close are people connected? (Problem Understanding)

Sociology and CS. Small World. Sociology Problems. Degree of Separation. Milgram s Experiment. How close are people connected? (Problem Understanding) Sociology Problems Sociology and CS Problem 1 How close are people connected? Small World Philip Chan Problem 2 Connector How close are people connected? (Problem Understanding) Small World Are people

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

More information

WAN Wide Area Networks. Packet Switch Operation. Packet Switches. COMP476 Networked Computer Systems. WANs are made of store and forward switches.

WAN Wide Area Networks. Packet Switch Operation. Packet Switches. COMP476 Networked Computer Systems. WANs are made of store and forward switches. Routing WAN Wide Area Networks WANs are made of store and forward switches. To there and back again COMP476 Networked Computer Systems A packet switch with two types of I/O connectors: one type is used

More information

5. Binary objects labeling

5. Binary objects labeling Image Processing - Laboratory 5: Binary objects labeling 1 5. Binary objects labeling 5.1. Introduction In this laboratory an object labeling algorithm which allows you to label distinct objects from a

More information

Data Visualization in Julia

Data Visualization in Julia Introduction: Data Visualization in Julia Andres Lopez-Pineda December 13th, 211 I am Course 6 and 18, and I am primarily interested in Human Interfaces and Graphics. I took this class because I believe

More information

IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

More information

String Search. Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan

String Search. Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan String Search Brute force Rabin-Karp Knuth-Morris-Pratt Right-Left scan String Searching Text with N characters Pattern with M characters match existence: any occurence of pattern in text? enumerate: how

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

INFO 2950 Intro to Data Science. Lecture 17: Power Laws and Big Data

INFO 2950 Intro to Data Science. Lecture 17: Power Laws and Big Data INFO 2950 Intro to Data Science Lecture 17: Power Laws and Big Data Paul Ginsparg Cornell University, Ithaca, NY 29 Oct 2013 1/25 Power Laws in log-log space y = cx k (k=1/2,1,2) log 10 y = k log 10 x

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

Voronoi Treemaps in D3

Voronoi Treemaps in D3 Voronoi Treemaps in D3 Peter Henry University of Washington phenry@gmail.com Paul Vines University of Washington paul.l.vines@gmail.com ABSTRACT Voronoi treemaps are an alternative to traditional rectangular

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

A Multi-Objective Approach for the Project Allocation Problem

A Multi-Objective Approach for the Project Allocation Problem Volume 69.20, May 2013 A Multi-Objective Approach for the Project Allocation Problem Sameerchand Pudaruth University Of Port Louis, Munish Bhugowandeen University Of Quatre Bornes, Vishika Beepur University

More information

Issues in Identification and Linkage of Patient Records Across an Integrated Delivery System

Issues in Identification and Linkage of Patient Records Across an Integrated Delivery System Issues in Identification and Linkage of Patient Records Across an Integrated Delivery System Max G. Arellano, MA; Gerald I. Weber, PhD To develop successfully an integrated delivery system (IDS), it is

More information

A Comparison of General Approaches to Multiprocessor Scheduling

A Comparison of General Approaches to Multiprocessor Scheduling A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA jing@jolt.mt.att.com Michael A. Palis Department of Computer Science Rutgers University

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

Topological Properties

Topological Properties Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS

DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS International Scientific Conference & International Workshop Present Day Trends of Innovations 2012 28 th 29 th May 2012 Łomża, Poland DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS Lubos Takac 1 Michal Zabovsky

More information

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math

WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Textbook Correlation WESTMORELAND COUNTY PUBLIC SCHOOLS 2011 2012 Integrated Instructional Pacing Guide and Checklist Computer Math Following Directions Unit FIRST QUARTER AND SECOND QUARTER Logic Unit

More information

Small Maximal Independent Sets and Faster Exact Graph Coloring

Small Maximal Independent Sets and Faster Exact Graph Coloring Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected

More information

A Non-Linear Schema Theorem for Genetic Algorithms

A Non-Linear Schema Theorem for Genetic Algorithms A Non-Linear Schema Theorem for Genetic Algorithms William A Greene Computer Science Department University of New Orleans New Orleans, LA 70148 bill@csunoedu 504-280-6755 Abstract We generalize Holland

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case. Algorithms Efficiency of algorithms Computational resources: time and space Best, worst and average case performance How to compare algorithms: machine-independent measure of efficiency Growth rate Complexity

More information

Task Description for English Slot Filling at TAC- KBP 2014

Task Description for English Slot Filling at TAC- KBP 2014 Task Description for English Slot Filling at TAC- KBP 2014 Version 1.1 of May 6 th, 2014 1. Changes 1.0 Initial release 1.1 Changed output format: added provenance for filler values (Column 6 in Table

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

Max Flow, Min Cut, and Matchings (Solution)

Max Flow, Min Cut, and Matchings (Solution) Max Flow, Min Cut, and Matchings (Solution) 1. The figure below shows a flow network on which an s-t flow is shown. The capacity of each edge appears as a label next to the edge, and the numbers in boxes

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

Lecture 6 Online and streaming algorithms for clustering

Lecture 6 Online and streaming algorithms for clustering CSE 291: Unsupervised learning Spring 2008 Lecture 6 Online and streaming algorithms for clustering 6.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

More information

The LCA Problem Revisited

The LCA Problem Revisited The LA Problem Revisited Michael A. Bender Martín Farach-olton SUNY Stony Brook Rutgers University May 16, 2000 Abstract We present a very simple algorithm for the Least ommon Ancestor problem. We thus

More information

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

Design of LDPC codes

Design of LDPC codes Design of LDPC codes Codes from finite geometries Random codes: Determine the connections of the bipartite Tanner graph by using a (pseudo)random algorithm observing the degree distribution of the code

More information

Fast Sequential Summation Algorithms Using Augmented Data Structures

Fast Sequential Summation Algorithms Using Augmented Data Structures Fast Sequential Summation Algorithms Using Augmented Data Structures Vadim Stadnik vadim.stadnik@gmail.com Abstract This paper provides an introduction to the design of augmented data structures that offer

More information

AP Computer Science A - Syllabus Overview of AP Computer Science A Computer Facilities

AP Computer Science A - Syllabus Overview of AP Computer Science A Computer Facilities AP Computer Science A - Syllabus Overview of AP Computer Science A Computer Facilities The classroom is set up like a traditional classroom on the left side of the room. This is where I will conduct my

More information

Crawling and Detecting Community Structure in Online Social Networks using Local Information

Crawling and Detecting Community Structure in Online Social Networks using Local Information Crawling and Detecting Community Structure in Online Social Networks using Local Information TU Delft - Network Architectures and Services (NAS) 1/12 Outline In order to find communities in a graph one

More information

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results

Evaluation of a New Method for Measuring the Internet Degree Distribution: Simulation Results Evaluation of a New Method for Measuring the Internet Distribution: Simulation Results Christophe Crespelle and Fabien Tarissan LIP6 CNRS and Université Pierre et Marie Curie Paris 6 4 avenue du président

More information

CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions

CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly

More information

Runtime and Implementation of Factoring Algorithms: A Comparison

Runtime and Implementation of Factoring Algorithms: A Comparison Runtime and Implementation of Factoring Algorithms: A Comparison Justin Moore CSC290 Cryptology December 20, 2003 Abstract Factoring composite numbers is not an easy task. It is classified as a hard algorithm,

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

Computing Concepts with Java Essentials

Computing Concepts with Java Essentials 2008 AGI-Information Management Consultants May be used for personal purporses only or by libraries associated to dandelon.com network. Computing Concepts with Java Essentials 3rd Edition Cay Horstmann

More information

The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

More information

1 File Processing Systems

1 File Processing Systems COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.

More information

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern

More information

Graph Processing and Social Networks

Graph Processing and Social Networks Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

More information

An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

More information

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar

Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Complexity Theory IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Outline Goals Computation of Problems Concepts and Definitions Complexity Classes and Problems Polynomial Time Reductions Examples

More information

Cloud Computing is NP-Complete

Cloud Computing is NP-Complete Working Paper, February 2, 20 Joe Weinman Permalink: http://www.joeweinman.com/resources/joe_weinman_cloud_computing_is_np-complete.pdf Abstract Cloud computing is a rapidly emerging paradigm for computing,

More information

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

SIMS 255 Foundations of Software Design. Complexity and NP-completeness SIMS 255 Foundations of Software Design Complexity and NP-completeness Matt Welsh November 29, 2001 mdw@cs.berkeley.edu 1 Outline Complexity of algorithms Space and time complexity ``Big O'' notation Complexity

More information

Introduction to IBM Watson Analytics Data Loading and Data Quality

Introduction to IBM Watson Analytics Data Loading and Data Quality Introduction to IBM Watson Analytics Data Loading and Data Quality December 16, 2014 Document version 2.0 This document applies to IBM Watson Analytics. Licensed Materials - Property of IBM Copyright IBM

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Broadcast 1997-2005Pegasus Communications, Inc,

Broadcast 1997-2005Pegasus Communications, Inc, Pegasus DataLinks is a highly developed data interface processor designed exclusively for horse racing. This interface system provides accurate, instantaneous, comprehensive wagering information for broadcast

More information

A Property & Casualty Insurance Predictive Modeling Process in SAS

A Property & Casualty Insurance Predictive Modeling Process in SAS Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing

More information

KEYWORD SEARCH IN RELATIONAL DATABASES

KEYWORD SEARCH IN RELATIONAL DATABASES KEYWORD SEARCH IN RELATIONAL DATABASES N.Divya Bharathi 1 1 PG Scholar, Department of Computer Science and Engineering, ABSTRACT Adhiyamaan College of Engineering, Hosur, (India). Data mining refers to

More information

FPGA area allocation for parallel C applications

FPGA area allocation for parallel C applications 1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University

More information

IBM SPSS Modeler Social Network Analysis 15 User Guide

IBM SPSS Modeler Social Network Analysis 15 User Guide IBM SPSS Modeler Social Network Analysis 15 User Guide Note: Before using this information and the product it supports, read the general information under Notices on p. 25. This edition applies to IBM

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

Rateless Codes and Big Downloads

Rateless Codes and Big Downloads Rateless Codes and Big Downloads Petar Maymounkov and David Mazières NYU Department of Computer Science Abstract This paper presents a novel algorithm for downloading big files from multiple sources in

More information

Ratings, Audiences, & Failed Shows

Ratings, Audiences, & Failed Shows Ratings, Audiences, & Failed Shows TELEVISION RATINGS Before the Show Airs A New Season TV networks test programs Subjects view pilots or season finales for current shows in theaters. Meters record their

More information

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)? Database Indexes How costly is this operation (naive solution)? course per weekday hour room TDA356 2 VR Monday 13:15 TDA356 2 VR Thursday 08:00 TDA356 4 HB1 Tuesday 08:00 TDA356 4 HB1 Friday 13:15 TIN090

More information

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits Outline NP-completeness Examples of Easy vs. Hard problems Euler circuit vs. Hamiltonian circuit Shortest Path vs. Longest Path 2-pairs sum vs. general Subset Sum Reducing one problem to another Clique

More information

Fault-Tolerant Application Placement in Heterogeneous Cloud Environments. Bart Spinnewyn, prof. Steven Latré

Fault-Tolerant Application Placement in Heterogeneous Cloud Environments. Bart Spinnewyn, prof. Steven Latré Fault-Tolerant Application Placement in Heterogeneous Cloud Environments Bart Spinnewyn, prof. Steven Latré Cloud Application Placement Problem (CAPP) Application Placement admission control: decide on

More information

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction

Dynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach

More information

A Review of Data Mining Techniques

A Review of Data Mining Techniques Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo

Detecting Email Spam. MGS 8040, Data Mining. Audrey Gies Matt Labbe Tatiana Restrepo Detecting Email Spam MGS 8040, Data Mining Audrey Gies Matt Labbe Tatiana Restrepo 5 December 2011 INTRODUCTION This report describes a model that may be used to improve likelihood of recognizing undesirable

More information

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are

More information

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas

Software Pipelining by Modulo Scheduling. Philip Sweany University of North Texas Software Pipelining by Modulo Scheduling Philip Sweany University of North Texas Overview Opportunities for Loop Optimization Software Pipelining Modulo Scheduling Resource and Dependence Constraints Scheduling

More information

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents

More information

Introduction to Computers and Programming. Testing

Introduction to Computers and Programming. Testing Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 13 April 16 2004 Testing Goals of Testing Classification Test Coverage Test Technique Blackbox vs Whitebox Real bugs and software

More information